Victor Roman
1 min read · Jun 3, 2019

Although a ROC score of 0.73 is not a bad value, in most cases it is not enough to predict appropriately on binary classification problems that are this imbalanced. By predicting most of the samples as the majority class, the model gets "good" results on paper, but in practice it fails to predict the '1' class well.
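To make the point concrete, here is a minimal sketch in plain Python. The 95/5 class split and the always-predict-0 "model" are made-up assumptions for illustration, not figures from the article, and it uses accuracy and recall rather than ROC because they show the effect most directly.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of the actual positive samples that were predicted positive."""
    true_positives = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    actual_positives = sum(1 for t in y_true if t == positive)
    return true_positives / actual_positives if actual_positives else 0.0


# Hypothetical imbalanced dataset: 95 samples of class 0, 5 of class 1.
y_true = [0] * 95 + [1] * 5

# A degenerate "model" that always predicts the majority class.
y_pred = [0] * len(y_true)

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall for class 1={rec:.2f}")
```

The overall score looks high even though not a single '1' sample is detected, which is why a per-class metric such as recall is worth checking alongside any aggregate score.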

I am working on a similarly imbalanced project and facing this same issue. Do you know how to apply the other suggested methods for dealing with imbalanced datasets in PySpark (SMOTE, oversampling, undersampling…)?

Thank you and best regards.

Victor Roman

Industrial Engineer and passionate about Industry 4.0. My goal is to encourage people to learn and explore its technologies and their infinite possibilities.