‘SMOTE’ object has no attribute ‘fit_sample’ : Solved


The ‘SMOTE’ object has no attribute ‘fit_sample’ error occurs because the fit_sample() method is no longer available in recent versions of imbalanced-learn. To fix it, replace fit_sample() with fit_resample(). In this article we will walk through a complete example: first we will replicate the issue, then we will fix it. We will also look briefly at what SMOTE does. So let’s start.

‘SMOTE’ object has no attribute ‘fit_sample’ ( Solution )-

Error Replication & Reason ( Optional )-

Let’s replicate the issue with an example.

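from sklearn import datasets
from imblearn.over_sampling import SMOTE

# Load the breast cancer dataset (features X, binary target y)
dataset = datasets.load_breast_cancer()
X = dataset.data
y = dataset.target
print(X.shape, y.shape)

# fit_sample() is the old method name; on recent imbalanced-learn versions this line raises the AttributeError
oversample = SMOTE()
X, y = oversample.fit_sample(X, y)
print(X.shape, y.shape)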

When we run the above code, it reproduces the error ( no attribute ‘fit_sample’ ). Here is a screenshot of it.

[Screenshot: ‘SMOTE’ object has no attribute ‘fit_sample’ cause]

How to Fix?

We need to change fit_sample() to fit_resample(), and the code will run. Here is the full code with output-

[Screenshot: ‘SMOTE’ object has no attribute ‘fit_sample’ solution]

 

What does SMOTE actually do?

In real-world classification data, there is no guarantee that the target variable is balanced. In practice, real data is often imbalanced. If we train a model on such data with any machine learning algorithm, there is a good chance the results will be biased toward the majority class. Let me give an example. Suppose you are developing a machine learning model that predicts whether cancer is present or not. If you collect real data, it would typically be around 95% non-cancerous and 5% cancerous.

If we train our model on this kind of data, there is a high chance of bias in the results. Some cancerous patients will get the label “non-cancerous”. This is life-threatening. There are two approaches we can use to train our model here.

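The first is to use the right performance metric, for example recall or F1 score rather than plain accuracy. The second is to balance the data itself, either by under-sampling the majority class or by oversampling the minority class. SMOTE is an oversampling technique: it creates synthetic samples of the minority class.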
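
To see why the choice of metric matters, consider the sketch below. It builds made-up labels with the 95/5 split from the cancer example and scores a hypothetical model that always predicts “non-cancerous”. The data is synthetic and only for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Made-up labels: 950 non-cancerous (0) and 50 cancerous (1) patients.
y_true = np.array([0] * 950 + [1] * 50)

# A hypothetical model that always predicts "non-cancerous".
y_pred = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)  # share of cancerous patients that were caught

print("accuracy:", accuracy)  # 0.95
print("recall:  ", recall)    # 0.0
```

Accuracy looks excellent even though this model misses every cancerous patient, which is why recall or a similar metric is a better guide on imbalanced data.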

 

Thanks

Data Science Learner Team

 
