Solving Imbalanced Data in Python
Step-by-step follow-along | Data Series | Episode 16.2
In the previous episode we discussed different methods for handling imbalanced data, along with the advantages and disadvantages of each. In this episode we implement these methods in Python.
You can view and use the code and data in this episode here: Link
Objective
Implement various methods for balancing imbalanced data.
The methods we cover include:
Undersampling Techniques
1. Random Undersampling
2. Cluster Centroids Undersampling
3. Tomek Links Undersampling
Oversampling Techniques
4. Random Oversampling
5. SMOTE (Synthetic Minority Oversampling TEchnique)
Libraries
We start by importing some general Python libraries: pandas to import and manipulate our data, and seaborn to produce graphs. We also suppress warnings to keep the output readable.
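import warnings

import pandas as pd    # data loading and manipulation
import seaborn as sns  # plotting

# Silence library warnings (e.g. deprecation notices) to keep the output readable
warnings.filterwarnings("ignore")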
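A quick way to put these libraries to work before any resampling is to confirm how imbalanced the target actually is. The sketch below assumes a hypothetical toy DataFrame with a `target` column split 90/10; with the episode's real data you would load the linked files instead and substitute your own column name.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data standing in for the episode's dataset
df = pd.DataFrame({
    "feature": range(100),
    "target": [0] * 90 + [1] * 10,  # 90 majority rows, 10 minority rows
})

# Absolute and relative class frequencies
counts = df["target"].value_counts()
proportions = df["target"].value_counts(normalize=True)
print(counts)
print(proportions)

# Bar chart of the class distribution
ax = sns.countplot(data=df, x="target")
ax.set_title("Class distribution")
plt.show()
```

Checking both the raw counts and the proportions is useful because the same resampling choice can look reasonable at 90/10 but very different at 99.9/0.1.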
Scikit-learn and imbalanced-learn versions