
Solving Imbalanced Data in Python

Step-by-step follow-along | Data Series | Episode 16.2

Mazen Ahmed
4 min read · Dec 30, 2023

In the previous episode, we discussed different methods for handling imbalanced data, along with the advantages and disadvantages of each. In this episode, we implement these methods in Python.

You can view and use the code and data in this episode here: Link

Objective

Implement various methods for balancing imbalanced data.

The methods we cover include:

Undersampling Techniques

1. Random Undersampling
2. Cluster Centroids Undersampling
3. Tomek Links Undersampling

Oversampling Techniques

4. Random Oversampling
5. SMOTE (Synthetic Minority Oversampling TEchnique)
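To make the two random techniques in the list concrete, here is a minimal sketch of the idea using only the Python standard library. It is not the imbalanced-learn implementation; the function names `random_undersample` and `random_oversample` and the toy data are hypothetical, chosen purely for illustration.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class matches the smallest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    kept = []
    for cls in counts:
        idx = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(rng.sample(idx, target))  # sample without replacement
    kept.sort()
    return [rows[i] for i in kept], [labels[i] for i in kept]


def random_oversample(rows, labels, seed=0):
    """Randomly duplicate rows from smaller classes until every class matches the largest one."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    kept = list(range(len(labels)))  # keep every original row
    for cls, n in counts.items():
        idx = [i for i, y in enumerate(labels) if y == cls]
        kept.extend(rng.choices(idx, k=target - n))  # sample with replacement
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Hypothetical toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
rows = [[float(i)] for i in range(10)]
labels = [0] * 8 + [1] * 2

_, y_under = random_undersample(rows, labels)
_, y_over = random_oversample(rows, labels)
print(Counter(y_under))
print(Counter(y_over))
```

On this toy data, undersampling leaves 2 rows per class and oversampling leaves 8 rows per class. Undersampling discards information from the majority class, while oversampling repeats minority rows exactly, which is the trade-off the other techniques in the list try to soften.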

Libraries

We start by importing some general Python libraries: pandas to import and manipulate our data, and seaborn to produce graphs. We also use the built-in warnings module to suppress warning messages.

import pandas as pd
import warnings
import seaborn as sns

warnings.filterwarnings("ignore")

Scikit-learn and imbalanced-learn versions
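One way to check which versions are installed is sketched below, using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and the distribution names are assumed to be the PyPI names `scikit-learn` and `imbalanced-learn`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the version of each library, or note that it is not installed.
for dist_name in ("scikit-learn", "imbalanced-learn"):
    print(dist_name, installed_version(dist_name) or "not installed")
```

Recording these versions alongside your results makes it easier to reproduce them later, since resampling APIs can change between releases.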
