Data Science Project | Predicting Water Quality

Start to Finish Gradient Boosted Trees | Data Series | Project 4

Mazen Ahmed
7 min read · Dec 31, 2022

Episodes 11.7 (Gradient Boosted Trees for Classification) and 12.1 (How to Optimize Hyperparameters for Machine Learning Models) are helpful background for this project.

You can view and use the code and data in this episode here: Link

Overview

In this episode we go through a data science pipeline with the following structure:

1 - Data Exploration
2 - Modelling and Model Optimization
3 - Model Evaluation
4 - Feature Importance

In section 2 we use LightGBM, a gradient boosting framework developed by Microsoft.

The package can be downloaded from PyPI using the following command:

pip install lightgbm

Objective

To predict whether water is potable (safe for human consumption).

1. Data Exploration
