Data Science Project | Predicting Water Quality
Start to Finish Gradient Boosted Trees | Data Series | Project 4
Episodes 11.7 (Gradient Boosted Trees for Classification) and 12.1 (How to Optimize Hyperparameters for Machine Learning Models) may be helpful background for this project.
You can view and use the code and data in this episode here: Link
Overview
In this episode we go through a data science pipeline with the following structure:
1 - Data Exploration
2 - Modelling and Model Optimization
3 - Model Evaluation
4 - Feature Importance
For section 2, we use LightGBM, a gradient boosting framework developed by Microsoft.
The package can be installed from PyPI using the following command:
pip install lightgbm
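After running the install command, it can be useful to confirm that the package is visible to the Python interpreter you plan to use. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and introduced here for illustration only.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Look up the distribution name used in the pip command above.
version = installed_version("lightgbm")
if version is None:
    print("lightgbm is not installed in this environment")
else:
    print(f"lightgbm {version} is installed")
```

If the check reports that the package is missing, the most likely cause is that `pip` installed it into a different environment than the one running the script.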
Objective
To predict whether water is potable (safe for human consumption).