Step-by-step follow along | Data Series | Episode 8.4

Please consider reading Episode 8.1 if you are new to clustering algorithms.

An explanation of the Hierarchical clustering algorithm: Episode 8.3

Please consider watching this video if any section of this article is unclear.

Video Link

Instructions for setting up your programming environment can be found at the start of Episode 4.3.

You can view and use the code and data used in this episode here: Link

Objective

(We will be working with the same dataset as Episode 8.2, but now with three variables instead of two.)

Place the following data taken from iris plants into clusters to see if we can identify different plants given their sepal length, petal length and petal width.
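The objective above can be sketched in a few lines. This is a minimal sketch, not the episode's own code: it assumes scikit-learn's bundled iris data matches the episode's dataset, and the choice of 3 clusters and Ward linkage is an assumption made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris

# Load scikit-learn's bundled iris data (assumed to match the episode's dataset).
iris = load_iris()

# Column order: 0 = sepal length, 1 = sepal width, 2 = petal length, 3 = petal width.
# Keep the three variables named in the objective.
X = iris.data[:, [0, 2, 3]]

# Assumption: 3 clusters and Ward linkage, chosen for illustration.
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)

# Each observation now carries a cluster label: 0, 1 or 2.
print(labels[:10])
```

The cluster numbers themselves are arbitrary; what matters is which observations end up sharing a label.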


Unsupervised Algorithms | Data Series | Episode 8.3

In the previous episode we took a look at the popular clustering technique called K-means clustering. In this episode we will look at another widely used clustering technique: Hierarchical clustering.

Please consider watching this video if any section of this article is unclear:

Video Link

What is Hierarchical clustering?

Hierarchical clustering is an unsupervised machine learning algorithm whose job is to find clusters within data. We can then use the clusters identified by the algorithm to predict which group or cluster a new observation belongs to.

Overview

Similar to K-means clustering, Hierarchical clustering takes data and finds…


Step-by-step follow along | Data Series | Episode 8.2

An explanation of the K-means clustering algorithm: Episode 8.1

Please consider watching this video if any section of this article is unclear.

Video Link

Instructions for setting up your programming environment can be found at the start of Episode 4.3.

You can view and use the code and data used in this episode here: Link

Objective

Place the following data taken from iris plants into clusters to see if we can identify different plants given their petal width and sepal length:

Image for post
https://commons.wikimedia.org/wiki/Main_Page
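A minimal sketch of the clustering objective above, assuming scikit-learn's bundled iris data stands in for the episode's dataset. The number of clusters (3) and the `random_state` value are assumptions made for illustration, not values taken from the episode.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load scikit-learn's bundled iris data (assumed to match the episode's dataset).
iris = load_iris()

# Column order: 0 = sepal length, 1 = sepal width, 2 = petal length, 3 = petal width.
# Keep sepal length and petal width, the two variables named in the objective.
X = iris.data[:, [0, 3]]

# Assumption: 3 clusters; random_state fixes the initial centroids for repeatability.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# One centroid per cluster, given as (sepal length, petal width).
print(kmeans.cluster_centers_)
```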


Intro to Unsupervised Algorithms | Data Series | Episode 8.1

We have looked at linear and logistic regression and how to implement both algorithms in Python. These are examples of supervised machine learning algorithms, since they learn from data in which the output value is already known. We will now look at our first unsupervised machine learning algorithm in this series.

Please consider watching this video if any section of this article is unclear:

Video Link

What is K-means clustering?

K-means clustering is an unsupervised machine learning algorithm whose job is to find clusters within data. …


Start to Finish Logistic Regression Model | Data Series | Project 2

In this episode we will expand on Logistic Regression in Python, implementing many more data pre-processing steps on a larger data set that contains both numerical and categorical data (words).

Please consider watching this video if any section of this article is unclear:

Video Link

Objective

Construct a logistic regression model to predict if it will rain tomorrow in a city in Australia.
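Before a logistic regression model can use categorical data (words), the words have to be turned into numbers. A minimal sketch of one common approach, one-hot encoding with pandas, is shown here. The rows and the column names `MinTemp`, `WindGustDir` and `RainTomorrow` are hypothetical and used only for illustration.

```python
import pandas as pd

# Hypothetical rows and column names, for illustration only.
df = pd.DataFrame({
    "MinTemp": [13.4, 7.4, 12.9, 9.2],
    "WindGustDir": ["W", "WNW", "W", "NE"],
    "RainTomorrow": ["No", "No", "Yes", "No"],
})

# Map the Yes/No target to 1/0 so the model can work with it.
y = df["RainTomorrow"].map({"Yes": 1, "No": 0})

# One-hot encode the categorical column; numerical columns pass through unchanged.
X = pd.get_dummies(df.drop(columns="RainTomorrow"), columns=["WindGustDir"])

print(X.columns.tolist())
```

Each distinct word in `WindGustDir` becomes its own column, so no artificial ordering is imposed on the categories.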

Image for post

Link to data and code can be found in the folder project 2 here: Github

1. Importing and Exploring our Data


Step-by-step follow along | Data Series | Episode 7.2

Consider reading Episode 7.1 before continuing, which explains how logistic regression works.

Please consider watching this video if any section of this article is unclear.

Video Link

Instructions for setting up your programming environment can be found at the start of Episode 4.3.

You can view and use the code and data used in this episode here: Link

Objective

Predict whether it will rain tomorrow in Albury, Australia given the following data:

Image for post

Importing our Data

  • We store our data in the variable df, short for data frame.
  • df.shape gives the number of rows and columns in our data.
  • df.head() displays the first few rows of data in our notebook. …
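The calls in the list above can be tried on a small stand-in table. This is a minimal sketch: the column names and values are hypothetical, and in the episode the data would be loaded from its data file instead.

```python
import pandas as pd

# Hypothetical stand-in for the episode's data; values are illustrative only.
df = pd.DataFrame({
    "Humidity": [71, 44, 38, 45, 82, 55],
    "Pressure": [1007.7, 1010.6, 1007.6, 1017.6, 1010.8, 1009.2],
    "RainTomorrow": ["No", "No", "No", "No", "Yes", "No"],
})

# shape is an attribute: (number of rows, number of columns).
print(df.shape)

# head() is a method: it returns the first five rows by default.
print(df.head())
```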


Intro to Classification Algorithms | Data Series | Episode 7.1

Logistic Regression can be thought of as an extension of Linear Regression. With Linear Regression, the final output of our model was a single value. With Logistic Regression, we apply an extra function to the Linear Regression output that puts the final value into a group, i.e. 1 or 0.
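The "extra function" idea can be sketched in plain Python. This sketch assumes the function is the sigmoid, which is the standard choice for logistic regression; the weight, bias and 0.5 threshold are hypothetical values picked for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, weight, bias, threshold=0.5):
    """Linear regression output, passed through the sigmoid, then put into group 1 or 0."""
    linear_output = weight * x + bias
    probability = sigmoid(linear_output)
    return 1 if probability >= threshold else 0


# Hypothetical weight and bias, for illustration only.
print(sigmoid(0.0))
print(predict(2.0, weight=1.5, bias=-1.0))
print(predict(0.0, weight=1.5, bias=-1.0))
```

A positive linear output gives a probability above 0.5 and lands in group 1; a negative one lands in group 0.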

Please consider watching this video if any section of this article is unclear.

Video Link

What is Logistic Regression?

Logistic regression is a very common supervised machine learning algorithm (see Episode 3) used by Data Scientists to categorize data into groups.

Overview

The job of logistic regression is to take a bunch of input data and organise the data into different groups. For example, take a look at the following table of weather data gathered from Albury, Australia. …


Start To Finish Linear Regression Model | Data Series | Project 1

This episode combines knowledge from all previous episodes to build, evaluate and improve a ridge regression model that makes predictions for weather data from Szeged, Hungary.

Video Link

You can view the code used in this Episode here: SampleCode

Objective

Construct a regression model that makes reasonable predictions for Humidity given the following data:

Link

Image for post

Our model should take new inputs of Temperature, Wind-speed, Pressure, etc. and come up with a reasonable estimate for Humidity.

We are going to be using Jupyter Notebook and the scikit-learn library to construct this model. …
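A minimal sketch of fitting a ridge regression model with scikit-learn. The data here is synthetic and generated only so the example runs on its own; the three input columns stand in for inputs such as Temperature, Wind-speed and Pressure, and `alpha=1.0` is an assumed regularisation strength, not a value from the episode.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-in data: 200 rows, 3 input columns (not the real weather data).
X = rng.normal(size=(200, 3))
true_coef = np.array([-0.5, 0.1, 0.2])
y = 0.7 + X @ true_coef + rng.normal(scale=0.05, size=200)

# alpha controls how strongly large coefficients are penalised (assumed value).
model = Ridge(alpha=1.0)
model.fit(X, y)

print(model.coef_)           # one fitted coefficient per input column
print(model.predict(X[:3]))  # estimates for the first three rows
```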


Testing our model’s performance | Data Series | Episode 6

Video Link

So far, when implementing all of our regression models in Python, we have been using all of our data to construct our model:

Image for post

This, however, often leads to models that overfit our data, and it becomes very difficult to evaluate and improve our model.

To address this problem, before creating our model, we split our data into two sections:
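A minimal sketch of such a split, assuming the two sections are the usual training set and test set. The toy arrays, the 80/20 ratio and the `random_state` value are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 observations with 2 features each, plus one target per observation.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold back 20% of the rows for testing; random_state makes the shuffle repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

The model is then fitted on the training rows only, and the held-back rows are used to check how well it handles data it has not seen.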


Explaining and solving bad models | The Data Series | Episode 5

Video Link

Underfitting and overfitting are both common problems data scientists come across when evaluating their models. It is important that you are aware of these issues and what we can do to resolve them.

Definitions

Underfitting: Occurs when our model fails to capture the underlying trend in our data:

Image for post

Models which underfit our data:

  • Have a low variance and a high bias
  • Tend to have fewer features [ 𝑥 ]
  • High bias: assumes more about the form or trend our data takes
  • Low variance: changes to our data make only small changes to our model’s predicted values
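The points above can be seen on a small synthetic example. This is a minimal sketch, assuming data that follows a curved (quadratic) trend: a straight line has too few features to capture the curve and so underfits, while adding an x² term lets the model follow the trend.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with a curved trend plus a little noise (illustrative only).
x = np.linspace(-3, 3, 50)
y = x**2 + rng.normal(scale=0.2, size=x.size)


def mse(degree):
    """Fit a polynomial of the given degree and return its mean squared error."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


# Degree 1 (a straight line) misses the trend; degree 2 captures it.
print(mse(1), mse(2))
```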

— — — — — — — — — — — — — — — — — — — —…

About

Mazen Ahmed

Interested in Data Science? Consider giving me a follow for weekly lessons with video explanations.
