Step-by-step follow along | Data Series | Episode 8.2

An explanation of the K-means clustering algorithm: Episode 8.1

Please consider watching this video if any section of this article is unclear.

Video Link

Instructions for setting up your programming environment can be found at the start of Episode 4.3.

You can view and use the code and data used in this episode here: Link

Objective

Place the following data taken from iris plants into clusters to see if we can identify different plants given their petal width and sepal length:

Image for post
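
A minimal sketch of this objective, assuming we use the copy of the iris data bundled with scikit-learn (the episode may load its own file instead) and assuming three clusters:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

# Load the iris measurements bundled with scikit-learn (an assumption:
# the episode may use its own data file instead).
iris = load_iris()

# Column 0 is sepal length (cm) and column 3 is petal width (cm).
X = iris.data[:, [0, 3]]

# k=3 is an assumption, chosen because the bundled data holds three species.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = model.fit_predict(X)

print(model.cluster_centers_)  # one (sepal length, petal width) centre per cluster
print(labels[:10])             # cluster index given to the first ten plants
```

The cluster numbers themselves carry no meaning; only which plants end up grouped together matters.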


Intro to Unsupervised Algorithms | Data Series | Episode 8.1

We have taken a look at linear and logistic regression and how to implement both algorithms in Python. These are examples of supervised machine learning algorithms, since they are trained to produce a final output value. We will now look at our first unsupervised machine learning algorithm in this series.

Please consider watching this video if any section of this article is unclear:

Video Link

What is K-means clustering?

K-means clustering is an unsupervised machine learning algorithm, which means the job of this algorithm is not to produce a value or label but instead to identify patterns or structure in data.
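
To make the idea concrete, here is a rough sketch of a basic K-means loop in plain Python. The function name kmeans, the toy points and the choice to start from the first k points are all assumptions made for illustration (random starting points are more usual):

```python
def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters with a basic K-means loop."""
    # Simplification: start from the first k points instead of random ones.
    centroids = list(points[:k])
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            distances = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy in centroids]
            clusters[distances.index(min(distances))].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:  # keep the old centroid if a cluster ends up empty
                centroids[i] = (
                    sum(p[0] for p in cluster) / len(cluster),
                    sum(p[1] for p in cluster) / len(cluster),
                )
    return centroids, clusters

# Toy data: two obvious groups, made up for this example.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centroids, clusters = kmeans(points, k=2)
print(centroids)  # one centroid near (1, 1) and one near (5, 5)
```

Notice that no labels are passed in anywhere: the groups come purely from how close the points are to each other.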

Overview

Imagine we recorded some data and made a scatter…


Start to Finish Logistic Regression Model | Data Series | Project 2

In this Episode we will be expanding on Logistic Regression in Python, implementing many more data pre-processing steps on a larger data set that contains both numerical and categorical data (words).

Please consider watching this video if any section of this article is unclear:

Video Link

Objective

Construct a logistic regression model to predict if it will rain tomorrow in a city in Australia.

Image for post

The data and code can be found in the folder project 2 here: GitHub

1. Importing and Exploring our Data

Image for post

Importing our data into Python

import pandas as pd
import numpy as np  # for math operations later

# Raw string so the backslashes in the Windows path are not read as escapes
df = pd.read_csv(r"D:\ProjectData\weatherAus.csv")
print('Size of weather data frame is :', df.shape)
…


Step-by-step follow along | Data Series | Episode 7.2

Consider reading Episode 7.1 before continuing, which explains how logistic regression works.

Please consider watching this video if any section of this article is unclear.

Video Link

Instructions for setting up your programming environment can be found at the start of Episode 4.3.

You can view and use the code and data used in this episode here: Link

Objective

Predict whether it will rain tomorrow in Albury, Australia given the following data:

Image for post

Importing our Data

  • We store our data in the variable df, short for data frame.
  • df.shape gives the number of rows and columns in our data.
  • df.head() displays the first few rows of data in our notebook. …
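
The points above can be tried out with a small sketch like the one below. The rows and column names are hypothetical stand-ins for the episode's weather file, built in memory so the example runs on its own:

```python
import pandas as pd

# Hypothetical rows standing in for the weather file used in the episode.
df = pd.DataFrame({
    "MinTemp": [13.4, 7.4, 12.9, 9.2],
    "MaxTemp": [22.9, 25.1, 25.7, 28.0],
    "RainTomorrow": ["No", "No", "No", "Yes"],
})

print(df.shape)   # (rows, columns), here (4, 3)
print(df.head())  # first five rows by default; here that is all four
```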


Intro to Classification Algorithms | Data Series | Episode 7.1

Logistic Regression can be thought of as an extension of Linear Regression. With Linear Regression, the final output of our model was a single value. With Logistic Regression, we apply an extra function to that output, which places it into a group, i.e. 1 or 0.
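
As a rough sketch of this idea, assume the extra function is the sigmoid (the usual choice for logistic regression) and that we cut off at 0.5. The weight and bias values below are hypothetical, picked only to show the mechanics:

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weight, bias, threshold=0.5):
    """Linear regression output passed through the sigmoid, then thresholded."""
    probability = sigmoid(weight * x + bias)
    return 1 if probability >= threshold else 0

# Hypothetical weight and bias, chosen only to illustrate the idea.
print(sigmoid(0))                           # 0.5
print(predict(3.0, weight=1.2, bias=-2.0))  # 1, since 1.2*3 - 2 = 1.6 > 0
print(predict(1.0, weight=1.2, bias=-2.0))  # 0, since 1.2*1 - 2 = -0.8 < 0
```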

Please consider watching this video if any section of this article is unclear.

Video Link

What is Logistic Regression?

Logistic regression is a very common supervised machine learning algorithm (see Episode 3) used by Data Scientists to categorize data into groups.

Overview

The job of logistic regression is to take a bunch of input data and organise the data into different groups. For example, take a look at the following table of weather data gathered from Albury, Australia. …


Start To Finish Linear Regression Model | Data Series | Project 1

This episode combines knowledge from all previous episodes to build, evaluate and improve a ridge regression model that makes predictions for weather data in Szeged, Hungary.

Video Link

You can view the code used in this Episode here: SampleCode

Objective

Construct a regression model that makes reasonable predictions for Humidity given the following data:

Link

Image for post

Our model should take new inputs of Temperature, Wind Speed, Pressure, etc. and come up with a reasonable estimate for Humidity.
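
A minimal sketch of what such a model could look like with scikit-learn's Ridge is shown below. The data here is synthetic and the relationship is made up, so this only illustrates the fit and predict steps, not the episode's actual data or results:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic stand-in data: columns play the role of temperature,
# wind speed and pressure (not the real Szeged measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
# Made-up relationship, used only so there is something to fit.
y = 0.6 - 0.2 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(scale=0.01, size=100)

model = Ridge(alpha=1.0)  # alpha sets the strength of the regularisation
model.fit(X, y)

new_inputs = np.array([[0.5, -1.0, 0.2]])  # one new row of the three inputs
print(model.predict(new_inputs))           # estimated output for that row
```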

We are going to be using Jupyter Notebook and the scikit-learn library to construct this model. …


Testing our model’s performance | Data Series | Episode 6

Video Link

So far, when implementing all of our regression models in Python, we have been using all of our data to construct our model:

Image for post

This, however, often leads to models which overfit our data and it becomes very difficult to evaluate and make improvements to our model.

To address this problem, before creating our model, we split our data into two sections:

Image for post

1. Training Data

  • Training data can be thought of as the data we use to construct our model.
  • Most of our data should be used as training data, as this is what provides insight into the relationship between our inputs [Temperature, Wind Speed, Pressure] and our output, Humidity. …
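
The split described above can be sketched in plain Python as follows. The helper name split_train_test and the 80/20 ratio are assumptions for illustration; scikit-learn ships its own helper for this job:

```python
import random

def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle the rows, then hold back a fraction of them for testing."""
    shuffled = rows[:]                     # copy so the original order is kept
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

rows = list(range(10))                     # placeholder for ten rows of data
train, test = split_train_test(rows)
print(len(train), len(test))               # 8 2
```

Shuffling first matters when the rows are stored in some order (by date, for example), so that both sections contain a mix of the data.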


Explaining and solving bad models | The Data Series | Episode 5

Video Link

Underfitting and overfitting are both common problems data scientists come across when evaluating their models. It is important that you are aware of these issues and what we can do to resolve them.

Definitions

Underfitting: Occurs when our model fails to capture the underlying trend in our data:

Image for post

Models which underfit our data:

  • Have a Low Variance and a High Bias
  • Tend to have fewer features [ 𝑥 ]
  • High Bias: Assumes more about the form or trend our data takes
  • Low Variance: Changes to our data make small changes to our model’s predicted values
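
As a small illustration of underfitting, the sketch below fits a straight line to data that is clearly curved. The data is made up (y = x squared) and fit_line is a hypothetical helper written for this example:

```python
def fit_line(xs, ys):
    """Least-squares straight line y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]          # curved data: y = x squared

slope, intercept = fit_line(xs, ys)
mse = sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(slope, intercept, mse)       # 0.0 4.0 12.0
```

The best straight line here is flat (slope 0), so it misses the curve entirely and leaves a large error. This is the High Bias case: the model assumes a straight line, and that assumption does not match the data.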



Step-by-step follow along | Data Series | Episode 4.7

Video Link

You can view the code used in this Episode here: SampleCode

Importing our Data

The first step is to import our data into Python.

We can do that by going on the following link: Data

Click on “code” and download ZIP.

Locate WeatherDataP.csv and copy it onto your local disk, into a new folder called ProjectData.

Note: WeatherData.csv and WeatherDataM.csv were used in Simple Linear Regression and Multiple Linear Regression.

Now we are ready to import our data into our Notebook:

How to set up a new Notebook can be found at the start of Episode 4.3

Note: Keep this Medium post on a split screen so you can read and implement the code yourself. …


Capturing non-linear relationships | Data Series | Episode 4.6

Video Link

This article expands on Simple Linear Regression and Multiple Linear Regression. Ensure you have a good understanding of these two topics before continuing.

What is Polynomial Regression?

Polynomial Regression is used to capture non-linear relationships between variables.

For example:

Image for post

For linear relationships we use Linear Regression.
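
As a minimal sketch of fitting a curve rather than a straight line, the example below uses NumPy's polyfit on synthetic data. Both the data and the choice of polyfit with degree 2 are assumptions for illustration and may differ from the tools used in the episode:

```python
import numpy as np

# Synthetic curved data, made up for illustration.
x = np.linspace(0.0, 10.0, 50)
y = 2.0 + 0.5 * x - 0.3 * x ** 2

# Fit y = a*x^2 + b*x + c; polyfit returns the highest power first.
a, b, c = np.polyfit(x, y, deg=2)
print(a, b, c)  # should land very close to -0.3, 0.5 and 2.0

# Predict for a new input using the fitted curve.
print(np.polyval([a, b, c], 4.0))
```

Because the synthetic data has no noise, the fit recovers the coefficients we started with almost exactly; real data would only give an approximation.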

Overview

Take a look at the following graph of the Humidity and Pressure values in Szeged, Hungary. [Yes, I like weather data :)]

Image for post
  • We can see there is a trend in the data, which is non-linear so we use Polynomial Regression
  • The job of Polynomial regression is to find a suitable relationship between Humidity and Pressure, such as the…

About

Mazen Ahmed

Interested in Data Science? Consider giving me a follow for weekly lessons with video explanations.
