Please consider reading Episode 8.1 if you are new to clustering algorithms.
An explanation of the Hierarchical clustering algorithm: Episode 8.3
Please consider watching this video if any section of this article is unclear.
Instructions for setting up your programming environment can be found at the start of:
Episode 4.3
You can view and use the code and data used in this episode here: Link
(We will work with the same dataset as Episode 8.2, but with 3 variables instead of 2.)
Place the following data taken from iris plants into clusters to see if we can identify different plants given their sepal length, petal length and petal width. …
In the previous episode we took a look at the popular clustering technique called K-means clustering. In this episode we will look at another widely used clustering technique called Hierarchical clustering.
Please consider watching this video if any section of this article is unclear:
Hierarchical clustering is an unsupervised machine learning algorithm whose job is to find clusters within data. We can then use the clusters identified by the algorithm to predict which group or cluster a new observation belongs to.
Similar to K-means clustering, Hierarchical clustering takes data and finds…
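As a rough illustration of how clusters can be built up step by step, here is a minimal plain-Python sketch of agglomerative (bottom-up) hierarchical clustering. It assumes single linkage (cluster distance = distance between the closest pair of points) and Euclidean distance, and it uses made-up measurements rather than the episode's dataset. The function names are hypothetical and this is not the code linked from the episode.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two points with the same number of features."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def single_linkage(cluster_a, cluster_b, points):
    """Distance between two clusters, taken as the distance of their closest pair."""
    return min(euclidean(points[i], points[j]) for i in cluster_a for j in cluster_b)


def agglomerative(points, n_clusters):
    """Start with one cluster per point, then repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second cluster)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters


# Made-up (sepal length, petal length, petal width) values, NOT the episode's data
toy_points = [
    (5.0, 1.4, 0.2),
    (4.9, 1.5, 0.1),
    (6.4, 4.5, 1.5),
    (6.6, 4.6, 1.3),
    (7.2, 6.0, 2.1),
    (7.4, 6.1, 2.3),
]
hier_clusters = agglomerative(toy_points, n_clusters=3)
print(hier_clusters)  # each inner list holds the row indices of one cluster
```

Stopping at a fixed number of clusters is only one choice; the full sequence of merges can also be kept and drawn as a dendrogram.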
An explanation of the K-means clustering algorithm: Episode 8.1
Please consider watching this video if any section of this article is unclear.
Instructions for setting up your programming environment can be found at the start of:
Episode 4.3
You can view and use the code and data used in this episode here: Link
Place the following data taken from iris plants into clusters to see if we can identify different plants given their petal width and sepal length:
We have taken a look at linear and logistic regression and how to implement both algorithms in Python. These algorithms are examples of supervised machine learning algorithms, since they learn from data in which the final output value is already known. We will now go on to look at our first unsupervised machine learning algorithm for this series.
Please consider watching this video if any section of this article is unclear:
K-means clustering is an unsupervised machine learning algorithm whose job is to find clusters within data. …
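To make the idea of "finding clusters" concrete, here is a minimal plain-Python sketch of the standard K-means loop: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The data values, the starting centroids and the function names are assumptions made for illustration; they are not taken from the episode's code or dataset.

```python
import math


def closest_centroid(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    distances = [
        math.sqrt(sum((p - c) ** 2 for p, c in zip(point, centroid)))
        for centroid in centroids
    ]
    return distances.index(min(distances))


def mean_point(points):
    """Feature-wise mean of a non-empty list of points."""
    return tuple(sum(values) / len(points) for values in zip(*points))


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and updating centroids until stable."""
    centroids = list(initial_centroids)
    labels = []
    for _ in range(max_iterations):
        labels = [closest_centroid(p, centroids) for p in points]
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            # Keep the old centroid if no point was assigned to it
            new_centroids.append(mean_point(members) if members else centroids[k])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Made-up (petal width, sepal length) values, NOT the episode's data
toy_points = [(0.2, 5.0), (0.3, 4.8), (1.4, 6.5), (1.5, 6.3)]
kmeans_labels, kmeans_centroids = k_means(toy_points, [toy_points[0], toy_points[2]])
print(kmeans_labels)
```

In practice the starting centroids are usually chosen at random, so different runs can give different clusters.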
In this Episode we will be expanding on Logistic Regression in Python, implementing many more data pre-processing steps on a larger data set that contains both numerical and categorical data (words).
Please consider watching this video if any section of this article is unclear:
Construct a logistic regression model to predict if it will rain tomorrow in a city in Australia.
Link to data and code can be found in the folder project 2 here: Github
Consider reading Episode 7.1 before continuing, which explains how logistic regression works.
Please consider watching this video if any section of this article is unclear.
Instructions for setting up your programming environment can be found at the start of:
Episode 4.3
You can view and use the code and data used in this episode here: Link
Predict whether it will rain tomorrow in Albury, Australia given the following data:
Logistic Regression can be thought of as an extension of Linear Regression. With Linear Regression, our model's final output was a single numeric value. With Logistic Regression, we apply an extra function to the Linear Regression output that places the final value into a group, i.e. 1 or 0.
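The "extra function" is usually the sigmoid (logistic) function, which squashes the linear output into a value between 0 and 1 that is then compared with a threshold. The sketch below assumes a threshold of 0.5 and uses hypothetical weights and feature values chosen only for illustration; it is not a trained model and not the episode's code.

```python
import math


def linear_output(features, weights, bias):
    """The linear regression part: a weighted sum of the inputs plus a bias."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def sigmoid(z):
    """Map any real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, weights, bias, threshold=0.5):
    """Return 1 if the predicted probability reaches the threshold, else 0."""
    probability = sigmoid(linear_output(features, weights, bias))
    return 1 if probability >= threshold else 0


# Hypothetical weights and bias, chosen by hand for illustration only
example_weights = (0.8, -0.5)
example_bias = -0.1

print(predict_class((2.0, 1.0), example_weights, example_bias))  # linear output is 1.0
print(predict_class((0.0, 2.0), example_weights, example_bias))  # linear output is -1.1
```

A positive linear output gives a probability above 0.5 and therefore class 1; a negative one gives class 0.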
Please consider watching this video if any section of this article is unclear.
Logistic regression is a very common supervised machine learning algorithm (see Episode 3) used by Data Scientists to categorize data into groups.
The job of logistic regression is to take a bunch of input data and organise the data into different groups. For example, take a look at the following table of weather data gathered from Albury, Australia. …
This episode combines knowledge from all previous episodes to build, evaluate and improve a ridge regression model that makes predictions for weather data in Szeged, Hungary.
You can view the code used in this Episode here: SampleCode
Construct a regression model that makes reasonable predictions for Humidity given the following data:
Our model should take new inputs of Temperature, Wind-speed, Pressure, etc. and come up with a reasonable estimate for Humidity.
We are going to be using Jupyter Notebook and the scikit-learn library to construct this model. …
So far, when implementing all of our regression models in Python, we have been using all of our data to construct our model:
This, however, often leads to models which overfit our data and it becomes very difficult to evaluate and make improvements to our model.
To address this problem, before creating our model, we split our data into two sections:
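Assuming the two sections are a training set (used to build the model) and a test set (held back to evaluate it), which is the usual convention, the split can be sketched in plain Python as follows. The function name, the 20% test fraction and the fixed seed are assumptions made for illustration, not values taken from the episode.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold back a fraction of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in rows; in practice each row would be one observation of the dataset
rows = list(range(10))
train_rows, test_rows = split_train_test(rows, test_fraction=0.2, seed=0)
print(len(train_rows), len(test_rows))
```

The model is then fitted on `train_rows` only, and its predictions are checked against `test_rows`, which it has never seen.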
Underfitting and overfitting are both common problems data scientists come across when evaluating their models. It is important that you are aware of these issues and what we can do to resolve them.
Underfitting: Occurs when our model fails to capture the underlying trend in our data:
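As a small illustration on made-up data, the sketch below fits a straight line (ordinary least squares) to points that follow a curve. The line cannot bend, so its error stays large even on the data it was fitted to, which is a typical sign of underfitting. The data and function names are assumptions for illustration only.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def mean_squared_error(ys, predictions):
    """Average squared difference between true values and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Made-up data with a curved (quadratic) trend
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

slope, intercept = fit_line(xs, ys)
line_predictions = [slope * x + intercept for x in xs]
line_mse = mean_squared_error(ys, line_predictions)
print(slope, intercept, line_mse)
```

Here the best straight line is flat, so it predicts the same value for every input and misses the curve entirely.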
Models which underfit our data:
— — — — — — — — — — — — — — — — — — — —…