Data Science Project | Clustering Mixed Data
Start to Finish Clustering Analysis | Data Series | Project 3
The data and code can be found in the project 3 folder here
Objective
To identify clusters of individuals that have suffered from heart failure.
The dataset contains both numerical and categorical features. To cluster such mixed data, we can use K-prototypes clustering.
K-prototypes Clustering
K-prototypes works similarly to K-means clustering, but it handles both numerical and categorical data.
For Numerical Data (Height, Weight, Time, etc.)
K-prototypes clustering measures the distance between two numerical points using the standard Euclidean distance. For example:
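Here is a minimal sketch of that calculation for two points in the plane. The function name `euclidean_2d` and the two points are made up purely for illustration; they are not taken from the heart failure data.

```python
import math


def euclidean_2d(p, q):
    """Euclidean distance between two points given as (x1, x2) pairs."""
    d1 = p[0] - q[0]  # difference along the first coordinate, x1
    d2 = p[1] - q[1]  # difference along the second coordinate, x2
    return math.sqrt(d1 ** 2 + d2 ** 2)


# Hypothetical points, chosen only so the arithmetic is easy to follow
a = (1.0, 2.0)
b = (4.0, 6.0)

# sqrt((1 - 4)^2 + (2 - 6)^2) = sqrt(9 + 16) = 5.0
print(euclidean_2d(a, b))
```

The differences along each coordinate are squared, summed, and square-rooted, which is just the Pythagorean theorem applied to the two points.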
In the above example, we are only dealing with data points in two-dimensional space (𝑥₁, 𝑥₂). What about for n dimensions?
More formally:
For two data points in n-dimensional Euclidean space, the distance between them is given by:
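Written out, the standard Euclidean distance looks as follows. The symbols are our own labelling, assumed for illustration: p = (p₁, …, pₙ) and q = (q₁, …, qₙ) are the two points, and d(p, q) is the distance between them.

```latex
d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}
```

Setting n = 2 recovers the two-dimensional case, where only the differences along 𝑥₁ and 𝑥₂ contribute.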