Data Science Project | Clustering Mixed Data

Start to Finish Clustering Analysis | Data Series | Project 3

8 min read · Dec 29, 2021

The data and code can be found in the project 3 folder here

The goal of this project is to identify clusters of individuals who have suffered from heart failure.

The data consists of both numerical and categorical features. To cluster such mixed data, we can use K-prototypes clustering.

K-prototypes works similarly to K-means clustering, but it handles both numerical and categorical data.

For Numerical Data (Height, Weight, Time, etc.)

K-prototypes clustering measures the distance between two numerical points using the standard Euclidean distance. For example:
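As a small worked illustration, with values chosen here purely for demonstration, take two points in two-dimensional space, 𝑥 = (𝑥₁, 𝑥₂) = (1, 2) and 𝑦 = (𝑦₁, 𝑦₂) = (4, 6). The Euclidean distance between them is the length of the straight line joining them:

$$
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} = \sqrt{(1 - 4)^2 + (2 - 6)^2} = \sqrt{9 + 16} = 5
$$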

In the above example, we are only dealing with data points in two-dimensional space (𝑥₁, 𝑥₂). What about for n dimensions?

More formally:

For two data points 𝑥 = (𝑥₁, 𝑥₂, …, 𝑥ₙ) and 𝑦 = (𝑦₁, 𝑦₂, …, 𝑦ₙ) in n-dimensional Euclidean space, the distance between them is given by:

$$
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
$$
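A minimal Python sketch of the n-dimensional Euclidean distance may help make this concrete: it takes the square root of the sum of squared coordinate differences. The function name `euclidean_distance` and the sample points are illustrative assumptions, not taken from the project code.

```python
import math
from typing import Sequence


def euclidean_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Euclidean distance between two points with the same number of dimensions."""
    if len(x) != len(y):
        raise ValueError("points must have the same number of dimensions")
    # Sum the squared differences per coordinate, then take the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))


# Hypothetical sample points, chosen only for illustration.
two_d = euclidean_distance([1.0, 2.0], [4.0, 6.0])
four_d = euclidean_distance([1.0, 0.0, 2.0, 3.0], [1.0, 4.0, 2.0, 0.0])
print(two_d, four_d)
```

On Python 3.8 or later, the standard library's `math.dist(x, y)` should give the same result. The explicit version is shown here only because it mirrors the formula term by term.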