How to Handle Missing Data
With video explanation | Data Series | Episode 15.1
Nov 29, 2023
If we don’t handle missing values appropriately, the performance of our models can suffer. It is therefore important that we know the different techniques available. Handling missing values is usually not the end goal, but it is part of pre-processing and can improve model performance.
In this article I explain the methods we can use to handle missing data, along with the advantages and disadvantages of each.
Types of missing data
There are three types of missing data:
- Missing completely at random (MCAR): The missingness has nothing to do with any variable in the study, observed or unobserved. For example, a blood sample gets damaged or a report is lost.
- Missing at random (MAR): The missingness can be predicted from other measured variables. For example, a student is away and does not provide exam results; the absence could be predicted from a recorded illness.
- Missing not at random (MNAR): The missingness is related to the missing value itself or to variables that were not measured. For example, a student with a low grade does not report his maths exam result out of embarrassment.
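To make the three mechanisms concrete, here is a minimal simulation sketch in Python using NumPy and pandas. The column names (`sick`, `score`), the sample size, and all the probabilities are assumptions chosen purely for illustration; they do not come from a real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Hypothetical data: a recorded "sick" flag and a maths exam score.
df = pd.DataFrame({
    "sick": rng.random(n) < 0.2,
    "score": rng.normal(60, 15, n).round(1),
})

# MCAR: every score has the same 20% chance of being missing.
mcar_mask = rng.random(n) < 0.2

# MAR: missingness depends only on the observed "sick" column.
mar_mask = rng.random(n) < np.where(df["sick"], 0.7, 0.05)

# MNAR: missingness depends on the score itself (low scores are hidden more often).
mnar_mask = rng.random(n) < np.where(df["score"] < 50, 0.6, 0.05)

# Series.mask replaces values with NaN wherever the mask is True.
df["score_mcar"] = df["score"].mask(mcar_mask)
df["score_mar"] = df["score"].mask(mar_mask)
df["score_mnar"] = df["score"].mask(mnar_mask)

# Compare the mean of the full scores with the mean of each incomplete version.
print(df[["score", "score_mcar", "score_mar", "score_mnar"]].mean())
```

In this sketch the MNAR column tends to have a higher mean than the full scores, because low scores are removed more often. That kind of bias is why the missingness mechanism matters when choosing how to handle the gaps.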