Cross Validation Explained
With video explanation | Data Series | Episode 6
4 min read · Sep 9, 2020
So far, when implementing our regression models in Python, we have used all of our data to construct each model.
However, this often produces models that overfit the data, and with no unseen data left over it becomes very difficult to evaluate the model or to tell whether a change actually improves it.
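To make this concrete, here is a minimal sketch of what "using all of our data" looks like. It assumes NumPy and uses synthetic, made-up numbers rather than the dataset from this series; the column meanings, the weights and names such as `r2_all_data` are hypothetical and only there for illustration.

```python
import numpy as np

# Synthetic stand-in data (an assumption for illustration, not a real dataset).
rng = np.random.default_rng(0)
n_rows = 200
X = rng.normal(size=(n_rows, 3))            # three input columns
true_weights = np.array([-0.5, 0.1, 0.2])   # made-up relationship
y = X @ true_weights + rng.normal(scale=0.3, size=n_rows)  # output column

# Add an intercept column and fit ordinary least squares on ALL rows.
X_design = np.column_stack([np.ones(n_rows), X])
coef, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# Score the model on the very same rows it was fitted on.
predictions = X_design @ coef
ss_res = np.sum((y - predictions) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_all_data = 1 - ss_res / ss_tot

print(f"R^2 on the data used for fitting: {r2_all_data:.3f}")
# No rows were held back, so there is no unseen data to check this score against.
```

The score printed here only tells us how well the model reproduces data it has already seen, which is why it cannot reveal overfitting on its own.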
To address this problem, before creating our model, we split our data into two sections:
1. Training Dataset
- The training dataset has two subsets: a training set and a validation set (the validation set can be omitted in simpler cases).
- The training set can be thought of as the data we use to construct our model.
- Most of our data should go into the training set, as this is what provides insight into the relationship between our inputs [Temperature, Wind Speed, Pressure] and our output, Humidity.
- Depending on the performance of our…