Question - Explain overfitting in big data? How to avoid the same.
Answer -
Overfitting is generally a modeling error referring to a model that is tightly fitted to the data, i.e. When a modeling function is closely fitted to a limited data set. Due to Overfitting, the predictivity of such models gets reduced. This effect leads to a decrease in generalization ability failing to generalize when applied outside the sample data.
There are several Methods to avoid Overfitting; some of them are:
- Cross-validation: A cross-validation method refers to dividing the data into multiple small test data sets, which can be used to tune the model.
- Early stopping: After a certain number of iterations, the generalizing capacity of the model weakens; in order to avoid that, a method called early stopping is used in order to avoid Overfitting before the model crosses that point.
- Regularization: this method is used to penalize all the parameters except intercept so that the model generalizes the data instead of Overfitting.