Question - Name some outlier detection techniques.
Answer -
Again, one of the most important big data interview questions. Here are six outlier detection methods:
- Extreme Value Analysis – This method determines the statistical tails of the data distribution. Statistical methods like ‘z-scores’ on univariate data are a perfect example of extreme value analysis.
- Probabilistic and Statistical Models – This method determines the ‘unlikely instances’ from a ‘probabilistic model’ of data. A good example is the optimization of Gaussian mixture models using ‘expectation-maximization’.
- Linear Models – This method models the data into lower dimensions. Proximity-based Models – In this approach, the data instances that are isolated from the data group are determined by Cluster, Density, or by the Nearest Neighbor Analysis.
- Information-Theoretic Models – This approach seeks to detect outliers as the bad data instances that increase the complexity of the dataset.
- High-Dimensional Outlier Detection – This method identifies the subspaces for the outliers according to the distance measures in higher dimensions.