Question - How can you handle missing values in Big Data?
Answer -
Final question in our big data interview questions and answers guide. Missing values refer to the values that are not present in a column. It occurs when there’s is no data value for a variable in an observation. If missing values are not handled properly, it is bound to lead to erroneous data which in turn will generate incorrect outcomes. Thus, it is highly recommended to treat missing values correctly before processing the datasets. Usually, if the number of missing values is small, the data is dropped, but if there’s a bulk of missing values, data imputation is the preferred course of action.
In Statistics, there are different ways to estimate the missing values. These include regression, multiple data imputation, listwise/pairwise deletion, maximum likelihood estimation, and approximate Bayesian bootstrap.