Machine Learning Interview Questions and Answers
Question - 101 : - What is a false positive?
Answer - 101 : -
It is a test result which wrongly indicates that a particular condition or attribute is present.
Example – “Stress testing, a routine diagnostic tool used in detecting heart disease, results in a significant number of false positives in women”
Question - 102 : - What is the error term composed of in regression?
Answer - 102 : - The error in regression is the sum of the bias error, the variance error, and the irreducible error (for squared loss, expected error = bias^2 + variance + irreducible error). Bias and variance can be reduced, but the irreducible error, the noise inherent in the data itself, cannot.
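A quick way to see the decomposition is to simulate it. The Python sketch below is illustrative only: the true function, the noise level, and the deliberately underfitting constant model are all assumptions made for the example. It estimates each error component at a single evaluation point:

    import numpy as np

    rng = np.random.default_rng(0)
    true_f = np.sin           # the assumed true underlying function
    x0, sigma = 1.0, 0.3      # evaluation point and noise standard deviation

    # Fit a deliberately simple model (predict the mean) on many resampled datasets.
    preds = []
    for _ in range(2000):
        x = rng.uniform(0, np.pi, 20)
        y = true_f(x) + rng.normal(0, sigma, 20)
        preds.append(y.mean())
    preds = np.array(preds)

    bias_sq = (preds.mean() - true_f(x0)) ** 2   # systematic error of the model class
    variance = preds.var()                       # sensitivity to the training sample
    irreducible = sigma ** 2                     # noise floor no model can remove
    print(bias_sq, variance, irreducible)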
Question - 103 : - Which performance metric is better R2 or adjusted R2?
Answer - 103 : - Adjusted R2 is generally the better metric. Plain R2 never decreases when more predictors are added, even if those predictors contribute nothing, so it can overstate model quality simply because the model has more variables. Adjusted R2 penalizes the number of predictors and increases only when a new predictor improves the model more than would be expected by chance.
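For reference, adjusted R2 can be computed from plain R2 with the standard formula, shown here as a minimal Python sketch (the variable names and example values are illustrative):

    def adjusted_r2(r2, n_samples, n_predictors):
        # Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
        return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

    # Example: the same R2 = 0.90 looks worse once 8 predictors are penalized.
    print(adjusted_r2(0.90, n_samples=50, n_predictors=8))  # ~0.88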
Question - 104 : - What’s the difference between Type I and Type II error?
Answer - 104 : -
Type I and Type II errors in machine learning refer to the two ways a binary decision can be wrong. A Type I error is equivalent to a false positive, while a Type II error is equivalent to a false negative. In a Type I error, a null hypothesis that is actually true gets rejected. Conversely, in a Type II error, a null hypothesis that should have been rejected fails to get rejected.
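Both error types can be read directly off a confusion matrix. Here is a minimal Python sketch (the labels are made up for illustration):

    from sklearn.metrics import confusion_matrix

    y_true = [0, 0, 1, 1, 0, 1, 0, 1]   # 1 = condition present
    y_pred = [0, 1, 1, 0, 0, 1, 1, 1]   # model's predictions

    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    print("Type I errors (false positives):", fp)    # 2
    print("Type II errors (false negatives):", fn)   # 1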
Question - 105 : - What do you understand by L1 and L2 regularization?
Answer - 105 : -
L2 regularization: It penalizes the squared magnitude of the weights, which tends to spread the weight across all the terms and shrink them smoothly toward zero. L2 corresponds to placing a Gaussian prior on the weights.
L1 regularization: It penalizes the absolute magnitude of the weights and is more binary/sparse: many weights are driven exactly to zero while the rest stay active. L1 corresponds to placing a Laplace prior on the weights.
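The difference is easy to see with scikit-learn's Ridge (L2) and Lasso (L1) on synthetic data where only two features actually matter (the data and alpha values below are illustrative):

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 10))
    # Only the first two features carry signal.
    y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

    ridge = Ridge(alpha=1.0).fit(X, y)  # L2: small but nonzero weights everywhere
    lasso = Lasso(alpha=0.1).fit(X, y)  # L1: irrelevant weights driven to exactly 0
    print("Ridge:", ridge.coef_.round(2))
    print("Lasso:", lasso.coef_.round(2))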
Question - 106 : - What do you mean by AUC curve?
Answer - 106 : -
AUC stands for Area Under the Curve, usually the ROC curve, which plots the true positive rate against the false positive rate across classification thresholds. The higher the area under the curve, the better the prediction power of the model: an AUC of 0.5 corresponds to random guessing and an AUC of 1.0 to a perfect classifier.
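Computing it with scikit-learn takes one call (the labels and scores below are illustrative values):

    from sklearn.metrics import roc_auc_score

    y_true = [0, 0, 1, 1]              # ground-truth labels
    y_scores = [0.1, 0.4, 0.35, 0.8]   # model's predicted probabilities
    print(roc_auc_score(y_true, y_scores))  # 0.75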
Question - 107 : - What are the advantages of SVM algorithms?
Answer - 107 : -
The advantages of SVM are mainly a matter of performance and complexity. First, note that both logistic regression and SVM can form non-linear decision surfaces when coupled with the kernel trick. If logistic regression can be coupled with a kernel, why use SVM?
● SVM is found to have better performance in practice in most cases.
● SVM is computationally cheaper, roughly O(N^2*K) where K is the number of support vectors (the points that lie on the class margin), whereas kernelized logistic regression is roughly O(N^3).
● The SVM classifier depends only on a subset of the points. Since we need to maximize the distance between the closest points of the two classes (the margin), only that subset, the support vectors, matters, unlike logistic regression where every training point influences the decision boundary. The sketch after this list shows this directly.
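A minimal scikit-learn sketch (synthetic data and illustrative parameters) showing that the fitted model retains only a subset of the training points:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=5, random_state=0)
    clf = SVC(kernel="rbf").fit(X, y)

    # The decision boundary is determined by the support vectors alone.
    print(clf.n_support_.sum(), "support vectors out of", len(X), "points")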
Question - 108 : - Why does XGBoost perform better than SVM?
Answer - 108 : -
The first reason is that XGBoost is an ensemble method that uses many trees to make a decision: each new tree corrects the errors of the ones before it, so the model gains power through aggregation.
SVM, by contrast, is a linear separator. When the data is not linearly separable, SVM needs a kernel to project the data into a space where it can separate it. Therein lies its greatest strength and its greatest weakness: by projecting data into a high-dimensional space, SVM can find a linear separation for almost any dataset, but it must rely on a kernel, and one can argue that there is no single kernel that is perfect for every dataset.
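For concreteness, here is a minimal sketch of training a boosted-tree ensemble with the xgboost package's scikit-learn-style API (the data and hyperparameters are illustrative):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from xgboost import XGBClassifier  # requires the xgboost package

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # 100 shallow trees, each one fit to the residual errors of those before it.
    model = XGBClassifier(n_estimators=100, max_depth=3, learning_rate=0.1)
    model.fit(X_train, y_train)
    print("Test accuracy:", model.score(X_test, y_test))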
Question - 109 : - How is linear classifier relevant to SVM?
Answer - 109 : -
An SVM is a type of linear classifier. If you don't use kernels, it is arguably the simplest type of linear classifier.
Linear classifiers learn linear functions of your data that map your input to scores like so: scores = Wx + b, where W is a matrix of learned weights, b is a learned bias vector that shifts your scores, and x is your input data. This type of function may look familiar to you if you remember y = mx + b from high school.
A typical SVM loss function (the function that tells you how good your calculated scores are in relation to the correct labels) is the hinge loss. For a single example it takes the form: Loss = sum, over every class score except the correct one, of max(0, score_j - score_correct + 1).
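That loss is only a few lines of NumPy. A minimal sketch (the scores below are made-up values for three classes):

    import numpy as np

    def hinge_loss(scores, correct):
        # sum over j != correct of max(0, s_j - s_correct + 1)
        margins = np.maximum(0, scores - scores[correct] + 1)
        margins[correct] = 0  # the correct class contributes no loss
        return margins.sum()

    print(hinge_loss(np.array([3.2, 5.1, -1.7]), correct=0))  # 2.9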
Question - 110 : - In what real world applications is Naive Bayes classifier used?
Answer - 110 : -
Some real-world examples are given below; a spam-filtering sketch follows the list.
- Marking an email as spam or not spam
- Classifying a news article as technology, politics, or sports
- Checking whether a piece of text expresses positive or negative emotions
- Face recognition software
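A minimal Naive Bayes spam filter in scikit-learn (the tiny dataset is made up purely for illustration):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    texts = ["win a free prize now", "meeting at 10am tomorrow",
             "free money click here", "lunch with the team today"]
    labels = ["spam", "ham", "spam", "ham"]

    # Bag-of-words features feeding a multinomial Naive Bayes classifier.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(texts, labels)
    print(model.predict(["claim your free prize"]))  # likely ['spam']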