Machine Learning Interview Questions and Answers
Question - 81 : - How to Implement the KNN Classification Algorithm?
Answer - 81 : -
The Iris dataset is used here to implement the KNN classification algorithm.
# KNN classification algorithm on the Iris dataset
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
from sklearn.model_selection import train_test_split
iris_dataset = load_iris()
# Split the features and targets into training and test sets
A_train, A_test, B_train, B_test = train_test_split(iris_dataset["data"], iris_dataset["target"], random_state=0)
# Fit a 1-nearest-neighbor classifier on the training data
kn = KNeighborsClassifier(n_neighbors=1)
kn.fit(A_train, B_train)
# Predict the class of a new, unseen sample
A_new = np.array([[8, 2.5, 1, 1.2]])
prediction = kn.predict(A_new)
print("Predicted target value: {}\n".format(prediction))
print("Predicted feature name: {}\n".format(iris_dataset["target_names"][prediction]))
print("Test score: {:.2f}".format(kn.score(A_test, B_test)))
Output:
Predicted target value: [0]
Predicted feature name: ['setosa']
Test score: 0.92
Question - 82 : - What is F1-score and How Is It Used?
Answer - 82 : -
F-score or F1-score is a measure of overall accuracy of a binary classification model. Before understanding F1-score, it is crucial to understand two more measures of accuracy, i.e., precision and recall.
Precision is defined as the percentage of True Positives to the total number of positive classifications predicted by the model. In other words,
Precision = No. of True Positives / (No. of True Positives + No. of False Positives)
Recall is defined as the percentage of True Positives to the total number of actual positive labeled data passed to the model. In other words,
Recall = No. of True Positives / (No. of True Positives + No. of False Negatives)
Both precision and recall are partial measures of accuracy of a model. F1-score combines precision and recall and provides an overall score to measure a model’s accuracy.
F1-score = 2 × (Precision × Recall) / (Precision + Recall)
This is why the F1-score is one of the most popular accuracy measures for Machine-Learning-based binary classification models.
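As a quick illustration, here is a minimal sketch using scikit-learn's metrics module (the label arrays below are made up for demonstration):
# Precision, recall, and F1-score on made-up binary labels
from sklearn.metrics import precision_score, recall_score, f1_score
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions
precision = precision_score(y_true, y_pred)   # TP / (TP + FP)
recall = recall_score(y_true, y_pred)         # TP / (TP + FN)
f1 = 2 * (precision * recall) / (precision + recall)
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
print("F1-score: {:.2f}".format(f1))
print("f1_score(): {:.2f}".format(f1_score(y_true, y_pred)))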
Question - 83 : - Explain False Negative, False Positive, True Negative, and True Positive with a simple example.
Answer - 83 : -
True Positive (TP): When the Machine Learning model correctly predicts the positive class or condition, it is said to be a True Positive.
True Negative (TN): When the model correctly predicts the negative class or condition, it is said to be a True Negative.
False Positive (FP): When the model predicts the positive class but the actual class is negative, it is said to be a False Positive.
False Negative (FN): When the model predicts the negative class but the actual class is positive, it is said to be a False Negative.
For a simple example, consider a medical test for a disease: a sick patient correctly flagged as sick is a TP, a healthy patient correctly cleared is a TN, a healthy patient wrongly flagged as sick is an FP, and a sick patient wrongly cleared is an FN.
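As a quick illustration, the four counts can be read directly from scikit-learn's confusion matrix (a minimal sketch with made-up label arrays):
# Reading TN, FP, FN, TP off a binary confusion matrix
from sklearn.metrics import confusion_matrix
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]   # actual labels
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]   # model predictions
# For binary labels, ravel() returns the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP: {}, TN: {}, FP: {}, FN: {}".format(tp, tn, fp, fn))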
Question - 84 : - Imagine you are given a dataset consisting of variables having more than 30% missing values. Let’s say, out of 50 variables, 16 variables have missing values, which is higher than 30%. How will you deal with them?
Answer - 84 : -
To deal with the missing values, we can do the following (a small pandas sketch follows this list):
- Assign the missing values to a separate class (category) of their own.
- Check the distribution of the missing values and keep the variables whose missingness follows a meaningful pattern.
- Group those values into yet another class, while eliminating the variables whose missingness carries no useful information.
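A minimal pandas sketch of these steps, assuming a hypothetical DataFrame with made-up columns "income" and "city":
# Handling variables with a large share of missing values
import numpy as np
import pandas as pd
df = pd.DataFrame({
    "income": [45000, np.nan, 52000, np.nan, 61000],
    "city":   ["NY", None, "LA", None, "SF"],
})
# 1. Measure the share of missing values per variable
missing_ratio = df.isna().mean()
print(missing_ratio)
# 2. Encode missingness as its own class for a categorical variable
df["city"] = df["city"].fillna("Missing")
# 3. Keep an indicator of the missingness pattern for a numeric variable, then impute it
df["income_was_missing"] = df["income"].isna().astype(int)
df["income"] = df["income"].fillna(df["income"].median())
# 4. Variables whose missingness shows no useful pattern could instead be dropped:
# df = df.drop(columns=missing_ratio[missing_ratio > 0.3].index)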
Question - 85 : - Executing a binary classification tree algorithm is a simple task. But how does tree splitting take place? How does the tree determine which variable to break at the root node and which at its child nodes?
Answer - 85 : -
The Gini index and node entropy help the binary classification tree make its splitting decisions. Essentially, the tree algorithm selects the feature and split point that partition the data into the purest (most homogeneous) possible child nodes.
The Gini index measures the probability that two objects picked at random from a node belong to the same class; for a perfectly pure node, this probability is 1.
The following are the steps to compute the Gini index:
- Compute the Gini score for each sub-node using the formula p^2 + q^2, i.e., the sum of the squares of the probabilities of success and failure
- Compute the Gini score for the split as the weighted average of the Gini scores of its child nodes
Entropy, in turn, is the degree of impurity (disorder) in a node and is given by:
Entropy = -a*log2(a) - b*log2(b)
where a and b are the probabilities of success and failure in the node.
When Entropy = 0, the node is homogeneous (pure).
Entropy is at its maximum when both classes are present in the node in a 50-50 split.
Finally, for a feature to be suitable for the root-node split, the entropy of the resulting nodes should be very low.
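As a quick illustration, here is a minimal NumPy sketch with made-up class counts:
# Gini score and entropy of a node, plus the weighted Gini of a split
import numpy as np
def gini_score(p):
    # Sum of squared class probabilities (p^2 + q^2 in the binary case)
    return np.sum(np.square(p))
def entropy(p):
    # -sum(p * log2(p)), ignoring zero probabilities
    p = p[p > 0]
    return -np.sum(p * np.log2(p))
# Node with 8 "success" and 2 "failure" samples
probs = np.array([8, 2]) / 10.0
print("Gini score: {:.2f}".format(gini_score(probs)))   # 0.68
print("Entropy:    {:.2f}".format(entropy(probs)))      # 0.72
# Gini for a split = weighted average of the child-node Gini scores
left, right = np.array([0.9, 0.1]), np.array([0.4, 0.6])
n_left, n_right = 10, 30
split_gini = (n_left * gini_score(left) + n_right * gini_score(right)) / (n_left + n_right)
print("Split Gini: {:.2f}".format(split_gini))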
Question - 86 : - Which metrics can be used to measure correlation of categorical data?
Answer - 86 : - The chi-square test of independence can be used for this. It measures the strength of association between categorical predictors.
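A minimal SciPy sketch with a made-up contingency table (the rows and columns stand for two hypothetical categorical variables):
# Chi-square test of independence between two categorical variables
import numpy as np
from scipy.stats import chi2_contingency
table = np.array([[30, 10],
                  [20, 40]])
chi2, p_value, dof, expected = chi2_contingency(table)
print("Chi-square statistic: {:.2f}".format(chi2))
print("p-value: {:.4f}".format(p_value))
# A small p-value suggests the two categorical variables are associated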
Question - 87 : - Which algorithm can be used in value imputation in both categorical and continuous categories of data?
Answer - 87 : - KNN is one of the few algorithms that can be used for imputing both categorical and continuous variables.
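A minimal sketch with scikit-learn's KNNImputer on made-up numeric data; categorical variables would first need to be encoded (e.g., ordinally or one-hot) before KNN imputation can be applied:
# KNN-based imputation of missing values
import numpy as np
from sklearn.impute import KNNImputer
X = np.array([[1.0, 2.0, np.nan],
              [3.0, 4.0, 3.0],
              [np.nan, 6.0, 5.0],
              [8.0, 8.0, 7.0]])
imputer = KNNImputer(n_neighbors=2)   # each gap is filled from its 2 nearest rows
X_filled = imputer.fit_transform(X)
print(X_filled)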
Question - 88 : - When should ridge regression be preferred over lasso?
Answer - 88 : - We should prefer ridge regression when we want to keep all predictors in the model: ridge shrinks the coefficient values toward zero but never nullifies them, whereas lasso can set some coefficients exactly to zero and thereby drop predictors.
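A minimal sketch on synthetic data illustrating the difference (the alpha values are arbitrary choices for demonstration):
# Ridge shrinks coefficients; lasso can drive some of them exactly to zero
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge, Lasso
X, y = make_regression(n_samples=100, n_features=8, n_informative=3,
                       noise=10, random_state=0)
ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=5.0).fit(X, y)
print("Ridge coefficients:", np.round(ridge.coef_, 2))   # all non-zero, only shrunk
print("Lasso coefficients:", np.round(lasso.coef_, 2))   # uninformative ones typically become 0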
Question - 89 : - Which algorithms can be used for important variable selection?
Answer - 89 : - Random Forest and XGBoost can be used for variable selection, typically by plotting their variable (feature) importance charts and keeping the top-ranked variables.
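A minimal sketch on synthetic data, ranking variables by Random Forest feature importances (the dataset and hyperparameters are made up for illustration):
# Variable selection via Random Forest feature importances
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
# Sort features by importance; the top-ranked ones are candidates to keep
ranking = np.argsort(rf.feature_importances_)[::-1]
for idx in ranking:
    print("feature_{}: {:.3f}".format(idx, rf.feature_importances_[idx]))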
Question - 90 : - What ensemble technique is used by Random forests?
Answer - 90 : - Bagging (bootstrap aggregating) is the ensemble technique used by Random Forests. A Random Forest is a collection of decision trees, each trained on a bootstrap sample drawn from the original dataset, with the final prediction obtained by majority vote (for classification) or by averaging (for regression) across all trees.
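A minimal sketch on synthetic data (the dataset and hyperparameters are made up for illustration):
# Bagging inside a Random Forest: each tree sees a bootstrap sample,
# and predictions are combined by majority vote
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
rf.fit(X_train, y_train)
print("Test accuracy: {:.2f}".format(rf.score(X_test, y_test)))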