Data Science Interview Questions and Answers
Question - 71 : - Explain the SVM machine learning algorithm in detail.
Answer - 71 : - A support vector machine (SVM) is a supervised ML algorithm used for classification and regression. For classification, it finds the hyperplane in a multi-dimensional feature space that separates the classes with the largest margin. SVM uses kernels, commonly linear, polynomial, and RBF (radial basis function), to handle data that is not linearly separable. A few parameters, such as the regularization strength and the kernel coefficients, are passed to the SVM to control which points are considered when the hyperplane is calculated.
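To make the three kernels concrete, here is a minimal sketch that writes them as plain Python functions. The parameter names `degree`, `gamma`, and `coef0` and their default values are assumptions chosen for illustration; they are not taken from the answer above.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    # Plain dot product: similarity in the original feature space.
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # (gamma * <x, y> + coef0) ** degree
    return (gamma * dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    # exp(-gamma * ||x - y||^2): close points score near 1, distant points near 0.
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

x, y = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(x, y)
k_poly = polynomial_kernel(x, y, degree=2)
k_rbf = rbf_kernel(x, y, gamma=0.5)
print(k_lin, k_poly, k_rbf)
```

A kernel only measures similarity between two points; the SVM itself then finds the separating hyperplane in the space that the kernel implies.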
Question - 72 : - What are the various steps involved in an analytics project?
Answer - 72 : -
The steps involved in an analytics project are:
- Data collection
- Data cleansing
- Data pre-processing
- Creation of train, test, and validation sets
- Model creation
- Hyperparameter tuning
- Model deployment
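As an illustration of the "train, test, and validation sets" step in the list above, here is a minimal sketch using only the standard library. The function name `train_val_test_split`, the 70/15/15 proportions, and the fixed seed are assumptions made for the example.

```python
import random

def train_val_test_split(rows, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle a copy of rows and cut it into train, validation, and test parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_val = round(len(shuffled) * val_frac)
    n_test = round(len(shuffled) * test_frac)
    val = shuffled[:n_val]
    test = shuffled[n_val:n_val + n_test]
    train = shuffled[n_val + n_test:]  # everything left over is training data
    return train, val, test

rows = list(range(100))  # stand-in for 100 cleaned, pre-processed records
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

The model is fitted on the training set, hyperparameters are tuned against the validation set, and the test set is held back for a final check.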
Question - 73 : - Explain Star Schema.
Answer - 73 : - Star schema is a data warehousing design in which a central fact table is connected to several surrounding dimension tables, which gives the diagram its star shape.
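A small sketch can make the layout concrete. It builds a toy star schema in an in-memory SQLite database from Python; the table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive attributes.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
-- The central fact table holds measures plus a key into each dimension.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    amount REAL
);
INSERT INTO dim_product VALUES (1, 'books'), (2, 'toys');
INSERT INTO dim_date VALUES (1, 2023), (2, 2024);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 20.0), (2, 2, 5.0);
""")

# A typical query joins the fact table to its dimensions and aggregates.
totals = conn.execute("""
SELECT p.category, d.year, SUM(f.amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_id = f.product_id
JOIN dim_date AS d ON d.date_id = f.date_id
GROUP BY p.category, d.year
ORDER BY p.category, d.year
""").fetchall()
print(totals)
```

Every join goes from the fact table directly to one dimension table, which is what distinguishes the star layout from more normalized designs.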
Question - 74 : - How Regularly Must an Algorithm be Updated?
Answer - 74 : - It depends on the accuracy and precision required at the point of delivery, and on how much new data is available to train on. For a model trained on 10 million rows, it's important that the new data be of the same or a similar volume. Retraining on 1 million new data points every fortnight won't add much value in terms of improving the model.
Question - 75 : - What is Collaborative Filtering?
Answer - 75 : - Collaborative filtering is a technique that picks out items a user might like on the basis of reactions by similar users. It works by searching a large group of people and finding a smaller set of users with tastes similar to a particular user.
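The idea can be sketched as user-based collaborative filtering with cosine similarity. This is one simple variant, not the only way to do it; the users, items, ratings, and the similarity-weighted-sum scoring are all assumptions for illustration.

```python
import math

# Hypothetical ratings: user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 4},
    "bob": {"A": 5, "B": 4, "C": 5},
    "carol": {"A": 1, "D": 5},
}

def cosine(u, v):
    """Cosine similarity between two rating dicts (missing items count as 0)."""
    shared = set(u) & set(v)
    num = sum(u[i] * v[i] for i in shared)
    den = (math.sqrt(sum(r * r for r in u.values()))
           * math.sqrt(sum(r * r for r in v.values())))
    return num / den if den else 0.0

def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], other_ratings)
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

recs = recommend("alice", ratings)
print(recs)
```

Here bob's tastes are much closer to alice's than carol's are, so the item bob liked is ranked ahead of the item carol liked.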
Question - 76 : - How will you define the number of clusters in a clustering algorithm?
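Answer - 76 : - We choose the number of clusters using the silhouette score and the elbow method: run the clustering for a range of cluster counts, then pick the count with the highest silhouette score, or the count at the "elbow" where the within-cluster error stops dropping sharply.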
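Both criteria can be sketched on toy data. The example below runs a small one-dimensional k-means for several values of k, records the within-cluster squared error (the quantity plotted in the elbow method), and computes the mean silhouette score. The data, the deterministic initialisation, and the helper names are assumptions; a real project would normally use a library implementation.

```python
def kmeans_1d(points, k, max_iter=100):
    """Tiny 1-D k-means with a deterministic, evenly spaced initialisation."""
    data = sorted(points)
    centers = [data[i * len(data) // k] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: abs(x - centers[c])) for x in data]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [x for x, lab in zip(data, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return data, labels, centers

def silhouette(data, labels, k):
    """Mean silhouette score; points in singleton clusters score 0."""
    clusters = [[x for x, lab in zip(data, labels) if lab == c] for c in range(k)]
    total = 0.0
    for x, lab in zip(data, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue
        a = sum(abs(x - y) for y in own) / (len(own) - 1)  # mean distance within own cluster
        b = min(sum(abs(x - y) for y in other) / len(other)  # nearest other cluster
                for c, other in enumerate(clusters) if c != lab and other)
        total += (b - a) / max(a, b)
    return total / len(data)

points = [1, 2, 3, 10, 11, 12, 20, 21, 22]  # three obvious groups
inertias, scores = {}, {}
for k in range(1, 5):
    data, labels, centers = kmeans_1d(points, k)
    inertias[k] = sum((x - centers[lab]) ** 2 for x, lab in zip(data, labels))
    if k > 1:  # the silhouette score needs at least two clusters
        scores[k] = silhouette(data, labels, k)

best_k = max(scores, key=scores.get)
print(inertias, scores, best_k)
```

On this data the error falls steeply up to three clusters and barely changes afterwards, and the silhouette score also peaks at three, so both criteria agree.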
Question - 77 : - What is Ensemble Learning? Define types.
Answer - 77 : - Ensemble learning combines multiple weak learners (ML classifiers) and aggregates their outputs to make a prediction. It is observed that even if the classifiers perform poorly individually, they do better when their results are aggregated. Common types are bagging, boosting, and stacking. The random forest classifier is an example of bagging.
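The effect of aggregation can be shown with a deliberately contrived sketch: three one-feature threshold rules ("stumps") vote on each sample. The dataset, in which the true label is 1 when at least two of the three features are positive, is an assumption constructed so that the effect is easy to see.

```python
from itertools import product

# All 8 combinations of three features in {-1, +1}.
samples = list(product([-1, 1], repeat=3))
labels = [1 if sum(x) > 0 else 0 for x in samples]  # 1 when at least two features are +1

def make_stump(i):
    """Weak learner that looks at a single feature only."""
    return lambda x: 1 if x[i] > 0 else 0

stumps = [make_stump(i) for i in range(3)]

def majority_vote(x):
    """Aggregate the weak learners by simple majority."""
    votes = sum(stump(x) for stump in stumps)
    return 1 if votes * 2 > len(stumps) else 0

def accuracy(predict):
    return sum(predict(x) == y for x, y in zip(samples, labels)) / len(samples)

stump_acc = [accuracy(stump) for stump in stumps]
ensemble_acc = accuracy(majority_vote)
print(stump_acc, ensemble_acc)
```

Each stump alone is right on only part of the data, while the vote is right on all of it. Real ensembles rarely show such a clean gain, but the mechanism is the same.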
Question - 78 : - What are the support vectors in SVM?
Answer - 78 : - Support vectors are the data points closest to the hyperplane; they determine its position and orientation. Using these support vectors, we maximise the margin of the classifier. Deleting a support vector can change the position of the hyperplane, whereas deleting any other point leaves it unchanged. These are the points that help us build our SVM.
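A one-dimensional special case makes this easy to check by hand. For linearly separable 1-D data, the maximum-margin boundary is the midpoint between the largest point of the lower class and the smallest point of the upper class, and those two points are the support vectors. The function name and the data below are assumptions for illustration; this is not a general SVM solver.

```python
def max_margin_boundary(negatives, positives):
    """Hard-margin boundary for separable 1-D data with negatives below positives."""
    sv_neg = max(negatives)  # closest negative point to the boundary
    sv_pos = min(positives)  # closest positive point to the boundary
    assert sv_neg < sv_pos, "classes must be separable"
    boundary = (sv_neg + sv_pos) / 2
    margin = (sv_pos - sv_neg) / 2
    return boundary, margin, (sv_neg, sv_pos)

pos = [7, 8, 9]
full = max_margin_boundary([1, 2, 3], pos)
without_far_point = max_margin_boundary([2, 3], pos)  # drop a non-support point
without_support = max_margin_boundary([1, 2], pos)    # drop the support vector 3
print(full, without_far_point, without_support)
```

Removing the far point 1 leaves the boundary where it was, while removing the support vector 3 shifts it, which mirrors the behaviour described in the answer.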
Question - 79 : - What is pruning in Decision Tree?
Answer - 79 : - Pruning is the process of reducing the size of a decision tree. The reason for pruning is that the trees prepared by the base algorithm can be prone to overfitting as they become incredibly large and complex.
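One common form of pruning, reduced-error post-pruning, can be sketched as follows: working bottom-up, each subtree is replaced by a leaf whenever that does not lower accuracy on a held-out validation set. The nested-dict tree format, the `majority` field (the majority training label at each node), and the data are all assumptions made for this example.

```python
def predict(node, x):
    """Walk from the given node down to a leaf and return its label."""
    while "label" not in node:
        branch = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[branch]
    return node["label"]

def accuracy(tree, data):
    return sum(predict(tree, x) == y for x, y in data) / len(data)

def count_nodes(node):
    if "label" in node:
        return 1
    return 1 + count_nodes(node["left"]) + count_nodes(node["right"])

def prune(node, root, validation):
    """Bottom-up: turn a subtree into a leaf unless validation accuracy drops."""
    if "label" in node:
        return
    prune(node["left"], root, validation)
    prune(node["right"], root, validation)
    before = accuracy(root, validation)
    saved = dict(node)
    node.clear()
    node["label"] = saved["majority"]  # tentatively replace the subtree with a leaf
    if accuracy(root, validation) < before:
        node.clear()
        node.update(saved)  # pruning hurt, so restore the subtree

# Hypothetical tree whose right-hand subtree has fitted noise in the training data.
tree = {
    "feature": 0, "threshold": 5, "majority": 1,
    "left": {"label": 0},
    "right": {
        "feature": 1, "threshold": 0.5, "majority": 1,
        "left": {"label": 0},
        "right": {"label": 1},
    },
}
validation = [([2, 0.3], 0), ([3, 0.9], 0), ([7, 0.2], 1), ([8, 0.8], 1), ([9, 0.4], 1)]

size_before, acc_before = count_nodes(tree), accuracy(tree, validation)
prune(tree, tree, validation)
size_after, acc_after = count_nodes(tree), accuracy(tree, validation)
print(size_before, acc_before, size_after, acc_after)
```

The overfitted split is removed because a single leaf does at least as well on the validation data, while the root split is kept because removing it would lower accuracy.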
Question - 80 : - What are the various classification algorithms?
Answer - 80 : - Different types of classification algorithms include logistic regression, SVM, Naive Bayes, decision trees, and random forest.