Top 20 Machine Learning Interview Questions You Should Know

From Facebook, Amazon, Microsoft, Apple, and more!

Machine learning is an interdisciplinary field that focuses on extracting actionable insights from data in any format, to improve decision-making or predict future trends. Because it spans many sub-fields, each fairly deep in its own right, it's one of the hardest fields to master today.

Consequently, you can imagine how stressful machine learning interviews can be. So, today we'll go through the questions you're most likely to be asked when interviewing for an ML job.

I'll divide the article into segments, so you can conveniently go through different concepts involved in machine learning. Here's how it's split up:

  • Machine Learning Fundamentals
  • Data Analysis & Visualization
  • Statistics & Probability

So, let's start without any further ado!

Machine Learning Fundamentals

Q: What steps can you take when dealing with an imbalanced binary classification problem?

There are several ways to deal with this:

  • Use evaluation metrics such as precision, recall, or the F1 score instead of accuracy. Accuracy is misleading on imbalanced classes: a model that always predicts the majority class can score highly while never detecting the minority class. These metrics give a more appropriate picture even when the classes are imbalanced.
  • Increase the cost of misclassifying the minority class (class weighting). This penalty pushes the model to predict the minority class correctly more often, which typically raises its recall.
  • Lastly, sample more from the minority class than from the majority class (oversampling). This directly reduces the imbalance in the training data.
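The first three ideas can be sketched in plain Python on made-up labels. `precision_recall`, `balanced_class_weights`, and `random_oversample` are hypothetical helper names, not a library API. The weighting formula mirrors the `n_samples / (n_classes * count)` heuristic scikit-learn documents for `class_weight="balanced"`; everything else is an illustrative assumption.

```python
import random
from collections import Counter


def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the positive (minority) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def balanced_class_weights(y):
    """Weight each class inversely to its frequency: n_samples / (n_classes * count)."""
    counts = Counter(y)
    return {label: len(y) / (len(counts) * c) for label, c in counts.items()}


def random_oversample(X, y, seed=0):
    """Duplicate random rows of smaller classes until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        rows = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - count):
            i = rng.choice(rows)
            X_out.append(X[i])
            y_out.append(label)
    return X_out, y_out


# 95 negatives and 5 positives; a "model" that always predicts the majority class.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                          # 0.95, which looks good
print(precision_recall(y_true, y_pred))  # (0.0, 0.0): not one positive is found
print(balanced_class_weights(y_true))    # the minority class gets weight 10.0

X = [[float(i)] for i in range(100)]
X_bal, y_bal = random_oversample(X, y_true)
print(Counter(y_bal))                    # Counter({0: 95, 1: 95})
```

Oversampling should be applied to the training split only; otherwise duplicated rows leak into the test set and inflate the scores.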

Q: What are regularization methods? Provide an example of the regularization method.

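Regularization methods reduce overfitting by adding a penalty on model complexity (typically on the size of the weights) to the loss function. This discourages the model from learning an overly complex function that fits the training data well but generalizes poorly to new, unseen data.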

L2 regularization is a commonly used method: it adds a penalty proportional to the sum of the squared weights. Linear regression with an L2 penalty is called ridge regression.
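To see the penalty at work, take ridge regression with a single feature and no intercept (a simplifying assumption for illustration). The loss is Σ(yᵢ − w·xᵢ)² + λw², and setting its derivative to zero gives w = Σxᵢyᵢ / (Σxᵢ² + λ). The hypothetical `ridge_1d` helper below evaluates that closed form; as λ grows, the weight is pulled toward zero.

```python
def ridge_1d(xs, ys, lam):
    """Closed-form ridge estimate for y = w * x (one feature, no intercept).

    Minimises sum((y - w*x)**2) + lam * w**2, whose solution is
    w = sum(x*y) / (sum(x**2) + lam).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated exactly by y = 2x

for lam in (0.0, 1.0, 10.0):
    print(f"lambda={lam:>4}: w={ridge_1d(xs, ys, lam):.3f}")
```

With λ = 0 this is ordinary least squares and recovers w = 2.0; λ = 10 shrinks the weight to 1.5.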

Q: Differentiate between supervised and unsupervised learning using appropriate examples.

Supervised learning

Supervised learning refers to training a machine learning model on a labeled dataset, where you know which class each data instance belongs to while training. The model learns to map the inputs to these labels. Example: we can show a child some dogs and tell them that these are called dogs. From this, the child learns what a dog looks like, so the next time they see one, they can identify it.

Unsupervised learning

This technique involves training a machine learning model on data that is not labeled. The model has to discover the structure of the data by itself, so it isn't being supervised. A common approach is to group similar data points on the assumption that they belong to the same class. Example: you have customer data but no labels. You represent each customer by features such as their preferred products, budget, and age, then group similar customers together and treat each group as a class.
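A toy contrast in plain Python with made-up data: the supervised half learns from labeled examples (a 1-nearest-neighbor classifier), while the unsupervised half groups unlabeled values with a bare-bones k-means. Both helpers are hypothetical sketches; real projects would normally use a library such as scikit-learn.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """Supervised: return the label of the closest labeled training example."""
    best = min(range(len(train_X)), key=lambda i: abs(train_X[i] - x))
    return train_y[best]


def kmeans_1d(points, k, iters=10):
    """Unsupervised: group unlabeled 1-D points into k clusters (assumes k <= len(points))."""
    ordered = sorted(points)
    step = max(1, len(ordered) // k)
    centroids = ordered[::step][:k]  # simple deterministic initialisation
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(p - centroids[c])) for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Supervised: animal weights (kg) with known labels.
train_X = [4.0, 5.0, 25.0, 30.0]
train_y = ["cat", "cat", "dog", "dog"]
print(nearest_neighbor_predict(train_X, train_y, 27.0))  # dog

# Unsupervised: customer ages with no labels at all.
ages = [18, 20, 22, 60, 62, 65]
labels, centroids = kmeans_1d(ages, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]: cluster IDs, not meaningful class names
```

Note that k-means only produces cluster numbers; deciding what each group means (for example, "young customers") is left to the analyst.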

Q: What are the three stages of building a machine learning model?

The three stages of building machine learning models are:

Model Building

Choosing the dataset and the algorithm, and training the model. This stage also includes processes like preprocessing, exploratory data analysis (EDA), hyperparameter tuning, etc.

Model Testing

Using the test dataset to calculate evaluation metrics and checking whether the model performs as expected.

Model Deployment

Deploying the model to a suitable platform or service so that it can make predictions on new data.
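The three stages can be traced in a deliberately tiny end-to-end sketch: a straight-line model fit by least squares on synthetic data. The helper names are hypothetical, and writing the parameters to JSON is only a stand-in for deployment; a real deployment would typically sit behind a serving platform or API.

```python
import json
import random


def train_test_split(X, y, test_fraction=0.25, seed=42):
    """Shuffle indices and hold out a fraction of the rows for testing."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = int(len(idx) * test_fraction)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])


def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def mean_squared_error(model, xs, ys):
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


X = [float(i) for i in range(20)]
y = [3.0 * x + 1.0 for x in X]

# 1. Model building: split the data and train on the training portion.
X_tr, y_tr, X_te, y_te = train_test_split(X, y)
model = fit_line(X_tr, y_tr)

# 2. Model testing: evaluate on rows the model never saw.
print(mean_squared_error(model, X_te, y_te))  # essentially 0 on this noise-free data

# 3. Deployment (stand-in): serialise the learned parameters for a serving process.
artifact = json.dumps({"a": model[0], "b": model[1]})
print(artifact)
```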

Q: Differentiate classification and regression. How will you choose what technique to use for a dataset?

Classification and regression are distinguished by the nature of the target variable. Classification is used when the target variable is categorical, such as predicting colors or genders, while regression is used when the target variable is continuous, for example, predicting house prices or scores. So, all I have to do is check the type of the target variable to decide whether to use a classification algorithm or a regression algorithm.
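That check can be written down as a rough heuristic. The `suggest_task` function below is purely illustrative, not a standard API. Note the integer case: numbers such as 0/1/2 may be class codes rather than quantities, so the meaning of the target still has to be checked by hand.

```python
def suggest_task(target):
    """Hypothetical heuristic: categorical targets -> classification, continuous -> regression."""
    if all(isinstance(v, (str, bool)) for v in target):
        return "classification"
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in target):
        if any(isinstance(v, float) and not v.is_integer() for v in target):
            return "regression"
        # Whole numbers may be class codes (0/1/2) or genuine quantities (counts, years).
        return "ambiguous: check whether the numbers are categories or quantities"
    raise ValueError("target mixes incompatible types")


print(suggest_task(["red", "green", "red"]))  # classification
print(suggest_task([250_000.5, 310_000.0]))   # regression
print(suggest_task([0, 1, 1, 0]))             # ambiguous: ...
```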

Q: What's the difference between a generative and discriminative model?

A generative model learns how the data in each category is generated (the joint distribution of the features and the labels) and classifies new data by asking which category most likely produced it. A discriminative model, on the other hand, learns only the distinction, or decision boundary, between the data categories. On classification tasks, discriminative models often outperform generative ones.
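In probabilistic terms (a standard formulation, written here for a class label y and features x), a generative classifier models p(x | y) and p(y) and applies Bayes' rule, while a discriminative classifier models the class posterior p(y | x) directly. Naïve Bayes is a common example of the first kind and logistic regression of the second.

```latex
\text{Generative: learn } p(x \mid y) \text{ and } p(y), \text{ then }
\hat{y} = \arg\max_{y} \; p(x \mid y)\, p(y)
\quad \text{(since } p(y \mid x) \propto p(x \mid y)\, p(y)\text{)}

\text{Discriminative: learn } p(y \mid x) \text{ directly, e.g. logistic regression: }
p(y = 1 \mid x) = \sigma(w^\top x + b) = \frac{1}{1 + e^{-(w^\top x + b)}}
```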

Q: A given machine learning model has high accuracy on the training data but much lower accuracy on the test data. What could be the reason? Explain what's happening.

The model is most likely overfitting. It has 'learned' the training data far too well, to the point of fitting even minor fluctuations and noise. As a result, it doesn't generalize and struggles to label new data correctly, even though it fits the training data very accurately.
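A small synthetic experiment shows the gap. The sketch below uses a 1-nearest-neighbor classifier, which effectively memorizes its training set, on made-up labels where 10 of the 50 training points were deliberately flipped to simulate noise. It scores perfectly on the training data but reproduces the noise on nearby test points.

```python
def one_nn_predict(train_X, train_y, x):
    """Label of the closest training point (ties go to the earlier point)."""
    i = min(range(len(train_X)), key=lambda j: abs(train_X[j] - x))
    return train_y[i]


def accuracy(train_X, train_y, X, y):
    preds = [one_nn_predict(train_X, train_y, x) for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def true_label(x):
    return 1 if x >= 50 else 0


FLIPPED = {4, 12, 20, 28, 36, 62, 70, 78, 86, 94}  # training points given the wrong label

train_X = list(range(0, 100, 2))   # even numbers 0..98
train_y = [1 - true_label(x) if x in FLIPPED else true_label(x) for x in train_X]
test_X = list(range(1, 100, 2))    # odd numbers 1..99
test_y = [true_label(x) for x in test_X]

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)
print(train_acc, test_acc)  # 1.0 0.8
```

A smoother model, for example one that votes over several neighbors, would typically ignore isolated flipped labels and generalize better.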

Q: What do you understand by the term ‘Naïve’ in the Naïve Bayes algorithm?

The Naïve Bayes algorithm is called 'naïve' because it assumes that all features are conditionally independent of each other given the class. This rarely holds in real-world problems, where input features are usually related to one another.
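The assumption shows up as a simple product inside the classifier. Below is a minimal categorical Naïve Bayes in plain Python on a made-up "play outside?" dataset; it adds Laplace smoothing (`alpha`) so that an unseen feature value doesn't zero out a class. The helper names and data are illustrative assumptions.

```python
from collections import Counter, defaultdict


def train_nb(X, y):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(y)
    feature_counts = defaultdict(Counter)  # (class, feature index) -> Counter of values
    feature_values = defaultdict(set)      # feature index -> values seen in training
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            feature_counts[(c, j)][v] += 1
            feature_values[j].add(v)
    return class_counts, feature_counts, feature_values


def predict_nb(model, row, alpha=1.0):
    """Return the most probable class and the unnormalised score for every class."""
    class_counts, feature_counts, feature_values = model
    total = sum(class_counts.values())
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / total  # prior p(c)
        for j, v in enumerate(row):
            # The "naive" step: p(x | c) is taken to be the product of the
            # per-feature likelihoods p(x_j | c), i.e. the features are
            # assumed conditionally independent given the class.
            k = len(feature_values[j])
            score *= (feature_counts[(c, j)][v] + alpha) / (n_c + alpha * k)
        scores[c] = score
    return max(scores, key=scores.get), scores


X = [("sunny", "no"), ("sunny", "yes"), ("rainy", "yes"),
     ("rainy", "no"), ("sunny", "no"), ("rainy", "yes")]
y = ["play", "play", "stay", "play", "play", "stay"]

model = train_nb(X, y)
print(predict_nb(model, ("sunny", "no"))[0])   # play
print(predict_nb(model, ("rainy", "yes"))[0])  # stay
```

With many features, practical implementations add log-probabilities instead of multiplying probabilities, to avoid floating-point underflow.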

Q: Do you think 100 small decision trees are better than a single larger one? If so, why?

Hint: This is just another way of asking whether a random forest is better than a single decision tree. In general, yes: 100 small decision trees usually beat a single larger one. The ensemble combines many weak decision trees into a strong learner. Because their outputs are averaged (or, for classification, combined by majority vote), a large number of trees is more robust, less prone to overfitting, and generally more accurate.
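One way to see why combining weak learners helps is an idealized calculation: if each of n classifiers were correct with probability p > 0.5 and their errors were independent, the majority vote would be correct with a binomial tail probability. Real forests violate the independence assumption (their trees are correlated, which is why random forests train on bootstrap samples with random feature subsets), so treat the numbers as optimistic intuition, not a prediction. `majority_vote_accuracy` is a hypothetical helper.

```python
from math import comb


def majority_vote_accuracy(n, p):
    """P(majority of n independent voters is correct), each voter correct with prob. p.

    Ties, possible when n is even, are broken by a fair coin flip.
    """
    total = 0.0
    for k in range(n + 1):  # k = number of voters that are correct
        pk = comb(n, k) * p**k * (1 - p) ** (n - k)
        if 2 * k > n:
            total += pk
        elif 2 * k == n:
            total += 0.5 * pk
    return total


for n in (1, 10, 100):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

Under these assumptions, accuracy climbs from 0.6 for a single voter to above 0.95 for 100 voters.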

Data Analysis & Visualization

Q: How do you handle datasets that have missing or corrupted values?

If there aren't many missing values, we can remove the affected rows. Otherwise, we can replace them with a statistic such as the mean or median of the column. The pandas library provides built-in functions to find and treat missing values, such as isnull(), dropna(), and fillna().
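A short pandas sketch of both options on a made-up DataFrame. It also uses `pd.to_numeric(..., errors="coerce")` to turn a corrupted entry into NaN first, so it can be handled like any other missing value; the column names and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "age": [25, np.nan, 40, 35],
    "income": ["50000", "62000", "oops", "58000"],  # "oops" is a corrupted entry
})

# Treat unparseable values as missing: "oops" becomes NaN.
df["income"] = pd.to_numeric(df["income"], errors="coerce")

# Inspect how many values are missing in each column.
print(df.isnull().sum())

# Option 1: drop rows with any missing value (reasonable when few rows are affected).
dropped = df.dropna()

# Option 2: impute, here with the median age and the mean income.
filled = df.fillna({"age": df["age"].median(), "income": df["income"].mean()})
print(filled)
```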

Q: What are some of the best visualization tools or libraries one can conveniently use for EDA?

Choosing the best visualization tool or library depends on the specific task and dataset. Some of the most used libraries are Seaborn and matplotlib in Python, or ggplot2 if R is used. Among standalone tools, Tableau is one of the best.

Q: What’s the difference between a histogram and a box plot?

A histogram shows the frequency distribution of a variable, which helps us understand the shape of the distribution (for example, skewness or multiple peaks). A box plot, however, cannot show the full shape of a distribution. Instead, it summarizes the distribution with a few measures: the median, the quartiles (the box spans the interquartile range), whiskers showing the spread of the data, and any outliers.
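The difference is easy to see on bimodal data. The plain-Python sketch below (the helper names are hypothetical) computes the numbers a box plot draws, using the common 1.5 × IQR rule for outliers, and the bin counts a histogram would draw. The box plot's median lands at 5.5, in the gap between the two clusters, and nothing in its summary reveals the two peaks that the histogram counts make obvious.

```python
import statistics


def box_plot_summary(data):
    """Quartiles, whisker ends and outliers (Tukey's 1.5 * IQR rule)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "whisker_low": min(inside),
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }


def histogram_counts(data, start, width, n_bins):
    """Count values falling into each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        b = int((x - start) // width)
        if 0 <= b < n_bins:
            counts[b] += 1
    return counts


data = [1, 2, 2, 3, 3, 3, 8, 8, 8, 9, 9, 10]  # two clusters, around 2-3 and 8-9
print(box_plot_summary(data))
print(histogram_counts(data, start=0, width=2, n_bins=6))  # [1, 5, 0, 0, 5, 1]
```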

Q: Differentiate between univariate and bivariate analysis. Explain with examples.

Univariate analysis is statistical analysis of a single variable. Techniques include the mean, sum, variance, and so on; for example, computing the average age of customers. Bivariate analysis involves two variables and looks for the relationship between them, using techniques such as scatter plots and correlation; for example, checking whether spending increases with age.
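In code, the distinction comes down to how many variables a statistic looks at. A plain-Python sketch with made-up customer data: univariate summaries of age alone, then a bivariate Pearson correlation between age and spending (the `pearson` helper is written out by hand for illustration).

```python
import math
import statistics

ages = [20, 30, 40, 50]
spend = [200, 300, 400, 500]

# Univariate: describe one variable on its own.
print(statistics.mean(ages), statistics.median(ages), statistics.variance(ages))


# Bivariate: measure how two variables move together (Pearson correlation).
def pearson(xs, ys):
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pearson(ages, spend))  # 1.0 here, because spend rises exactly linearly with age
```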

Q: Provide some techniques that can be used to visualize data that has more than three dimensions.

Various techniques are available for data with more than three dimensions, such as plot facets (a grid of plots, one per value of a variable) and encoding extra dimensions with different colors or with depth. If there are too many dimensions to visualize this way, dimensionality reduction techniques such as PCA can project the data onto two or three dimensions while preserving as much of its structure as possible.
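As a sketch of the dimensionality-reduction route, here is PCA implemented with NumPy's SVD on synthetic 5-dimensional data that really varies along only two directions (the data generation is an assumption made so the result is easy to check). The two projected columns could then be drawn as an ordinary scatter plot.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project rows of X onto their top principal components via SVD."""
    Xc = X - X.mean(axis=0)                     # PCA requires centred data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)             # share of variance per component
    return Xc @ Vt[:n_components].T, explained[:n_components]


rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))              # the "true" 2-D structure
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.01 * rng.normal(size=(200, 5))  # 5-D observations

Z, ratio = pca_project(X)
print(Z.shape)       # (200, 2)
print(ratio.sum())   # close to 1: two components keep almost all the variance
```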

Q: What is data transformation? Is it necessary?

Data transformation is the process of converting data from its available format into the desired format. It's a crucial step before data is stored in a data warehouse (the 'T' in ETL: extract, transform, load). So yes, it's necessary: raw data arrives in various formats, and we must transform it according to the requirements of the destination to ensure consistency.
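A toy transformation step in plain Python: raw records arrive with two date formats and inconsistent currency casing, and are normalized to one schema before loading. The field names and the two accepted date formats are assumptions made for this example; note that "05/01/2023" is read as day/month only because of that assumption.

```python
from datetime import datetime

RAW = [
    {"date": "2023-01-05", "amount": "12.50", "currency": "usd"},
    {"date": "05/01/2023", "amount": "7", "currency": "USD"},
]
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")  # assumed input formats for this example


def parse_date(text):
    """Try each known format in turn and return a date object."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {text!r}")


def transform(record):
    """Map a raw record onto the destination schema: ISO date, float amount, upper-case currency."""
    return {
        "date": parse_date(record["date"]).isoformat(),
        "amount": float(record["amount"]),
        "currency": record["currency"].upper(),
    }


clean = [transform(r) for r in RAW]
print(clean)
```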

Statistics & Probability

Q: What’s the difference between probability and likelihood?

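Probability measures how likely an outcome is when the model's parameters are fixed and known; for example, the chance of getting 7 heads in 10 flips of a fair coin. Likelihood goes the other way: the data is fixed (already observed), and it measures how well different parameter values explain that data. Maximizing the likelihood over the parameters is how maximum likelihood estimation finds the best-fitting parameters.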
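The coin example can be made concrete in plain Python. The same binomial formula is used both ways: first with the parameter fixed to ask about an outcome, then with the outcome fixed to compare parameter values on a grid (the grid search is a simple stand-in for a proper optimizer).

```python
from math import comb


def binom_pmf(k, n, p):
    """P(k successes in n independent trials, each succeeding with probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability: the parameter is fixed (a fair coin, p = 0.5); we ask about outcomes.
print(binom_pmf(7, 10, 0.5))  # 0.1171875, i.e. 120/1024

# Likelihood: the outcome is fixed (7 heads in 10 flips); we vary the parameter p.
grid = [i / 100 for i in range(1, 100)]
likelihood = {p: binom_pmf(7, 10, p) for p in grid}
p_mle = max(likelihood, key=likelihood.get)
print(p_mle)  # 0.7, the maximum-likelihood estimate (7 / 10)
```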

Q: How do you perform hypothesis testing?

There are five main steps in hypothesis testing:

  1. State your null hypothesis (H0) and alternative hypothesis (H1).
  2. Collect the data required to test the hypothesis.
  3. Perform the appropriate statistical test to get a test statistic and p-value.
  4. Decide whether to reject the null hypothesis, e.g., by comparing the p-value with a significance level chosen in advance (commonly 0.05).
  5. Present your findings in terms of the hypotheses you set up.
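The five steps map onto a short example: an exact two-sided binomial test of whether a coin is fair, written in plain Python with made-up data. In practice a library routine such as SciPy's `scipy.stats.binomtest` would handle step 3; the helper names here are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test_two_sided(k, n, p0):
    """Exact two-sided p-value: total probability of outcomes no more likely than k under H0."""
    observed = binom_pmf(k, n, p0)
    return sum(
        binom_pmf(i, n, p0)
        for i in range(n + 1)
        if binom_pmf(i, n, p0) <= observed * (1 + 1e-7)  # small tolerance for float ties
    )


# 1. Hypotheses: H0: the coin is fair (p = 0.5); H1: it is not (p != 0.5).
# 2. Data: 9 heads observed in 10 flips (made-up numbers).
heads, flips = 9, 10
# 3. Statistical test: exact binomial p-value.
p_value = binomial_test_two_sided(heads, flips, 0.5)
# 4. Decision at a significance level chosen in advance.
alpha = 0.05
reject_h0 = p_value < alpha
# 5. Report.
print(f"p-value = {p_value:.4f}; reject H0 at alpha={alpha}: {reject_h0}")
```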

Q: What is selection bias?

Selection bias occurs when data is sampled in a way that doesn't reflect the true distribution of the population being studied, so the sample is unrepresentative. The bias doesn't occur by chance: there is a systematic cause behind it, such as how participants or records were selected, which is why collecting more data the same way doesn't remove it.
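A toy example makes the "systematic" part concrete. Assume a population of 100 customers with spend values 1 to 100 (invented numbers). A random sample lands near the true mean, but a survey that only reaches customers spending more than 50 overestimates it every time.

```python
import random
import statistics

population = list(range(1, 101))            # hypothetical customer spend values
true_mean = statistics.mean(population)     # 50.5

# Unbiased: every customer has the same chance of being sampled.
random_sample = random.Random(0).sample(population, 30)

# Selection bias: only customers spending more than 50 are ever reached.
biased_sample = [x for x in population if x > 50]

print(true_mean, statistics.mean(random_sample), statistics.mean(biased_sample))
```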

Q: Differentiate between a Type I and Type II error.

A Type I error (false positive) occurs when the null hypothesis is true but is rejected. In contrast, a Type II error (false negative) occurs when the null hypothesis is false but we fail to reject it.
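Both error rates can be computed exactly for a simple decision rule. Suppose we flip a coin 10 times and reject H0: p = 0.5 whenever we see 9 or more heads. To get a Type II error rate we also have to assume a specific alternative, here p = 0.8 (an arbitrary choice for illustration).

```python
from math import comb


def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


N, THRESHOLD = 10, 9  # reject H0 when heads >= THRESHOLD


def prob_reject(p):
    """Probability that the rule rejects H0 when the true heads probability is p."""
    return sum(binom_pmf(k, N, p) for k in range(THRESHOLD, N + 1))


alpha = prob_reject(0.5)      # Type I error rate: rejecting a true H0
beta = 1 - prob_reject(0.8)   # Type II error rate: keeping H0 when p is actually 0.8
print(f"Type I: {alpha:.4f}, Type II: {beta:.4f}")
```

Raising the threshold to 10 heads would lower the Type I rate but raise the Type II rate, which is why the two are traded off against each other when choosing a test.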

Q: What is meant by a binomial probability distribution? Explain.

The binomial probability distribution gives the probability of each possible number of successes in N trials, where the trials are independent and each has the same probability of success, π (pi).
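Written out, with N trials, success probability π, and k the number of successes, the probability mass function and its mean and variance are the standard results below. For example, with N = 10 and π = 0.5, P(X = 7) = 120/1024 ≈ 0.117.

```latex
P(X = k) = \binom{N}{k}\, \pi^{k} (1 - \pi)^{N - k}, \qquad k = 0, 1, \dots, N

\mathbb{E}[X] = N\pi, \qquad \operatorname{Var}(X) = N\pi(1 - \pi)
```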


That's it for today! I hope you found the questions helpful. However, that's not all. As I mentioned, machine learning is a vast field that is still evolving, so you need to keep learning to get better at it.