Ace your next data science interview

Get better at data science interviews by solving a few questions per week.
Join thousands of other data scientists and analysts practicing for interviews!

We will never spam. One-click unsubscribe.

How it works

1. We write questions

Get relevant data science interview questions frequently asked at top companies.

2. You solve them

Solve the problem before receiving the solution the next morning.

3. We send you the solution (Premium)

Check your work and get better at interviewing!

The schedule


Sample questions

Sample question 1: Statistical knowledge

Suppose there are 15 different color crayons in a box. Each time one obtains a crayon, it is equally likely to be any of the 15 colors. Compute the expected number of different colors obtained in a set of 5 crayons. (Hint: use indicator variables and linearity of expectation.)

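We enumerate the colors from 1 to 15. Let \(X_i\) be an indicator variable equal to 1 if at least one crayon of color \(i\) is among the 5 crayons obtained, and 0 otherwise. Each of the 5 crayons independently misses color \(i\) with probability \(\frac{14}{15}\), so:

\(E(X_i) = P(\text{at least one crayon of color } i \text{ in the set of 5})\)
\(E(X_i) = 1 - P(\text{no crayon of color } i \text{ in the set of 5})\)
\(E(X_i) = 1 - \left(\frac{14}{15}\right)^5\)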

By linearity of expectation, the expected number of different colors is:

\(E\left(\sum_{i=1}^{15} X_i\right) = \sum_{i=1}^{15} E(X_i)\)
\( = 15\left[1 - \left(\frac{14}{15}\right)^5\right]\)
\( \approx 4.38\)
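As a sanity check on the derivation, the sketch below computes the exact value with Python's `fractions` module and compares it with a Monte Carlo simulation. It assumes, as the question states, that each crayon's color is drawn independently and uniformly from the 15 colors; the function names `expected_distinct_colors` and `simulate_distinct_colors` are illustrative, not part of the original solution.

```python
import random
from fractions import Fraction


def expected_distinct_colors(num_colors: int, num_draws: int) -> Fraction:
    # Linearity of expectation: sum of E[X_i] over all colors, where
    # E[X_i] = 1 - ((num_colors - 1) / num_colors) ** num_draws
    p_color_missing = Fraction(num_colors - 1, num_colors) ** num_draws
    return num_colors * (1 - p_color_missing)


def simulate_distinct_colors(num_colors: int, num_draws: int,
                             trials: int = 100_000, seed: int = 0) -> float:
    # Monte Carlo check: draw colors uniformly with replacement,
    # count the distinct ones, and average over many trials
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += len({rng.randrange(num_colors) for _ in range(num_draws)})
    return total / trials


exact = expected_distinct_colors(15, 5)
print(f"{float(exact):.2f}")  # 4.38
print(simulate_distinct_colors(15, 5))  # should land close to the exact value
```

Keeping the exact answer as a `Fraction` avoids floating-point rounding in the closed form, while the simulation independently confirms the indicator-variable argument.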

Sample question 2: Coding/computation

Suppose you have a dataframe, df, with the following records:

```
   age  favorite_color  grade              name
0   20            blue     88    Willard Morris
1   19            blue     92       Al Jennings
2   22          yellow     95      Omar Mullins
3   21           green     70  Spencer McDaniel
```

The dataframe contains information about students. Using Python pandas, write code that selects the rows where the student's favorite color is blue or yellow and their grade is at least 90.


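```python
import pandas as pd

# Recreate the sample dataframe from the question
df = pd.DataFrame({
    'age': [20, 19, 22, 21],
    'favorite_color': ['blue', 'blue', 'yellow', 'green'],
    'grade': [88, 92, 95, 70],
    'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
})

# Define a list of target colors
fav_color_filter = ['blue', 'yellow']

# To select rows whose column value is in a list (fav_color_filter), we can use isin
df = df.loc[df['favorite_color'].isin(fav_color_filter)]

# Next, we keep only grades of at least 90, again using a boolean mask with loc
df = df.loc[df['grade'] >= 90]

# Preview the dataframe
print(df)
```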


Resulting dataframe:

```
   age  favorite_color  grade          name
1   19            blue     92   Al Jennings
2   22          yellow     95  Omar Mullins
```
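The two filters can also be combined into a single boolean mask. This is an alternative sketch, not the original solution: both conditions become boolean Series that are joined with `&` (element-wise AND), and the original `df` is left unmodified. The names `mask` and `result` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [20, 19, 22, 21],
    'favorite_color': ['blue', 'blue', 'yellow', 'green'],
    'grade': [88, 92, 95, 70],
    'name': ['Willard Morris', 'Al Jennings', 'Omar Mullins', 'Spencer McDaniel'],
})

# Combine both conditions with & (element-wise AND).
# The parentheses around the comparison are required because
# & binds more tightly than >= in Python.
mask = df['favorite_color'].isin(['blue', 'yellow']) & (df['grade'] >= 90)
result = df.loc[mask]

print(result)
```

Building the mask in one expression makes both conditions visible together and avoids overwriting `df` between steps.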


Sample question 3: Coding/computation

A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. Given a single number, n, write a Python function that returns whether or not the number is prime. Additionally, if the input number is prime, save it into an array (a Python list), a.


We'll set up a function below to determine whether or not a given number is prime, using simple if statements and a loop over candidate divisors. Additionally, when a number is found to be prime, we'll append it to our array, a.

```python
# First, define an empty array (list) to store prime numbers
a = []

# Define a function to identify whether or not a given number, x, is prime
def is_prime(x):
    if x < 2:
        # If the number is < 2, it's not prime, per the definition of a prime number
        # (a natural number greater than 1)
        return False

    # For all other numbers >= 2, try every candidate divisor from 2 to x - 1
    for n in range(2, x):
        # If x is divisible by a smaller number n, it is a product of two smaller numbers, so not prime
        if x % n == 0:
            return False

    # Numbers that don't meet the above conditions are prime! Save them to our array, a
    a.append(x)
    return True
```
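The solution above tests every candidate divisor from 2 to x - 1. A common refinement, not part of the original solution, stops at the integer square root of x: if x = p * q with p <= q, then p * p <= x, so any composite x has a divisor no larger than its square root. A minimal sketch, assuming Python 3.8+ for `math.isqrt`; `is_prime_fast` is a hypothetical name, and collecting primes into a list is done by the caller rather than inside the check.

```python
import math


def is_prime_fast(x: int) -> bool:
    # Numbers below 2 are not prime by definition
    if x < 2:
        return False
    # Any factor pair of x has one member <= isqrt(x), so checking up to it suffices
    for d in range(2, math.isqrt(x) + 1):
        if x % d == 0:
            return False
    return True


# Usage: collect primes into a list, similar to the array a above
primes_below_30 = [x for x in range(30) if is_prime_fast(x)]
print(primes_below_30)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

This cuts the number of divisions from roughly x to roughly the square root of x, which matters for large inputs.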


See what others are saying

Dylan

I've been on the mailing list since the initial beta, and found the questions to be very helpful with my data science interview at Facebook!

Melissa

I've been enjoying the mix of questions coming out of Data Interview Qs. The balance between stats, data manipulation, classic programming questions, and SQL came in handy during my Amazon interview.

Richard

Data Interview Qs helped me land an analyst role at Google. The ROI here is great, and I would recommend it to anyone seeking a role in the data science space.

Used by thousands of students and industry workers