Ace your next data science interview

Get better at data science interviews by solving a few questions per week.
Join 7,743 other data scientists and analysts practicing for interviews!


We will never spam. One-click unsubscribe.


How it works

1 We write questions

Get relevant data science interview questions frequently asked at top companies.

2 You solve them

Solve the problem before receiving the solution the next morning.

3 We send you the solution (Premium)

Check your work and get better at interviewing!


The schedule

[Image: schedule calendar]

Sample questions


Sample question 1: Statistical knowledge

Suppose there are crayons of 15 different colors in a box. Each time one obtains a crayon, it is equally likely to be any of the 15 colors. Compute the expected number of different colors obtained in a set of 5 crayons. (Hint: use indicator variables and linearity of expectation.)






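We enumerate the colors from 1 to 15. Let \(X_i\) be the indicator variable that equals 1 if at least one crayon of color \(i\) is among the 5 crayons obtained, and 0 otherwise, so that the number of different colors obtained is \(X = \sum_{i=1}^{15} X_i\).

Since each of the 5 crayons independently avoids color \(i\) with probability \(\frac{14}{15}\):
\(E(X_i) = \Pr\{\text{at least one crayon of color } i \text{ is in the set of 5}\}\)
\(E(X_i) = 1 - \Pr\{\text{no crayon of color } i \text{ is in the set of 5}\}\)
\(E(X_i) = 1 - \left(\frac{14}{15}\right)^5\)

Therefore, by linearity of expectation, the expected number of different colors is:

\(E(X) = \sum_{i=1}^{15} E(X_i)\)
\( = 15\left[1 - \left(\frac{14}{15}\right)^5\right]\)
\( \approx 4.38\)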
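As an optional sanity check (not part of the original solution), the sketch below evaluates the closed-form answer exactly with Python's `fractions.Fraction` and compares it with a Monte Carlo simulation that draws 5 crayons, each uniformly from 15 colors, as the question describes. The function names `expected_distinct_colors` and `simulate_distinct_colors` are our own, chosen for illustration.

```python
import random
from fractions import Fraction


def expected_distinct_colors(num_colors: int, num_draws: int) -> Fraction:
    """Exact expected number of distinct colors, via indicators and linearity.

    Each indicator X_i has E(X_i) = 1 - ((num_colors - 1) / num_colors) ** num_draws,
    and there is one indicator per color.
    """
    p_color_missing = Fraction(num_colors - 1, num_colors) ** num_draws
    return num_colors * (1 - p_color_missing)


def simulate_distinct_colors(num_colors: int, num_draws: int,
                             trials: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate: draw num_draws crayons uniformly, count distinct colors."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        draws = [rng.randrange(num_colors) for _ in range(num_draws)]
        total += len(set(draws))  # number of distinct colors in this trial
    return total / trials


exact = expected_distinct_colors(15, 5)
print(exact, float(exact))              # exact fraction and its decimal value (about 4.376)
print(simulate_distinct_colors(15, 5))  # should land close to the exact value
```

Using `Fraction` avoids floating-point rounding in the closed form; the simulation is only an independent cross-check and its last digits vary with the seed.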






Sample question 2: Coding/computation

Given a DataFrame, df, return only the rows that have missing values.
For example:

Name              age  favorite_color  grade  name
Willard Morris    20   blue            NaN    Willard Morris
Al Jennings       19   red             92     Al Jennings
NaN               22   yellow          95     Omar Mullins
Spencer McDaniel  21   green           70     Spencer McDaniel

should return:

Name              age  favorite_color  grade  name
Willard Morris    20   blue            NaN    Willard Morris
NaN               22   yellow          95     Omar Mullins




# Written in Python (pandas)

# First, build a boolean Series that is True for every row containing at least one null value, using isnull() and any(axis=1)

# For the example above, df.isnull().any(axis=1) returns the Series True, False, True, False

# Then use this Series as a boolean mask on the DataFrame to keep only the rows with missing values

df[df.isnull().any(axis=1)]
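To see the one-liner run end to end, here is a minimal sketch that rebuilds the example DataFrame and applies the same boolean mask. It assumes the blank cells in the example are missing values (NaN) and that `Name` is an ordinary column rather than the index; the variable names `mask` and `rows_with_missing` are our own.

```python
import numpy as np
import pandas as pd

# Rebuild the example; blank cells in the example are assumed to be NaN
df = pd.DataFrame({
    "Name": ["Willard Morris", "Al Jennings", np.nan, "Spencer McDaniel"],
    "age": [20, 19, 22, 21],
    "favorite_color": ["blue", "red", "yellow", "green"],
    "grade": [np.nan, 92, 95, 70],
    "name": ["Willard Morris", "Al Jennings", "Omar Mullins", "Spencer McDaniel"],
})

# Boolean mask: True for every row with at least one missing value
mask = df.isnull().any(axis=1)
print(mask.tolist())  # [True, False, True, False]

# Boolean indexing keeps only the rows where the mask is True
rows_with_missing = df[mask]
print(rows_with_missing)
```

`df.isna()` is an alias of `df.isnull()`, so either spelling works; `df[~mask]` would give the complementary rows with no missing values.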




Sample question 3: Coding/computation

A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. Given a single number, n, write a Python function that returns whether or not the number is prime. Additionally, if the input number is prime, save it into an array, a.






Below, we define a function that determines whether a given number is prime using simple if/else statements and a loop over candidate divisors. Whenever a number is found to be prime, we also append it to our array (a Python list), a.

# First, define an empty list to store prime numbers
a = []

# Define a function to identify whether or not a given number, x, is prime
def is_prime(x):
    if x < 2:
        # If the number is < 2, it's not prime, per the definition of a prime number
        # (i.e., a natural number greater than 1)
        return False
    else:
        # For all other numbers >= 2, try every candidate divisor from 2 to x - 1
        for n in range(2, x):
            # If x is divisible by a smaller number (other than 1), it is not prime
            if x % n == 0:
                return False
        # Numbers that pass the loop are prime! Save them to our list, a
        a.append(x)
        return True
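The loop above tries every divisor up to x - 1. A common refinement, not part of the original solution, is to stop at the integer square root: if x = p * q with p <= q, then p <= sqrt(x), so any factor would already have been found. The sketch below assumes Python 3.8+ (for `math.isqrt`); the name `is_prime_fast` is hypothetical, and it collects primes with a list comprehension instead of mutating a global list inside the check.

```python
import math


def is_prime_fast(x: int) -> bool:
    """Trial division up to the integer square root of x (hypothetical helper name)."""
    if x < 2:
        return False  # 0, 1 and negative numbers are not prime by definition
    if x % 2 == 0:
        return x == 2  # 2 is the only even prime
    # Only odd candidate divisors up to isqrt(x) need to be checked
    for d in range(3, math.isqrt(x) + 1, 2):
        if x % d == 0:
            return False
    return True


# Collect primes into a list, keeping the primality check free of side effects
a = [n for n in range(30) if is_prime_fast(n)]
print(a)  # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

This checks roughly sqrt(x) / 2 candidates instead of x - 2, which matters once inputs reach the millions.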



See what others are saying

Dylan

I've been on the mailing list since the initial beta a few months ago, and found the questions to be very helpful with my data science interview at Facebook!

Melissa

I've been enjoying the mix of questions coming out of Data Interview Qs. The balance between stats, data manipulation, classic programming questions, and SQL came in handy during my Amazon interview.

Richard

Data Interview Qs helped me land a quantitative analyst role at Google. The ROI here is great, and I would recommend it to anyone seeking a role in the data science space.


Used by thousands of students and industry workers