Ace your next data science interview
Get better at data science interviews by solving a few questions per week.
Join 4,291 other data scientists and analysts practicing for interviews!
We will never spam. One-click unsubscribe.
How it works
1 We write questions
Get relevant questions frequently asked at top companies.
2 You solve them
Solve the problem before receiving the solution the next morning.
3 We send you the solution (Premium)
Check your work and get better at interviewing!
Sample questions
Sample question 1: Statistical knowledge
Suppose a box contains crayons of 15 different colors. Each time a crayon is obtained, it is equally likely to be any of the 15 colors. Compute the expected number of different colors obtained in a set of 5 crayons. (Hint: use indicator variables and linearity of expectation.)
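Number the colors from 1 to 15, and let \(X_i\) be an indicator variable equal to 1 if at least one crayon of color i is among the 5 crayons obtained, and 0 otherwise. The number of distinct colors is then \(\sum_{i=1}^{15} X_i\).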
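Then, for each color i:
\(E(X_i) = P(\text{at least one crayon of color } i \text{ is among the 5})\)
\(E(X_i) = 1 - P(\text{no crayon of color } i \text{ is among the 5})\)
Each of the 5 draws independently misses color i with probability 14/15, so
\(E(X_i) = 1 - \left(\frac{14}{15}\right)^5\)
By linearity of expectation, the expected number of distinct colors is:
\(E\left(\sum_{i=1}^{15} X_i\right) = \sum_{i=1}^{15} E(X_i)\)
\( = 15\left[1 - \left(\frac{14}{15}\right)^5\right]\)
\( \approx 4.38\)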
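As a quick sanity check on the arithmetic, the sketch below computes the closed-form value exactly with Python's `fractions` module and compares it against a Monte Carlo simulation. It assumes each crayon is drawn independently and uniformly at random from the 15 colors (that is, with replacement), as the problem states; the function names are hypothetical.

```python
import random
from fractions import Fraction


def expected_distinct_exact(num_colors: int, draws: int) -> Fraction:
    # Linearity of expectation: a given color is missed by every draw with
    # probability ((c - 1) / c) ** k, so E[distinct] = c * (1 - ((c - 1) / c) ** k).
    p_missing = Fraction(num_colors - 1, num_colors) ** draws
    return num_colors * (1 - p_missing)


def expected_distinct_simulated(num_colors: int, draws: int,
                                trials: int = 100_000, seed: int = 0) -> float:
    # Assumption: each draw is uniform over the colors, independent of the others.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        colors = {rng.randrange(num_colors) for _ in range(draws)}
        total += len(colors)  # number of distinct colors in this set of draws
    return total / trials


print(float(expected_distinct_exact(15, 5)))  # ≈ 4.376
print(expected_distinct_simulated(15, 5))
```

The simulated average should land close to the exact value; increasing `trials` tightens the agreement.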
Sample question 2: Coding/computation
Given a pandas DataFrame, df, return only the rows that contain at least one missing value.
For example:
| Name | age | favorite_color | grade | name |
|---|---|---|---|---|
| Willard Morris | 20 | blue | | Willard Morris |
| Al Jennings | 19 | red | 92 | Al Jennings |
| | 22 | yellow | 95 | Omar Mullins |
| Spencer McDaniel | 21 | green | 70 | Spencer McDaniel |
should return:
| Name | age | favorite_color | grade | name |
|---|---|---|---|---|
| Willard Morris | 20 | blue | | Willard Morris |
| | 22 | yellow | 95 | Omar Mullins |
# Written in Python (pandas)
# First, build a boolean Series that is True for every row containing at least
# one null value, using isnull() and any(axis=1).
# For the example above, df.isnull().any(axis=1) returns True, False, True, False.
# Then use this Series as a boolean mask to keep only those rows.
df[df.isnull().any(axis=1)]
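To try the one-liner end to end, the sketch below rebuilds the example table as a DataFrame and applies the same mask. It assumes the blank cells in the table are missing values (None/NaN); the column names are copied from the table as shown.

```python
import numpy as np
import pandas as pd

# Rebuild the example table; blank cells are assumed to be missing values.
df = pd.DataFrame({
    "Name": ["Willard Morris", "Al Jennings", None, "Spencer McDaniel"],
    "age": [20, 19, 22, 21],
    "favorite_color": ["blue", "red", "yellow", "green"],
    "grade": [np.nan, 92, 95, 70],
    "name": ["Willard Morris", "Al Jennings", "Omar Mullins", "Spencer McDaniel"],
})

# Boolean mask: True for each row with at least one missing value.
mask = df.isnull().any(axis=1)
print(mask.tolist())  # [True, False, True, False]

# Keep only the rows flagged by the mask.
rows_with_missing = df[mask]
print(rows_with_missing)
```

`df.isna()` is an alias of `df.isnull()` and produces the same mask, while `df.dropna()` returns the complementary set of rows (those with no missing values).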
Sample question 3: Coding/computation
A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. Given a single number, n, write a Python function that returns whether or not the number is prime. Additionally, if the input number is prime, append it to a list, a.
The function below determines whether a given number is prime using simple if/else checks and trial division. Whenever a number is found to be prime, it is appended to the list a.
# First, define an empty list to store the prime numbers found
a = []

# Define a function to determine whether a given number, x, is prime
def is_prime(x):
    if x < 2:
        # Numbers below 2 are not prime, by definition
        # (a prime is a natural number greater than 1)
        return False
    else:
        # For all numbers >= 2, try every candidate divisor d from 2 to x - 1
        for d in range(2, x):
            # If d divides x evenly, x = d * (x // d) is a product of two
            # smaller natural numbers, so x is not prime
            if x % d == 0:
                return False
        # Numbers that pass the checks above are prime: save them to the list a
        a.append(x)
        return True
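The loop above tests every candidate divisor up to x - 1. A common refinement, sketched below as an alternative rather than the original solution, stops at the integer square root of x: if x = d · e with 2 ≤ d ≤ e, then d² ≤ x, so a divisor that small must exist whenever x is composite. The sketch also separates the primality test from the side effect of recording primes; `is_prime_sqrt`, `check_and_record`, and `primes_found` are hypothetical names, and `math.isqrt` requires Python 3.8 or later.

```python
import math

primes_found = []  # hypothetical name; plays the role of the list a above


def is_prime_sqrt(x: int) -> bool:
    """Trial division that only tests divisors up to isqrt(x)."""
    if x < 2:
        return False  # primes are natural numbers greater than 1
    # If x = d * e with 2 <= d <= e, then d * d <= x, so d <= isqrt(x).
    for d in range(2, math.isqrt(x) + 1):
        if x % d == 0:
            return False
    return True


def check_and_record(x: int, store: list) -> bool:
    """Return whether x is prime, appending it to store if it is."""
    result = is_prime_sqrt(x)
    if result:
        store.append(x)
    return result


for n in range(1, 20):
    check_and_record(n, primes_found)
print(primes_found)  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Keeping `is_prime_sqrt` free of side effects means it can be reused (for example, inside a filter) without modifying any list.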
Testimonials
Dylan
I've been on the mailing list since the initial beta a few months ago, and found the questions to be very helpful for my data science interview at Facebook!
Melissa
I've been enjoying the mix of questions coming out of Data Interview Qs. The balance between stats, data manipulation, classic programming questions, and SQL came in handy during my Amazon interview.
Richard
Data Interview Qs helped me land a quantitative analyst role at Google. The ROI here is great, and I would recommend it to anyone seeking a role in the data science space.