Get better at data science interviews by solving a few questions per week.

Join **3,527** other data scientists and analysts practicing for interviews!

We will never spam. One-click unsubscribe.

1. We write questions

Get relevant questions frequently asked at top companies.

2. You solve them

Solve the problem before receiving the solution the next morning.

3. We send you the solution (Premium)

Check your work and get better at interviewing!

Sample question 1: Statistical knowledge

Suppose there are 15 different color crayons in a box. Each time one obtains a crayon, it is equally likely to be any of the 15 types. Compute the expected # of different colors that are obtained in a set of 5 crayons. (Hint: use indicator variables and linearity of expectation)

We number the colors from 1 to 15. Let \(X_i\) be an indicator variable that equals 1 if at least one crayon of color i is among the 5 crayons obtained, and 0 otherwise.

So,

\(E(X_i) =\) Pr {at least one type i crayon is in the set of 5}

\(E(X_i) =\) 1 - Pr {no type i crayons in the set of 5}

\(E(X_i) = 1 - \left(\frac{14}{15}\right)^5\)

Therefore, by linearity of expectation, the expected # of different colors is:

\( = \sum_{i=1}^{15} E(X_i)\)

\( = 15\left[1 - \left(\frac{14}{15}\right)^5\right]\)

\( \approx 4.38\)
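As a sanity check, the closed-form answer can be compared against a quick simulation. This is only a minimal sketch: the names `NUM_COLORS`, `NUM_DRAWS`, and `simulate`, as well as the trial count and seed, are illustrative choices and not part of the original solution.

```python
import random

NUM_COLORS = 15  # different crayon colors in the box
NUM_DRAWS = 5    # crayons obtained, each equally likely to be any color

# Closed form from linearity of expectation: 15 * [1 - (14/15)^5]
exact = NUM_COLORS * (1 - ((NUM_COLORS - 1) / NUM_COLORS) ** NUM_DRAWS)


def simulate(trials=50_000, seed=0):
    """Estimate the expected number of distinct colors by simulation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # A set keeps only the distinct colors seen in this trial
        drawn = {rng.randrange(NUM_COLORS) for _ in range(NUM_DRAWS)}
        total += len(drawn)
    return total / trials


estimate = simulate()
print(round(exact, 2))  # 4.38
print(estimate)         # should land close to the exact value
```

With enough trials the simulated average should settle near the closed-form value, which is a useful habit for checking any indicator-variable argument.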

Sample question 2: Coding/computation

Given a dataframe, df, return only those rows which have missing values.

For example:

| Name | age | favorite_color | grade | name |
|---|---|---|---|---|
| Willard Morris | 20 | blue | | Willard Morris |
| Al Jennings | 19 | red | 92 | Al Jennings |
| | 22 | yellow | 95 | Omar Mullins |
| Spencer McDaniel | 21 | green | 70 | Spencer McDaniel |

Will return...

| Name | age | favorite_color | grade | name |
|---|---|---|---|---|
| Willard Morris | 20 | blue | | Willard Morris |
| | 22 | yellow | 95 | Omar Mullins |

#Written in Python (Pandas)

#First, we build a boolean series of the null values, using 'isnull' and 'any'

#-->df.isnull().any(axis=1) will return the series True, False, True, False

#We can then index this series against our dataframe to filter on the null values

df[df.isnull().any(axis=1)]
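To try the filter end to end, here is a minimal, self-contained sketch. The dataframe is rebuilt by hand from the example table, under the assumption that the missing cells are `grade` in the first row and `Name` in the third; the variable names `mask` and `rows_with_missing` are illustrative.

```python
import pandas as pd

# Example data modeled on the table above; None marks a missing value.
# Which cells are missing is an assumption based on the example rows.
df = pd.DataFrame({
    "Name": ["Willard Morris", "Al Jennings", None, "Spencer McDaniel"],
    "age": [20, 19, 22, 21],
    "favorite_color": ["blue", "red", "yellow", "green"],
    "grade": [None, 92, 95, 70],
    "name": ["Willard Morris", "Al Jennings", "Omar Mullins", "Spencer McDaniel"],
})

# True for every row that contains at least one missing value
mask = df.isnull().any(axis=1)

# Boolean indexing keeps only the rows where the mask is True
rows_with_missing = df[mask]
print(rows_with_missing)
```

Splitting the mask into its own variable makes it easy to inspect, and `df[~mask]` would give the complementary set of fully populated rows.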

Sample question 3: Coding/computation

A prime number is a natural number greater than 1 that cannot be formed by multiplying two smaller natural numbers. Given a single number, *n*, write a function using Python to return whether or not the number is prime. Additionally, if the inputted number is prime, save it into an array, *a*.

We'll set up a function below to determine whether or not a given number is prime, using simple if/else statements and a loop over candidate divisors. Additionally, when a number is identified as prime we'll append it to our array, a.

#First, define an empty array to store prime numbers
a = []

#Define a function to identify whether or not a given number, x, is prime
def is_prime(x):
    if x < 2:
        #if the number is < 2, it's not prime, per the definition of a prime number
        #(i.e. a natural number greater than 1)
        return False
    else:
        #for all other numbers >= 2
        for n in range(2, x):
            #if x is divisible by a smaller number n >= 2, then it's not prime
            if x % n == 0:
                return False
        #numbers that pass every check above are prime! save them to our array, a
        a.append(x)
        return True
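For larger inputs, one common refinement is to test divisors only up to the square root of the number, since any factor above the square root pairs with one below it. The sketch below is a variant, not the solution above: the names `primes_found` and `is_prime_sqrt` are hypothetical, and `primes_found` plays the role of the array a.

```python
import math

# Hypothetical name; plays the same role as the array `a` in the solution
primes_found = []


def is_prime_sqrt(x):
    """Return True if x is prime, using trial division up to sqrt(x)."""
    if x < 2:
        return False
    # Any divisor larger than sqrt(x) pairs with one that is smaller,
    # so checking 2..isqrt(x) is enough.
    for n in range(2, math.isqrt(x) + 1):
        if x % n == 0:
            return False
    primes_found.append(x)
    return True


# Usage: check the numbers 1 through 10
results = [is_prime_sqrt(k) for k in range(1, 11)]
print(primes_found)  # [2, 3, 5, 7]
```

Note that, like the original, this appends on every successful call, so checking the same prime twice stores it twice; a set could be used instead if duplicates are unwanted.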

**Dylan**

I've been on the mailing list since the initial beta a few months ago, and I found the questions to be very helpful with my data science interview at Facebook!

**Melissa**

I've been enjoying the mix of questions coming out of Data Interview Qs. The balance between stats, data manipulation, classic programming questions, and SQL came in handy during my Amazon interview.

**Richard**

Data Interview Qs helped me land a quantitative analyst role at Google. The ROI here is great, and I would recommend it to anyone seeking a role in the data science space.