SQL Window Function Example Interview Question - RANK()


When you're interviewing for a data scientist or data analyst role, it's highly likely you'll encounter SQL questions in your interview. Additionally, it's likely one or more of those SQL questions will require a window function to be solved. Window functions are a core concept in intermediate/advanced SQL, and mastering them will put you one step closer to landing your analytics role. In this post we'll give a quick overview of what window functions are, and then we'll dive into an example interview question along with our solution.

What are SQL window functions?

A window function defines a frame or ‘window’ of rows with a given length around the current row, and performs a calculation across the set of data in the window.

If you’re struggling with the definition above, don’t worry, it should become more clear as you put it into practice. Below we'll step through an example window function interview question that uses a simple aggregation.




Example question - RANK() in SQL window function

Suppose you're given the following tables called 'orders' and 'order_info'. The table 'orders' shows revenue values for unique orders along with the associated channel ('online' or 'in_store') while the table 'order_info' shows the order's ID along with its location.

Table: orders

order_id channel date month revenue
1 online 2020-09-01 00:00:00 9 100
2 online 2020-09-03 00:00:00 9 125
3 in_store 2020-10-11 00:00:00 10 208
4 in_store 2020-08-21 00:00:00 8 80
5 online 2020-08-13 00:00:00 8 200
6 online 2020-08-16 00:00:00 8 210
7 in_store 2020-08-16 00:00:00 8 205
8 online 2020-10-11 00:00:00 10 215
9 online 2020-08-16 00:00:00 8 203
10 in_store 2020-09-01 00:00:00 9 400
11 online 2020-08-01 00:00:00 8 107

Table: order_info

order_id location
1 NYC
2 NYC
3 LAX
4 LAX
5 SEA
6 AUS
7 LON
8 LAX
9 BLD
10 SEA
11 AUS

Using these tables, write a SQL query to return the top 3 'online' orders and their associated locations based on revenue generated. You can assume that each order has a unique revenue value, but you should be able to highlight the implications of ties in revenue values and how you would handle that.

Click here to view these tables in an interactive SQL fiddle.


Solution:

Click here to view this solution in an interactive SQL fiddle.


# SQL query to return the top

# 3 'online' orders and their

# associated locations based on revenue generated

# Pulling from our subquery below,

# filtering on just the top 3 records

SELECT * FROM (

# Subquery to pull online orders and

# add ranking next to each based on

# revenue generated

SELECT

o.order_id as order_id,

o.revenue as revenue,

oi.location as location,

# Add a ranking in descending order,

# highest revenue first

RANK() OVER (ORDER BY sum(o.revenue) DESC) as my_rank

FROM orders o

# Join in order info (for location)

# on order_id

LEFT JOIN order_info oi

ON o.order_id = oi.order_id

# Filter where the channel is online

WHERE o.channel = "online"

group by 1,2,3) stg

WHERE my_rank <= 3


Output:

order_id revenue location my_rank
8 215 LAX 1
6 210 AUS 2
9 203 BLD 3

Lastly, note that if there were multiple unique order ids with the same revenue values, our rank query could return more than 3 rows, since there could be ties (e.g. multiple rows ranked '2'). So, to handle that, one could simply add an ORDER by my_rank ASC LIMIT 3 statement to the end of the query instead of the WHERE clause. Example SQL fiddle here.