SQLWindow FunctionRANK()

SQL Window Function Example Interview Question - RANK()

Learn how to do SQL joins and use window functions like RANK() to solve interview questions. Provided by InterviewQs, a mailing list for coding and data interview problems.

September 1, 2020
5 mins read

When you're interviewing for a data scientist or data analyst role, it's highly likely you'll encounter SQL questions in your interview. Additionally, it's likely one or more of those SQL questions will require a window function to be solved. Window functions are a core concept in intermediate/advanced SQL, and mastering them will put you one step closer to landing your analytics role. In this post we'll give a quick overview of what window functions are, and then we'll dive into an example interview question along with our solution.

What are SQL window functions?

A window function defines a frame or ‘window’ of rows with a given length around the current row, and performs a calculation across the set of data in the window.

If you’re struggling with the definition above, don’t worry, it should become more clear as you put it into practice. Below we'll step through an example window function interview question that uses a simple aggregation.

Example question - RANK() in SQL window function

Suppose you're given the following tables called 'orders' and 'order_info'. The table 'orders' shows revenue values for unique orders along with the associated channel ('online' or 'in_store') while the table 'order_info' shows the order's ID along with its location.

Table: orders

order_idchanneldatemonthrevenue
1online2020-09-01 00:00:009100
2online2020-09-03 00:00:009125
3in_store2020-10-11 00:00:0010208
4in_store2020-08-21 00:00:00880
5online2020-08-13 00:00:008200
6online2020-08-16 00:00:008210
7in_store2020-08-16 00:00:008205
8online2020-10-11 00:00:0010215
9online2020-08-16 00:00:008203
10in_store2020-09-01 00:00:009400
11online2020-08-01 00:00:008107

Table: order_info

order_idlocation
1NYC
2NYC
3LAX
4LAX
5SEA
6AUS
7LON
8LAX
9BLD
10SEA
11AUS

Using these tables, write a SQL query to return the top 3 'online' orders and their associated locations based on revenue generated. You can assume that each order has a unique revenue value, but you should be able to highlight the implications of ties in revenue values and how you would handle that.

Click here to view these tables in an interactive SQL fiddle.

Solution: Click here to view this solution in an interactive SQL fiddle.

# SQL query to return the top 
# 3 'online' orders and their 
# associated locations based on revenue generated

# Pulling from our subquery below, 
# filtering on just the top 3 records


SELECT * 
FROM (
    SELECT 
        o.order_id,
        o.revenue,
        oi.location,
        RANK() OVER (ORDER BY o.revenue DESC) AS my_rank
    FROM orders o
    LEFT JOIN order_info oi ON o.order_id = oi.order_id
    WHERE o.channel = 'online'
) ranked_orders
WHERE my_rank <= 3;

Key Points / Notes

  1. RANK() vs DENSE_RANK() vs ROW_NUMBER()

    • RANK() assigns the same rank to tied revenue values and skips subsequent ranks.
      Example: Revenues [210, 200, 200, 190] → Ranks [1, 2, 2, 4].
    • DENSE_RANK() assigns the same rank to ties but does not skip ranks.
      Example: [210, 200, 200, 190] → Ranks [1, 2, 2, 3].
    • ROW_NUMBER() always gives a unique number, ignoring ties.
  2. Filtering with my_rank <= 3 ensures you get the top 3 revenues. Using RANK() handles ties, so if two orders tie for 2nd place, both appear.

  3. LEFT JOIN usage pulls in the location for each order. Always make sure your join key matches correctly (here, order_id) to avoid missing data.