SQL Window Function Example Interview Question - LEAD/LAG

When you're interviewing for a data scientist or data analyst role, it's highly likely you'll encounter SQL questions in your interview. Additionally, it's likely one or more of those SQL questions will require a window function to be solved. Window functions are a core concept in intermediate/advanced SQL, and mastering them will put you one step closer to landing your analytics role. In this post we'll give a quick overview of what window functions are, and then we'll dive into an example interview question along with our solution.

What are SQL window functions?

A window function defines a frame or ‘window’ of rows with a given length around the current row, and performs a calculation across the set of data in the window.

If you’re struggling with the definition above, don’t worry, it should become more clear as you put it into practice. Below we'll step through an example window function interview question that uses a simple aggregation.

Example question - LAG and LEAD in SQL window function

You are given the following table containing historical employee salaries for company XYZ:

Table: EmployeeSalaries

employee_ID salary year
1 80000 2020
1 70000 2019
1 60000 2018
2 65000 2020
2 65000 2019
2 60000 2018
3 65000 2019
3 60000 2018

Given the above table, can you write a SQL query to return the employees who have received at least 3 year over year raises based on the table's data?

Solution:

SELECT 
 a.employee_ID as employee_ID
FROM 
 # First, we'll build a subquery with employeeID, current salary, and prior year salary
 (SELECT 
     employee_ID, 
     salary, 
     # Using window function lead, we can grab the prior year's salary
     # Note we could alternatively use LAG and sort our year in ascending order to achieve the same result
     LEAD(salary) OVER (PARTITION BY employee_ID ORDER BY year DESC) as previous_year_sal 
  FROM Employee ) a

# Only pull records where the current salary is > the prior year salary
WHERE a.salary > a.previous_year_sal
GROUP BY employee_ID
# Since we want 3 consecutive years of raises, 
# there should be at least 2 records (e.g. yr 1 -> yr 2, yr 2 -> yr 3)
HAVING count(*) = 2