When you're interviewing for a data scientist or data analyst role, it's highly likely you'll encounter SQL questions in your interview. Additionally, it's likely one or more of those SQL questions will require a window function to be solved. Window functions are a core concept in intermediate/advanced SQL, and mastering them will put you one step closer to landing your analytics role. In this post we'll give a quick overview of what window functions are, and then we'll dive into an example interview question along with our solution.
What are SQL window functions?
A window function defines a frame or ‘window’ of rows with a given length around the current row, and performs a calculation across the set of data in the window.
If you’re struggling with the definition above, don’t worry, it should become more clear as you put it into practice. Below we'll step through an example window function interview question that uses a simple aggregation.
Example question - LAG and LEAD in SQL window function
You are given the following table containing historical employee salaries for company XYZ:
Given the above table, can you write a SQL query to return the employees who have received at least 3 year over year raises based on the table's data?
a.employee_ID as employee_ID
# First, we'll build a subquery with employeeID, current salary, and prior year salary
# Using window function lead, we can grab the prior year's salary
# Note we could alternatively use LAG and sort our year in ascending order to achieve the same result
LEAD(salary) OVER (PARTITION BY employee_ID ORDER BY year DESC) as previous_year_sal
FROM Employee ) a
# Only pull records where the current salary is > the prior year salary
WHERE a.salary > a.previous_year_sal
GROUP BY employee_ID
# Since we want 3 consecutive years of raises,
# there should be at least 2 records (e.g. yr 1 -> yr 2, yr 2 -> yr 3)
HAVING count(*) = 2