SQL Nested Subquery Example Interview Question

Overview of Nested Queries

A SQL nested query, also known as a subquery or subselect, is a SELECT query method embedded within another query. It’s an efficient way to get the desired output that may require multiple query steps or to handle queries that are interdependent.

When to use Nested Subqueries

You can use nested subqueries in a couple different ways:

  • You can query over subqueries (e.g. selecting from a subquery)
  • You can replace 1D arrays (e.g. a typical list of items) and single field joins with a single subquery in the WHERE or HAVING clause

In order to use a subquery, there are a couple of syntactical rules that need to be followed:

  • The subquery needs to be enclosed within parentheses
  • Depending on the SQL engine you are using, you might need to alias a given subquery
  • If using in a WHERE or HAVING clause, the SELECT statement of an subquery can only return the single field being evaluated

Example SQL interview question using a nested subquery

Suppose you are given the following table showing company sales:

Table: sales_info

date sale_id sale_usd
2020-01-05 1111 93695
2020-01-07 1112 879617
2020-01-07 1113 752878
... ... ...

Calculate the cumulative percent of total sales on a given day. The output of the table should look like the example table below.

date pct_total_sales
2020-01-05 X%
2020-01-07 Y%

You can work through this example using the interactive SQL fiddle here.

Before we start writing SQL, we'll break the question into steps:

  • Calculate the daily sales total
  • Calculate the cumulative sum of the daily sales total and total sales for all days
  • Divide the daily total sales by the cumulative sum

1. Calculate the daily sales total

First, we'll write the base query, which will become a subquery in the next step. The below query calculates the daily sales total and you can interact with the query with this SQL fiddle.

SELECT 
  #we'll need to sum sale_usd by the date 
  date, 
  sum(sale_usd) as total_usd
FROM sales_info
#because we're aggregating sale_usd by date, we need 
#need to group by date
GROUP BY date

2. Calculate the cumulative sum of the daily sales total and total sales for all days

The query below calculates the cumulative sum of the daily sales total and total sales for all days. You can interact with the query below with this SQL fiddle.

You'll notice in this step that we build a query around the previous query from the step above. In order to successfully build the subquery, we needed to enclose the subquery in parentheses and ensure that we alias the table (since we're using MySQL, note that this isn't a requirement for some SQL variants). We also need to ensure that we include all fields needed for the outer query (e.g. date and total sales).

SELECT 
#in this query, we do not group by since we're using a windowing function
  date, 
    SUM(total_usd) OVER (
      ORDER BY date ASC rows 
      BETWEEN unbounded preceding and current row) 
    as cum_total, # this is a windowing function to 
            # calculate the cumulative sum
    SUM(total_usd) OVER () as total # this is a windowing function to 
                    # calculate the total
FROM(
  SELECT 
      #we'll need to sum sale_usd by the date 
      date, 
      sum(sale_usd) as total_usd
  FROM sales_info
  #because we're aggregating sale_usd by date, we need 
  #need to group by date
  GROUP BY date
) as q1 #we create an alias for this table as required by MySQL

3. Divide the cumulative total sales by the cumulative sum

The last step is to divide the cum_total by the total. We can accomplish this in the same step as above (just dividing the two windowing functions), or we can build a subquery over the previous step. The query below uses another subquery resulting in the end query having two nested subqueries. You can interact with the query below with this SQL fiddle.

SELECT 
  date,
    100 * cum_total / total as
FROM(
    #in this query, we do not group by 
    #since we're using a windowing function
    SELECT
        date, 
        SUM(total_usd) OVER (ORDER BY date ASC 
              rows BETWEEN unbounded preceding and current row)
        as cum_total, # this is a windowing function to 
                      # calculate the cumulative sum
        SUM(total_usd) OVER () as total # this is a windowing function
                                        # to calculate the total
    FROM(
      SELECT 
          #we'll need to sum sale_usd by the date 
          date, 
          sum(sale_usd) as total_usd
      FROM sales_info
      #because we're aggregating sale_usd by date, we need 
      #need to group by date
      GROUP BY date
    ) as q1 #we create an alias for this table as required by MySQL
) as q2