SQL Database Window Functions

Window functions allow users to perform calculations across a set of rows that are somehow related to the current row. This can include calculations like running totals, moving averages, and ranking without the need to group the entire query into one aggregate result.

Despite their flexibility, window functions are sometimes underutilised, either because users are unaware of them or because they’re considered too complex for everyday tasks. Learning how to effectively use window functions can improve the efficiency and readability of SQL queries, particularly for reporting and data analysis purposes. This article will explore several use cases.

Numbering Rows

The simplest application area for window functions is the numbering of rows. The ROW_NUMBER() function assigns a unique number to each row within the partition of a result set. The numbering is sequential and starts at 1. It’s useful for creating a unique identifier for rows within a partition, even when the rows are identical in terms of data.

Consider the following database table of library checkouts:

bookcheckout_datemember_id
The Great Adventure2024-02-15102
The Great Adventure2024-01-10105
Mystery of the Seas2024-01-20103
Mystery of the Seas2024-03-01101
Journey Through Time2024-02-01104
Journey Through Time2024-02-18102

We want to assign a unique row number to each checkout instance for every book, ordered by the checkout date to analyze the circulation trend:

SELECT
 book,
checkout_date,
member_id,
ROW_NUMBER() OVER (PARTITION BY book ORDER BY checkout_date) AS checkout_order
FROM library_checkouts;

The result:

bookcheckout_datemember_idcheckout_order
The Great Adventure2024-01-101051
The Great Adventure2024-02-151022
Mystery of the Seas2024-01-201031
Mystery of the Seas2024-03-011012
Journey Through Time2024-02-011041
Journey Through Time2024-02-181022

Ranking

In the context of SQL and specifically regarding window functions, “ranking” refers to the process of assigning a unique position or rank to each row within a partition of a result set based on a specified ordering.

The RANK() function provides a ranking for each row within a partition, with gaps in the ranking sequence when there are ties. It’s useful for ranking items that have the same value.

Consider the following database table of scores in a game tournament:

playergamescore
AliceSpace Invaders4200
BobSpace Invaders5700
CharlieSpace Invaders5700
DanaDonkey Kong6000
EveDonkey Kong4800
FrankDonkey Kong6000
AliceAsteroids8500
BobAsteroids9300
CharlieAsteroids7600

We want to rank the players within each game based on their score, with gaps in rank for ties:

SELECT
 player,
 game,
score,
RANK() OVER (PARTITION BY game ORDER BY score DESC) AS rank
FROM scores;

The result looks like this:

playergamescorerank
BobSpace Invaders57001
CharlieSpace Invaders57001
AliceSpace Invaders42003
DanaDonkey Kong60001
FrankDonkey Kong60001
EveDonkey Kong48003
BobAsteroids93001
AliceAsteroids85002
CharlieAsteroids76003

If you don’t want to have gaps in the ranking sequence when there are ties, you can substitute DENSE_RANK() for RANK().

Cumulative Sum

The SUM() function can be used as a window function to calculate the cumulative sum of a column over a partition of rows.

Example: We are tracking our garden’s vegetable harvest in a database table, and we want to calculate the cumulative yield for each type of vegetable over the harvesting season.

vegetableharvest_dateyield_kg
Carrots2024-06-1810
Carrots2024-07-1015
Tomatos2024-06-1520
Tomatos2024-07-0130
Tomatos2024-07-2025
Zucchini2024-06-2015
Zucchini2024-07-0520

We calculate the running total (cumulative yield) for each vegetable type as the season progresses, using the SUM() function:

SELECT
 vegetable,
harvest_date,
yield_kg,
SUM(yield_kg) OVER (PARTITION BY vegetable ORDER BY harvest_date ASC) AS cumulative_yield
FROM garden_harvest;

Now we can see which vegetables are most productive and how yield accumulates throughout the season:

vegetableharvest_dateyield_kgcumulative_yield
Carrots2024-06-181010
Carrots2024-07-101525
Tomatos2024-06-152020
Tomatos2024-07-013050
Tomatos2024-07-202575
Zucchini2024-06-201515
Zucchini2024-07-052035

Pagination in SQL

Pagination is the task of dividing a data set into subsequent parts of the whole data set. For example, a search engine initially only shows the first 15 results for a search query. The user can then step through the rest of the results the by clicking a “Next” button.

Ideally this feature is also supported by the underlying database system. Otherwise, the application would have to load all matching data records from the database, just to filter out the major part of of them, because the user only wanted to see page 3 of 50. A pagination request has two components: a limit and an offset. If a page contains a maximum of 15 items and page 3 is requested, then the limit would be 15 and the offset would be 30 = (page-1) × limit.

PostgreSQL, MySQL, MariaDB

The database systems PostgreSQL, MySQL and MariaDB have a straight forward syntax for pagination: LIMIT {number} OFFSET {number} . So a simple SQL query with pagination might look like this:

SELECT * FROM users ORDER BY name LIMIT 15 OFFSET 30;

Oracle DB

Oracle DB didn’t have a dedicated syntax for pagination before Oracle 12c, but it was still possible to achieve the same result with other means. With Oracle 12c a new syntax for pagination was introduced under the name “Row limiting clause”. First I’ll show the old method, then the new syntax.

The old method is based on ROWNUM . If you wanted to specify both an offset and a limit, you had to nest multiple queries:

SELECT *
FROM (SELECT *, rownum AS rnum
      FROM (SELECT *
            FROM users
            ORDER BY name)
      WHERE rownum < 45)
WHERE rnum >= 30;

The newer row limiting clause syntax is shorter and looks as follows:

SELECT * FROM users ORDER BY name
  OFFSET 30 ROWS FETCH NEXT 15 ROWS ONLY;

This syntax also allows the option to specify a percentage of rows instead of a fixed number of rows:

SELECT * FROM users ORDER BY name
  FETCH FIRST 20 PERCENT ROWS ONLY;

MS SQL Server

Microsoft’s SQL Server also supports the Oracle-like syntax with OFFSET and FETCH clauses and recommends the usage of this syntax for pagination.