sql server – Schneide Blog

Window functions allow users to perform calculations across a set of rows that are somehow related to the current row. This can include calculations like running totals, moving averages, and ranking without the need to group the entire query into one aggregate result.

Despite their flexibility, window functions are sometimes underutilised, either because users are unaware of them or because they’re considered too complex for everyday tasks. Learning how to effectively use window functions can improve the efficiency and readability of SQL queries, particularly for reporting and data analysis purposes. This article will explore several use cases.

Numbering Rows

The simplest application area for window functions is the numbering of rows. The ROW_NUMBER() function assigns a unique number to each row within the partition of a result set. The numbering is sequential and starts at 1. It’s useful for creating a unique identifier for rows within a partition, even when the rows are identical in terms of data.

Consider the following database table of library checkouts:

book	checkout_date	member_id
The Great Adventure	2024-02-15	102
The Great Adventure	2024-01-10	105
Mystery of the Seas	2024-01-20	103
Mystery of the Seas	2024-03-01	101
Journey Through Time	2024-02-01	104
Journey Through Time	2024-02-18	102

We want to assign a unique row number to each checkout instance for every book, ordered by the checkout date to analyze the circulation trend:

SELECT
  book,
  checkout_date,
  member_id,
  ROW_NUMBER() OVER (PARTITION BY book ORDER BY checkout_date) AS checkout_order
FROM library_checkouts;

The result:

book	checkout_date	member_id	checkout_order
The Great Adventure	2024-01-10	105	1
The Great Adventure	2024-02-15	102	2
Mystery of the Seas	2024-01-20	103	1
Mystery of the Seas	2024-03-01	101	2
Journey Through Time	2024-02-01	104	1
Journey Through Time	2024-02-18	102	2

Ranking

In the context of SQL and specifically regarding window functions, “ranking” refers to the process of assigning a unique position or rank to each row within a partition of a result set based on a specified ordering.

The RANK() function provides a ranking for each row within a partition, with gaps in the ranking sequence when there are ties. It’s useful for ranking items that have the same value.

Consider the following database table of scores in a game tournament:

player	game	score
Alice	Space Invaders	4200
Bob	Space Invaders	5700
Charlie	Space Invaders	5700
Dana	Donkey Kong	6000
Eve	Donkey Kong	4800
Frank	Donkey Kong	6000
Alice	Asteroids	8500
Bob	Asteroids	9300
Charlie	Asteroids	7600

We want to rank the players within each game based on their score, with gaps in rank for ties:

SELECT
 player, 
 game, 
  score,
  RANK() OVER (PARTITION BY game ORDER BY score DESC) AS rank
FROM scores;

The result looks like this:

player	game	score	rank
Bob	Space Invaders	5700	1
Charlie	Space Invaders	5700	1
Alice	Space Invaders	4200	3
Dana	Donkey Kong	6000	1
Frank	Donkey Kong	6000	1
Eve	Donkey Kong	4800	3
Bob	Asteroids	9300	1
Alice	Asteroids	8500	2
Charlie	Asteroids	7600	3

If you don’t want to have gaps in the ranking sequence when there are ties, you can substitute DENSE_RANK() for RANK().

Cumulative Sum

The SUM() function can be used as a window function to calculate the cumulative sum of a column over a partition of rows.

Example: We are tracking our garden’s vegetable harvest in a database table, and we want to calculate the cumulative yield for each type of vegetable over the harvesting season.

vegetable	harvest_date	yield_kg
Carrots	2024-06-18	10
Carrots	2024-07-10	15
Tomatos	2024-06-15	20
Tomatos	2024-07-01	30
Tomatos	2024-07-20	25
Zucchini	2024-06-20	15
Zucchini	2024-07-05	20

We calculate the running total (cumulative yield) for each vegetable type as the season progresses, using the SUM() function:

SELECT
 vegetable,
 harvest_date,
 yield_kg,
 SUM(yield_kg) OVER (PARTITION BY vegetable ORDER BY harvest_date ASC) AS cumulative_yield
FROM garden_harvest;

Now we can see which vegetables are most productive and how yield accumulates throughout the season:

vegetable	harvest_date	yield_kg	cumulative_yield
Carrots	2024-06-18	10	10
Carrots	2024-07-10	15	25
Tomatos	2024-06-15	20	20
Tomatos	2024-07-01	30	50
Tomatos	2024-07-20	25	75
Zucchini	2024-06-20	15	15
Zucchini	2024-07-05	20	35

Pagination is the task of dividing a data set into subsequent parts of the whole data set. For example, a search engine initially only shows the first 15 results for a search query. The user can then step through the rest of the results the by clicking a “Next” button.

Ideally this feature is also supported by the underlying database system. Otherwise, the application would have to load all matching data records from the database, just to filter out the major part of of them, because the user only wanted to see page 3 of 50. A pagination request has two components: a limit and an offset. If a page contains a maximum of 15 items and page 3 is requested, then the limit would be 15 and the offset would be 30 = (page-1) × limit.

PostgreSQL, MySQL, MariaDB

The database systems PostgreSQL, MySQL and MariaDB have a straight forward syntax for pagination: LIMIT {number} OFFSET {number} . So a simple SQL query with pagination might look like this:

SELECT * FROM users ORDER BY name LIMIT 15 OFFSET 30;

Oracle DB

Oracle DB didn’t have a dedicated syntax for pagination before Oracle 12c, but it was still possible to achieve the same result with other means. With Oracle 12c a new syntax for pagination was introduced under the name “Row limiting clause”. First I’ll show the old method, then the new syntax.

The old method is based on ROWNUM . If you wanted to specify both an offset and a limit, you had to nest multiple queries:

SELECT *
FROM (SELECT *, rownum AS rnum
      FROM (SELECT *
            FROM users
            ORDER BY name)
      WHERE rownum < 45)
WHERE rnum >= 30;

The newer row limiting clause syntax is shorter and looks as follows:

SELECT * FROM users ORDER BY name
  OFFSET 30 ROWS FETCH NEXT 15 ROWS ONLY;

This syntax also allows the option to specify a percentage of rows instead of a fixed number of rows:

SELECT * FROM users ORDER BY name
  FETCH FIRST 20 PERCENT ROWS ONLY;

MS SQL Server

Microsoft’s SQL Server also supports the Oracle-like syntax with OFFSET and FETCH clauses and recommends the usage of this syntax for pagination.

	Writing Integration… on Every Unit Test Is a Stage Pla…
	mariuselvert on C# is very strict about modify…
	Anonymous on C# is very strict about modify…
	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…

Tag: sql server

SQL Database Window Functions

Numbering Rows

Ranking

Cumulative Sum