PostgreSQL’s new MERGE command

PostgreSQL version 15 introduces a new SQL command: the MERGE command. This allows merging a table into another table. The MERGE command has existed for some time in other databases such as Oracle or SQL Server.

The principle of this command is that you have a target table in which you want to insert or remove data based on a source table under certain conditions, or you want to update existing entries with data from the source table. The source table doesn’t have to be a real table, it can just as easily be a SELECT query.

How to use it, step-by-step

The command begins with MERGE INTO, followed by the name of the target table. We call it dest here:

  INTO dest ...

Then you specify the source table with USING, here we call it src:

  INTO dest
  USING src

If you want to use a SELECT query as the source instead of a real table, you can do it like this:

  INTO dest
  USING (SELECT ... FROM ...) AS src

Now you need a condition that is used to match entries from one table to entries from the other table. This is specified after ON. In this example we simply use the IDs of the two tables:

  INTO dest
  USING src

This is followed by a case distinction that describes what should happen if the condition either applies or not. The possible actions can be: UPDATE, DELETE, INSERT, or DO NOTHING.

The two cases are specified with WHEN MATCHED THEN and WHEN NOT MATCHED THEN:

  INTO dest
  USING src
    UPDATE SET ...
    INSERT (...) VALUES (...);

If a match exists, then reasonable actions are UPDATE, DELETE, or DO NOTHING. If no match exists, then reasonable actions are INSERT or DO NOTHING.

In the WHEN cases, additional conditions can be specified with AND:

  INTO dest
  USING src
  WHEN MATCHED AND dest.value > src.value THEN
    UPDATE SET ...

A realistic example

Here’s an example demonstrating a use case that might occur in the real world:

  INTO account a
  USING transaction t
  UPDATE SET balance = a.balance + t.amount
  INSERT (id, balance) VALUES (t.account_id, t.amount);

This statement processes a table of monetary transactions and applies them to their matching customer accounts by adding the amount of each transaction to the balance of the matching account. If no matching account exists it will be created and the initial balance is the amount of the first transaction.

PostgreSQL’s “DISTINCT ON” clause

Anyone who uses SQL databases knows the DISTINCT modifier for SELECT queries to get result sets without duplicates. However, PostgreSQL has another variant of it that not everyone knows, but which is very useful: the SELECT DISTINCT ON clause. It can be used to query only the first row of each set of rows according to a grouping.

To understand its usefulness, let’s look at an example and solve it in the classical way first.

The complicated way

Given the following table of items we want to query for each category the item with the highest value.

 name │ category │ value
 A    │ X        │ 52
 B    │ X        │ 35
 C    │ X        │ 52
 D    │ Y        │ 27
 E    │ Y        │ 31
 F    │ Y        │ 20

Usually we’d start out with a query like this:

  MAX(value) AS highest_value
FROM items
GROUP BY category;
category │ highest_value
 X       │ 52
 Y       │ 31

And then use this query as a sub-select:

WHERE (category, value) IN (
    MAX(value) AS highest_value
  FROM items
  GROUP BY category
 name │ category │ value
 A    │ X        │ 52
 C    │ X        │ 52
 E    │ Y        │ 31

Unfortunately, there are multiple items in category X with the same highest value 52. But we really only want one row for each category. In this case we might use the ROW_NUMBER() function:

  name, category, value
      PARTITION BY category
      ORDER BY value DESC, name
    ) AS rownum
  FROM items
) WHERE rownum = 1;
 name │ category │ value
 A    │ X        │ 52
 E    │ Y        │ 31

This is finally our desired result.

The easy way

But I promised it can be easier with the DISTINCT ON clause. How does it work?

FROM items
  category, value DESC, name;

After DISTINCT ON we specify one or more columns by which to group by in parentheses. The ORDER BY clause determines which row will be the first in each group. We get the same result:

 name │ category │ value
 A    │ X        │ 52
 E    │ Y        │ 31

Range Types in PostgreSQL

How do you store ranges in an SQL database? By ranges I mean things like price ranges, temperature ranges, date ranges for scheduling, etc. You’d probably represent them with two columns in a table, like min_price and max_price, min_temperature and max_temperature, start_date and end_date. If you want to represent an unbounded range, you’d probably make one or both columns nullable and then take NULL as +/- infinity.

If you want to test if a value is in a range you can use the BETWEEN operator:

  target_price BETWEEN min_price AND max_price;

This doesn’t work as nicely anymore if you work with unbounded ranges as described above. You’d have to add additional checks for NULL. What if you want to test if one of the ranges in the table overlaps with a given range?

  max_given >= min_price AND
  min_given <= max_price;

Did I make a mistake here? I’m not sure. What if they should overlap but not cover each other? And again, this becomes even more complicated with unbounded ranges.

Enter range types

PostgreSQL has a better solution for these problems — range types. It comes with these additional built-in data types:

  • int4range: Range of integer
  • int8range: Range of bigint
  • numrange: Range of numeric
  • tsrange: Range of timestamp without time zone
  • tstzrange: Range of timestamp with time zone
  • daterange: Range of date

You can use them as a column type in a table:

CREATE TABLE products (…, price_range numrange);


You can construct range values for these types like this:

'[2022-05-01, 2022-05-31]'::daterange
'[9:30, 12:00)'::timerange

As you can see, they use mathematical interval notation. A square bracket means inclusive bound, and a round parenthesis means exclusive bound. They can also be unbounded (infinite) or empty:


You can get the bounds of a range individually with the lower() and upper() functions:

SELECT * FROM products ORDER BY lower(price_range);


The range types become really powerful through the range operators. There are a lot, so I will only show some basic examples:

  • The && operators tests if two ranges overlap: range_a && range_b
  • The @> and <@ operators test if the first range contains the second or vice versa: range_a <@ range_b. If used with an element on one side they test if the element is in a range: element <@ range or range @> element.
  • The -|- operator tests if two ranges are adjacent: range_a -|- range_b

Additionally to these boolean tests you can also calculate new ranges based on existing ranges:

The + operator computes the union of two overlapping or adjacent ranges: range_a + range_b. The * computes the intersection of ranges, and the - operator the difference.


There is one more thing I want to mention: For each one of the range types there is also a multirange type: int4multirange, int8multirange, nummultirange, tsmultirange, tstzmultirange, datemultirange. As their names suggest, they store multiple ranges in one value:

'{[2,9), [12,20)}'::int4multirange

The mentioned range operators work with them as well.

Full-text Search with PostgreSQL

If you want to add simple text search functionality to an application backed by an SQL database one of the first things that may come to your mind is the SQL LIKE operator. The LIKE operator and its case-insensitive sibling ILIKE find substrings in text data via wildcards such as %, which matches any sequence of zero or more characters:

SELECT * FROM book WHERE title ILIKE '%dog%'.

However, this approach satisfies only very basic requirements for text search, because it only matches exact substrings. That’s why application developers often use an external search engine like Elasticsearch based on the Apache Lucene library.

With a PostgreSQL database there is another option: it comes with a built-in full-text search. A full-text search analyzes text according to the language of the text, parses it into tokens and converts them into so-called lexemes. These are strings, just like tokens, but they have been normalized so that different forms of the same word, for example “pony” and “ponies”, are made alike. Additionally, stop words are eliminated, which are words that are so common that they are useless for searching, like “a” or “the”. For this purpose the search engine uses a dictionary of the target language.

In PostgreSQL, there are two main functions to perform full-text search: they are to_tsvector and to_tsquery. The ts part in the function names stands for “text search”. The to_tsvector function breaks up the input string and creates a vector of lexemes out of it, which are then used to perform full-text search using the to_tsquery function. The two functions can be combined with the @@ (match) operator, which applies a search query to a search vector:

SELECT title
  FROM book
  WHERE to_tsvector(title) @@ to_tsquery('(cat | dog) & pony')

The query syntax of ts_query supports boolean operators like | (or), & (and), ! (not) and grouping using parentheses, but also other operators like and <-> (“followed by”) and * (prefix matching).

You can specify the target language as a parameter of to_tsvector:

# SELECT to_tsvector('english', 'Thousands of ponies were grazing on the prairie.');

'graze':5 'poni':3 'prairi':8 'thousand':1

Here’s another example in German:

# SELECT to_tsvector('german', 'Wer einen Fehler begeht, und ihn nicht korrigiert, begeht einen zweiten (Konfuzius)');

'begeht':4,9 'fehl':3 'konfuzius':12 'korrigiert':8 'wer':1 'zweit':11

PostgreSQL supports dictionaries for about 80+ languages out-of-the-box.

The examples in this article are just a small glimpse of what is possible with regards to full-text search in PostgreSQL. If you want to learn more you should consult the documentation. The key takeaway is that there is another option between simple LIKE clauses and an external search engine.

Commenting SQL database objects

Did you know that you can annotate database object like tables, views and columns with comments in many SQL database systems? By that I don’t mean comments in SQL scripts, indicated by double dashes (--), but comments attached to the objects themselves, stored in the database. These may be helpful to the database admin by providing context via a description text on what is stored in these objects.

For PostgreSQL and Oracle databases the syntax is as follows:

COMMENT ON TABLE [schema_name.]table_name IS '...';
COMMENT ON COLUMN [schema_name.]table_name.column_name IS '...';

For example:

COMMENT ON COLUMN IS 'The main author''s last name';
COMMENT ON TABLE books IS 'Contains only the best books';

These comments can be viewed in database tools like SQL Developer:

Comments on columns
Comments on tables

You can also view the comments in psql:

db=# \d+ books
 Column |  Type   |          Description
id      | integer |
author  | text    | The main author''s last name
title   | text    |

And for a table:

db=# \dt+ books
                    List of relations
 Schema | Name  | Type  |     |        Description
--------+-------+-------+ ... +------------------------------
public  | books | table |     | Contains only the best books

In Oracle you can query the comments from the data dictionary views ALL_TAB_COMMENTS and ALL_COL_COMMENTS:

> SELECT * FROM all_col_comments WHERE table_name='BOOKS';
LIBRARY	 BOOKS	     ID           (null)
LIBRARY	 BOOKS	     AUTHOR       The main author's last name
LIBRARY	 BOOKS	     TITLE        (null)

> SELECT * FROM all_tab_comments WHERE table_name='BOOKS';
LIBRARY	 BOOKS	     TABLE       Contains only the best books

In Oracle comments are limited to tables, views, materialized views, columns, operators and indextypes, but in PostgreSQL you can attach comments to nearly everything. Another good use case for this are documentation comments on database functions:

This function does something important.

Example usage:

Note: the $$ delimits multi-line strings (called dollar quoted string constants).

Linking separate PostgreSQL servers with a Foreign Data Wrapper

If you want to query a table in one database server from another you need a way to connect these two servers with each other. For PostgreSQL databases the feature that makes this possible is called Foreign Data Wrapper.

To use this feature you have to load its extension into the current database:

CREATE EXTENSION postgres_fdw;

The postgres_fdw extension ships with a default PostgreSQL installation. It allows you to connect a PosgreSQL database with another PostgreSQL database. You can connect a PostgreSQL database with databases by other vendors, too, but then you need external extensions like oracle_fdw for Oracle databases or mysql_fdw for MySQL databases. In this article we will only use postgres_fdw.

You can check if the extension was loaded successfully. The following query should return a row for the extension:

SELECT * FROM pg_extension WHERE extname='postgres_fdw';

The next step is to set up the remote server instance:

    host '',
    port '5432',
    dbname 'remotedb'

This statement registers a remote server under the name remotesrv, a name you can choose freely. You have to specify which Foreign Data Wrapper to use (postgres_fdw in this case) as well the target host, port and database name.

The CREATE SERVER statement didn’t contain any user login information, you have to provide it via a user mapping:

  SERVER remotesrv
    user 'joe',
    password 'secret'

In this case we map the remote user joe to the current user (CURRENT_USER) of the local server. It doesn’t have to be the current user, you could specify any user name.

Now you have to import tables from the remote database. You can either explicitly import individual tables or a whole schema. Here’s how to import the public schema of the remote (“foreign”) database into the public schema of the local database:

  FROM SERVER remotesrv
  INTO public;

You can restrict which tables to import with the LIMIT TO or EXCEPT clauses. The following statement will only import the tables books and students:

  LIMIT TO (students, books)
  FROM SERVER remotesrv
  INTO public;

Now you can access these tables as if they were in the local database.

Tables as types in PostgreSQL

In SQL each column of a database table has a data type. These are types like NUMBER, VARCHAR(size) / TEXT, TIMESTAMP. What you perhaps don’t know yet is that in PostgreSQL you can use tables as types. What does this mean? Let’s say we have a table:

  firstname TEXT,
  lastname  TEXT,
  email     TEXT

Now you can use this table as a type for columns in other tables:

CREATE TABLE article (
  id       SERIAL,
  content  TEXT,
  author   person,
  reviewer person

Instead of repeating the three columns of a person twice, e.g. author_firstname, author_lastname, author_email, and reviewer_firstname, reviewer_lastname, reviewer_email, the person table defined before acts as a type. Of course, the usual way in SQL is to give each person an ID and reference persons via these IDs from other tables. But sometimes you do not want this reference semantics. In this example you might want to fix the values of author and reviewer for articles in time and not retroactively update them automatically if a person changes their last name or email address later.

How to access the columns of these types? For INSERT the syntax is as follows:

INSERT INTO article (content, author, reviewer)
  VALUES ('...',
    ('Jane', 'Doe', ''),
    ('John', 'Roe', '')

Or with explicit names of the sub-columns:

INSERT INTO article (content,
  VALUES ('...',
    'Jane', 'Doe', '',
    'John', 'Roe', ''

In a SELECT query individual values can be accessed with the following syntax:

FROM article;

Of course, tables that uses other tables as data types for their columns can be used as data types again.

One last thing worth mentioning is that these nested definitions can be mapped nicely to JSON:

SELECT jsonb_pretty(to_jsonb(article)) FROM article;
  "id": 1,
  "content": "...",
  "author": {
    "email": "",
    "firstname": "Jane",
    "lastname": "Doe"
  "reviewer": {
    "email": "",
    "firstname": "John",
    "lastname": "Roe"

Geometric shapes, functions and operators in PostgreSQL

On this blog I frequently write about features of relational database systems and their SQL dialects. One feature many developers do not know about is support for geometric shapes, although a lot of RDBMs support them in one form or the other, each with its own syntax, of course. In this article I’m going to demonstrate this feature with PostgreSQL.

PostgreSQL has data types for geometric shapes like point, box, line segment, line, path, polygon, and circle. These data types are only defined for two dimensions with Euclidean (x, y) coordinates. Here are some literals for these types:

point '(3.2,4)'
box '((1,2),(6,4))'
lseg '((-4,0),(3,2))'
path '((0,0),(2,1),(5,3))'
polygon '((0,0),(1,1),(2,0),(3,1))'
circle '((5,2),1.5)'

You can create tables with columns of these types and insert shapes:

CREATE TABLE shapes (p point, c circle);

INSERT INTO shapes (p, c) VALUES
  (point '(1,0)', circle '(0,0),3'),
  (point '(10,20)', circle '(2,3),4'),
  (point '(0.5,1.5)', circle '(1,2),1');

Now you can query shapes and filter them with special operators:

SELECT * FROM shapes WHERE c @> p;

This query uses the contains operator @> in the WHERE clause. It selects all rows where the circle c contains the point p.

Here’s another operator: <-> determines the Euclidean distance between two points.

SELECT point '(0,0)' <-> point '(1,1)';
=> 2.23606797749979

The ?|| operator tests if two lines are parallel:

SELECT line '((1,2),(1,3))' ?|| line '((2,3),(2,4))';
=> true

You can translate a shape with the + operator:

SELECT box '((0,0),(1,1))' + point '(1,2)';
=> box '(2,3),(1,2)'

Or you can test with && if two shapes overlap:

SELECT box '((1,2),(4,3))' && box '(2,3),(1,2)';
=> true

This is only a small selection of geometric operators. See the full list in the official documentation. There you can also find a list of geometric functions like area , center, isclosed, npoints, etc.

SELECT area(box '((4,6),(10,12))');
=> 36

As mentioned in the beginning, other database systems support similar functionality. Check out MySQL’s spatial data types, Oracle Spatial, and MS SQL’s spatial data types.

Pagination in SQL

Pagination is the task of dividing a data set into subsequent parts of the whole data set. For example, a search engine initially only shows the first 15 results for a search query. The user can then step through the rest of the results the by clicking a “Next” button.

Ideally this feature is also supported by the underlying database system. Otherwise, the application would have to load all matching data records from the database, just to filter out the major part of of them, because the user only wanted to see page 3 of 50. A pagination request has two components: a limit and an offset. If a page contains a maximum of 15 items and page 3 is requested, then the limit would be 15 and the offset would be 30 = (page-1) × limit.

PostgreSQL, MySQL, MariaDB

The database systems PostgreSQL, MySQL and MariaDB have a straight forward syntax for pagination: LIMIT {number} OFFSET {number} . So a simple SQL query with pagination might look like this:


Oracle DB

Oracle DB didn’t have a dedicated syntax for pagination before Oracle 12c, but it was still possible to achieve the same result with other means. With Oracle 12c a new syntax for pagination was introduced under the name “Row limiting clause”. First I’ll show the old method, then the new syntax.

The old method is based on ROWNUM . If you wanted to specify both an offset and a limit, you had to nest multiple queries:

FROM (SELECT *, rownum AS rnum
      FROM (SELECT *
            FROM users
            ORDER BY name)
      WHERE rownum < 45)
WHERE rnum >= 30;

The newer row limiting clause syntax is shorter and looks as follows:


This syntax also allows the option to specify a percentage of rows instead of a fixed number of rows:


MS SQL Server

Microsoft’s SQL Server also supports the Oracle-like syntax with OFFSET and FETCH clauses and recommends the usage of this syntax for pagination.

Working with JSON data in PostgreSQL

Today most common SQL-based relational database management systems (DBMS) like PostgreSQL, MySQL, MariaDB, SQL Server and Oracle offer functionality to efficiently store and query JSON data in one form or another, with varying syntax. While a standard named SQL/JSON is in the works, it is not yet fully supported by all of these DBMS. This blog post is specific to PostgreSQL.

JSON data types

In PostgreSQL there are two data types for JSON columns: json and jsonb. The former stores JSON data as-is with any formatting preserved, while the latter stores JSON in a decomposed binary format. Operations on data in jsonb format are potentially more efficient.

We’ll use the jsonb data type in the following example to store a sequence of events, for example for an event sourcing based application, in a table.

                     event JSONB NOT NULL);

JSON literals look like string literals. Let’s insert some events:

INSERT INTO events (date, event) VALUES
  (NOW(), '{"type": "add_shelf", "payload": {"id": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}'),
  (NOW(), '{"type": "add_shelf", "payload": {"id": 2}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');


PostgreSQL has two operators for navigating a JSON structure: -> and ->>. The former accesses an object field by key and the latter accesses an object field as text. These operators can be used both in the SELECT clause and the WHERE clause:

SELECT event->>'type' AS type FROM events;
SELECT event FROM events
        WHERE event->>'type'='add_book'
          AND event->'payload'->>'shelf'='1';
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Note that in the example above the value of "shelf" is compared to a string literal ('1'). In order to treat the value as a number we have to use the CAST function, and then we can use numerical comparison operators:

SELECT event FROM events
        WHERE CAST(
          event->'payload'->>'shelf' AS INTEGER
        ) > 1;
{"type":"add_book","payload":{"shelf":2,"title":"Don Quixote"}}


Updating JSON object fields is a bit more complicated. It is only possible with the jsonb data type and can be done via the JSONB_SET function, which takes four arguments:

1) the original JSON,
2) a path specifying which object fields should be updated,
3) a jsonb value, which is the new value, and
4) a boolean flag that specifies if missing fields should be created.

In this example we convert all the event "type" fields from lower case to upper case:

UPDATE events SET event=JSONB_SET(

PostgreSQL provides a lot more operators and functions for working with JSON data. This post only covered the most basic ones. See the PostgreSQL JSON reference for more.