Inline and Implicit Foreign Key Constraints in SQL

Foreign key constraints are a key part of database design, ensuring that relationships between tables are consistent and reliable. They create a relationship between two tables, ensuring that data matches across them. For example, a column in an “Orders” table (like CustomerID) might reference a column in a “Customers” table. This guarantees that every order belongs to a valid customer.

In earlier versions of SQL systems, defining foreign key constraints often required separate ALTER TABLE statements after the table was created:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL,
  OrderDate  DATE
);

ALTER TABLE Orders
ADD CONSTRAINT FK_Customer FOREIGN KEY (CustomerID)
REFERENCES Customers(CustomerID);

This two-step process was prone to errors and required careful management to ensure all constraints were applied correctly.

Inline Foreign Key Constraints

Most of the popular SQL database systems – PostgreSQL, Oracle, SQL Server, and MySQL since version 9.0, released in July 2024 – now support inline foreign key constraints. This means you can define the relationship directly in the column definition, making table creation easier to read:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL REFERENCES Customers(CustomerID),
  OrderDate  DATE
);

Fortunately, this syntax is the same across these systems. However, MySQL 9 additionally supports implicit foreign key constraints:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL REFERENCES Customers,
  OrderDate  DATE
);

By leaving out the (CustomerID) in the REFERENCES clause it will assume that you want to reference the primary key of the parent table. This syntax is unique to MySQL, and you should avoid it if you need to write SQL DDL statements that works across multiple database systems.

Dependent Subqueries and LATERAL in PostgreSQL

When working with databases, you often need to run a query that depends on results from another query. PostgreSQL offers two main ways to handle this: Dependent Subqueries and LATERAL joins.

A dependent subquery is like a mini-query inside another query. The inner query (subquery) depends on the outer query for its input. For every row in the outer query, the subquery runs separately.

Imagine you have two tables: customers, which holds information about customers (e.g., their id and name), and orders, which holds information about orders (e.g., customer_id, order_date, and amount). Now, you want to find the latest order date for each customer. You can use a dependent subquery like this:

SELECT id AS customer_id,
  name,
  (SELECT order_date
     FROM orders
    WHERE customer_id=customers.id
    ORDER BY order_date DESC
    LIMIT 1) AS latest_order_date
FROM customers;

For each customer in the customers table, the subquery looks for their orders in the orders table. The subquery sorts the orders by date (ORDER BY order_date DESC) and picks the most recent one (LIMIT 1).

This works, but it has a drawback: If you have many customers, this approach can be slow because the subquery runs once for every customer.

LATERAL join

A LATERAL join is a smarter way to solve the same problem. It allows you to write a subquery that depends on the outer query, but instead of repeating the subquery for every row, PostgreSQL handles it more efficiently.

Let’s solve the “latest order date” problem using LATERAL:

SELECT
  c.id AS customer_id,
  c.name,
  o.order_date AS latest_order_date
FROM customers c
LEFT JOIN LATERAL (
  SELECT order_date
   FROM orders
  WHERE orders.customer_id=c.id
  ORDER BY order_date DESC
  LIMIT 1
) o ON TRUE;

For each customer (c.id), the LATERAL subquery finds the latest order in the orders table. The LEFT JOIN ensures that customers with no orders are included in the results, with NULL for the latest_order_date.

It’s easier to read and often faster, especially for large datasets, because PostgreSQL optimizes it better.

Both dependent subqueries and LATERAL joins allow you to handle scenarios where one query depends on the results of another. Dependent subqueries are straightforward and good enough for simple tasks with small datasets. However, you should consider using LATERAL for large datasets where performance matters.

JSON as a table in PostgreSQL 17

Some time ago, I described in a blog post how to work with JSON data in a PostgreSQL database. Last month (September 2024), PostgreSQL 17 was released, which offers another feature for working with JSON data: the JSON_TABLE() function. Such a function already existed in other database systems, such as MySQL and Oracle.

The core idea behind JSON_TABLE() is to temporarily transform JSON data into a relational format that can be queried using standard SQL commands. In doing so, developers can apply the full power of SQL, such as filtering, sorting, aggregating, and joining, to data that originally exists in JSON format: It enables you to use SQL to query JSON data stored in a column as though it were relational data, and you can join JSON data with regular relational tables for a seamless mix of structured and semi-structured data.

Syntax

The syntax for JSON_TABLE looks a bit complex at first glance, but once you break it down, it becomes quite clear. Here’s the basic structure:

JSON_TABLE(
  json_doc, path
  COLUMNS (
    column_name type PATH 'path_to_value' [DEFAULT default_value ON ERROR],
    ...
  )
) AS alias_name

The json_doc is the JSON data you’re querying. It can be a JSON column or a JSON string. It is followed by a path expression, which describes the location in the JSON document that you want to process. Path expressions are specified as strings in single quotation marks and have a special syntax called JSONPath (loosely inspired by XPath for XML). For example, such an expression could look like this: '$.store.book[0].title'. The dollar sign represents the root of the JSON tree, the sub-properties are separated by dots, and array elements are referenced using square brackets.

This is followed by the COLUMNS keyword. It specifies the columns to be extracted from the JSON document. Each column is specified by a name of the column in the resulting table, a data type of the extracted value (e.g., VARCHAR, INT), followed by the PATH keyword and a JSON path expression that references a JSON value below the original path expression.

The DEFAULT keyword optionally provides a default value if the path does not exist or if there’s an error during extraction.

After the JSON_TABLE function call you can specify the alias name for the table with the AS keyword.

Example

It’s time for an example. Let’s assume we have a table named fruit_store with a column named json_details that holds the some JSON data about fruits:

INSERT INTO fruit_store (id, json_details) VALUES (
    1,
    '{
        "category": "Citrus",
        "fruits": [
            {
                "name": "Orange",
                "color": "Orange",
                "taste": "Sweet",
                "price_per_kg": 3.5
            },
            {
                "name": "Lemon",
                "color": "Yellow",
                "taste": "Sour",
                "price_per_kg": 4.0
            }
        ]
    }'
);

Now we can use the JSON_TABLE function to extract the details of each fruit from the JSON document. The goal is to retrieve the fruit name, color, taste, and price for each fruit in the array. Here’s the query:

SELECT *
  FROM fruit_store,
  JSON_TABLE(
    json_details, '$.fruits[*]'
    COLUMNS (
      fruit_name   VARCHAR(50)  PATH '$.name',
      fruit_color  VARCHAR(50)  PATH '$.color',
      fruit_taste  VARCHAR(50)  PATH '$.taste',
      price_per_kg DECIMAL(5,2) PATH '$.price_per_kg'
    )
  ) AS fruit_table;

Running this query will give you the following result:

id	fruit_name	fruit_color	fruit_taste	price_per_kg
1	Orange	Orange	Sweet	3.50
1	Lemon	Yellow	Sour	4.00

A Tale of Hidden Variables

Today was one of those frustrating moments that every developer encounters at some point. We were working on a Docker Compose setup and observed behavior that could only happen if a specific environment variable had been set. To ensure that this environment variable wasn’t being set I scoured through the Docker Compose file, checked the local environment variables using the export command, and grepped all the relevant files in the project directory. But no matter what I did, this environment variable was still haunting us, wreaking havoc on the setup.

After what felt like an eternity of troubleshooting, we finally uncovered the culprit: an old, hidden .env file left over from a long-forgotten configuration. This file had been silently setting the environment variable I was desperately trying to eliminate.

Here’s how it all unfolded and what I learned from the experience:

When I first suspected that the environment variable might be lurking somewhere in the project, my instinct was to use grep to search for it in all the files within my local directory. I ran something along the lines of:

grep -r 'MY_ENV_VAR' *

To my surprise, nothing relevant showed up. I had expected this command to search through everything in my local directory. However, I had forgotten one important detail: grep doesn’t search hidden files by default when you use *.

Since .env files are typically hidden (starting with a dot), grep completely skipped over them. Little did I know, that old .env file was sitting quietly in the background, setting the environment variable that was causing all my issues.

After some frustration, my colleague finally had the realization that there might be hidden files at play. In Unix-like operating systems, files that start with a dot (.), like .env, are treated as hidden and are not listed or searched by default with common commands. Just as hidden variables in physics could influence particles without being directly observable, the hidden .env file was affecting my environment variables without being immediately visible.

To include hidden files in your search, you need to modify the grep command to look for them explicitly:

grep -r 'MY_ENV_VAR' . --include=".*"

This experience led me to reflect on whether deployment-relevant files like .env should be hidden in the first place, since they can easily be overlooked during debugging. It also makes them more prone to being forgotten. Hidden files are easy to miss when troubleshooting, especially when you’re under pressure.

Given that .env files can have a significant impact on the behavior of applications, containerized setups, and CI/CD pipelines, making them hidden by default might not always be the best approach. After all, if an environment variable has the power to alter how an entire application runs, it’s something we want to be highly visible and readily accessible.

In the end, this experience taught me two important lessons:

Always search for hidden files when troubleshooting issues related to environment variables. If your Docker Compose or other environment-dependent setups aren’t behaving as expected, don’t forget to check for hidden .env files.

Consider the visibility of critical configuration files. Should .env files be hidden by default, or should they be treated as first-class citizens in our directory structures? In many cases, keeping them visible might help avoid unexpected behavior and wasted hours of debugging.

Time Intervals in SQL

SQL not only supports date and timestamp data types but also time intervals.

An interval in SQL represents a period of time, such as days, hours, minutes, or seconds. SQL supports intervals in two main types: INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND.

The two types

YEAR TO MONTH is used to represent a span of years and months. Examples include calculating ages, durations of contracts, or project timelines.

DAY TO SECOND is used to represent a span of days, hours, minutes, and seconds. Examples include scheduling events, logging activities, and tracking precise durations.

-- Represents 1 year and 6 months
INTERVAL '1-6' YEAR TO MONTH

-- Represents 5 days, 12 hours, 30 minutes, and 45 seconds
INTERVAL '5 12:30:45' DAY TO SECOND

This distinction may seem arbitrary at first, but it aligns with real-world use cases. An interval of “2 months” is clearly different from “60 days,” even though they might seem similar. The former is handled as a “YEAR TO MONTH” interval, and the latter as a “DAY TO SECOND” interval, making it clear how operations should be performed. Storing only the relevant units also reduces the amount of space required. For instance, storing a “YEAR TO MONTH” interval doesn’t need to allocate space for hours or seconds, thus saving storage.

Despite its name a “DAY TO SECOND” interval can also include fractional seconds:

INTERVAL '1 12:30:45.123' DAY TO SECOND

There are also shorter forms of interval literals if you only want to use one or several units of the interval:

For “YEAR TO MONTH”:

INTERVAL '3' YEAR   -- 3 years
INTERVAL '5' MONTH  -- 5 months

For “DAY TO SECOND”:

INTERVAL '2' DAY                -- 2 days
INTERVAL '5:30' HOUR TO MINUTE  -- 5 hours and 30 minutes
INTERVAL '15' MINUTE            -- 15 minutes
INTERVAL '30' SECOND            -- 30 seconds
INTERVAL '45.678' SECOND        -- 45.678 seconds

Note that the syntax always uses the singular forms of the unit words.

The two types can be used as data types for table columns:

CREATE TABLE interval_examples (
  id NUMBER,
  a INTERVAL YEAR TO MONTH,
  b INTERVAL DAY TO SECOND
);

Calculations with intervals

Intervals can be added to or subtracted from dates to calculate future or past dates.

-- Adding 10 days to the current date
CURRENT_DATE + INTERVAL '10' DAY

-- Subtracting 3 months from a specific date
DATE '2024-06-14' - INTERVAL '3' MONTH

It’s important to understand that intervals of type YEAR TO MONTH do not have a fixed duration. The number of days added to CURRENT_TIMESTAMP when you add an INTERVAL ‘1’ MONTH depends on the current month and year, as different months have different numbers of days.

Intervals can also be compared to each other with the usual operators such as =, <, <=, >, >=, <>.

Combining both types

Suppose you need to add an interval of 1 year, 2 months, 3 days, 4 hours, 5 minutes, and 6 seconds to the current timestamp. In this case you need to combine intervals of the two types:

SELECT (CURRENT_TIMESTAMP + INTERVAL '1-2' YEAR TO MONTH) + INTERVAL '3 04:05:06' DAY TO SECOND AS result;

Notes on PostgreSQL

PostgreSQL supports the above standard interval syntax, but it also supports a unified syntax for intervals with the following syntax:

INTERVAL '1 year 2 months'
INTERVAL '10 days 12 hours 30 minutes 45.123 seconds'
INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds'

In fact, in PostgreSQL, YEAR TO MONTH and DAY TO SECOND are not a distinct data type on their own; they are specializations of the broader INTERVAL data type.

It also supports multiplication and division of intervals:

-- Results in an interval of 6 days
SELECT INTERVAL '2 days' * 3 AS multiplied_interval;

-- Results in an interval of 2 hours
SELECT INTERVAL '3 hours' / 1.5 AS divided_interval;

Unveiling the secrets of invisible database columns

After my last blog post, where I wrote about Generated and Virtual Columns, I would like to dedicate this post to another type of database column: Invisible Columns. This feature exists in MySQL since version 8.0 and in Oracle Database since version 12c. PostgreSQL and MS SQL Server do not support this feature.

Invisible columns, as the name suggests, are columns within a table that are hidden from standard query results by default. Unlike traditional columns that are visible and accessible in query results, invisible columns are not included unless explicitly specified in the query.

This feature provides a level of control over data visibility, allowing developers to hide certain columns from applications or other database users while still retaining their functionality within the database.

Defining invisible columns

When creating a table in MySQL or Oracle, you can designate certain columns as invisible by using the INVISIBLE keyword in the column definition. For example:

CREATE TABLE your_table (
  visible_column   INT,
  invisible_column INT INVISIBLE
);

In this example, the invisible_column is marked as invisible, while the visible_column remains visible by default. To alter an existing table and make a column invisible:

ALTER TABLE your_table
  MODIFY COLUMN existing_column_name INVISIBLE;

Replace your_table with the name of your table and existing_column_name with the name of the column you want to make invisible.

When querying the your_table, the invisible column will not be included in the result set unless explicitly specified:

SELECT * FROM your_table;

visible_column
--------------
   4
   8
  15

By default, invisible columns are hidden from query results, providing a cleaner and more concise view of the data. However, developers can still access invisible columns when needed by explicitly including them in the query:

SELECT visible_column, invisible_column FROM your_table;

visible_column | invisible_column
---------------------------------
   4           |   16
   8           |   23
  15           |   42

Unveiling invisible columns

To list the invisible columns of a table in MySQL, you can query the information_schema.columns system table and filter the results based on the COLUMN_DEFAULT column. Invisible columns have NULL as their default value. Here’s a simple SQL query to accomplish this:

SELECT COLUMN_NAME
  FROM information_schema.columns
  WHERE TABLE_SCHEMA = 'your_database'
    AND TABLE_NAME = 'your_table'
    AND COLUMN_DEFAULT IS NULL;

In Oracle, you can query the USER_TAB_COLUMNS or ALL_TAB_COLUMNS data dictionary views to list the invisible columns of a table. Here’s how you can do it:

SELECT COLUMN_NAME
  FROM USER_TAB_COLUMNS
  WHERE TABLE_NAME = 'your_table'
    AND INVISIBLE = 'YES';

If you want to list invisible columns from all tables in the current schema, you can use the ALL_TAB_COLUMNS view instead:

SELECT TABLE_NAME, COLUMN_NAME
  FROM ALL_TAB_COLUMNS
  WHERE INVISIBLE = 'YES';

Are invisible columns actually useful?

Invisible columns can make schema evolution easier by providing a flexible mechanism for evolving database schemas over time without disrupting existing applications or queries. You can test new features or data structures without committing to them fully. Invisible columns provide a way to add experimental columns to your tables without exposing them to production environments until they are fully tested and ready for use.

They can create cleaner and more concise views of your data by hiding less relevant columns. This can make it easier for developers, analysts, and users to work with the data without unnecessary clutter. However, I would argue that this is also achievable with normal database views.

The downside of introducing invisible columns is that they add complexity to the database schema, which can make it harder to understand and maintain, especially for developers who are not familiar with the invisible columns feature. They also add potential for confusion: Developers may forget about the presence of invisible columns, leading to unexpected behavior in queries or applications.

You probably shouldn’t use them to hide sensitive data, since invisible columns don’t have any additional access control, and security through obscurity is not a good idea. If you grant SELECT permission on the table to a user, they will be able to query visible and invisible columns alike.

Now that you know about them, you can make your own choice.

Half table, half view: Generated Columns

Anyone familiar with SQL database basics knows the two fundamental structures of relational databases: tables and views. Data is stored in tables, while views are virtual tables calculated on-the-fly from a SQL query. Additionally, relational database management systems often support materialized views, which, like views, are based on a query from other tables, but their results are actually persisted and only recalculated as needed.

What many don’t know is that the most common SQL databases (PostgreSQL, MySQL, Oracle) nowadays also support something in between: we’re talking about Generated Columns, which will be introduced in this blog post.

So, what are Generated Columns? Generated Columns are columns within a normal database table. But unlike regular columns, their values are not stored as independent individual values; rather, they are computed from other values in the table.

Below is an example of how to define a Generated Column. The example is for PostgreSQL, but the syntax is similar in other popular relational database systems that support this feature.

CREATE TABLE products (
  id          SERIAL PRIMARY KEY,
  name        VARCHAR(100),
  quantity    INTEGER,
  unit_price  DECIMAL(10, 2),
  total_price DECIMAL(10, 2) GENERATED ALWAYS
                             AS (quantity * unit_price) STORED
);

As seen above, a Generated Column is defined with the keywords GENERATED ALWAYS AS. The GENERATED ALWAYS is even optional (you could just write AS), but it clarifies what it’s about. Following AS is the expression that computes the value.

At the end of the column definition, either the keyword STORED or VIRTUAL can be used. In the example above, it says STORED, which means the computed value is physically stored on the disk. The value is recalculated and stored only after an INSERT or UPDATE. In contrast, with VIRTUAL, the value is not stored but always computed on-the-fly. Thus, virtual Generated Columns behave similarly to a view, while STORED is more comparable to a materialized view.

The choice between the two options depends on the specific requirements. Stored Generated Columns consume more disk space, while virtual Generated Columns save space at the expense of performance.

In the expression following AS, other columns of the table can be referenced. Even other Generated Columns can be referenced, as long as they are specified in the table definition before the current column. However, SQL subqueries cannot be used in the expression.

In conclusion, Generated Columns are a useful feature that combines parts of a table with the benefits of a view.

SQL Database Window Functions

Window functions allow users to perform calculations across a set of rows that are somehow related to the current row. This can include calculations like running totals, moving averages, and ranking without the need to group the entire query into one aggregate result.

Despite their flexibility, window functions are sometimes underutilised, either because users are unaware of them or because they’re considered too complex for everyday tasks. Learning how to effectively use window functions can improve the efficiency and readability of SQL queries, particularly for reporting and data analysis purposes. This article will explore several use cases.

Numbering Rows

The simplest application area for window functions is the numbering of rows. The ROW_NUMBER() function assigns a unique number to each row within the partition of a result set. The numbering is sequential and starts at 1. It’s useful for creating a unique identifier for rows within a partition, even when the rows are identical in terms of data.

Consider the following database table of library checkouts:

book	checkout_date	member_id
The Great Adventure	2024-02-15	102
The Great Adventure	2024-01-10	105
Mystery of the Seas	2024-01-20	103
Mystery of the Seas	2024-03-01	101
Journey Through Time	2024-02-01	104
Journey Through Time	2024-02-18	102

We want to assign a unique row number to each checkout instance for every book, ordered by the checkout date to analyze the circulation trend:

SELECT
  book,
  checkout_date,
  member_id,
  ROW_NUMBER() OVER (PARTITION BY book ORDER BY checkout_date) AS checkout_order
FROM library_checkouts;

The result:

book	checkout_date	member_id	checkout_order
The Great Adventure	2024-01-10	105	1
The Great Adventure	2024-02-15	102	2
Mystery of the Seas	2024-01-20	103	1
Mystery of the Seas	2024-03-01	101	2
Journey Through Time	2024-02-01	104	1
Journey Through Time	2024-02-18	102	2

Ranking

In the context of SQL and specifically regarding window functions, “ranking” refers to the process of assigning a unique position or rank to each row within a partition of a result set based on a specified ordering.

The RANK() function provides a ranking for each row within a partition, with gaps in the ranking sequence when there are ties. It’s useful for ranking items that have the same value.

Consider the following database table of scores in a game tournament:

player	game	score
Alice	Space Invaders	4200
Bob	Space Invaders	5700
Charlie	Space Invaders	5700
Dana	Donkey Kong	6000
Eve	Donkey Kong	4800
Frank	Donkey Kong	6000
Alice	Asteroids	8500
Bob	Asteroids	9300
Charlie	Asteroids	7600

We want to rank the players within each game based on their score, with gaps in rank for ties:

SELECT
 player, 
 game, 
  score,
  RANK() OVER (PARTITION BY game ORDER BY score DESC) AS rank
FROM scores;

The result looks like this:

player	game	score	rank
Bob	Space Invaders	5700	1
Charlie	Space Invaders	5700	1
Alice	Space Invaders	4200	3
Dana	Donkey Kong	6000	1
Frank	Donkey Kong	6000	1
Eve	Donkey Kong	4800	3
Bob	Asteroids	9300	1
Alice	Asteroids	8500	2
Charlie	Asteroids	7600	3

If you don’t want to have gaps in the ranking sequence when there are ties, you can substitute DENSE_RANK() for RANK().

Cumulative Sum

The SUM() function can be used as a window function to calculate the cumulative sum of a column over a partition of rows.

Example: We are tracking our garden’s vegetable harvest in a database table, and we want to calculate the cumulative yield for each type of vegetable over the harvesting season.

vegetable	harvest_date	yield_kg
Carrots	2024-06-18	10
Carrots	2024-07-10	15
Tomatos	2024-06-15	20
Tomatos	2024-07-01	30
Tomatos	2024-07-20	25
Zucchini	2024-06-20	15
Zucchini	2024-07-05	20

We calculate the running total (cumulative yield) for each vegetable type as the season progresses, using the SUM() function:

SELECT
 vegetable,
 harvest_date,
 yield_kg,
 SUM(yield_kg) OVER (PARTITION BY vegetable ORDER BY harvest_date ASC) AS cumulative_yield
FROM garden_harvest;

Now we can see which vegetables are most productive and how yield accumulates throughout the season:

vegetable	harvest_date	yield_kg	cumulative_yield
Carrots	2024-06-18	10	10
Carrots	2024-07-10	15	25
Tomatos	2024-06-15	20	20
Tomatos	2024-07-01	30	50
Tomatos	2024-07-20	25	75
Zucchini	2024-06-20	15	15
Zucchini	2024-07-05	20	35

Time travel with Oracle database’s Flashback Queries

Oracle’s database management system offers a feature known as Flashback Queries, allowing users to peek into the past and retrieve data as it existed at a previous point in time. This functionality can eliminate the need for manual data restoration from backups, making it a useful asset for both developers and database administrators.

Enabling Flashback Queries

Before using Flashback Queries, ensure that the database has the required configuration. Firstly, confirm that the database’s DB_FLASHBACK_RETENTION_TARGET parameter is appropriately set. This parameter defines the period for which historical data is retained. Adjust it based on your organization’s data retention policies. Before making changes, you can check its current value:

SHOW PARAMETER DB_FLASHBACK_RETENTION_TARGET;

Use the ALTER SYSTEM command to set the parameter. For example, to set it to retain data for 7 days:

ALTER SYSTEM SET DB_FLASHBACK_RETENTION_TARGET=10080 SCOPE=BOTH;

10080 is the retention period in minutes (7 days × 24 hours × 60 minutes). Please note that setting the parameter to a higher value consumes more space in the flashback recovery area, so consider your storage constraints. SCOPE=BOTH ensures that the change persists across database restarts, i.e. it changes the value both in memory and in the server parameter file.

To enable Flashback Queries for a specific table, execute the ALTER TABLE command with the FLASHBACK option:

ALTER TABLE table_name FLASHBACK ARCHIVE;

This setup allows Oracle to maintain historical changes for the specified table.

Using Flashback Queries

Consider a scenario where an employee accidentally updates critical data in the employees table. With Flashback Queries, you can rectify the mistake:

SELECT * FROM employees AS OF TIMESTAMP TO_TIMESTAMP('2023-11-10 15:00:00', 'YYYY-MM-DD HH24:MI:SS');

This query retrieves the data from the employees table as it existed before the erroneous update.

You can also recover dropped tables if they are still within the retention period:

FLASHBACK TABLE orders TO BEFORE DROP;

This command restores the dropped table and its data.

Flashback Queries offer a mechanism to navigate through time within a database, providing a simple way to recover historical data or inspect changes. They stand as a useful asset in the arsenal of database administrators and developers, fostering greater confidence in managing data.

PostgreSQL’s auto-explain feature and execution plans

PostgreSQL’s auto-explain is a built-in feature that automatically generates and logs execution plans for SQL statements. It’s a useful tool for developers to understand how the query planner is executing SQL queries.

You enable and configure auto-explain by setting parameters in the PostgreSQL configuration file (postgresql.conf). Set auto_explain.log_analyze to on to log execution plans along with statistics, and set auto_explain.log_min_duration to specify the minimum execution time in milliseconds that a query must take to be logged. For example, if you want to log queries taking longer than 100 milliseconds, set it to 100. Set auto_explain.log_buffers to on if you want to include information about memory usage, and auto_explain.log_timing to log timing information.

Here’s an example of how to configure these parameters in postgresql.conf:

auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_min_duration = 100

Reading the execution plan

Suppose you have a “recipe database” that stores recipes, ingredients, and chefs. You want to retrieve a list of recipes along with the names of the chefs who created them and the ingredients they contain. Here’s a query that accomplishes this:

SELECT recipes.recipe_name, chefs.chef_name, ingredients.ingredient_name
  FROM recipes
  JOIN chefs ON recipes.chef_id=chefs.chef_id
  JOIN recipe_ingredients ON recipes.recipe_id=recipe_ingredients.recipe_id
  JOIN ingredients ON recipe_ingredients.ingredient_id=ingredients.ingredient_id
WHERE recipes.cuisine='Italian';

This query fetches Italian recipes, their respective chefs, and the ingredients they use.

When you run this query with auto-explain enabled, PostgreSQL will log the execution plan. The query plan might look something like this:

Hash Join  (cost=100.25..350.75 rows=50 width=96)
  Hash Cond: (recipe_ingredients.recipe_id = recipes.recipe_id)
  ->  Hash Join  (cost=50.12..200.37 rows=50 width=60)
        Hash Cond: (recipes.chef_id = chefs.chef_id)
        ->  Seq Scan on recipes  (cost=0.00..100.00 rows=50 width=24)
              Filter: (cuisine = 'Italian'::text)
        ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
              ->  Seq Scan on chefs  (cost=0.00..30.00 rows=1000 width=36)
  ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
        ->  Seq Scan on recipe_ingredients  (cost=0.00..30.00 rows=1000 width=36)
              Filter: (recipe_id IS NOT NULL)

In this query plan Hash Join indicates a join operation using a hash-based algorithm. Seq Scan signifies a sequential scan of the table, which might imply a full table scan. Hash Cond shows the join condition for the respective hash join.

cost represents the estimated execution cost for each operation, and rows indicates the estimated number of rows returned by each operation.

The estimated cost in PostgreSQL query execution plans is typically represented in an abstract unit known as “cost units.” These cost units are used for relative cost estimation and are not expressed in any specific real-world measurement like time or money. They are designed to provide a relative measure of the cost of different query plan operations so that the query planner can make informed decisions about which plan to choose.

Reading this plan, PostgreSQL starts by filtering Italian recipes (a Seq Scan with a filter). It then joins the recipes with chefs using a hash join, and the result is further joined with ingredients using another hash join. The cost values provide relative estimates of resource usage, allowing you to identify potentially expensive parts of the query, and you can consider improving the performance of the SQL statement with optimisations like indexing.

	mariuselvert on C# is very strict about modify…
	Anonymous on C# is very strict about modify…
	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…
	mariuselvert on Creating functors with lambda…