Deferred Constraints in Oracle DB

Foreign key constraints are like rules in your Oracle database that make sure data is linked properly between tables. For example, you can’t add an order for a customer who doesn’t exist – that’s the kind of thing a foreign key will stop. They help enforce data integrity by ensuring that relationships between tables remain consistent. But hidden in the toolbox of Oracle Database is a lesser-known trick: deferred foreign key constraints.

What Are Deferred Constraints?

By default, when you insert or update data that violates a foreign key constraint, Oracle will throw an error immediately. That’s immediate constraint checking.

But with deferred constraints, Oracle lets you temporarily violate a constraint during a transaction – as long as the constraint is satisfied by the time the transaction is committed.

Here’s how you make a foreign key deferrable:

ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id)
  REFERENCES customers(customer_id)
  DEFERRABLE INITIALLY DEFERRED;

That last part – DEFERRABLE INITIALLY DEFERRED – is the secret sauce. Now, the constraint check for fk_orders_customer is deferred until the COMMIT.
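If you declare a constraint as DEFERRABLE INITIALLY IMMEDIATE instead, it is still checked right away by default, but you can switch it to deferred checking for the current transaction with SET CONSTRAINT:

SET CONSTRAINT fk_orders_customer DEFERRED;

-- or, for all deferrable constraints at once:
SET CONSTRAINTS ALL DEFERRED;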

Use Cases

Let’s look at a few situations where this is really helpful.

One use case is circular references between tables. Say you have two tables: one for employees, one for departments. Each employee belongs to a department. But each department also has a manager – who is an employee. You end up in a “chicken and egg” situation. Which do you insert first? With deferred constraints, it doesn’t matter – you can insert them in any order, and Oracle will only check everything after you’re done.
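Here’s a minimal sketch of what that looks like, assuming employees and departments tables whose foreign keys to each other were both created as DEFERRABLE INITIALLY DEFERRED:

INSERT INTO departments (department_id, name, manager_id)
  VALUES (10, 'Engineering', 100);   -- employee 100 doesn't exist yet

INSERT INTO employees (employee_id, name, department_id)
  VALUES (100, 'Alice', 10);         -- now both references line up

COMMIT;                              -- the constraints are checked here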

Another use case is the bulk import of data. If you’re importing a bunch of data (like copying from another system), it can be really hard to insert things in the perfect order to keep all the foreign key rules happy. Deferred constraints let you just insert everything, then validate it all at the end with one COMMIT.

Deferred constraints also help when dealing with temporarily incomplete data: Let’s say your application creates a draft invoice before all the customer info is ready. Normally, this would break a foreign key rule. But if the constraint is deferred, Oracle gives you time to finish adding all the pieces before checking.

Caution

Using deferred constraints recklessly can lead to runtime surprises. Imagine writing a huge batch job that appears to work fine… until it crashes at COMMIT with a constraint violation error – rolling back the entire transaction. So only defer constraints when you really need to.

One last tip

If you want to check if a constraint is deferrable in your database you can use the following SQL query:

SELECT constraint_name, deferrable, deferred
  FROM user_constraints
 WHERE table_name='ORDERS';

Exploring Dynamic SQL in Oracle: Counting Specific Values Across Multiple Tables

Imagine you have a large database where multiple tables contain a column named BOOK_ID. Perhaps you’re tasked with finding how many times a particular book (say, with an ID of 12345) appears across all these tables. How would you approach this?

In this post, I’ll explain a small piece of Oracle PL/SQL code that uses dynamic SQL to search for a specific value in any table that has a column with a specific name.

Finding the relevant tables

First, we need to determine which tables in the schema have a column with the desired name. To do this, we must look at a metadata table or view that contains information about all the tables in the schema and their columns. Most database systems offer such metadata, although its names and structures vary greatly between systems. In Oracle, the relevant data dictionary view for our task is DBA_TAB_COLUMNS. Therefore, to find the names of all tables that contain a column named BOOK_ID, you can use the following query:

SELECT table_name 
  FROM dba_tab_columns 
  WHERE column_name = 'BOOK_ID';

The output might be:

TABLE_NAME
-----------------
BOOKS
LIBRARY_BOOKS
ARCHIVED_BOOKS
TEMP_BOOKS
OLD_BOOKS

Looping through the tables

Now we want to loop through these tables in order to execute an SQL query for each of them. In Oracle we use a PL/SQL FOR loop to do this:

BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    -- do something for each record
  END LOOP;
END;
/

Dynamic SQL Construction

We can use the loop variable rec to dynamically create an SQL statement as a string, using the string concatenation operator ||, and assign it to a variable, in this case v_sql:

v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

The :val part is a placeholder that will become relevant later.

Of course, the variable needs to be declared for the PL/SQL code block first, so we add a DECLARE section:

DECLARE
  v_sql VARCHAR2(4000);
BEGIN
  -- ...
END;

How do we execute the SQL statement that we stored in the variable? By using the EXECUTE IMMEDIATE statement and two other variables; let’s call them v_result and v_value:

EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;

This executes the SQL in v_sql, binds the value in v_value to the :val placeholder, and stores the result in v_result. The latter captures the result of our dynamic query, which is the count of occurrences.

Of course, we have to declare these two variables as well. We’ll set v_value to the book ID we are looking for. The whole code so far is:

DECLARE
  v_sql VARCHAR2(4000);
  v_value NUMBER := 12345;  -- Value to search for
  v_result VARCHAR2(4000);
BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

    EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
  END LOOP;
END;
/

Printing the results

If we execute the code above, we might be a little disappointed because it is accepted and executed without any errors, but nothing is printed out. How can we see the results? For that, we need to include DBMS_OUTPUT.PUT_LINE calls:

DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);

But how do we handle the case where no record is found, or where the SQL query raises an error? We’ll wrap it in an EXCEPTION handling block:

BEGIN
  EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
  DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    NULL;  -- No matching rows found in this table
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('Error in ' || rec.table_name || ': ' || SQLERRM);
END;

There’s still one thing to do. We first have to enable server output to see any of the printed lines:

SET SERVEROUTPUT ON;

This command ensures that any output generated by DBMS_OUTPUT.PUT_LINE will be displayed in your SQL*Plus or SQL Developer session.

Let’s put it all together. Here’s the full code:

SET SERVEROUTPUT ON;

DECLARE
  v_sql VARCHAR2(4000);
  v_value NUMBER := 12345;  -- Value to search for
  v_result VARCHAR2(4000);
BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

    BEGIN
      EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
      DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        NULL; -- No rows found in this table
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Error in ' || rec.table_name || ': ' || SQLERRM);
    END;
  END LOOP;
END;
/

Here’s what the output might look like when the block is executed:

Found in BOOKS: 5
Found in LIBRARY_BOOKS: 2
Found in ARCHIVED_BOOKS: 0

Alternatively, if one of the tables throws an error (for instance, due to a permissions issue or if the table doesn’t exist in the current schema), you might see an output like this:

Found in BOOKS: 5
Error in TEMP_BOOKS: ORA-00942: table or view does not exist
Found in LIBRARY_BOOKS: 2

Conclusion

Dynamic SQL is particularly useful when the structure of your query is not known until runtime. In this case, since the table names come from a data dictionary view (dba_tab_columns), the query must be constructed dynamically.

Instead of writing a separate query for each table, the above code automatically finds and processes every table with a BOOK_ID column. It works on any table with the right column, making it useful for large databases.

Building and running SQL statements on the fly allows you to handle tasks that are not possible with static SQL alone.

Inline and Implicit Foreign Key Constraints in SQL

Foreign key constraints are a key part of database design: they create a relationship between two tables and ensure that this relationship remains consistent and reliable. For example, a column in an “Orders” table (like CustomerID) might reference a column in a “Customers” table. This guarantees that every order belongs to a valid customer.

In earlier versions of SQL systems, defining foreign key constraints often required separate ALTER TABLE statements after the table was created:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL,
  OrderDate  DATE
);

ALTER TABLE Orders
ADD CONSTRAINT FK_Customer FOREIGN KEY (CustomerID)
REFERENCES Customers(CustomerID);

This two-step process was prone to errors and required careful management to ensure all constraints were applied correctly.

Inline Foreign Key Constraints

Most of the popular SQL database systems – PostgreSQL, Oracle, SQL Server, and MySQL since version 9.0, released in July 2024 – now support inline foreign key constraints. This means you can define the relationship directly in the column definition, making table creation easier to read:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL REFERENCES Customers(CustomerID),
  OrderDate  DATE
);

Fortunately, this syntax is the same across these systems. However, MySQL 9 additionally supports implicit foreign key constraints:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL REFERENCES Customers,
  OrderDate  DATE
);

By leaving out (CustomerID) in the REFERENCES clause, the database assumes that you want to reference the primary key of the parent table. Whether and how this shorthand is supported varies between database systems and versions, so you should avoid it if you need to write SQL DDL statements that work across multiple database systems.

Dependent Subqueries and LATERAL in PostgreSQL

When working with databases, you often need to run a query that depends on results from another query. PostgreSQL offers two main ways to handle this: Dependent Subqueries and LATERAL joins.

A dependent subquery is like a mini-query inside another query. The inner query (subquery) depends on the outer query for its input. For every row in the outer query, the subquery runs separately.

Imagine you have two tables: customers, which holds information about customers (e.g., their id and name), and orders, which holds information about orders (e.g., customer_id, order_date, and amount). Now, you want to find the latest order date for each customer. You can use a dependent subquery like this:

SELECT id AS customer_id,
  name,
  (SELECT order_date
     FROM orders
    WHERE customer_id=customers.id
    ORDER BY order_date DESC
    LIMIT 1) AS latest_order_date
FROM customers;

For each customer in the customers table, the subquery looks for their orders in the orders table. The subquery sorts the orders by date (ORDER BY order_date DESC) and picks the most recent one (LIMIT 1).

This works, but it has a drawback: If you have many customers, this approach can be slow because the subquery runs once for every customer.

LATERAL join

A LATERAL join is a smarter way to solve the same problem. It allows you to write a subquery that depends on the outer query, but instead of repeating the subquery for every row, PostgreSQL handles it more efficiently.

Let’s solve the “latest order date” problem using LATERAL:

SELECT
  c.id AS customer_id,
  c.name,
  o.order_date AS latest_order_date
FROM customers c
LEFT JOIN LATERAL (
  SELECT order_date
   FROM orders
  WHERE orders.customer_id=c.id
  ORDER BY order_date DESC
  LIMIT 1
) o ON TRUE;

For each customer (c.id), the LATERAL subquery finds the latest order in the orders table. The LEFT JOIN ensures that customers with no orders are included in the results, with NULL for the latest_order_date.

It’s easier to read and often faster, especially for large datasets, because PostgreSQL optimizes it better.
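A nice side effect: unlike a scalar subquery in the SELECT list, the LATERAL subquery can return more than one column per customer. Here’s a variation of the query above (using the same assumed tables) that also returns the amount of the latest order:

SELECT
  c.id AS customer_id,
  c.name,
  o.order_date AS latest_order_date,
  o.amount     AS latest_order_amount
FROM customers c
LEFT JOIN LATERAL (
  SELECT order_date, amount
    FROM orders
   WHERE orders.customer_id = c.id
   ORDER BY order_date DESC
   LIMIT 1
) o ON TRUE;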

Both dependent subqueries and LATERAL joins allow you to handle scenarios where one query depends on the results of another. Dependent subqueries are straightforward and good enough for simple tasks with small datasets. However, you should consider using LATERAL for large datasets where performance matters.

Working with JSON-DOM mapping in EntityFramework and PostgreSQL

A while ago, one of my colleagues covered JSON usage in PostgreSQL on the database level in two interesting blog posts (“Working with JSON data in PostgreSQL” and “JSON as a table in PostgreSQL 17”).

Today, I want to show the usage of JSON in EntityFramework with PostgreSQL as the database. We have an event sourcing application similar to the one in my colleague’s first blog post, written in C#/ASP.NET Core using Entity Framework Core (EF Core). Fortunately, EF Core and the PostgreSQL database driver have relatively easy-to-use JSON support.

You have essentially three options when working with JSON data and EF Core:

  1. Simple string
  2. EF owned entities
  3. System.Text.Json DOM types

Our event sourcing use case requires query support on the JSON data and the data has no stable and fixed schema, so the first two options are not really appealing. For more information on them, see the npgsql documentation.

Let us have a deeper look at the third option which suits our event sourcing use-case best.

Setup

The setup is ultra-simple. Just declare the relevant properties in your entities as JsonDocument and make them disposable:

using System.Text.Json;

public class Event : IDisposable
{
    public long Id { get; set; }

    public DateTime Date { get; set; }
    public string Type { get; set; }
    public JsonDocument Data { get; set; }
    public string Username { get; set; }

    // JsonDocument holds pooled memory, so dispose it together with the entity
    public void Dispose() => Data?.Dispose();
}

Running dotnet ef migrations add EventJsonSupport should generate the corresponding changes for the database migrations and the database context. Now we are good to start querying and deserializing our JSON data.

Saving our events to the database does not require additional changes!

Writing queries using JSON properties

With this setup we can use JSON properties in our LINQ database queries like this:

var eventsForId = db.Events.Where(ev =>
  ev.Data.RootElement.GetProperty("payload").GetProperty("id").GetInt64() == id
)
.ToList();

Deserializing the JSON data

Now that our entities contain JsonDocument (or JsonElement) properties, we can of course use the System.Text.Json API to create our own domain objects from the JSON data as we need it:

// 'event' is a C# keyword, so the entity variable is escaped as @event here
var eventData = @event.Data.RootElement.GetProperty("payload");
return new HistoryEntry
{
    Timestamp = @event.Date,                                  // taken from the entity itself
    Action = new Action
    {
        Id = eventData.GetProperty("id").GetInt64(),          // read from the JSON payload
        Action = eventData.GetProperty("action").GetString(),
    },
    Username = @event.Username,
};

We could, for example, deserialize different domain objects depending on the event type, or deal with the evolution of our JSON data over time to accommodate new features or refactorings on the data side.

Conclusion

Working with JSON data inside a classical application using an ORM and a relational database has become surprisingly easy and efficient. The times of fragile full-text queries using LIKE or similar tricks to find your data are over!

JSON as a table in PostgreSQL 17

Some time ago, I described in a blog post how to work with JSON data in a PostgreSQL database. Last month (September 2024), PostgreSQL 17 was released, which offers another feature for working with JSON data: the JSON_TABLE() function. Such a function already existed in other database systems, such as MySQL and Oracle.

The core idea behind JSON_TABLE() is to temporarily transform JSON data into a relational format that can be queried using standard SQL commands. That way, developers can apply the full power of SQL, such as filtering, sorting, aggregating, and joining, to data that originally exists in JSON format: you can query JSON data stored in a column as though it were relational data, and you can join it with regular relational tables for a seamless mix of structured and semi-structured data.

Syntax

The syntax for JSON_TABLE looks a bit complex at first glance, but once you break it down, it becomes quite clear. Here’s the basic structure:

JSON_TABLE(
  json_doc, path
  COLUMNS (
    column_name type PATH 'path_to_value' [DEFAULT default_value ON ERROR],
    ...
  )
) AS alias_name

The json_doc is the JSON data you’re querying. It can be a JSON column or a JSON string. It is followed by a path expression, which describes the location in the JSON document that you want to process. Path expressions are specified as strings in single quotation marks and have a special syntax called JSONPath (loosely inspired by XPath for XML). For example, such an expression could look like this: '$.store.book[0].title'. The dollar sign represents the root of the JSON tree, the sub-properties are separated by dots, and array elements are referenced using square brackets.

This is followed by the COLUMNS keyword. It specifies the columns to be extracted from the JSON document. Each column is specified by a name of the column in the resulting table, a data type of the extracted value (e.g., VARCHAR, INT), followed by the PATH keyword and a JSON path expression that references a JSON value below the original path expression.

The DEFAULT keyword optionally provides a default value if the path does not exist or if there’s an error during extraction.

After the JSON_TABLE function call you can specify the alias name for the table with the AS keyword.

Example

It’s time for an example. Let’s assume we have a table named fruit_store with a column named json_details that holds some JSON data about fruits:

INSERT INTO fruit_store (id, json_details) VALUES (
    1,
    '{
        "category": "Citrus",
        "fruits": [
            {
                "name": "Orange",
                "color": "Orange",
                "taste": "Sweet",
                "price_per_kg": 3.5
            },
            {
                "name": "Lemon",
                "color": "Yellow",
                "taste": "Sour",
                "price_per_kg": 4.0
            }
        ]
    }'
);

Now we can use the JSON_TABLE function to extract the details of each fruit from the JSON document. The goal is to retrieve the fruit name, color, taste, and price for each fruit in the array. Here’s the query:

SELECT *
  FROM fruit_store,
  JSON_TABLE(
    json_details, '$.fruits[*]'
    COLUMNS (
      fruit_name   VARCHAR(50)  PATH '$.name',
      fruit_color  VARCHAR(50)  PATH '$.color',
      fruit_taste  VARCHAR(50)  PATH '$.taste',
      price_per_kg DECIMAL(5,2) PATH '$.price_per_kg'
    )
  ) AS fruit_table;

Running this query will give you the following result:

id | fruit_name | fruit_color | fruit_taste | price_per_kg
---+------------+-------------+-------------+-------------
 1 | Orange     | Orange      | Sweet       |         3.50
 1 | Lemon      | Yellow      | Sour        |         4.00
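Because the extracted columns behave like ordinary relational columns, you can filter, sort, and aggregate them with plain SQL. For example, to list only the fruits that cost less than 4.00 per kilogram (reusing the table and paths from above):

SELECT ft.fruit_name, ft.price_per_kg
  FROM fruit_store,
  JSON_TABLE(
    json_details, '$.fruits[*]'
    COLUMNS (
      fruit_name   VARCHAR(50)  PATH '$.name',
      price_per_kg DECIMAL(5,2) PATH '$.price_per_kg'
    )
  ) AS ft
 WHERE ft.price_per_kg < 4.0;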

Defeating TimescaleDB or shooting yourself in the foot

Today, almost everything periodically produces data: a car, cloud-connected smart-home solutions, your “smart” refrigerator, and you yourself, wearing a fitness tracker. This data has to be stored in a suitable way to be usable. We want fast access, beautiful charts, and different aggregations over time.

There are several options, both free and commercial, for storing such time-series data, like InfluxDB, Prometheus, or TimescaleDB. They are optimised for this kind of data, taking advantage of several time-series properties:

  • Storing is (mostly) append-only
  • Data points have a (strict) ordering in time
  • Data points often have fixed and variable meta data in addition to the data value itself

In one of our projects we have to store large quantities of data every 200ms. Knowing PostgreSQL as a powerful relational database management system, we opted for TimescaleDB, which promises time-series features on top of an SQL database.

Ingestion of the data, probably the biggest weakness of traditional (SQL) databases for this kind of workload, worked as expected from the beginning. We were able to insert new data at a constant rate without degrading performance.

The problem

However, after some time we ran into severe performance problems when querying the data…

So, strangely, one of the key selling points of a time-series database did not work for us… Fortunately, since Timescale is only an extension to PostgreSQL, we could just use EXPLAIN / EXPLAIN ANALYZE on our slow queries to find out what was going on.

It directly showed us that *all* chunks of our hypertable were queried instead of only the ones containing the data we were looking for. Checking our setup and hypertable definition showed that Timescale itself was working as expected and chunking the data correctly on the time axis.

After a bit more analysis and thought it was clear: The problem was our query! We used something like start_time <= to AND from <= end_time in our WHERE clause to denote the time interval containing the requested data. Here start_time is the partition column of our hypertable.

This way we were forcing the database to look at all chunks from long ago until our end-time timestamp.
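In SQL terms, the query looked roughly like this (the table and bind variable names are just illustrative):

SELECT *
  FROM measurements                    -- hypertable partitioned on start_time
 WHERE start_time <= :interval_end
   AND :interval_start <= end_time;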

The solution

Ok, so we have to reformulate our WHERE clause so that only the relevant chunks are queried. Timescale can easily do this when we write something like start_time >= from AND start_time < to, where from and to are the start and end of our desired interval. That way, usually only one or a few chunks of the hypertable are searched, and everything is lightning-fast even with billions of rows in the hypertable.
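Again as an illustrative sketch with the same made-up names, the reformulated condition looks like this:

SELECT *
  FROM measurements
 WHERE start_time >= :interval_start
   AND start_time <  :interval_end;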

Of course, the results of our two queries are not 100% semantically the same. But you can easily achieve the desired results by correctly calculating the start and end of the time interval to fetch.

Conclusion

Time-series databases are powerful tools for today’s data requirements. As with other tools, you need some understanding of the underlying concepts, and sometimes even of the implementation, so that you don’t accidentally defeat the purpose and features of the tool.

Used in the correct way, they enable you to work with large amounts of data with the performance today’s users expect. We have had similar experiences with full-text search engines like Solr and Elasticsearch, too.

Time Intervals in SQL

SQL not only supports date and timestamp data types but also time intervals.

An interval in SQL represents a period of time, such as days, hours, minutes, or seconds. SQL supports intervals in two main types: INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND.

The two types

YEAR TO MONTH is used to represent a span of years and months. Examples include calculating ages, durations of contracts, or project timelines.

DAY TO SECOND is used to represent a span of days, hours, minutes, and seconds. Examples include scheduling events, logging activities, and tracking precise durations.

-- Represents 1 year and 6 months
INTERVAL '1-6' YEAR TO MONTH

-- Represents 5 days, 12 hours, 30 minutes, and 45 seconds
INTERVAL '5 12:30:45' DAY TO SECOND

This distinction may seem arbitrary at first, but it aligns with real-world use cases. An interval of “2 months” is clearly different from “60 days,” even though they might seem similar. The former is handled as a “YEAR TO MONTH” interval, and the latter as a “DAY TO SECOND” interval, making it clear how operations should be performed. Storing only the relevant units also reduces the amount of space required. For instance, storing a “YEAR TO MONTH” interval doesn’t need to allocate space for hours or seconds, thus saving storage.

Despite its name a “DAY TO SECOND” interval can also include fractional seconds:

INTERVAL '1 12:30:45.123' DAY TO SECOND

There are also shorter forms of interval literals if you only want to use one or several units of the interval:

For “YEAR TO MONTH”:

INTERVAL '3' YEAR   -- 3 years
INTERVAL '5' MONTH  -- 5 months

For “DAY TO SECOND”:

INTERVAL '2' DAY                -- 2 days
INTERVAL '5:30' HOUR TO MINUTE  -- 5 hours and 30 minutes
INTERVAL '15' MINUTE            -- 15 minutes
INTERVAL '30' SECOND            -- 30 seconds
INTERVAL '45.678' SECOND        -- 45.678 seconds

Note that the syntax always uses the singular forms of the unit words.

The two types can be used as data types for table columns:

CREATE TABLE interval_examples (
  id NUMBER,
  a INTERVAL YEAR TO MONTH,
  b INTERVAL DAY TO SECOND
);

Calculations with intervals

Intervals can be added to or subtracted from dates to calculate future or past dates.

-- Adding 10 days to the current date
CURRENT_DATE + INTERVAL '10' DAY

-- Subtracting 3 months from a specific date
DATE '2024-06-14' - INTERVAL '3' MONTH

It’s important to understand that intervals of type YEAR TO MONTH do not have a fixed duration. The number of days added to CURRENT_TIMESTAMP when you add an INTERVAL ‘1’ MONTH depends on the current month and year, as different months have different numbers of days.
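For example, in PostgreSQL, adding one month to the end of January lands on the end of February (other systems handle the month-end case differently):

-- Result: 2024-02-29 00:00:00 (2024 is a leap year)
SELECT DATE '2024-01-31' + INTERVAL '1' MONTH;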

Intervals can also be compared to each other with the usual operators such as =, <, <=, >, >=, <>.

Combining both types

Suppose you need to add an interval of 1 year, 2 months, 3 days, 4 hours, 5 minutes, and 6 seconds to the current timestamp. In this case you need to combine intervals of the two types:

SELECT (CURRENT_TIMESTAMP + INTERVAL '1-2' YEAR TO MONTH) + INTERVAL '3 04:05:06' DAY TO SECOND AS result;

Notes on PostgreSQL

PostgreSQL supports the standard interval syntax shown above, but it also offers a unified syntax for intervals:

INTERVAL '1 year 2 months'
INTERVAL '10 days 12 hours 30 minutes 45.123 seconds'
INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds'

In fact, in PostgreSQL, YEAR TO MONTH and DAY TO SECOND are not distinct data types of their own; they are specializations of the broader INTERVAL data type.

It also supports multiplication and division of intervals:

-- Results in an interval of 6 days
SELECT INTERVAL '2 days' * 3 AS multiplied_interval;

-- Results in an interval of 2 hours
SELECT INTERVAL '3 hours' / 1.5 AS divided_interval;

Unveiling the secrets of invisible database columns

After my last blog post, where I wrote about Generated and Virtual Columns, I would like to dedicate this post to another type of database column: Invisible Columns. This feature exists in MySQL since version 8.0 and in Oracle Database since version 12c. PostgreSQL and MS SQL Server do not support this feature.

Invisible columns, as the name suggests, are columns within a table that are hidden from standard query results by default. Unlike traditional columns that are visible and accessible in query results, invisible columns are not included unless explicitly specified in the query.

This feature provides a level of control over data visibility, allowing developers to hide certain columns from applications or other database users while still retaining their functionality within the database.

Defining invisible columns

When creating a table in MySQL or Oracle, you can designate certain columns as invisible by using the INVISIBLE keyword in the column definition. For example:

CREATE TABLE your_table (
  visible_column   INT,
  invisible_column INT INVISIBLE
);

In this example, the invisible_column is marked as invisible, while the visible_column remains visible by default. To alter an existing table and make a column invisible, in Oracle you can write:

ALTER TABLE your_table
  MODIFY (existing_column_name INVISIBLE);

In MySQL, the equivalent MODIFY COLUMN statement requires repeating the full column definition, for example MODIFY COLUMN existing_column_name INT INVISIBLE.

Replace your_table with the name of your table and existing_column_name with the name of the column you want to make invisible.

When querying your_table, the invisible column will not be included in the result set unless explicitly specified:

SELECT * FROM your_table;

visible_column
--------------
   4
   8
  15

By default, invisible columns are hidden from query results, providing a cleaner and more concise view of the data. However, developers can still access invisible columns when needed by explicitly including them in the query:

SELECT visible_column, invisible_column FROM your_table;

visible_column | invisible_column
---------------------------------
   4           |   16
   8           |   23
  15           |   42

Unveiling invisible columns

To list the invisible columns of a table in MySQL, you can query the information_schema.columns system view and filter the results based on the EXTRA column, which contains the value INVISIBLE for invisible columns. Here’s a simple SQL query to accomplish this:

SELECT COLUMN_NAME
  FROM information_schema.columns
  WHERE TABLE_SCHEMA = 'your_database'
    AND TABLE_NAME = 'your_table'
    AND EXTRA LIKE '%INVISIBLE%';

In Oracle, invisible columns are reported as hidden columns in the USER_TAB_COLS (or ALL_TAB_COLS) data dictionary views. To list the invisible columns of a table, you can do it like this:

SELECT COLUMN_NAME
  FROM USER_TAB_COLS
  WHERE TABLE_NAME = 'your_table'
    AND HIDDEN_COLUMN = 'YES';

If you want to list invisible columns from all tables you can access, you can use the ALL_TAB_COLS view instead:

SELECT TABLE_NAME, COLUMN_NAME
  FROM ALL_TAB_COLS
  WHERE HIDDEN_COLUMN = 'YES';

Are invisible columns actually useful?

Invisible columns can make schema evolution easier by providing a flexible mechanism for evolving database schemas over time without disrupting existing applications or queries. You can test new features or data structures without committing to them fully. Invisible columns provide a way to add experimental columns to your tables without exposing them to production environments until they are fully tested and ready for use.

They can create cleaner and more concise views of your data by hiding less relevant columns. This can make it easier for developers, analysts, and users to work with the data without unnecessary clutter. However, I would argue that this is also achievable with normal database views.

The downside of introducing invisible columns is that they add complexity to the database schema, which can make it harder to understand and maintain, especially for developers who are not familiar with the invisible columns feature. They also add potential for confusion: Developers may forget about the presence of invisible columns, leading to unexpected behavior in queries or applications.

You probably shouldn’t use them to hide sensitive data, since invisible columns don’t have any additional access control, and security through obscurity is not a good idea. If you grant SELECT permission on the table to a user, they will be able to query visible and invisible columns alike.

Now that you know about them, you can make your own choice.

Half table, half view: Generated Columns

Anyone familiar with SQL database basics knows the two fundamental structures of relational databases: tables and views. Data is stored in tables, while views are virtual tables calculated on-the-fly from a SQL query. Additionally, relational database management systems often support materialized views, which, like views, are based on a query from other tables, but their results are actually persisted and only recalculated as needed.

What many don’t know is that the most common SQL databases (PostgreSQL, MySQL, Oracle) nowadays also support something in between: we’re talking about Generated Columns, which will be introduced in this blog post.

So, what are Generated Columns? Generated Columns are columns within a normal database table. But unlike regular columns, their values are not stored as independent individual values; rather, they are computed from other values in the table.

Below is an example of how to define a Generated Column. The example is for PostgreSQL, but the syntax is similar in other popular relational database systems that support this feature.

CREATE TABLE products (
  id          SERIAL PRIMARY KEY,
  name        VARCHAR(100),
  quantity    INTEGER,
  unit_price  DECIMAL(10, 2),
  total_price DECIMAL(10, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED
);

As seen above, a Generated Column is defined with the keywords GENERATED ALWAYS AS. In MySQL and Oracle, the GENERATED ALWAYS is even optional (you could just write AS), but it clarifies what it’s about; PostgreSQL requires the full GENERATED ALWAYS AS. Following AS is the expression that computes the value.

At the end of the column definition, either the keyword STORED or VIRTUAL can be used. In the example above, it says STORED, which means the computed value is physically stored on the disk. The value is recalculated and stored only after an INSERT or UPDATE. In contrast, with VIRTUAL, the value is not stored but always computed on-the-fly. Thus, virtual Generated Columns behave similarly to a view, while STORED is more comparable to a materialized view. (Note that PostgreSQL currently supports only STORED generated columns, while MySQL supports both variants.)

The choice between the two options depends on the specific requirements. Stored Generated Columns consume more disk space, while virtual Generated Columns save space at the expense of performance.

In the expression following AS, other columns of the table can be referenced. Even other Generated Columns can be referenced, as long as they are specified in the table definition before the current column. However, SQL subqueries cannot be used in the expression.
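Here is a quick usage sketch with the products table from above. The generated column is filled in automatically; you don’t (and must not) assign a value to it in the INSERT:

INSERT INTO products (name, quantity, unit_price)
  VALUES ('Keyboard', 3, 29.99);

SELECT name, quantity, unit_price, total_price FROM products;

-- name     | quantity | unit_price | total_price
-- Keyboard |        3 |      29.99 |       89.97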

In conclusion, Generated Columns are a useful feature that combines parts of a table with the benefits of a view.