Scheduling Jobs in Oracle Database for Reliable Background Tasks

As a software developer, you often write code that must run reliably in the background: data cleanup, periodic recalculations, imports, exports, and batch processing. Pushing this responsibility to external scripts or application-level schedulers is one way to do it, but it can add complexity and increase failure points.

Oracle’s job scheduling feature lets you move this logic into the database, close to the data it operates on, using plain SQL and PL/SQL. The result is simpler code, fewer moving parts, and background tasks that run predictably without constant supervision.

The Scheduler

This feature is provided by the DBMS_SCHEDULER package.

A basic example is a daily cleanup task. Suppose you have a table called ORDERS and you want to remove records older than five years every night. First, you create a stored procedure that performs the cleanup.

CREATE OR REPLACE PROCEDURE cleanup_old_orders IS
BEGIN
  DELETE FROM orders
  WHERE order_date < ADD_MONTHS(SYSDATE, -60);
  COMMIT;
END;
/

Next, you create a scheduler job that runs this procedure every day at 2 a.m.:

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'cleanup_old_orders_job',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'CLEANUP_OLD_ORDERS',
    start_date      => TIMESTAMP '2026-01-01 02:00:00',
    repeat_interval => 'FREQ=DAILY;BYHOUR=2;BYMINUTE=0;BYSECOND=0',
    enabled         => TRUE
  );
END;
/

Once enabled, Oracle runs this job automatically every night.
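
For testing, you don’t have to wait until 2 a.m.: a job can be started manually at any time with DBMS_SCHEDULER.RUN_JOB:

BEGIN
  DBMS_SCHEDULER.run_job('cleanup_old_orders_job');
END;
/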

Another common example is running a job at short, repeating intervals. For instance, you may want to refresh a summary table every 10 minutes:

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name        => 'refresh_sales_summary_job',
    job_type        => 'PLSQL_BLOCK',
    job_action      => '
    BEGIN
      refresh_sales_summary;
    END;',
    repeat_interval => 'FREQ=MINUTELY;INTERVAL=10',
    enabled         => TRUE
  );
END;
/

The scheduler can also be used to run jobs only once. For example, you might need to perform a one-time data fix during a maintenance window:

BEGIN
  DBMS_SCHEDULER.create_job (
    job_name   => 'one_time_data_fix_job',
    job_type   => 'PLSQL_BLOCK',
    job_action => '
     BEGIN
       update_customer_status;
     END;',
    start_date => TIMESTAMP '2026-03-15 23:00:00',
    enabled    => TRUE
  );
END;
/

Required privileges

In addition to defining jobs, it is important to understand the required privileges. To create and manage scheduler jobs, a user needs the CREATE JOB privilege. To create jobs in another schema, CREATE ANY JOB is required. Running jobs that use DBMS_SCHEDULER also requires EXECUTE privilege on the DBMS_SCHEDULER package.

If a job runs a stored procedure, the job owner must have direct privileges on the objects used by that procedure, not privileges granted through roles. For example, if a job deletes rows from the ORDERS table, the job owner must have a direct DELETE grant on that table. For external jobs, additional privileges such as CREATE EXTERNAL JOB and specific operating system credentials are required.

These rules ensure that jobs run securely and only perform actions explicitly allowed by the database administrator.
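
As a sketch, assuming a hypothetical user APP_USER that should own and run the cleanup job from above, the grants could look like this:

GRANT CREATE JOB TO app_user;
GRANT EXECUTE ON dbms_scheduler TO app_user;
GRANT DELETE ON orders TO app_user;  -- direct grant, not via a role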

Monitoring

For monitoring, Oracle DB stores job execution details in system views. You can check whether a job is enabled and when it last ran with a simple query:

SELECT job_name, enabled, last_start_date, last_run_duration
  FROM  dba_scheduler_jobs
  WHERE job_name = 'CLEANUP_OLD_ORDERS_JOB';

To see errors and execution history, you can query the job run details:

SELECT job_name, status, actual_start_date, run_duration, error#
  FROM  dba_scheduler_job_run_details
  WHERE job_name = 'CLEANUP_OLD_ORDERS_JOB'
  ORDER BY actual_start_date DESC;

These examples show how Oracle job scheduling works in practice. You define the SQL or PL/SQL logic, attach it to a schedule, and let the database handle execution and logging. This approach keeps automation close to the data, reduces manual intervention, and makes recurring tasks easier to manage and troubleshoot.

Common SQL Performance Gotchas in Application Development

When building apps that use a SQL database, it’s easy to run into performance problems without noticing. Many of these issues come from the way queries are written and used in the code. Below are seven common SQL mistakes developers make, why they happen, and how you can avoid them.

Not Using Prepared Statements

One of the most common mistakes is building SQL queries by concatenating strings. This approach not only introduces the risk of SQL injection but also prevents the database from reusing execution plans. Prepared statements or parameterized queries let the database understand the structure of the query ahead of time, which improves performance and security. They also help avoid subtle bugs caused by incorrect string formatting or escaping.

// Vulnerable and inefficient
String userId = "42";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM users WHERE id = " + userId);

// Safe and performant
String sql = "SELECT * FROM users WHERE id = ?";
PreparedStatement ps = connection.prepareStatement(sql);
ps.setInt(1, 42);
ResultSet rs = ps.executeQuery();

The N+1 Query Problem

The N+1 problem happens when an application fetches a list of items and then runs a separate query for each item to retrieve related data. For example, fetching a list of users and then querying each user’s posts in a loop. This results in one query to fetch the users and N additional queries for their posts. The fix is to restructure the query using joins or batch-fetching strategies, so all the data can be retrieved in fewer queries.

We have written about it on our blog before: Understanding, identifying and fixing the N+1 query problem
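
As a sketch, assuming hypothetical users and posts tables where posts.user_id references users.id, a single join fetches everything in one round-trip instead of 1+N queries:

-- One query instead of one per user
SELECT u.id, u.name, p.id AS post_id, p.title
  FROM users u
  LEFT JOIN posts p ON p.user_id = u.id;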

Missing Indexes

When queries filter or join on columns that do not have indexes, the database may need to scan entire tables to find matching rows. This can be very slow, especially as data grows. Adding the right indexes can drastically improve performance. It’s important to monitor slow queries and check whether indexes exist on the columns used in WHERE clauses, JOINs, and ORDER BY clauses.

Here’s how to create an index on an “orders” table for its “customer_id” column:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Once the index is added, the query can efficiently find matching rows without scanning the full table.

Retrieving Too Much Data

Using SELECT * to fetch all columns from a table is a common habit, but it often retrieves more data than the application needs. This can increase network load and memory usage. Similarly, not using pagination when retrieving large result sets can lead to long query times and a poor user experience. Always select only the necessary columns and use LIMIT or OFFSET clauses to manage result size.

For example:

String sql = "SELECT id, name, price FROM products LIMIT ? OFFSET ?";
PreparedStatement ps = connection.prepareStatement(sql);
ps.setInt(1, 50);
ps.setInt(2, 0);
ResultSet rs = ps.executeQuery();

Chatty Database Interactions

Some applications make many small queries in a single request cycle, creating high overhead from repeated database access. Each round-trip to the database introduces latency. Here’s an inefficient example:

for (int id : productIds) {
    PreparedStatement ps = connection.prepareStatement(
        "UPDATE products SET price = price * 1.1 WHERE id = ?"
    );
    ps.setInt(1, id);
    ps.executeUpdate();
}

Instead of issuing separate queries, it’s often better to combine them or use batch operations where possible. This reduces the number of database interactions and improves overall throughput:

PreparedStatement ps = connection.prepareStatement(
    "UPDATE products SET price = price * 1.1 WHERE id = ?"
);

for (int id : productIds) {
    ps.setInt(1, id);
    ps.addBatch();
}
ps.executeBatch();

Improper Connection Pooling

Establishing a new connection to the database for every query or request is slow and resource-intensive. Connection pooling allows applications to reuse database connections, avoiding the cost of repeatedly opening and closing them. Applications that do not use pooling efficiently may suffer from connection exhaustion or high latency under load. To avoid this, use a connection pool and configure it with appropriate limits for the workload.
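
A minimal sketch using HikariCP, one popular JDBC connection pool (the JDBC URL, credentials, and pool size are placeholders to adapt to your environment):

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/appdb");
config.setUsername("app");
config.setPassword("secret");
config.setMaximumPoolSize(10);  // cap the pool to fit the workload
HikariDataSource dataSource = new HikariDataSource(config);

// close() returns the connection to the pool instead of tearing it down
try (Connection connection = dataSource.getConnection()) {
    // run queries as usual
}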

Unbounded Wildcard Searches

Using wildcard searches with patterns like '%term%' in a WHERE clause causes the database to scan the entire table, because indexes cannot be used effectively. These searches are expensive and scale poorly. To handle partial matches more efficiently, consider using full-text search features provided by the database, which are designed for fast text searching. Here’s an example in PostgreSQL:

SELECT * FROM articles
WHERE to_tsvector('english', title) @@ to_tsquery('database');

One of our previous blog posts dives deeper into this topic: Full-text Search with PostgreSQL.

By being mindful of these common pitfalls, you can write SQL that scales well and performs reliably under load. Good database performance isn’t just about writing correct queries – it’s about writing efficient ones.

Have you faced any of these problems before? Every project is different, and we all learn a lot from the challenges we run into. Feel free to share your experiences or tips in the comments. Your story could help someone else improve their app’s performance too.

Oracle and the materialized view update

Materialized views are powerful. They give us precomputed, queryable snapshots of expensive joins and aggregations. But the moment you start layering other views on top of them, you enter tricky territory.

The Scenario

You define a materialized view to speed up a reporting query. Soon after, others discover it and start building new views on top of it. The structure spreads.

Now imagine: you need to extend the base materialized view. Maybe add a column, or adjust its definition. That’s when the trouble starts.

The Problem

Unlike regular views, materialized views don’t offer a convenient CREATE OR REPLACE. You can’t just adjust the definition in place. Oracle also doesn’t allow a simple ALTER to add a column or tweak the structure—recreating the materialized views is often the only option.

Things get even more complicated when other views depend on your materialized view. In that case, Oracle won’t even let you drop it. Instead, you’re greeted with an error about dependent objects, leaving you stuck in a dependency lock-in.

The more dependencies there are, the more brittle the setup becomes. What started as a performance optimization can lock you into a rigid structure that resists change.

As a short comparison, let’s look at how another database handles this scenario. Postgres also refuses a plain DROP MATERIALIZED VIEW when other views depend on it, but it gives you more room to maneuver: DROP MATERIALIZED VIEW ... CASCADE removes the dependent views in the same statement, and because DDL in Postgres is transactional, you can drop and recreate the materialized view together with its dependent views in a single transaction, so queries never see the broken intermediate state.

What to Do?

That is the hard question. Sometimes you can hide the materialized view behind a stable regular view. Or you take the SQL of all dependent views, drop them, change the materialized view, and then recreate all dependent views – a process that can be a huge pain.
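
As a building block for such rebuild scripts – a sketch with hypothetical object names – Oracle’s data dictionary can list the dependent views and hand you their DDL before you drop anything:

-- Which views sit on top of the materialized view?
SELECT name, type
  FROM user_dependencies
  WHERE referenced_name = 'MY_MVIEW'
    AND type = 'VIEW';

-- Extract the DDL of a dependent view so it can be recreated later
SELECT DBMS_METADATA.GET_DDL('VIEW', 'MY_DEPENDENT_VIEW') FROM dual;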

How do you manage changes to materialized views that already have dependent views stacked on top? Do you design around it, fight with rebuild scripts every time, or have another solution?

Tuning Without Dropping: Oracle’s Invisible Indexes

When tuning database performance, removing unused indexes can help reduce write overhead and improve efficiency. But dropping an index outright is risky, especially in production systems, because it’s hard to know for sure whether it’s still needed. Oracle Database offers a practical solution: invisible indexes. This feature allows you to hide an index from the optimizer without deleting it, giving you a safe way to test and fine-tune your indexing strategy.

An invisible index behaves like a regular index in most respects. It is maintained during inserts, updates, and deletes. It consumes storage and has a presence in the data dictionary. However, it is ignored by the optimizer when generating execution plans for SQL statements unless explicitly instructed to consider it.

Creating an Invisible Index

To define an index as invisible at creation time, the INVISIBLE keyword is used:

CREATE INDEX idx_salary ON employees(salary) INVISIBLE;

In this example, the index on the salary column will be maintained, but Oracle’s optimizer will not consider it when evaluating execution plans.

Making an Existing Index Invisible

You can also alter an existing index to become invisible:

ALTER INDEX idx_salary INVISIBLE;

To revert the change and make the index visible again:

ALTER INDEX idx_salary VISIBLE;

This change is instantaneous and does not require the index to be rebuilt. It is also fully reversible.

Verifying Index Visibility

To check the visibility status of an index, query the DBA_INDEXES or USER_INDEXES data dictionary view:

SELECT index_name, visibility
  FROM user_indexes
  WHERE table_name = 'EMPLOYEES';

This will show whether each index is VISIBLE or INVISIBLE.

Forcing the Optimizer to Use Invisible Indexes

By default, the optimizer does not use invisible indexes. However, you can enable their use in a specific session by setting the following parameter:

ALTER SESSION SET OPTIMIZER_USE_INVISIBLE_INDEXES = TRUE;

With this setting in place, the optimizer will consider invisible indexes as if they were visible. This is particularly useful when testing query performance with and without a specific index.

Alternatively, you can use a SQL hint to explicitly direct the optimizer to use a specific index, even if it is invisible:

SELECT /*+ INDEX(employees idx_salary) */ *
  FROM employees
  WHERE salary > 100000;

This gives fine-grained control over execution plans without changing global settings or making the index permanently visible.

Use Cases for Invisible Indexes

Invisible indexes are helpful in scenarios where performance needs to be tested under different indexing strategies. For example, if you suspect that an index is unused or causing performance issues, you can make it invisible and observe how queries behave without it. This avoids the risk of dropping an index that might still be needed.

Invisible indexes also provide a safe way to prepare for index removal in production systems. If no queries rely on the index while it is invisible, it is likely safe to drop it later.
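
To gather that evidence, Oracle can also track whether an index is used at all. A sketch (the USER_OBJECT_USAGE view is available from Oracle 12c onward; older versions expose V$OBJECT_USAGE):

ALTER INDEX idx_salary MONITORING USAGE;

-- ... let the typical workload run for a while, then check:
SELECT index_name, used, monitoring
  FROM user_object_usage
  WHERE index_name = 'IDX_SALARY';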

One caveat: invisibility does not disable index maintenance – an invisible index is still updated on every insert, update, and delete. If the goal is to avoid index maintenance during a bulk data load, marking the index UNUSABLE and rebuilding it afterwards is the appropriate tool; making an index invisible only hides it from the optimizer.

Deferred Constraints in Oracle DB

Foreign key constraints are like rules in your Oracle database that make sure data is linked properly between tables. For example, you can’t add an order for a customer who doesn’t exist – that’s the kind of thing a foreign key will stop. They help enforce data integrity by ensuring that relationships between tables remain consistent. But hidden in the toolbox of Oracle Database is a lesser-known trick: deferred foreign key constraints.

What Are Deferred Constraints?

By default, when you insert or update data that violates a foreign key constraint, Oracle will throw an error immediately. That’s immediate constraint checking.

But with deferred constraints, Oracle lets you temporarily violate a constraint during a transaction – as long as the constraint is satisfied by the time the transaction is committed.

Here’s how you make a foreign key deferrable:

ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id)
  REFERENCES customers(customer_id)
  DEFERRABLE INITIALLY DEFERRED;

That last part – DEFERRABLE INITIALLY DEFERRED – is the secret sauce. Now, the constraint check for fk_orders_customer is deferred until the COMMIT.
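
A related variant: with DEFERRABLE INITIALLY IMMEDIATE, the constraint is still checked statement by statement by default, but a transaction can switch it to deferred checking on demand:

SET CONSTRAINT fk_orders_customer DEFERRED;

-- or switch all deferrable constraints of the transaction at once:
SET CONSTRAINTS ALL DEFERRED;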

Use Cases

Let’s look at a few situations where this is really helpful.

One use case is circular references between tables. Say you have two tables: one for employees, one for departments. Each employee belongs to a department. But each department also has a manager – who is an employee. You end up in a “chicken and egg” situation. Which do you insert first? With deferred constraints, it doesn’t matter – you can insert them in any order, and Oracle will only check everything after you’re done.
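
A sketch of that situation, assuming hypothetical DEPARTMENTS and EMPLOYEES tables whose mutual foreign keys are both declared DEFERRABLE INITIALLY DEFERRED:

INSERT INTO departments (department_id, name, manager_id)
  VALUES (10, 'Engineering', 501);  -- employee 501 does not exist yet

INSERT INTO employees (employee_id, name, department_id)
  VALUES (501, 'Alice', 10);        -- now the reference is satisfied

COMMIT;  -- both foreign keys are checked only here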

Another use case is the bulk import of data. If you’re importing a bunch of data (like copying from another system), it can be really hard to insert things in the perfect order to keep all the foreign key rules happy. Deferred constraints let you just insert everything, then validate it all at the end with one COMMIT.

Deferred constraints also help when dealing with temporary incomplete data: Let’s say your application creates a draft invoice before all the customer info is ready. Normally, this would break a foreign key rule. But if the constraint is deferred, Oracle gives you time to finish adding all the pieces before checking.

Caution

Using deferred constraints recklessly can lead to runtime surprises. Imagine writing a huge batch job that appears to work fine… until it crashes at COMMIT with a constraint violation error – rolling back the entire transaction. So only defer constraints when you really need to.

One last tip

If you want to check if a constraint is deferrable in your database you can use the following SQL query:

SELECT constraint_name, deferrable, deferred
  FROM user_constraints
 WHERE table_name='ORDERS';

Exploring Dynamic SQL in Oracle: Counting Specific Values Across Multiple Tables

Imagine you have a large database where multiple tables contain a column named BOOK_ID. Perhaps you’re tasked with finding how many times a particular book (say, with an ID of 12345) appears across all these tables. How would you approach this?

In this post, I’ll explain a small piece of Oracle PL/SQL code that uses dynamic SQL to search for a specific value in any table that has a column with a specific name.

Finding the relevant tables

First, we need to determine which tables in the schema have a column with the desired name. To do this, we must look at a metadata table or view that contains information about all the tables in the schema and their columns. Most database systems offer such metadata tables, although their names and their structures vary greatly between different systems. In Oracle, the metadata table relevant to our task is called DBA_TAB_COLUMNS. Therefore, to find the names of all tables that contain a column BOOK_ID, you can use the following query:

SELECT table_name 
  FROM dba_tab_columns 
  WHERE column_name = 'BOOK_ID';

The output might be:

TABLE_NAME
-----------------
BOOKS
LIBRARY_BOOKS
ARCHIVED_BOOKS
TEMP_BOOKS
OLD_BOOKS

Looping through the tables

Now we want to loop through these tables in order to execute an SQL query for each of them. In Oracle we use a PL/SQL FOR loop to do this:

BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    -- do something for each record
  END LOOP;
END;
/

Dynamic SQL Construction

We can use the loop variable rec to dynamically create an SQL statement as a string by using the string concatenation operator || and assign it to a variable, in this case v_sql:

v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

The :val part is a placeholder that will become relevant later.

Of course, the variable needs to be declared for the PL/SQL code block first, so we add a DECLARE section:

DECLARE
  v_sql VARCHAR2(4000);
BEGIN
  -- ...
END;

How do we execute the SQL statement that we stored in the variable? By using the EXECUTE IMMEDIATE statement and two other variables; let’s call them v_result and v_value:

EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;

This will execute the SQL in v_sql, bind the value in v_value to the :val placeholder, and store the result in v_result. The latter will capture the result of our dynamic query, which is the count of occurrences.

Of course, we have to declare these two variables as well. We’ll set v_value to the book ID we are looking for. The whole code so far is:

DECLARE
  v_sql VARCHAR2(4000);
  v_value NUMBER := 12345;  -- Value to search for
  v_result VARCHAR2(4000);
BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

    EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
  END LOOP;
END;
/

Printing the results

If we execute the code above, we might be a little disappointed because it is accepted and executed without any errors, but nothing is printed out. How can we see the results? For that, we need to include DBMS_OUTPUT.PUT_LINE calls:

DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);

But how do we handle the case where something goes wrong, for example an error in the SQL query? We’ll wrap it in an EXCEPTION handling block. Note that NO_DATA_FOUND cannot actually occur here, because a COUNT query always returns exactly one row – a count of 0 when there are no matches – but the handler becomes relevant as soon as the query is adapted to select something other than a count:

BEGIN
  EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
  DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    NULL;  -- No matching rows found in this table
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('Error in ' || rec.table_name || ': ' || SQLERRM);
END;

There’s still one thing to do. We first have to enable server output to see any of the printed lines:

SET SERVEROUTPUT ON;

This command ensures that any output generated by DBMS_OUTPUT.PUT_LINE will be displayed in your SQL*Plus or SQL Developer session.

Let’s put it all together. Here’s the full code:

SET SERVEROUTPUT ON;

DECLARE
  v_sql VARCHAR2(4000);
  v_value NUMBER := 12345;  -- Value to search for
  v_result VARCHAR2(4000);
BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

    BEGIN
      EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
      DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        NULL; -- No rows found in this table
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Error in ' || rec.table_name || ': ' || SQLERRM);
    END;
  END LOOP;
END;
/

Here’s what the output might look like when the block is executed:

Found in BOOKS: 5
Found in LIBRARY_BOOKS: 2
Found in ARCHIVED_BOOKS: 0

Alternatively, if one of the tables throws an error (for instance, due to a permissions issue or if the table doesn’t exist in the current schema), you might see an output like this:

Found in BOOKS: 5
Error in TEMP_BOOKS: ORA-00942: table or view does not exist
Found in LIBRARY_BOOKS: 2

Conclusion

Dynamic SQL is particularly useful when the structure of your query is not known until runtime. In this case, since the table names come from a data dictionary view (dba_tab_columns), the query must be constructed dynamically.

Instead of writing a separate query for each table, the above code automatically finds and processes every table with a BOOK_ID column. It works on any table with the right column, making it useful for large databases.

Building and running SQL statements on the fly allows you to handle tasks that are not possible with static SQL alone.

Inline and Implicit Foreign Key Constraints in SQL

Foreign key constraints are a key part of database design, ensuring that relationships between tables are consistent and reliable. They create a relationship between two tables, ensuring that data matches across them. For example, a column in an “Orders” table (like CustomerID) might reference a column in a “Customers” table. This guarantees that every order belongs to a valid customer.

In earlier versions of SQL systems, defining foreign key constraints often required separate ALTER TABLE statements after the table was created:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL,
  OrderDate  DATE
);

ALTER TABLE Orders
ADD CONSTRAINT FK_Customer FOREIGN KEY (CustomerID)
REFERENCES Customers(CustomerID);

This two-step process was prone to errors and required careful management to ensure all constraints were applied correctly.

Inline Foreign Key Constraints

Most of the popular SQL database systems – PostgreSQL, Oracle, SQL Server, and MySQL since version 9.0, released in July 2024 – now support inline foreign key constraints. This means you can define the relationship directly in the column definition, making table creation easier to read:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL REFERENCES Customers(CustomerID),
  OrderDate  DATE
);

Fortunately, this syntax is the same across these systems. However, MySQL 9 additionally supports implicit foreign key constraints:

CREATE TABLE Orders (
  OrderID    INT PRIMARY KEY,
  CustomerID INT NOT NULL REFERENCES Customers,
  OrderDate  DATE
);

By leaving out the (CustomerID) in the REFERENCES clause, MySQL assumes that you want to reference the primary key of the parent table. Several other systems accept this short form as well, but support and behavior differ between products and versions, so you should avoid it if you need to write SQL DDL statements that work across multiple database systems.

Time Intervals in SQL

SQL not only supports date and timestamp data types but also time intervals.

An interval in SQL represents a period of time, such as days, hours, minutes, or seconds. SQL supports intervals in two main types: INTERVAL YEAR TO MONTH and INTERVAL DAY TO SECOND.

The two types

YEAR TO MONTH is used to represent a span of years and months. Examples include calculating ages, durations of contracts, or project timelines.

DAY TO SECOND is used to represent a span of days, hours, minutes, and seconds. Examples include scheduling events, logging activities, and tracking precise durations.

-- Represents 1 year and 6 months
INTERVAL '1-6' YEAR TO MONTH

-- Represents 5 days, 12 hours, 30 minutes, and 45 seconds
INTERVAL '5 12:30:45' DAY TO SECOND

This distinction may seem arbitrary at first, but it aligns with real-world use cases. An interval of “2 months” is clearly different from “60 days,” even though they might seem similar. The former is handled as a “YEAR TO MONTH” interval, and the latter as a “DAY TO SECOND” interval, making it clear how operations should be performed. Storing only the relevant units also reduces the amount of space required. For instance, storing a “YEAR TO MONTH” interval doesn’t need to allocate space for hours or seconds, thus saving storage.

Despite its name, a “DAY TO SECOND” interval can also include fractional seconds:

INTERVAL '1 12:30:45.123' DAY TO SECOND

There are also shorter forms of interval literals if you only want to use one or several units of the interval:

For “YEAR TO MONTH”:

INTERVAL '3' YEAR   -- 3 years
INTERVAL '5' MONTH  -- 5 months

For “DAY TO SECOND”:

INTERVAL '2' DAY                -- 2 days
INTERVAL '5:30' HOUR TO MINUTE  -- 5 hours and 30 minutes
INTERVAL '15' MINUTE            -- 15 minutes
INTERVAL '30' SECOND            -- 30 seconds
INTERVAL '45.678' SECOND        -- 45.678 seconds

Note that the syntax always uses the singular forms of the unit words.

The two types can be used as data types for table columns:

CREATE TABLE interval_examples (
  id NUMBER,
  a INTERVAL YEAR TO MONTH,
  b INTERVAL DAY TO SECOND
);
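
Rows for such a table are then inserted with the interval literals shown above, for example:

INSERT INTO interval_examples (id, a, b)
  VALUES (1, INTERVAL '1-6' YEAR TO MONTH, INTERVAL '5 12:30:45' DAY TO SECOND);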

Calculations with intervals

Intervals can be added to or subtracted from dates to calculate future or past dates.

-- Adding 10 days to the current date
CURRENT_DATE + INTERVAL '10' DAY

-- Subtracting 3 months from a specific date
DATE '2024-06-14' - INTERVAL '3' MONTH

It’s important to understand that intervals of type YEAR TO MONTH do not have a fixed duration. The number of days added to CURRENT_TIMESTAMP when you add an INTERVAL ‘1’ MONTH depends on the current month and year, as different months have different numbers of days.
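
A concrete edge case is adding one month to a month-end date – the result would land on a day that doesn’t exist. Oracle rejects this, while PostgreSQL clamps to the last day of the target month:

-- Oracle: fails with ORA-01839, since 2024-02-31 does not exist
SELECT DATE '2024-01-31' + INTERVAL '1' MONTH FROM dual;

-- PostgreSQL: yields 2024-02-29
SELECT DATE '2024-01-31' + INTERVAL '1 month';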

Intervals can also be compared to each other with the usual operators such as =, <, <=, >, >=, <>.

Combining both types

Suppose you need to add an interval of 1 year, 2 months, 3 days, 4 hours, 5 minutes, and 6 seconds to the current timestamp. In this case you need to combine intervals of the two types:

SELECT (CURRENT_TIMESTAMP + INTERVAL '1-2' YEAR TO MONTH) + INTERVAL '3 04:05:06' DAY TO SECOND AS result;

Notes on PostgreSQL

PostgreSQL supports the standard interval syntax shown above, but it also offers a unified, free-form syntax for interval literals:

INTERVAL '1 year 2 months'
INTERVAL '10 days 12 hours 30 minutes 45.123 seconds'
INTERVAL '1 year 2 months 3 days 4 hours 5 minutes 6 seconds'

In fact, in PostgreSQL, YEAR TO MONTH and DAY TO SECOND are not distinct data types of their own; they are specializations of the broader INTERVAL data type.

It also supports multiplication and division of intervals:

-- Results in an interval of 6 days
SELECT INTERVAL '2 days' * 3 AS multiplied_interval;

-- Results in an interval of 2 hours
SELECT INTERVAL '3 hours' / 1.5 AS divided_interval;

Half table, half view: Generated Columns

Anyone familiar with SQL database basics knows the two fundamental structures of relational databases: tables and views. Data is stored in tables, while views are virtual tables calculated on-the-fly from a SQL query. Additionally, relational database management systems often support materialized views, which, like views, are based on a query from other tables, but their results are actually persisted and only recalculated as needed.

What many don’t know is that the most common SQL databases (PostgreSQL, MySQL, Oracle) nowadays also support something in between: we’re talking about Generated Columns, which will be introduced in this blog post.

So, what are Generated Columns? Generated Columns are columns within a normal database table. But unlike regular columns, their values are not stored as independent individual values; rather, they are computed from other values in the table.

Below is an example of how to define a Generated Column. The example is for PostgreSQL, but the syntax is similar in other popular relational database systems that support this feature.

CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  name VARCHAR(100),
  quantity INTEGER,
  unit_price DECIMAL(10, 2),
  total_price DECIMAL(10, 2) GENERATED ALWAYS AS (quantity * unit_price) STORED
);

As seen above, a Generated Column is defined with the keywords GENERATED ALWAYS AS. In MySQL and Oracle, GENERATED ALWAYS is even optional (you could just write AS), while PostgreSQL insists on the full form – in any case, spelling it out clarifies what it’s about. Following AS is the expression that computes the value.

At the end of the column definition, either the keyword STORED or VIRTUAL can be used (support differs: MySQL offers both, with VIRTUAL as its default; PostgreSQL currently supports only STORED; Oracle’s Generated Columns are always virtual). In the example above, it says STORED, which means the computed value is physically stored on the disk. The value is recalculated and stored only after an INSERT or UPDATE. In contrast, with VIRTUAL, the value is not stored but always computed on-the-fly. Thus, virtual Generated Columns behave similarly to a view, while STORED is more comparable to a materialized view.

The choice between the two options depends on the specific requirements. Stored Generated Columns consume more disk space, while virtual Generated Columns save space at the expense of performance.
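
Generated Columns are read-only – you insert only the base columns, and the database computes the rest. A quick sketch using the products table from above:

INSERT INTO products (name, quantity, unit_price)
  VALUES ('Notebook', 4, 9.95);

-- total_price is computed automatically: 4 * 9.95 = 39.80
SELECT name, total_price FROM products;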

In the expression following AS, other columns of the table can be referenced. In MySQL, even other Generated Columns can be referenced, as long as they are specified in the table definition before the current column; PostgreSQL, in contrast, does not allow a generation expression to reference another Generated Column. However, SQL subqueries cannot be used in the expression.

In conclusion, Generated Columns are a useful feature that combines parts of a table with the benefits of a view.

SQL Database Window Functions

Window functions allow users to perform calculations across a set of rows that are somehow related to the current row. This can include calculations like running totals, moving averages, and ranking without the need to group the entire query into one aggregate result.

Despite their flexibility, window functions are sometimes underutilised, either because users are unaware of them or because they’re considered too complex for everyday tasks. Learning how to effectively use window functions can improve the efficiency and readability of SQL queries, particularly for reporting and data analysis purposes. This article will explore several use cases.

Numbering Rows

The simplest application area for window functions is the numbering of rows. The ROW_NUMBER() function assigns a unique number to each row within the partition of a result set. The numbering is sequential and starts at 1. It’s useful for creating a unique identifier for rows within a partition, even when the rows are identical in terms of data.

Consider the following database table of library checkouts:

book                  checkout_date  member_id
--------------------  -------------  ---------
The Great Adventure   2024-02-15     102
The Great Adventure   2024-01-10     105
Mystery of the Seas   2024-01-20     103
Mystery of the Seas   2024-03-01     101
Journey Through Time  2024-02-01     104
Journey Through Time  2024-02-18     102

We want to assign a unique row number to each checkout instance for every book, ordered by the checkout date to analyze the circulation trend:

SELECT
  book,
  checkout_date,
  member_id,
  ROW_NUMBER() OVER (PARTITION BY book ORDER BY checkout_date) AS checkout_order
FROM library_checkouts;

The result:

book                  checkout_date  member_id  checkout_order
--------------------  -------------  ---------  --------------
The Great Adventure   2024-01-10     105        1
The Great Adventure   2024-02-15     102        2
Mystery of the Seas   2024-01-20     103        1
Mystery of the Seas   2024-03-01     101        2
Journey Through Time  2024-02-01     104        1
Journey Through Time  2024-02-18     102        2

Ranking

In the context of SQL and specifically regarding window functions, “ranking” refers to the process of assigning a unique position or rank to each row within a partition of a result set based on a specified ordering.

The RANK() function provides a ranking for each row within a partition, with gaps in the ranking sequence when there are ties. It’s useful for ranking items that have the same value.

Consider the following database table of scores in a game tournament:

player   game            score
-------  --------------  -----
Alice    Space Invaders  4200
Bob      Space Invaders  5700
Charlie  Space Invaders  5700
Dana     Donkey Kong     6000
Eve      Donkey Kong     4800
Frank    Donkey Kong     6000
Alice    Asteroids       8500
Bob      Asteroids       9300
Charlie  Asteroids       7600

We want to rank the players within each game based on their score, with gaps in rank for ties:

SELECT
  player,
  game,
  score,
  RANK() OVER (PARTITION BY game ORDER BY score DESC) AS rank
FROM scores;

The result looks like this:

player   game            score  rank
-------  --------------  -----  ----
Bob      Space Invaders  5700   1
Charlie  Space Invaders  5700   1
Alice    Space Invaders  4200   3
Dana     Donkey Kong     6000   1
Frank    Donkey Kong     6000   1
Eve      Donkey Kong     4800   3
Bob      Asteroids       9300   1
Alice    Asteroids       8500   2
Charlie  Asteroids       7600   3

If you don’t want to have gaps in the ranking sequence when there are ties, you can substitute DENSE_RANK() for RANK().
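
With DENSE_RANK(), the Space Invaders rankings above would become 1, 1, 2 instead of 1, 1, 3:

SELECT
  player,
  game,
  score,
  DENSE_RANK() OVER (PARTITION BY game ORDER BY score DESC) AS rank
FROM scores;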

Cumulative Sum

The SUM() function can be used as a window function to calculate the cumulative sum of a column over a partition of rows.

Example: We are tracking our garden’s vegetable harvest in a database table, and we want to calculate the cumulative yield for each type of vegetable over the harvesting season.

vegetable  harvest_date  yield_kg
---------  ------------  --------
Carrots    2024-06-18    10
Carrots    2024-07-10    15
Tomatoes   2024-06-15    20
Tomatoes   2024-07-01    30
Tomatoes   2024-07-20    25
Zucchini   2024-06-20    15
Zucchini   2024-07-05    20

We calculate the running total (cumulative yield) for each vegetable type as the season progresses, using the SUM() function:

SELECT
  vegetable,
  harvest_date,
  yield_kg,
  SUM(yield_kg) OVER (PARTITION BY vegetable ORDER BY harvest_date ASC) AS cumulative_yield
FROM garden_harvest;

Now we can see which vegetables are most productive and how yield accumulates throughout the season:

vegetable  harvest_date  yield_kg  cumulative_yield
---------  ------------  --------  ----------------
Carrots    2024-06-18    10        10
Carrots    2024-07-10    15        25
Tomatoes   2024-06-15    20        20
Tomatoes   2024-07-01    30        50
Tomatoes   2024-07-20    25        75
Zucchini   2024-06-20    15        15
Zucchini   2024-07-05    20        35
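
Moving averages, mentioned at the beginning, follow the same pattern: an aggregate over an ordered window, here restricted to a frame of the current row and the two harvests before it (a sketch on the same garden_harvest table):

SELECT
  vegetable,
  harvest_date,
  yield_kg,
  AVG(yield_kg) OVER (PARTITION BY vegetable
                      ORDER BY harvest_date
                      ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg
FROM garden_harvest;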