Time travel with Oracle database’s Flashback Queries

Oracle’s database management system offers a feature known as Flashback Queries, allowing users to peek into the past and retrieve data as it existed at a previous point in time. This functionality can eliminate the need for manual data restoration from backups, making it a useful asset for both developers and database administrators.

Enabling Flashback Queries

Before using Flashback Queries, ensure that the database has the required configuration. Firstly, confirm that the database’s DB_FLASHBACK_RETENTION_TARGET parameter is appropriately set. This parameter defines the period for which historical data is retained. Adjust it based on your organization’s data retention policies. Before making changes, you can check its current value:

SHOW PARAMETER DB_FLASHBACK_RETENTION_TARGET;

Use the ALTER SYSTEM command to set the parameter. For example, to set it to retain data for 7 days:

ALTER SYSTEM SET DB_FLASHBACK_RETENTION_TARGET=10080 SCOPE=BOTH;

10080 is the retention period in minutes (7 days × 24 hours × 60 minutes). Please note that setting the parameter to a higher value consumes more space in the fast recovery area, so consider your storage constraints. SCOPE=BOTH ensures that the change persists across database restarts, i.e. it changes the value both in memory and in the server parameter file.

To enable Flashback Queries for a specific table, execute the ALTER TABLE command with the FLASHBACK ARCHIVE option:

ALTER TABLE table_name FLASHBACK ARCHIVE;

This setup allows Oracle to maintain historical changes for the specified table.

Using Flashback Queries

Consider a scenario where an employee accidentally updates critical data in the employees table. With Flashback Queries, you can rectify the mistake:

SELECT * FROM employees AS OF TIMESTAMP TO_TIMESTAMP('2023-11-10 15:00:00', 'YYYY-MM-DD HH24:MI:SS');

This query retrieves the data from the employees table as it existed before the erroneous update.
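
If you not only want to read the old state but also restore the table to it, you can, for example, follow up with a FLASHBACK TABLE statement. Note that this requires row movement to be enabled on the table; the timestamp and table name are taken from the example above:

ALTER TABLE employees ENABLE ROW MOVEMENT;

FLASHBACK TABLE employees TO TIMESTAMP
  TO_TIMESTAMP('2023-11-10 15:00:00', 'YYYY-MM-DD HH24:MI:SS');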

You can also recover dropped tables, as long as they are still available in the recycle bin:

FLASHBACK TABLE orders TO BEFORE DROP;

This command restores the dropped table and its data.

Flashback Queries offer a mechanism to navigate through time within a database, providing a simple way to recover historical data or inspect changes. They stand as a useful asset in the arsenal of database administrators and developers, fostering greater confidence in managing data.

PostgreSQL’s auto-explain feature and execution plans

PostgreSQL’s auto-explain is a built-in feature that automatically generates and logs execution plans for SQL statements. It’s a useful tool for developers to understand how the query planner is executing SQL queries.

You enable and configure auto-explain by loading the auto_explain module and setting its parameters in the PostgreSQL configuration file (postgresql.conf), for example by adding it to session_preload_libraries. Set auto_explain.log_analyze to on to log execution plans along with actual run-time statistics, and set auto_explain.log_min_duration to the minimum execution time in milliseconds that a query must take to be logged. For example, if you want to log queries taking longer than 100 milliseconds, set it to 100. Set auto_explain.log_buffers to on if you want to include buffer usage information, and auto_explain.log_timing to log per-node timing information.

Here’s an example of how to configure these parameters in postgresql.conf:

# preload the auto_explain module for all sessions
session_preload_libraries = 'auto_explain'

auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_min_duration = 100
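
Alternatively, for quick experiments you can load the module into a single session and set the parameters there (superuser privileges required), for example in psql:

LOAD 'auto_explain';
SET auto_explain.log_analyze = on;
SET auto_explain.log_min_duration = 100;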

Reading the execution plan

Suppose you have a “recipe database” that stores recipes, ingredients, and chefs. You want to retrieve a list of recipes along with the names of the chefs who created them and the ingredients they contain. Here’s a query that accomplishes this:

SELECT recipes.recipe_name, chefs.chef_name, ingredients.ingredient_name
  FROM recipes
  JOIN chefs ON recipes.chef_id=chefs.chef_id
  JOIN recipe_ingredients ON recipes.recipe_id=recipe_ingredients.recipe_id
  JOIN ingredients ON recipe_ingredients.ingredient_id=ingredients.ingredient_id
WHERE recipes.cuisine='Italian';

This query fetches Italian recipes, their respective chefs, and the ingredients they use.

When you run this query with auto-explain enabled, PostgreSQL will log the execution plan. The query plan might look something like this:

Hash Join  (cost=100.25..350.75 rows=50 width=96)
  Hash Cond: (recipe_ingredients.recipe_id = recipes.recipe_id)
  ->  Hash Join  (cost=50.12..200.37 rows=50 width=60)
        Hash Cond: (recipes.chef_id = chefs.chef_id)
        ->  Seq Scan on recipes  (cost=0.00..100.00 rows=50 width=24)
              Filter: (cuisine = 'Italian'::text)
        ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
              ->  Seq Scan on chefs  (cost=0.00..30.00 rows=1000 width=36)
  ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
        ->  Seq Scan on recipe_ingredients  (cost=0.00..30.00 rows=1000 width=36)
              Filter: (recipe_id IS NOT NULL)

In this query plan, Hash Join indicates a join operation using a hash-based algorithm. Seq Scan signifies a sequential scan of a table, i.e. a full table scan. Hash Cond shows the join condition for the respective hash join.

cost represents the estimated execution cost for each operation, and rows indicates the estimated number of rows returned by each operation.

The estimated cost in PostgreSQL query execution plans is typically represented in an abstract unit known as “cost units.” These cost units are used for relative cost estimation and are not expressed in any specific real-world measurement like time or money. They are designed to provide a relative measure of the cost of different query plan operations so that the query planner can make informed decisions about which plan to choose.

Reading this plan, PostgreSQL starts by filtering Italian recipes (a Seq Scan with a filter). It then joins the recipes with chefs using a hash join, and the result is joined with recipe_ingredients using another hash join. The cost values provide relative estimates of resource usage, allowing you to identify potentially expensive parts of the query, so you can consider improving the performance of the SQL statement with optimisations like indexing.
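
For the plan above, one candidate optimisation would be an index on the filter column; this is just a sketch using the table and column names from the example:

CREATE INDEX idx_recipes_cuisine ON recipes (cuisine);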

Porting an application from Oracle to PostgreSQL

We are currently in the process of porting a larger application from an Oracle database to a PostgreSQL database as an additional database system. The aim is for the application to work not only with one database system but with both: with Oracle for one customer and with PostgreSQL for another.

The first step in this process was to conduct a high-level analysis of what effort this would entail and where the application would need to be adjusted. The application consists of multiple services: a frontend service in Ruby on Rails and a variety of backend services in Java. Rails uses an O/R mapper called Active Record, which is relatively database-agnostic. No major adaptation effort was expected here. Ultimately, only the Active Record adapter needs to be configured per instance.

The Java services do not use an O/R mapper. They make direct SQL queries to the target database system using JDBC. This is where the greatest potential for issues existed. Fortunately, all database queries are encapsulated in repository classes, making the SQL queries easily accessible and adaptable for different database management systems. However, this has not proven necessary so far. The SQL queries are written in standard SQL, without using database-specific features. The code is limited to basic features such as tables, views, sequences, SELECTs, INSERTs, UPDATEs, and JOINs, all of which are supported by both databases. In particular, no stored procedures are used, which would not necessarily be portable.

For implementation, the initial goal was to make the existing Oracle database schema of the application reproducible on a PostgreSQL database, in a way that a new instance can be set up fresh in a Docker container at any time.

The best way to achieve this is with a database migration tool. The options considered were: Active Record Migrations, Liquibase, and Flyway. We ultimately chose Liquibase, which we have had positive experiences with. It can be used standalone and allows for formulating migrations in a database-independent manner. Additionally, it offers the option to export an existing schema as a migrations changelog file using the generate-changelog command.

We only had to slightly adjust the generated changelog to make it work for both database systems. In a Liquibase changelog, you can define variables referred to as properties, which can take on different values for various database systems. One use case for this is having different names for column data types:

<property name="varchar" value="varchar" dbms="postgresql"/>
<property name="varchar" value="VARCHAR2" dbms="oracle"/>

These properties can then be used, for example, in the following way:

<column name="DESCRIPTION" type="${varchar}(4000)">

A mapping table of Oracle data types to their PostgreSQL equivalents is helpful for this step. After some editing of the generated changelog, mainly through search and replace, we had a migration for the database schema that works for both systems.

Oracle database identity column

A few days ago, I came across the topic of identity columns in Oracle, so I decided to try them out and check whether they would benefit my database tables. I would like to share the results in this article. First, I will explain what identity columns are and how to create them.

General

If you declare a column as an identity column, the value will be automatically populated from an associated sequence generator. It is possible to add an identity column by using either the CREATE TABLE or ALTER TABLE statement. In the example below, I will show how to add it using the CREATE TABLE statement.

The column marked as an identity column needs to have a numeric data type, such as NUMBER or INTEGER.

CREATE TABLE EXAMPLE (
    id NUMBER GENERATED ALWAYS AS IDENTITY,
    ...
);

In this case, I ALWAYS want a value to be automatically filled in. There is also the option to only fill it if no value was specified (BY DEFAULT) or when the given value is null (BY DEFAULT ON NULL).
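
For example, a column that only generates a value when none is supplied, or when the supplied value is NULL, could be declared like this:

CREATE TABLE EXAMPLE (
    id NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
    ...
);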

Additionally, you can configure the sequence as shown in the following example.

CREATE TABLE EXAMPLE (
    id NUMBER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 2 MAXVALUE 5000 NOCYCLE CACHE 100),
    ...
);

With these settings, the sequence starts with 1 and increments by 2, so the values are 1, 3, 5, and so on. The sequence counts up to the maximum value of 5000 and then stops. If CYCLE is set instead, the sequence starts over at 1 when the maximum is reached.

CACHE 100 makes a client request 100 pre-allocated values in advance. Because of the step width of 2, the first batch covers the values 1 to 199. If a second client asks for values, it receives the batch 201 to 399. As a result, the entries in the table can end up with IDs like 1, 201, 3, 5, 7, 203.

Using it with Ruby on Rails / Active Record

My use case is to use the table in Ruby on Rails with Active Record. Here, I want the ID to be auto-incremented so that I can retrieve it in my program. However, if I do not use the identity column feature and also do not specify the name of a manually created sequence, Ruby always returns 0 as the ID. As soon as I add a sequence name, I get the correct ID.

Now I try to change this to work with the identity column. Inserting directly in Oracle with SQL works fine.

The next step is to use it in my Ruby program, and the first problem occurs. Unfortunately, it is not possible to define my own sequence name. Oracle automatically creates sequences with cryptic names like ISEQ$$_12345 for identity columns, which reduces readability in code. Additionally, if I recreate the table, the sequence name changes, which means I would have to update my code afterwards. This seems to be the first showstopper.

Nonetheless I continue and try to create a new entry with it. The entry is created. It works! Or wait a moment: Ruby again shows 0 instead of the right ID. It seems like the program cannot use the sequence correctly. I halt my testing here. Maybe it is possible to make Ruby work with identity sequences, maybe it is not. If you have already found a solution, feel free to comment.

In the end I decide that the manually created sequence still fits my use case better. The main reason is the naming, and I do not want to spend more time searching for a solution to make the identity column work with Ruby.

Conclusion

Although the identity column option does not fit my use case, it is a possible way to create a table column with auto-generated values in Oracle. For situations where the ID itself is not of interest, it can simplify the work: you do not have to create the logic with sequences and triggers on your own.
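
For reference, this is roughly what the manual approach with a sequence and a trigger looks like (all names are just placeholders):

CREATE SEQUENCE example_seq;

CREATE OR REPLACE TRIGGER example_id_trigger
  BEFORE INSERT ON example
  FOR EACH ROW
BEGIN
  :NEW.id := example_seq.NEXTVAL;
END;
/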

Table inheritance in PostgreSQL

In a previous post I have written about tables as data types in PostgreSQL. In addition to that, PostgreSQL has a feature similar to the inheritance mechanism of data types in object-oriented programming: table inheritance.

Table inheritance allows you to create a hierarchical relationship between tables. With this feature you define a parent table, and child tables inherit columns and some constraints (like CHECK constraints and NOT NULL constraints) from it.

How it works

To begin, we create the parent table products using the following SQL code:

CREATE TABLE products (
  id     SERIAL         PRIMARY KEY,
  name   TEXT           NOT NULL,
  price  DECIMAL(10, 2) NOT NULL
);

Next, let’s create child tables that inherit from the products table. We will create two product categories: electronics and clothing. Each child table will have its own specific columns in addition to the inherited columns. To set up the inheritance relation we use the INHERITS keyword:

CREATE TABLE electronics (
  warranty_period   INT,
  power_consumption INT
) INHERITS (products);

CREATE TABLE clothing (
  size  TEXT,
  color TEXT
) INHERITS (products);

Now we insert some rows into the child tables:

INSERT INTO electronics
  (name, price, warranty_period, power_consumption)
  VALUES ('Laptop', 1439.0, 2, 536);

INSERT INTO clothing
  (name, price, size, color)
  VALUES ('T-Shirt', 5.99, 'L', 'red');

When we query each child table individually, we only get the rows we inserted into it, as expected. However, when we query the parent table, we get all the products from the child tables:

SELECT * FROM products;

id	name	 price
--------------------
1	Laptop	 1439.00
2	T-Shirt	 5.99
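
If you want to see which table a row actually comes from, you can, for example, include the tableoid system column in the query:

SELECT tableoid::regclass AS source_table, name, price FROM products;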

Note that you can also insert rows into the parent table:

INSERT INTO products
  (name, price)
  VALUES ('Milk', 1.15);

This row will not show up in the child tables, only when querying the parent table. If you are asking yourself now if there is an equivalent to an “abstract class”, there isn’t one. Just don’t insert into the parent table if it doesn’t make sense in your domain.

If you want to query rows only from the parent table you can use the ONLY keyword:

SELECT name, price FROM ONLY products;

The ONLY keyword also works with other commands like UPDATE and DELETE.
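
For example, a hypothetical price update that should only affect rows stored directly in the parent table could look like this:

UPDATE ONLY products SET price = 1.25 WHERE name = 'Milk';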

On a final note, you can also remove the inheritance relationship from a child table with the NO INHERIT clause:

ALTER TABLE clothing NO INHERIT products;

Developing for Cordova + SQLite in a standard Browser environment

As any developer knows: who doesn’t just love it when a product that has grown over the years suddenly needs to target a new platform (e.g. a new operating system) because some customer demands changed, some dependency broke or some other totally unexpected thing called “progress” happened?

Fortunately, there are some approaches to cross-platform development, and if one expects such a change of direction, one can adopt early on a suitable runtime environment such as Apache Cordova or Capacitor/Ionic or similar, which all promise you a write-once-run-anywhere experience, decoupling the application logic from the lower-level OS interactions.

Unfortunately though, this promise is a total lie: usually, soon after starting such a totally platform-agnostic project, you will want to use a dependency that only works on one platform, and then your options are limited.

One such example is a Cordova project we are currently moving from Android to iOS, and in that process we are also redesigning the frontend: a nice, modern replacement for a very outdated (read: unmaintainable) Vanilla JS application. We have set it up smoothly (React + Vite + TypeScript – you name it!), and since we technically do not need anything iOS-specific yet, we can work on our redesign in a pure-browser environment with hot reloading and the like – life is good!

Then comes the realization that our application is quite data heavy and uses an on-device SQL database to persist its data, and we don’t have that in the browser – so, life turned bad.

What to do? There was once a client-side Web SQL database specification, but it was unofficial, never fully implemented, and abandoned in 2010. It is still present in Chrome, but its removal has already been announced, so this is not the future-proof way to go.

We crave a smooth flow of development.

  • It is not an option to re-build the app at every change.
  • It is not an option to have the production system use its SQLite DB and the development environment to use a totally different one like IndexedDB – certain SQLite queries are too ingrained in our application.
  • It is probably only an option to use an experimental technology like absurd-sql, which aims to fill that gap but in turn needs advanced API features like Web Workers, SharedArrayBuffer and the Atomics API, which we would not require otherwise.
  • It is possible to use in-memory SQLite via sql.js, but for persistence it was not instantly obvious to me how to couple that with the partially supported Origin Private File System API.

So after all, this is the easiest solution that gave me back most of my developer smoothness: use sql.js in memory, and for development display two nice buttons in the UI which let me download the whole DB and upload one from a file again. This is the sketch:

We create a CombinedDatabase class which, depending on the environment, can hand out such a database in a Singleton-like manner:

// sql.js provides the in-memory database used in the browser environment;
// its Database type is aliased to SqlJsDatabase to match the naming below
import initSqlJs, {Database as SqlJsDatabase} from "sql.js";

class CombinedDatabase {

    // This is the Singleton-part

    private static instance: CombinedDatabase;

    public static get = async (): Promise<CombinedDatabase> => {
        if (!this.instance) {
            const {db, type} = await this.createDatabase();
            this.instance = new CombinedDatabase(db, type);
        }
        return this.instance;
    };

    private static createDatabase = async () => {
        if (inProductionEnvironment()) {
            return {
                db: createCordovaSqliteInstance(),
                type: "CordovaSqlite"
             };
        } else {
            const sqlWasmUrl = (await import("../assets/sql-wasm.wasm?url")).default;
            // we extend the window object for reasons I tell you below
            window.sqlJs = await initSqlJs({locateFile: () => sqlWasmUrl});
            const db = new window.sqlJs.Database();
            return {db, type: "InMemory"};
        }
    }


    // This is the actual flesh, i.e. a switch of which API to use

    private readonly type: string;
    private cordovaSqliteDb: SQLitePlugin.Database | null = null;
    private inMemorySqlJsDb: SqlJsDatabase | null = null;

    private constructor(db: SQLitePlugin.Database | SqlJsDatabase, type: string) {
        this.type = type;
        switch(type) {
            case "CordovaSqlite":
                this.cordovaSqliteDb = db as SQLitePlugin.Database;
                break;
            case "InMemory":
                this.inMemorySqlJsDb = db as SqlJsDatabase;
                break;
            default:
                throw Error("Invalid CombinedDatabase type: " + type);
        }
    }

   // ... and then there are some methods

}

(This is simplified – in reality, type is an enum for me, and there is also error handling, but you know – not the point here.)

This structure is nice because you can now implement low-level methods like some executeQuery(...) etc., which decide depending on the type which of the private DB instances to address and, even if those work differently, return a unified response format.

The rest of our application does not know anything about any Cordova-SQLite-dependency, or sql.js, or whatever. Life is good again.

So how do import / export work?

I gave the CombinedDatabase some interfacing methods, similar to these:


    public async export() {
        switch (this.type) {
            case "CordovaSqlite":
                throw Error("Not implemented for cordova-sqlite database");
            case "InMemorySqlJs":
                return this.inMemorySqlJsDb!.export();
            default:
                throw Error("DB not initialized, cannot export.");
        }
    }

    public async import(binaryData: Uint8Array) {
        if (this.type !== "InMemory") {
            throw Error("DB import only implemented for the in-memory/sql.js database, this is a DEVELOPMENT feature!");
        }
        await this.close();
        this.inMemorySqlJsDb = new window.sqlJs.Database(binaryData);
    }

This is also the reason why I monkey-patched the window object earlier: so I still have this API around outside the Singleton instantiation (createDatabase). Yes, this is a global variable and a kind of hack, but in my opinion it is something that can safely be done inside the browser, in good measure.

Remember, in TypeScript you need to declare this, e.g. in some global.d.ts file:

import {SqlJsStatic} from "sql.js";

declare global {
    interface Window {
        sqlJs?: SqlJsStatic
    }
}

Or go around the Window interface by casting (window as any).sqlJs – you decide what you prefer.

Anyway, the export() functionality can then be used quite handily: it returns the in-memory database as a binary array, and you can make the browser download it via a Blob URL:

api.db.export().then((array: Uint8Array) => {
    const blob = new Blob([array], {type: "application/x-sqlite3"});
    const link = document.createElement("a");
    link.href = URL.createObjectURL(blob);
    link.download = `bonpland${Date.now()}.db`;
    link.target = "_blank";
    link.click();
});

And similarly, you can use import() by reading a Uint8Array from a temporary <input type="file"> element with a FileReader() (somewhat common solution, but just comment below if you want the details).

To be exact, I don’t even use the import() button anymore because I pass my development DB as an asset to the dev server. This is nice (and only takes a few seconds on hot reloading because our DB is like 50 MB in size), but somewhat Vite-specific, which is why I will postpone this topic to some later blog time.

Oracle DB’s Gradual Password Rollover Feature

It is good security practice to change passwords regularly. When changing a database password, however, the problem arises that all applications accessing this database have to be reconfigured with the new password. If multiple applications or services use the same database user, then they all need to be reconfigured at once, typically during a scheduled downtime.

Oracle 21c introduced a new feature called Gradual Password Rollover that can help make such a password change less disruptive. The feature was also backported to Oracle 19c. If this feature is switched on for a user profile, a transition time is granted when the password is changed, during which both the old and the new password are valid. The applications can then change their configuration to the new password within this period according to their own schedule.

How to enable it

You must first be logged in as a privileged user who is allowed to manage users. The grace period for which both passwords should be valid after a password change is set via a user profile. A user profile is a set of limits on the database resources and the user password. The profile setting for this feature is called PASSWORD_ROLLOVER_TIME. You either create a new profile and specify this setting as a limit, or you adjust an existing profile. Here are both variants:

-- Create a new profile ...
CREATE PROFILE example_profile LIMIT PASSWORD_ROLLOVER_TIME 1;

-- ... or alter an existing profile
ALTER PROFILE example_profile LIMIT PASSWORD_ROLLOVER_TIME 1;
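
To check the configured limit, you can, for example, query the DBA_PROFILES view:

SELECT profile, resource_name, limit
  FROM dba_profiles
  WHERE resource_name='PASSWORD_ROLLOVER_TIME';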

The unit of this setting is days. The minimum value is one hour (1/24) and the maximum value is 60 (days). You can assign this profile to a user with the following statement:

ALTER USER example_user PROFILE example_profile;

Now change the user’s password:

ALTER USER example_user IDENTIFIED BY thenewpassword;

Now you should be able to log in as this user with both the old and the new password. You can query the current status from the dba_users view:

SELECT username, account_status, profile
  FROM dba_users
  WHERE username='EXAMPLE_USER';

The value of the account_status column should have changed from OPEN to OPEN & IN ROLLOVER. This indicates that the user account is in the password rollover phase, and two passwords are active at the same time. You can end this period early with the following command:

ALTER USER example_user EXPIRE PASSWORD ROLLOVER PERIOD;

A final note: if you change the password again during the rollover period, only the original password (the one before the rollover period was started) and the latest password are valid, which means a user account cannot have more than two valid passwords at the same time.

Materialized views in Oracle

Most relational database management systems (RDBMSs) support not only views, but also materialized views. Materialized views and normal views are both database objects used to present data to users, but they work in different ways.

A database view is a virtual table or a named query that presents data from one or more tables in a specific way. Views are not physical tables and do not store data directly; instead they retrieve data from the underlying tables based on a specified query.

Materialized views are similar to normal views, but they store pre-computed results. When a materialized view is created, the results of the underlying query are computed and stored in the database. One advantage of materialized views is faster query performance by avoiding the need to compute the same results repeatedly. This is especially useful for complex and time-consuming queries, as the results can be stored and accessed quickly.

The syntax for creating a normal view in an Oracle database is as follows:

CREATE VIEW view_name AS SELECT … FROM … WHERE …;

To create a materialized view instead of a normal view you add the MATERIALIZED keyword:

CREATE MATERIALIZED VIEW view_name AS SELECT … FROM … WHERE …;

When creating a materialized view you should think about and decide on three aspects:

  1. the refresh method,
  2. the refresh interval,
  3. and the storage properties

The refresh method

The refresh method determines how the data in the materialized view is updated or refreshed to reflect changes in the base tables.

  • COMPLETE: This one completely rebuilds the materialized view from scratch. It drops the existing contents of the materialized view and then re-executes the query to populate it with fresh data. This method can be resource-intensive and slow, especially for large materialized views.
  • FAST: Updates only the rows in the materialized view that have changed since the last refresh. It uses the materialized view logs on the base tables to identify the changed rows and then applies the changes to the materialized view. It can be much faster than a complete refresh, especially if there are only a few changes to the data.
  • FORCE: Tries to perform a fast refresh if possible, but falls back to a complete refresh if necessary. This method is useful if you want to try to perform a fast refresh, but you’re not sure if it will be possible due to the nature of the data or the query.

You can specify the refresh method when creating the materialized view using the REFRESH keyword:

CREATE MATERIALIZED VIEW view_name
  REFRESH FAST
  AS SELECT ...;

If you do not specify a refresh method, FORCE is the default.
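
Note that a fast refresh requires materialized view logs on the base tables of the query. A minimal sketch, assuming a base table named base_table:

CREATE MATERIALIZED VIEW LOG ON base_table;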

The refresh interval

The refresh interval controls how often the materialized view is automatically refreshed. It determines how frequently the materialized view is updated to reflect changes in the underlying data.

Some refresh interval options in Oracle are:

  • ON COMMIT: The materialized view is refreshed automatically every time a transaction that modifies the underlying data is committed. This interval is useful when you need to keep the materialized view up-to-date in near real-time.
  • ON DEMAND: The materialized view is refreshed only when you explicitly request a refresh using the DBMS_MVIEW.REFRESH procedure (see the example after this list).
  • START WITH … NEXT: With this interval, the materialized view is refreshed automatically at regular intervals. It is useful when you want to balance the need for up-to-date data with the resources required to refresh the materialized view.
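
An on-demand refresh can then be triggered, for example, with an anonymous PL/SQL block (using the view name from above):

BEGIN
  DBMS_MVIEW.REFRESH('view_name');
END;
/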

You can specify the refresh interval by adding it to the REFRESH clause when creating the materialized view:

CREATE MATERIALIZED VIEW view_name
  REFRESH FAST ON COMMIT
  AS SELECT ...;

The following materialized view gets refreshed every hour:

CREATE MATERIALIZED VIEW view_name
  REFRESH FAST START WITH SYSDATE NEXT SYSDATE + 1/24
  AS SELECT ...;

Storage properties

Storage properties affect how the data in the materialized view is stored and accessed. In Oracle, some of these are:

  • CACHE: The data is stored in the database buffer cache, which is a portion of memory used to cache frequently accessed data. It improves query performance by reducing disk I/O, but it can consume a significant amount of memory.
  • LOGGING: Changes to the materialized view data are logged in the database redo logs. This property ensures that changes to the materialized view can be recovered in case of a system failure but can result in additional overhead.
  • TABLESPACE: Allows you to specify the tablespace where the materialized view data is stored.

Again, you can specify these properties when creating the materialized view:

CREATE MATERIALIZED VIEW view_name
  CACHE
  LOGGING
  TABLESPACE tablespace_name
AS SELECT ... FROM ... WHERE ...;

Now you know the basics for creating materialized views in an Oracle database when needed. There is still more to learn about them; you can find the full reference in the Oracle documentation.

Grails Domain update optimisation

As many readers may know, we have been developing and maintaining some Grails applications for more than 10 years now. One of the main selling points of Grails is its domain model and object-relational mapper (ORM) called GORM.

In general ORMs are useful for easy and convenient development at the cost of a bit of performance and flexibility. One of the best features of GORM is the availability of several flexible APIs for use-cases where dynamic finders are not enough. Let us look at a real-world example.

The performance problem

In one part of our application we have personal messages that are marked as read after viewing. For some users there can be quite a lot of messages, so we implemented a “mark all as read” feature. The naive implementation looks like this:

def markAllAsRead() {
    def user = securityService.loggedInUser
    def messages = Messages.findAllByAddresseeAndRead(user, false)
    messages.each { message ->
        message.read = true
        message.save()
    }
    Messages.withSession { session -> session.flush()}
 }

While this is both correct and simple, it only works well for a limited number of messages per user. Performance will degrade because all matching rows are loaded into domain objects, then modified and saved one by one to the session. Finally the session is persisted to the database. In our use case this could take several seconds, which is much too long for a good user experience.

DetachedCriteria to the rescue

GORM offers a much better solution for such use-cases that does not sacrifice expressiveness: a succinct API called “Where Queries” that creates DetachedCriteria and supports batch updates.

def markAllAsRead() {
    def user = securityService.loggedInUser
    def messages = Messages.where {
        read == false
        addressee == user
    }
    messages.updateAll(read: true)
}

This implementation takes only a few milliseconds to execute with the same dataset as above, which is de facto native SQL performance.
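
Under the hood such a batch update boils down to a single SQL UPDATE statement, roughly like the following sketch; the actual table and column names depend on your GORM mapping:

UPDATE messages
   SET read = true
 WHERE addressee_id = ?
   AND read = false;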

Conclusion

Before cursing GORM for bad performance one should have a deeper look at the alternative querying APIs like Where Queries, Criteria, DetachedCriteria, SQL Projections and Restrictions to enhance your ORM toolbox. Compared to dynamic finders and GORM-methods on domain objects they offer better composability and performance without resorting to HQL or plain SQL.

Partitioning in Oracle Database: Because Who Wants to Search an Endless Table?

As data volumes continue to grow, managing large database tables and indexes can become a challenge. This is where partitioning comes in. Partitioning is a feature of database systems that allows you to divide large tables and indexes into smaller, more manageable parts, known as partitions. This can improve the performance and manageability of your database. Aside from performance considerations, maintenance operations, such as backups and index rebuilds, can become easier by allowing them to be performed on smaller subsets of data.

This is achieved by reducing the amount of data that needs to be scanned during query execution. When a query is executed, the database can use the partitioning information to skip over partitions that do not contain the relevant data, instead of having to scan the entire table. This reduces the amount of I/O required to execute the query, which can result in significant performance gains, especially for large tables.

There are several types of partitioning available in Oracle Database, including range partitioning, hash partitioning, list partitioning, and composite partitioning. Each type of partitioning is suited to different use cases and can be used to optimize the performance of your database in different ways. In this blog post we will look at range partitioning.

Range partitioning

Here is an example of range-based partitioning in Oracle:

CREATE TABLE books (
  id NUMBER,
  title VARCHAR2(200),
  publication_year NUMBER
)
PARTITION BY RANGE (publication_year) (
  PARTITION p_before_2000 VALUES LESS THAN (2000),
  PARTITION p_2000s VALUES LESS THAN (2010),
  PARTITION p_2010s VALUES LESS THAN (2020),
  PARTITION p_after_2020 VALUES LESS THAN (MAXVALUE)
);

In this example, we have created a table called books that stores book titles, partitioned by the year of publication. We have defined four partitions, p_before_2000, p_2000s, p_2010s, and p_after_2020.

Now, when we insert data into the books table, it will automatically be placed in the appropriate partition based on the year of publication:

INSERT INTO books (id, title, publication_year)
  VALUES (1, 'Nineteen Eighty-Four', 1949);

This book will be inserted into partition p_before_2000, as the year of publication is before 2000. The following book will be placed into partition p_2000s:

INSERT INTO books (id, title, publication_year)
  VALUES (2, 'The Hunger Games', 2008);

When we query the books table, the database will only access the partitions that contain the data we need. For example, if we want to retrieve data for books published in 2015 and 2016, the database will only access partition p_2010s.

SELECT * FROM books WHERE publication_year>=2015 AND publication_year<=2016;
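
You can also address a single partition explicitly if needed, for example:

SELECT * FROM books PARTITION (p_2010s);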

However, you should be aware that while partitioning can improve query performance for some types of queries, it can also negatively impact query performance for others, especially if the partitioning scheme does not align well with the query patterns. Therefore, you should tailor the partitioning to your needs and check if it brings the desired effect.