Unveiling the secrets of invisible database columns

After my last blog post, where I wrote about Generated and Virtual Columns, I would like to dedicate this post to another type of database column: Invisible Columns. This feature has existed in MySQL since version 8.0.23 and in Oracle Database since version 12c. PostgreSQL and MS SQL Server do not support it.

Invisible columns, as the name suggests, are columns within a table that are hidden from standard query results by default. Unlike traditional columns that are visible and accessible in query results, invisible columns are not included unless explicitly specified in the query.

This feature provides a level of control over data visibility, allowing developers to hide certain columns from applications or other database users while still retaining their functionality within the database.

Defining invisible columns

When creating a table in MySQL or Oracle, you can designate certain columns as invisible by using the INVISIBLE keyword in the column definition. For example:

CREATE TABLE your_table (
  visible_column   INT,
  invisible_column INT INVISIBLE
);

In this example, the invisible_column is marked as invisible, while the visible_column remains visible by default. To alter an existing table and make a column invisible:

ALTER TABLE your_table
  MODIFY COLUMN existing_column_name INT INVISIBLE;

Replace your_table with the name of your table and existing_column_name with the name of the column you want to make invisible. Note that in MySQL, MODIFY COLUMN repeats the full column definition including the data type (here INT as an example).
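In Oracle, the data type does not have to be repeated; a column's visibility can be changed with a plain MODIFY clause, for example:

ALTER TABLE your_table MODIFY (existing_column_name INVISIBLE);

-- make the column visible again
ALTER TABLE your_table MODIFY (existing_column_name VISIBLE);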

When querying your_table, the invisible column will not be included in the result set unless explicitly specified:

SELECT * FROM your_table;

visible_column
--------------
   4
   8
  15

By default, invisible columns are hidden from query results, providing a cleaner and more concise view of the data. However, developers can still access invisible columns when needed by explicitly including them in the query:

SELECT visible_column, invisible_column FROM your_table;

visible_column | invisible_column
---------------------------------
   4           |   16
   8           |   23
  15           |   42

Unveiling invisible columns

To list the invisible columns of a table in MySQL, you can query the information_schema.columns system table and filter on the EXTRA column, which contains INVISIBLE for invisible columns. Here’s a simple SQL query to accomplish this:

SELECT COLUMN_NAME
  FROM information_schema.columns
  WHERE TABLE_SCHEMA = 'your_database'
    AND TABLE_NAME = 'your_table'
    AND EXTRA LIKE '%INVISIBLE%';

In Oracle, you can query the USER_TAB_COLUMNS or ALL_TAB_COLUMNS data dictionary views to list the invisible columns of a table. Here’s how you can do it:

SELECT COLUMN_NAME
  FROM USER_TAB_COLUMNS
  WHERE TABLE_NAME = 'your_table'
    AND INVISIBLE = 'YES';

If you want to list invisible columns from all tables accessible to you, not just those in your own schema, you can use the ALL_TAB_COLUMNS view instead:

SELECT TABLE_NAME, COLUMN_NAME
  FROM ALL_TAB_COLUMNS
  WHERE INVISIBLE = 'YES';

Are invisible columns actually useful?

Invisible columns can make schema evolution easier by providing a flexible mechanism for evolving database schemas over time without disrupting existing applications or queries. You can test new features or data structures without committing to them fully: invisible columns provide a way to add experimental columns to your tables without exposing them to existing applications and queries until they are fully tested and ready for use.
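For example, a new column could be introduced invisibly first and only made visible once all consumers are prepared for it. A small sketch in MySQL syntax, with made-up table and column names:

-- hypothetical table and column, for illustration only
ALTER TABLE orders ADD COLUMN delivery_notes VARCHAR(255) INVISIBLE;

-- later, once the application is ready for it
ALTER TABLE orders ALTER COLUMN delivery_notes SET VISIBLE;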

They can create cleaner and more concise views of your data by hiding less relevant columns. This can make it easier for developers, analysts, and users to work with the data without unnecessary clutter. However, I would argue that this is also achievable with normal database views.

The downside of introducing invisible columns is that they add complexity to the database schema, which can make it harder to understand and maintain, especially for developers who are not familiar with the invisible columns feature. They also add potential for confusion: Developers may forget about the presence of invisible columns, leading to unexpected behavior in queries or applications.

You probably shouldn’t use them to hide sensitive data, since invisible columns don’t have any additional access control, and security through obscurity is not a good idea. If you grant SELECT permission on the table to a user, they will be able to query visible and invisible columns alike.

Now that you know about them, you can make your own choice.

Half table, half view: Generated Columns

Anyone familiar with SQL database basics knows the two fundamental structures of relational databases: tables and views. Data is stored in tables, while views are virtual tables calculated on-the-fly from a SQL query. Additionally, relational database management systems often support materialized views, which, like views, are based on a query from other tables, but their results are actually persisted and only recalculated as needed.

What many don’t know is that the most common SQL databases (PostgreSQL, MySQL, Oracle) nowadays also support something in between: we’re talking about Generated Columns, which will be introduced in this blog post.

So, what are Generated Columns? Generated Columns are columns within a normal database table. But unlike regular columns, their values are not stored as independent individual values; rather, they are computed from other values in the table.

Below is an example of how to define a Generated Column. The example is for PostgreSQL, but the syntax is similar in other popular relational database systems that support this feature.

CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(100),
quantity INTEGER,
unit_price DECIMAL(10, 2),
total_price DECIMAL(10, 2) GENERATED ALWAYS
AS (quantity * unit_price) STORED
);

As seen above, a Generated Column is defined with the keywords GENERATED ALWAYS AS. In some database systems (e.g. MySQL), GENERATED ALWAYS is even optional (you could just write AS), but it clarifies what it’s about. Following AS is the expression that computes the value.

At the end of the column definition, either the keyword STORED or VIRTUAL can be used, depending on what the database system supports (PostgreSQL, for example, currently implements only STORED Generated Columns, while MySQL also offers virtual ones). In the example above, it says STORED, which means the computed value is physically stored on disk. The value is recalculated and stored only after an INSERT or UPDATE. In contrast, with VIRTUAL, the value is not stored but always computed on-the-fly. Thus, virtual Generated Columns behave similarly to a view, while STORED is more comparable to a materialized view.

The choice between the two options depends on the specific requirements. Stored Generated Columns consume more disk space, while virtual Generated Columns save space at the expense of performance.
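In MySQL, for instance, the same column from the example above could be declared as a virtual Generated Column. A sketch of what that could look like:

-- MySQL sketch of the products table from above, with a VIRTUAL generated column
CREATE TABLE products (
  id          INT AUTO_INCREMENT PRIMARY KEY,
  name        VARCHAR(100),
  quantity    INT,
  unit_price  DECIMAL(10, 2),
  total_price DECIMAL(10, 2) GENERATED ALWAYS
    AS (quantity * unit_price) VIRTUAL
);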

In the expression following AS, other columns of the table can be referenced. Even other Generated Columns can be referenced, as long as they are specified in the table definition before the current column. However, SQL subqueries cannot be used in the expression.
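To illustrate how a Generated Column behaves, here is a short usage sketch for the products table defined above (the inserted values are made up):

INSERT INTO products (name, quantity, unit_price)
  VALUES ('Notebook', 3, 2.50);

SELECT name, quantity, unit_price, total_price FROM products;

name     | quantity | unit_price | total_price
-----------------------------------------------
Notebook |        3 |       2.50 |        7.50

Note that total_price cannot be assigned directly in the INSERT; it is always computed from quantity and unit_price.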

In conclusion, Generated Columns are a useful feature that combines parts of a table with the benefits of a view.

SQL Database Window Functions

Window functions allow users to perform calculations across a set of rows that are somehow related to the current row. This can include calculations like running totals, moving averages, and ranking without the need to group the entire query into one aggregate result.

Despite their flexibility, window functions are sometimes underutilised, either because users are unaware of them or because they’re considered too complex for everyday tasks. Learning how to effectively use window functions can improve the efficiency and readability of SQL queries, particularly for reporting and data analysis purposes. This article will explore several use cases.

Numbering Rows

The simplest application area for window functions is the numbering of rows. The ROW_NUMBER() function assigns a unique number to each row within the partition of a result set. The numbering is sequential and starts at 1. It’s useful for creating a unique identifier for rows within a partition, even when the rows are identical in terms of data.

Consider the following database table of library checkouts:

book                 | checkout_date | member_id
--------------------------------------------------
The Great Adventure  | 2024-02-15    | 102
The Great Adventure  | 2024-01-10    | 105
Mystery of the Seas  | 2024-01-20    | 103
Mystery of the Seas  | 2024-03-01    | 101
Journey Through Time | 2024-02-01    | 104
Journey Through Time | 2024-02-18    | 102

We want to assign a unique row number to each checkout instance for every book, ordered by the checkout date to analyze the circulation trend:

SELECT
  book,
  checkout_date,
  member_id,
  ROW_NUMBER() OVER (PARTITION BY book ORDER BY checkout_date) AS checkout_order
FROM library_checkouts;

The result:

book                 | checkout_date | member_id | checkout_order
-------------------------------------------------------------------
The Great Adventure  | 2024-01-10    | 105       | 1
The Great Adventure  | 2024-02-15    | 102       | 2
Mystery of the Seas  | 2024-01-20    | 103       | 1
Mystery of the Seas  | 2024-03-01    | 101       | 2
Journey Through Time | 2024-02-01    | 104       | 1
Journey Through Time | 2024-02-18    | 102       | 2

Ranking

In the context of SQL and specifically regarding window functions, “ranking” refers to the process of assigning a unique position or rank to each row within a partition of a result set based on a specified ordering.

The RANK() function provides a ranking for each row within a partition, with gaps in the ranking sequence when there are ties. It’s useful for ranking items that have the same value.

Consider the following database table of scores in a game tournament:

player  | game           | score
----------------------------------
Alice   | Space Invaders | 4200
Bob     | Space Invaders | 5700
Charlie | Space Invaders | 5700
Dana    | Donkey Kong    | 6000
Eve     | Donkey Kong    | 4800
Frank   | Donkey Kong    | 6000
Alice   | Asteroids      | 8500
Bob     | Asteroids      | 9300
Charlie | Asteroids      | 7600

We want to rank the players within each game based on their score, with gaps in rank for ties:

SELECT
  player,
  game,
  score,
  RANK() OVER (PARTITION BY game ORDER BY score DESC) AS rank
FROM scores;

The result looks like this:

player  | game           | score | rank
-----------------------------------------
Bob     | Space Invaders | 5700  | 1
Charlie | Space Invaders | 5700  | 1
Alice   | Space Invaders | 4200  | 3
Dana    | Donkey Kong    | 6000  | 1
Frank   | Donkey Kong    | 6000  | 1
Eve     | Donkey Kong    | 4800  | 3
Bob     | Asteroids      | 9300  | 1
Alice   | Asteroids      | 8500  | 2
Charlie | Asteroids      | 7600  | 3

If you don’t want to have gaps in the ranking sequence when there are ties, you can substitute DENSE_RANK() for RANK().
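As a small sketch, here is the query from above with DENSE_RANK() instead of RANK(); Alice would then be ranked 2 in Space Invaders instead of 3:

SELECT
  player,
  game,
  score,
  DENSE_RANK() OVER (PARTITION BY game ORDER BY score DESC) AS dense_rank
FROM scores;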

Cumulative Sum

The SUM() function can be used as a window function to calculate the cumulative sum of a column over a partition of rows.

Example: We are tracking our garden’s vegetable harvest in a database table, and we want to calculate the cumulative yield for each type of vegetable over the harvesting season.

vegetable | harvest_date | yield_kg
------------------------------------
Carrots   | 2024-06-18   | 10
Carrots   | 2024-07-10   | 15
Tomatoes  | 2024-06-15   | 20
Tomatoes  | 2024-07-01   | 30
Tomatoes  | 2024-07-20   | 25
Zucchini  | 2024-06-20   | 15
Zucchini  | 2024-07-05   | 20

We calculate the running total (cumulative yield) for each vegetable type as the season progresses, using the SUM() function:

SELECT
  vegetable,
  harvest_date,
  yield_kg,
  SUM(yield_kg) OVER (PARTITION BY vegetable ORDER BY harvest_date ASC) AS cumulative_yield
FROM garden_harvest;

Now we can see which vegetables are most productive and how yield accumulates throughout the season:

vegetable | harvest_date | yield_kg | cumulative_yield
--------------------------------------------------------
Carrots   | 2024-06-18   | 10       | 10
Carrots   | 2024-07-10   | 15       | 25
Tomatoes  | 2024-06-15   | 20       | 20
Tomatoes  | 2024-07-01   | 30       | 50
Tomatoes  | 2024-07-20   | 25       | 75
Zucchini  | 2024-06-20   | 15       | 15
Zucchini  | 2024-07-05   | 20       | 35

Naming table columns in user interface

A few days ago, I had a conversation with a customer regarding naming. They had created a file containing definitions for numerous tables and their corresponding column names for a new user interface. Some names consisted of entire sentences, with words like “from” and “to”.

I found myself dissatisfied with some of these names and thought about why I don’t like them and how to make them nicer.

Guidelines for simplifying names

To me, column names should resemble key points rather than complete sentences. So I’ve compiled a few guidelines that can help you in transforming sentences to key points.

  1. Eliminate filler words: Remove words like verbs and adjectives if they don’t carry relevant information.
  2. Remove articles.
  3. Remove words without additional information. For example, if the information is already included in another word.
  4. Remove information already included in the table name.
  5. Sometimes it makes sense to change the order of the remaining words.

An example

In our example, the table name is “CW 39” and the column name is “The only day of Monday of CW 39 before CW 40”.

1. Remove all filler words: 
The information about there being only one Monday per week is widely known and unnecessary here. Therefore, the new column name is:
The day of Monday of CW 39 before CW 40

2. Remove articles:
The article “The” can be removed. So the new column name is: 
Day of Monday CW 39 before CW 40

3. Remove words without additional information:
The information that it’s about a day is already part of the word ‘Monday’. It’s also obvious that CW 39 comes before CW 40. So the new column name is: 
Monday CW 39

4. Remove information already included in the table name:
The table name “CW 39” already tells the reader that all columns contain information on this CW. So the new column name is:
Monday

It is much better to read, isn’t it?

Conclusion

After the reformatting, the table became significantly smaller. The names are easier to read, and the understanding is faster as there’s no need to decipher the true meaning within a lengthy sentence.

Thus, it’s a huge advantage to keep the names as short as possible without losing essential information.

Time travel with Oracle database’s Flashback Queries

Oracle’s database management system offers a feature known as Flashback Queries, allowing users to peek into the past and retrieve data as it existed at a previous point in time. This functionality can eliminate the need for manual data restoration from backups, making it a useful asset for both developers and database administrators.

Enabling Flashback Queries

Before using Flashback Queries, ensure that the database has the required configuration. Firstly, confirm that the database’s DB_FLASHBACK_RETENTION_TARGET parameter is appropriately set. This parameter defines the period for which historical data is retained. Adjust it based on your organization’s data retention policies. Before making changes, you can check its current value:

SHOW PARAMETER DB_FLASHBACK_RETENTION_TARGET;

Use the ALTER SYSTEM command to set the parameter. For example, to set it to retain data for 7 days:

ALTER SYSTEM SET DB_FLASHBACK_RETENTION_TARGET=10080 SCOPE=BOTH;

10080 is the retention period in minutes (7 days × 24 hours × 60 minutes). Please note that setting the parameter to a higher value consumes more space in the fast recovery area, so consider your storage constraints. SCOPE=BOTH ensures that the change persists across database restarts, i.e. it changes the value both in memory and in the server parameter file.

To enable Flashback Queries for a specific table, execute the ALTER TABLE command with the FLASHBACK option:

ALTER TABLE table_name FLASHBACK ARCHIVE;

This setup allows Oracle to maintain historical changes for the specified table.

Using Flashback Queries

Consider a scenario where an employee accidentally updates critical data in the employees table. With Flashback Queries, you can rectify the mistake:

SELECT * FROM employees AS OF TIMESTAMP TO_TIMESTAMP('2023-11-10 15:00:00', 'YYYY-MM-DD HH24:MI:SS');

This query retrieves the data from the employees table as it existed before the erroneous update.
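If the historical state is what you want back, such a query can also be used to repair the data, for example by updating the current rows from the historical result. A hedged sketch (employee_id and salary are assumed, hypothetical columns):

-- sketch: assumes the employees table has employee_id and salary columns
UPDATE employees e
   SET salary = (SELECT h.salary
                   FROM employees AS OF TIMESTAMP
                        TO_TIMESTAMP('2023-11-10 15:00:00', 'YYYY-MM-DD HH24:MI:SS') h
                  WHERE h.employee_id = e.employee_id)
 WHERE e.employee_id = 1001;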

You can also recover dropped tables if they are still within the retention period:

FLASHBACK TABLE orders TO BEFORE DROP;

This command restores the dropped table and its data.
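Entire tables can also be rewound to an earlier point in time, provided row movement is enabled for them. A sketch:

-- FLASHBACK TABLE ... TO TIMESTAMP requires row movement to be enabled
ALTER TABLE employees ENABLE ROW MOVEMENT;

FLASHBACK TABLE employees TO TIMESTAMP
  TO_TIMESTAMP('2023-11-10 15:00:00', 'YYYY-MM-DD HH24:MI:SS');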

Flashback Queries offer a mechanism to navigate through time within a database, providing a simple way to recover historical data or inspect changes. They stand as a useful asset in the arsenal of database administrators and developers, fostering greater confidence in managing data.

PostgreSQL’s auto-explain feature and execution plans

PostgreSQL’s auto-explain is a built-in feature that automatically generates and logs execution plans for SQL statements. It’s a useful tool for developers to understand how the query planner is executing SQL queries.

You enable and configure auto-explain by setting parameters in the PostgreSQL configuration file (postgresql.conf). The module has to be loaded first, for example by adding it to session_preload_libraries or shared_preload_libraries. Set auto_explain.log_analyze to on to log execution plans along with run-time statistics, and set auto_explain.log_min_duration to specify the minimum execution time in milliseconds that a query must take to be logged. For example, if you want to log queries taking longer than 100 milliseconds, set it to 100. Set auto_explain.log_buffers to on if you want to include information about buffer usage, and auto_explain.log_timing to log per-node timing information.

Here’s an example of how to configure these parameters in postgresql.conf:

session_preload_libraries = 'auto_explain'

auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_min_duration = 100
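If you only want to experiment in a single session (as a superuser), the module can also be loaded ad hoc instead of editing postgresql.conf. A minimal sketch:

LOAD 'auto_explain';
SET auto_explain.log_analyze = on;
SET auto_explain.log_min_duration = 100;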

Reading the execution plan

Suppose you have a “recipe database” that stores recipes, ingredients, and chefs. You want to retrieve a list of recipes along with the names of the chefs who created them and the ingredients they contain. Here’s a query that accomplishes this:

SELECT recipes.recipe_name, chefs.chef_name, ingredients.ingredient_name
  FROM recipes
  JOIN chefs ON recipes.chef_id=chefs.chef_id
  JOIN recipe_ingredients ON recipes.recipe_id=recipe_ingredients.recipe_id
  JOIN ingredients ON recipe_ingredients.ingredient_id=ingredients.ingredient_id
WHERE recipes.cuisine='Italian';

This query fetches Italian recipes, their respective chefs, and the ingredients they use.

When you run this query with auto-explain enabled, PostgreSQL will log the execution plan. The query plan might look something like this:

Hash Join  (cost=100.25..350.75 rows=50 width=96)
  Hash Cond: (recipe_ingredients.recipe_id = recipes.recipe_id)
  ->  Hash Join  (cost=50.12..200.37 rows=50 width=60)
        Hash Cond: (recipes.chef_id = chefs.chef_id)
        ->  Seq Scan on recipes  (cost=0.00..100.00 rows=50 width=24)
              Filter: (cuisine = 'Italian'::text)
        ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
              ->  Seq Scan on chefs  (cost=0.00..30.00 rows=1000 width=36)
  ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
        ->  Seq Scan on recipe_ingredients  (cost=0.00..30.00 rows=1000 width=36)
              Filter: (recipe_id IS NOT NULL)

In this query plan Hash Join indicates a join operation using a hash-based algorithm. Seq Scan signifies a sequential scan of the table, which might imply a full table scan. Hash Cond shows the join condition for the respective hash join.

cost represents the estimated execution cost for each operation, and rows indicates the estimated number of rows returned by each operation.

The estimated cost in PostgreSQL query execution plans is typically represented in an abstract unit known as “cost units.” These cost units are used for relative cost estimation and are not expressed in any specific real-world measurement like time or money. They are designed to provide a relative measure of the cost of different query plan operations so that the query planner can make informed decisions about which plan to choose.

Reading this plan, PostgreSQL starts by filtering Italian recipes (a Seq Scan with a filter). It then joins the recipes with chefs using a hash join, and the result is further joined with ingredients using another hash join. The cost values provide relative estimates of resource usage, allowing you to identify potentially expensive parts of the query, and you can consider improving the performance of the SQL statement with optimisations like indexing.
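For example, if only a small fraction of the recipes is Italian, an index on the cuisine column may allow PostgreSQL to replace the sequential scan on recipes with an index scan. A sketch against the hypothetical schema above (the index name is made up):

CREATE INDEX idx_recipes_cuisine ON recipes (cuisine);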

Porting an application from Oracle to PostgreSQL

Currently, we are in the process of porting a larger application from an Oracle database to a PostgreSQL database as an additional database system. The aim is for the application to be functional not only with one database system but with both: with Oracle for one customer and with PostgreSQL for another customer.

The first step in this process was to conduct a high-level analysis of what effort this would entail and where the application would need to be adjusted. The application consists of multiple services: a frontend service in Ruby on Rails and a variety of backend services in Java. Rails uses an O/R mapper called Active Record, which is relatively database-agnostic. No major adaptation effort was expected here. Ultimately, only the Active Record adapter needs to be configured per instance.

The Java services do not use an O/R mapper. They make direct SQL queries to the target database system using JDBC. This is where the greatest potential for issues existed. Fortunately, all database queries here are encapsulated in repository classes, making the SQL queries easily accessible and adaptable for different database management systems. However, this has not proven necessary so far. The SQL queries are written in standard SQL, without using database-specific features. The code is limited to basic features such as tables, views, sequences, SELECTs, INSERTs, UPDATEs, and JOINS, all of which are supported by both databases. Particularly, no stored procedures are used, which wouldn’t necessarily be portable.

For implementation, the initial goal was to make the existing Oracle database schema of the application reproducible on a PostgreSQL database, in a way that a new instance can be set up fresh in a Docker container at any time.

The best way to achieve this is with a database migration tool. The options considered were: Active Record Migrations, Liquibase, and Flyway. We ultimately chose Liquibase, which we have had positive experiences with. It can be used standalone and allows for formulating migrations in a database-independent manner. Additionally, it offers the option to export an existing schema as a migrations changelog file using the generate-changelog command.

We only had to slightly adjust the generated changelog to make it work for both database systems. In a Liquibase changelog, you can define variables referred to as properties, which can take on different values for various database systems. One use case for this is having different names for column data types:

<property name="varchar" value="varchar" dbms="postgresql"/>
<property name="varchar" value="VARCHAR2" dbms="oracle"/>

These properties can then be used, for example, in the following way:

<column name="DESCRIPTION" type="${varchar}(4000)"/>

Here’s a helpful table that maps Oracle datatypes to their equivalent PostgreSQL data types. After some editing of the changelog, mainly through search and replace, we had a migration for the database schema that works for both systems.

Oracle database identity column

A few days ago, I came across the topic of identity columns in Oracle. So I decided to try it out to check if it would benefit my database tables.  I would like to share the results in this article. First, I will explain what identity columns are and how to create them.

General

If you declare a column as an identity column, the value will be automatically populated from an associated sequence generator. It is possible to add an identity column by using either the CREATE TABLE or ALTER TABLE statement. In the example below, I will show how to add it using the CREATE TABLE statement.

The column marked as an identity needs to be of a numeric data type, such as NUMBER or INTEGER.

CREATE TABLE EXAMPLE (
    id NUMBER GENERATED ALWAYS AS IDENTITY,
    ...
);

In this case, I ALWAYS want a value to be automatically filled in. There is also the option to only fill it if no value was specified (BY DEFAULT) or when the given value is null (BY DEFAULT ON NULL).
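A small sketch of the BY DEFAULT ON NULL variant (table and column names are made up): explicit values are accepted, and the sequence only steps in when no value or NULL is supplied.

CREATE TABLE EXAMPLE_BY_DEFAULT (
    id   NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
    name VARCHAR2(100)
);

-- id is generated automatically
INSERT INTO EXAMPLE_BY_DEFAULT (name) VALUES ('generated id');

-- an explicitly supplied id is kept as-is
INSERT INTO EXAMPLE_BY_DEFAULT (id, name) VALUES (42, 'explicit id');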

Additionally, you can configure the sequence as shown in the following example.

CREATE TABLE EXAMPLE (
    id NUMBER GENERATED ALWAYS AS IDENTITY
    (START WITH 1 INCREMENT BY 2 MAXVALUE 5000 NOCYCLE CACHE 100),
    ...
);

With these settings, the sequence starts with 1 and increments by 2, so the values are 1, 3, 5, and so on. The sequence counts up to the MAXVALUE of 5000 and then stops. If CYCLE is set instead of NOCYCLE, the sequence continues with 1 again.

The cache setting causes a client to request 100 pre-allocated values in advance. Because of the step width of 2, the first client is, for example, assigned the values 1 – 199. If a second client asks, it receives the values 201 – 399. As a result, the entries in the table can have the IDs 1, 201, 3, 5, 7, 203, for example.

Using it with Ruby on Rails / Active Record

My use case is to use the table in Ruby on Rails with Active Record. Here, I want Active Record to auto-increment the ID so I can retrieve it in my program. However, if I do not use the identity column feature and also do not specify the name of a manually created sequence, Ruby always returns 0 as the ID. When I add a sequence name, I get the correct ID.

Now I am trying to change this to work with the identity column. The insert in Oracle directly with SQL works fine.

The next step is to use it in my Ruby program, and the first problem occurs. Unfortunately, it is not possible to define my own sequence name. Oracle automatically creates sequences with cryptic names like ISEQ$$_12345 for identity columns, which reduces readability in code. Additionally, if I recreate the table, the sequence name changes, which means I have to update my code afterwards. This seems to be the first showstopper.
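At least the system-generated sequence behind an identity column can be looked up in the data dictionary; a small sketch:

SELECT TABLE_NAME, COLUMN_NAME, SEQUENCE_NAME
  FROM USER_TAB_IDENTITY_COLS;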

Nonetheless, I continue and try to create a new entry with it. The entry is created. It works! Or wait a moment: Ruby again shows 0 instead of the right ID. It seems like the program cannot use the sequence correctly. I halt my testing. Maybe it is possible to make Ruby work with identity sequences, maybe it is not. If you have already found a solution, feel free to comment.

In the end, I decide that using a manually created sequence still fits better. The main reason is the naming, and I do not want to spend more time searching for a solution to make the identity column work with Ruby.

Conclusion

Although the identity column option does not fit my use case, it is a possible way to create a table column with auto-generated values in Oracle. For situations where the ID is not of interest to the application, it can simplify the work: you do not have to create the logic with sequences and triggers on your own.

Table inheritance in PostgreSQL

In a previous post I have written about tables as data types in PostgreSQL. In addition to that, PostgreSQL has a feature similar to the inheritance mechanism of data types in object-oriented programming: table inheritance.

Table inheritance allows you to create a hierarchical relationship between tables. With this feature you define a parent table, and child tables inherit columns and some constraints (like CHECK constraints and NOT NULL constraints) from it.

How it works

To begin, we create the parent table products using the following SQL code:

CREATE TABLE products (
  id     SERIAL         PRIMARY KEY,
  name   TEXT           NOT NULL,
  price  DECIMAL(10, 2) NOT NULL
);

Next, let’s create child tables that inherit from the products table. We will create two product categories: electronics and clothing. Each child table will have its own specific columns in addition to the inherited columns. To set up the inheritance relation we use the INHERITS keyword:

CREATE TABLE electronics (
  warranty_period   INT,
  power_consumption INT
) INHERITS (products);

CREATE TABLE clothing (
  size  TEXT,
  color TEXT
) INHERITS (products);

Now we insert some rows into the child tables:

INSERT INTO electronics
  (name, price, warranty_period, power_consumption)
  VALUES ('Laptop', 1439.0, 2, 536);

INSERT INTO clothing
  (name, price, size, color)
  VALUES ('T-Shirt', 5.99, 'L', 'red');

When we query each child table individually, we only get the rows we inserted into it, as expected. However, when we query the parent table, we get all the products from the child tables:

SELECT * FROM products;

id | name    | price
---------------------
 1 | Laptop  | 1439.00
 2 | T-Shirt |    5.99

Note that you can also insert rows into the parent table:

INSERT INTO products
  (name, price)
  VALUES ('Milk', 1.15);

This row will not show up in the child tables, only when querying the parent table. If you are asking yourself now if there is an equivalent to an “abstract class”, there isn’t one. Just don’t insert into the parent table if it doesn’t make sense in your domain.

If you want to query rows only from the parent table you can use the ONLY keyword:

SELECT name, price FROM ONLY products;

This keyword also works with other commands like UPDATE and DELETE.
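For example (a quick sketch using the tables from above):

-- affects only rows stored directly in the parent table
UPDATE ONLY products SET price = 1.25 WHERE name = 'Milk';

DELETE FROM ONLY products WHERE name = 'Milk';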

On a final note, you can also remove the inheritance relationship from a child table with the NO INHERIT clause:

ALTER TABLE clothing NO INHERIT products;

Developing for Cordova + SQLite in a standard Browser environment

As a developer, don’t you just love it when a product that has grown over the years suddenly needs to target a new platform (e.g. a different operating system) because customer demands changed, some dependency broke or some other totally unexpected thing called “progress” happened?

Fortunately, there are some approaches to cross-platform development, and if one expects such a change of direction, one can adopt a suitable runtime environment early on, such as Apache Cordova or Capacitor/Ionic, which all promise a write-once-run-anywhere experience, decoupling the application logic from the lower-level OS interactions.

Unfortunately though, this promise is a total lie: usually, soon after starting such a totally platform-agnostic project, you will want to use a dependency that only works on one platform, and then your options are limited.

One such example is a Cordova project we are currently moving from Android to iOS, and in that process we are also building a nice, modern frontend to replace a very outdated (read: unmaintainable) Vanilla JS application. We have set it up smoothly (React + Vite + Typescript – you name it!), and since we technically do not need anything iOS-specific yet, we can work on our redesign in a pure browser environment with hot reloading and the like – life is good!

Then comes the realization that our application is quite data heavy and uses an on-device SQL database to persist its data, and we don’t have that in the browser – so, life turned bad.

What to do? There once was a client-side WebSQL database specification, but it was unofficial, never fully implemented, and abandoned in 2010. It is still present in Chrome, but its removal has already been announced, so this is not a future-proof way to go.

We crave a smooth flow of development.

  • It is not an option to re-build the app at every change.
  • It is not an option to have the production system use its SQLite DB and the development environment to use a totally different one like IndexedDB – certain SQLite queries are too ingrained in our application.
  • It’s only probably an option to use an experimental technology like absurd-sql, which aims to fill that gap, but then again needs advanced API features like Web Workers, SharedArrayBuffer and the Atomics API, which we wouldn’t require otherwise.
  • It is possible to use in-memory SQLite via sql.js, but for persistence it wasn’t instantly obvious to me how to couple that with the partially supported Origin Private File System API.

So after all, this is the easiest solution that still gave me back most of my developer smoothness: use sql.js in memory, and for development display two nice buttons in the UI which let me download the whole DB and upload one from a file again. This is the sketch:

We create a CombinedDatabase class which, depending on the environment, can hand out such a database in a Singleton-like manner:

class CombinedDatabase {

    // This is the Singleton-part

    private static instance: CombinedDatabase;

    public static get = async (): Promise<CombinedDatabase> => {
        if (!this.instance) {
            const {db, type} = await this.createDatabase();
            this.instance = new CombinedDatabase(db, type);
        }
        return this.instance;
    };

    private static createDatabase = async () => {
        if (inProductionEnvironment()) {
            return {
                db: createCordovaSqliteInstance(),
                type: "CordovaSqlite"
             };
        } else {
            const sqlWasmUrl = (await import("../assets/sql-wasm.wasm?url")).default;
            // we extend the window object for reasons I tell you below
            window.sqlJs = await initSqlJs({locateFile: () => sqlWasmUrl});
            const db = new window.sqlJs.Database();
            return {db, type: "InMemory"};
        }
    }


    // This is the actual flesh, i.e. a switch of which API to use

    private readonly type: string;
    private cordovaSqliteDb: SQLitePlugin.Database | null = null;
    private inMemorySqlJsDb: SqlJsDatabase | null = null;

    private constructor(db: SQLitePlugin.Database | SqlJsDatabase, type: string) {
        this.type = type;
        switch(type) {
            case "CordovaSqlite":
                this.cordovaSqliteDb = db as SQLitePlugin.Database;
                break;
            case "InMemory":
                this.inMemorySqlJsDb = db as SqlJsDatabase;
                break;
            default:
                throw Error("Invalid CombinedDatabase type: " + type);
        }
    }

   // ... and then there are some methods

}

(This is simplified – in the actual code, type is an enum for me, and there’s also error handling, but you know – not the point here).

This structure is nice because you can now implement low-level methods like some executeQuery(...) etc., which just decide, depending on the type, which of the private DB instances to address and, even if those work differently, return a unified response format.

The rest of our application does not know anything about any Cordova-SQLite-dependency, or sql.js, or whatever. Life is good again.

So how do Import / Export work?

I gave the CombinedDatabase some interfacing methods, similar to these:

    public async export() {
        switch (this.type) {
            case "CordovaSqlite":
                throw Error("Not implemented for cordova-sqlite database");
            case "InMemorySqlJs":
                return this.inMemorySqlJsDb!.export();
            default:
                throw Error("DB not initialized, cannot export.");
        }
    }

    public async import(binaryData: Uint8Array) {
        if (this.type !== "InMemory") {
            throw Error("DB import only implemented for the in-memory/sql.js database, this is a DEVELOPMENT feature!");
        }
        await this.close();
        this.inMemorySqlJsDb = new window.sqlJs.Database(binaryData);
    }

This is also the reason why I monkey-patched the window object earlier, so I still have this API around outside the Singleton instantiation (createDatabase). Yes, this is a global variable and a kind of hack, but imo this is something that can safely be done inside the browser, within good measure.

Remember, in Typescript you need to declare this e.g. in some global.d.ts file

import {SqlJsStatic} from "sql.js";

declare global {
    interface Window {
        sqlJs?: SqlJsStatic
    }
}

Or go around the Window interface by casting (window as any).sqlJs – you decide what you prefer.

Anyway, the export() functionality can then be used quite handily: it returns the in-memory database as a binary array, and you can make the browser download it via a Blob URL:

api.db.export().then((array: Uint8Array) => {
    const blob = new Blob([array], {type: "application/x-sqlite3"});
    const link = document.createElement("a");
    link.href = URL.createObjectURL(blob);
    link.download = `bonpland${Date.now()}.db`;
    link.target = "_blank";
    link.click();
});

And similarly, you can use import() by reading a Uint8Array from a temporary <input type="file"> element with a FileReader() (somewhat common solution, but just comment below if you want the details).

To be exact, I don’t even use the import() button anymore because I pass my development DB as an asset to the dev server. This is nice (and only takes a few seconds on hot reloading because our DB is like 50 MB in size), but somewhat Vite-specific, which is why I will postpone this topic to some later blog time.