Working with JSON data in Oracle databases

In my last post I showed how to work with JSON data in PostgreSQL. This time I want show how it is done with an Oracle database for comparison. I will use the same example scenario: a table named “events” where application events are stored in JSON format.

JSON data types

In Oracle there is no special data type for JSON data. You can use character string datatypes like VARCHAR2 or CLOB. However, you can add a special CHECK constraint to a column in order to ensure that only valid JSON is inserted:

CREATE TABLE events (
  datetime TIMESTAMP NOT NULL,
  event CLOB NOT NULL
  CONSTRAINT event_is_json CHECK (event IS JSON)
);

If you try to insert something other than JSON you will get a constraint violaiton error:

INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, 'This is not JSON.');

ORA-02290: check constraint (EVENT_IS_JSON) violated

Let’s insert some valid JSON data:

INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_shelf", "payload": {"id": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_shelf", "payload": {"id": 2}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

In Oracle you use the JSON_VALUE function to select a value from a JSON structure. It uses a special path syntax for navigating JSON objects where the object root is represented as ‘$’ and properties are accessed via dot notation. This function can be used both in the SELECT clause and the WHERE clause:

SELECT JSON_VALUE(event, '$.type') AS type
  FROM events;
TYPE
add_shelf
add_book
add_book
add_shelf
SELECT event FROM events
  WHERE JSON_VALUE(event, '$.type')='add_book'
    AND JSON_VALUE(event, '$.payload.shelf')=1;
EVENT
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Constructing JSON objects

JSON objects can be constructed from values via the JSON_OBJECT and JSON_ARRAY functions:

SELECT JSON_OBJECT(
  'id' VALUE 1,
  'name' VALUE 'tree',
  'isPlant' VALUE 'true' FORMAT JSON,
  'colors' VALUE JSON_ARRAY('green', 'brown')
) FROM dual;
{"id":1,"name":"tree","isPlant":true,"colors":["green","brown"]}

Note that you have to use string values with the additional FORMAT JSON clause for boolean values.

Updating

Modifying JSON object fields has become feasible with the introduction of the JSON_MERGEPATCH function in Oracle 19c. It takes two JSON parameters:

1) the original JSON data
2) a JSON “patch” snippet that will be merged into the original JSON data. This can either add or update JSON properties.

It can be used in combination with JSON_VALUE and JSON_OBJECT. In this example we convert all the event “type” fields from lower case to upper case:

UPDATE events SET event=JSON_MERGEPATCH(
  event,
  JSON_OBJECT('type' VALUE UPPER(JSON_VALUE(event, '$.type')))
);

Oracle provides a lot more functions for working with JSON data. This post only covered the most basic ones. See the Oracle JSON reference for more.

Working with JSON data in PostgreSQL

Today most common SQL-based relational database management systems (DBMS) like PostgreSQL, MySQL, MariaDB, SQL Server and Oracle offer functionality to efficiently store and query JSON data in one form or another, with varying syntax. While a standard named SQL/JSON is in the works, it is not yet fully supported by all of these DBMS. This blog post is specific to PostgreSQL.

JSON data types

In PostgreSQL there are two data types for JSON columns: json and jsonb. The former stores JSON data as-is with any formatting preserved, while the latter stores JSON in a decomposed binary format. Operations on data in jsonb format are potentially more efficient.

We’ll use the jsonb data type in the following example to store a sequence of events, for example for an event sourcing based application, in a table.

CREATE TABLE events (date TIMESTAMP NOT NULL,
                     event JSONB NOT NULL);

JSON literals look like string literals. Let’s insert some events:

INSERT INTO events (date, event) VALUES
  (NOW(), '{"type": "add_shelf", "payload": {"id": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}'),
  (NOW(), '{"type": "add_shelf", "payload": {"id": 2}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

PostgreSQL has two operators for navigating a JSON structure: -> and ->>. The former accesses an object field by key and the latter accesses an object field as text. These operators can be used both in the SELECT clause and the WHERE clause:

SELECT event->>'type' AS type FROM events;
type
add_shelf
add_book
add_book
add_shelf
SELECT event FROM events
        WHERE event->>'type'='add_book'
          AND event->'payload'->>'shelf'='1';
event
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Note that in the example above the value of "shelf" is compared to a string literal ('1'). In order to treat the value as a number we have to use the CAST function, and then we can use numerical comparison operators:

SELECT event FROM events
        WHERE CAST(
          event->'payload'->>'shelf' AS INTEGER
        ) > 1;
event
{"type":"add_book","payload":{"shelf":2,"title":"Don Quixote"}}

Updating

Updating JSON object fields is a bit more complicated. It is only possible with the jsonb data type and can be done via the JSONB_SET function, which takes four arguments:

1) the original JSON,
2) a path specifying which object fields should be updated,
3) a jsonb value, which is the new value, and
4) a boolean flag that specifies if missing fields should be created.

In this example we convert all the event "type" fields from lower case to upper case:

UPDATE events SET event=JSONB_SET(
  event,
  '{type}',
  TO_JSONB(UPPER(event->>'type')),
  false
);

PostgreSQL provides a lot more operators and functions for working with JSON data. This post only covered the most basic ones. See the PostgreSQL JSON reference for more.

Oracle database date and time literals

For one of our projects I work with time series data stored in an Oracle database, so I write a lot of SQL queries involving dates and timestamps. Most Oracle SQL queries I came across online use the TO_DATE function to specify date and time literals within queries:

SELECT * FROM events WHERE created >= TO_DATE('2012-04-23 16:30:00', 'YYYY-MM-DD HH24:MI:SS')

So this is what I started using as well. Of course, this is very flexible, because you can specify exactly the format you want to use. On the other hand it is very verbose.

From other database systems like PostgreSQL databases I was used to specify dates in queries as simple string literals:

SELECT * FROM events WHERE created BETWEEN '2018-01-01' AND '2018-01-31'

This doesn’t work in Oracle, but I was happy find out that Oracle supports short date and timestamp literals in another form:

SELECT * FROM events WHERE created BETWEEN DATE'2018-01-01' AND DATE'2018-01-31'

SELECT * FROM events WHERE created > TIMESTAMP'2012-04-23 16:30:00'

These date/time literals where introduced in Oracle 9i, which isn’t extremely recent. However, since most online tutorials and examples seem to use the TO_DATE function, you may be happy to find out about this little convenience just like me.

Database table naming conventions

Naming things well is an important part of writing maintainable software, and renaming things once their names have become established in a code base can be tedious work. This is true as well for the names of an application database schema, where a schema change usually requires a database migration script. That’s why you should take some time beforehand to set up a naming convention.

Many applications use object-relational mappers (ORM), which have a default naming convention to map class and property names to table and column names. But if you’re not using an ORM, you should set up conventions as well. Here are some tips:

  • Be consistent. For example, choose either only plural or singular for table names, e.g. “books” or “book”, and stick to it. Many sources recommend singular for table names.
  • On abbreviations: Some database systems like Oracle have a character limit for names. The limit for Oracle database table names is 30 characters, which means abbreviations are almost inevitable. If you introduce abbreviations be consistent and document them in a glossary, for example in the project Wiki.
  • Separate word boundaries with underscores and form a hierarchy like “namespace_entity_subentity”, e.g. “blog_post_author”. This way you can sort the tables by name and have them grouped by topic.
  • Avoid unnecessary type markers. A table is still a table if you don’t prefix it with “tbl_”, and adding a “_s” postfix to a column of type string doesn’t really add useful information that couldn’t be seen in the schema browser of any database tool. This is similar to Hungarian notation, which has fallen out of use in today’s software development. If you still want to mark special database objects, for example materialised views, then you should prefer a postfix, e.g. “_mv”, over a prefix, because a prefix would mess up the lexicographic hierarchy established by the previous tip.

And the final advice: Document your conventions so that other team members are aware of them and make them mandatory.

Organize complex SQL queries with Common Table Expressions

Complex SQL database queries often contain subqueries.

SELECT * FROM ... 
   WHERE name IN (SELECT name 
         FROM ... 
         WHERE ...)

These can quickly become unreadable, especially if multiple subqueries are involved. A nice way to organise such queries with multiple subqueries is called Common Table Expressions (CTE), or colloquially: “WITH queries”. They are supported by most contemporary SQL databases.

When using Common Table Expressions you list all the subqueries after a WITH keyword in front of the actual query and assign a name to each subquery:

WITH
  <subquery1_name> AS (SELECT ...),
  <subquery2_name> AS (SELECT ...),
  [ ... ]
SELECT ...
FROM ...
[ ... ]

You can now refer to these subqueries in the main query by their names. They are conceptually just like temporary views. You can also refer to a subquery in the main query multiple times, whereas if the subquery was inlined you would have to repeat it. A subquery defined as a Common Table Expression can also refer to the preceding subqueries in the subquery list.

Recursion

A Common Table Expression can even refer to itself, which is a recursive definition. In some database systems you have to add the RECURSIVE keyword after WITH, in others you can leave it out. Here’s a recursive CTE that calculates factorial numbers:

WITH RECURSIVE factorials (n, fact) AS 
  (SELECT 0, 1
   UNION ALL
   SELECT n+1, (n+1)*fact FROM factorials
          WHERE n < 6)
SELECT * FROM factorials;
N FACT
0    1
1    1
2    2
3    6
4   24
5  120

You could also use such a recursive query to descend into a tree hierarchy.

While the use cases for recursive queries are less frequent, I find the general concept of Common Table Expressions very useful to make complex queries more readable.

Determining the sizes of Oracle database tables and indexes

For one of our projects we store large amounts of timeseries data in an Oracle database. Sometimes we want to get an overview of how big the tables and related indexes are. Some database client tools like Toad for Oracle can show this information directly in their user interface, but if you use other tools like the SQuirreL SQL Client or JetBrains DataGrip you have to gather this information yourself via SQL queries.

DBA_SEGMENTS and DBA_INDEXES

For Oracle databases this meta information is available via the DBA_SEGMENTS and DBA_INDEXES tables. To query the sizes of several tables in MB use the following query:

SELECT segment_name, segment_type, bytes/1024/1024 MB
  FROM dba_segments
  WHERE segment_type = 'TABLE'
    AND segment_name IN ('TABLE_NAME_1', 'TABLE_NAME_2');

This SQL query returns the sizes of TABLE_NAME_1 and TABLE_NAME_2.

If you want to see the sizes of all the indexes that are associated with a table or a set of tables you can use the following query:

SELECT idx.table_name, idx.index_name, SUM(bytes)/1024/1024 MB
  FROM dba_segments seg,
       dba_indexes idx
  WHERE idx.table_owner = 'SCHEMA_NAME'
    AND idx.table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2')
    AND idx.owner       = seg.owner
    AND idx.index_name  = seg.segment_name
  GROUP BY idx.index_name, idx.table_name;

Of course, you have to replace SCHEMA_NAME, and TABLE_NAME_x with the names in your database.

Unfortunately, access to this kind of meta information is different for each database system, and the queries above only work for Oracle databases.

Using PostgreSQL for time-series data

The number of sensors and other things that periodically collect data is ever growing. This advent of the  internet of things (IoT) demands a way of storing and analyzing all this so-called time-series data. There are many options for such data – the most prominent being special time-series databases like InfluxDB or well suited, nicely scaling databases like Apache Cassandra.

The problem is you have to tailor your solution to one of these technologies whereas there is SQL with mature database management systems (DBMS) and drivers/bindings for almost any programming language.

Why not use a plain SQL database?

Relational SQL databases are a mature and well understood piece of technology albeit not as sexy as all those new NoSQL databases. Using them for time series data may not be a problem for smaller datasets but sooner or later your ingestion and query performance will degrade massivly. So in general it is not a good option to store all your time-series data in a traditional relational DBMS (RDBMS).

Why use PostgreSQL with TimescaleDB?

With the PostgreSQL extension TimescaleDB you get the best of both worlds: a well known query language, robust tools and scalability.

You access and manage your time-series database just like your ordinary PostgreSQL database. Almost everything including replication and backups will continue to work like before.

You do not have to deal with limitations of specialized solutions or learn a completely new ecosystem just for one aspect of your solution.

The future

We are successfully using TimescaleDB in one of our projects and will continue to share tipps and experience with this technology taking its rising importance into account.