Contiguous date ranges in Oracle SQL

In a post a couple of weeks ago I wrote about querying gaps between non-contiguous date ranges in Oracle SQL. This week’s post is about contiguous date ranges.

While non-contiguous date ranges are best represented in a database table with a start_date and an end_date column, contiguous date ranges are better represented by a single date column. This avoids redundancy and means we do not have to keep the start date of a range in sync with the end date of the previous range. In this post I will use the start date:

CREATE TABLE date_ranges (
  name       VARCHAR2(100),
  start_date DATE
);

The example content of the table is:

NAME	START_DATE
----	----------
A	05/02/2020
B	02/04/2020
C	16/04/2020
D	01/06/2020
E	21/06/2020
F	02/07/2020
G	05/08/2020

This representation means that the date range with the most recent start date does not have an end. The application using this data model can choose whether to interpret this as a date range with an open end or just as the end point for the previous range and not as a date range by itself.

While this is a nice non-redundant representation, it is less convenient for queries where we want both a start and an end date per row, for example in order to check whether a given date lies within a date range or not. Luckily, we can transform the ranges with a query:

SELECT
  date_ranges.*,
  LEAD(date_ranges.start_date)
    OVER (ORDER BY start_date)
    AS end_date
FROM date_ranges;

As in the previous post on non-contiguous date ranges, the LEAD analytic function allows you to access the following row from the current row without using a self-join. Here’s the result:

NAME	START_DATE	END_DATE
----	----------	--------
A	05/02/2020	02/04/2020
B	02/04/2020	16/04/2020
C	16/04/2020	01/06/2020
D	01/06/2020	21/06/2020
E	21/06/2020	02/07/2020
F	02/07/2020	05/08/2020
G	05/08/2020	(null)

By using a WITH clause you can use this query like a view and join it with another table, for example with the join condition that a date lies within a date range:

WITH ranges AS (
  SELECT
    date_ranges.*,
    LEAD(date_ranges.start_date)
      OVER (ORDER BY start_date) AS end_date
  FROM date_ranges
)
SELECT timeseries.*, ranges.name
FROM timeseries LEFT OUTER JOIN ranges ON
timeseries.measurement_date
BETWEEN ranges.start_date AND ranges.end_date;

Changing the keyboard navigation behaviour of form inputs

The default behaviour in HTML forms is that you can move the focus from one input element to the next via the tab key and submit the form via the enter key. This is also how dialogs work on most operating systems when using the native UI components. This behaviour is consistent across all browsers, and changing it messes with the user’s expectations and reduces accessibility. So I would normally advise against changing this behaviour without good reasons.

However, one of our customers wanted a different behaviour for an application we developed for them. This application replaced an older application in which the enter key did not submit the form but moved the focus to the next input element. The ‘muscle memory’ effect made users accidentally submit the form by hitting the enter key, causing frustration. Since this application is not a public web site but an intranet application built with web technology for a small and specialized user base, changing the default behaviour is acceptable if the users want it.

So here’s how to do it. The following JavaScript function focusNextInputOnEnter takes a form element as a parameter and changes the focus behaviour on the input elements within this form.

function focusNextInputOnEnter(form) {
  var inputs = form.querySelectorAll('input, select, textarea');
  for (var i = 0; i < inputs.length; i++) {
    var input = inputs[i];
    // an immediately invoked function expression captures the current index
    // for the event handler closure
    input.addEventListener('keypress', (function(index) {
      return function(event) {
        if (!isEnter(event.which)) {
          return;
        }
        // prevent the default action, i.e. submitting the form
        event.preventDefault();
        // move the focus to the next input element that is not disabled
        var nextIndex = index + 1;
        while (nextIndex < inputs.length) {
          var nextInput = inputs[nextIndex];
          if (nextInput.disabled) {
            nextIndex++;
            continue;
          }
          nextInput.focus();
          break;
        }
      };
    })(i));
  }

  function isEnter(keyCode) {
    return keyCode === 13;
  }
}

It works by handling the keypress events on the input elements and checking the key code for the enter key (code 13). If the enter key was pressed, the default action (submitting the form) is prevented and the focus is moved to the next input element, skipping disabled elements.

To apply this change in behaviour to a form we have to call the function when the DOM content is loaded:

<form id="demo-form">
  <input type="text">
  <input type="text" disabled="disabled">
  <input type="checkbox">
  <select>
    <option>A</option>
    <option>B</option>
  </select>
  <textarea></textarea>
  <input type="text">
  <input type="text">
</form>

<script>
  document.addEventListener('DOMContentLoaded', function() {
    focusNextInputOnEnter(document.getElementById('demo-form'));
  });
</script>

I want to reiterate my warning: you should definitely not do this for public web sites, and you should do it elsewhere only if you know that this is what your users want.

Querying gaps between date ranges in Oracle SQL

Let’s say we have a database table with date ranges, each range designated by a RANGE_START and a RANGE_END column:

CREATE TABLE date_ranges (
  range_start DATE,
  range_end   DATE
);

The example content of the table is:

RANGE_START	RANGE_END
-----------	---------
05/02/2020	01/04/2020
02/04/2020	15/04/2020
16/04/2020	01/05/2020
01/06/2020	20/06/2020
21/06/2020	01/07/2020
02/07/2020	31/07/2020
05/08/2020	30/08/2020

We are now interested in finding the gaps between these date ranges. If we look at this example data set we can see that there are two gaps:

RANGE_START	RANGE_END
05/02/2020	01/04/2020
02/04/2020	15/04/2020
16/04/2020	01/05/2020
-- gap --
01/06/2020	20/06/2020
21/06/2020	01/07/2020
02/07/2020	31/07/2020
-- gap --
05/08/2020	30/08/2020

What would be the SQL query to find these automatically? With standard SQL this would be a difficult task. However, there are some special functions in Oracle SQL called analytic functions that greatly help with this task. Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. In this case we will use the analytic functions MAX and LEAD:

SELECT * FROM (
  SELECT
    MAX(range_end)
      OVER(ORDER BY range_start) + 1 gap_start,
    LEAD(range_start)
      OVER(ORDER BY range_start) - 1 gap_end
  FROM date_ranges
) WHERE gap_start <= gap_end;

The results of this query are the date range gaps we are interested in:

GAP_START	GAP_END
---------	-------
02/05/2020	31/05/2020
01/08/2020	04/08/2020

Note that the MAX function in the query is the analytic MAX function, not the aggregate MAX function, indicated by the OVER keyword with an analytic clause. It operates on a sliding window. The LEAD analytic function allows you to access the following row from the current row without using a self-join.

Using CSV data as external table in Oracle DB

If you want to import CSV data into an Oracle database, you can use the SQL*Loader command-line tool. You simply create a control file that describes how to load the data and then call the sqlldr command with the control file name as an argument:

example.ctl

LOAD DATA
INFILE example.csv
INTO TABLE example_table
FIELDS TERMINATED BY ';'
(ID, NAME, AMOUNT, DESCRIPTION)

> sqlldr username/password example.ctl

But there’s another way to load CSV data into an Oracle database: External tables.

External tables

Oracle’s external tables feature allows you to query data from a file on the filesystem like a regular database table.

First you have to create a directory in the file system and put your CSV file inside:

mkdir -p /path/to/directory

example.csv

1;Water;250
2;Beer;500
3;Wine;150

Now connect to the database as “SYS as SYSDBA”, define the directory as a database object and grant read/write access to your user:

CREATE OR REPLACE DIRECTORY
  external_tables_dir AS '/path/to/directory';
GRANT READ,WRITE ON DIRECTORY
  external_tables_dir TO example_user;

Now you can connect as example_user and create an external table for the CSV file:

CREATE TABLE example_table (
  id NUMBER(4,0),
  name VARCHAR2(50),
  amount NUMBER(8,0)
)
ORGANIZATION EXTERNAL (
  DEFAULT DIRECTORY external_tables_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ';'
  )
  LOCATION ('example.csv')  
);

The relevant part here is the ORGANIZATION EXTERNAL block. It references the directory and the CSV file inside the directory and allows you to specify format parameters of the CSV file such as record and field delimiters.

Now you can query the table like a regular table:

SELECT * FROM example_table;

ID NAME  AMOUNT
-- ----- ------
1  Water 250
2  Beer  500
3  Wine  150

Access information as well as rejected records are stored in log, bad, and discard files in the specified directory. Their default names consist of the table name and an ID, e.g. example_table_12345.log, example_table_12345.bad and example_table_12345.dsc.

Generating Rows in Oracle Database

Sometimes you want to automatically populate a database table with a number of rows. Maybe you need a big table with lots of entries for a performance experiment or some dummy data for development. Unfortunately, there’s no standard SQL statement to achieve this task. There are different possibilities for the various database management systems. For the Oracle database (10g or later) I will show you the simplest one I have encountered so far. It actually “abuses” an unrelated functionality: the CONNECT BY clause for hierarchical queries in combination with the DUAL table.

Here’s how it can be used:

SELECT ROWNUM id
FROM dual
CONNECT BY LEVEL <= 1000;

This select creates a result set with the numbers from 1 to 1000. You can combine it with INSERT to populate the following table with rows:

CREATE TABLE example (
  id   NUMBER(5,0),
  name VARCHAR2(200)
);

INSERT INTO example (id, name)
SELECT ROWNUM, 'Name '||ROWNUM
FROM dual
CONNECT BY LEVEL <= 10;

The resulting table is:

ID  NAME
1   Name 1
2   Name 2
3   Name 3
...
10  Name 10

Of course, you can use the incrementing ROWNUM in more creative ways. The following example populates a table for time series data with a million values forming a sine curve with equidistant timestamps (in this case 15-minute intervals) starting at a specified time:

CREATE TABLE example (
  id    NUMBER(7,0),
  time  TIMESTAMP,
  value NUMBER
);

INSERT INTO example (id, time, value)
SELECT
  ROWNUM,
  TIMESTAMP'2020-05-01 12:00:00'
     + (ROWNUM-1)*(INTERVAL '15' MINUTE),
  SIN(ROWNUM/10)
FROM dual
CONNECT BY LEVEL <= 1000000;

The first rows of the result look like this:

ID  TIME              VALUE
1   2020-05-01 12:00  0.099833
2   2020-05-01 12:15  0.198669
3   2020-05-01 12:30  0.295520
...

As mentioned at the beginning, there are other row generator techniques to achieve this. But this one is the simplest I have encountered so far, at least for Oracle.

The Java Cache API and Custom Key Generators

The Java Cache API allows you to add a @CacheResult annotation to a method, which means that calls to the method will be cached:

import javax.cache.annotation.CacheResult;

@CacheResult
public String exampleMethod(String a, int b) {
    // ...
}

The cache will be looked up before the annotated method executes. If a value is found in the cache it is returned and the annotated method is never actually executed.

The cache lookup is based on the method parameters. By default a cache key is generated by a key generator that uses Arrays.deepHashCode(Object[]) and Arrays.deepEquals(Object[], Object[]) on the method parameters. The cache lookup based on this key is similar to a HashMap lookup.
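
To get a feeling for what such a key looks like, here is a rough sketch of a parameter-based key along the lines described above (just an illustration, not the actual implementation of any particular JCache provider):

import javax.cache.annotation.GeneratedCacheKey;

import java.util.Arrays;

// Illustrative sketch: a cache key that delegates to deep hash code and
// deep equality of the method parameters, like the default key generator.
public final class ParameterBasedCacheKey implements GeneratedCacheKey {

  private final Object[] parameters;

  ParameterBasedCacheKey(Object... parameters) {
    this.parameters = parameters;
  }

  @Override
  public int hashCode() {
    return Arrays.deepHashCode(parameters);
  }

  @Override
  public boolean equals(Object other) {
    return other instanceof ParameterBasedCacheKey
      && Arrays.deepEquals(parameters, ((ParameterBasedCacheKey) other).parameters);
  }
}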

You can define and configure multiple caches in your application and reference them by name via the cacheName parameter of the @CacheResult annotation:

@CacheResult(cacheName="examplecache")
public String exampleMethod(String a, int b) {
    // ...
}

If no cache name is given the cache name is based on the fully qualified method name and the types of its parameters, for example in this case: “my.app.Example.exampleMethod(java.lang.String,int)”. This way there will be no conflicts with other cached methods with the same set of parameters.

Custom Key Generators

But what if you actually want to use the same cache for multiple methods without conflicts? The solution is to define and use a custom cache key generator. In the following example both methods use the same cache (“examplecache”), but also use a custom cache key generator (MethodSpecificKeyGenerator):

@CacheResult(
  cacheName="examplecache",
  cacheKeyGenerator=MethodSpecificKeyGenerator.class)
public String exampleMethodA(String a, int b) {
    // ...
}

@CacheResult(
  cacheName="examplecache",
  cacheKeyGenerator=MethodSpecificKeyGenerator.class)
public String exampleMethodB(String a, int b) {
    // ...
}

Now we have to implement the MethodSpecificKeyGenerator:

import org.infinispan.jcache.annotation.DefaultCacheKey;

import javax.cache.annotation.CacheInvocationParameter;
import javax.cache.annotation.CacheKeyGenerator;
import javax.cache.annotation.CacheKeyInvocationContext;
import javax.cache.annotation.GeneratedCacheKey;

import java.lang.annotation.Annotation;
import java.util.Arrays;
import java.util.stream.Stream;

public class MethodSpecificKeyGenerator
  implements CacheKeyGenerator {

  @Override
  public GeneratedCacheKey generateCacheKey(
      CacheKeyInvocationContext<? extends Annotation> context) {
    Stream<Object> methodIdentity = Stream.of(context.getMethod());
    Stream<Object> parameterValues = Arrays
        .stream(context.getKeyParameters())
        .map(CacheInvocationParameter::getValue);
    return new DefaultCacheKey(
        Stream.concat(methodIdentity, parameterValues).toArray());
  }
}

This key generator not only uses the parameter values of the method call but also the identity of the method to generate the key. The call to context.getMethod() returns a java.lang.reflect.Method instance for the called method, which has appropriate hashCode() and equals() implementations. Both this method object and the parameter values are passed to the DefaultCacheKey implementation, which uses deep equality on its parameters, as mentioned above.

By adding the method’s identity to the cache key we have ensured that there will be no conflicts with other methods when using the same cache.

24 hour time format: Difference between JodaTime and java.time

We have been using JodaTime in many projects since before Java got better date and time support with Java 8. We update projects to the newer java.time classes whenever we work on them, but some still use JodaTime. One of these was a utility that imports time series from CSV files. The format for the time stamps is flexible and the user can configure it with a format string like “yyyyMMdd HHmmss”. Recently a user tried to import time series with timestamps like this:

20200101 234500
20200101 240000
20200102 001500

As you can see this is a 24-hour format. However, midnight is represented as the 24th hour of the previous day (“240000”), while all other times within the first hour of a day are represented with “00” as the hour. When the user tried to import this with the “yyyyMMdd HHmmss” format the application failed with an internal exception:

org.joda.time.IllegalFieldValueException:
Cannot parse "20200101 240000": Value 24 for
hourOfDay must be in the range [0,23]

Then he tried “yyyyMMdd kkmmss”, which uses the “kk” format for hours. This format allows the string “24” as hour. But now “20200101 240000” was parsed as 2020-01-01T00:00:00 and not as 2020-01-02T00:00:00, as intended.
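
The following small example (the class name is made up for illustration) reproduces this behaviour with JodaTime:

import org.joda.time.LocalDateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

public class JodaClockHourExample {
  public static void main(String[] args) {
    DateTimeFormatter format = DateTimeFormat.forPattern("yyyyMMdd kkmmss");
    // "kk" (clock-hour of day, 1-24) accepts the value 24,
    // but maps it to hour 0 of the same day
    LocalDateTime parsed = LocalDateTime.parse("20200101 240000", format);
    System.out.println(parsed); // 2020-01-01T00:00:00.000
  }
}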

I tried to help and find a format string that supported this mixed 24-hour format, but I did not find one, at least not for JodaTime. However, I found out that with java.time the import would work with the “yyyyMMdd HHmmss” format, even though the documentation for “H” simply says “hour-of-day (0-23)”, without mentioning 24.
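
Here is a minimal sketch of the same parse with java.time (again, the class name is made up):

import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class JavaTimeHourOfDayExample {
  public static void main(String[] args) {
    DateTimeFormatter format = DateTimeFormatter.ofPattern("yyyyMMdd HHmmss");
    // With the default (SMART) resolver style the hour value 24 is accepted
    // here because minutes and seconds are zero, and the parsed value rolls
    // over to midnight of the following day.
    LocalDateTime parsed = LocalDateTime.parse("20200101 240000", format);
    System.out.println(parsed); // 2020-01-02T00:00
  }
}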

The import tool was finally updated to java.time and the user was able to import the time series file.

Working with JSON data in Oracle databases

In my last post I showed how to work with JSON data in PostgreSQL. This time I want to show how it is done with an Oracle database for comparison. I will use the same example scenario: a table named “events” where application events are stored in JSON format.

JSON data types

In Oracle there is no special data type for JSON data. You can use character string datatypes like VARCHAR2 or CLOB. However, you can add a special CHECK constraint to a column in order to ensure that only valid JSON is inserted:

CREATE TABLE events (
  datetime TIMESTAMP NOT NULL,
  event CLOB NOT NULL
  CONSTRAINT event_is_json CHECK (event IS JSON)
);

If you try to insert something other than JSON you will get a constraint violation error:

INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, 'This is not JSON.');

ORA-02290: check constraint (EVENT_IS_JSON) violated

Let’s insert some valid JSON data:

INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_shelf", "payload": {"id": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_shelf", "payload": {"id": 2}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

In Oracle you use the JSON_VALUE function to select a value from a JSON structure. It uses a special path syntax for navigating JSON objects where the object root is represented as ‘$’ and properties are accessed via dot notation. This function can be used both in the SELECT clause and the WHERE clause:

SELECT JSON_VALUE(event, '$.type') AS type
  FROM events;

TYPE
add_shelf
add_book
add_book
add_shelf
add_book

SELECT event FROM events
  WHERE JSON_VALUE(event, '$.type')='add_book'
    AND JSON_VALUE(event, '$.payload.shelf')=1;

EVENT
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Constructing JSON objects

JSON objects can be constructed from values via the JSON_OBJECT and JSON_ARRAY functions:

SELECT JSON_OBJECT(
  'id' VALUE 1,
  'name' VALUE 'tree',
  'isPlant' VALUE 'true' FORMAT JSON,
  'colors' VALUE JSON_ARRAY('green', 'brown')
) FROM dual;
{"id":1,"name":"tree","isPlant":true,"colors":["green","brown"]}

Note that boolean values have to be passed as strings with the additional FORMAT JSON clause.

Updating

Modifying JSON object fields has become feasible with the introduction of the JSON_MERGEPATCH function in Oracle 19c. It takes two JSON parameters:

1) the original JSON data
2) a JSON “patch” snippet that will be merged into the original JSON data. This can either add or update JSON properties.

It can be used in combination with JSON_VALUE and JSON_OBJECT. In this example we convert all the event “type” fields from lower case to upper case:

UPDATE events SET event=JSON_MERGEPATCH(
  event,
  JSON_OBJECT('type' VALUE UPPER(JSON_VALUE(event, '$.type')))
);

Oracle provides a lot more functions for working with JSON data. This post only covered the most basic ones. See the Oracle JSON reference for more.

Working with JSON data in PostgreSQL

Today most common SQL-based relational database management systems (DBMS) like PostgreSQL, MySQL, MariaDB, SQL Server and Oracle offer functionality to efficiently store and query JSON data in one form or another, with varying syntax. While a standard named SQL/JSON is in the works, it is not yet fully supported by all of these DBMS. This blog post is specific to PostgreSQL.

JSON data types

In PostgreSQL there are two data types for JSON columns: json and jsonb. The former stores JSON data as-is with any formatting preserved, while the latter stores JSON in a decomposed binary format. Operations on data in jsonb format are potentially more efficient.

In the following example we will use the jsonb data type to store a sequence of events in a table, as they might occur in an event-sourcing based application.

CREATE TABLE events (date TIMESTAMP NOT NULL,
                     event JSONB NOT NULL);

JSON literals look like string literals. Let’s insert some events:

INSERT INTO events (date, event) VALUES
  (NOW(), '{"type": "add_shelf", "payload": {"id": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}'),
  (NOW(), '{"type": "add_shelf", "payload": {"id": 2}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

PostgreSQL has two operators for navigating a JSON structure: -> and ->>. The former accesses an object field by key and the latter accesses an object field as text. These operators can be used both in the SELECT clause and the WHERE clause:

SELECT event->>'type' AS type FROM events;

type
add_shelf
add_book
add_book
add_shelf
add_book

SELECT event FROM events
        WHERE event->>'type'='add_book'
          AND event->'payload'->>'shelf'='1';

event
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Note that in the example above the value of "shelf" is compared to a string literal ('1'). In order to treat the value as a number we have to use the CAST function, and then we can use numerical comparison operators:

SELECT event FROM events
        WHERE CAST(
          event->'payload'->>'shelf' AS INTEGER
        ) > 1;

event
{"type":"add_book","payload":{"shelf":2,"title":"Don Quixote"}}

Updating

Updating JSON object fields is a bit more complicated. It is only possible with the jsonb data type and can be done via the JSONB_SET function, which takes four arguments:

1) the original JSON,
2) a path specifying which object fields should be updated,
3) a jsonb value, which is the new value, and
4) a boolean flag that specifies if missing fields should be created.

In this example we convert all the event "type" fields from lower case to upper case:

UPDATE events SET event=JSONB_SET(
  event,
  '{type}',
  TO_JSONB(UPPER(event->>'type')),
  false
);

PostgreSQL provides a lot more operators and functions for working with JSON data. This post only covered the most basic ones. See the PostgreSQL JSON reference for more.

How to disable IP address logging for Apache web server and Tomcat

The General Data Protection Regulation (GDPR) prohibits excessive or unnecessary collection of personal data. IP addresses in server log files are considered personal data.

Logging of IP addresses is usually enabled by default in fresh web server installations. This article describes how to disable it for the Apache web server and the Tomcat application server, which are a common combination for JVM based web applications.

Apache Web Server

The Apache web server logs the HTTP requests from web clients in log files named *access.log, which include IP addresses. Another Apache log file that can contain IP addresses is error.log.

To disable IP address logging for Apache, edit the main configuration file, usually called httpd.conf, and scan it for LogFormat entries, for example:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

The format ‘verb’ %h stands for the hostname or IP address. This is what you want to remove. You may also want to remove the “Referer” and “User-Agent” components of the log format for privacy:

LogFormat "%l %u %t \"%r\" %>s %O" combined

Restart the server for the changes to take effect.

Tomcat Application Server

For Tomcat you have to configure the so-called AccessLogValve in the server.xml configuration file located in the $CATALINA_HOME/conf directory. The configuration entry looks like this:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
       prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t "%r" %s %b" />

The pattern attribute specifies the log format. Again, the relevant format verb you want to remove is %h for the hostname or IP address:

       pattern="%l %u %t "%r" %s %b" />

Restart the server for the changes to take effect.