Generating Rows in Oracle Database

Sometimes you want to automatically populate a database table with a number rows. Maybe you need a big table with lots of entries for a performance experiment or some dummy data for development. Unfortunately, there’s no standard SQL statement to achieve this task. There are different possibilities for the various database management systems. For the Oracle database (10g or later) I will show you the simplest one I have encountered so far. It actually “abuses” an unrelated functionality: the CONNECT BY clause for hierarchical queries in combination with the DUAL table.

Here’s how it can be used:

SELECT ROWNUM id
FROM dual
CONNECT BY LEVEL <= 1000;

This select creates a result set with the numbers from 1 to 1000. You can combine it with INSERT to populate the following table with rows:

CREATE TABLE example (
  id   NUMBER(5,0),
  name VARCHAR2(200)
);

INSERT INTO example (id, name)
SELECT ROWNUM, 'Name '||ROWNUM
FROM dual
CONNECT BY LEVEL <= 10;

The resulting table is:

ID  NAME
1   Name 1
2   Name 2
3   Name 3
...
10  Name 10

Of course, you can use the incrementing ROWNUM in more creative ways. The following example populates a table for time series data with a million values forming a sinus curve with equidistant timestamps (in this case 15 minute intervals) starting with a specified time:

CREATE TABLE example (
  id    NUMBER(5,0),
  time  TIMESTAMP,
  value NUMBER
);

INSERT INTO example (id, time, value)
SELECT
  ROWNUM,
  TIMESTAMP'2020-05-01 12:00:00'
     + (ROWNUM-1)*(INTERVAL '15' MINUTE),
  SIN(ROWNUM/10)
FROM dual
CONNECT BY LEVEL <= 1000000;
ID  TIME              VALUE
1   2020-05-01 12:00  0.099833
2   2020-05-01 12:15  0.198669
3   2020-05-01 12:30  0.295520
...

As mentioned at the beginning, there are other row generator techniques to achieve this. But this one is the simplest so far, at least for Oracle.

The Java Cache API and Custom Key Generators

The Java Cache API allows you to add a @CacheResult annotation to a method, which means that calls to the method will be cached:

import javax.cache.annotation.CacheResult;

@CacheResult
public String exampleMethod(String a, int b) {
    // ...
}

The cache will be looked up before the annotated method executes. If a value is found in the cache it is returned and the annotated method is never actually executed.

The cache lookup is based on the method parameters. By default a cache key is generated by a key generator that uses Arrays.deepHashCode(Object[]) and Arrays.deepEquals(Object[], Object[]) on the method parameters. The cache lookup based on this key is similar to a HashMap lookup.

You can define and configure multiple caches in your application and reference them by name via the cacheName parameter of the @CacheResult annotation:

@CacheResult(cacheName="examplecache")
public String exampleMethod(String a, int b) {

If no cache name is given the cache name is based on the fully qualified method name and the types of its parameters, for example in this case: “my.app.Example.exampleMethod(java.lang.String,int)”. This way there will be no conflicts with other cached methods with the same set of parameters.

Custom Key Generators

But what if you actually want to use the same cache for multiple methods without conflicts? The solution is to define and use a custom cache key generator. In the following example both methods use the same cache (“examplecache”), but also use a custom cache key generator (MethodSpecificKeyGenerator):

@CacheResult(
  cacheName="examplecache",
  cacheKeyGenerator=MethodSpecificKeyGenerator.class)
public String exampleMethodA(String a, int b) {
    // ...
}

@CacheResult(
  cacheName="examplecache",
  cacheKeyGenerator=MethodSpecificKeyGenerator.class)
public String exampleMethodB(String a, int b) {
    // ...
}

Now we have to implement the MethodSpecificKeyGenerator:

import org.infinispan.jcache.annotation.DefaultCacheKey;

import javax.cache.annotation.CacheInvocationParameter;
import javax.cache.annotation.CacheKeyGenerator;
import javax.cache.annotation.CacheKeyInvocationContext;
import javax.cache.annotation.GeneratedCacheKey;

public class MethodSpecificKeyGenerator
  implements CacheKeyGenerator {

  @Override
  public GeneratedCacheKey generateCacheKey(CacheKeyInvocationContext<? extends Annotation> context) {
    Stream<Object> methodIdentity = Stream.of(context.getMethod());
    Stream<Object> parameterValues = Arrays.stream(context.getKeyParameters()).map(CacheInvocationParameter::getValue);
    return new DefaultCacheKey(Stream.concat(methodIdentity, parameterValues).toArray());
  }
}

This key generator not only uses the parameter values of the method call but also the identity of the method to generate the key. The call to context.getMethod() returns a java.lang.reflect.Method instance for the called method, which has appropriate hashCode() and equals() implementations. Both this method object and the parameter values are passed to the DefaultCacheKey implementation, which uses deep equality on its parameters, as mentioned above.

By adding the method’s identity to the cache key we have ensured that there will be no conflicts with other methods when using the same cache.

24 hour time format: Difference between JodaTime and java.time

We have been using JodaTime in many projects since before Java got better date and time support with Java 8. We update projects to the newer java.time classes whenever we work on them, but some still use JodaTime. One of these was a utility that imports time series from CSV files. The format for the time stamps is flexible and the user can configure it with a format string like “yyyyMMdd HHmmss”. Recently a user tried to import time series with timestamps like this:

20200101 234500
20200101 240000
20200102 001500

As you can see this is a 24-hour format. However, the first hour of the day is represented as the 24th hour of the previous day if the minutes and seconds are zero, and it is represented as “00” otherwise. When the user tried to import this with the “yyyyMMdd HHmmss” format the application failed with an internal exception:

org.joda.time.IllegalFieldValueException:
Cannot parse "20200101 240000": Value 24 for
hourOfDay must be in the range [0,23]

Then he tried “yyyyMMdd kkmmss”, which uses the “kk” format for hours. This format allows the string “24” as hour. But now “20200101 240000” was parsed as 2020-01-01T00:00:00 and not as 2020-01-02T00:00:00, as intended.

I tried to help and find a format string that supported this mixed 24-hour format, but I did not find one, at least not for JodaTime. However, I found out that with java.time the import would work with the “yyyyMMdd HHmmss” format, even though the documentation for “H” simply says “hour-of-day (0-23)”, without mentioning 24.

The import tool was finally updated to java.time and the user was able to import the time series file.

Working with JSON data in Oracle databases

In my last post I showed how to work with JSON data in PostgreSQL. This time I want show how it is done with an Oracle database for comparison. I will use the same example scenario: a table named “events” where application events are stored in JSON format.

JSON data types

In Oracle there is no special data type for JSON data. You can use character string datatypes like VARCHAR2 or CLOB. However, you can add a special CHECK constraint to a column in order to ensure that only valid JSON is inserted:

CREATE TABLE events (
  datetime TIMESTAMP NOT NULL,
  event CLOB NOT NULL
  CONSTRAINT event_is_json CHECK (event IS JSON)
);

If you try to insert something other than JSON you will get a constraint violaiton error:

INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, 'This is not JSON.');

ORA-02290: check constraint (EVENT_IS_JSON) violated

Let’s insert some valid JSON data:

INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_shelf", "payload": {"id": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_shelf", "payload": {"id": 2}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

In Oracle you use the JSON_VALUE function to select a value from a JSON structure. It uses a special path syntax for navigating JSON objects where the object root is represented as ‘$’ and properties are accessed via dot notation. This function can be used both in the SELECT clause and the WHERE clause:

SELECT JSON_VALUE(event, '$.type') AS type
  FROM events;
TYPE
add_shelf
add_book
add_book
add_shelf
SELECT event FROM events
  WHERE JSON_VALUE(event, '$.type')='add_book'
    AND JSON_VALUE(event, '$.payload.shelf')=1;
EVENT
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Constructing JSON objects

JSON objects can be constructed from values via the JSON_OBJECT and JSON_ARRAY functions:

SELECT JSON_OBJECT(
  'id' VALUE 1,
  'name' VALUE 'tree',
  'isPlant' VALUE 'true' FORMAT JSON,
  'colors' VALUE JSON_ARRAY('green', 'brown')
) FROM dual;
{"id":1,"name":"tree","isPlant":true,"colors":["green","brown"]}

Note that you have to use string values with the additional FORMAT JSON clause for boolean values.

Updating

Modifying JSON object fields has become feasible with the introduction of the JSON_MERGEPATCH function in Oracle 19c. It takes two JSON parameters:

1) the original JSON data
2) a JSON “patch” snippet that will be merged into the original JSON data. This can either add or update JSON properties.

It can be used in combination with JSON_VALUE and JSON_OBJECT. In this example we convert all the event “type” fields from lower case to upper case:

UPDATE events SET event=JSON_MERGEPATCH(
  event,
  JSON_OBJECT('type' VALUE UPPER(JSON_VALUE(event, '$.type')))
);

Oracle provides a lot more functions for working with JSON data. This post only covered the most basic ones. See the Oracle JSON reference for more.

Working with JSON data in PostgreSQL

Today most common SQL-based relational database management systems (DBMS) like PostgreSQL, MySQL, MariaDB, SQL Server and Oracle offer functionality to efficiently store and query JSON data in one form or another, with varying syntax. While a standard named SQL/JSON is in the works, it is not yet fully supported by all of these DBMS. This blog post is specific to PostgreSQL.

JSON data types

In PostgreSQL there are two data types for JSON columns: json and jsonb. The former stores JSON data as-is with any formatting preserved, while the latter stores JSON in a decomposed binary format. Operations on data in jsonb format are potentially more efficient.

We’ll use the jsonb data type in the following example to store a sequence of events, for example for an event sourcing based application, in a table.

CREATE TABLE events (date TIMESTAMP NOT NULL,
                     event JSONB NOT NULL);

JSON literals look like string literals. Let’s insert some events:

INSERT INTO events (date, event) VALUES
  (NOW(), '{"type": "add_shelf", "payload": {"id": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}'),
  (NOW(), '{"type": "add_shelf", "payload": {"id": 2}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

PostgreSQL has two operators for navigating a JSON structure: -> and ->>. The former accesses an object field by key and the latter accesses an object field as text. These operators can be used both in the SELECT clause and the WHERE clause:

SELECT event->>'type' AS type FROM events;
type
add_shelf
add_book
add_book
add_shelf
SELECT event FROM events
        WHERE event->>'type'='add_book'
          AND event->'payload'->>'shelf'='1';
event
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Note that in the example above the value of "shelf" is compared to a string literal ('1'). In order to treat the value as a number we have to use the CAST function, and then we can use numerical comparison operators:

SELECT event FROM events
        WHERE CAST(
          event->'payload'->>'shelf' AS INTEGER
        ) > 1;
event
{"type":"add_book","payload":{"shelf":2,"title":"Don Quixote"}}

Updating

Updating JSON object fields is a bit more complicated. It is only possible with the jsonb data type and can be done via the JSONB_SET function, which takes four arguments:

1) the original JSON,
2) a path specifying which object fields should be updated,
3) a jsonb value, which is the new value, and
4) a boolean flag that specifies if missing fields should be created.

In this example we convert all the event "type" fields from lower case to upper case:

UPDATE events SET event=JSONB_SET(
  event,
  '{type}',
  TO_JSONB(UPPER(event->>'type')),
  false
);

PostgreSQL provides a lot more operators and functions for working with JSON data. This post only covered the most basic ones. See the PostgreSQL JSON reference for more.

How to disable IP address logging for Apache web server and Tomcat

The General Data Protection Regulation (GDPR) prohibits excessive or unnecessary collection of personal data. IP adresses in server log files are considered personal data.

Logging of IP addresses is usually enabled by default in fresh web server installations. This article describes how to disable it for the Apache web server and the Tomcat application server, which are a common combination for JVM based web applications.

Apache Web Server

The apache web server logs the HTTP requests from web clients in log files called *access.log, which includes IP addresses. Another apache log file, which can contain IP addresses is error.log.

To disable IP address logging for Apache edit the main configuration file, usually called httpd.conf and scan it for LogFormat entries, for example:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

The format ‘verb’ %h stands for the hostname or IP address. This is what you want to remove. You may also want to remove the “Referer” and “User-Agent” components of the log format for privacy:

LogFormat "%l %u %t \"%r\" %>s %O" combined

Restart the server for the changes to take effect.

Tomcat Application Server

For Tomcat you have to configure the so-called AccessLogValve in the server.xml configuration file located in the $CATALINA_HOME/conf directory. The configuration entry looks like this:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
       prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t "%r" %s %b" />

The pattern attribute specifies the log format. Again, the relevant format verb you want to remove is %h for the hostname or IP address:

       pattern="%l %u %t "%r" %s %b" />

Restart the server for the changes to take effect.

Migrating from JScience quantities to Unit API 2.0

If you’re developing software that operates a lot with physical quantities you absolutely should use a library that defines types for quantities and supports safe conversions between units of measurements. Our go-to library for this in Java was JScience. The latest version of JScience is 4.3.1, which was released in 2012.

Since then a group of developers has formed that strives towards the standardization of a units API for Java. JScience maintainer Jean-Marie Dautelle is actively involved in this effort. The group operates under the name Units of Measurement alongside with their GitHub presence unitsofmeasurement.

Over the years there have been several JSRs (Java Specification Requests) by the group:

The current state of affairs is JSR-385, which is the basis of this post. The Units of Measurement API 2.0, or Unit API 2.0 for short, was released in July 2019.

JARs

While JScience is distributed as one JAR (~600 KiB), a setup of Unit API involves three JARs (~300 KiB in total):

  • unit-api-2.0.jar
  • indriya-2.0.jar
  • uom-lib-common-2.0.jar

JScience offers a lot more functionality than just quantities and units, but that’s the part we have been using and what we are interested in.

The unit-api JAR only defines interfaces, which is the scope of JSR-385. So you need an implementation to do anything useful with it. The reference implementation is called Indriya, provided by the second JAR. The third JAR, uom-lib-common, is a utility library used by Indriya for common functionality shared with other projects under the Units of Measurement umbrella.

Using quantities

Here’s a simple use of a physical quantity with JScience, in this example Length:

import org.jscience.physics.amount.Amount;

import javax.measure.quantity.Length;

import static javax.measure.unit.SI.*;

// ...

final Amount<Length> d = Amount.valueOf(214, CENTI(METRE));
final double d_metre = d.doubleValue(METRE);

And here’s the equivalent code using Units API 2.0 and Indriya:

import tech.units.indriya.quantity.Quantities;

import javax.measure.Quantity;
import javax.measure.quantity.Length;

import static javax.measure.MetricPrefix.CENTI;
import static tech.units.indriya.unit.Units.METRE;

// ...

final Quantity<Length> d = Quantities.getQuantity(214, CENTI(METRE));
final double d_metre = d.to(METRE).getValue().doubleValue();

Consistency

While JScience also defines aliases with alternative spellings like METER and constants for many prefixed units like CENTIMETER or MILLIMETER, Indriya encourages consistency and only allows METRE, CENTI(METRE), MILLI(METRE).

Quantity names

Most quantities have the same names in both projects, but there are some differences:

  • Amount<Duration> becomes Quantity<Time>
  • Amount<Velocity> becomes Quantity<Speed>

In these cases Unit API uses the correct SI names, i.e. time and speed. Wikipedia explains the difference between speed and velocity.

Arithmetic operations

The method names for the elementary arithmetic operations have changed:

  • plus() becomes add()
  • minus() becomes subtract()
  • times() becomes multiply()

Only the method name for division is the same:

  • divide() is still divide()

However, the runtime exceptions thrown on division by zero are different:

  • JScience: java.lang.ArithmeticException: / by zero
  • Indriya: java.lang.IllegalArgumentException: cannot initalize a rational number with divisor equal to ZERO

Type hints

If you divide or multiply two quantities the Java type system needs a type hint, because it doesn’t know the resulting quantity. Here’s how this looks in JScience versus Unit API:

With JScience:

Amount<Area> a = Amount.valueOf(100, SQUARE_METRE);
Amount<Length> b = Amount.valueOf(10, METRE);
Amount<Length> c = a.divide(b)
                    .to(METRE);

With Unit API:

Quantity<Area> a = Quantities.getQuantity(100, SQUARE_METRE);
Quantity<Length> b = Quantities.getQuantity(10, METRE);
Quantity<Length> c = a.divide(b)
                      .asType(Length.class);

Comparing quantities

If you want to compare quantities via compareTo(), isLessThan(), etc. you need quantities of type ComparableQuantity. The Quantities.getQuantity() factory method returns a ComparableQuantity, which is a sub-interface of Quantity.

Defining custom units

Defining custom units is very similar to JScience. Here’s an example for degree (angle), which is not an SI unit:

public static final Unit<Angle> DEGREE_ANGLE =
    new TransformedUnit<>("°", RADIAN,
        MultiplyConverter.ofPiExponent(1).concatenate(MultiplyConverter.ofRational(1, 180)));