Database – Page 6 – Schneide Blog

Migrating from Oracle to PostgreSQL

We maintain several applications that use a SQL database as their data storage. If we can decide freely, we usually opt for PostgreSQL as the database management system (DBMS). But sometimes our clients have specific requirements because they run the services on-premises, so we use our customers’ choice. SQL is SQL anyway, is it not?

No, it isn’t. And this year one of our customers asked us to migrate our application from Oracle to PostgreSQL. The migration was challenging even though we are using an object-relational mapper (ORM) and the necessary changes to our application code were very limited.

In this post I want to explain the general, application-agnostic challenges of such a migration. A follow-up will cover the application- and framework-specific issues.

Why is it not easy?

Luckily, PostgreSQL supports most of the common SQL features of Oracle, especially sequences, PL/SQL-like scripts, triggers, foreign keys etc., and all the important data types. So you are mostly migrating from an inferior to a more powerful solution, at least feature- and capability-wise from a client perspective. Please note that I am not judging performance, replication, clustering and other administrative features here!

Unfortunately, there is no tool that is both simple and powerful enough to dump the Oracle database into some standard SQL text format that you could pipe into psql or use with pg_restore. In addition, it is a challenge to convert the different number types of Oracle to semantically equivalent PostgreSQL types.

Another challenge is coping with referential integrity. Data in complex schemas with a lot of foreign keys is especially hard to migrate without proper tool support, as you have to figure out the correct order in which to restore the tables.
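
To illustrate what "figuring out the correct order" means, here is a minimal sketch in Python. It is not what Ora2Pg does internally; the table names and the foreign_keys mapping are hypothetical, and the ordering is a plain topological sort from the standard library (Python 3.9 or newer).

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the set of tables that its
# foreign keys reference. None of these names come from a real project.
foreign_keys = {
    "customer": set(),
    "product": set(),
    "orders": {"customer"},
    "order_item": {"orders", "product"},
}


def restore_order(dependencies):
    """Return the tables so that every referenced table comes before
    the tables that reference it. Raises graphlib.CycleError if the
    foreign keys form a cycle."""
    return list(TopologicalSorter(dependencies).static_order())


order = restore_order(foreign_keys)
print(order)
```

If the schema contains cyclic references, no such order exists; in that case the constraints have to be added after the data has been loaded.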

Nevertheless, such a migration is doable, especially if you do not have too much scripting logic in your database. And there is a free tool to help you with all this stuff called Ora2Pg.

What can Ora2Pg do for you?

It can export the full database schema including constraints, convert data types based on configuration provided by you, and offers a basic automatic conversion of PL/SQL code to PL/pgSQL. When running the migration you can interactively choose what to migrate and what to skip. That allows you to only migrate the data into a readily prepared schema, for example.

How to run Ora2Pg?

Ora2Pg is a collection of Perl scripts and configuration files, so you need a system capable of running these. If you do not want to mess with your whole system and install all of the dependencies, I prepared a Dockerfile able to run Ora2Pg:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

RUN mkdir -p /migration
RUN ora2pg --project_base /migration --init_project my_project
WORKDIR /migration/my_project

# uncomment this if you have a customized ora2pg.conf
#COPY ora2pg.conf /migration/my_project/config/

CMD ora2pg -t SHOW_VERSION -c config/ora2pg.conf && ora2pg -t SHOW_TABLE -c config/ora2pg.conf\
 && ora2pg -t SHOW_REPORT --estimate_cost -c config/ora2pg.conf\
 && ./export_schema.sh && ora2pg -t INSERT -o data.sql -b ./data -c ./config/ora2pg.conf

Here are the commands and the workflow to export the Oracle database using the above Docker image:

docker build -t o2pg .
# this will fail initially but create the project structure and generate a default configuration file
docker run --name oracle-export o2pg
# copy the project structure to the host system
docker cp oracle-export:/migration/my_project ./my_project_migration/

Now you can edit the configuration in my_project_migration/config and copy it to the directory where you built the image and ran the docker commands. Most importantly, you have to change the connection parameters at the top of the ora2pg.conf file. When you are ready for the first attempt, enable the configuration copying in the Dockerfile (the commented-out COPY line) and rebuild the image. Now you should get your first reasonably usable export.

The most important config options we changed for our projects are:

Connection parameters
Excluded tables that you do not want to migrate
Deletion of the contents of the target tables
Conversion of some datatypes like NUMBER(*,0) to bigint and NUMBER:1 to boolean for some columns

Most of the defaults are sensible to begin with but you can tailor the export specifically to your needs. If you feel ready to try the import you can run the import using a second docker image based on the following Dockerfile-import:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus
RUN yum install -y postgresql-server

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

# you need to mount the project volume to /my_project
WORKDIR /my_project

ENV pg_port=5432

CMD ./import_all.sh -d $pg_db -h $pg_host -p $pg_port -U $pg_user -o $pg_user

To run the import with your exported project, build and run the import container as follows:

docker build -t postgres-import -f Dockerfile-import .
docker run -it --rm -e pg_host=target-db.intranet -e pg_db=my_project_db -e pg_user=my_db_user -v ./my_project_migration:/my_project postgres-import

Then you can interactively provide the database password and decide which migration steps to perform.

Caveat

Depending on your schema, data and privileges in the target database, it may be necessary to disable all triggers before importing and re-enable them after a successful import. This can be done by replacing all occurrences of TRIGGER USER with TRIGGER ALL in the file data/data.sql. You may need appropriate privileges for this to work.

Final words

Such a migration is not an easy task but may be worth it in total cost of ownership and maybe developer satisfaction as Oracle has some oddities and limitations for backend developers.

In a follow-up article I will tackle some application-specific issues that we encountered when migrating our system from Oracle to PostgreSQL using the above approach and tools.

Tables as types in PostgreSQL

In SQL each column of a database table has a data type. These are types like NUMBER, VARCHAR(size) / TEXT, TIMESTAMP. What you perhaps don’t know yet is that in PostgreSQL you can use tables as types. What does this mean? Let’s say we have a table:

CREATE TABLE person (
  firstname TEXT,
  lastname  TEXT,
  email     TEXT
);

Now you can use this table as a type for columns in other tables:

CREATE TABLE article (
  id       SERIAL,
  content  TEXT,
  author   person,
  reviewer person
);

Instead of repeating the three columns of a person twice, e.g. author_firstname, author_lastname, author_email, and reviewer_firstname, reviewer_lastname, reviewer_email, the person table defined before acts as a type. Of course, the usual way in SQL is to give each person an ID and reference persons via these IDs from other tables. But sometimes you do not want this reference semantics. In this example you might want to fix the values of author and reviewer for articles in time and not retroactively update them automatically if a person changes their last name or email address later.

How to access the columns of these types? For INSERT the syntax is as follows:

INSERT INTO article (content, author, reviewer)
  VALUES ('...',
    ('Jane', 'Doe', 'jane.doe@example.com'),
    ('John', 'Roe', 'jroe@example.com')
  );

Or with explicit names of the sub-columns:

INSERT INTO article (content,
    author.firstname,
    author.lastname,
    author.email,
    reviewer.firstname,
    reviewer.lastname,
    reviewer.email)
  VALUES ('...',
    'Jane', 'Doe', 'jane.doe@example.com',
    'John', 'Roe', 'jroe@example.com'
  );

In a SELECT query individual values can be accessed with the following syntax:

SELECT
  content,
  (author).lastname,
  (reviewer).lastname
FROM article;

Of course, tables that use other tables as data types for their columns can be used as data types again.

One last thing worth mentioning is that these nested definitions can be mapped nicely to JSON:

SELECT jsonb_pretty(to_jsonb(article)) FROM article;

{
  "id": 1,
  "content": "...",
  "author": {
    "email": "jane.doe@example.com",
    "firstname": "Jane",
    "lastname": "Doe"
  },
  "reviewer": {
    "email": "jroe@example.com",
    "firstname": "John",
    "lastname": "Roe"
  }
}

Geometric shapes, functions and operators in PostgreSQL

On this blog I frequently write about features of relational database systems and their SQL dialects. One feature many developers do not know about is support for geometric shapes, although a lot of RDBMSs support them in one form or another, each with its own syntax, of course. In this article I’m going to demonstrate this feature with PostgreSQL.

PostgreSQL has data types for geometric shapes like point, box, line segment, line, path, polygon, and circle. These data types are only defined for two dimensions with Euclidean (x, y) coordinates. Here are some literals for these types:

point '(3.2,4)'
box '((1,2),(6,4))'
lseg '((-4,0),(3,2))'
path '((0,0),(2,1),(5,3))'
polygon '((0,0),(1,1),(2,0),(3,1))'
circle '((5,2),1.5)'

You can create tables with columns of these types and insert shapes:

CREATE TABLE shapes (p point, c circle);

INSERT INTO shapes (p, c) VALUES
  (point '(1,0)', circle '(0,0),3'),
  (point '(10,20)', circle '(2,3),4'),
  (point '(0.5,1.5)', circle '(1,2),1');

Now you can query shapes and filter them with special operators:

SELECT * FROM shapes WHERE c @> p;

This query uses the contains operator @> in the WHERE clause. It selects all rows where the circle c contains the point p.
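
To make the semantics of this filter concrete, here is a small Python sketch of the same check on the three rows inserted above. It only mirrors the geometry (distance from the circle's center compared with its radius); how PostgreSQL treats points lying exactly on the boundary, and its floating-point tolerance, are not modelled here, and the function name circle_contains is my own.

```python
import math

# The same three rows as in the INSERT above: (point, (center, radius)).
shapes = [
    ((1, 0), ((0, 0), 3)),
    ((10, 20), ((2, 3), 4)),
    ((0.5, 1.5), ((1, 2), 1)),
]


def circle_contains(circle, point):
    """True if the point is no farther from the center than the radius.
    Counting boundary points as contained is an assumption of this sketch."""
    (cx, cy), radius = circle
    px, py = point
    return math.hypot(px - cx, py - cy) <= radius


matching = [(p, c) for p, c in shapes if circle_contains(c, p)]
for point, circle in matching:
    print(point, circle)
```

With this data the first and the third row pass the check; the point (10,20) is far outside its circle.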

Here’s another operator: <-> determines the Euclidean distance between two points.

SELECT point '(0,0)' <-> point '(1,2)';
=> 2.23606797749979

The ?|| operator tests if two lines are parallel:

SELECT line '((1,2),(1,3))' ?|| line '((2,3),(2,4))';
=> true

You can translate a shape with the + operator:

SELECT box '((0,0),(1,1))' + point '(1,2)';
=> box '(2,3),(1,2)'

Or you can test with && if two shapes overlap:

SELECT box '((1,2),(4,3))' && box '(2,3),(1,2)';
=> true

This is only a small selection of geometric operators. See the full list in the official documentation. There you can also find a list of geometric functions like area, center, isclosed, npoints, etc.

SELECT area(box '((4,6),(10,12))');
=> 36

As mentioned in the beginning, other database systems support similar functionality. Check out MySQL’s spatial data types, Oracle Spatial, and MS SQL’s spatial data types.

Pagination in SQL

Pagination is the task of dividing a data set into consecutive parts. For example, a search engine initially only shows the first 15 results for a search query. The user can then step through the rest of the results by clicking a “Next” button.

Ideally this feature is also supported by the underlying database system. Otherwise, the application would have to load all matching data records from the database, just to filter out the major part of them, because the user only wanted to see page 3 of 50. A pagination request has two components: a limit and an offset. If a page contains a maximum of 15 items and page 3 is requested, then the limit would be 15 and the offset would be 30 = (page-1) × limit.

PostgreSQL, MySQL, MariaDB

The database systems PostgreSQL, MySQL and MariaDB have a straightforward syntax for pagination: LIMIT {number} OFFSET {number}. So a simple SQL query with pagination might look like this:

SELECT * FROM users ORDER BY name LIMIT 15 OFFSET 30;
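
The translation from a page number to limit and offset can be sketched in a few lines of Python. This is only an illustration: SQLite (which accepts the same LIMIT ... OFFSET syntax) stands in for the real database, and the users table with its generated names as well as the helper page_window are made up for this example.

```python
import sqlite3


def page_window(page, page_size):
    """Translate a 1-based page number into a (limit, offset) pair."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return page_size, (page - 1) * page_size


# In-memory stand-in database with 100 hypothetical users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [(f"user{i:03d}",) for i in range(1, 101)],
)

limit, offset = page_window(page=3, page_size=15)
rows = conn.execute(
    "SELECT name FROM users ORDER BY name LIMIT ? OFFSET ?",
    (limit, offset),
).fetchall()
names = [name for (name,) in rows]
print(names[0], names[-1])  # user031 user045
```

Passing limit and offset as bind parameters keeps the query text constant across pages.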

Oracle DB

Oracle DB didn’t have a dedicated syntax for pagination before Oracle 12c, but it was still possible to achieve the same result with other means. With Oracle 12c a new syntax for pagination was introduced under the name “Row limiting clause”. First I’ll show the old method, then the new syntax.

The old method is based on ROWNUM. If you wanted to specify both an offset and a limit, you had to nest multiple queries:

SELECT *
FROM (SELECT u.*, rownum AS rnum
      FROM (SELECT *
            FROM users
            ORDER BY name) u
      WHERE rownum <= 45)
WHERE rnum > 30;

The newer row limiting clause syntax is shorter and looks as follows:

SELECT * FROM users ORDER BY name
  OFFSET 30 ROWS FETCH NEXT 15 ROWS ONLY;

This syntax also allows the option to specify a percentage of rows instead of a fixed number of rows:

SELECT * FROM users ORDER BY name
  FETCH FIRST 20 PERCENT ROWS ONLY;

MS SQL Server

Microsoft’s SQL Server also supports the Oracle-like syntax with OFFSET and FETCH clauses and recommends the usage of this syntax for pagination.

The World of SQL Dialects

For software projects I work with various relational database management systems (RDBMSs), mainly PostgreSQL, MySQL/MariaDB, Oracle Database and Microsoft SQL Server. All of these use SQL as a query language, but the dialects of this language vary wildly, especially when it comes to non-standardized features. One such feature I often use is the aggregation of a list to a string. It does the following.

LEGS    ANIMAL
-----------------
2       Ostrich
2       Human
4       Cat
4       Dog
4       Capybara
6       Ant
8       Spider

Given a table like the one above it groups the elements of a column that have the same value in another column together in a string, concatenated by a separator like a comma:

LEGS    ANIMALS
----------------------------
2       Human, Ostrich
4       Capybara, Cat, Dog
6       Ant
8       Spider
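
For readers who prefer to see the operation spelled out procedurally, here is a small Python sketch of what such an aggregation computes. It is only meant to pin down the semantics (group, sort within the group, join with a separator); the function name aggregate_animals is my own and has nothing to do with any of the database systems.

```python
from collections import defaultdict

# The rows of the example table above as (legs, animal) pairs.
fauna = [
    (2, "Ostrich"), (2, "Human"),
    (4, "Cat"), (4, "Dog"), (4, "Capybara"),
    (6, "Ant"),
    (8, "Spider"),
]


def aggregate_animals(rows, separator=", "):
    """Group animals by leg count and join each group's sorted names."""
    groups = defaultdict(list)
    for legs, animal in rows:
        groups[legs].append(animal)
    return [(legs, separator.join(sorted(groups[legs])))
            for legs in sorted(groups)]


result = aggregate_animals(fauna)
for legs, animals in result:
    print(legs, animals)
```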

This simple operation has four different syntaxes in the four mentioned database systems, which I want to demonstrate.

PostgreSQL

In PostgreSQL the function is called STRING_AGG:

SELECT legs,
  STRING_AGG(animal, ', ' ORDER BY animal) AS animals
FROM fauna
GROUP BY legs
ORDER BY legs;

MySQL / MariaDB

In MySQL and its fork MariaDB the function is called GROUP_CONCAT, and it has a special syntax to specify the separator:

SELECT legs,
  GROUP_CONCAT(animal ORDER BY animal SEPARATOR ', ') AS animals
FROM fauna
GROUP BY legs
ORDER BY legs;

Oracle

Oracle calls it LISTAGG and specifies the ordering within each group via WITHIN GROUP.

SELECT legs,
  LISTAGG(animal, ', ') WITHIN GROUP (ORDER BY animal) AS animals
FROM fauna
GROUP BY legs
ORDER BY legs;

Microsoft SQL Server

SQL Server calls it STRING_AGG like PostgreSQL, but specifies the ordering via WITHIN GROUP like Oracle:

SELECT legs,
  STRING_AGG(animal, ', ') WITHIN GROUP (ORDER BY animal) AS animals
FROM fauna
GROUP BY legs
ORDER BY legs;

Unfortunately, as developers we have to live with all these dialects of SQL. Even though there is an ISO standards committee for SQL, database creators love to build non-standard extensions into their products. The situation is worse than the browser-specific extensions and differences of JavaScript, HTML and CSS in modern web browsers. OR mappers like Hibernate or query languages like Hibernate’s HQL that abstract over SQL can paper over these differences, but they come with their own set of problems.

Migrating from Oracle to PostgreSQL

One promise of SQL for application developers is that changing the database management system (DBMS) is not that big of a deal. In practice, due to the many vendor-specific peculiarities and the incomplete standards conformance of the database vendors, migrating from one DBMS vendor to another can be a big task.

Nevertheless, there are plenty of good reasons to do so:

Cost of buying, running and maintaining the DBMS
Limitations of the current DBMS like performance, tool support, character sets, naming, data types and sizes etc.
Missing features like geospatial support, clustering, replication, sharding, timeseries support and so on
Support or requirements on the customer’s or operator’s side

Some of our long-running projects that started several years ago had the requirement to work with an Oracle DBMS, version 8i at that time. Now, more than 10 years later, our customer provides and prefers to host a PostgreSQL 13 cluster. Of course, the customer would like us to migrate our applications over to the new DBMS and eventually get rid of the Oracle installation.

Challenges for the migration

Even though PostgreSQL supports most of SQL:2016 core and the most important features of Oracle, there are enough differences and subtleties to make the migration non-trivial. The most obvious items to look out for are

different column type names
SQL features and syntactical differences (sequences!)
PL/SQL functions syntax and features

Depending on your usage of database specific features you have to assess how much work and risk is expected.

Tools and migration process

Fortunately, there is a quite mature tool called ora2pg that can aid you along the process. It has tons of options to help you customize the migration and provides a quite helpful assessment of the task ahead. The migration report looks like this:

-------------------------------------------------------------------------------
Ora2Pg v21.1 - Database Migration Report
-------------------------------------------------------------------------------
Version Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
Schema NAOMI-TEST
Size 91.44 MB

-------------------------------------------------------------------------------
Object Number Invalid Estimated cost Comments Details
-------------------------------------------------------------------------------
DATABASE LINK 0 0 0.00 Database links will be exported as SQL/MED PostgreSQL's Foreign Data Wrapper (FDW) extensions using oracle_fdw.
FUNCTION 1 0 1.00 Total size of function code: 0 bytes.
GLOBAL TEMPORARY TABLE 60 0 168.00 Global temporary table are not supported by PostgreSQL and will not be exported. You will have to rewrite some application code to match the PostgreSQL temporary table behavior. ht_my_table <--- SNIP --->.
INDEX 69 0 6.90 0 index(es) are concerned by the export, others are automatically generated and will do so on PostgreSQL. Bitmap will be exported as btree_gin index(es). Domain index are exported as b-tree but commented to be edited to mainly use FTS. Cluster, bitmap join and IOT indexes will not be exported at all. Reverse indexes are not exported too, you may use a trigram-based index (see pg_trgm) or a reverse() function based index and search. Use 'varchar_pattern_ops', 'text_pattern_ops' or 'bpchar_pattern_ops' operators in your indexes to improve search with the LIKE operator respectively into varchar, text or char columns.
JOB 0 0 0.00 Job are not exported. You may set external cron job with them.
SEQUENCE 4 0 1.00 Sequences are fully supported, but all call to sequence_name.NEXTVAL or sequence_name.CURRVAL will be transformed into NEXTVAL('sequence_name') or CURRVAL('sequence_name').
SYNONYM 0 0 0.00 SYNONYMs will be exported as views. SYNONYMs do not exists with PostgreSQL but a common workaround is to use views or set the PostgreSQL search_path in your session to access object outside the current schema.
TABLE 225 0 72.00 495 check constraint(s). Total number of rows: 264690. Top 10 of tables sorted by number of rows:. topt has 52736 rows. po has 50830 rows. notification has 18911 rows. timeline_entry has 16556 rows. char_sample_types has 11400 rows. char_safety_aspects has 9488 rows. char_sample_props has 5358 rows. tech_spec has 4876 rows. mail_log_entry has 4778 rows. prop_data has 4358 rows. Top 10 of largest tables:.
-------------------------------------------------------------------------------
Total 359 0 248.90 248.90 cost migration units means approximatively 3 man-day(s). The migration unit was set to 5 minute(s)

-------------------------------------------------------------------------------
Migration level : A-3
-------------------------------------------------------------------------------

Migration levels:
A - Migration that might be run automatically
B - Migration with code rewrite and a human-days cost up to 5 days
C - Migration with code rewrite and a human-days cost above 5 days
Technical levels:
1 = trivial: no stored functions and no triggers
2 = easy: no stored functions but with triggers, no manual rewriting
3 = simple: stored functions and/or triggers, no manual rewriting
4 = manual: no stored functions but with triggers or views with code rewriting
5 = difficult: stored functions and/or triggers with code rewriting
-------------------------------------------------------------------------------

The tool is written in Perl, so I decided to put and run it inside Docker containers because I did not want to mess with my working machine or some VMs. To have quick turnaround times with my containers I split up the process into 3 steps:

Export of the schema and data using a docker container
On success copy the ora2pg project to the host
Import the schema and data using another docker container

The ora2pg migration project is copied to the host machine allowing you to inspect the export and make adjustments if need be. Then you can copy it to the import container or simply bind mount the directory containing the ora2pg project.

The Dockerfile for the export image looks like this:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

RUN mkdir -p /ora2pg
RUN ora2pg --project_base /ora2pg --init_project my-migration
WORKDIR /ora2pg/my-migration

COPY ora2pg.conf /ora2pg/my-migration/config/

CMD ora2pg -t SHOW_VERSION -c config/ora2pg.conf && ora2pg -t SHOW_TABLE -c config/ora2pg.conf\
 && ora2pg -t SHOW_REPORT --estimate_cost -c config/ora2pg.conf\
 && ./export_schema.sh && ora2pg -t INSERT -o data.sql -b ./data -c ./config/ora2pg.conf

Once the export looks good you can work on importing everything. The Dockerfile for the import image looks like this:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus
RUN yum install -y postgresql-server

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

# you need to mount the project volume to /ora2pg
WORKDIR /ora2pg

CMD ./import_all.sh -d my_target_db -h $pg_host -U myuser -o myowner

Our target database runs on another host, so you need credentials to authenticate and perform all the required actions. Therefore we run the import container interactively. The PowerShell command for the import looks like this:

docker run -it --rm -e pg_host=192.168.56.1 -v $PWD/ora2pg/my-migration:/ora2pg pgimport

The import script allows you to create the schema, sequences, indexes and constraints and to load the data. I suggest adding the constraints after importing the data, a workflow supported by the import_all.sh script.

That way we got our Oracle database migrated into a PostgreSQL database. Unfortunately, this is only one part of the whole migration. The other part is making changes to the application code to correctly use the new database.

JDBC’s wasNull method pitfall

Java’s java.sql package provides a general API for accessing data stored in relational databases. It is part of JDBC (Java Database Connectivity). The API is relatively low-level, and is often used via higher-level abstractions based on JDBC, such as query builders like jOOQ, or object–relational mappers (ORMs) like Hibernate.

If you choose to use JDBC directly, you have to be aware that the API is relatively old. It was added as part of JDK 1.1 and predates later additions to the language such as generics and optionals. There are also some pitfalls to be avoided. One of these pitfalls is ResultSet’s wasNull method.

The wasNull method

The wasNull method reports whether the database value of the last ‘get’ call for a nullable table column was NULL or not:

int height = resultSet.getInt("height");
if (resultSet.wasNull()) {
    height = defaultHeight;
}

The wasNull check is necessary, because the return type of getInt is the primitive data type int, not the nullable Integer. This way you can find out whether the actual database value is 0 or NULL.

The problem with this API design is that the ResultSet type is very stateful. Its state does not only change with each row (by calling the next method), but also with each ‘get’ method call.

If any other ‘get’ method call is inserted between the original ‘get’ method call and its wasNull check the code will be wrong. Here’s an example. The original code is:

var width = rs.getInt("width");
var height = rs.getInt("height");
var size = new Size(width, rs.wasNull() ? defaultHeight : height);

A developer now wants to add a third dimension to the size:

var width = rs.getInt("width");
var height = rs.getInt("height");
var depth = rs.getInt("depth");
var size = new Size(width, rs.wasNull() ? defaultHeight : height, depth);

It’s easy to overlook the wasNull call, or to wrongly assume that adding another ‘get’ method call is a safe code change. But the wasNull check now refers to “depth” instead of “height”, which breaks the original intention.

Advice

So my advice is to wrap the ‘get’ calls for nullable database values in their own methods that return an Optional:

Optional<Integer> getOptionalInt(ResultSet rs, String columnName) throws SQLException {
    final int value = rs.getInt(columnName);
    if (rs.wasNull()) {
        return Optional.empty();
    }
    return Optional.of(value);
}

Now the default value fallback can be safely applied with the orElse method:

var width = rs.getInt("width");
var height = getOptionalInt(rs, "height").orElse(defaultHeight);
var depth = rs.getInt("depth");
var size = new Size(width, height, depth);

Contiguous date ranges in Oracle SQL

In one of my last posts from a couple of weeks ago I wrote about querying gaps between non-contiguous date ranges in Oracle SQL. This week’s post is about contiguous date ranges.

While non-contiguous date ranges are best represented in a database table with a start_date and an end_date column, it is better to represent contiguous date ranges only by one date column, so that we avoid redundancy and do not have to keep the start date of a date range in sync with the end date of the previous date range. In this post I will use the start date:

CREATE TABLE date_ranges (
name VARCHAR2(100),
start_date DATE
);

The example content of the table is:

NAME	START_DATE
----	----------
A	05/02/2020
B	02/04/2020
C	16/04/2020
D	01/06/2020
E	21/06/2020
F	02/07/2020
G	05/08/2020

This representation means that the date range with the most recent start date does not have an end. The application using this data model can choose whether to interpret this as a date range with an open end or just as the end point for the previous range and not as a date range by itself.

While this is a nice non-redundant representation, it is less convenient for queries where we want to have both a start and an end date per row, for example in order to check whether a given date lies within a date range or not. Luckily, we can transform the ranges with a query:

SELECT
date_ranges.*,
LEAD(date_ranges.start_date)
OVER (ORDER BY start_date)
AS end_date
FROM date_ranges;

As in the previous post on non-contiguous date ranges the LEAD analytic function allows you to access the following row from the current row without using a self-join. Here’s the result:

NAME	START_DATE	END_DATE
----	----------	--------
A	05/02/2020	02/04/2020
B	02/04/2020	16/04/2020
C	16/04/2020	01/06/2020
D	01/06/2020	21/06/2020
E	21/06/2020	02/07/2020
F	02/07/2020	05/08/2020
G	05/08/2020	(null)

By using a WITH clause you can use this query like a view and join it with another table, for example with the join condition that a date lies within a date range:

WITH ranges AS
(SELECT date_ranges.*, LEAD(date_ranges.start_date) OVER (ORDER BY start_date) AS end_date FROM date_ranges)
SELECT timeseries.*, ranges.name
FROM timeseries LEFT OUTER JOIN ranges ON
timeseries.measurement_date
BETWEEN ranges.start_date AND ranges.end_date;
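
The following Python sketch spells out the same two steps on the example data: deriving each end date from the start of the following range (what LEAD does) and then looking up the range for a given date. Two things differ from the query and are my own assumptions: BETWEEN is inclusive on both ends, so a date that falls exactly on a boundary would match two adjacent ranges, whereas the sketch treats each range as half-open; and the last range is interpreted as open-ended, which is one of the two interpretations mentioned above. The helper names are made up.

```python
from datetime import date

# The example table as (name, start_date) pairs.
date_ranges = [
    ("A", date(2020, 2, 5)), ("B", date(2020, 4, 2)),
    ("C", date(2020, 4, 16)), ("D", date(2020, 6, 1)),
    ("E", date(2020, 6, 21)), ("F", date(2020, 7, 2)),
    ("G", date(2020, 8, 5)),
]


def with_end_dates(ranges):
    """Pair every range with the start date of the following range,
    like LEAD(start_date) OVER (ORDER BY start_date); None for the last."""
    ordered = sorted(ranges, key=lambda r: r[1])
    next_starts = [start for _, start in ordered[1:]] + [None]
    return [(name, start, end)
            for (name, start), end in zip(ordered, next_starts)]


def range_for(day, ranges):
    """Return the name of the range containing day, or None.
    Assumption: ranges are half-open [start, end), the last one has no end."""
    for name, start, end in with_end_dates(ranges):
        if start <= day and (end is None or day < end):
            return name
    return None


print(with_end_dates(date_ranges)[0])
print(range_for(date(2020, 4, 2), date_ranges))
```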

Querying gaps between date ranges in Oracle SQL

Let’s say we have a database table with date ranges, each range designated by a RANGE_START and a RANGE_END column:

CREATE TABLE date_ranges (
  range_start DATE,
  range_end   DATE
);

RANGE_START	RANGE_END
-----------	---------
05/02/2020	01/04/2020
02/04/2020	15/04/2020
16/04/2020	01/05/2020
01/06/2020	20/06/2020
21/06/2020	01/07/2020
02/07/2020	31/07/2020
05/08/2020	30/08/2020

We are now interested in finding the gaps between these date ranges. If we look at this example data set we can see that there are two gaps:

RANGE_START	RANGE_END
05/02/2020	01/04/2020
02/04/2020	15/04/2020
16/04/2020	01/05/2020
-- gap --
01/06/2020	20/06/2020
21/06/2020	01/07/2020
02/07/2020	31/07/2020
-- gap --
05/08/2020	30/08/2020

What would be the SQL query to find these automatically? With standard SQL this would be a difficult task. However, there are some special functions in Oracle SQL called analytic functions that greatly help with this task. Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. In this case we will use the analytic functions MAX and LEAD:

SELECT * FROM (
  SELECT
    MAX(range_end)
      OVER(ORDER BY range_start) + 1 gap_start,
    LEAD(range_start)
      OVER(ORDER BY range_start) - 1 gap_end
  FROM date_ranges
) WHERE gap_start <= gap_end;

The result of this query are the date range gaps we are interested in:

GAP_START	GAP_END
---------	-------
02/05/2020	31/05/2020
01/08/2020	04/08/2020

Note that the MAX function in the query is the analytic MAX function, not the aggregate MAX function, indicated by the OVER keyword with an analytic clause. It operates on a sliding window. The LEAD analytic function allows you to access the following row from the current row without using a self-join.
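
If the interplay of the running MAX and LEAD is hard to picture, the following Python sketch walks through the rows the same way, using the example data from above. It is a plain re-implementation of the logic for illustration, not something Oracle executes; the function name find_gaps is my own.

```python
from datetime import date, timedelta

# The example table as (range_start, range_end) pairs.
date_ranges = [
    (date(2020, 2, 5), date(2020, 4, 1)),
    (date(2020, 4, 2), date(2020, 4, 15)),
    (date(2020, 4, 16), date(2020, 5, 1)),
    (date(2020, 6, 1), date(2020, 6, 20)),
    (date(2020, 6, 21), date(2020, 7, 1)),
    (date(2020, 7, 2), date(2020, 7, 31)),
    (date(2020, 8, 5), date(2020, 8, 30)),
]


def find_gaps(ranges):
    """Return (gap_start, gap_end) pairs between the given date ranges."""
    ordered = sorted(ranges)
    one_day = timedelta(days=1)
    gaps = []
    running_max_end = None
    for i, (start, end) in enumerate(ordered):
        # Counterpart of the analytic MAX: latest end date seen so far.
        if running_max_end is None or end > running_max_end:
            running_max_end = end
        # Counterpart of LEAD: start of the following row, if there is one.
        if i + 1 == len(ordered):
            break
        next_start = ordered[i + 1][0]
        gap_start = running_max_end + one_day
        gap_end = next_start - one_day
        if gap_start <= gap_end:
            gaps.append((gap_start, gap_end))
    return gaps


gaps = find_gaps(date_ranges)
for gap_start, gap_end in gaps:
    print(gap_start, gap_end)
```

Using the running maximum instead of the end date of the current row matters when ranges overlap: a long range that started earlier may still cover what looks like a gap after a shorter one.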

Using CSV data as external table in Oracle DB

If you want to import CSV data into an Oracle database you can use the SQL*Loader command line tool. You simply create a control file that describes how to load the data and then call the sqlldr command with the control file name as an argument:

example.ctl

LOAD DATA
INFILE example.csv
INTO TABLE example_table
FIELDS TERMINATED BY ';'
(ID, NAME, AMOUNT)

> sqlldr username/password example.ctl

But there’s another way to load CSV data into an Oracle database: External tables.

External tables

Oracle’s external tables feature allows you to query data from a file on the filesystem like a regular database table.

First you have to create a directory in the file system and put your CSV file inside:

mkdir -p /path/to/directory

example.csv

1;Water;250
2;Beer;500
3;Wine;150

Now connect to the database as “SYS as SYSDBA”, define the directory as a database object and grant read/write access to your user:

CREATE OR REPLACE DIRECTORY
  external_tables_dir AS '/path/to/directory';
GRANT READ,WRITE ON DIRECTORY
  external_tables_dir TO example_user;

Now you can connect as example_user and create an external table for the CSV file:

CREATE TABLE example_table (
  id NUMBER(4,0),
  name VARCHAR2(50),
  amount NUMBER(8,0)
)
ORGANIZATION EXTERNAL (
  DEFAULT DIRECTORY external_tables_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ';'
  )
  LOCATION ('example.csv')  
);

The relevant part here is the ORGANIZATION EXTERNAL block. It references the directory and the CSV file inside the directory and allows you to specify format parameters of the CSV file such as record and field delimiters.

Now you can query the table like a regular table:

SELECT * FROM example_table

ID NAME  AMOUNT
-- ----- ------
1  Water 250
2  Beer  500
3  Wine  150

Access information and errors such as bad or discarded records are stored in log files in the specified directory. The default names of these log files consist of the table name and an ID, e.g. example_table_12345.log, example_table_12345.bad and example_table_12345.dsc.
