Migrating a Grails application from Oracle to PostgreSQL

In my previous post I explained how to migrate an Oracle schema with data to a PostgreSQL database management system (DBMS). Besides the general tasks and issues there are additional topics to migrate a complete application using the database to the other DBMS.

In our specific case we have a grails application which we maintain since Grails 1.0 times for more than 12 years. During that time we did a ton of feature development with lots of refactoring and many database migrations. So the source database will most like not be perfectly consistent and clean.

General approach

Since Grails/GORM and the DatabaseMigration-Plugin (DBM-Plugin) do a great job at preparing an empty database with a matching schema for the application to run we let the framework tools generate the schema and only migrate the data using Ora2Pg.

Sounds simple, but how it is done in detail and what else to look for?

Generating the initial dabase schema

The DBM-Plugin provides a script to create a database changelog with a schema matching the domain model of your grails application. It is integrated in gradle, so you can grails dbm-generate-gorm-changelog initialdb.groovy to create the migration scripts providing a fitting schema. You then include this script in grails-app/migrations/changelog.groovy or replace all the migrations you had before included there with this initial database changelog.

To prepare an empty database to run with your application you call the gradle task dbmUpdate.

Checking all plain SQL code

If you are only using GORM’s dynamic finders, save()/update()/delete()-methods, HQL and the criteria API you are probably fine to run your application or perform the data mirgration step.

Our application has some specific parts where we use plain SQL. Because of syntactical differences you will want to check all the plain SQL if it works with PostgreSQL. The most obvious stuff is dealing with sequences or other queries where you need the dual table in Oracle.

Migrating the data

This is probably the part where the most things can go wrong. We had quite some work with data-inconsistencies and left-overs from manual corrections that happened over the course of running and upgrading the application for so many year. For younger and simpler applications this may not present any challenges but for us it was quite time-consuming.

Now you can use Ora2Pg to import the data. After the whole data import using Ora2Pg worked as intended you should check the value of the hibernate_sequence . This sequence is used to generate the ids of all grails domain objects.

Do not let the sequences from the autoincrement columns of the tables of your domain objects confuse you! They are not used by Grails/GORM. To avoid this confusion you can remove the default value of the id columns and the accompanying sequences.

Checking the result

You should always run acceptance or manual tests to make sufficiently sure that the migration worked as intended. There is always the possibility of a configuration or software error or some oversights in checking the application code.

If possible tests the result on a dedicated system with some snapshot of the real world data before making the switch on the production system. Good luck!

Linking separate PostgreSQL servers with a Foreign Data Wrapper

If you want to query a table in one database server from another you need a way to connect these two servers with each other. For PostgreSQL databases the feature that makes this possible is called Foreign Data Wrapper.

To use this feature you have to load its extension into the current database:

CREATE EXTENSION postgres_fdw;

The postgres_fdw extension ships with a default PostgreSQL installation. It allows you to connect a PosgreSQL database with another PostgreSQL database. You can connect a PostgreSQL database with databases by other vendors, too, but then you need external extensions like oracle_fdw for Oracle databases or mysql_fdw for MySQL databases. In this article we will only use postgres_fdw.

You can check if the extension was loaded successfully. The following query should return a row for the extension:

SELECT * FROM pg_extension WHERE extname='postgres_fdw';

The next step is to set up the remote server instance:

CREATE SERVER remotesrv
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (
    host '127.0.0.1',
    port '5432',
    dbname 'remotedb'
  );

This statement registers a remote server under the name remotesrv, a name you can choose freely. You have to specify which Foreign Data Wrapper to use (postgres_fdw in this case) as well the target host, port and database name.

The CREATE SERVER statement didn’t contain any user login information, you have to provide it via a user mapping:

CREATE USER MAPPING
  FOR CURRENT_USER
  SERVER remotesrv
  OPTIONS (
    user 'joe',
    password 'secret'
  );

In this case we map the remote user joe to the current user (CURRENT_USER) of the local server. It doesn’t have to be the current user, you could specify any user name.

Now you have to import tables from the remote database. You can either explicitly import individual tables or a whole schema. Here’s how to import the public schema of the remote (“foreign”) database into the public schema of the local database:

IMPORT FOREIGN SCHEMA public
  FROM SERVER remotesrv
  INTO public;

You can restrict which tables to import with the LIMIT TO or EXCEPT clauses. The following statement will only import the tables books and students:

IMPORT FOREIGN SCHEMA public
  LIMIT TO (students, books)
  FROM SERVER remotesrv
  INTO public;

Now you can access these tables as if they were in the local database.

Migrating from Oracle to PostgreSQL

We are maintaining several applications with a SQL-Database as our data storage. If we can decide freely, we usually opt for PostgreSQL as the database management system (DBMS). But sometimes our clients have specific requirements because they are running the services on-premises so we use our customers’ choice. SQL is SQL anyway, is it not?

No it isn’t. And this year one of our customers asked us to migrate our application from Oracle to PostgreSQL. The migration was challenging even though we are using an object-relational mapper (ORM) and the necessary changes to our application code were very limited.

In this post I want to explain the general, application-agnostic challenges of such a migration. A follow-up will cover the application- and framework-specific issues.

Why is it not easy?

Luckily, PostgreSQL supports most common SQL features of Oracle, especially sequences, PL/SQL like scripts, triggers, foreign keys etc. and all the important datatypes. So you are mostly migrating from an inferior to a more powerful solution, at least feature and capability-wise from a client perspective. Please note that I am not judging the performance, replication, clustering and other administrative features here!

Unfortunately there is no simple and powerful enough tool to simply dump the oracle database into some standard SQL text format that you could pipe into psql or use with pg_restore. In addition there is also a challenge to convert the different number-types of Oracle to sematically equivalent PostgreSQL types etc.

Another challenge is coping with the referential integrity. Especially data in complex schemas with a lot of foreign keys are harder to migrate without proper tool support as you have to figure out the correct order of tables to restore.

Nevertheless, such a migration is doable, especially if you do not have too much scripting logic in your database. And there is a free tool to help you with all this stuff called Ora2Pg.

What can Ora2Pg do for you?

It can export the full database schema including constraints, convert datatypes based on configuration provided by you and offers a basic automatic conversion of PL/SQL code to PLPGSQL. When running the migration you can interactively choose what to migrate and what to skip. That allows you to only migrate the data into a readily prepared schema, for example.

How to run Ora2Pg?

Ora2Pg is a collection of perl scripts and configuration files so you need a system capable of running these. If you do not want to mess with your whole system and install all of the dependencies I prepared a Dockerfile able to run Ora2Pg:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

RUN mkdir -p /migration
RUN ora2pg --project_base /migration --init_project my_project
WORKDIR /migration/my_project

# uncomment this if you have a customized ora2pg.conf
#COPY ora2pg.conf /migration/my_project/config/

CMD ora2pg -t SHOW_VERSION -c config/ora2pg.conf && ora2pg -t SHOW_TABLE -c config/ora2pg.conf\
 && ora2pg -t SHOW_REPORT --estimate_cost -c config/ora2pg.conf\
 && ./export_schema.sh && ora2pg -t INSERT -o data.sql -b ./data -c ./config/ora2pg.conf

Here are the commands and the workflow to export the oracle database using the above docker image:

docker build -t o2pg .
# this will fail initially but create the project structure and generate a default configuration file
docker run --name oracle-export o2pg
# copy the project structure to the host system
docker cp oracle-export:/migration/my_project ./my_project_migration/

Now you can edit the configuration in my_project_migration/config and copy it to the directory you have built and run the docker commands. Most importantly you have to change the connection parameters at the top of the ora2pg.conf file. When you are ready to make the first go you need to enable configuration copying in the Dockerfile and rebuild the image. Now you should get your first somehow usable export.

The most import config options we changed for our projects are:

  • Connection parameters
  • Excluded tables that you do not want to migrate
  • Deletion of the contents of the target tables
  • Conversion of some datatypes like NUMBER(*,0) to bigint and NUMBER:1 to boolean for some columns

Most of the defaults are sensible to begin with but you can tailor the export specifically to your needs. If you feel ready to try the import you can run the import using a second docker image based on the following Dockerfile-import:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus
RUN yum install -y postgresql-server

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

# you need to mount the project volume to /my_project
WORKDIR /my_project

ENV pg_port=5432

CMD ./import_all.sh -d $pg_db -h $pg_host -p $pg_port -U $pg_user -o $pg_user

To run the import with your exported project run build and run the import container as follows:

docker build -t postgres-import -f Dockerfile-import .
docker run -it --rm -e pg_host=target-db.intranet -e pg_db=my_project_db -e pg_user=my_db_user -v ./my_project_migration:/my_project postgres-import

Then you can interactively provide the database password and decide which migration steps to perform.

Caveat

Depending on your schema, data and privileges in the target database it may be necessary to disable all triggers before importing and reenable them after a successful import. This can done by replacing all occurences of TRIGGER USER by TRIGGER ALL in the file data/data.sql. You may need appropriate privileges for this to work.

Final words

Such a migration is not an easy task but may be worth it in total cost of ownership and maybe developer satisfaction as Oracle has some oddities and limitations for backend developers.

I will tackle some application specific issues with such a migration in a follow-up article that we encountered when migrating our system from Oracle to PostgreSQL using the above approach and tools.

Tables as types in PostgreSQL

In SQL each column of a database table has a data type. These are types like NUMBER, VARCHAR(size) / TEXT, TIMESTAMP. What you perhaps don’t know yet is that in PostgreSQL you can use tables as types. What does this mean? Let’s say we have a table:

CREATE TABLE person (
  firstname TEXT,
  lastname  TEXT,
  email     TEXT
);

Now you can use this table as a type for columns in other tables:

CREATE TABLE article (
  id       SERIAL,
  content  TEXT,
  author   person,
  reviewer person
);

Instead of repeating the three columns of a person twice, e.g. author_firstname, author_lastname, author_email, and reviewer_firstname, reviewer_lastname, reviewer_email, the person table defined before acts as a type. Of course, the usual way in SQL is to give each person an ID and reference persons via these IDs from other tables. But sometimes you do not want this reference semantics. In this example you might want to fix the values of author and reviewer for articles in time and not retroactively update them automatically if a person changes their last name or email address later.

How to access the columns of these types? For INSERT the syntax is as follows:

INSERT INTO article (content, author, reviewer)
  VALUES ('...',
    ('Jane', 'Doe', 'jane.doe@example.com'),
    ('John', 'Roe', 'jroe@example.com')
  ); 

Or with explicit names of the sub-columns:

INSERT INTO article (content,
    author.firstname,
    author.lastname,
    author.email,
    reviewer.firstname,
    reviewer.lastname,
    reviewer.email)
  VALUES ('...',
    'Jane', 'Doe', 'jane.doe@example.com',
    'John', 'Roe', 'jroe@example.com'
  ); 

In a SELECT query individual values can be accessed with the following syntax:

SELECT
  content,
  (author).lastname,
  (reviewer).lastname
FROM article;

Of course, tables that uses other tables as data types for their columns can be used as data types again.

One last thing worth mentioning is that these nested definitions can be mapped nicely to JSON:

SELECT jsonb_pretty(to_jsonb(article)) FROM article;
{
  "id": 1,
  "content": "...",
  "author": {
    "email": "jane.doe@example.com",
    "firstname": "Jane",
    "lastname": "Doe"
  },
  "reviewer": {
    "email": "jroe@example.com",
    "firstname": "John",
    "lastname": "Roe"
  }
}

Geometric shapes, functions and operators in PostgreSQL

On this blog I frequently write about features of relational database systems and their SQL dialects. One feature many developers do not know about is support for geometric shapes, although a lot of RDBMs support them in one form or the other, each with its own syntax, of course. In this article I’m going to demonstrate this feature with PostgreSQL.

PostgreSQL has data types for geometric shapes like point, box, line segment, line, path, polygon, and circle. These data types are only defined for two dimensions with Euclidean (x, y) coordinates. Here are some literals for these types:

point '(3.2,4)'
box '((1,2),(6,4))'
lseg '((-4,0),(3,2))'
path '((0,0),(2,1),(5,3))'
polygon '((0,0),(1,1),(2,0),(3,1))'
circle '((5,2),1.5)'

You can create tables with columns of these types and insert shapes:

CREATE TABLE shapes (p point, c circle);

INSERT INTO shapes (p, c) VALUES
  (point '(1,0)', circle '(0,0),3'),
  (point '(10,20)', circle '(2,3),4'),
  (point '(0.5,1.5)', circle '(1,2),1');

Now you can query shapes and filter them with special operators:

SELECT * FROM shapes WHERE c @> p;

This query uses the contains operator @> in the WHERE clause. It selects all rows where the circle c contains the point p.

Here’s another operator: <-> determines the Euclidean distance between two points.

SELECT point '(0,0)' <-> point '(1,1)';
=> 2.23606797749979

The ?|| operator tests if two lines are parallel:

SELECT line '((1,2),(1,3))' ?|| line '((2,3),(2,4))';
=> true

You can translate a shape with the + operator:

SELECT box '((0,0),(1,1))' + point '(1,2)';
=> box '(2,3),(1,2)'

Or you can test with && if two shapes overlap:

SELECT box '((1,2),(4,3))' && box '(2,3),(1,2)';
=> true

This is only a small selection of geometric operators. See the full list in the official documentation. There you can also find a list of geometric functions like area , center, isclosed, npoints, etc.

SELECT area(box '((4,6),(10,12))');
=> 36

As mentioned in the beginning, other database systems support similar functionality. Check out MySQL’s spatial data types, Oracle Spatial, and MS SQL’s spatial data types.

Pagination in SQL

Pagination is the task of dividing a data set into subsequent parts of the whole data set. For example, a search engine initially only shows the first 15 results for a search query. The user can then step through the rest of the results the by clicking a “Next” button.

Ideally this feature is also supported by the underlying database system. Otherwise, the application would have to load all matching data records from the database, just to filter out the major part of of them, because the user only wanted to see page 3 of 50. A pagination request has two components: a limit and an offset. If a page contains a maximum of 15 items and page 3 is requested, then the limit would be 15 and the offset would be 30 = (page-1) × limit.

PostgreSQL, MySQL, MariaDB

The database systems PostgreSQL, MySQL and MariaDB have a straight forward syntax for pagination: LIMIT {number} OFFSET {number} . So a simple SQL query with pagination might look like this:

SELECT * FROM users ORDER BY name LIMIT 15 OFFSET 30;

Oracle DB

Oracle DB didn’t have a dedicated syntax for pagination before Oracle 12c, but it was still possible to achieve the same result with other means. With Oracle 12c a new syntax for pagination was introduced under the name “Row limiting clause”. First I’ll show the old method, then the new syntax.

The old method is based on ROWNUM . If you wanted to specify both an offset and a limit, you had to nest multiple queries:

SELECT *
FROM (SELECT *, rownum AS rnum
      FROM (SELECT *
            FROM users
            ORDER BY name)
      WHERE rownum < 45)
WHERE rnum >= 30;

The newer row limiting clause syntax is shorter and looks as follows:

SELECT * FROM users ORDER BY name
  OFFSET 30 ROWS FETCH NEXT 15 ROWS ONLY;

This syntax also allows the option to specify a percentage of rows instead of a fixed number of rows:

SELECT * FROM users ORDER BY name
  FETCH FIRST 20 PERCENT ROWS ONLY;

MS SQL Server

Microsoft’s SQL Server also supports the Oracle-like syntax with OFFSET and FETCH clauses and recommends the usage of this syntax for pagination.

The World of SQL Dialects

For software projects I work with various relational database management systems (RDBMs), mainly PostgreSQL, MySQL/MariaDB, Oracle Database and Microsoft SQL Server. All of these use SQL as a query language, but the dialects of this language vary wildly, especially when it comes to non-standardized features. One such feature I often use is the aggregation of a list to a string. It does the following.

LEGS    ANIMAL
-----------------
2       Ostrich
2       Human
4       Cat
4       Dog
4       Capybara
6       Ant
8       Spider

Given a table like the one above it groups the elements of a column that have the same value in another column together in a string, concatenated by a separator like a comma:

LEGS    ANIMALS
----------------------------
2       Human, Ostrich
4       Capybara, Cat, Dog
6       Ant
8       Spider

This simple operation has four different syntaxes in the four mentioned database systems, which I want to demonstrate.

PostgreSQL

In PostgreSQL the function is called STRING_AGG:

SELECT legs,
  STRING_AGG(animal, ', ' ORDER BY animal) AS animals
FROM fauna
GROUP BY legs
ORDER BY legs;
MySQL / MariaDB

In MySQL and its fork MariaDB the function is called GROUP_CONCAT, and it has a special syntax to specify the separator:

SELECT legs,
  GROUP_CONCAT(animal ORDER BY animal SEPARATOR ', ') AS animals
FROM fauna
GROUP BY legs
ORDER BY legs;
Oracle

Oracle calls it LISTAGG and specifies the grouping via WITHIN GROUP.

SELECT legs,
  LISTAGG(animal, ', ') WITHIN GROUP (ORDER BY animal) AS animals
FROM fauna
GROUP BY legs
ORDER BY legs;
Microsoft SQL Server

SQL Server calls it STRING_AGG like PostgreSQL, but specifies the grouping via WITHIN GROUP like Oracle:

SELECT legs,
  STRING_AGG(animal, ', ') WITHIN GROUP (ORDER BY animal) AS animals
FROM fauna
GROUP BY legs
ORDER BY legs;

Unfortunately, as developers we have to live with all these dialects of SQL. Even though there is an ISO standards committee for SQL, database creators love to build non-standard extensions into their products. The situation is worse than the browser-specific extensions and differences of JavaScript, HTML and CSS in modern web browsers. One thing that can paper over these differences are OR-Mappers like Hibernate or query languages like Hibernate’s HQL that abstract over SQL, but they come with their own set of problems.

Migrating from Oracle to PostgreSQL

One promise of SQL for application developers is that changing the database management system (DBMS) is not that of a big deal. Due to the many specialties and not complete standards conformance of the database vendors it can be a big task to migrate from one DBMS vendor to another.

Nevertheless, there are plenty of good reasons to do so:

  • Cost of buying, running and maintaining the DBMS
  • Limitations of the current DBMS like performance, tool support, character sets, naming, data types and sizes etc.
  • Missing features like geospatial support, clustering, replication, sharding, timeseries support and so on
  • Support or requirements on the customers or operators side

Some of our long running projects that started several years ago had the requirement to work with an Oracle DBMS, version 8i at that time. Now, more than 10 years later our customer provides and prefers to host a PostgreSQL 13 cluster. Of course she would like us to migrate our applications over to the new DBMS and eventually get rid of the Oracle installation.

Challenges for the migration

Even though PostgreSQL is supports most of SQL:2016 core and most important features of Oracle there are enough differences and subtleties that make migration non-trivial. The most obvious items to look out for are

  • different column type names
  • SQL features and syntactical differences (sequences!)
  • PL/SQL functions syntax and features

Depending on your usage of database specific features you have to assess how much work and risk is expected.

Tools and migration process

Fortunately, there is a quite mature tool that can aid you along the process called ora2pg. It has tons of options to help you customizing the migration and a quite helpful assessment of the task ahead. The migration report looks like this:

-------------------------------------------------------------------------------
Ora2Pg v21.1 - Database Migration Report
-------------------------------------------------------------------------------
Version Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
Schema  NAOMI-TEST
Size    91.44 MB

-------------------------------------------------------------------------------
Object  Number  Invalid Estimated cost  Comments        Details
-------------------------------------------------------------------------------
DATABASE LINK   0       0       0.00    Database links will be exported as SQL/MED PostgreSQL's Foreign Data Wrapper (FDW) extensions using oracle_fdw.
FUNCTION        1       0       1.00    Total size of function code: 0 bytes.
GLOBAL TEMPORARY TABLE  60      0       168.00  Global temporary table are not supported by PostgreSQL and will not be exported. You will have to rewrite some application code to match the PostgreSQL temporary table behavior.  ht_my_table <--- SNIP --->.
INDEX   69      0       6.90    0 index(es) are concerned by the export, others are automatically generated and will do so on PostgreSQL. Bitmap will be exported as btree_gin index(es). Domain index are exported as b-tree but commented to be edited to mainly use FTS. Cluster, bitmap join and IOT indexes will not be exported at all. Reverse indexes are not exported too, you may use a trigram-based index (see pg_trgm) or a reverse() function based index and search. Use 'varchar_pattern_ops', 'text_pattern_ops' or 'bpchar_pattern_ops' operators in your indexes to improve search with the LIKE operator respectively into varchar, text or char columns.
JOB     0       0       0.00    Job are not exported. You may set external cron job with them.
SEQUENCE        4       0       1.00    Sequences are fully supported, but all call to sequence_name.NEXTVAL or sequence_name.CURRVAL will be transformed into NEXTVAL('sequence_name') or CURRVAL('sequence_name').
SYNONYM 0       0       0.00    SYNONYMs will be exported as views. SYNONYMs do not exists with PostgreSQL but a common workaround is to use views or set the PostgreSQL search_path in your session to access object outside the current schema.
TABLE   225     0       72.00    495 check constraint(s).       Total number of rows: 264690. Top 10 of tables sorted by number of rows:. topt has 52736 rows. po has 50830 rows. notification has 18911 rows. timeline_entry has 16556 rows. char_sample_types has 11400 rows. char_safety_aspects has 9488 rows. char_sample_props has 5358 rows. tech_spec has 4876 rows. mail_log_entry has 4778 rows. prop_data has 4358 rows. Top 10 of largest tables:.
-------------------------------------------------------------------------------
Total   359     0       248.90  248.90 cost migration units means approximatively 3 man-day(s). The migration unit was set to 5 minute(s)

-------------------------------------------------------------------------------
Migration level : A-3
-------------------------------------------------------------------------------

Migration levels:
    A - Migration that might be run automatically
    B - Migration with code rewrite and a human-days cost up to 5 days
    C - Migration with code rewrite and a human-days cost above 5 days
Technical levels:
    1 = trivial: no stored functions and no triggers
    2 = easy: no stored functions but with triggers, no manual rewriting
    3 = simple: stored functions and/or triggers, no manual rewriting
    4 = manual: no stored functions but with triggers or views with code rewriting
    5 = difficult: stored functions and/or triggers with code rewriting
-------------------------------------------------------------------------------

The tool is written in Perl, so I decided to put and run it inside Docker containers because I did not want to mess with my working machine or some VMs. To have quick turnaround times with my containers I split up the process into 3 steps:

  1. Export of the schema and data using a docker container
  2. On success copy the ora2pg project to the host
  3. Import the schema and data using another docker container

The ora2pg migration project is copied to the host machine allowing you to inspect the export and make adjustments if need be. Then you can copy it to the import container or simply bind mount the directory containing the ora2pg project.

The Dockerfile for the export image looks like this

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

RUN mkdir -p /naomi/migration
RUN ora2pg --project_base /ora2pg --init_project my-migration
WORKDIR /ora2pg

COPY ora2pg.conf /ora2pg/my-migration/config/

CMD ora2pg -t SHOW_VERSION -c config/ora2pg.conf && ora2pg -t SHOW_TABLE -c config/ora2pg.conf\
 && ora2pg -t SHOW_REPORT --estimate_cost -c config/ora2pg.conf\
 && ./export_schema.sh && ora2pg -t INSERT -o data.sql -b ./data -c ./config/ora2pg.conf

Once the export looks good you can work on importing everything. The Dockerfile for the import image looks like this:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus
RUN yum install -y postgresql-server

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

# you need to mount the project volume to /ora2pg
WORKDIR /ora2pg

CMD ./import_all.sh -d my_target_db -h $pg_host -U myuser -o myowner

Our target database runs on another host, so you need credentials to authenticate and perform all the required actions. Therefore we are the import container interactively. The PowerShell command for the import looks like this

docker run -it --rm -e pg_host=192.168.56.1 -v $PWD/ora2pg/my-migration:/ora2pg pgimport

The import script allows you to create the schema, sequences, indexes, constraints and load the data. I suggest adding the contraints after importing the data – a workflow supported by the import_all.sh script.

That way we got our Oracle database migrated into a PostgreSQL database. Unfortunately, this is only one part of the whole migration. The other part is making changes to the application code to correctly use the new database.

Working with JSON data in PostgreSQL

Today most common SQL-based relational database management systems (DBMS) like PostgreSQL, MySQL, MariaDB, SQL Server and Oracle offer functionality to efficiently store and query JSON data in one form or another, with varying syntax. While a standard named SQL/JSON is in the works, it is not yet fully supported by all of these DBMS. This blog post is specific to PostgreSQL.

JSON data types

In PostgreSQL there are two data types for JSON columns: json and jsonb. The former stores JSON data as-is with any formatting preserved, while the latter stores JSON in a decomposed binary format. Operations on data in jsonb format are potentially more efficient.

We’ll use the jsonb data type in the following example to store a sequence of events, for example for an event sourcing based application, in a table.

CREATE TABLE events (date TIMESTAMP NOT NULL,
                     event JSONB NOT NULL);

JSON literals look like string literals. Let’s insert some events:

INSERT INTO events (date, event) VALUES
  (NOW(), '{"type": "add_shelf", "payload": {"id": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}'),
  (NOW(), '{"type": "add_shelf", "payload": {"id": 2}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

PostgreSQL has two operators for navigating a JSON structure: -> and ->>. The former accesses an object field by key and the latter accesses an object field as text. These operators can be used both in the SELECT clause and the WHERE clause:

SELECT event->>'type' AS type FROM events;
type
add_shelf
add_book
add_book
add_shelf
SELECT event FROM events
        WHERE event->>'type'='add_book'
          AND event->'payload'->>'shelf'='1';
event
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Note that in the example above the value of "shelf" is compared to a string literal ('1'). In order to treat the value as a number we have to use the CAST function, and then we can use numerical comparison operators:

SELECT event FROM events
        WHERE CAST(
          event->'payload'->>'shelf' AS INTEGER
        ) > 1;
event
{"type":"add_book","payload":{"shelf":2,"title":"Don Quixote"}}

Updating

Updating JSON object fields is a bit more complicated. It is only possible with the jsonb data type and can be done via the JSONB_SET function, which takes four arguments:

1) the original JSON,
2) a path specifying which object fields should be updated,
3) a jsonb value, which is the new value, and
4) a boolean flag that specifies if missing fields should be created.

In this example we convert all the event "type" fields from lower case to upper case:

UPDATE events SET event=JSONB_SET(
  event,
  '{type}',
  TO_JSONB(UPPER(event->>'type')),
  false
);

PostgreSQL provides a lot more operators and functions for working with JSON data. This post only covered the most basic ones. See the PostgreSQL JSON reference for more.

Organize complex SQL queries with Common Table Expressions

Complex SQL database queries often contain subqueries.

SELECT * FROM ... 
   WHERE name IN (SELECT name 
         FROM ... 
         WHERE ...)

These can quickly become unreadable, especially if multiple subqueries are involved. A nice way to organise such queries with multiple subqueries is called Common Table Expressions (CTE), or colloquially: “WITH queries”. They are supported by most contemporary SQL databases.

When using Common Table Expressions you list all the subqueries after a WITH keyword in front of the actual query and assign a name to each subquery:

WITH
  <subquery1_name> AS (SELECT ...),
  <subquery2_name> AS (SELECT ...),
  [ ... ]
SELECT ...
FROM ...
[ ... ]

You can now refer to these subqueries in the main query by their names. They are conceptually just like temporary views. You can also refer to a subquery in the main query multiple times, whereas if the subquery was inlined you would have to repeat it. A subquery defined as a Common Table Expression can also refer to the preceding subqueries in the subquery list.

Recursion

A Common Table Expression can even refer to itself, which is a recursive definition. In some database systems you have to add the RECURSIVE keyword after WITH, in others you can leave it out. Here’s a recursive CTE that calculates factorial numbers:

WITH RECURSIVE factorials (n, fact) AS 
  (SELECT 0, 1
   UNION ALL
   SELECT n+1, (n+1)*fact FROM factorials
          WHERE n < 6)
SELECT * FROM factorials;
N FACT
0    1
1    1
2    2
3    6
4   24
5  120

You could also use such a recursive query to descend into a tree hierarchy.

While the use cases for recursive queries are less frequent, I find the general concept of Common Table Expressions very useful to make complex queries more readable.