Migrating a Grails application from Oracle to PostgreSQL

In my previous post I explained how to migrate an Oracle schema with data to a PostgreSQL database management system (DBMS). Besides the general tasks and issues there are additional topics to migrate a complete application using the database to the other DBMS.

In our specific case we have a grails application which we maintain since Grails 1.0 times for more than 12 years. During that time we did a ton of feature development with lots of refactoring and many database migrations. So the source database will most like not be perfectly consistent and clean.

General approach

Since Grails/GORM and the DatabaseMigration-Plugin (DBM-Plugin) do a great job at preparing an empty database with a matching schema for the application to run we let the framework tools generate the schema and only migrate the data using Ora2Pg.

Sounds simple, but how it is done in detail and what else to look for?

Generating the initial dabase schema

The DBM-Plugin provides a script to create a database changelog with a schema matching the domain model of your grails application. It is integrated in gradle, so you can grails dbm-generate-gorm-changelog initialdb.groovy to create the migration scripts providing a fitting schema. You then include this script in grails-app/migrations/changelog.groovy or replace all the migrations you had before included there with this initial database changelog.

To prepare an empty database to run with your application you call the gradle task dbmUpdate.

Checking all plain SQL code

If you are only using GORM’s dynamic finders, save()/update()/delete()-methods, HQL and the criteria API you are probably fine to run your application or perform the data mirgration step.

Our application has some specific parts where we use plain SQL. Because of syntactical differences you will want to check all the plain SQL if it works with PostgreSQL. The most obvious stuff is dealing with sequences or other queries where you need the dual table in Oracle.

Migrating the data

This is probably the part where the most things can go wrong. We had quite some work with data-inconsistencies and left-overs from manual corrections that happened over the course of running and upgrading the application for so many year. For younger and simpler applications this may not present any challenges but for us it was quite time-consuming.

Now you can use Ora2Pg to import the data. After the whole data import using Ora2Pg worked as intended you should check the value of the hibernate_sequence . This sequence is used to generate the ids of all grails domain objects.

Do not let the sequences from the autoincrement columns of the tables of your domain objects confuse you! They are not used by Grails/GORM. To avoid this confusion you can remove the default value of the id columns and the accompanying sequences.

Checking the result

You should always run acceptance or manual tests to make sufficiently sure that the migration worked as intended. There is always the possibility of a configuration or software error or some oversights in checking the application code.

If possible tests the result on a dedicated system with some snapshot of the real world data before making the switch on the production system. Good luck!

Hyperfocus on Non-Essentials

When tasked with managing a complex and potentially overwhelming project, a common behaviour of inexperienced managers/developers is to focus on things that are easy to achieve (“low-hanging fruits”), fun to produce (“cherry-picking”) or within the comfort zone.

This means that in the extreme, the developer exclusively focusses on things that are of no interest for the business client but can simulate progress and results.

This behaviour is an application of the “path of least resistance” and I know exactly what it feels like. Here’s the story why:

When I was fourteen years old, my programming career was already 6 years in the making. Of course, I only wrote code for myself, teaching myself new concepts and new errors alike. My only scale of success was “does it run?” and “is it still fun for me?”. My only programming language was BASIC, first the dialect GW-BASIC (still with line numbers!), then the more advanced QBasic (with named jump markers instead of line numbers).

I grew up in small cities and was basically alone with my hobby. But a friend had a parent that owned an optometrist shop that was interested in using computers for their day-to-day operations. I was asked to write a program to handle the shop’s inventory and sales. The task was interesting, but I had no idea how any shop, let alone this particular one, handles their business. I agreed to build a prototype and work from there.

I knew that this project was bigger and more ambitious than any hobby project of my own before, but it was programming after all – how hard could it be?

My plan was to do two things in parallel: Buy and read a book about real software development with BASIC and try to sketch out the application as as “coded paper prototype”.

The book turned out to be the confessions of a frustrated software developer that basically assured the reader on every page that BASIC was not dead and appended dozens of pages with code listings to every chapter. There was probably a lot of wisdom in this book, too, but it missed me by miles.

The sketch of the application began with a menu of all the things I thought would be necessary, like “inventory” or “sales process”. I also included an “Extras” menu and one thing in the menu should be a decent screen saver. Back in those days, the CRT monitors suffered from burn-in if the same image was shown for a long time and I figured that this application would run all day every day, so it seemed logical and important to have a screen saver that is automatically turned on after some period of inactivity.

Which presented itself as a really hard problem, because BASIC was essentially single-threaded (or at least it was to my knowledge back then) and I had to invent some construct that can perhaps be described as “obscure co-routines”. That was some fun programming!

After I solved the automatic activation of the screensaver functionality, I discovered that I could easily make the actual screensaver that gets shown a parameter. So I programmed not one, but several cool and innovative ASCII art screensavers that you could choose from in the extras menu. One screen saver was inspired by the snake game, another one was “colored blocks” that would appear and disappear to form a captivating mood picture.

That was the state of the application when my friend’s parent asked for a demo. I had:

  • No additional knowledge about application design
  • A menu of things I invested no second thought in
  • Several very cool screensavers that activated themselves automatically. Isn’t that great?

You can probably guess how that demo went. None of the things I had developed mattered in the slightest for the optometrist shop. My passion for my creation didn’t translate to the business very well.

I had worked intensively on this project. I hyperfocused on totally non-essential stuff and stayed mostly in my comfort zone, even if I felt as if I had made great progress.

It is easy to fall into this trap. It is easy to mistake one’s own feelings of progress and success with the external (real) ones. It feels very good to work frantically on things that matter to oneself. It becomes a tragedy if the things only matter to oneself and nobody else.

So what can we do to avoid this trap? If you have an idea, write a comment about it! I hope to hear lots of different takes on this problem.

Here is my solution: “Risk first”. With this project strategy, the first task in a project is to solve the hardest part, to cut the biggest knot or to chart the most relevant area. It means that after the first milestone is a success, the project will gradually become easier. It’s the precursor to “fail fast”, which is a “risk first” project that didn’t meet its first milestone.

It is almost guaranteed that the first milestone in a “risk first” project will not be in your comfort zone, is no low-hanging fruit that you can pick without effort and while it might be fun to work on, it’s probably something your customer has a real interest in.

By starting a project “risk first”, I postpone my tendency to focus on non-essentials towards the end of the project. And with concepts like “business value”, I can see very clearly when my work becomes irrelevant for the customer. That’s when I stop my professional work and my hobby begins.

You are not safe with Semantic Versioning (right now).

TL;DR: Several recent hijacks of widely spread NPM libraries should make you double-think whether to trust the package.json-semantic-versioning notation using carets and tildes.

So what’s that about?

Version updates are one oft he most haunting things any person with any kind of computer does ever encounter. On the one hand, it’s good news – some thing that you use has evolved again, and you are right at the source. Have better tools, less bugs, new functionality, and usually delivered by just a few clicks. But there’s always these question marks – do I want to know what happened there? Do I want that now? Wasn’t that old version totally working?

So we all know that dilemma. I mean, as we speak, every Windows user is kindly remembered that one could now let that version 11 live on her system. But is it compatible with your system? Will this work out of the box? Is your time worth trying now or do you wait until the waters have settled? And while Microsoft is asking that question only once a few years apart, there’s software like Notepad++ that wants me to have its updates every single start. Because apparently, text editors can grow up so fast..?

Now imagine this problem, times a few googol, and you are in the everyday world of every web developer. In the npm universe, you have nearly total modular flexibility, which comes with so many small packages that update for whatever reason — One likes to believe that mostly these are bugfixes. Or might these be these breaking changes? Did JavaScript evolve again? Do you want that stuff? Are you state-of-the-art or are you in dependency hell?

Issues like these are the source of semantic versioning. The idea is, that you apply a certain level of trust into the usually three-membered version “major.minor.patch” label. Version “3.10.1” means

  • Major version 3; the API of your software is promised to be compatible within each major version (except for the initial version 0)
  • Minor version 10, you use a new version for new API functionality, e.g. there have been 10 times when version 3.0.0 was improved without breaking backwards compatibility
  • Patch version 1, which is incremented for bugfixes within that API specification, i.e. backwards compatible within the whole major version.

This is done in good faith because only when the package maintainer “re-think”s his API, major version is incremented, and that mediates a level of trust in that you in return only have to re-think your usage of that API, if you switch major version. Therefore, dependency management systems like the npm / yarn package.json allow for the convenient notation to specify e.g.

"dependencies": {
    /* ... */
    "styled-components": "^5.1.1",
    "websocket": "~1.0.31"
  },

The caret (^) notation tells us that when the styled-components package was added to our projects, we installed version 5.1.1, but we trust the npm universe that far that every future execution of “npm install” / “yarn install” can increase this version within the same major version, e.g. if version 5.2.0 was released in the meantime, then update for its new content, and as we speak, we are at version 5.3.3, so this project is well up-to-date with whatever the good folks put in there.

Similarly, the tilde (~) notation only allows that behavior within this minor version, e.g. at the moment any call of “… install” would retrieve the current version 1.0.34 but would not get version 1.1.0 whenever that was released.

The opposite of using these is called dependency pinning, and there is lots of further reading available, e.g. here.

There is a certain misconception that “… install” will only update any of these versions if there is no “package-lock.json” for npm or no “yarn.lock” for yarn is around. That is not the case, see below, but first, my actual point.

So the point of semantic versioning is the establishment of trust between the package developer and the user: “This update only changes about that much”.

Problem: You cannot trust the npm universe right now.

Now the last weeks showed us not only a hijacking of the npm package us-parser-js at the end of October, but also another one of the packages coa and rc 11 days ago – these appeared somewhat correlated and came with a mixture of password-stealing and secret installing of crypto-mining tools, all in all the result of some bad folks getting access to these package repositories, making them execute malicious code in their install scripts – note that install scripts are not uncommon for widely spread npm packages. This means that while you can complain that these hackers did not really adhere to the Semantic Versioning code (oh..??), and also these breaches were noticed in a couple of hours each, think about this:

anyone with a certain caret or tilde in her package.json might have infected herself just by a unluckily-timed call of “npm install”.

Think of an automated script. Think of CI. Think of anyone who just wants to build his project and be as up-to-date as one can get. A last year survey of npm developers showed that usage of two-factor authentication is just below 10%, and while this doesn’t mean that the other 90% are completely irresponsible, there just is no system in place that would promise us that such attacks will just go away soon.

So of course we can not write every dependency of your projects itself, especially if they are not direct dependencies. But think of it as Russian Roulette: At least you can minimize the number of pulling the trigger.

You can not know which package is affected next. You better make sure to pin that version to exactly a version you can trust right now, and if you are ever in need of updating this, at least have a quick googling – whether there’s some sh*t going down right now.

Do you have further ideas on how to isolate your development / CI environments from whatever just happens in the outer rims of the npm universe? Please feel free to share.

How to make npm / yarn respect their respective lockfiles (package-lock.json / YARN.lock)

In principle, you can even live with the caret / tilde, if you make sure that you never actually call “npm / yarn install” itself, but make them actually consider their so-called-lockfiles as lockfiles. In their current versions, these calls should lead to that behaviour:

# instead of npm install
npm ci

# instead of yarn install
# for yarn 1.x:
yarn install --frozen-lockfile
# for yarn 2:
yarn install --immutable

As you can see from the npm call, this is especially suited for CI environments, this means you have to make sure the package-lock.json / yarn.lock is part of your repository.

One disadvantage of our approach is that npm really likes to notice you of being not very up to date, and produce lots of noise for whatever reason that you want to get rid of. Just be sure to pay some amount of attention when you update.

WPF Redux Sample Application

A while ago, I wrote about how we are using the redux architexture in our C# applications. I have just pushed an example showing ReduxSimple with WPF and our extensions in a .NET 5 application to our github account. The example itself is just a counter with an increment and a decrement button, but it already shows the whole redux cycle.

The store setup in App.xaml.cs shows how the ReducerBuilder can be used to build a State reducer from the Reducer class via reflection.

I also added a small prime-number factorization to show how to use ‘expensive’ functions in the view part of the application using our SelectorGraph. This makes it possible to properly derive view data from the state, only updating them once when one of their inputs changes. In the example, that is the counter. So the number will only be factorized when the counter changes, while all other future state changes do update the selector.

The example does not use the UIDuplexBinder yet. It allows read/write binding of WPF controls to an IObservable and an action-creator, and is hopefully pretty straight-forward to use. Please enjoy!

Linking separate PostgreSQL servers with a Foreign Data Wrapper

If you want to query a table in one database server from another you need a way to connect these two servers with each other. For PostgreSQL databases the feature that makes this possible is called Foreign Data Wrapper.

To use this feature you have to load its extension into the current database:

CREATE EXTENSION postgres_fdw;

The postgres_fdw extension ships with a default PostgreSQL installation. It allows you to connect a PosgreSQL database with another PostgreSQL database. You can connect a PostgreSQL database with databases by other vendors, too, but then you need external extensions like oracle_fdw for Oracle databases or mysql_fdw for MySQL databases. In this article we will only use postgres_fdw.

You can check if the extension was loaded successfully. The following query should return a row for the extension:

SELECT * FROM pg_extension WHERE extname='postgres_fdw';

The next step is to set up the remote server instance:

CREATE SERVER remotesrv
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (
    host '127.0.0.1',
    port '5432',
    dbname 'remotedb'
  );

This statement registers a remote server under the name remotesrv, a name you can choose freely. You have to specify which Foreign Data Wrapper to use (postgres_fdw in this case) as well the target host, port and database name.

The CREATE SERVER statement didn’t contain any user login information, you have to provide it via a user mapping:

CREATE USER MAPPING
  FOR CURRENT_USER
  SERVER remotesrv
  OPTIONS (
    user 'joe',
    password 'secret'
  );

In this case we map the remote user joe to the current user (CURRENT_USER) of the local server. It doesn’t have to be the current user, you could specify any user name.

Now you have to import tables from the remote database. You can either explicitly import individual tables or a whole schema. Here’s how to import the public schema of the remote (“foreign”) database into the public schema of the local database:

IMPORT FOREIGN SCHEMA public
  FROM SERVER remotesrv
  INTO public;

You can restrict which tables to import with the LIMIT TO or EXCEPT clauses. The following statement will only import the tables books and students:

IMPORT FOREIGN SCHEMA public
  LIMIT TO (students, books)
  FROM SERVER remotesrv
  INTO public;

Now you can access these tables as if they were in the local database.