Migrating from Oracle to PostgreSQL

One promise of SQL for application developers is that changing the database management system (DBMS) is not that big of a deal. In practice, the many vendor-specific features and the incomplete standards conformance of the different database products can turn a migration from one DBMS vendor to another into a big task.

Nevertheless, there are plenty of good reasons to do so:

  • Cost of buying, running and maintaining the DBMS
  • Limitations of the current DBMS like performance, tool support, character sets, naming, data types and sizes etc.
  • Missing features like geospatial support, clustering, replication, sharding, timeseries support and so on
  • Support or requirements on the customer's or operator's side

Some of our long-running projects that started several years ago had the requirement to work with an Oracle DBMS, version 8i at that time. Now, more than 10 years later, our customer provides and prefers to host a PostgreSQL 13 cluster. Of course she would like us to migrate our applications over to the new DBMS and eventually get rid of the Oracle installation.

Challenges for the migration

Even though PostgreSQL supports most of the SQL:2016 core and the most important features of Oracle, there are enough differences and subtleties to make a migration non-trivial. The most obvious items to look out for are

  • different column type names
  • SQL features and syntactical differences (sequences!)
  • PL/SQL functions syntax and features

Depending on your usage of database-specific features you have to assess how much work and risk to expect.
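To give a concrete impression of the syntactical differences, here is a small, purely illustrative example of how a sequence is used on both systems (table and sequence names are made up):

-- Oracle: sequence values come from pseudo-columns
INSERT INTO orders (id, created_at)
VALUES (orders_seq.NEXTVAL, SYSDATE);

-- PostgreSQL: sequence values come from functions
INSERT INTO orders (id, created_at)
VALUES (nextval('orders_seq'), now());

Similar small differences exist for column types, for example Oracle's VARCHAR2 and NUMBER typically map to varchar/text and numeric on the PostgreSQL side – exactly the kind of mechanical transformation ora2pg can do for you.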

Tools and migration process

Fortunately, there is a quite mature tool called ora2pg that can aid you along the process. It has tons of options to help you customize the migration and provides a quite helpful assessment of the task ahead. The migration report looks like this:

-------------------------------------------------------------------------------
Ora2Pg v21.1 - Database Migration Report
-------------------------------------------------------------------------------
Version Oracle Database 12c Enterprise Edition Release 12.1.0.2.0
Schema  NAOMI-TEST
Size    91.44 MB

-------------------------------------------------------------------------------
Object  Number  Invalid Estimated cost  Comments        Details
-------------------------------------------------------------------------------
DATABASE LINK   0       0       0.00    Database links will be exported as SQL/MED PostgreSQL's Foreign Data Wrapper (FDW) extensions using oracle_fdw.
FUNCTION        1       0       1.00    Total size of function code: 0 bytes.
GLOBAL TEMPORARY TABLE  60      0       168.00  Global temporary table are not supported by PostgreSQL and will not be exported. You will have to rewrite some application code to match the PostgreSQL temporary table behavior.  ht_my_table <--- SNIP --->.
INDEX   69      0       6.90    0 index(es) are concerned by the export, others are automatically generated and will do so on PostgreSQL. Bitmap will be exported as btree_gin index(es). Domain index are exported as b-tree but commented to be edited to mainly use FTS. Cluster, bitmap join and IOT indexes will not be exported at all. Reverse indexes are not exported too, you may use a trigram-based index (see pg_trgm) or a reverse() function based index and search. Use 'varchar_pattern_ops', 'text_pattern_ops' or 'bpchar_pattern_ops' operators in your indexes to improve search with the LIKE operator respectively into varchar, text or char columns.
JOB     0       0       0.00    Job are not exported. You may set external cron job with them.
SEQUENCE        4       0       1.00    Sequences are fully supported, but all call to sequence_name.NEXTVAL or sequence_name.CURRVAL will be transformed into NEXTVAL('sequence_name') or CURRVAL('sequence_name').
SYNONYM 0       0       0.00    SYNONYMs will be exported as views. SYNONYMs do not exists with PostgreSQL but a common workaround is to use views or set the PostgreSQL search_path in your session to access object outside the current schema.
TABLE   225     0       72.00    495 check constraint(s).       Total number of rows: 264690. Top 10 of tables sorted by number of rows:. topt has 52736 rows. po has 50830 rows. notification has 18911 rows. timeline_entry has 16556 rows. char_sample_types has 11400 rows. char_safety_aspects has 9488 rows. char_sample_props has 5358 rows. tech_spec has 4876 rows. mail_log_entry has 4778 rows. prop_data has 4358 rows. Top 10 of largest tables:.
-------------------------------------------------------------------------------
Total   359     0       248.90  248.90 cost migration units means approximatively 3 man-day(s). The migration unit was set to 5 minute(s)

-------------------------------------------------------------------------------
Migration level : A-3
-------------------------------------------------------------------------------

Migration levels:
    A - Migration that might be run automatically
    B - Migration with code rewrite and a human-days cost up to 5 days
    C - Migration with code rewrite and a human-days cost above 5 days
Technical levels:
    1 = trivial: no stored functions and no triggers
    2 = easy: no stored functions but with triggers, no manual rewriting
    3 = simple: stored functions and/or triggers, no manual rewriting
    4 = manual: no stored functions but with triggers or views with code rewriting
    5 = difficult: stored functions and/or triggers with code rewriting
-------------------------------------------------------------------------------

The tool is written in Perl, so I decided to run it inside Docker containers because I did not want to mess with my working machine or some VMs. To have quick turnaround times with my containers I split the process up into 3 steps:

  1. Export of the schema and data using a docker container
  2. On success copy the ora2pg project to the host
  3. Import the schema and data using another docker container

The ora2pg migration project is copied to the host machine allowing you to inspect the export and make adjustments if need be. Then you can copy it to the import container or simply bind mount the directory containing the ora2pg project.

The Dockerfile for the export image looks like this:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

RUN mkdir -p /naomi/migration
RUN ora2pg --project_base /ora2pg --init_project my-migration
WORKDIR /ora2pg/my-migration

COPY ora2pg.conf /ora2pg/my-migration/config/

CMD ora2pg -t SHOW_VERSION -c config/ora2pg.conf && ora2pg -t SHOW_TABLE -c config/ora2pg.conf\
 && ora2pg -t SHOW_REPORT --estimate_cost -c config/ora2pg.conf\
 && ./export_schema.sh && ora2pg -t INSERT -o data.sql -b ./data -c ./config/ora2pg.conf
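With that image in place, a build and export run could look roughly like this (image and container names are just placeholders for this sketch; the Oracle connection settings live in ora2pg.conf):

docker build -t ora2pg-export .
docker run --name ora2pg-export ora2pg-export
# on success, copy the ora2pg project to the host for inspection (step 2 above)
docker cp ora2pg-export:/ora2pg/my-migration ./ora2pg/
docker rm ora2pg-export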

Once the export looks good you can work on importing everything. The Dockerfile for the import image looks like this:

FROM centos:7

# Prepare the system for ora2pg 
RUN yum install -y wget
RUN wget https://yum.oracle.com/RPM-GPG-KEY-oracle-ol7 -O /etc/pki/rpm-gpg/RPM-GPG-KEY-oracle

COPY ol7-temp.repo /etc/yum.repos.d/
RUN yum install -y oraclelinux-release-el7
RUN mv /etc/yum.repos.d/ol7-temp.repo /etc/yum.repos.d/ol7-temp.repo.disabled
RUN yum install -y oracle-instantclient-release-el7
RUN yum install -y oracle-instantclient-basic
RUN yum install -y oracle-instantclient-devel
RUN yum install -y oracle-instantclient-sqlplus
RUN yum install -y postgresql-server

RUN yum install -y perl perl-CPAN perl-DBI perl-Time-HiRes perl-YAML perl-local-lib make gcc
RUN yum install -y perl-App-cpanminus

RUN cpanm CPAN::Config
RUN cpanm CPAN::FirstTime

ENV LD_LIBRARY_PATH=/usr/lib/oracle/21/client64/lib
ENV ORACLE_HOME=/usr/lib/oracle/21/client64

RUN perl -MCPAN -e 'install DBD::Oracle'

COPY ora2pg-21.1.tar.gz /tmp

WORKDIR /tmp
RUN tar zxf ora2pg-21.1.tar.gz && cd ora2pg-21.1 && perl Makefile.PL && make && make install

# you need to mount the project volume to /ora2pg
WORKDIR /ora2pg

CMD ./import_all.sh -d my_target_db -h $pg_host -U myuser -o myowner

Our target database runs on another host, so you need credentials to authenticate and perform all the required actions. Therefore we run the import container interactively. The PowerShell command for the import looks like this:

docker run -it --rm -e pg_host=192.168.56.1 -v $PWD/ora2pg/my-migration:/ora2pg pgimport

The import script allows you to create the schema, sequences, indexes and constraints and to load the data. I suggest adding the constraints after importing the data – a workflow supported by the import_all.sh script.

That way we got our Oracle database migrated into a PostgreSQL database. Unfortunately, this is only one part of the whole migration. The other part is making changes to the application code to correctly use the new database.

Running Tango-Servers in Docker

Containers are a great way of running your software in an environment that you defined and no one else needs to maintain or even know about. This is especially true if your software provides its service using a network port. All that operators have to provide is the execution platform for the container and the required resources.

Tango servers fit quite well into this type of software: essentially they provide a service over the network. Unfortunately, they need a Tango control system to be fully usable, so that means a few additional services. Luckily, there are docker images for these, so building a stack of containers to run your Tango server within the scope of a control system is quite easy (sample docker-compose.yml):

version: '3.7'
services:
  my-tango-server:
    container_name: my-tango-server
    image: my-ts
    environment:
      - TANGO_HOST=tango-cs:10000
    depends_on:
      - tango-cs
  tango-cs:
    container_name: tango-database
    image: tangocs/tango-cs:9.3.2-alpha.1-no-tango-test
    ports:
      - "10000:10000"
    environment:
      - TANGO_HOST=localhost:10000
      - MYSQL_HOST=tango-db:3306
      - MYSQL_USER=tango
      - MYSQL_PASSWORD=tango
      - MYSQL_DATABASE=tango
    links:
      - "tango-db:localhost"
    depends_on:
      - tango-db
  tango-db:
    container_name: mysql-database
    image: tangocs/mysql:9.2.2
    environment:
      - MYSQL_ROOT_PASSWORD=root
    volumes:
      - tango_mysql_data:/var/lib/mysql
volumes:
  tango_mysql_data:

Example Dockerfile for the image my-ts used for the my-tango-server container:

FROM debian:buster

WORKDIR /tango-server
COPY ./my-tango-server .

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y ./*.deb

CMD my-tango-server myts

Unfortunately this naive approach has the following issues:

  • The network port of our tango server changes with each restart and is not available from outside the docker stack
  • The tango server communicates the internal IP address of its container to the tango database, so connection attempts from the outside fail even if the port is correct
  • Even though the tango server container depends on the tango control system container the tango server might start up before the tango control system is available.

Let us tackle the issues one by one.

Expose a fixed port for the Tango server

This is probably the easiest issue to fix because it is well documented for OmniORB and widely used. So let us change docker-compose.yml to expose a port and pass it into the container as an environment variable:

version: '3.7'
services:
  my-tango-server:
    container_name: my-tango-server
    image: my-ts
    ports:
      - "45450:45450"
    environment:
      - TANGO_HOST=tango-cs:10000
      - TANGO_SERVER_PORT=45450
    depends_on:
      - tango-cs
  tango-cs:
 ...

And use it in our Dockerfile for the my-ts image:

FROM debian:buster

WORKDIR /tango-server
COPY ./my-tango-server .

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y ./*.deb

CMD my-tango-server myts -ORBendPoint giop:tcp::$TANGO_SERVER_PORT

Fine, now our Tango server always listens on a defined port and that same port is exposed to the outside of our docker stack.
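A quick way to check from the host that the endpoint is really reachable (assuming the stack is up and netcat is available on the host):

docker-compose up -d
nc -zv localhost 45450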

Publish the correct IP/hostname of the Tango server

I found it hard to find documentation about this and about how to put it on the command line correctly, but here is the solution to the problem. The crucial command line parameter is -ORBendPointPublish. We need to pass a hostname of the host machine into the container and let OmniORB publish that name (tango-server.local in our example):

version: '3.7'
services:
  my-tango-server:
    container_name: my-tango-server
    image: my-ts
    ports:
      - "45450:45450"
    environment:
      - TANGO_HOST=tango-cs:10000
      - TANGO_SERVER_PORT=45450
      - TANGO_SERVER_PUBLISH=tango-server.local:45450
    depends_on:
      - tango-cs
  tango-cs:
 ...

And the corresponding changes to our Dockerfile:

FROM debian:buster

WORKDIR /tango-server
COPY ./my-tango-server .

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y ./*.deb

CMD my-tango-server myts -ORBendPoint giop:tcp::$TANGO_SERVER_PORT -ORBendPointPublish giop:tcp:$TANGO_SERVER_PUBLISH

Waiting for availability of a service

This one is not specific to our tango server and the whole stack but a general problem when building stacks where one service depends on another one being up and listening on a network port. Docker compose takes care of the startup order of the containers but does not check for the readiness of the services running in them. In Linux containers this can easily be achieved using the small shell utility wait-for-it. Here we only need to make some changes to our Dockerfile:

FROM debian:buster
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -y wait-for-it

WORKDIR /tango-server
COPY ./my-tango-server .

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y ./*.deb

CMD wait-for-it $TANGO_HOST -- my-tango-server myts -ORBendPoint giop:tcp::$TANGO_SERVER_PORT -ORBendPointPublish giop:tcp:$TANGO_SERVER_PUBLISH

Depending on your distribution/base image, the installation of the tool (or of similar alternatives like dockerize or docker-compose-wait) may differ.
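If your base image does not provide a package for wait-for-it, you can also fetch the script from its upstream repository instead. A sketch of that variant (for real use you would want to pin a specific commit and verify the download):

RUN apt-get update && apt-get install -y curl \
 && curl -fsSL -o /usr/local/bin/wait-for-it \
      https://raw.githubusercontent.com/vishnubob/wait-for-it/master/wait-for-it.sh \
 && chmod +x /usr/local/bin/wait-for-it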

Summing it up

After moving some rocks out of the way it is dead easy to deploy your tango servers together with a complete tango control system to any host platform providing a container runtime. You do not need to deliver everything in one single stack if you do not want to. Most of the stuff described above works the same when deploying the tango control system and the tango servers in independent stacks or as separate containers, even on multiple hosts.

Docker runtime breaking your container

Docker (or container technology in general) is a great tool to clearly separate the concerns of developers and operations. We use it to simplify various tasks like building projects, packaging them for different platforms and deployment of our software onto the target machines like staging and production servers. All the specifics of the projects are contained and version controlled using the Dockerfiles and compose files.

Our operations team only needs to provide some infrastructure able to build container images and run them. This works great most of the time and removes a lot of the friction between developers and operations where in the past snowflake servers needed to be set up and maintained. Developers often had to ask for specific setups and environments because each project had its own needs. That is all gone with this great container technology. Brave new world. Except when it suddenly does not work anymore.

Help, my deployment container stopped working!

As mentioned above we use docker to deploy our software to the target machines. These machines are often part of a corporate network protected by firewalls and only accessible using VPN. I already talked about how to use openvpn in a docker container for deployment. So the other day I was making a release of one of my long-running projects and pressed the deploy button for that project on our Jenkins continuous integration server.

But instead of just leaning back, relaxing and watching the magic work, the deployment failed and the red light lit up! A look into the job output showed that the connection to the target machine was refused. A quick check from the developer machine showed no problem on the receiving side. VPN, target machine and everything was up and running as usual.

After a quick manual deployment performed with care and my administrator hat on, I went on an investigation journey…

What was going on?

The deployment job had not changed for several months, the container image had not changed and the rest of the infrastructure was working as expected. After more digging, debugging and narrowing down the problem I found out that openvpn did not work in the container anymore because of a strange permission denied error:

Tue May 19 15:24:14 2020 /sbin/ip addr add dev tap0 1xx.xxx.xxx.xxx/22 broadcast 1xx.xxx.xxx.xxx
Tue May 19 15:24:14 2020 /sbin/ip -6 addr add 2axx:1xxx:4:5xxx:9xx:5xxx:5xxx:4xxx/64 dev tap0
RTNETLINK answers: Permission denied
Tue May 19 15:24:14 2020 Linux ip -6 addr add failed: external program exited with error status: 2
Tue May 19 15:24:14 2020 Exiting due to fatal error

This hot trace made it easy to google for and revealed the following issue on GitHub: https://github.com/dperson/openvpn-client/issues/75. The cause of all the trouble was changed behaviour of the docker runtime. Our automatic updates had run over the weekend and installed a new package version of the docker runtime (see the excerpt from the apt history log):

containerd.io:amd64 (1.2.13-1, 1.2.13-2)

This subtle change broke my container! After some sacrifices to the whale gods I went on to implement the fix. Fortunately there is an easy way to get it working like before. You just have to pass the following command line switch to docker run and everything works as expected:

--sysctl net.ipv6.conf.all.disable_ipv6=0
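Applied to a deployment container like the one from the OpenVPN deployment post, the docker run call could then look roughly like this (a sketch: only the --sysctl switch is new, the other flags are the ones OpenVPN needs anyway):

docker run --rm \
--cap-add=NET_ADMIN \
--device=/dev/net/tun \
--sysctl net.ipv6.conf.all.disable_ipv6=0 \
-e VPN_USER=our-client-vpn-user \
-e VPN_PASSWORD=${JOB_VPN_PASSWORD} \
app-deploy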

As nice as containers are for abstracting away hardware, operating systems and other environment details, sometimes the container runtime shines through. It is just a shame that such things happen on minor releases or package release upgrades…

Running a for-loop in a docker container

Docker is a great tool for running services or deployments in a defined and clean environment. Operations just has to provide a host for running the containers and everything else is up to the developers. They can forge their own environment and set up all the prerequisites appropriately for their task. No need to beg the admins to install some tools and configure server machines to fit the needs of a certain project. The developers just define their needs in a Dockerfile.

The Dockerfile contains instructions to set up a container in a domain specific language (DSL). This language consists of only a couple of commands and is really simple. Like every language out there, it has its own quirks though. I would like to show a solution to one I encountered when trying to deploy several items to a target machine.

The task at hand

We are developing a distributed system for data acquisition, storage and real-time display for one of our clients. We deliver the different parts of the system as deb-packages for the target machines running at the customer's site. Our customer hosts her own debian repository using an Artifactory server. That all seems simple enough, because Artifactory tells you how to upload new artifacts using curl. So we built a simple container to perform the upload using curl. We tried to supply the required bash shell script to the CMD instruction of the Dockerfile but ran into issues with our first attempts. Here is the naive, dysfunctional Dockerfile:

FROM debian:stretch
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install dpkg curl

# Setup work dir, $PROJECT_ROOT must be mounted as a volume under /elsa
WORKDIR /packages

# Publish the deb-packages to clients artifactory
CMD for package in *.deb; do\n\
  ARCH=`dpkg --info $package | grep "Architecture" | sed "s/Architecture:\ \([[:alnum:]]*\).*/\1/g" | tr -d [:space:]`\n\
  curl -H "X-JFrog-Art-Api:${API_KEY}" -XPUT "${REPOSITORY_URL}/${package};deb.distribution=${DISTRIBUTION};deb.component=non-free;deb.architecture=$ARCH" -T ${package} \n\
  done

The command fails because for is a shell built-in and not a standalone command, and because the shell used to execute the CMD script is sh by default, not bash.

The solution

After some unsuccessful attempts to set the shell to /bin/bash using Docker's SHELL instruction we finally came up with the solution for an inline shell script in the CMD instruction of a Dockerfile:

FROM debian:stretch
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install dpkg curl

# Setup work dir, $PROJECT_ROOT must be mounted as a volume under /elsa
WORKDIR /packages

# Publish the deb-packages to clients artifactory
CMD /bin/bash -c 'for package in *.deb;\
do ARCH=`dpkg --info $package | grep "Architecture" | sed "s/Architecture:\ \([[:alnum:]]*\).*/\1/g" | tr -d [:space:]`;\
  curl -H "X-JFrog-Art-Api:${API_KEY}" -XPUT "${REPOSITORY_URL}/${package};deb.distribution=${DISTRIBUTION};deb.component=non-free;deb.architecture=$ARCH" -T ${package};\
done'

The trick here is to call bash directly and to supply the shell script using the -c parameter. An alternative would have been to extract the script into its own file and call that in the CMD instruction, using the shell form so that the environment variables still get expanded, like so:

# Publish the deb-packages to clients artifactory
CMD ["deploy.sh", "${API_KEY}", "${REPOSITORY_URL}", "${DISTRIBUTION}"]

In the above case I prefer the inline solution because of the short and simple script: no need for an additional external file and no worrying about how to pass the parameters to the script.

Using OpenVPN in an automated deployment

In my previous post I showed how to use Ansible in a Docker container to deploy a software release to some remote server and how to trigger the deployment using a Jenkins job.

But what can we do if the target server is not directly reachable over SSH?

Many organizations use a virtual private network (VPN) infrastructure to provide external parties access to their internal services. If there is such an infrastructure in place we can extend our deployment process to use OpenVPN and still work in an unattended fashion like before.

Adding OpenVPN to our container

To be able to use OpenVPN non-interactively in our container we need to add several elements:

  1. Install OpenVPN:
    RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install openvpn
  2. Create a file with the credentials using environment variables:
    CMD echo -e "${VPN_USER}\n${VPN_PASSWORD}" > /tmp/openvpn.access
  3. Connect with OpenVPN using an appropriate VPN configuration and wait for OpenVPN to establish the connection:
    openvpn --config /deployment/our-client-config.ovpn --auth-user-pass /tmp/openvpn.access --daemon && sleep 10

Putting it together our extended Dockerfile may look like this:

FROM ubuntu:18.04
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get -y install software-properties-common
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install ansible ssh
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install openvpn

SHELL ["/bin/bash", "-c"]
# A place for our vault password file
RUN mkdir /ansible/

# Setup work dir
WORKDIR /deployment

COPY deployment/${TARGET_HOST} .

# Deploy the proposal submission system using openvpn and ansible
CMD echo -e "${VAULT_PASSWORD}" > /ansible/vault.access && \
echo -e "${VPN_USER}\n${VPN_PASSWORD}" > /tmp/openvpn.access && \
ansible-vault decrypt --vault-password-file=/ansible/vault.access ${TARGET_HOST}/credentials/deployer && \
openvpn --config /deployment/our-client-config.ovpn --auth-user-pass /tmp/openvpn.access --daemon && \
sleep 10 && \
ansible-playbook -i ${TARGET_HOST}, -u root --key-file ${TARGET_HOST}/credentials/deployer ansible/deploy.yml && \
ansible-vault encrypt --vault-password-file=/ansible/vault.access ${TARGET_HOST}/credentials/deployer && \
rm -rf /ansible && \
rm /tmp/openvpn.access

As you can see, the CMD has become quite long and messy by now. In production we put the whole process of executing the Ansible and OpenVPN commands into a shell script as part of our deployment infrastructure. That way we separate the preparation of the environment from the deployment steps themselves. The CMD then looks a bit friendlier:

CMD echo -e "${VPN_USER}\n${VPN_PASSWORD}" > /vpn/openvpn.access && \
echo -e "${VAULT_PASSWORD}" > /ansible/vault.access && \
chmod +x deploy.sh && \
./deploy.sh

As another aside you may have to mess around with nameserver configuration depending on the OpenVPN configuration and other infrastructure details. I left that out because it seems specific to the setup with our customer.

Fortunately, there is nothing to do on the ansible side as the whole VPN stuff should be completely transparent to the tools using the network connection to do their job. However, we need some additional settings in our Jenkins job.

Adjusting the Jenkins Job

The Jenkins job needs the VPN credentials added and some additional parameters to docker run for the network tunneling to work in the container. The credentials are simply another injected password which we may call JOB_VPN_PASSWORD, and the full script may now look as follows:

docker build -t app-deploy -f deployment/docker/Dockerfile .
docker run --rm \
--cap-add=NET_ADMIN \
--device=/dev/net/tun \
-e VPN_USER=our-client-vpn-user \
-e VPN_PASSWORD=${JOB_VPN_PASSWORD} \
-e VAULT_PASSWORD=${JOB_VAULT_PASSWORD} \
-e TARGET_HOST=${JOB_TARGET_HOST} \
-e ANSIBLE_HOST_KEY_CHECKING=False \
-v `pwd`/artifact:/artifact \
app-deploy

DOCKER_RUN_RESULT=`echo $?`
exit $DOCKER_RUN_RESULT

Conclusion

Adding VPN-support to your automated deployment is not that hard but there are some details to be aware of:

  • OpenVPN needs the credentials in a file – similar to Ansible – to be able to run non-interactively
  • OpenVPN either stays in the foreground or daemonizes right away if you tell it to – not only after the connection has been established successfully. So you have to wait a sufficiently long time before proceeding with your deployment process (or poll for the connection, see the sketch after this list)
  • For OpenVPN to work inside a container docker needs some additional flags to allow proper tunneling of network connections
  • DNS-resolution can be tricky depending on the actual infrastructure. If you experience problems either tune your setup by adjusting name resolution in the container or access the target machines using IPs…
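If the fixed sleep feels too fragile, a small polling loop can wait for the tunnel device instead. A minimal sketch, assuming the interface is called tap0 (it may be tun0 or something else depending on your VPN configuration):

# instead of a fixed "sleep 10": wait up to 30 seconds for the tunnel device
for i in $(seq 1 30); do
  ip link show tap0 > /dev/null 2>&1 && break
  sleep 1
done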

After ironing out the gory details depicted above we have a secure and convenient deployment process that saves a lot of time and nerves in the long run because you can update the deployment at the press of a button.

Ansible for deployment with docker

We are using Jenkins not only for our continuous integration needs but also for running deployments at the push of a button. Back in the dark times™ this often meant using Apache Ant in combination with JSch to copy the projects artifacts to some target machine and execute some remote commands over ssh.
After gathering some experience with Ansible and docker we took a new shot at the task and ended up with one-shot docker containers executing some ansible scripts. The general process stayed the same but the new solution is more robust in general and easier to maintain.
In addition to the project- and deployment-process-specific simplifications we can now use any jenkins slave with a docker installation and network access. No need for an ant installation and a java runtime anymore.
However, there are some quirks you have to overcome to get the whole thing running. Let us see how we can accomplish non-interactive (read: fully automated) deployments using the tools mentioned above.

The general process

The deployment job in Jenkins has to perform the following steps:

  1. Fetch the artifacts to deploy from somewhere, e.g. a release job
  2. Build the docker image provisioned with ansible
  3. Run a container of the image providing credentials and artifacts through the environment and/or volumes

The first step can easily be implemented using the Copy Artifact Plugin.

The Dockerfile

You can use almost any linux base image for your deployment needs as long as the distribution comes with ansible. I chose Ubuntu 18.04 LTS because it is sufficiently up-to-date and we are quite familiar with it. In addition to ssh and ansible we only need the command for running the playbook. In our case the playbook and the ssh credentials reside in the deployment folder of our projects, of course encrypted in an ansible vault. To be able to connect to the target machine we need the vault password to decrypt the ssh password or private key. Therefore we inject the vault password as an environment variable when running the container.

A simple Dockerfile may look as follows:

FROM ubuntu:18.04
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get -y install software-properties-common
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install ansible ssh

SHELL ["/bin/bash", "-c"]
# A place for our vault password file
RUN mkdir /ansible/

# Setup work dir
WORKDIR /deployment

COPY deployment/${TARGET_HOST} .

# Deploy the proposal submission system using openvpn and ansible
CMD echo -e "${VAULT_PASSWORD}" > /ansible/vault.access && \
ansible-vault decrypt --vault-password-file=/ansible/vault.access ${TARGET_HOST}/credentials/deployer && \
ansible-playbook -i ${TARGET_HOST}, -u root --key-file ${TARGET_HOST}/credentials/deployer ansible/deploy.yml && \
ansible-vault encrypt --vault-password-file=/ansible/vault.access ${TARGET_HOST}/credentials/deployer && \
rm -rf /ansible

Note that we create a temporary vault password file using the ${VAULT_PASSWORD} environment variable for ansible vault decryption. A second noteworthy detail is the ad-hoc inventory using the ${TARGET_HOST} environment variable terminated with a comma. Using this variable we also reference the correct credential files and other host-specific data files. That way the image will not contain any plain text credentials.
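Just to illustrate the ad-hoc inventory: the trailing comma is what makes ansible treat the value as an inline host list instead of a path to an inventory file. A made-up example:

ansible-playbook -i staging.example.com, -u root ansible/deploy.yml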

The Ansible playbook

The playbook can be quite simple depending on your needs. In our case we deploy a WAR-archive to a tomcat container and make a backup copy of the old application first:

---
- hosts: all
  vars:
    artifact_name: app.war
    webapp_directory: /var/lib/tomcat8/webapps
  become: true

  tasks:
  - name: copy files to target machine
    copy: src={{ item }} dest=/tmp
    with_fileglob:
    - "/artifact/{{ artifact_name }}"
    - "../{{ inventory_hostname }}/context.xml"

  - name: stop our servlet container
    systemd:
      name: tomcat8
      state: stopped

  - name: make backup of old artifact
    command: mv "{{ webapp_directory }}/{{ artifact_name }}" "{{ webapp_directory }}/{{ artifact_name }}.bak"

  - name: delete exploded webapp
    file:
      path: "{{ webapp_directory }}/app"
      state: absent

  - name: mv artifact to webapp directory
    copy: src="/tmp/{{ artifact_name }}" dest="{{ webapp_directory }}" remote_src=yes
    notify: restart app-server

  - name: mv context file to container context
    copy: src=/tmp/context.xml dest=/etc/tomcat8/Catalina/localhost/app.xml remote_src=yes
    notify: restart app-server

  handlers:
  - name: restart app-server
    systemd:
      name: tomcat8
      state: started

The declarative syntax and the use of handlers make the process quite straight-forward. Of note here is the - hosts: all line that is used with our command line ad-hoc inventory. That way we do not need an inventory file in our project repository or somewhere else. The host can be provided by the deployment job. That brings us to the Jenkins-Job that puts everything together.

The Jenkins job

As you probably already noticed in the parts above there are several inputs required by the container and the ansible script:

  1. A volume with the artifacts to deploy
  2. A volume with host specific credentials and data files
  3. Vault credentials
  4. The target host(s)

We inject passwords using the EnvInject Plugin which stores them encrypted and masks them in all the logs. I like to prefix such environmental settings in Jenkins with JOB_ to denote their origin. After the copying build step, which puts the artifacts into the artifact directory, we can use a small shell script to build the image and run the container with all the needed stuff wired up. The script may look like the following:

docker build -t app-deploy -f deployment/docker/Dockerfile .
docker run --rm \
-e VAULT_PASSWORD=${JOB_VAULT_PASSWORD} \
-e TARGET_HOST=${JOB_TARGET_HOST} \
-e ANSIBLE_HOST_KEY_CHECKING=False \
-v `pwd`/artifact:/artifact \
app-deploy

DOCKER_RUN_RESULT=`echo $?`
exit $DOCKER_RUN_RESULT

We run the container with the --rm flag to delete it right after execution, leaving nothing behind. We also disable host key checking for ansible to keep the build non-interactive. The last two lines let the Jenkins job report the result of the container execution.

Wrapping it up

Essentially we have a deployment consisting of three parts: a docker image definition of the required environment, an ansible playbook performing the real work and a jenkins job with a small shell script wiring it all together so that it can be executed at the push of a button. We do not store any clear text credentials persistently at either location, so we are reasonably secure for running the deployment.