Docker runtime breaking your container

Docker (or container technology in general) is a great tool to clearly separate the concerns of developers and operations. We use it to simplify various tasks like building projects, packaging them for different platforms and deployment of our software onto the target machines like staging and production servers. All the specifics of the projects are contained and version controlled using the Dockerfiles and compose files.

Our operations only needs to provide some infrastructure able to build container images and run them. This works great most of the time and removes a lot of the friction between developers and operation where in the past snowflaky-servers needed to be setup and maintained. Developers often had to ask for specific setups and environments because each project had their own needs. That is all gone with this great container technology. Brave new world. Except when it suddenly does not work anymore.

Help, my deployment container stopped working!

As mentioned above we use docker to deploy our software to the target machines. These machines are often part of a corporate network protected by firewalls and only accessible using VPN. I already talked about how to use openvpn in a docker container for deployment. So the other day I was making a release of one of my long-running projects and pressing the deploy button for that project on our jenkins continuous integration server.

But instead of just leaning back, relaxing and watching the magic work the deployment failed and the red light lit up! A look into the job output showed that the connection to the target machine was refused. A quick check from the developer machine showed no problem on the receiving side. VPN, target machine and everything was up and running as usual.

After a quick manual deployment performed with care and administrator hat I went on an investigation journey…

What was going on?

The deployment job did not change for several months, the container image did not change and the rest of the infrastructure was working as expected. After more digging, debugging narrowing down the problem I found out, that openvpn did not work in the container anymore because of some strange permission denied error:

Tue May 19 15:24:14 2020 /sbin/ip addr add dev tap0 1xx.xxx.xxx.xxx/22 broadcast 1xx.xxx.xxx.xxx
Tue May 19 15:24:14 2020 /sbin/ip -6 addr add 2axx:1xxx:4:5xxx:9xx:5xxx:5xxx:4xxx/64 dev tap0
RTNETLINK answers: Permission denied
Tue May 19 15:24:14 2020 Linux ip -6 addr add failed: external program exited with error status: 2
Tue May 19 15:24:14 2020 Exiting due to fatal error

This hot trace made it easy to google for and revealed following issue on github: https://github.com/dperson/openvpn-client/issues/75. The cause of all the trouble was changed behaviour of the docker runtime. Our automatic updates had run over the weekend and actually installed a new package version of the docker runtime (see exerpt from apt history log):

containerd.io:amd64 (1.2.13-1, 1.2.13-2)

This subtle change broke my container! After some sacrifices to the whale gods I went on to implement the fix. Fortunately there is an easy way to get it working like before. You just have to pass following command line switch to docker run and everything works as expected:

--sysctl net.ipv6.conf.all.disable_ipv6=0

As nice as containers are for abstracting away hardware, operating systems and other environment details sometimes the container runtime shines through. It is just a shame that such things happen on minor releases or package release upgrades…

Running a for-loop in a docker container

Docker is a great tool for running services or deployments in a defined and clean environment. Operations just has to provide a host for running the containers and everything else is up to the developers. They can forge their own environment and setup all the prerequisites appropriately for their task. No need to beg the admins to install some tools and configure server machines to fit the needs of a certain project. The developers just define their needs in a Dockerfile.

The Dockerfile contains instructions to setup a container in a domain specific language (DSL). This language consists only of a couple commands and is really simple. Like every language out there, it has its own quirks though. I would like to show a solution to one I encountered when trying to deploy several items to a target machine.

The task at hand

We are developing a distributed system for data acquisition, storage and real-time-display for one of our clients. We deliver the different parts of the system as deb-packages for the target machines running at the customer’s site. Our customer hosts her own debian repository using an Artifactory server. That all seems simple enough, because artifactory tells you how to upload new artifacts using curl. So we built a simple container to perform the upload using curl. We tried to supply the bash shell script required to the CMD instruction of the Dockerfile but ran into issues with our first attempts. Here is the naive, dysfunctional Dockerfile:

FROM debian:stretch
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install dpkg curl

# Setup work dir, $PROJECT_ROOT must be mounted as a volume under /elsa
WORKDIR /packages

# Publish the deb-packages to clients artifactory
CMD for package in *.deb; do\n\
  ARCH=`dpkg --info $package | grep "Architecture" | sed "s/Architecture:\ \([[:alnum:]]*\).*/\1/g" | tr -d [:space:]`\n\
  curl -H "X-JFrog-Art-Api:${API_KEY}" -XPUT "${REPOSITORY_URL}/${package};deb.distribution=${DISTRIBUTION};deb.component=non-free;deb.architecture=$ARCH" -T ${package} \n\
  done

The command fails because the for-shell built-in instruction does not count as a command and the shell used to execute the script is sh by default and not bash.

The solution

After some unsuccessfull attempts to set the shell to /bin/bash using dockers’ SHELL instruction we finally came up with the solution for an inline shell script in the CMD instruction of a Dockerfile:

FROM debian:stretch
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install dpkg curl

# Setup work dir, $PROJECT_ROOT must be mounted as a volume under /elsa
WORKDIR /packages

# Publish the deb-packages to clients artifactory
CMD /bin/bash -c 'for package in *.deb;\
do ARCH=`dpkg --info $package | grep "Architecture" | sed "s/Architecture:\ \([[:alnum:]]*\).*/\1/g" | tr -d [:space:]`;\
  curl -H "X-JFrog-Art-Api:${API_KEY}" -XPUT "${REPOSITORY_URL}/${package};deb.distribution=${DISTRIBUTION};deb.component=non-free;deb.architecture=$ARCH" -T ${package};\
done'

The trick here is to call bash directly and supplying the shell script using the -c parameter. An alternative would have been to extract the script into an own file and call that in the CMD instruction like so:

# Publish the deb-packages to clients artifactory
CMD ["deploy.sh", "${API_KEY}", "${REPOSITORY_URL}", "${DISTRIBUTION}"]

In the above case I prefer the inline solution because of the short and simple script, no need for an additional external file and worrying about how to pass the parameters to the script.

Using OpenVPN in an automated deployment

In my previous post I showed how to use Ansible in a Docker container to deploy a software release to some remote server and how to trigger the deployment using a Jenkins job.

But what can we do if the target server is not directly reachable over SSH?

Many organizations use a virtual private network (VPN) infrastructure to provide external parties access to their internal services. If there is such an infrastructure in place we can extend our deployment process to use OpenVPN and still work in an unattended fashion like before.

Adding OpenVPN to our container

To be able to use OpenVPN non-interactively in our container we need to add several elements:

  1. Install OpenVPN:
    RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install openvpn
  2. Create a file with the credentials using environment variables:
    CMD echo -e "${VPN_USER}\n${VPN_PASSWORD}" > /tmp/openvpn.access
  3. Connect with OpenVPN using an appropriate VPN configuration and wait for OpenVPN to establish the connection:
    openvpn --config /deployment/our-client-config.ovpn --auth-user-pass /tmp/openvpn.access --daemon && sleep 10

Putting it together our extended Dockerfile may look like this:

FROM ubuntu:18.04
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get -y install software-properties-common
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install ansible ssh
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install openvpn

SHELL ["/bin/bash", "-c"]
# A place for our vault password file
RUN mkdir /ansible/

# Setup work dir
WORKDIR /deployment

COPY deployment/${TARGET_HOST} .

# Deploy the proposal submission system using openvpn and ansible
CMD echo -e "${VAULT_PASSWORD}" > /ansible/vault.access && \
echo -e "${VPN_USER}\n${VPN_PASSWORD}" > /tmp/openvpn.access && \
ansible-vault decrypt --vault-password-file=/ansible/vault.access ${TARGET_HOST}/credentials/deployer && \
openvpn --config /deployment/our-client-config.ovpn --auth-user-pass /tmp/openvpn.access --daemon && \
sleep 10 && \
ansible-playbook -i ${TARGET_HOST}, -u root --key-file ${TARGET_HOST}/credentials/deployer ansible/deploy.yml && \
ansible-vault encrypt --vault-password-file=/ansible/vault.access ${TARGET_HOST}/credentials/deployer && \
rm -rf /ansible && \
rm /tmp/openvpn.access

As you can see the CMD got quite long and messy by now. In production we put the whole process of executing the Ansible and OpenVPN commands into a shell script as part of our deployment infrastructure. That way we separate the preparations of the environment from the deployment steps themselves. The CMD looks a bit more friendly then:

CMD echo -e "${VPN_USER}\n${VPN_PASSWORD}" > /vpn/openvpn.access && \
echo -e "${VAULT_PASSWORD}" > /ansible/vault.access && \
chmod +x deploy.sh && \
./deploy.sh

As another aside you may have to mess around with nameserver configuration depending on the OpenVPN configuration and other infrastructure details. I left that out because it seems specific to the setup with our customer.

Fortunately, there is nothing to do on the ansible side as the whole VPN stuff should be completely transparent for the tools using the network connection to do their job. However we need some additional settings in our Jenkins job.

Adjusting the Jenkins Job

The Jenkins job needs the VPN credentials added and some additional parameters to docker run for the network tunneling to work in the container. The credentials are simply another injected password we may call JOB_VPN_PASSWORD and the full script may now look like follows:

docker build -t app-deploy -f deployment/docker/Dockerfile .
docker run --rm \
--cap-add=NET_ADMIN \
--device=/dev/net/tun \
-e VPN_USER=our-client-vpn-user \
-e VPN_PASSWORD=${JOB_VPN_PASSWORD} \-e VAULT_PASSWORD=${JOB_VAULT_PASSWORD} \
-e TARGET_HOST=${JOB_TARGET_HOST} \
-e ANSIBLE_HOST_KEY_CHECKING=False \
-v `pwd`/artifact:/artifact \
app-deploy

DOCKER_RUN_RESULT=`echo $?`
exit $DOCKER_RUN_RESULT

Conclusion

Adding VPN-support to your automated deployment is not that hard but there are some details to be aware of:

  • OpenVPN needs the credentials in a file – similar to Ansible – to be able to run non-interactively
  • OpenVPN either stays in the foreground or daemonizes right away if you tell it to, not only after the connection was successful. So you have to wait a sufficiently long time before proceeding with your deployment process
  • For OpenVPN to work inside a container docker needs some additional flags to allow proper tunneling of network connections
  • DNS-resolution can be tricky depending on the actual infrastructure. If you experience problems either tune your setup by adjusting name resolution in the container or access the target machines using IPs…

After ironing out the gory details depicted above we have a secure and convenient deployment process that saves a lot of time and nerves in the long run because you can update the deployment at the press of a button.

Ansible for deployment with docker

We are using Jenkins not only for our continuous integration needs but also for running deployments at the push of a button. Back in the dark times™ this often meant using Apache Ant in combination with JSch to copy the projects artifacts to some target machine and execute some remote commands over ssh.
After gathering some experience with Ansible and docker we took a new shot at the task and ended up with one-shot docker containers executing some ansible scripts. The general process stayed the same but the new solution is more robust in general and easier to maintain.
In addition to the project- and deployment-process-specific simplifications we can now use any jenkins slave with a docker installation and network access. No need for an ant installation and a java runtime anymore.
However there are some quirks you have to overcome to get the whole thing running. Let us see how we can accomplish non-interactive (read: fully automated) deployments using the tools mentioned above.

The general process

The deployment job in Jenkins has to perform the following steps:

  1. Fetch the artifacts to deploy from somewhere, e.g. a release job
  2. Build the docker image provisioned with ansible
  3. Run a container of the image providing credentials and artifacts through the environment and/or volumes

The first step can easily be implemented using the Copy Artifact Plugin.

The Dockerfile

You can use mostly any linux base image for your deployment needs as long as the distribution comes with ansible. I chose Ubunt 18.04 LTS because it is sufficiently up-to-date and we are quite familiar with it. In addition to ssh and ansible we only need the command for running the playbook. In our case the the playbook and ssh-credentials reside in the deployment folder of our projects, of course encrypted in an ansible vault. To be able to connect to the target machine we need the vault password to decrypt the ssh password or private key. Therefore we inject the vault password as an environment variable when running the container.

A simple Dockerfile may look as follows:

FROM ubuntu:18.04
RUN DEBIAN_FRONTEND=noninteractive apt-get update
RUN DEBIAN_FRONTEND=noninteractive apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get -y install software-properties-common
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install ansible ssh

SHELL ["/bin/bash", "-c"]
# A place for our vault password file
RUN mkdir /ansible/

# Setup work dir
WORKDIR /deployment

COPY deployment/${TARGET_HOST} .

# Deploy the proposal submission system using openvpn and ansible
CMD echo -e "${VAULT_PASSWORD}" > /ansible/vault.access && \
ansible-vault decrypt --vault-password-file=/ansible/vault.access ${TARGET_HOST}/credentials/deployer && \
ansible-playbook -i ${TARGET_HOST}, -u root --key-file ${TARGET_HOST}/credentials/deployer ansible/deploy.yml && \
ansible-vault encrypt --vault-password-file=/ansible/vault.access ${TARGET_HOST}/credentials/deployer && \
rm -rf /ansible

Note that we create a temporary vault password file using the ${VAULT_PASSWORD} environment variable for ansible vault decryption. A second noteworthy detail is the ad-hoc inventory using the ${TARGET_HOST} environment variable terminated with comma. Using this variable we also reference the correct credential files and other host-specific data files. That way the image will not contain any plain text credentials.

The Ansible playbook

The playbook can be quite simple depending on you needs. In our case we deploy a WAR-archive to a tomcat container and make a backup copy of the old application first:

---
- hosts: all
  vars:
    artifact_name: app.war
    webapp_directory: /var/lib/tomcat8/webapps
  become: true

  tasks:
  - name: copy files to target machine
    copy: src={{ item }} dest=/tmp
    with_fileglob:
    - "/artifact/{{ artifact_name }}"
    - "../{{ inventory_hostname }}/context.xml"

  - name: stop our servlet container
    systemd:
      name: tomcat8
      state: stopped

  - name: make backup of old artifact
    command: mv "{{ webapp_directory }}/{{ artifact_name }}" "{{ webapp_directory }}/{{ artifact_name }}.bak"

  - name: delete exploded webapp
    file:
      path: "{{ webapp_directory }}/app"
      state: absent

  - name: mv artifact to webapp directory
    copy: src="/tmp/{{ artifact_name }}" dest="{{ webapp_directory }}" remote_src=yes
    notify: restart app-server

  - name: mv context file to container context
    copy: src=/tmp/context.xml dest=/etc/tomcat8/Catalina/localhost/app.xml remote_src=yes
    notify: restart app-server

  handlers:
  - name: restart app-server
    systemd:
      name: tomcat8
      state: started

The declarative syntax and the use of handlers make the process quite straight-forward. Of note here is the - hosts: all line that is used with our command line ad-hoc inventory. That way we do not need an inventory file in our project repository or somewhere else. The host can be provided by the deployment job. That brings us to the Jenkins-Job that puts everything together.

The Jenkins job

As you probably already noticed in the parts above there are several inputs required by the container and the ansible script:

  1. A volume with the artifacts to deploy
  2. A volume with host specific credentials and data files
  3. Vault credentials
  4. The target host(s)

We inject passwords using the EnvInject Plugin which stored them encrypted and masks them in all the logs. I like to prefix such environmental settings in Jenkins with JOB_ do denote their origin. After the copying build step which puts the artifacts into the artifact directory we can use a small shell script to build the image and run the container with all the needed stuff wired up. The script may look like the following:

docker build -t app-deploy -f deployment/docker/Dockerfile .
docker run --rm \
-e VAULT_PASSWORD=${JOB_VAULT_PASSWORD} \
-e TARGET_HOST=${JOB_TARGET_HOST} \
-e ANSIBLE_HOST_KEY_CHECKING=False \
-v `pwd`/artifact:/artifact \
app-deploy

DOCKER_RUN_RESULT=`echo $?`
exit $DOCKER_RUN_RESULT

We run the container with the --rm flag to delete it right after execution leaving nothing behind. We also disable host key checking for ansible to keep the build non-interactive. The last two lines let the jenkins job report the result to the container execution.

Wrapping it up

Essentially we have a deployment consisting of three parts. A docker image definition of the required environment, an ansible playbook performing the real work and a jenkins job with a small shell script wiring it all together so that it can be executed at the push of a button. We do not store any clear text credentials persistently at neither location, so that we are reasonably secure for running the deployment.