Hybrid Python packaging for Debian/Ubuntu

Writing software in Python is often a pleasure and can lead to great products at limited cost because of its expressiveness and rich ecosystem.

One area where imho Python falls a bit short is deployment and packaging. On Linux, many users and customers expect packages for their platform so they can manage the software installation and updates using the standard tools.

This is where the pain often starts. Depending on the dependencies of your Python project, it may be simple or rather hard to provide a decent experience for the people managing your software.

I want to present several ways of providing a decent deployment experience to your customers, specifically for Debian-based Linux distributions.

The simple case

If all the dependencies of your project are available in usable versions for the target distribution, it is quite easy to package a Python project as a .deb. My preferred way is to just use stdeb like below:

python3 -m build --sdist --no-isolation
py2dsc-deb --with-python3=True --debian-version 1 ./dist/my_project.tar.gz

This will build a simple Debian package installable on a matching destination platform. For simple cases this often is enough.

If only one or a few dependencies are missing, you could consider packaging them too using this approach and allowing your project to take this same route.

Not using packages at all

If some dependencies are not available on the target platform through Debian packages, it may be easiest to just provide a tarball with an installation script. This script would essentially perform the following steps:

Unpack the source to a nice destination directory
Create a venv there
Install the dependencies in the venv
Provide a start script and/or service definition to launch the software using the venv

This is simple and usually scales to bigger projects but does not provide nice and clean integration into the system tools. Administrators have to manage the software this way and not the package manager way they may expect and be comfortable with.
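The steps above can be sketched in a few lines of Python. This is only a minimal sketch, not a finished installer: the function names, the venv location below the destination directory and the dependency list are assumptions made for illustration.

```python
import subprocess
import sys
import tarfile
from pathlib import Path


def venv_python(dest: Path) -> Path:
    """Path of the interpreter inside the venv created under dest."""
    return dest / "venv" / "bin" / "python"


def install_commands(dest: Path, dependencies: list[str]) -> list[list[str]]:
    """Commands that create the venv and install the dependencies into it."""
    return [
        [sys.executable, "-m", "venv", str(dest / "venv")],
        [str(venv_python(dest)), "-m", "pip", "install", *dependencies],
    ]


def install(tarball: Path, dest: Path, dependencies: list[str]) -> None:
    """Unpack the source tarball to dest, then create and fill the venv."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tarball) as archive:
        # Only use this with tarballs you built yourself.
        archive.extractall(dest)
    for command in install_commands(dest, dependencies):
        subprocess.run(command, check=True)
```

A call like install(Path("my_project.tar.gz"), Path("/opt/my-project"), ["taurus", "pyepics"]) would then perform the first three steps. A start script or service definition only needs to run the interpreter returned by venv_python().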

A hybrid Debian package approach

My hybrid approach is a blend of the two above:

It builds a normal Debian package containing the project itself along with version and dependency metadata. In the postinst script of the package, however, it creates a venv and installs the dependencies that are unavailable or unusable (e.g. wrong version) on the target platform.

First we create the Debian packaging files using:

python3 setup.py sdist
dh_make -p my-project_1.0.0 -f dist/my-project-1.0.0.tar.gz

This creates a debian/ directory containing all the packaging metadata files. You should mainly edit the control, copyright and changelog files and then craft the postinst file for our hybrid packaging approach:

#!/bin/sh

set -e

case "$1" in
    configure)
      python3 -m venv /opt/my-project/venv
      . /opt/my-project/venv/bin/activate && pip install PyQt5 pytango==9.5.1 taurus pyepics
    ;;

    abort-upgrade|abort-remove|abort-deconfigure)
    ;;

    *)
        echo "postinst called with unknown argument '$1'" >&2
        exit 1
    ;;
esac

exit 0

For correct removal we need a modified postrm script too:

#!/bin/sh

set -e

case "$1" in
    purge|remove|upgrade|failed-upgrade|abort-install|abort-upgrade|disappear)
      rm -rf /opt/my-project/venv/
    ;;

    *)
        echo "postrm called with unknown argument '$1'" >&2
        exit 1
    ;;
esac
exit 0

Using a final dpkg-buildpackage -b -us -uc we get a Debian package that builds its own venv on the target machine using the dependencies we actually need and not what the system offers.

For us and our customers this is a perfect compromise:

It allows us to define the dependencies and their versions exactly and mostly independently of what the target system offers, while coming as a normal Debian package managed using system tools.

Beware of using Git LFS on GitHub

In my private game programming projects, I am often using data files alongside my code for all kinds of game assets like images and sounds. So I thought it might be a good idea to use the Git Large File Storage (=LFS) extension for that.

What is Git LFS?

Essentially, if you’re not using it, every file that was part of your repository at any time in its history stays in your local .git folder. E.g. if you accidentally added and committed an 800 MB video file and then deleted it again, it will still be in your local .git folder. This problem multiplies when using a CI with many branches: each branch checkout will typically have its own copy of all files ever used in your repository. This is not a problem with source code files, because they are not that big and they can be compressed really well against different versions of themselves, which is what git typically does.

With Git LFS, the big files are only stored as references in the .git folder. This means that you might need an additional request to your remote when checking them out again, but it will save you lots of space and traffic when cloning repositories.
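To make the idea of a reference more concrete, here is a small sketch that builds the text of an LFS pointer file: a version line, the SHA-256 of the content and its size. The helper name lfs_pointer and the dummy asset are made up for illustration; in practice Git LFS creates these pointers for you.

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Return the text of a Git LFS pointer file for the given file content."""
    oid = hashlib.sha256(content).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# A stand-in for a large asset; real assets would be images or sounds.
asset = b"\x00" * 1_000_000
pointer = lfs_pointer(asset)
print(pointer)
print(f"{len(asset)} bytes of content, {len(pointer)} bytes of pointer text")
```

Git only has to version the few lines of pointer text, while the actual content is fetched from the LFS storage when needed.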

In my previous projects on GitHub, I just did not enable LFS for my assets. And that worked fine, as my assets are usually pretty small and I don’t change them often. But this time I wanted to try it.

Sorry, GitHub, what?

Imagine my surprise when I got an e-mail from GitHub last month warning me that my LFS traffic quota is almost reached and I have to pay to extend it. What? I never had any traffic quota problems without LFS. GitHub doesn’t even seem to have one, if I just keep my big files in ‘pure’ git. So that’s what I get for trying to save GitHub traffic.

Now the LFS quota is a meager 1 GB per month with GitHub Pro. That’s nothing. Luckily, my current project is not asset heavy: the full repo is very small at ~60 MB. But still the quota was reached with me as a single developer. How did that happen? I had just enabled CI for my project on my home server, and I was creating lots of branches that my CI wanted to build. It took only 12 cloned branches for the 80% warning to be reached.

Workarounds

Jenkins, which I’m using as a CI tool, has the ability to use a ‘reference repository’ when cloning. This can be used to get the bulk of the data from a local remote, while getting the rest from GitHub. This is what I’m now using to avoid excess LFS traffic. It is a bit of a pain to set up: you have to manually maintain this reference repository, Jenkins will not do it for you, and you have to do that on each agent. I only have one at this point, so that’s an okay trade-off. But next time, I sure won’t use Git LFS on GitHub, if I can avoid it.

A Tale of Hidden Variables

Today was one of those frustrating moments that every developer encounters at some point. We were working on a Docker Compose setup and observed behavior that could only happen if a specific environment variable had been set. To ensure that this environment variable wasn’t being set I scoured through the Docker Compose file, checked the local environment variables using the export command, and grepped all the relevant files in the project directory. But no matter what I did, this environment variable was still haunting us, wreaking havoc on the setup.

After what felt like an eternity of troubleshooting, we finally uncovered the culprit: an old, hidden .env file left over from a long-forgotten configuration. This file had been silently setting the environment variable I was desperately trying to eliminate.

Here’s how it all unfolded and what I learned from the experience:

When I first suspected that the environment variable might be lurking somewhere in the project, my instinct was to use grep to search for it in all the files within my local directory. I ran something along the lines of:

grep -r 'MY_ENV_VAR' *

To my surprise, nothing relevant showed up. I had expected this command to search through everything in my local directory. However, I had forgotten one important detail: the shell expands * before grep even starts, and that expansion leaves out hidden files. So grep never got to see them.

Since .env files are typically hidden (starting with a dot), grep completely skipped over them. Little did I know, that old .env file was sitting quietly in the background, setting the environment variable that was causing all my issues.

After some frustration, my colleague finally had the realization that there might be hidden files at play. In Unix-like operating systems, files that start with a dot (.), like .env, are treated as hidden and are not listed or searched by default with common commands. Just as hidden variables in physics could influence particles without being directly observable, the hidden .env file was affecting my environment variables without being immediately visible.

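To include hidden files in your search, point grep at the directory itself instead of letting the shell expand *. With -r, grep then descends into the directory and reads hidden files as well:

grep -r 'MY_ENV_VAR' .

If you only want to look inside hidden files, add --include=".*" to that command.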
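The difference between matching * and walking the directory can be reproduced in a few lines. The following is a small sketch using Python's glob module, which follows the same rule as the shell for names starting with a dot; the two file names are made up for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as project:
    # Hypothetical project layout: one visible file and one hidden .env file.
    for name in ("docker-compose.yml", ".env"):
        with open(os.path.join(project, name), "w") as handle:
            handle.write("MY_ENV_VAR=1\n")

    # Like the shell, glob's "*" does not match names starting with a dot.
    star_matches = sorted(
        os.path.basename(path) for path in glob.glob(os.path.join(project, "*"))
    )

    # Walking the directory itself sees every file, hidden or not.
    all_files = sorted(name for _, _, names in os.walk(project) for name in names)

print(star_matches)
print(all_files)
```

The first list contains only docker-compose.yml, the second one also contains .env.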

This experience led me to reflect on whether deployment-relevant files like .env should be hidden in the first place, since they can easily be overlooked during debugging. It also makes them more prone to being forgotten. Hidden files are easy to miss when troubleshooting, especially when you’re under pressure.

Given that .env files can have a significant impact on the behavior of applications, containerized setups, and CI/CD pipelines, making them hidden by default might not always be the best approach. After all, if an environment variable has the power to alter how an entire application runs, it’s something we want to be highly visible and readily accessible.

In the end, this experience taught me two important lessons:

Always search for hidden files when troubleshooting issues related to environment variables. If your Docker Compose or other environment-dependent setups aren’t behaving as expected, don’t forget to check for hidden .env files.

Consider the visibility of critical configuration files. Should .env files be hidden by default, or should they be treated as first-class citizens in our directory structures? In many cases, keeping them visible might help avoid unexpected behavior and wasted hours of debugging.

Advanced deb-packaging with CMake

CMake has become our C/C++ build tool of choice because it provides good cross-platform support and very reasonable IDE (Visual Studio, CLion, QtCreator) integration. Another very nice feature is the included packaging support using the CPack module. It allows you to create native deployable artifacts for a plethora of systems, including NSIS installers for Windows, RPM and Deb for Linux, DMG for Mac OS X and a couple more.

While all these binary generators share some CPACK-variables there are specific variables for each generator to use exclusive packaging system features or requirements.

Deb-packaging features

The Debian package management system is used not only by Debian but also by Ubuntu, Raspbian and many other Linux distributions. In addition to dependency handling and versioning, packagers can use several other features, namely:

Specifying a section for the packaged software, e.g. Development, Games, Science etc.
Specifying package priorities like optional, required, important, standard
Specifying the relation to other packages like breaks, enhances, conflicts, replaces and so on
Using maintainer scripts to customize the installation and removal process like pre- and post-install, pre- and post-removal
Dealing with configuration files to protect end user customizations
Installing and linking files and much more without writing shell scripts using ${project-name}.{install | links | ...} files

All these make the software easier to package or easier to manage by your end users.

Using deb-features with CMake

Many of the mentioned features are directly available as appropriately named CMake-variables all starting with CPACK_DEBIAN_. I would like to specifically mention the CPACK_DEBIAN_PACKAGE_CONTROL_EXTRA variable where you can set the maintainer scripts and one of my favorite features: conffiles.

Deb protects files under /etc from accidental overwriting by default. If you want to protect files located somewhere else you specify them in a file called conffiles each on a separate line:

/opt/myproject/myproject.conf
/opt/myproject/myproject.properties

If the user has made changes to these files, she will be asked what to do when updating the package:

keep her own version
use the maintainer version
review the situation and merge manually.

For extra security, files like myproject.conf.dpkg-dist and myproject.conf.dpkg-old are created so no changes are lost.

Unfortunately, I did not get the linking feature working without using maintainer scripts. Nevertheless I advise you to use CMake for your packaging work instead of packaging using the native debhelper way.

It is much more natural for a CMake-based project and you can reuse much of your metadata for other target platforms. It also shields you from a lot of the gory details of debian packaging without removing too much of the power of deb-packages.

Debian packaging against the rules

In a former post I talked about packaging your own software in the most convenient and natural way for the target audience. Think of an MSI or .exe installer for Microsoft Windows, distribution-specific packages for Linux (maybe even by providing your own repositories) or smartphone apps via the standard app stores. In the case of Debian packages there are quite strict rules about filesystem layout, licensing and signatures. This is all fine if you want to get your software upstream into official repositories.

If you are developing commercial software for specific clients, things may be different! I suggest doing what serves the client’s user experience (UX) best, even in regard to packaging for Debian or Linux.

Packaging for your users

Packaging for Linux means you need to make sure that your dependencies and versioning are well defined. If you miss out here, problems will arise when updating your software. Other things you may consider, even if they are against the rules:

Putting your whole application with executables, libraries, configuration and resources under the same prefix, e.g. /opt/${my_project} or /usr/local/${my_project}. That way the user finds everything in one place instead of scattered around in the file system.
- On Debian this has some implications, like the need to use the conffiles feature for your configuration
Package together what belongs together. Often times it has no real benefit to split headers, libraries, executables etc. into different packages. Fewer packages makes it easier for the clients to handle.
Provide integration with operating system facilities like systemd or the desktop. Such a seamless integration eases use and administration of your software as no “new tricks” have to be learned.
- A simple way for systemd is a unit file that calls an executable with an environment file for configuration
Adjust the user’s PATH or put links to your executables in well-known directories like /usr/bin. Running your software from the command line should be easy and with sensible defaults. Show sample usages to the user so they can apply “monkey see – monkey do”.

Example of a unit file:

[Unit]
Description=My Server

[Service]
EnvironmentFile=/opt/my_project/my-server.env
ExecStart=/usr/bin/my-server

[Install]
WantedBy=multi-user.target

In the environment file you can point to other configuration files like XML configs or the like if need be. Environment variables in general are quite a powerful way to customize the behaviour of a program on a per-process basis, so make sure your start scripts or executables support them for manual experimentation, too.
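On the program side, supporting such environment variables can be as simple as the following sketch. The variable names MY_SERVER_PORT and MY_SERVER_CONFIG and their default values are assumptions made for illustration; they are not part of the unit file above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerConfig:
    port: int
    config_file: str


def load_config(environ=os.environ) -> ServerConfig:
    """Read settings from environment variables, falling back to defaults."""
    return ServerConfig(
        port=int(environ.get("MY_SERVER_PORT", "8080")),
        config_file=environ.get("MY_SERVER_CONFIG", "/opt/my_project/my-server.xml"),
    )
```

With this in place a line like MY_SERVER_PORT=9000 in the environment file changes the port for the service, and the same variable can be set on the command line for a quick manual experiment.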

Possible additional preparations

If you plan to deliver your packages without providing your own repository and want to enable your clients to install them easily themselves, you can further aid them.

If the target machines are few and can easily be prepared by you, install tools like gdebi that allow installation using double click and a graphical interface.

If the target machines are numerous, implement automation with tools like ansible and ensure unattended installation/update procedures.

Point your clients to easy tools they feel comfortable with. That could of course be a command line utility like aptitude, too.

What to keep in mind

There is seldom a one-size-fits-all in custom software. Do what fits the project and your target audience best. Do not fear to break some rules if it improves the overall UX of your service.

Platform independent development with .NET

We develop most of our projects as platform independent applications, usually running under Windows, Mac and Linux. There are exceptions, for example when it is required to communicate with special hardware drivers or third-party libraries or other components that are not available on all platforms. But even then we isolate these parts into interchangeable modules that can be operated either in a simulated mode or with the real thing. The simulated modes are platform independent. Developers usually can work on the code base using their favorite operating system. Of course, it has to be tested on the target platform(s) that the application will run on in the end.

Platform independent development is both a matter of technology choices and programming practices. Concerning the technology the ecosystem based on the Java VM is a proven choice for platform independent development. We have developed many projects in Java and other JVM based languages. All of our developers are polyglots and we are able to develop software with a wide variety of programming languages.

The .NET ecosystem

Until recently the .NET platform has been known to be mainly a Microsoft Windows based ecosystem. The Mono project was started by non-Microsoft developers to provide an open source implementation of .NET for other operating systems, but it never had the same status as Microsoft’s official .NET on Windows.

However, recently Microsoft has changed course: They open sourced their .NET implementation and are porting it to other platforms. They acquired Xamarin, the company behind the Mono project, and they are releasing developer tools such as IDEs for non-Windows platforms.

IDEs for non-Windows platforms

If you want to develop a .NET project on a platform other than Windows you now have several choices for an IDE:

Xamarin Studio on Windows and Mac or MonoDevelop on Linux.
Microsoft Visual Studio for Mac. This is based on Xamarin Studio and currently has the status of a preview version.
JetBrains Rider (EAP) on Windows, Mac and Linux.
Microsoft Visual Studio Code for Windows, Mac and Linux. This is actually more of a powerful programmer’s editor than a full IDE.

I am currently using JetBrains Rider on a Mac to develop a .NET based application in C#. Since I have used other JetBrains products before it feels very familiar. Xamarin Studio, MonoDevelop, VS for Mac and JetBrains Rider all support the solution and project file format of the original Visual Studio for Windows. This means a .NET project can be developed with any of these IDEs.

Web applications

The .NET application I am developing is based on Web technologies. The server side uses the NancyFX web framework, the client side uses React. Persistence is done with Microsoft’s Entity Framework. All the libraries I need for the project like NancyFX, the Entity Framework, a PostgreSQL driver, JSON.NET, NLog, NUnit, etc. work on non-Windows platforms without any problems.

Conclusion

Development of .NET applications is no longer limited to the Windows platform. Microsoft is actively opening up their development platform for other operating systems.

Packaging kernel modules/drivers using DKMS

Hardware drivers on Linux need to match the running kernel. When drivers you need are not part of the distribution in use, you need to build and install them yourself. While this may be ok to do once or twice, it soon becomes tedious doing it after every kernel update.

The Dynamic Kernel Module Support (DKMS) may help in such a situation: The module source code is installed on the target machine and can be rebuilt and installed automatically when a new kernel is installed. While veterans may be willing to manually maintain their hardware drivers with DKMS, end users do not care about the underlying system that keeps their hardware working. They want to manage their software updates using the tools of their distribution and everything should be working automagically.

I want to show you how to package a kernel driver as an RPM package hiding all of the complexities of DKMS from the user. This requires several steps:

Preparing/patching the driver (aka kernel module) to include dkms.conf and follow the required conventions of DKMS
Creating an RPM spec file to install the source and tool chain and to integrate the module source with DKMS

While there is native support for RPM packaging in DKMS I found the following procedure more intuitive and flexible.

Preparing the module source

You need at least a small file called dkms.conf to describe the module source to the DKMS system. It usually looks like this:

PACKAGE_NAME="menable"
PACKAGE_VERSION=3.9.18.4.0.7
BUILT_MODULE_NAME[0]="menable"
DEST_MODULE_LOCATION[0]="/extra"
AUTOINSTALL="yes"

Also make sure that the source tarball extracts into the directory /usr/src/$PACKAGE_NAME-$PACKAGE_VERSION! If you do not like /usr/src as a location for your kernel modules, you can configure it in /etc/dkms/framework.conf.
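If you want to double check the directory convention in a build script, the name can be derived from dkms.conf itself. The following sketch only handles plain KEY=value lines like the ones shown above and does not evaluate shell expansions; the function names are made up for illustration.

```python
import shlex
from pathlib import PurePosixPath


def parse_dkms_conf(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; shell expansions are not evaluated."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw_value = line.partition("=")
        # shlex strips the optional shell quoting around the value.
        settings[key] = "".join(shlex.split(raw_value))
    return settings


def expected_source_dir(settings: dict[str, str], source_tree: str = "/usr/src") -> PurePosixPath:
    """Directory the source tarball has to extract into for DKMS to find it."""
    name = settings["PACKAGE_NAME"]
    version = settings["PACKAGE_VERSION"]
    return PurePosixPath(source_tree) / f"{name}-{version}"
```

For the dkms.conf above this yields /usr/src/menable-3.9.18.4.0.7.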

Preparing the spec file

Since we do not build and package a binary but instead install the source code and then register, build and install the module on the target machine, the spec file looks a bit different than usual: we have no build step. Instead we just install the source tree and potentially additional files like udev rules or documentation, and perform all DKMS work in the post-install and pre-uninstall scripts. All that means that we build a noarch RPM and depend on dkms, kernel sources and a compiler.

Preparation section

Here we unpack and patch the module source, e.g.:

Source: %{module}-%{version}.tar.bz2
Patch0: menable-dkms.patch
Patch1: menable-fix-for-kernel-3-8.patch

%prep
%setup -n %{module}-%{version} -q
%patch0 -p0
%patch1 -p1

Install section

Basically we just copy the source tree to /usr/src in our build root. In this example we have to install some additional files, too.

%install
rm -rf %{buildroot}
mkdir -p %{buildroot}/usr/src/%{module}-%{version}/
cp -r * %{buildroot}/usr/src/%{module}-%{version}
mkdir -p %{buildroot}/etc/udev/rules.d/
install udev/10-siso.rules %{buildroot}/etc/udev/rules.d/
mkdir -p %{buildroot}/sbin/
install udev/men_path_id udev/men_uiq %{buildroot}/sbin/

Post-install section

In the post-install script of the RPM we add our module to the DKMS system, build it and install it:

# Count how often this module version is already known to DKMS
occurrences=$(/usr/sbin/dkms status | grep "%{module}" | grep "%{version}" | wc -l)
if [ "$occurrences" -eq 0 ];
then
    # Register the module source only if DKMS does not know it yet
    /usr/sbin/dkms add -m %{module} -v %{version}
fi
/usr/sbin/dkms build -m %{module} -v %{version}
/usr/sbin/dkms install -m %{module} -v %{version}
exit 0
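The check at the top of the script is what makes the post-install step safe to run more than once: dkms add is only called if the module version is not registered yet. The same decision can be sketched as a small function, which is handy if you want to test the logic outside of an RPM. The function name is made up, and the sample status line is only illustrative because the exact output format differs between DKMS versions.

```python
def is_registered(status_output: str, module: str, version: str) -> bool:
    """True if some line of `dkms status` output names both module and version."""
    return any(
        module in line and version in line
        for line in status_output.splitlines()
    )


# Illustrative status line; the exact format differs between DKMS versions.
sample = "menable, 3.9.18.4.0.7, 3.8.0-35-generic, x86_64: installed\n"
print(is_registered(sample, "menable", "3.9.18.4.0.7"))
print(is_registered("", "menable", "3.9.18.4.0.7"))
```

Like the grep pipeline, this is a plain substring match per line, so it does not depend on the separators DKMS prints between the fields.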

Pre-uninstall section

We need to remove our module from DKMS if the user uninstalls our package to leave the system in a clean state. So we need a pre-uninstall script like this:

/usr/sbin/dkms remove -m %{module} -v %{version} --all
exit 0

Conclusion

Packaging kernel modules using DKMS and RPM is not really hard and provides huge benefits to your users. There are some little quirks like the post-install and pre-uninstall scripts but after you got that working you (and your users) are rewarded with a great, fully integrated experience. You can use the full spec file of the driver in the above example as a template for your driver packages.
