Hybrid Python packaging for Debian/Ubuntu

Writing software in Python often is a pleasure and can lead to great products with limited costs because of its expressiveness and rich ecosystem.

One area where imho Python falls a bit short is deployment and packaging. On Linux many users and customers expect packages for their platform so they can manage the software installation and updates using the standard tools.

This is where the pain often starts. Depending on the dependencies of your python project it may be simple or rather hard to provide a decent experience for the people managing your software.

I want to present several ways of providing a decent deployment experience to your customer specifically for Debian-based linux distributions.

The simple case

If all the dependencies of your project are available in usable versions for the target distribution, it is quite easy to package a python project as a .deb. My preferred way is to just use stdeb like below:

python3 -m build --sdist --no-isolation
py2dsc-deb --with-python3=True --debian-version 1 ./dist/my_project.tar.gz

This will built a simple debian package installable on a matching destination platform. For simple cases this often is enough.

If only one or a few dependencies are missing, you could consider packaging them too using this approach and allowing your project to take this same route.

Not using packages at all

If some dependencies are not available on the target platform through Debian packages it may be easiest to just provide a tarball with an installation script. This script would essentially perform the following steps

Unpack the source to a nice destination directory
Create a venv there
Install the dependencies in the venv
Provide some startscript and/or service definition to launch the software using the venv

This is simple and usually scales to bigger projects but does not provide nice and clean integration into the system tools. Administrators have to manage the software this way and not the package manager way they may expect and be comfortable with.

A hybrid Debian package approach

My hybrid approach is a blend of the two above:

It builds a normal debian package containing the project itself along with version and dependency metadata. In the postinst-script of the package however, it creates a venv and installs the dependencies unavailable or unusable (e.g. wrong version) on the target platform.

First we create the debian packaging files using

python3 setup.py sdist
dh_make -p my-project_1.0.0 -f dist/my-project-1.0.0.tar.gz

This creates a debian/ directory containing all the packaging metadata files. You should mainly edit the control, copyright and changelog files and then craft the postinst file for our hybrid packaging approach:

#!/bin/sh

set -e

case "$1" in
    configure)
      python3 -m venv /opt/my-project/venv
      . /opt/my-project/venv/bin/activate && pip install PyQt5 pytango==9.5.1 taurus pyepics
    ;;

    abort-upgrade|abort-remove|abort-deconfigure)
    ;;

    *)
        echo "postinst called with unknown argument '$1'" >&2
        exit 1
    ;;
esac

exit 0

For correct removal we need a modified postrm script too:

#!/bin/sh

set -e

case "$1" in
    purge|remove|upgrade|failed-upgrade|abort-install|abort-upgrade|disappear)
      rm -rf /opt/my-project/venv/
    ;;

    *)
        echo "postrm called with unknown argument '$1'" >&2
        exit 1
    ;;
esac
exit 0

Using a final dpkg-buildpackage -b -us -uc we get a debian package that builds its own venv on the target machine using the dependencies we actually need and not what the system offers.

For us and our customers this is a perfect compromise:

It allows us to define the dependencies and their versions exactly and mostly independent from what the target system offers while coming as a normal debian package managed using system tools.

A few more heuristics for rejecting Merges

Since a few weeks ago, I am trying to find a few easy things to look for when facing a Merge Request (also called “Pull Request”) that is too large to be quickly accepted.

When facing a larger Merge Request, how can one rather quickly decide whether it is worth going through all changes in one session, or to decide that this is too dangerous and reject.

These thoughts apply for a medium-sized repository – I am of the opinion that if you happen to work in a large project, or contribute to a public open-source repository, one should never even aim for larger merge requests, i.e. they should be rejected if there is more than one reason any code changed in that MR.

Being too strict just for the sake of it, in my eyes, can be a costly mistake – You waste your time in unnecessary structure and, in earlier / more experimental development stages, you might not want to take the drive out of a project. Nevertheless, maintainers need to know what’s going on.

Last time, I kept two main thoughts open, and I want to discuss these here, especially since they now had time to flourish a while in the tasty marinade that is my brain.

Can you describe the changes in one sentence?

I want my code to change for a multitude of reasons, but I want to know which kind of “glasses” I read these changes with. For me, it is a lesser problem to go through many changes if I can assign them to the same “why”. I.e. introducting i18n might change many lines of codes, but as these happen for the same reason, they can be understood easily.

But if, for some reason, people decide to change the formatting (replace tabs with spaces or such shenanigans), you better make sure that this the only reason any line changes. If there is any other thing someone did “as it just appeared easy” -reject the whole MR. That is no place for the “boy scout rule”, it is just too dangerous.

For me, it is too little to always apply the same type of glasses to any Merge Request there is. One could say “I only look for technical correctness”. But usually I can very well allow myself some flexibility there. I need to know, however, that all changes happened for only a few given reasons, because only then I can be sure that the developer did not actually loose track of his goal somewhere on the way.

Does this Merge increase the trust in a collaboration?

From a bird-eye point of view, people working together should always pay attention whether a given trajectory goes in the direction of increasing trust. Of course, if you fix a broken menu button in a user interface of a large project, the MR should just do that – but if you are in a smaller project with the intention of staying there, I suggest that every MR expresses exactly that: “I understand what is important at the current stage of this collaboration and do exactly that”.

Especially when working together for a longer time, it can be easy to let the branching discipline slip a little – things might have gone well for a longer time. But this is a fragile state, because if you then care too little about the boundaries of a specific MR, this can damage the trust all too easily.

In a customer project, this trust goes out to the customer. This might be the difference between “if something breaks they’ll write you a mail, you fix it, they are happy” and “they insist on a certain test coverage”.

Conclusion

So basically, reviewing the code of others boils down to writing own code, or improving User Experience, or managing anything – think not in a list of small-scale checklists, think in terms of “Cognitive Load”. A good programmer should have a large set of possible glasses (mindsets) through which they see code, especially foreign code. One should always be honest whether a given change is compatible with only a very small number of reasons. If there is a Merge Request that allows itself to do too much, this is not the Boy Scout Rule – it is a recipe of undermining mutual trust. Do not overestimate your own brain capacity. Reject the thing, and reserve that capacity for something useful.

Git revert may not be what you want!

Git is a powerful version control system (VCS) for tracking changes in a source code base as most developers will know by now… It is also known for its quirky command line interface and previously less known concepts like staging area, cherry-picking and rebasing.

This can make Git quite intimidating and there are many memes about how confusing and hard it is to work with.

Especially for people coming from Subversion (SVN) there are a few changes and pitfalls because some commands have the same name but either do different things in the context or altogether, e.g. commit, checkout and revert.

Commit in the SVN world perform synchronization with the remote repository whereas a commit in git is local and has to synchronized with the remote repository using push. Checkout is similar and maybe even worse: In SVN it fetches the remote repository state and puts it in your working copy. In Git is used to replace files in your working copy with contents from the local repository or formerly to switch and/or create local branches…

But now on to our main topic:

What does git revert actually do?

Revert is maybe the most misleading command existing in both SVN and Git.

In SVN it is quite simple: Undo all local edits (see svnbook). It throws away you uncommited changes and causes potential data loss.

Git’s revert works in a completely different way! It creates “inverse” commit(s) in your (local) repository. Your working copy has to be clean without changes to be able to revert and you will undo the specified commit and record this change in the repository and its history.

So both revert commands are fundamentally different. This can lead to unexpected behaviour, especially when reverting a merge commit:

We had the case where our developers had two feature branches and one dev decided to temporarily merge the other feature branch to test some things out. Then he reverted the merge using git revert and opened a merge request. This leads to the situation, that the other feature branch becomes unmergeable because git records no changes. This is rightfully so, because this was already merged as part of the first one, but without the changes (because they were reverted!). So all the changes of the other feature branch were lost and not easily mergeable:

To fix a situation like this you can cherry-pick the reverted commit mentioned in the commit message.

How to revert local changes in git

So revert is maybe not what we wanted to do in the first place to undo our temporary merge of the other feature branch. Instead we should use git reset. It has some variants, but the equivalent of SVN’s revert would be the lossy

git reset --hard HEAD

to clean our working copy. If we wanted to “revert” the merge commit in our example we would do

git reset --hard HEAD^

to undo the last commit.

I hope this clear up some confusion regarding the function of some Git commands vs. their SVN counterparts or “false friends”.

Updating Grails: From 5 to 6

We have a long history of maintaining quite a large grails application since Grails 1.0. Over the first few major versions upgrading was a real pain.

The situation changed dramatically after Grails 3 as you can see in our former blog posts and and the upgrade from 3 to 4. Going from 4 to 5 was so smooth that I did not even dedicate a blog post to it.

A few weeks ago we decided to upgrade from version 5 to 6 and here is a short summary of our experiences. Things fortunately went quite smooth again:

The changes

The new major version 6 contains mostly dependency upgrades and official support for Java 17 which is probably the biggest selling point.

Some other minor things to note are without any particular order:

The logback configuration file name has changed from logback.groovy to logback-config.groovy
The pathing-jar is already setup for you, so you can remove the directive from your build files if you had it in
The build uses the standard gradle-application plugin allowing some more simplifications in your build files and infrastructure

Other noteworthy news

Object Computing stepped down as the steward of the grails framework and informed the community in an open letter. While there is still the grails foundation and the open source community we can expect the development and changes to slow down.

Whether this results in negative developer experience remains to be seen.

Conclusion

Using and maintaining a Grails application developed to a rather smooth ride and only poorly maintained plugins hurt your experience. The framework and its foundation have been pretty solid for some time now.

Regarding new projects you certainly have to evaluate if using grails is the best option. For an full stack framework the answer maybe yes, but if you only need a powerful API backend lighter and more modern frameworks like micronaut or javalin may be a better choice.

When open source is only marketing

Nowadays, Open Source Software (OSS) is everywhere and probably everyone is using it somewhere. It may be your browser, the operating system on your phone, some libraries of your beloved games and apps or some web server software or infrastructure software delivering websites and other data like music or videos to you. Basically, OSS is everywhere – even if it is sometimes not openly visible.

When developing software we often use libraries and frameworks that are open source. For us it is great for a couple of reasons:

Often free of cost
We can have a look under the hood to learn how something is done
We can check and review the code for security issues to increase trust
We can improve the software, adapt it to our use case or fix issues plaguing us

In using OSS and probably contributing to it by either feedback or code we strenghen the software development community and are pushing our domain a little forward step-by-step.

Some companies noticed the positive notion about OSS and started to build their businesses on OSS. They pay developers to develop or contribute to OSS and provide services around it offering assistance, support or solutions based on their OSS. Companies like Red Hat or Nextcloud have quite some success with this model.

I personally think it is a great way of business leading to a win-win situation in the IT world: A company can make money and millions of other developers can increase their productivity or develop their own solutions standing on the shoulder of giants (aka OSS).

Open Source done wrong

Sometimes, I feel open source is only used as a label and marketing tool. Let me illustrate by an example I stumbled upon a few weeks ago:

There is an OSS javascript library with a github repository, issue tracker and so on. Sounds like a proper open source project, doesn’t it?

Well, there is an issue or maybe more a lacking feature that some people noticed and somebody even opened a pull request with a proposed fix.

The fix does not breaking anything, consists of the addition of 5 lines in one file and some CSS styles in another and only adds a feature to one component of the library.

Still, the pull request (PR) was simply put down and closed with a comment along the lines

“We do not think this is necessary, so we do not accept it.”

A follow-up issue 3 (!) years later was closed with a copy&paste comment, too.

While of course it is always the right and obligation (look at the recent supply chain attacks!) of the maintainers to decide what gets in and what does not it sometimes just has the notion of “No, not invented here”. Blocking unobstrusive, small and sensible changes without real reasons is very counter-productive. It defeats some of the main advantages of an open source solution:

Developers are not able to give back and get their improvements upstream. This leads to workarounds, forks oder separately maintained patches. That in turn means overhead and more overall effort needed to build and maintain a solution thus mitigating part of the benefits using the OSS in the first place.

Do not take me wrong: Closed/proprietary software does not even give you many of the options and advantages of OSS. It offers more of a “take it as it is” approach and leaves it up to you to decide if it is worth it or not.

Doing it right

On the other hand there are tons of well governed open source projects that really consider contributions for upstream and listen to the community. We have had several successful contributions to the Tango and Grails projects for example.

So if you or your company run an open source project I advise you to be open and to listen to your users. Working as a loosely connected distributed team with a good steering team at the helm leads to many potential win-win situations and makes the (software development) world a better place.

Exploring Tango Admin Devices

We are using the open-source control system framework TANGO in several projects where coordinated control of multiple hardware systems is needed.

What is TANGO good for?

TANGO provides uniform, distributed access and control of heterogeneous hardware devices. It is object-oriented in nature and usually one hardware device is represented by one (or more) software devices.

The device drivers can be written in either C++, Java or Python and client libraries exist for all of these languages. Using a middleware adapter like TangoGQL any language can access the devices.

All that makes TANGO useful for building SCADA systems ranging from a handful controlled devices to several hundreds you want to supervise and control.

Lesser know features of TANGO

All of the above is well known in the TANGO and SCADA community and quite straightforward. What some people may not know is that TANGO automatically provides an Admin-device for each TANGO server (an executable running one or more TANGO devices).

These admin devices have an address of the form dserver/<server_name>/<instance_name> and provide numerous commands for controlling and querying the TANGO device server instance:

You can for example introspect the device server to find available device classes, device instances and needed device properties (think of them as configuration settings).

In addition to introspection you can also control some aspects of the TANGO server like polling and logging. The Admin-device also allows restarting individual devices or even the whole server instance. This can be very useful to apply configuration changes remotely without shell access or something similar to the remote machine.

Wrapping it up

Admin-devices automatically exist and run for each TANGO device server. Using them allows clients to explore what devices are available, what they offer and how they can be configured. They also allow some aspects to be changed remotely at runtime.

We use these features to provide a rich web-base UI for managing the control system in a convenient way instead of relying on the basic tools (like Jive and Astor) that TANGO offers out-of-the-box.

git-submodules in Jenkins pipeline scripts

Nowadays, the source control git is a widespread tool and work nicely hand in hand with many IDEs and continuous integration (CI) solutions.

We use Jenkins as our CI server and migrated mostly to the so-called pipeline scripts for job configuration. This has the benefit of storing your job configuration as code in your code repository and not in the CI servers configuration. Thus it is easier to migrate the project to other Jenkins CI instances, and you get versioning of your config for free.

Configuration of a pipeline job

Such a pipeline job is easily configured in Jenkins merely providing the repository and the location of the pipeline script which is usually called Jenkinsfile. A simple Jenkinsfile may look like:

node ('build&&linux') {
    try {
        env.JAVA_HOME="${tool 'Managed Java 11'}"
        stage ('Prepare Workspace') {
            sh label: 'Clean build directory', script: 'rm -rf my_project/build'
            checkout scm // This fetches the code from our repository
        }
        stage ('Build project') {
            withGradle {
                sh 'cd my_project && ./gradlew --continue war check'
            }
            junit testResults: 'my_project/build/test-results/test/TEST-*.xml'
        }
        stage ('Collect artifacts') {
            archiveArtifacts(
                artifacts: 'my_project/build/libs/*.war'
            )
        }
    } catch (Exception e) {
        if (e in org.jenkinsci.plugins.workflow.steps.FlowInterruptedException) {
            currentBuild.result = 'ABORTED'
        } else {
            echo "Exception: ${e.class}, message: ${e.message}"
            currentBuild.result = 'FAILURE'
        }
    }
}

If you are running GitLab you get some nice features in combination with the Jenkins Gitlab plugin like automatic creation of builds for all your branches and merge requests if you configure the job as a multibranch pipeline.

Everything works quite well if your project resides in a single Git repository.

How to use it with git submodules

If your project uses git-submodules to connect other git repositories that are not directly part of your project the responsible line checkout scm in the Jenkinsfile does not clone or update the submodules. Unfortunately, the fix for this issue leads to a somewhat bloated checkout command as you have to copy and mention the settings which are injected by default into the parameter object of the GitSCM class and its extensions…

The simple one-liner from above becomes something like this:

checkout scm: [
    $class: 'GitSCM',
    branches: scm.branches,
    extensions: [
        [$class: 'SubmoduleOption',
        disableSubmodules: false,
        parentCredentials: false,
        recursiveSubmodules: true,
        reference: 'https://github.com/softwareschneiderei/ADS.git',
        shallow: true,
        trackingSubmodules: false]
    ],
    submoduleCfg: [],
    userRemoteConfigs: scm.userRemoteConfigs
]

After these changes projects with submodules work as expected, too.

Automated vulnerability checking of software dependencies

The OWASP organization is focused on improving the security of software systems and regularly publishes lists with security risks, such as the OWASP Top 10 Most Critical Web Application Security Risks or the Mobile Top 10 Security Risks. Among these are common attack vectors like command injections, buffer overruns, stack buffer overflow attacks and SQL injections.

When developing software you have to be aware of these in order to avoid and prevent them. If your project depends on third-party software components, such as open source libraries, you have to assess those dependencies for security risks as well. It is not enough to do this just once. You have to check them regularly and watch for any known, publicly disclosed, vulnerabilities in these dependencies.

Publicly known information-security vulnerabilities are tracked according to the Common Vulnerabilities and Exposures (CVE) standard. Each vulnerability is assigned an ID, for example CVE-2009-2704, and published in the National Vulnerability Database (NVD) by the U.S. government. Here’s an example for such an entry.

Automated Dependency Checking

There are tools and services to automatically check the dependencies of your project against these publicly known vulnerabilities, for example the OWASP Dependency Check or the Sonatype OSS Index. In order to use them your project has to use a dependency manager, for example Maven in the Java world or NuGet in the .NET ecosystem.

Here’s how to integrate the OWASP Dependency Check into your Maven based project, by adding the following plugin to the pom.xml file:

<plugin>
  <groupId>org.owasp</groupId>
  <artifactId>dependency-check-maven</artifactId>
  <version>5.0.0-M1</version>
  <executions>
    <execution>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>

When you run the Maven goal dependency-check:check you might see an output like this:

One or more dependencies were identified with known vulnerabilities in Project XYZ:

jboss-j2eemgmt-api_1.1_spec-1.0.1.Final.jar (pkg:maven/org.jboss.spec.javax.management.j2ee/jboss-j2eemgmt-api_1.1_spec@1.0.1.Final, cpe:2.3:a:sun:j2ee:1.0.1:*:*:*:*:*:*:*) : CVE-2009-2704, CVE-2009-2705
...

The output tells you which version of a dependency is affected and the CVE ID. Now you can use this ID to look it up in the NVD database and inform yourself about the potential dangers of the vulnerability and take action, like updating the dependency if there is a newer version, which addresses the vulnerability.

The sorry state of Grails (Plugins)

We have been developing and maintaining a complex web application on Grails since summer of 2008. By then Grails had passed the 1.0 release milestone and was really hot. A good 10 years later the application is still in use and we are trying to upgrade from Grails 2.4 to 3.3.

Upgrading Grails – a rough ride

Similar to past upgrade experiences the ride is not very smooth. Besides the major changes like the much welcomed switch to the gradle build system, interceptors instead of filters and streamlined configuration there are again a host of more subtle changes. The biggest problem for us though is the plugin situation.

It’s the plugins

In the past we had tough breaks like the abandoned selenium plugin in favor of the much better geb for functional testing. That had cost us a lot of work and many lost and not yet rewritten functional tests.

This time it seems especially hard because you two of our central plugins are not readily available anymore:

Apache Shiro Plugin
Compass-based Searchable Plugin

1. Shiro authentication

There still is no official release of the shiro plugin for Grails 3.x. After some searching and researching the initial port on github we decided to fork and maintain the most current forked version ourselves and try to work with it. Fortunately it was relatively easy to integrate and to update some dependencies. Our authentication and authorization works at least as good as before and we do not face additional problems. Working with interceptors feels quite good, too.

2. Search

The situation is harder with search. Compass and the searchable plugin are dead – plain and simple. The replacement for grails is the elasticsearch plugin which mostly adopted the API of the searchable plugin. Getting it to work is not that easy though. You have different versions depending on the grails 3 version you are targetting. Each plugin version targets a specific elasticsearch server version and so on. Often times (like in the default configuration) you will need a matching mapper-attachment plugin that is not available on maven in newer versions. This is mentioned somewhere in the midst of the plugin documentation.

Furthermore the plugin itself has some problems with hibernate proxies and concurrency so here we have to mess around with the plugin code once more. Once we have everything working for us like before we will try to get our patches upstream.

Marching forward

The upgrade from 2.x to 3.x is the biggest (and best) step of Grails into the right direction. On the downside it places a lot of burden on the application and plugin developers. That again increases the cost of maintaining proven applications further.

Right now we are close to a Grails 3.3 version of our application but have invested considerable effort into this upgrade.

Our current recommendation and practice is to not start new web applications based on the grails framework because there have been too many breaking changes and the maintainance cost is high. But we are keeping a close look at grails because the increased modularization and and new options like the grails-react-profile may keep grails interesting in the future.

.NET Core for platform independent web development

Several of our projects are based on the .NET platform. Until recently all of them used the classic .NET Framework. With a new project we had the opportunity to give .NET Core a try. The name stands for a moderized variant of the .NET Framework. It is developed by The .NET Foundation and Microsoft as a platform independent open-source project.

Not every type of project is currently suitable for .NET Core. If you want to develop a Windows desktop application (WinForms, WPF) you still have to use the classic .NET Framework. However, for server based applications .NET Core is a really good fit. Our application, for example, is implemented as a JSON API server with .NET Core and a React/Redux based client interface.

The Benefits

Since .NET Core is platform independent it runs on Linux, MacOS and Windows. We no longer need a Window machines to build the project from our CI server. Microsoft provides Docker images for building and running .NET Core projects.

ASP.NET Core applications are no longer bound to Microsoft’s IIS or IIS Express. You can also host them on Apache or Nginx servers as well.

With .NET Core you also have a vast choice of IDEs. Of course, you can use Visual Studio on Windows. But you also have the option to use JetBrains’ Rider (on any platform), Visual Studio for Mac or Visual Studio Code (Mac, Linux, Windows). If you don’t want to use an IDE for everything .NET Core also has a nice command-line interface. For example, the following command sets up a new ASP.NET Core project with React and Redux:

$ dotnet new reactedux

To compile an run the project:

$ dotnet run

The Entity Framework Core also has a feature I missed in the Entity Framework for the classic .NET Framework: a pure in-memory database provider, which is very useful for testing.

The Downsides

When you browse the NuGet packages list you have to be aware that not every package is compatible with .NET Core yet, but the list is growing. And, as mentioned above, you can’t develop desktop GUI applications with .NET Core.

	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…
	mariuselvert on Creating functors with lambda…
	Nested queries like… on Common SQL Performance Gotchas…
	Nested queries like… on Make your users happy by not c…