Reading a conanfile.txt from a conanfile.py

I am currently working on a project that embeds another library into its own source tree via git submodules. This is convenient because the library’s development is closely tied to the host project, and having both in the same CMake project cuts down dramatically on iteration times. However, that library already has its own conan dependencies in a conanfile.txt. Because I did not want to duplicate the dependency information from the library, I decided to pull it into my host project’s requirements programmatically using a conanfile.py.

Luckily, you can use conan’s own tools for that:

import os
from pathlib import Path

from conans.client.loader import ConanFileTextLoader

def load_library_conan(recipe_folder):
    # Read the embedded library's conanfile.txt and parse it with conan's own loader
    text = Path(os.path.join(recipe_folder, "library_folder", "conanfile.txt")).read_text()
    return ConanFileTextLoader(text)

You can then use that in your stage methods, e.g.:

    def config_options(self):
        for line in load_library_conan(self.recipe_folder).options.splitlines():
            # each line looks like "library:option=value"
            (key, value) = line.split("=", 1)
            (library, option) = key.split(":", 1)
            setattr(self.options[library], option, value)

    def requirements(self):
        for x in load_library_conan(self.recipe_folder).requirements:
            self.requires(x)
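
For context, here is a minimal sketch of how these hooks could sit in the host recipe (the class name and metadata are invented):

from conans import ConanFile

class HostProjectConan(ConanFile):
    name = "host-project"
    version = "0.1"
    settings = "os", "compiler", "build_type", "arch"

    def config_options(self):
        # as shown above
        ...

    def requirements(self):
        # as shown above
        ...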

I realize this is a niche application, but it helped me very much. It would be cool if conan could delegate into subfolders natively, but I did not find a better way to do this.

Full-text Search with PostgreSQL

If you want to add simple text search functionality to an application backed by an SQL database, one of the first things that may come to mind is the SQL LIKE operator. The LIKE operator and its case-insensitive sibling ILIKE find substrings in text data via wildcards such as %, which matches any sequence of zero or more characters:

SELECT * FROM book WHERE title ILIKE '%dog%'

However, this approach satisfies only very basic requirements for text search, because it only matches exact substrings. That’s why application developers often use an external search engine like Elasticsearch based on the Apache Lucene library.

With a PostgreSQL database there is another option: it comes with a built-in full-text search. A full-text search analyzes text according to the language of the text, parses it into tokens and converts them into so-called lexemes. These are strings, just like tokens, but they have been normalized so that different forms of the same word, for example “pony” and “ponies”, are made alike. Additionally, stop words are eliminated, which are words that are so common that they are useless for searching, like “a” or “the”. For this purpose the search engine uses a dictionary of the target language.

In PostgreSQL, there are two main functions to perform full-text search: they are to_tsvector and to_tsquery. The ts part in the function names stands for “text search”. The to_tsvector function breaks up the input string and creates a vector of lexemes out of it, which are then used to perform full-text search using the to_tsquery function. The two functions can be combined with the @@ (match) operator, which applies a search query to a search vector:

SELECT title
  FROM book
  WHERE to_tsvector(title) @@ to_tsquery('(cat | dog) & pony')

The query syntax of to_tsquery supports boolean operators like | (or), & (and), ! (not) and grouping using parentheses, but also other operators like <-> (“followed by”) and * (prefix matching).
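
For illustration, the additional operators might be used like this (same hypothetical book table as above):

-- "pony" directly followed by "farm"
SELECT title FROM book WHERE to_tsvector(title) @@ to_tsquery('pony <-> farm');

-- prefix matching: matches all lexemes starting with "pon", e.g. 'poni'
SELECT title FROM book WHERE to_tsvector(title) @@ to_tsquery('pon:*');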

You can specify the target language as a parameter of to_tsvector:

# SELECT to_tsvector('english', 'Thousands of ponies were grazing on the prairie.');

'graze':5 'poni':3 'prairi':8 'thousand':1

Here’s another example in German:

# SELECT to_tsvector('german', 'Wer einen Fehler begeht, und ihn nicht korrigiert, begeht einen zweiten (Konfuzius)');

'begeht':4,9 'fehl':3 'konfuzius':12 'korrigiert':8 'wer':1 'zweit':11

PostgreSQL supports dictionaries for more than 80 languages out of the box.

The examples in this article are just a small glimpse of what is possible with regards to full-text search in PostgreSQL. If you want to learn more you should consult the documentation. The key takeaway is that there is another option between simple LIKE clauses and an external search engine.

Effective computer names with DNS aliases

If you have a computer in a network, it has a lot of different names and addresses. Most of them are chosen by the manufacturer, like the MAC address of the network device. Some are chosen by you, like the IP address in the local network. And some need to be chosen by you, like the computer’s name in your local DNS (domain name system).

A typical indicator of an under-managed network is the lack of sufficiently obvious computer names in it. You want to connect to the printer? 192.168.0.77 it is. You need to access the network drive? It is reachable under nas-producer-123.local. You can be sure that either of these names will change as soon as anything in the network gets modified.

Not every computer in a network needs a never-changing, obvious name. If you connect a notebook for some hours, it can be addressable only by 192.168.0.151 and nobody cares. But there will be computers and similar network devices like printers that stay longer and provide services to others. These are the machines that require a proper name, and probably not only one.

Our approach is a layered one, with four layers:

  • MAC-address, chosen by the manufacturer
  • IP address, chosen by our DHCP
  • Device name, chosen by our DNS
  • Device aliases, chosen by our DNS

Of course, our DHCP and our DNS are told by our administrator what addresses and names to give out. Our IP addresses are partitioned into sections, but that is not relevant to the users.

The device name is a mapping of a name to an IP address. It is chosen by the administrator in the case of a server/service machine. It will tell you about the primary service, like “printer0”, “printer1” or “nas0”. It is not a creative name and should not be remembered or used directly. If the machine has a direct user, like a workstation or a notebook, the user gets to choose the name. The only guideline is to keep it short; the rest is personal preference. This name should only be remembered by the user.

On top of the device name, each machine gets one or several additional DNS names, in the form of DNS aliases (CNAME records). These are the names we work with directly and should be remembered. Let’s see some examples:

I want to print on the laser printer: “laserprinter.local” is the correct address. It is an alias to printer0.local which is a mapping to 192.168.0.77 which resolves to a specific MAC address. If the laser printer gets replaced, every entry in this chain will probably change, except for one: the alias will point to the new printer and I don’t have to care much about it (maybe I need to update my driver).

I want to access the network drive: “nas.local” is one possibility. “networkdrive.local” is another one. Both point to “nas0” today and maybe “nas1” tomorrow. I don’t need to care which computer provides the service, because the service alias always points to the correct machine.

I want to connect to my colleague’s workstation: Because we have different naming preferences, I cannot remember that computer’s name. But I also don’t have to, because the computer has an alias: If my colleague’s name is “Joe”, the computer’s alias is “joe.local”, which resolves to his “totallywhackname.local”, which points to the IP address, etc. There is probably no more obvious DNS name than “joe.local”.

Another thing that we do is give a service its purpose as a name. This blog is run by wordpress, so we would have “wordpress.local”, but also “blog.local” which is the correct address to use if you want to access the blog. Should we eventually migrate our blog to another service, the “blog.local” address would point to it, while the “wordpress.local” address would still point to the old blog. The purpose doesn’t change, while the product that provides it might some day.
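
As a sketch in BIND zone file notation (all names and addresses invented), the layers from these examples could look like this:

; device names: name -> IP address
printer0          IN A      192.168.0.77
nas0              IN A      192.168.0.78
totallywhackname  IN A      192.168.0.151

; aliases: service/purpose/person -> device name
laserprinter      IN CNAME  printer0
nas               IN CNAME  nas0
networkdrive      IN CNAME  nas0
joe               IN CNAME  totallywhackname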

Of course, maintaining such a rich ecosystem of names and aliases is a lot of work. We don’t type our zone files directly; we use generators that supply us with the required level of comfort and clarity. This is done by one of our internal tools (if you remember the Sunzu blog post, you now know 2 out of our 53 tools). In short, we maintain a table in our wiki, listing all IP addresses and their DNS aliases and linking to the computer’s detail wiki page. From there, the tool scrapes the computer’s name and MAC address and generates configuration files for both the DHCP and DNS services. We can define our whole network in the wiki and have the tool generate the actual settings for us.

That way, the extra effort for the DNS aliases is negligible, while the positive effects are noticeable. Most network modifications can be done without much reconfiguration of dependent services or machines. And it all starts with alias names for your computers.

Applying the KonMari method to your IT supplies room

Our company is rather small, with fewer than ten people working in one big room on two floors (yes, the room is divided vertically, not horizontally). There are a few additional rooms, like a bathroom or a kitchen, but everything else has to find a place in our working space.

There are two exceptions to this rule:

  • A small room holds all cleaning utilities
  • A bigger room holds all things IT, like our servers and our IT supplies

None of these rooms “spark joy”, as Marie Kondo would describe them. You open the door, search around while ignoring the mess, grab the thing you came for and close the door again. When it is time to put the thing back, you more or less place it where you’ve found it. The state of these rooms is slow deterioration, because it can only get worse, but not better.

The situation became unfortunate for the IT room, because it contained far more things than storage space. Cables piled up on shelves, hard disks lingered on tables at specific locations that probably indicated something. A huge collection of CDs and DVDs waited in boxes for a second installation – most of our computers don’t even have a drive for them anymore. Every drawer had some kind of main theme (manuals, adapters, cables), but contained a lot of surprises, too. The time it took to find something only went up, and most of the time it was cheaper to just buy the device (again) than to search for it. And if you don’t use it anymore? Put it in the IT room.

A few years back, the KonMari method of cleaning up and organizing things was promoted by Marie Kondo. It is intended for your wardrobe and kitchen, but the guidelines can also be applied to your toolshed – and your IT room:

  • Not keeping a thing is the default
  • Concentrate on only keeping useful things (things that you use regularly or that make you happy)
  • If you keep a thing, it needs a dedicated place
  • Dedicate places by “category” and don’t deviate from your categorization
  • Provide a container for each category
  • Try to stack things upright, side by side, not in vertical piles

The last guideline was really eye-opening for me: Every time I dedicated a box for things, like software CDs, the stacks grew upwards. This means that “lower layers” aren’t in direct access anymore and tend to be forgotten. If you dig to the ground of the box, you find copies of obscure software like “Windows 2000” or “Nero burning rom” that you’ve not thought about in ten years or even longer.

At the bottom of our cables box, we found a dozen cables for the parallel port, an interface that was forgotten the minute USB came around in 1996. The company was founded in 2000 and we never owned a device that used this port. We also found disks for the zip 100 drive, which might have used it – we don’t remember.

These things spark nostalgia (something other than joy), but serve no practical purpose anymore. And even if somebody came around with a zip disk, we wouldn’t remember that we have the cables at the bottom of our box.

If you try to stack your things upright, everything is visible and in fast access. There is no bottom layer anymore. Applied to CDs, this means that every CD case’s spine is readable. Every CD that you want to keep needs to be in a labeled case. The infamous mainboard driver CD in a paper box, with drivers from 2002 for a mainboard you scrapped in 2009, has no place in this collection.

The fitting categorization of things is the most important part of the process, in my opinion. Let me explain it by a paradigm shift that made all the difference for me:

In the early days, our categories were like manual, CD, cable, screw, etc. Every time a new computer was bought, the accompanying utilities box (often the mainboard carton) got looted for these categories – manuals to the manuals, CDs to the CDs. It was easy to find the place where the CDs were stored, but hard to find the right CD.

Now, we provide a small carton for each computer and put everything related to it in this carton. It is labeled with the computer’s number and stored like a book on the shelf. If you search anything for this computer – a CD, a screw, whatever – it is in this carton. If we get rid of the computer, the carton follows suit.

We now categorize by device and not by item type. This means that the collection of 10,000 screws that were collected over the years can be discarded. They simply aren’t needed anymore. They never sparked joy.

Another topic is the cables. While most cables can be associated with a computer or a specific device, there are lots of cables that are “unbound”. Instead of lumping them all together (and forming the aforementioned layers of parallel, serial and USB1 cables), we sort them by main connector and dedicate a box to each connection type. If you search for a DisplayPort cable, you grab the DisplayPort box. If you require a VGA cable – well, we threw that specific box out last year. Look in the “exotic” box.

Each box is visible and clearly labeled. Inside each box are only things that you would expect. This means that there is a lot of boxed air. But it also means that you have to think about what to store and what not – simply because the number of boxes is limited.

And this is where “sparking joy” comes into play. The IT room is not an archive for all things digital. It is also not a graveyard for discarded electronics. If you can’t see yourself using the part in the future and having joy using it, don’t keep it.

We have a box labeled “random loot” that defies this filter. It contains things that we can’t categorize, don’t have an immediate use case for, but hesitate to throw away. Every household has a similar thing with “that drawer”. Our plan is to add a year label to the box and just throw it away unopened if it is older than X years.

We need to evolve the categories of the room to keep it useful. An example are USB cables that are all stored in one cable box. With USB-C on the rise, the need to separate into different USB “layers” became apparent. We will soon have at least two USB cable boxes. And perhaps, one day in the future, we might throw the non-USB-C box away.

The IT room was transformed from a frustrating mess into a living and evolving storage space that solves your concern in an efficient way. The typical use cases of the room are addressed right away, with a structure that is maintainable without too much effort.

The inspiration and guidelines of Marie Kondo and the thoughts about proper categorization helped us to have an IT room that actually sparks joy.

Forced Acronyms are not that S.M.A.R.T.

A while back, I noticed that quite a lot of people are following the trend of unifying a bunch of talking points into a more or less memorizable acronym. Sometimes, this is a great mnemonic device that makes the essence of a thing clear in seconds – but for some reason, few stories get acknowledged in which such attempts actually fail.

However, one of the most prominent acronyms in project management is the idea of S.M.A.R.T. goals. That easily dissolves into S for Specific, M for Measurable, and… hm… T is… something about Time, and then there are A and R, and they very clearly… well, well. Let’s consult Wikipedia… span up a multidimensional vector space out of {Achievable, Attainable, Assignable, Agreed, Action-oriented, Ambitious, Aligned with corporate goals, Realistic, Resourced, Reasonable, Results-based}.

Now this is the point where it’s hard to follow. These are somehow too many possibilities, with no clear assignment. There are probably lots of people out there with their very specific memorization and their very specific interpretation of these letters; and it might very well be true that this forced acronym holds some value. In their specific case.

But why shouldn’t we be honest about it? If you have such a situation, you are not communicating clearly anymore. You have gone beyond that point. There is not a clear, concise meaning anymore.

These are the points where it would be honest to leave your brilliant acronym behind. If you ever sit in a seminar where someone wants to teach you some “easily memorizable acronym” with lots of degrees of freedom, open to interpretation and obviously changing over time, just – complain. Of course, everyone is entitled to use their own memory hook (“Eselsbrücke”) in order to remember whatever his or her goal is. That is not my point.

My issue is with “official” acronyms that are not clear and constant. We as software developers have a responsibility to treat such inconsistencies as very dangerous and more harmful than helpful. With this post, I want to bring the idea out there that one should complain about a bad acronym more often, rather than just think “weeeeell, but I really like how it sounds and I don’t care that it’s somewhat tainted.”

Or am I completely bullheaded in that regard? What is your opinion?

PS: If you are German and remember the beginning of 2021, a similar laziness happened there when our government tried to make their Covid rules clear and well-known. Note that this remark has nothing to do with politics. Anyway: they invented the acronym “AHA” (which, in German, is also the sound of having a light bulb appear over your head). Not that bad of an idea. However, one of the “A”s originally meant “you just need a non-medical mask (Alltagsmaske) everywhere” – until some day, it was changed to “you need a medical face mask in everyday life (im Alltag)”. They just thought it clever to keep the acronym, but change one letter to mean its near opposite.

This is dangerous. Grossly negligent. Just for the sake of liking your old acronym too much, you needlessly fail to communicate clearly. Which is, for a government as much as for a software developer, usually your job.

Naming things 😉

Metal in C++ with SDL2

Metal, Cupertino’s own graphics API, is sort of a middle ground in complexity between OpenGL and Vulkan. I’ve wanted to try it for a while, but the somewhat tight integration into Apple’s ecosystem (Objective-C/Swift and Xcode) has so far prevented that. My graphics projects usually use C++ and CMake, so I wanted a solution that works with that. Apple released metal-cpp last year, and newer SDL2 versions (since 2.0.14) can create a window that supports drawing to it with Metal. Here’s how to weld that together (with minimal Objective-C).

metal-cpp

I get the metal-cpp code from the linked website (the download is at step 1). I add a library in CMake that builds a single source file which compiles the metal-cpp implementation with the NS_/CA_/MTL_PRIVATE_IMPLEMENTATION macros defined, as described on the page (see step 3). That target also exports the includes to be used later.
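
That single source file is essentially just the three macro definitions followed by the metal-cpp headers; a sketch (the file name is my choice):

// metal_cpp_impl.cpp - compiles the metal-cpp implementation in one translation unit
#define NS_PRIVATE_IMPLEMENTATION
#define CA_PRIVATE_IMPLEMENTATION
#define MTL_PRIVATE_IMPLEMENTATION

#include <Foundation/Foundation.hpp>
#include <QuartzCore/QuartzCore.hpp>
#include <Metal/Metal.hpp>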

SDL window and view

Next, I use conan to install SDL2. After SDL_Init, I call SDL_CreateWindow to create my window. I do not specify SDL_WINDOW_OPENGL (or SDL_WINDOW_VULKAN) in SDL_CreateWindow‘s flags, or the next step will fail. After that, I use SDL_Metal_CreateView from SDL_metal.h to create a metal view. This is where things get a little bit icky. I create a metal device using MTL::CreateSystemDefaultDevice(), but I still need to assign it to the view I just created. I’m doing that in Objective-C++. In a new .mm file, I add a small function to do that:

// Hand the metal-cpp device to the CAMetalLayer behind the SDL metal view
void assign_device(void* layer, MTL::Device* device)
{
  CAMetalLayer* metalLayer = (CAMetalLayer*) layer;
  metalLayer.device = (__bridge id<MTLDevice>)(device);
}

I use a small .h file to expose this function to my C++ code like any other free function. There’s another helper I create in the .mm file:

// Fetch the next drawable from the layer and bridge it to the metal-cpp type
CA::MetalDrawable* next_drawable(void* layer)
{
  CAMetalLayer* metalLayer = (CAMetalLayer*) layer;
  id<CAMetalDrawable> metalDrawable = [metalLayer nextDrawable];
  CA::MetalDrawable* pMetalCppDrawable = (__bridge CA::MetalDrawable*) metalDrawable;
  return pMetalCppDrawable;
}
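
For completeness, the small header exposing both helpers to C++ might look like this (the file name and forward declarations are my choice):

// metal_bridge.h - declarations for the Objective-C++ helpers above
#pragma once

namespace MTL { class Device; }
namespace CA { class MetalDrawable; }

void assign_device(void* layer, MTL::Device* device);
CA::MetalDrawable* next_drawable(void* layer);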

At the beginning of each frame, I use that together with SDL_Metal_GetLayer to get a texture to render to:

auto surface = next_drawable(SDL_Metal_GetLayer(view));

Next I create a render pass descriptor that starts by clearing that drawable with our fancy red:

MTL::ClearColor clear_color(152.0/255.0, 23.0/255.0, 42.0/255.0, 1.0);
auto pass_descriptor = MTL::RenderPassDescriptor::alloc()->init();
auto attachment = pass_descriptor->colorAttachments()->object(0);
attachment->setClearColor(clear_color);
attachment->setLoadAction(MTL::LoadActionClear);
attachment->setTexture(surface->texture());

And fire that off to the GPU using a command buffer and render encoder:

// `queue` is a MTL::CommandQueue, created once at startup, e.g. via
// auto queue = device->newCommandQueue();
auto buffer = queue->commandBuffer();
auto encoder = buffer->renderCommandEncoder(pass_descriptor);
encoder->endEncoding();
buffer->presentDrawable(surface);
buffer->commit();

There you have it, a minimal running Metal application. Still a long way from the traditional “Hello Triangle”, but most Metal examples that show how to do that can easily be translated to the C++ API. Note that you probably have to take some extra steps to compile Metal shaders (written in MSL). You can either load them from source or precompile them using the command line tools.
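
The precompile route via Apple’s command line tools might look roughly like this (file names assumed):

xcrun -sdk macosx metal -c shaders.metal -o shaders.air
xcrun -sdk macosx metallib shaders.air -o shaders.metallib

The resulting .metallib can then be loaded through the device at runtime.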

Commenting SQL database objects

Did you know that you can annotate database objects like tables, views and columns with comments in many SQL database systems? By that I don’t mean comments in SQL scripts, indicated by double dashes (--), but comments attached to the objects themselves, stored in the database. These can be helpful to the database admin by providing a description of what is stored in these objects.

For PostgreSQL and Oracle databases the syntax is as follows:

COMMENT ON TABLE [schema_name.]table_name IS '...';
COMMENT ON COLUMN [schema_name.]table_name.column_name IS '...';

For example:

COMMENT ON COLUMN books.author IS 'The main author''s last name';
COMMENT ON TABLE books IS 'Contains only the best books';

These comments can be viewed in database tools like SQL Developer:

Comments on columns
Comments on tables

You can also view the comments in psql:

db=# \d+ books
 Column |  Type   |         Description
--------+---------+------------------------------
 id     | integer |
 author | text    | The main author's last name
 title  | text    |

And for a table:

db=# \dt+ books
                    List of relations
 Schema | Name  | Type  |     |         Description
--------+-------+-------+ ... +------------------------------
 public | books | table |     | Contains only the best books
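
Besides psql’s meta commands, PostgreSQL can also return the comments programmatically via the built-in functions obj_description and col_description:

SELECT obj_description('books'::regclass);     -- table comment
SELECT col_description('books'::regclass, 2);  -- comment on the 2nd column (author)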

In Oracle you can query the comments from the data dictionary views ALL_TAB_COMMENTS and ALL_COL_COMMENTS:

> SELECT * FROM all_col_comments WHERE table_name='BOOKS';

OWNER    TABLE_NAME  COLUMN_NAME  COMMENTS
--------------------------------------------------------------
LIBRARY  BOOKS       ID           (null)
LIBRARY  BOOKS       AUTHOR       The main author's last name
LIBRARY  BOOKS       TITLE        (null)

> SELECT * FROM all_tab_comments WHERE table_name='BOOKS';

OWNER    TABLE_NAME  TABLE_TYPE  COMMENTS
--------------------------------------------------------------
LIBRARY  BOOKS       TABLE       Contains only the best books

In Oracle, comments are limited to tables, views, materialized views, columns, operators and indextypes, but in PostgreSQL you can attach comments to nearly everything. Another good use case for this is documentation comments on database functions:

COMMENT ON FUNCTION my_function IS $$
This function does something important.

Parameters:
...
Example usage:
...
$$;

Note: the $$ delimits multi-line strings (so-called dollar-quoted string constants).

Serving static resources in Javalin running as servlets

Javalin is a nice JVM-based microframework targeted at web APIs, supporting Java and Kotlin as implementation languages. Usually, it uses Jetty and runs standalone on the server or in a container.

However, those who want or need to deploy it to a servlet container/application server like Tomcat or WildFly can do so by changing only a few lines of code and annotating at least one servlet class with @WebServlet. Most of your application will continue to run unchanged.

But why do I say only “most of your application”?

Unfortunately, Javalin-jetty and Javalin-standalone do not provide complete feature parity. One important example is serving static resources, especially if you do not want to provide only an API backend service but also serve resources like a single-page application (SPA) or an OpenAPI-generated web interface.

Serving static resources in Javalin-jetty

Serving static files is straightforward and super simple if you are using Javalin-jetty. Just configure the Javalin app using config.addStaticFiles() to specify some paths and file locations, and you are done.
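
A minimal sketch of that setup (Javalin 4 style; the path and port are assumptions):

import io.javalin.Javalin
import io.javalin.http.staticfiles.Location

fun main() {
    Javalin.create { config ->
        // serve everything under src/main/resources/public from the classpath
        config.addStaticFiles("/public", Location.CLASSPATH)
    }.start(7070)
}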

The OpenAPI plugin for Javalin uses the above mechanism to serve its web interface, too.

Serving static resources in Javalin-standalone

Javalin-standalone, which is used for deployment to application servers, does not support serving static files, as this is a Jetty feature and standalone is built to run without Jetty. So the short answer is: you cannot!

The longer answer is that you can implement a workaround by writing a servlet based on Javalin-standalone that serves files from the classpath for certain URL paths yourself. See below a sample implementation in Kotlin using Javalin-standalone to accomplish the task:

package com.schneide.demo

import io.javalin.Javalin
import io.javalin.http.Context
import io.javalin.http.HttpCode
import java.net.URLConnection
import javax.servlet.annotation.WebServlet
import javax.servlet.http.HttpServlet
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse

private const val DEFAULT_CONTENT_TYPE = "text/plain"

@WebServlet(urlPatterns = ["/*"], name = "Static resources endpoints")
class StaticResourcesEndpoints : HttpServlet() {
    private val wellknownTextContentTypes = mapOf(
        "js" to "text/javascript",
        "css" to "text/css"
    )

    private val servlet = Javalin.createStandalone()
        .get("/") { context ->
            serveResource(context, "/public", "index.html")
        }
        .get("/*") { context ->
            serveResource(context, "/public")
        }
        .javalinServlet()!!

    private fun serveResource(context: Context, prefix: String, fileName: String = "") {
        val filePath = context.path().replace(context.contextPath(), prefix) + fileName
        val resource = javaClass.getResourceAsStream(filePath)
        if (resource == null) {
            context.status(HttpCode.NOT_FOUND).result(filePath)
            return
        }
        var mimeType = URLConnection.guessContentTypeFromName(filePath)
        if (mimeType == null) {
            mimeType = guessContentTypeForWellKnownTextFiles(filePath)
        }
        context.contentType(mimeType)
        context.result(resource)
    }

    private fun guessContentTypeForWellKnownTextFiles(filePath: String): String {
        if (filePath.indexOf(".") == -1) {
            return DEFAULT_CONTENT_TYPE
        }
        val extension = filePath.substring(filePath.lastIndexOf('.') + 1)
        return wellknownTextContentTypes.getOrDefault(extension, DEFAULT_CONTENT_TYPE)
    }

    override fun service(req: HttpServletRequest?, resp: HttpServletResponse?) {
        servlet.service(req, resp)
    }
}

The code performs 3 major tasks:

  1. Register a Javalin-standalone app as a WebServlet for certain URLs
  2. Load static files bundled in the WAR-file from defined locations
  3. Guess the content type of the files as well as possible for the response

Feel free to use and modify the code in your project if you find it useful. I will try to get this workaround into Javalin-standalone if I find the time, to improve feature parity between Javalin-jetty and Javalin-standalone. Until then, I hope you find the code useful.

Basic business service: Sunzu, the list generator

This might be the start of a new blog post series about building blocks for an effective business IT landscape.

We are a small company that strives for a high level of automation and traceability, the latter often implemented in the form of documentation. This has the amusing effect that we often automate the creation of documentation or at least the creation of reports. For a company of less than ten people working mostly in software development, we have lots of little services and software tools that perform tasks for us. In fact, we work with 53 different internal projects (this is what the blog post series could cover).

Helpful spirits

Some of them are rather voluminous or at least too big to replace easily. Others are just a few lines of script code that perform one particular task and could be completely rewritten in less than an hour.

They all share one goal: To make common or tedious tasks that we have to do regularly easier, faster, less error-prone or just more enjoyable. And we discover new possibilities for additional services everywhere, once we’ve learnt how to reflect on our work in this regard.

Let me take you through the motions of discovering and developing such a “basic business service” with a recent example.

A fateful friday

The work that led to the discovery started abruptly on Friday, 10th December 2021, when a zero-day vulnerability with the number CVE-2021-44228 was publicly disclosed. It had a severity rating of 10 (on a scale from 0 to, well, 10) and was promptly nicknamed “Log4Shell”. From one minute to the next, we had to scan all of our customer projects, our internal projects and the products that we use, evaluate the risk and decide on actions that could mean disabling a system in live usage until the problem was properly understood and fixed.

Because we don’t only perform work but also document it (remember the traceability!), we created a spreadsheet with all of our projects and a criteria matrix to decide which projects needed our attention the most and what actions to take. An example of this process would look like this:

  • Project A: Is the project at least in parts programmed in java? No -> No attention required
  • Project B: Is the project at least in parts programmed in java? Yes -> Is log4j used in this project? Yes -> Is the log4j version affected by the vulnerability? No -> No immediate attention required

Our information situation changed from hour to hour as the whole world did two things in parallel: the white hats gathered information about possible breaches and unaffected versions while the black hats tried to find and exploit vulnerable systems. This happened so fast that we found ourselves lagging behind because we couldn’t effectively triage all of our projects.

One bottleneck was the creation of the spreadsheet. Even just the process of compiling a list of all projects and ruling out the ones that are obviously not affected by the problem was time-consuming and not easily distributable.

Post mortem

After the dust settled, we had switched off one project (which turned out to be not vulnerable on closer inspection) and confirmed that all other projects (and products) weren’t affected. We fended off one of the scariest vulnerabilities in recent times with barely a scratch. We could celebrate our success!

But as happy as we were, the post mortem of our approach revealed a weak point in our ability to quickly create spreadsheets about typical business/domain entities for our company, like project repositories. If we could automate this job, we would have had a complete list of all projects in a few seconds and could have worked from there.

This was the birth hour of our list generator tool (we called it “sunzu” because – well, that would require the explanation of a German wordplay). It is a simple tool: You press a button, the tool generates a new page with a giant table in the wiki and forwards you to it. Now you can work with that table, remove columns you don’t need, add additional ones that are helpful for your mission and fill out the cells that are empty. But the first step, a complete list of all entities with hyperlinks to their details, is a no-effort task from now on.

No-effort chores

If Log4Shell happened today, we would still have to scan all projects and decide for each one. We would still have to document our evaluation results and our decisions. But we would start with a list of all projects, a column that lists their programming languages and other data. We would be certain that the list is complete. We would be certain that the information is up to date and accurate. We would start with the actual work and not with the preparation for it. The precious minutes at the beginning of a time-critical task would be available and not bound to infrastructure setup.

Since its first spreadsheet of all projects, the list generator tool has accumulated additional entity types that can be listed in our company. For some, it was easy to collect the data. Others require more effort. There are some that don’t justify the investment (yet). But it had another effect: It is a central place for “list desires”. Any time we create a list manually now, we pose the important question: Can this list be generated automatically?

Basic business building blocks

In conclusion, our “sunzu” list generator is a basic business service that might be valuable for every organization. Its only purpose is to create elaborate spreadsheets about the most important business entities and present them in an editable manner. Whether the spreadsheet is created as an Excel file, an editable website like tabble or, as in our case, a wiki page is secondary.

The crucial effect is that you can think “hmm, I need a list of these things that are important to me right now” and just press a button to get it.

Sunzu is a web service written in Python, with a total of less than 400 lines of code. It could probably be rewritten from scratch in one focused workday. If you work in an organization that relies on lists or spreadsheets (and which organization doesn’t?), think about which data sources you tap into to collect the lists. If a human can do it, you can probably teach it to a computer.

What are entities/things in your domain or organization that you would like to have a complete list/spreadsheet generated automatically? Tell us in the comments!

Always apply the Principle Of Least Astonishment to yourself, too

Great principles have the property that while they can be stated in a concise form, they have far-reaching consequences one can fully appreciate after many years of encountering them.

One of these things is what is known as the Principle of Least Astonishment / Principle of Least Surprise (see here or here). As stated there, in the context of user interface design, its upshot is “Never surprise the user!”. Within that context, it is easy to understand for everyone who has ever used any piece of software and noticed that not once was he glad that the piece didn’t work as suggested. Or did you ever feel that way?

Surprise is a tool for willful suspension, for entertainment, a tool of unnecessary complication; exactly what you do not want in the things that are supposed to make your job easy.

Now we can all agree about that, and go home. Right? But of course, there’s a large difference between grasping a concept in its most superficial manifestation, and its evasive, underlying sense.

Consider any software project that cannot be simplified to a mere single-purpose-module with a clear progression, i.e. what would rather be a script. Consider any software that is not just a script. You might have a backend component with loads of requirements, you have some database, some caching functionality, then you want a new frontend in some fancy fresh web technology, and there’s going to be some conflict of interests in your developer team.

There will be some rather smart ways of accomplishing something and there will be rather not-so-smart ways. How do you know which will be which? So there, follow your principle: Never surprise anyone. Not only your end user. Do not surprise any other team member with something “clever”. In most situations,

  1. it’s probably not clever at all
  2. the team member being fooled by you is yourself

Collaboration is a good tool to let that conflict naturally arise. I mean the good kind of conflict, not the mistrust, denial of competency, “Ctrl+A and Delete everything you ever wrote!”-kind of conflict. Just the one where someone would tell you “hm. that behaviour is… astonishing.”

But you don’t have a team member in every small project you do. So just remember to admit the factor of surprise in everything you leave behind. Do not think “as of right now, I understand this thing, ergo this is not of any surprise to anyone, ever”. Think, “when I leave this code for two months and return, will there be anything… of surprise?”

This principle has many manifestations. As one of Jakob Nielsen’s usability heuristics, it’s called “Recognition rather than Recall”. In a more universal way of improving human performance and clarity, it’s called “Reduce Cognitive Load”. It has a wide range of applicability from user interfaces to state management, database structures, or general software architecture. I like the focus of “Surprise”, because it should be rather easy for you to admit feeling surprised, even by your own doing.