Try ending the workday with a beneficial ritual

One thing that is important to me is to start and end the workday with a proven and familiar routine – let’s call it a ritual. There are some advantages to this approach. First, you have a defined starting point. No matter what the day may throw at you, there are some anchors in your structure or environment that you can rely on. For example, I don’t start my work without a (big) filled glass of water on my desk. It might get hectic, but my supply of water is secured until lunch. I make it a habit to empty that glass before lunch, too, but that’s not as important as the ritual of supplying myself with a beverage and only then starting my work.

My guess is that most of you already do this, too. The start of a workday is the natural point in time to install habits or even rituals. But what about the end of your workday? Sure, there is a point in time when you “drop the pen” and rush out the door. But right before this moment, there is a possibility to introduce a beneficial ritual that might only cost minutes, but brings value that furthers your career and even your current work.

My usual ritual is a short daily reflection. That’s not exactly my own idea, I just borrowed it from the Clean Code Developer Initiative. My problem with the CCDI version is the focus on software development alone, which is probably a good start, but too narrow for my work profile.

My adaptation is to have three basic questions that I ask myself at the end of each workday and answer in “articulated thoughts”. You may prefer to say your answers out loud or write them down (Obsidian or a similar tool might be suitable for that). My questions to myself are:

  • How do you feel right now?
  • What surprised you today?
  • What do you want to remember from today’s work?

Note how these questions don’t deal with details of your current work. If you have specific topics that you want to reflect on, you can always add some more questions for a period. I have found it important not to skip or replace the three basic questions, though.

“How do you feel?” is a complicated question because it leads to your motivation for work. Of course, “tired” or “stressed” is always a valid answer. But what if you legitimately feel “proud” or “fulfilled”? Can you identify what aspect of today’s work made you proud? Can you think of a way to have more of that without neglecting other important duties?

“What surprised you today?” tries to carve out your latest learning experience. It is possible that your day was dull enough to have no surprises, but if there were, you’ve probably expanded your knowledge on a topic you didn’t expect. If the surprise was a negative one, maybe you can think about a way to make it less surprising, rarer or downright impossible in the future. In my case, this led to some unusual gadgets like the “bad idea commands” list that hangs right beside the administration console. The most infamous command on this list is “mdadm --create”, by the way (I meant “mdadm --assemble” and was very surprised by the result).

“What do you want to remember?” is an explicit appeal to write your answer down. You don’t need to tell an elaborate story. Just give your future self some cues, preferably from outside your brain (Obsidian’s marketing claim of “a second brain” is no coincidence). Make a small note or write your future self an e-mail (this is my typical way of offloading things to future me). But persist this information now or it will be gone.

After this daily reflection, I shut down my computer and put the (probably empty) glass of water into the dishwasher. Then I switch into leisure mode.

Of course, my three questions are inspired by other sources, too. One is the workshop hosting manual for code retreats, which has a great section about the “closing circle”, a group reflection on a probably awesome day.

If you have a similar ritual, let us know about it! Write a blog entry or drop a comment below.

How comments get you through a code review

Code comments are a big point of discussion in software development. How and where to use comments. Or should you comment at all? Is the code not enough documentation if it is just written well enough? Here I would like to share my own experience with comments.

In recent months I had some code reviews where colleagues looked over my merge requests and gave me feedback. And it happened again and again that they asked why I did this or why I decided to go this way.
Often the decisions had a specific reason, for example because it was a customer requirement, a special case that had to be covered or the technology stack had to be kept small.

That is all metadata that would be tedious and time-consuming for reviewers to gather. And at some point, it is no longer a reviewer, it is a software developer 20 years from now who has to maintain the code and cannot ask you questions any more. The same applies if you yourself adjust the code again some time later and cannot remember the thoughts you had months ago. This often happens faster than you think. To highlight how fast details disappear, here is a current example: This week I set up a new laptop because the old one had a hardware failure. I did all the steps only half a year ago. But without documentation, I would not have been able to reconstruct everything. And where the documentation was missing or incomplete, I had to invest effort to rediscover the required steps.

Example

Here is an example of such a comment. In the code I want to check whether the mixer volume has changed after the user has made changes in the setup dialog.

var setup = await repository.LoadSetup(token);

var volumeOld = setup.Mixers.Contents.Select(mixer=>mixer.Volume).ToList();

setup = Setup.App.RunAsDialog(setup, configuration);

var volumeNew = setup.Mixers.Contents.Select(mixer=>mixer.Volume).ToList();
// Compare the contents; == on two lists would only compare references.
if (volumeNew.SequenceEqual(volumeOld))
{
     break;
}
            
ResizeToMixerVolume(setup, volumeOld);

Why do I save the volume in an additional variable instead of just writing the setup into a new variable in the third line? That would be much easier and more elegant. I change this quickly – and the program is broken.

This little comment would have prevented that and everyone would have understood why this way was chosen at the moment.

// We need to copy the volumes, because the original setup is partially mutated by the Setup App.
var volumeOld = setup.Mixers.Contents.Select(mixer=>mixer.Volume).ToList();

If you annotate such prominent places, into which a lot of brain work has gone, you make the code more comprehensible to everyone, including yourself. This way, a reviewer can understand the code without questions and the code becomes more maintainable in the long run.



When laziness broke my code

I was just integrating a new task-graph system for a C# machine control system when my tests started to go red. Note that the tasks I refer to are not the same as the C# Task implementation, but the broader concept. Task-graphs are well known to be DAGs, because otherwise the tasks cannot be finished. The general algorithm to execute a task-graph like this is called topological sorting, and it goes like this:

  1. Find the number of dependencies (incoming edges) for each task
  2. Find the tasks that have zero dependencies and start them
  3. For any finished task, decrement the dependency count of each follow-up task by one and start those that reach zero.
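
The three steps above can be sketched in Java as follows. This is only an illustration under simplifying assumptions: the system in this story is written in C#, the class and method names here are made up, and tasks are reduced to plain strings that finish synchronously.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

public class TaskGraphSketch {

    /**
     * Returns one valid execution order for the given graph.
     * followUps maps each task to the tasks that depend on it.
     */
    public static List<String> executionOrder(Map<String, List<String>> followUps) {
        // Step 1: count the incoming edges of every task.
        Map<String, Integer> dependencyCount = new HashMap<>();
        for (String task : followUps.keySet()) {
            dependencyCount.putIfAbsent(task, 0);
        }
        for (List<String> targets : followUps.values()) {
            for (String target : targets) {
                dependencyCount.merge(target, 1, Integer::sum);
            }
        }

        // Step 2: tasks without dependencies can start right away.
        // The TreeSet only makes the start order deterministic for this demo.
        Deque<String> ready = new ArrayDeque<>();
        for (String task : new TreeSet<>(dependencyCount.keySet())) {
            if (dependencyCount.get(task) == 0) {
                ready.add(task);
            }
        }

        // Step 3: finishing a task releases its follow-up tasks.
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String finished = ready.remove();
            order.add(finished);
            for (String next : followUps.getOrDefault(finished, List.of())) {
                if (dependencyCount.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        // In a DAG every task gets executed; leftovers indicate a cycle.
        if (order.size() != dependencyCount.size()) {
            throw new IllegalStateException("Graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", List.of("B"));
        graph.put("B", List.of("C", "D"));
        System.out.println(executionOrder(graph)); // prints [A, B, C, D]
    }
}
```

The final size check is the reason why task-graphs need to be acyclic: tasks on a cycle never reach a dependency count of zero.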

The graph that failed looked like this: Task A was immediately followed by a task B that was followed by a few more tasks.

I quickly figured out that the reason that the tests were failing was that node B was executed twice. Looking at the call-stack for both executions, I could see that the first time B was executed was when A was completed. This is correct as per step 3 in the algorithm. However, the second time it was started was directly from the initial Run method that does the work from step 2: Starting the initial tasks that are not being started recursively. I was definitely not calling Run twice, so how did that happen?

public void Run()
{
    var ready = tasks
        .Where(x => x.DependencyCount == 0);

    StartGroup(ready);
}

Can you see it? It is important to note that many of the tasks in this graph are asynchronous. Their completion is triggered by an IObserver, a C# Task completing or some other event. When the event is processed, StartGroup is used to start all tasks that have no more dependencies. However, A was no such task, it was synchronous, so the StartGroup({B}) call happened while Run was still on the stack.

Now what happened was that when A (instantly!) completed, it set the DependencyCount of B to 0. Since ready in the code snippet is lazily evaluated from within StartGroup, the ‘contents’ actually change while StartGroup is running.

The fix was adding a .ToList() after the .Where, a unit test that checked that this specifically would not happen again, and a mental note that lazy evaluation can be deceiving.
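
The same trap exists outside of C#. The following Java sketch is an analogous reconstruction, not the original code: the Task class, the log list and all method names are assumptions made for illustration. A Java stream evaluates its filter only while the elements are being traversed, so a synchronous task that releases its follow-up during the traversal makes that follow-up pass the filter, too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class LazyStartDemo {

    static final class Task {
        final String name;
        int dependencyCount;
        final List<Task> followUps = new ArrayList<>();

        Task(String name, int dependencyCount) {
            this.name = name;
            this.dependencyCount = dependencyCount;
        }
    }

    // Starts a synchronous task: it completes at once and releases its follow-ups.
    static void start(Task task, List<String> log) {
        log.add(task.name);
        for (Task next : task.followUps) {
            next.dependencyCount--;
            if (next.dependencyCount == 0) {
                start(next, log);
            }
        }
    }

    // A -> B, where B waits for A.
    static List<Task> graph() {
        Task a = new Task("A", 0);
        Task b = new Task("B", 1);
        a.followUps.add(b);
        List<Task> tasks = new ArrayList<>();
        tasks.add(a);
        tasks.add(b);
        return tasks;
    }

    // The filter runs while tasks are already being started.
    public static List<String> runLazily() {
        List<String> log = new ArrayList<>();
        graph().stream()
                .filter(task -> task.dependencyCount == 0)
                .forEach(task -> start(task, log));
        return log;
    }

    // The ready tasks are collected into a snapshot before any of them starts.
    public static List<String> runWithSnapshot() {
        List<String> log = new ArrayList<>();
        List<Task> ready = graph().stream()
                .filter(task -> task.dependencyCount == 0)
                .collect(Collectors.toList());
        ready.forEach(task -> start(task, log));
        return log;
    }

    public static void main(String[] args) {
        System.out.println(runLazily());       // prints [A, B, B]
        System.out.println(runWithSnapshot()); // prints [A, B]
    }
}
```

Collecting into a list plays the role of the .ToList call: it freezes the set of ready tasks before the first one can change the dependency counts.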

Applying the Golden Circle to software development

When I was a young and impressionable software developer (1998), the slogan of the year was “the code is the documentation”. This essentially meant that comments, and (inline) code comments in particular, were a sign of bad code. The reasoning was (and still is) that if you can’t articulate your ideas clearly enough in source code, adding additional text won’t rescue your communication. “Communication” is the conveying of what you want to accomplish with your code towards two listeners: the computer and the next human that works with your code. The computer is often the easier part, because it just does as it is told without interpretation.

Some years later (2004), Jeff Atwood, the author of the influential codinghorror blog, still condemns most of the comments we could find in contemporary source code. But there is more nuance than just “comments are bad” and it is cleared up later (2006) that there is a difference in what should be expressed in source code and what should be expressed in comments. Yes, you should write comments, but the “right kind”.

According to Jeff Atwood, the source code contains the “how”. It tells the story in all its details for the next human and the computer. The comments, on the other hand, are not intended for the computer and shouldn’t contain details. They should contain the “why”, the high-level picture and the motivation behind the specific “how” that we find.

“The code tells you how, the comments tell you why” (2006) is a great way to describe the expected content of both “layers”.

But my feeling was (and still is) that something is missing in that description. My programs aren’t just code and comments, there are more things that I try to tell my story with. And just a few weeks ago, I had a sudden idea that I might be able to describe what that missing piece could be. It is just an idea and the puzzle probably is still missing pieces, but it feels “right” enough that I want to write this blog post to discuss the idea. But I need to talk about something else first.

In 2009, Simon Sinek presents the Golden Circle to the world. The TED talk is probably the most energetic piece of explanation in human history. The Golden Circle is “the world’s simplest idea”, defining three layers of “clarity” to actions:

  • What: The “lowest” level, meaning that every person knows “what they are doing”.
  • How: The “intermediate” level. Some people know “how they do it”. They explicitly choose their method of doing and can reason about it.
  • Why: The “highest” level. According to Simon Sinek, only a few people know “why they do it”. He talks about “purpose, cause and belief”.

If you don’t know about the Golden Circle yet, please watch the 20-minute TED talk while I wait here for you. If you want to think about it first – I’m patient. The Golden Circle has inspired and guided me ever since. Not that I’m very good at applying it in my business or personal life, but it stayed with me and gave me a coordinate system to categorize things.

And with that categorization practice, I feel as if the layers in Jeff Atwood’s blog post from 2006 are misnamed and one crucial layer is missing:

  • What: The code tells you what (not how!). It is the detailed step-by-step recipe to replicate a behaviour. It is so simple that even a computer can do it, without being aware of it.
  • How: This is the missing layer. I want to talk about it in a minute.
  • Why: That’s the part that is correctly placed and named: The why of a story requires the spoken word. Only comments are viable for this kind of information. The computer (as of today) has no understanding of what any of it means.

The missing layer is the “How”. In my idea, I envisioned that everything that is deliberately put in the code, but not readily understood by a computer, like structures, patterns, idioms or even names (I call them “activated comments”) is there for the “How”. We structure our code not because the computer requires it, the compiling stages of our programming languages even interfold and compact it until it is a binary blob. We don’t name our variables and types because the computer would learn something from them. The first thing a compiler does is to shorten our names to unreadable “symbols”. Most of our patterns get replaced by other, more minute patterns that the compiler puts into place. We put these things in our code because they help us understand “how the story goes”. They provide us with guidance on how to make sense of the mess. The structure tells you how to approach the program.

The computer only knows about “What”. There are maybe some indications about a simple “How”-awareness in some technologies, but most of the time, the computer is deaf to human communication.

Which brings me to my description of code and comment, as an updated version of Jeff Atwood’s motto:

“The code tells you what, the structure tells you how and the comments tell you why”

By “structure”, I mean everything that gets lost during translation for the computer, but is visible for the human reader. It entails high-level things like “architecture” or “code design” and lower-end decisions like names and formatting. If you have a better word for it, let me know!

I hope that this blog post inspired you to have a thought yourself. Don’t hesitate to tell us your thought in the comments below.

Help me with the Spiderman Operator

From time to time, I encounter a silly syntax in Java that I silently dubbed the “spiderman operator” because of all the syntactic pointing that’s going on. My problem is that it’s not very readable, I don’t know an alternative syntax for it and my programming style leads me to it more often than I am willing to ignore.

The spiderman operator looks like this:

x -> () -> x

In its raw form, it means that you have a function that takes x and returns a Supplier of x:

Function<X, Supplier<X>> rawForm = x -> () -> x;

That in itself is not very useful or mysterious, but if you take into account that the Supplier<X> is just one possible type you can return, because in Java, as long as the signature fits, the thing sits, it gets funnier.

A possible use case

Let’s define a type that is an interface with just one method:

public interface DomainValue {
    BigDecimal value();
}

In Java, the @FunctionalInterface annotation is not required to let the interface be, in fact, a functional interface. It only needs to have exactly one method without implementation. How can we provide methods with implementation in Java interfaces? Default methods are the way:

@FunctionalInterface
public interface DomainValue {
    BigDecimal value();

    default String denotation() {
        return getClass().getSimpleName();
    }
}

Let’s say that we want to load domain values from a key-value-store with the following access method:

Optional<Double> loadEntry(String key)

If there is no entry with the given key or the syntax is not suitable to be interpreted as a double, the method returns Optional.empty(). Otherwise, it returns the double value wrapped in an Optional shell. We can convert it to our domain value like this:

Optional<DomainValue> myValue = 
    loadEntry("current")
        .map(BigDecimal::new)
        .map(x -> () -> x);

And there it is, the spiderman operator. We convert from Double to BigDecimal and then to DomainValue by saying that we want to convert our BigDecimal to “something that can supply a BigDecimal”, which is exactly what our DomainValue can do.
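
To try this out without a real key-value store, here is a minimal self-contained sketch. The in-memory map, the class name SpidermanDemo and the method loadDomainValue are stand-ins invented for this example; only the DomainValue interface and the shape of loadEntry come from the text above.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Optional;

public class SpidermanDemo {

    @FunctionalInterface
    public interface DomainValue {
        BigDecimal value();
    }

    // Hypothetical stand-in for the key-value store.
    private static final Map<String, Double> STORE = Map.of("current", 2.5);

    public static Optional<Double> loadEntry(String key) {
        return Optional.ofNullable(STORE.get(key));
    }

    public static Optional<DomainValue> loadDomainValue(String key) {
        return loadEntry(key)
                .map(BigDecimal::new)
                // The inner lambda becomes a DomainValue because the return
                // type of this method asks for one.
                .map(x -> () -> x);
    }

    public static void main(String[] args) {
        System.out.println(loadDomainValue("current").map(DomainValue::value)); // prints Optional[2.5]
        System.out.println(loadDomainValue("missing").isPresent());             // prints false
    }
}
```

Note that nothing in the lambda itself mentions DomainValue. The compiler picks the functional interface solely from the expected result type.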

A bigger use case

Right now, the DomainValue type is nothing more than a mantle around a numerical value. But we can expand our domain to have more specific types:

public interface Voltage extends DomainValue {
}
public interface Power extends DomainValue {
    @Override
    default String denotation() {
        return "Electric power";
    }
}

Boring!

public interface Current extends DomainValue {
    default Power with(Voltage voltage) {
        return () -> value().multiply(voltage.value());
    }
}

Ok, this is maybe no longer boring. We can implement a lot of domain functionality just in interfaces and then instantiate ad-hoc types:

Voltage europeanVoltage = () -> BigDecimal.valueOf(220);
Current powerSupply = () -> BigDecimal.valueOf(2);
Power usage = powerSupply.with(europeanVoltage);
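
Put together as one runnable unit, the example could look like the following sketch. The wrapper class PowerDemo and the method usageInWatts are names made up for this illustration, and the denotation methods are left out for brevity.

```java
import java.math.BigDecimal;

public class PowerDemo {

    public interface DomainValue {
        BigDecimal value();
    }

    public interface Voltage extends DomainValue {
    }

    public interface Power extends DomainValue {
    }

    public interface Current extends DomainValue {
        default Power with(Voltage voltage) {
            // Inside the lambda, value() still refers to this Current,
            // because a lambda has no "this" of its own.
            return () -> value().multiply(voltage.value());
        }
    }

    public static BigDecimal usageInWatts() {
        Voltage europeanVoltage = () -> BigDecimal.valueOf(220);
        Current powerSupply = () -> BigDecimal.valueOf(2);
        Power usage = powerSupply.with(europeanVoltage);
        return usage.value();
    }

    public static void main(String[] args) {
        System.out.println(usageInWatts()); // prints 440
    }
}
```

The multiplication only happens when value() is called on the Power object, so the result is computed on demand each time.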

Or we load the values from our key-value-store:

Optional<Voltage> maybeVoltage = 
    loadEntry("voltage")
        .map(BigDecimal::new)
        .map(x -> () -> x);

Optional<Current> maybeCurrent = 
    loadEntry("current")
        .map(BigDecimal::new)
        .map(x -> () -> x);

You probably see it already: We have some duplicated code! The strange thing is, it won’t go away so easily.

The first call for help

But first I want to sanitize the code syntactically. The duplication is bad, but the spiderman operator is just unreadable.

If you have an idea how the syntax of the second map() call can be improved, please comment below! Just one request: Make sure your idea compiles beforehand.

Failing to eliminate the duplication

There is nothing easier than eliminating the duplication above: The code is syntactically identical and only the string parameter is different – well, and the return type. We will see how this affects us.

What we cannot do:

<DV extends DomainValue> Optional<DV> loadFor(String entry) {
    Optional<BigDecimal> maybeValue = load(entry);
    return maybeValue.map(x -> () -> x);
}

Suddenly, the spiderman operator does not compile with the error message:

The target type of this expression must be a functional interface

I can see the problem: Subtypes of DomainValue are not required to stay compatible to the functional interface requirement (just one method without implementation).

Interestingly, if we work with a wildcard for the generic, it compiles:

Optional<? extends DomainValue> loadFor(String entry) {
    Optional<BigDecimal> maybeValue = load(entry);
    return maybeValue.map(x -> () -> x);
}

The problem is that we still need to downcast to our specific subtype afterwards. But we can use this insight and move the downcast into the method:

<DV extends DomainValue> Optional<DV> loadFor(
    String entry,
    Class<DV> type
) {
    Optional<BigDecimal> maybeValue = load(entry);
    return maybeValue.map(x -> type.cast(x));
}

Which makes our code readable enough, but at the price of using reflection:

Optional<Voltage> european = loadFor("voltage", Voltage.class);
Optional<Current> powerSupply = loadFor("current", Current.class);

I’m not a fan of this solution, because downcasts are dangerous and reflection is dangerous, too. Mixing two dangerous things doesn’t neutralize the danger most of the time. This code will fail during runtime sooner or later, without any compiler warning us about it. If you don’t believe me, add a second method without implementation to the Current interface and see if the compiler warns you. Hint: This is what you will see at runtime:

java.lang.ClassCastException: Cannot cast java.math.BigDecimal to Current

But, to our surprise, it doesn’t even need a second method. The code above doesn’t work. Even if we reintroduce our spiderman operator (with an additional assignment to help the type inference), the cast won’t work:

<DV extends DomainValue> Optional<DV> loadFor(
    String entry,
    Class<DV> type
) {
    Optional<BigDecimal> maybeValue = load(entry);
    Optional<DomainValue> maybeDomainValue = maybeValue.map(x -> () -> x);
    return maybeDomainValue.map(x -> type.cast(x));
}

The ClassCastException just got a lot more mysterious:

java.lang.ClassCastException: Cannot cast Loader$$Lambda$8/0x00000008000028c0 to Current

My problem is that I am stuck. There is working code that uses the spiderman operator and produces code duplication, but there is no way around the duplication that I can think of. I can get objects for the supertype (DomainValue), but not for a specific subtype of it. If I want that, I have to accept duplication. Or am I missing something?

The second call for help

If you can think about a way to eliminate the duplication, please tell me (or us) in the comments. This problem doesn’t need to be solved for my peace of mind or the sanity of my code – the duplication is confined to a particular place.

Being used to roaming nearly without boundaries in the Java syntax (25 years of thinking in Java will do that to you), this particular limitation hit hard. If you can give me some ideas, I would be grateful.
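
One direction that might be worth trying is to keep the lambda at the call site, where the concrete target interface is known, and to pass it into the generic method as a function. This is only a sketch under assumptions: the in-memory store, the class name LoaderSketch and the parameter name wrap are invented here. It also does not get rid of the spiderman operator. It only removes the duplicated loading and converting.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

public class LoaderSketch {

    public interface DomainValue {
        BigDecimal value();
    }

    public interface Voltage extends DomainValue {
    }

    public interface Current extends DomainValue {
    }

    // Hypothetical stand-in for the key-value store.
    private static final Map<String, Double> STORE =
            Map.of("voltage", 220.0, "current", 2.0);

    public static Optional<Double> loadEntry(String key) {
        return Optional.ofNullable(STORE.get(key));
    }

    // The caller decides how a BigDecimal turns into the specific subtype.
    public static <DV extends DomainValue> Optional<DV> loadFor(
        String entry,
        Function<BigDecimal, DV> wrap
    ) {
        return loadEntry(entry)
                .map(BigDecimal::new)
                .map(wrap);
    }

    public static Optional<Voltage> loadVoltage() {
        // DV is inferred as Voltage from the return type, so the inner
        // lambda has a functional interface as its target.
        return loadFor("voltage", x -> () -> x);
    }

    public static Optional<Current> loadCurrent() {
        return loadFor("current", x -> () -> x);
    }

    public static void main(String[] args) {
        System.out.println(loadVoltage().map(Voltage::value)); // prints Optional[220]
        System.out.println(loadCurrent().map(Current::value)); // prints Optional[2]
    }
}
```

No cast and no reflection are involved. If a subtype stops being a functional interface, the corresponding call site fails at compile time instead of at runtime.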

Forced Acronyms are not that S.M.A.R.T.

A while back, I noticed that quite a lot of people follow the trend of condensing a bunch of talking points into a more or less memorable acronym. Sometimes, this is a great mnemonic device to make the essence of a thing clear in seconds – but for some reason, people rarely acknowledge the cases in which such attempts actually fail.

However, one of the most prominent acronyms in project management is the idea of S.M.A.R.T. goals. That easily dissolves into S for Specific, M for Measurable, and… hm… T is… something about Time, and then there are A and R, and they very clearly… well, well. Let’s consult Wikipedia… span a multidimensional vector space out of {Achievable, Attainable, Assignable, Agreed, Action-oriented, Ambitious, Aligned with corporate goals, Realistic, Resourced, Reasonable, Results-based}.

Now this is the point where it’s hard to follow. These are somehow too many possibilities, with no clear assignment. There are probably lots of people out there with their very specific memorization and their very specific interpretation of these letters; and it might very well be true that this forced acronym holds some value. In their specific case.

But why shouldn’t we be honest about it? If you have such a situation, you are not communicating clearly anymore. You have gone beyond that point. There is not a clear, concise meaning anymore.

These are the points where it would be honest to leave your brilliant acronym behind. If you ever sit in a seminar where someone wants to teach you some “easily memorizable acronym” with lots of degrees of freedom, open to interpretation and obviously changing over time, just – complain. Of course, everyone is entitled to using their own memory hook (“Eselsbrücke”) in order to remember whatever his or her goal is. That is not my point.

My issue is with “official” acronyms that are not clear and constant. We as software developers have a responsibility to treat such inconsistencies as very dangerous and more harmful than helpful. With this post, I want to bring the idea out there that one should complain about a bad acronym more often rather than just think “weeeeell, but I really like how it sounds and I don’t care that it’s somewhat tainted.”

Or am I completely bullheaded in that regard? What is your opinion?

PS: If you are German and remember the beginning of 2021, a similar laziness happened there when our government tried to make their Covid rules clear and well-known. Note that this remark has nothing to do with politics. Anyway: they invented this acronym of “AHA” (which, in German, is also that sound of having a light bulb appear over your head). Not that bad of an idea. However, one of those “A”s originally meant “you just need a non-medical mask (Alltagsmaske) everywhere” – until some day, it was changed to “you need a medical face mask in everyday life (im Alltag)”. They just thought it clever to keep the acronym, but change one letter to mean its near opposite.

This is dangerous. Grossly negligent. Just for the sake of liking your old acronym too much, you needlessly fail to communicate clearly. Which is, for a government as much as for a software developer, usually your job.

Naming things 😉

Basic business service: Sunzu, the list generator

This might be the start of a new blog post series about building blocks for an effective business IT landscape.

We are a small company that strives for a high level of automation and traceability, the latter often implemented in the form of documentation. This has the amusing effect that we often automate the creation of documentation or at least the creation of reports. For a company of less than ten people working mostly in software development, we have lots of little services and software tools that perform tasks for us. In fact, we work with 53 different internal projects (this is what the blog post series could cover).

Helpful spirits

Some of them are rather voluminous or at least too big to replace easily. Others are just a few lines of script code that perform one particular task and could be completely rewritten in less than an hour.

They all share one goal: To make common or tedious tasks that we have to do regularly easier, faster, less error-prone or just more enjoyable. And we discover new possibilities for additional services everywhere, once we’ve learnt how to reflect on our work in this regard.

Let me take you through the motions of discovering and developing such a “basic business service” with a recent example.

A fateful Friday

The work that led to the discovery started abruptly on Friday, 10th December 2021, when a zero-day vulnerability with the number CVE-2021-44228 was publicly disclosed. It had a severity rating of 10 (on a scale from 0 to, well, 10) and was promptly nicknamed “Log4Shell”. From one minute to the next, we had to scan all of our customer projects, our internal projects and products that we use, evaluate the risk and decide on actions that could mean disabling a system in live usage until the problem is properly understood and fixed.

Because we don’t only perform work but also document it (remember the traceability!), we created a spreadsheet with all of our projects and a criteria matrix to decide which projects needed our attention the most and what actions to take. An example of this process would look like this:

  • Project A: Is the project at least in parts programmed in Java? No -> No attention required
  • Project B: Is the project at least in parts programmed in Java? Yes -> Is log4j used in this project? Yes -> Is the log4j version affected by the vulnerability? No -> No immediate attention required

Our information situation changed from hour to hour as the whole world did two things in parallel: The white hats gathered information about possible breaches and unaffected versions while the black hats tried to find and exploit vulnerable systems. This process happened so fast that we found ourselves lagging behind because we couldn’t effectively triage all of our projects.

One bottleneck was the creation of the spreadsheet. Even just the process of compiling a list of all projects and ruling out the ones that are obviously not affected by the problem was time-consuming and not easily distributable.

Post mortem

After the dust settled, we had switched off one project (which turned out to be not vulnerable on closer inspection) and confirmed that all other projects (and products) weren’t affected. We fended off one of the scariest vulnerabilities in recent times with barely a scratch. We could celebrate our success!

But as happy as we were, the post mortem of our approach revealed a weak point in our ability to quickly create spreadsheets about typical business/domain entities for our company, like project repositories. If we had automated this job, we would have had a complete list of all projects in a few seconds and could have worked from there.

This was the birth hour of our list generator tool (we called it “sunzu” because – well, that would require the explanation of a German wordplay). It is a simple tool: You press a button, the tool generates a new page with a giant table in the wiki and forwards you to it. Now you can work with that table, remove columns you don’t need, add additional ones that are helpful for your mission and fill out the cells that are empty. But the first step, a complete list of all entities with hyperlinks to their details, is a no-effort task from now on.

No-effort chores

If Log4Shell happened today, we would still have to scan all projects and decide for each. We would still have to document our evaluation results and our decisions. But we would start with a list of all projects, a column that lists their programming languages and other data. We would be certain that the list is complete. We would be certain that the information is up-to-date and accurate. We would start with the actual work and not with the preparation for it. The precious minutes at the beginning of a time-critical task would be available and not bound to infrastructure setup.

Since the list generator tool can generate a spreadsheet of all projects, it has accumulated additional entities that can be listed in our company. For some, it was easy to collect the data. Others require more effort. There are some that don’t justify the investment (yet). But it had another effect: It is a central place for “list desires”. Any time we create a list manually now, we pose the important question: Can this list be generated automatically?

Basic business building blocks

In conclusion, our “sunzu” list generator is a basic business service that might be valuable for every organization. Its only purpose is to create elaborate spreadsheets about the most important business entities and present them in an editable manner. Whether the spreadsheet is created as an Excel file, as an editable website like tabble or as a wiki page like in our case is secondary.

The crucial effect is that you can think “hmm, I need a list of these things that are important to me right now” and just press a button to get it.

Sunzu is a web service written in Python, with a total of less than 400 lines of code. It could probably be rewritten from scratch on one focussed workday. If you work in an organization that relies on lists or spreadsheets (and which organization doesn’t?), think about which data sources you tap into to collect the lists. If a human can do it, you can probably teach it to a computer.

What are entities/things in your domain or organization that you would like to have a complete list/spreadsheet generated automatically about? Tell us in the comments!

My own little Y2K22 bug

Ever since the year 2000 (or Y2K), software developers have dreaded the start of a new year. You never know which arbitrary limit will affect the fitness of your projects. Sometimes, it isn’t even the new year (see the year 2038 problem that will manifest itself in late January). But more often than not, the first day of a new year is a risky time.

Welcome, 2022!

The year 2022 started with Microsoft Exchange quarantining lots of e-mails for no apparent reason other than it is no longer 2021. I was amused about this “other people’s problem” until my phone rang.

A customer reported that one of my applications doesn’t start anymore, when it ran perfectly a few days ago – in 2021. My mind began to race:

The application in question wasn’t updated recently. It has to be something in the code that parses a current date with an unfortunate date/time format. My search for all format strings (my search term was “MMddHH” without the quotes) in the application source code brought some expected instances like “yyyyMMddHHmmss” and one of a very suspicious kind: “yyMMddHHmm”.

The place where this suspicious format was used took a version information file and reported a version number, some other data and a build number. The build number was defined as an integer (32 bit). Let me explain why this could be a problem:

2G should be enough for everyone!

A signed 32-bit integer has a value limit of 2^31 - 1 = 2.147.483.647. If you represent the last minute of 2021 in the format above, you get 2.112.312.359 which is beneath the limit, but quite close.

If you add one minute and count up the year, you’ll be at 2.201.010.000 which is clearly above the value limit and results in either an integer overflow ending in a very negative number or an arithmetic exception.

In my case, it was the arithmetic exception which halted the program in its very first steps while figuring out what, where and when it is.
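The arithmetic can be checked with a few lines of Java. This is only a sketch: the class and method names are made up for illustration, and the original conversion code is not shown in this article. It formats both timestamps with the “yyMMddHHmm” pattern and demonstrates the two possible outcomes mentioned above, a silent wrap-around and an arithmetic exception.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper for illustration, not the original application code
final class BuildNumberOverflow {
	private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyMMddHHmm");

	// Formats the timestamp and reads the ten digits as a 64-bit number
	static long asNumber(LocalDateTime time) {
		return Long.parseLong(time.format(FORMAT));
	}

	public static void main(String[] args) {
		LocalDateTime lastMinuteOf2021 = LocalDateTime.of(2021, 12, 31, 23, 59);
		long before = asNumber(lastMinuteOf2021);
		long after = asNumber(lastMinuteOf2021.plusMinutes(1));

		System.out.println(before + " fits into an int: " + (before <= Integer.MAX_VALUE));
		System.out.println(after + " fits into an int: " + (after <= Integer.MAX_VALUE));

		// Outcome 1: a plain cast silently wraps around to a negative number
		System.out.println("cast result: " + (int) after);

		// Outcome 2: an exact conversion refuses to continue
		try {
			Math.toIntExact(after);
		} catch (ArithmeticException e) {
			System.out.println("exact conversion failed: " + e.getMessage());
		}
	}
}
```

Which of the two outcomes you get depends on the language and the conversion function in use.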

This is a rookie mistake that can only be explained by “it evolved that way”. The mistake has been in the source code since 2004. I wrote it myself, so it is my mistake. But I didn’t just think about a weird date format that won’t spark joy 18 years later. I started with a build number from continuous integration. The first build of the project is “build 1”, the next is “build 2”, and so on. You really have to commit early, commit often (and trigger builds) to reach the integer limit that way. This is true for a linear series of builds. But what if you decide to use feature branches? The branches can happen in parallel and each have their own build number series. So “build 17” could be the 17th build of your main branch and go into production or it could be a fleeting build result on a feature branch that gets merged and deleted a few days later. If you want to use the build number as a chronological ordering, perhaps to look for updates, you cannot rely on the CI build numbering. Why not use time for your chronological ordering?

Time as an integer

And how do you capture time in an integer? You invent a clever format that captures the essence of “now” in a string that can be parsed as an integer. The infamous “yyMMddHHmm” is born. The year 2022 is a long time down the road if you apply a quick and clever fix in 2004.

But why did the application crash in 2022 without any update? The build number had to be from 2021 and would still pass the conversion. Well, it turned out that this specific application had no build number set, because we changed our build system and deemed this information not important for this application. So the string in the version file was empty. How is an empty string interpreted as today?

Well, there was another clever piece of code by another developer from 2008 that took a string being null or empty and replaced it with the current date/time. The commit message says “Quickfix for new version format”.

Combined cleverness

Combine these three things and you have the perfect timebomb:

  1. A clever way to store a date/time as an integer
  2. A clever way to interpret missing settings
  3. A lazy way to introduce a new build process

The problem described above was present in a total of five applications. Four applications had fixed build numbers/dates and would have broken with the next version in 2022 or later. The fifth application had an empty build number and failed exactly as programmed after the 01.01.2022.
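How the three ingredients interact can be sketched in a few lines. This is a hypothetical reconstruction: the class name, the method signature and the use of Math.toIntExact are assumptions, because the real code is not part of this article. The injected Clock only serves to make “now” controllable.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical reconstruction; names and structure are assumptions
final class VersionInfo {
	private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyMMddHHmm");

	static int buildNumber(String fromVersionFile, Clock clock) {
		String text = fromVersionFile;
		if (text == null || text.isEmpty()) {
			// Ingredient 2 (2008): a missing build number silently becomes "now"
			text = LocalDateTime.now(clock).format(FORMAT);
		}
		// Ingredient 1 (2004): the date/time digits have to fit into 32 bits
		return Math.toIntExact(Long.parseLong(text));
	}

	public static void main(String[] args) {
		// Ingredient 3: the new build process leaves the build number empty
		Clock endOf2021 = Clock.fixed(Instant.parse("2021-12-31T23:59:00Z"), ZoneOffset.UTC);
		Clock startOf2022 = Clock.fixed(Instant.parse("2022-01-01T00:00:00Z"), ZoneOffset.UTC);

		System.out.println(buildNumber("", endOf2021)); // still works
		System.out.println(buildNumber("", startOf2022)); // throws an ArithmeticException
	}
}
```

With a fixed build number from 2021 in the version file, the same method keeps working, which matches the behaviour of the other four applications.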

Lessons learnt

What can we learn from this incident?

First: clever code or a quick fix is always a bad idea.

Second: cleverness doesn’t stack. One clever workaround can neutralize another clever hack even if both “solutions” would work on their own.

Third: If your solution relies on a certain limit to never be reached, it is only a temporary solution. The limit will be reached eventually. At least leave an automated test that warns about this restriction.

Fourth: Don’t mitigate a hack with another hack. You only make your situation worse in the long run.
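The automated test suggested in the third lesson could be built around a small guard like the following. It is a sketch under assumptions: the class, the method name and the idea of a safety margin are made up for illustration and were not part of the original code.

```java
import java.time.Clock;
import java.time.LocalDateTime;
import java.time.Period;
import java.time.format.DateTimeFormatter;

// Hypothetical guard, meant to be called from an automated test
final class BuildNumberLimitCheck {
	private static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyMMddHHmm");

	// True if a build number created "margin" from now would still fit into an int
	static boolean fitsFor(Period margin, Clock clock) {
		LocalDateTime future = LocalDateTime.now(clock).plus(margin);
		return Long.parseLong(future.format(FORMAT)) <= Integer.MAX_VALUE;
	}
}
```

A test that asserts fitsFor(Period.ofYears(5), Clock.systemDefaultZone()) would have turned red at the start of 2017, five years before the actual failure. It does not notice the wrap-around of the two-digit year in 2100.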

The fourth take-away is important. You could fix the problem described above in at least two ways:

  • Replace the integer with a long (64 bit) and hope that your software isn’t in production anymore when the long wraps around. Replace the date/time format with the usual “yyyyMMddHHmmss”.
  • Leave the integer in place and change the date/time format to “yyDDDHHmm” with “DDD” being the day of the year. With this approach, you shorten the string by one digit and keep it below the limit. You also make the build number even less readable and leave a timebomb for the year 2100.
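Both options can be compared side by side. The following sketch uses made-up names and only shows the formats, not the surrounding version handling:

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration of both options
final class BuildNumberFormats {
	private static final DateTimeFormatter LONG_FORMAT = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");
	private static final DateTimeFormatter DAY_OF_YEAR_FORMAT = DateTimeFormatter.ofPattern("yyDDDHHmm");

	// Option 1: 14 digits in a 64-bit long
	static long asLong(LocalDateTime time) {
		return Long.parseLong(time.format(LONG_FORMAT));
	}

	// Option 2: 9 digits in a 32-bit int, "DDD" is the zero-padded day of the year
	static int asDayOfYearInt(LocalDateTime time) {
		return Integer.parseInt(time.format(DAY_OF_YEAR_FORMAT));
	}
}
```

The second option produces 993652359 for the last minute of 2099, but only 10000 for the first minute of 2100. That is the point where the chronological ordering breaks.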

You can probably guess which route I took, even if it was a lot more work than expected. The next blog entry about this particular code can be expected on 01.01.10000.

Hyperfocus on Non-Essentials

When tasked with managing a complex and potentially overwhelming project, a common behaviour of inexperienced managers/developers is to focus on things that are easy to achieve (“low-hanging fruits”), fun to produce (“cherry-picking”) or within the comfort zone.

This means that in the extreme, the developer exclusively focusses on things that are of no interest for the business client but can simulate progress and results.

This behaviour is an application of the “path of least resistance” and I know exactly what it feels like. Here’s the story why:

When I was fourteen years old, my programming career was already 6 years in the making. Of course, I only wrote code for myself, teaching myself new concepts and new errors alike. My only scale of success was “does it run?” and “is it still fun for me?”. My only programming language was BASIC, first the dialect GW-BASIC (still with line numbers!), then the more advanced QBasic (with named jump markers instead of line numbers).

I grew up in small cities and was basically alone with my hobby. But a friend had a parent who owned an optometrist shop and was interested in using computers for its day-to-day operations. I was asked to write a program to handle the shop’s inventory and sales. The task was interesting, but I had no idea how any shop, let alone this particular one, handles their business. I agreed to build a prototype and work from there.

I knew that this project was bigger and more ambitious than any hobby project of my own before, but it was programming after all – how hard could it be?

My plan was to do two things in parallel: Buy and read a book about real software development with BASIC and try to sketch out the application as a “coded paper prototype”.

The book turned out to be the confessions of a frustrated software developer that basically assured the reader on every page that BASIC was not dead and appended dozens of pages with code listings to every chapter. There was probably a lot of wisdom in this book, too, but it missed me by miles.

The sketch of the application began with a menu of all the things I thought would be necessary, like “inventory” or “sales process”. I also included an “Extras” menu and one thing in the menu should be a decent screen saver. Back in those days, the CRT monitors suffered from burn-in if the same image was shown for a long time and I figured that this application would run all day every day, so it seemed logical and important to have a screen saver that is automatically turned on after some period of inactivity.

Which presented itself as a really hard problem, because BASIC was essentially single-threaded (or at least it was to my knowledge back then) and I had to invent some construct that can perhaps be described as “obscure co-routines”. That was some fun programming!

After I solved the automatic activation of the screensaver functionality, I discovered that I could easily make the actual screensaver that gets shown a parameter. So I programmed not one, but several cool and innovative ASCII art screensavers that you could choose from in the extras menu. One screen saver was inspired by the snake game, another one was “colored blocks” that would appear and disappear to form a captivating mood picture.

That was the state of the application when my friend’s parent asked for a demo. I had:

  • No additional knowledge about application design
  • A menu of things I invested no second thought in
  • Several very cool screensavers that activated themselves automatically. Isn’t that great?

You can probably guess how that demo went. None of the things I had developed mattered in the slightest for the optometrist shop. My passion for my creation didn’t translate to the business very well.

I had worked intensively on this project. I hyperfocused on totally non-essential stuff and stayed mostly in my comfort zone, even if I felt as if I had made great progress.

It is easy to fall into this trap. It is easy to mistake one’s own feelings of progress and success for the external (real) ones. It feels very good to work frantically on things that matter to oneself. It becomes a tragedy if the things only matter to oneself and nobody else.

So what can we do to avoid this trap? If you have an idea, write a comment about it! I hope to hear lots of different takes on this problem.

Here is my solution: “Risk first”. With this project strategy, the first task in a project is to solve the hardest part, to cut the biggest knot or to chart the most relevant area. It means that after the first milestone is a success, the project will gradually become easier. It’s the precursor to “fail fast”, which is a “risk first” project that didn’t meet its first milestone.

It is almost guaranteed that the first milestone in a “risk first” project will not be in your comfort zone and is no low-hanging fruit that you can pick without effort. And while it might be fun to work on, it’s probably something your customer has a real interest in.

By starting a project “risk first”, I postpone my tendency to focus on non-essentials towards the end of the project. And with concepts like “business value”, I can see very clearly when my work becomes irrelevant for the customer. That’s when I stop my professional work and my hobby begins.

Don’t shoot your messengers

Writing small, focused tests, often called unit tests, is one of the things that look easy at the outset but turn out to be more delicate than anticipated. Writing a three-lines-of-code unit test in the triple-A structure soon became second nature to me, but there were lots of cases that resisted easy testing.

Using mock objects is the typical next step to accommodate this resistance and make the test code more complex. This leads to 5 to 10 lines of test code for easy mock-based tests and up to thirty or even fifty lines of test code where a lot of moving parts are mocked and chained together to test one single method.

So, the first reaction for a more complicated testing scenario is to make the test more complicated.

But even with the powerful combination of mock objects and dependency injection, there are situations where writing suitable tests seems impossible. In the past, I regarded these code blocks as “untestable” and omitted the tests because their economic viability seemed debatable.

I wrote small tests for easy code, long tests for complicated code and no tests for defiant code. The problem always seemed to be the tests that just didn’t cut it.

Until I could recognize my approach in a new light: I was encumbering the messenger. If the message was too harsh, I would outright shoot him.

The tests tried to tell me something about my production code. But I always saw the problem with them, not the code.

Today, I can see that the tests I never wrote because the “test story” at hand was too complicated for my abilities were already telling me something important.

The tests you decide not to write because they’re too much of a hassle tell you that your code structure needs improvement. They already deliver their message to you, even before they exist.

With this insight, I can oftentimes fix the problem where it is caused: In the production code. The test coverage increases and the tests become simpler.

Let’s look at a small example that tries to show the line of thinking without being too extensive:

We developed a class in Java that represents a counter that gets triggered and imposes a wait period on every tenth trigger impulse:

public class CountAndWait {
	private int triggered;
	
	public CountAndWait() {
		this.triggered = 0;
	}
	
	public void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			try {
				Thread.sleep(1000L);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
			this.triggered = 0;
		}
	}
}

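There is a lot going on in the code for such a simple functionality. Especially the try-catch block catches my eye and makes me worried when thinking about tests. Why is it even there? Well, Thread.sleep() can throw the checked InterruptedException, and setting the interrupt flag again is the customary way to handle it if you cannot pass the exception on.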

But even without advanced threading issues, the normal functionality of our code is worrisome enough. How many lines of code will a test contain that covers the sleep? Should I really use a loop in my test code? Will the test really have a runtime of one second? That’s the same amount of time several hundred other unit tests require for much more coverage. Is this an economically sound testing approach?

The test doesn’t even exist and already sends a message: Your production code should be structured differently. If you focus on the “test story”, perhaps a better structure emerges?

The “story of the test” is the description of the production code path that is covered and asserted by the test. In our example, I want the story to be:

“When a counter object is triggered for the tenth time, it should impose a wait. Afterwards, the cycle should repeat.”

Nothing in the story of this test talks about interruption or exceptions, so if this code gets in the way, I should restructure it to eliminate it from my story. The new production code might look like this:

public class CountAndWait {
	private final Runnable waiting;
	private int triggered;
	
	public static CountAndWait forOneSecond() {
		return new CountAndWait(() -> {
			try {
				Thread.sleep(1000L);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}			
		});
	}
	
	public CountAndWait(Runnable waiting) {
		this.waiting = waiting;
		this.triggered = 0;
	}
	
	public void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			this.waiting.run();
			this.triggered = 0;
		}
	}
}

That’s a lot more code than before, but we can concentrate on the latter half. We can now inject a mock object that attests to how often it was run. This mock object doesn’t need to sleep for any amount of time, so the unit test is fast again.

Instead of making the test more complex, we introduced additional structure (and complexity) into the production code. The resulting unit test is rather easy to write:

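import org.junit.jupiter.api.DisplayName;
import org.junit.jupiter.api.Test;

import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.times;
import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.verifyNoInteractions;

// Repeat is presumably a small project-specific helper that is not shown here
class CountAndWaitTest {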
	@Test
	@DisplayName("Waits after 10 triggers and resets")
	void wait_after_10_triggers_and_reset() {
		Runnable simulatedWait = mock(Runnable.class);
		CountAndWait target = new CountAndWait(simulatedWait);
		
		// no wait for the first 9 triggers
		Repeat.times(9).call(target::trigger);
		verifyNoInteractions(simulatedWait);
		
		// wait at the 10th trigger
		target.trigger();
		verify(simulatedWait, times(1)).run();
		
		// reset happened, no wait for another 9 triggers
		Repeat.times(9).call(target::trigger);
		verify(simulatedWait, times(1)).run();
	}
}

It’s still different from a simple 3-liner test, but the “and” in the test story hints at a more complex story than “get y for x”, so that might be ok. We could probably simplify the test even more if we got access to the internal trigger count and verified the reset directly.
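The same test story can also be told without a mocking library. The following sketch is an alternative to the test above, not a replacement for it: it injects a plain counter as the Runnable and checks the count after each step. The names CountAndWaitDemo and triggerTimes are made up for this example, and CountAndWait is repeated without its factory method to keep the listing self-contained.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Same structure as the refactored class, without the factory method
final class CountAndWait {
	private final Runnable waiting;
	private int triggered;

	CountAndWait(Runnable waiting) {
		this.waiting = waiting;
		this.triggered = 0;
	}

	void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			this.waiting.run();
			this.triggered = 0;
		}
	}
}

// Hypothetical demo class that plays the role of the unit test
final class CountAndWaitDemo {
	// Triggers the target the given number of times
	static void triggerTimes(CountAndWait target, int count) {
		for (int i = 0; i < count; i++) {
			target.trigger();
		}
	}

	public static void main(String[] args) {
		// The counter stands in for the wait and records how often it was run
		AtomicInteger waits = new AtomicInteger();
		CountAndWait target = new CountAndWait(waits::incrementAndGet);

		triggerTimes(target, 9);
		System.out.println("waits after 9 triggers: " + waits.get());

		target.trigger();
		System.out.println("waits after 10 triggers: " + waits.get());

		triggerTimes(target, 9);
		System.out.println("waits after 19 triggers: " + waits.get());
	}
}
```

The counter makes the expectation explicit as a number, at the price of a few more lines than the mock-based version.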

I hope the example was clear enough. For me, the revelation that test problems more often than not have their root cause in production code is a clear message to improve my ability to write code that facilitates testing instead of obstructing it.

I don’t shoot/omit my messengers anymore even if their message means more work for me.