daniel.lindner

Implicit Protocol Requirements Can Drive You Mad

Some years ago, I had a software project that wanted to integrate a new kind of machinery into an existing application. Thanks to a modular and layered architecture, you could swap out the old machinery module and replace it with a new one. So it came down to writing an elaborate adapter between the existing application code and the new machinery interface. Shouldn’t be too hard, right?

And at first, it wasn’t. The machinery interface was relatively narrow, with just a few data registers to read from and write into. One core functionality of the old and the new machinery was moving equipment around at different axes (horizontally, vertically, etc.). The difference was: The old machinery was based on position switches, the new one operated on a sensor-based positioning system.

Position switches are rudimentary technology: An engine drives along the axis until it triggers the position switch that shuts of the engine. The advantage is a basic set of commands: Drive left (until you hit a switch) or drive right (until you hit a switch). This machinery control can be implemented by analog relais logic. The downside is that there is only guessing where the engine actually is at any moment if it doesn’t reveal its position by triggering a switch.

The new machinery works with a fancier method of positioning and movement. The control unit of the machine keeps track of the coordinates for every axis of movement. If you want the machine to assume a different position, you transmit the target coordinates and the machine moves until the difference is zero.

In reality, it wasn’t that easy. You also needed to transmit the desired velocity of the movement. The target was reached once the coordinates were equal to the transmitted coordinates and the actual velocity of all axes was zero again.

Okay, so making the new machinery move was a two-step transmission: First, you give it the target coordinates, then the speed values. And then you wait until things are like you want them to be.

The new module worked flawlessly with the new machinery. We could move it around in the boring one-dimensional ways the actual use case required or we could make it dance in complicated courses. The customer was pleased and the machinery was installed to perform the one-dimensional movements from now on.

The project was finished successfully. But after a while, the customer had a complaint. Seldom, but reocurring, the machinery would not move when commanded to, but blow a fuse and go into an error state.

Initially, the customer treated it as an electrical problem within the machinery. Until the manufacturer couldn’t find a cause and suspected my software to transmit faulty command parameters. I implemented an exhaustive logging of all transmissions and could prove that the parameters were as correct as they were boring. The application transmitted “full left” or “full right” for the horizontal movement and nothing else.

We were all stumped and out of ideas until I had an idea out of the blue:

What if the command interface to the machinery has a hidden assumption that is not met by the application?

But why did it work 99 percent of the time? Wouldn’t the assumption be present for every movement command?

Every time I hear “spurious failure”, I think about a concurrency problem. But my module worked strictly serial, one command after the other. There was nothing going on concurrently on my side.

And then it dawned me: The concurrent process is the main loop of the machine control unit. The machine control unit essentially runs a single thread that performs a series of steps in an endless loop: Check machine status, check command registers, apply commands, do other machinery stuff, repeat.

What if the “check command registers” step occurs right when my software is in the middle of transmitting the target parameters? It would read a partially written set of parameters. More specifically it would read new target coordinates, but not the necessary velocities. It would calculate delta distances and try to move, but with absurdly low or high velocities, depending on the formulas. If at any point a division by velocity occurs, it would divide by zero.

Because I couldn’t review the code of the machine control unit and the original programmer of it wasn’t available anymore, I tested my hypothesis by reversing the parameter write order: velocity first, location last.

And I wasn’t wrong: This little change got rid of the spurious failures.

The hidden assumption of the control unit code was that all parameters were transactionally valid at any given time. This translated to an implicit protocol requirement: All clients of the command interface needed to either

Transmit all changes at once (not possible with the technology that was used for transmission)
Transmit the changes in an order that has no effect until all changes are written.

The second option was what I implemented. Instead of “steer, then accelerate”, I needed to “accelerate, then steer”, because velocity without a delta distance would not move the equipment, but delta distance without velocity would attempt to do so.

One small sentence about the required write sequence in the documentation would still make this a “surprise requirement”, but a documented one. Without any documentation, its pure luck if a client pushes the buttons in the right order or not.

If you want one learning from this story: If a failure happens only occassionally, think about concurrency problems and include all periphery (humans, too!) into your scenarios.

“Keep in mind” code

Reading source code, especially those from other people (and that includes your past self of some months, too), is hard and needs practice. Source code is one of the rare forms of literature that is easier to write than to read. Yet it seems that code is only written once, but read multiple times during its lifetime. Every time we need to make a change to it, we need to read the whole block of code thoroughly and sift through the rest in order to find the relevant block.

This means that source code should be written with readability and understandability in mind. And in fact, I’ve never met a programmer that set out to write obscure code. We all want our code to be easy to read. And while I don’t have a silver bullet answer how this feature can be achieved, I’ve seen some patterns that are detrimental to the goal. I call them “keep in mind” code lines, because that is what you need to do:

You need to make a mental or physical note of some additional requirement that the source code imposes onto the reader, often without a discernable reason.

Let me make an obvious example:

while (true) {
    // some more code
}

This simple line of code requires the reader to make a note: There needs to be some construct that exits the forever loop in the “some more code” section or else the program wouldn’t work right anyway. We have two possibilities: We can interrupt our flow of reading and understanding, scan ahead looking for the exit structure and discard the mental note once we’ve found it. Or we can persist the note, continue with our reading and cross the note off once we read past the exit structure. Both reactions require additional effort from the reader.

If we choose the first possibility, we need to pause our mental model of the code that we’ve read and understood so far. This is equivalent to peeking some pages ahead in a riveting book, just to make sure the character doesn’t die in the current situation. It’s good for peak suspense, but we really don’t want that in our source code. Source code should be a rather boring type of literature. The stories we tell should fascinate on a higher level than “will this thread survive the method call”?

Recalling a paused mental model is always accompanied by some loss. We don’t notice it right away, but some aspect that we already knew goes missing and needs to be learnt again if relevant. In my opinion, that is a bad trade: My high-level model gets compromised because I need to follow a low-level distraction from one line of code.

If we choose the second possibility and make a written note on a piece of paper (or something equivalent), we might hold the mental model in place during the short interruption of writing the note. But we need to implement a recurring note checking mechanism into our reading process, because we shouldn’t forget about the notes.

There is a third possibility: Ignoring the danger. That would mean reading the code like letting the TV run in the background. You don’t really pay attention and the story just flows by. I don’t think that’s a worthwile way to engage with source code.

Let me try to define what a line of “keep in mind” code is: It is source code that cannot be understood without a forward reference further “down” the lines, but raises concerns or questions. It represents an open item on my “sorrow list”.

Another, less obvious example would be:

private final InputStream input;

Because InputStream is a resource (a Closeable) in java, it needs to be used in accordance to its lifecycle. Storing it into a member variable means that the enclosing object “inherits” the lifecycle management. If the enclosing object exposes the resource to the outside, it gets messy. All these unfortunate scenarios appear on my checklist as soon as I read the line above.

What can we do to avoid “keep in mind” lines? We can try to structure our source code not for writing, but for reading. The dangling mental reference of “I need to exit that while-true loop” is present even as the code is written. Once we notice that we keep a mental short-term list of open code structure tasks while programming, we can optimize it. Every code structure that doesn’t lead to some mandatory complementary work further down is one less thing to keep in mind while reading.

How would the example above produce less mental load while reading? Two options come to mind:

do {
    // some more code
} while (true);

This is essentially the same code, but the reader has seen all lines that exit the loop before being made aware that otherwise, it loops forever. The solution to the problem is already present when the problem presents itself.

Another option makes the exit structure explicit from the start:

boolean exitWhileLoop = false;
while (!exitWhileLoop) {
    // some more code
}

The exit structures in the “some more code” section should now use the flag “exitWhileLoop” if possible instead of breaking out directly. If necessary, a hearty “continue” statement at the right place omits the rest of the loop code. This option will lead to more code that is more verbose about the control flow. For the reader, that’s a good thing because the intent isn’t hidden between the lines anymore. If you as the code author think that your code gets clunky because of it, contemplate if the control flow structure is a good fit for the story you want to tell. Maybe you can simplify it, or you need to employ an even more complex structure because your story requires it.

In any case, try to avoid “keep in mind” lines. They burden your readers and make working with the code less pleasant. I have several more examples of such lines or structures, but wanted to keep this blog post short. Are you interested in more specific examples? Can you provide some example from your experience? Write a comment!

P.S.: I love the gibberish on the AI-generated checklist in the blog entry picture and wanted you to savor it, too.

Single-Use Webapps

One of our customers has the requirement to enter data into a large database while being out in the field, potentially without any internet connection. This is a diminishing problem with the availability of satellite-based internet access, but it can be solved in different ways, not just the obvious “make internet happen” way.

One way to solve the problem is to analyze the customer’s requirements and his degrees of freedom – the things he has some leeway over. The crucial functionality is the safe and correct digital entry of the data. It would suffice to use a pen and a paper or an excel sheet if the mere typing of data was the main point. But the data needs to be linked to existing business entities and has some business rules that need to be obeyed. Neither paper nor excel would warn the user if a business rule is violated by the new data. The warning or error would be delayed until the data needs to be copied over into the real system and then it would be too late to correct it. Any correction attempt needs to happen on site, on time.

One leeway part is the delay between the data recording and the transfer into the real system. Copying the data over might happen several days later, but because the data is exclusive to the geographical spot, there are no edit collisions to be feared. So it’s not a race for the central database, it’s more of an “eventual consistency” situation.

If you take those two dimensions into account, you might invent “single-use webapps”. These are self-contained HTML files that present a data entry page that is dynamic enough to provide interconnected selection lists and real-time data checks. It feels like they gathered their lists and checks from the real system, and that is exactly what they did. They just did it when the HTML file was generated and not when it is used locally in the browser. The entry page is prepared with current data from the central database, written to the file and then forgotten by the system. It has no live connection and no ability to update its lists. It only exists for one specific data recording at one specific geographical place. It even has a “best before” date baked into the code so that it gives a warning if the preparation date and the usage date are too distant.

Like any good data entry form, the single-use webapp presents a “save data” button to the user. In a live situation, this button would transfer the data to the central database system, checking its integrity and correctness on the way. In our case, the checks on the data are done (using the information level at page creation time) and then, a transfer file is written to the local disk. The transfer file is essentially just the payload of the request that would happen in the live situation. It gets stored to be transferred later, when the connection to the central system is available again.

And what happens to the generated HTML files? The user just deletes them after usage. They only serve one purpose: To create one transfer file for one specific data entry task, giving the user the comfort and safety of the real system while entering the data.

What would your solution of the problem look like?

Disclaimer: While the idea was demonstrated as a proof of concept, it was not put into practice by the customer yet. The appeal of “internet access anywhere on the planet” is undeniably bigger and has won the competition of solutions for now. We would have chosen alike. The single-use webapp provides comfort and ease-of-use, but ubiquitous connectivity to the central system tops all other solutions and doesn’t need an extra manual or unusual handling.

How to Eat Last

A good book about leader mentality is “Leaders Eat Last” from Simon Sinek. The book is not about your diet, but your approach towards your subordinates and your peer group.

I don’t want to recapitulate the content of the book – it is worth the read or at least a presentation about it. I want to talk about one specific implementation of the principle in my company that I did even before reading the book, but could only name and highlight after Simon Sinek lend me his analogy.

I’m a software developer and founded a software development company. I hired other software developers and they develop software with me. I might be the founder, owner and director of the company (so, in a short team, the “leader”), but I’m still a fellow developer and understand the developer’s mindset. So I know what a developer wants, because I want it, too.

Except, I make sure that I’m the last one in the company to get it.

Two examples:

We bought our second round of office desks in 2010, when we moved into a new office. They were still traditional desks that could only be height-adjusted with tremenduous effort. We only did it once and settled for “good enough”. Our first electrically height adjustable desk was bought in 2013 because of a specific medical requirement. But it opened the door to the possibility of having the desk at any height throughout the day. You might even work standing up.

We slowly accumulated more electrically height adjustable desks until we had 50 percent classic and 50 percent electric desks. At that point, I bought the other half at once (and they are fancy “gamer nerd” desks, because why not?). The last classic desk in the company was my own. I replaced it with the oldest electric desk in the portfolio. Now I can work while standing up, too.

When the Corona pandemic hit in 2020, we moved to home offices all of a sudden. I wrote about this change several times on this blog. This physical separation led to an increased demand for video calls. I made sure everyone is covered with the basic equipment (webcam, headphones, etc.), including me. But I also experimented with the concept of a “virtual office”. It consisted of a video meeting room that I hung out in all workday. I turned the camera and microphone off, but was instantly present if somebody had a desire to talk to me – just like in the real office. For this use case, I installed an additional monitor on my setup, the fourth one, called the “pandemic display” in a blog post about it. Because I didn’t know if the experiment would work, I bought the smallest and cheapest display available for me.

The experiment went fine and I decided to equip everyone with an additional “videoconference display”. The new models were bigger and better. If an employee didn’t see the benefit of it, I didn’t force them to install one in their home office, but every workplace in the office has at least four monitors. Guess were the original one is still installed? I made sure everybody had a better monitor than me.

With this process, I can guarantee that my employees have the work equipment that is good enough for their boss. Because I have it too – or something inferior. If I feel the need to upgrade my gear, I upgrade everybody else and then lift my things to their level. If I feel comfortable with my gear, so does everybody else (except for individual demands and we have a process installed for that, too).

I love self-regulating systems and this is one: The whole company is equipped in a manner that is sufficient or even better for me to do the work. If I want more or better things, everybody gets the upgrade before me because only then do I allow myself to have the same luxury. No “upward” exception for the boss, and only temporarily “downwards”. My wants and needs define the lower limit of equipment quality for all of us. If I can’t buy it for everyone, I don’t buy it.

That is the whole trick: Equip yourself last or lowest. You can be sure everybody is well-equipped that way. Thanks, Simon!

Your Placeholder Data Still Conveys Meaning – Part I

There is a long-standing tradition to fill unknown text fields with placeholder data. In graphic design, these texts are called “dummy text”. In the german language, the word is “Blindtext”, which translates directly as “blind text”. The word means that while some text is there, the meaning of it can’t be seen.

A popular dummy text is the latin sounding “Lorem ipsum dolor sit amet”, which isn’t actually valid latin. It has no meaning other than being text and taking up space.

While developing software user interfaces, we often deal with smaller input areas like textfields (instead of text areas that could hold a sizeable portion of “lorem ipsum”) or combo boxes. If we don’t know the actual content yet, we tend to fill it with placeholder data that tries to reflect the software’s domain. And by doing that, we can make many mistakes that seem small because they can easily be fixed – just change the text – but might have negative effects that can just as easily be avoided. But you need to be aware of the subtle messages your placeholders send to the reader.

In this series, we will look at a specific domain example: digital invoices. The mistakes and solutions aren’t bound to any domain, though. And we will look at user interfaces and the corresponding source code, because you can fool yourself or your fellow developers with placeholder data just as easily as your customer.

We start with a relatively simple mistake: Letting your placeholder data appear to be real.

The digital (or electronic) invoice is a long-running effort to reduce actual paper document usage in the economy. With the introduction of the european norm EN 16931, there is a chance of a unified digital format used in one major economic region. Several national interpretations of the norm exist, but the essential parts are identical. You can view invoices following the format with a specialized viewer application like the Quba viewer. One section of the data is the information about the invoice originator, or the “seller” in domain terms:

You can see the defined fields of the norm (I omitted a few for simplicity – a mistake we will discuss later in detail) and a seemingly correct set of values. It appears to be the address of my company, the Softwareschneiderei GmbH.

If you take a quick look at the imprint of our home page, you can already spot some differences. The street is obviously wrong and the postal code is a near miss. But other data is seemingly correct: The company name is real, the country code is valid and my name has no spelling error.

And then, there are those placeholder texts that appear to be correct, but really aren’t. I don’t encourage you to dial the phone number, because it is a real number. But it won’t connect to a phone, because it is routed to our fax machine (we don’t actually have a “machine” for that, it’s a piece of software that will act like a fax). Even more tricky is the e-mail address. It could very well be routed, but actually isn’t.

Both placeholder texts serve the purpose of “showing it like it might be”, but appear to be so real and finalized that they lose the “placeholder” characteristics. If you show the seller data to me, I will immediately spot the wrong street and probably the wrong postal code, but accept the phone number as “real”. But is isn’t real, it is just very similar to the real one.

How can you avoid placeholders that look too real?

One possibility is to fake the data completely until given the real values:

These texts have the same “look and feel” and the same lengths as the near-miss entries, but are readily recognizable as made-up values.

There is only one problem: If you mix real and made-up values, you present your readers a guessing game for each entry: real or placeholder? If it is no big deal to change the placeholders later on, resist the urge to be “as real as possible”. You can change things like the company name from “Softwareschneiderei GmbH” to “Your Company Name Here Inc.” or something similar and it won’t befuddle anybody because the other texts are placeholders, too. You convey the information that this section is still “under construction”. There is no “80% done” for these things. The section is fully real or not. Introducing situations like “the company name and the place are already real, but the street, postal code and anything else isn’t” doesn’t clear anything and only makes things more complicated.

But I want to give you another possibility to make the placeholders look less real:

Add a prefix or suffix that communicates that the entry is in a state of flux:

That way, you can communicate that you know, guess or propose a value for the field, but it still needs approval from the customer. Another benefit is that you can search for “TODO” and list all the decisions that are pending.

If, for some reason, it is not possible to include the prefix or suffix with a separator, try to include it as visible (and searchable) as possible:

This are the two ways I make my placeholder text convey the information that they are, indeed, just placeholders and not the real thing yet.

Maybe there are other possibilities that you know of? Describe them in a comment below!

In the first part of this series, we looked at two mistakes:

Your placeholders look too real
You mix real data with placeholders

And we discussed three solutions:

Make your placeholders unmistakably fake
Give your placeholders a “TODO” prefix or suffix
Demote your real data to placeholders as long as there is still an open question

In the next part of this series, we will look at the code side of the problem and discover that we can make our lives easier there as well.

Highlight Your Assumptions With a Test

There are many good reasons to write unit tests for your code. Most of them are abstract enough that it might be hard to see the connection to your current work:

Increase the test coverage
Find bugs
Guide future changes
Explain the code
etc.

I’m not saying that these goals aren’t worth it. But they can feel remote and not imperative enough. If your test coverage is high enough for the (mostly arbitrary) threshold, can’t we let the tests slip a bit this time? If I don’t know about future changes, how can I write guidelining tests for them? Better wait until I actually know what I need to know.

Just like that, the tests don’t get written or not written in time. Writing them after the fact feels cumbersome and yields subpar tests.

Finding motivation by stating your motivation

One thing I do to improve my testing habit is to state my motivation why I’m writing the test in the first place. It seemed to boil down to two main motivations:

#Requirement: The test ensures that an explicit goal is reached, like a business rule that is spelled out in the requirement text. If my customer wants the value added tax of a price to be 19 % for baby food and 7 % for animal food, that’s a direct requirement that I can write unit tests for.
#Bugfix: The test ensures the perpetual absence of a bug that was found in production (or in development and would be devastating in production). These tests are “tests that should have been there sooner”. But at least, they are there now and protect you from making the same mistake twice.

A code example for a #Requirement test looks like this:

/**
 * #Requirement: https://ticket.system/TICKET-132
 */
@Test
void reduced_VAT_for_animal_food() {
    var actual = VAT.addTo(
        new NetPrice(10.00),
        TaxCategory.animalFood
    );
    assertEquals(
        new GrossPrice(10.70),
        actual
    );
}

If you want an example for a #Bugfix test, it might look like this:

/**
 * #Bugfix: https://ticket.system/TICKET-218
 */
@Test
void no_exception_for_zero_price() {
    try {
        var actual = VAT.addTo(
            NetPrice.zero,
            TaxCategory.general
        );
        assertEquals(
            GrossPrice.zero,
            actual
        );
    } catch (ArithmeticException e) {
        fail(
            "You messed up the tax calculation for zero prices (again).",
            e
        );
    }
}

In my mind, these motivations correlate with the second rule of the “ATRIP rules for good unit tests” from the book “Pragmatic Unit Testing” (first edition), which is named “Thorough”. It can be summarized like this:

all mission critical functionality needs to be tested
for every occuring bug, there needs to be an additional test that ensures that the bug cannot happen again

The first bullet point leads to #Requirement-tests, the second one to #Bugfix-tests.

An overshadowed motivation

But recently, we discovered a third motivation that can easily be overshadowed by #Requirement:

#Assumption: The test ensures a fact that is not stated explicitly by the requirement. The code author used domain knowledge and common sense to infer the most probable behaviour of the functionality, but it is a guess to fill a gap in the requirement text.

This is not directly related to the ATRIP rules. Maybe, if one needs to fit it into the ruleset, it might be part of the fifth rule: “Professional”. The rule states that test code should be crafted with care and tidyness, that it is relevant even if it doesn’t get shipped to the customer. But this correlation is my personal opinion and I don’t want my interpretation to stop you from finding your own justification why testing assumptions is worth it.

How is an assumption different from a requirement? The requirement is written down somewhere else, too and not just in the code. The assumption is necessary for the code to run and exhibit the requirements, but it’s only in the code. In the mind of the developer, the assumption is a logical extrapolation from the given requirements. “It can’t be anything else!” is a typical thought about it. But it is only “written down” in the mind of the developer, nowhere else.

And this is a perfect motivation for a targeted unit test that “states the obvious”. If you tag it with #Assumption, it makes it clear for the next developer that the actual content of the corresponding coded fact is more likely to change than other facts, because it wasn’t required directly.

So if you come across an unit test that looks like this:

/**
 * #Assumption: https://ticket.system/TICKET-132
 */
@Test
void normal_VAT_for_clothing() {
    var actual = VAT.addTo(
        new NetPrice(10.00),
        TaxCategory.clothing
    );
    assertEquals(
        new GrossPrice(11.90),
        actual
    );
}

you know that the original author made an educated guess about the expected functionality, but wasn’t explicitly told and is not totally sure about it.

This is a nice way to make it clear that some of your code is not as rigid or expected as other code that was directly required by a ticket. And by writing an unit test for it, you also make sure that if anybody changes that assumed fact, they know what they are doing and are not just guessing, too.

Your null parameter is hostile

I hope we all agree that emitting null values is a hostile move. If you are not convinced, please ask the inventor of the null pointer, Sir Tony Hoare. Or just listen to him giving you an elaborate answer to your question:

https://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare/

So, every time you pass a null value across your code’s boundary, you essentially outsource a problem to somebody else. And even worse, you multiply the problem, because every client of yours needs to deal with it.

But what about the entries to your functionality? The parameters of your methods? If somebody passes null into your code, it’s clearly their fault, right?

Let’s look at an example of pdfbox, a java library that deals with the PDF file format. If you want to merge two or more PDF documents together, you might write code like this:

File left = new File("C:/temp/document1.pdf");
File right = new File("C:/temp/document2.pdf");

PDFMergerUtility merger = new PDFMergerUtility();
merger.setDestinationFileName("C:/temp/combined.pdf");

merger.addSource(left);
merger.addSource(right);

merger.mergeDocuments(null);

If you copy this code verbatim, please be aware that proper exception and resource handling is missing here. But that’s not the point of this blog entry. Instead, I want you to look at the last line, especially the parameter. It is a null pointer and it was my decision to pass it here. Or was it really?

If you look at the Javadoc of the method, you’ll notice that it expects a StreamCacheCreateFunction type, or “a function to create an instance of a stream cache”. If you don’t want to be specific, they tell you that “in case of null unrestricted main memory is used”.

Well, in our example code above, we don’t have the necessity to be specific about a stream cache. We could implement our own UnrestrictedMainMemoryStreamCacheCreator, but it would just add cognitive load on the next reader and don’t provide any benefit. So, we decide to use the convenience value of null and don’t overthink the situation.

But that’s the same as emitting null from your code over a boundary, just in the other direction. We use null as a way to communicate a standard behaviour here. And that’s deeply flawed, because null is not standard and it is not convenient.

Offering an interface that encourages clients to use null for convience or abbreviation purposes should be considered just as hostile as returning null in case of errors or “non-results”.

How could this situation be defused by the API author? Two simple solutions come to mind:

There could be a parameter-less method that internally delegates to the parameterized one, using the convenient null value. This way, my client code stays clear from null values and states its intent without magic numbers, whereas the implementation is free to work with null internally. Working with null is not that big of a problem, as long as it doesn’t pass a boundary. The internal workings of a code entity is of nobody’s concern as long as it isn’t visible from the outside.
Or we could define the parameter as optional. I mean in the sense of Optional<StreamCacheCreateFunction>. It replaces null with Optional.empty(), which is still a bit weird (why would I pass an empty box to a code entity?), but communicates the situation better than before.

Of course, the library could also offer a variety of useful standard implementations for that interface, but that would essentially be the same solution as the self-written implementation, minus the coding effort.

In summary, every occurrence of a null pointer should be treated as toxic. If you handle toxic material inside your code entity without spilling it, that’s on you. If somebody spills toxic material as a result of a method call, that’s an hostile act.

But inviting your clients to use toxic material for convenience should be considered as an hostile attitude, too. It normalizes harmful behaviour and leads to a careless usage of the most dangerous pointer value in existence.

Revisiting the bus factor concept

The concept of a “bus factor” is both grim and very useful to manage project risks. It originates from the area of project management and is sometimes called a “truck number” or (to give it a more positive spin) the “lottery factor”.

It tries to pinpoint the number of people in a project that can drop out suddenly and unplanned without the project success being jeopardized. The “bus” or “truck” is conceptually used as the tool to enforce the drop out. The big lottery win might produce the same outcome, but with less implacability.

The sole number of a bus factor is often helpful to make lurking project risks visible. Especially a bus factor of 1, the most nerve-wrecking number, should be avoided. It means that the project success is directly coupled to the health (or gambling luck) of one specific person.

But even a higher bus factor, lets say 3, is no complete relief. What if those three people hop into the same car to meet the customer in a project meeting and have an accident? The only way to mitigate those “cluster risks” is to plan separate routes and means of travel. Most people would regard those measures as “overly paranoid” and it robs the three people from communicating directly before and after the meeting.

You can explore the individual project risk with more sophisticated tools than just a number. Setting up and filling out a RACI matrix (or one of its many variants) is a good way to make things visible.

But in this blog post, I want to highlight another detail of the bus factor that I learned the hard way: The “bus factor risk” of different people can vary a lot. The “bus factor risk” is the individual probability that the bus factor occurs.

Let’s have an example with the lottery: Your project has two key players that keep the project afloat. One of them never fills out a lottery ticket, the other plays regularly. Their “lottery factor risk percentage” is not equal. Given the low probability to win the lottery, the percentage doesn’t change much, but it changes.

Now imagine one person that often pursues high risk spare time activities. I don’t want to single out one specific activity, but think about free-climbing maybe. The other person stays mostly at home and cooks delicious meals without using sharp knives or hot water. Ok, this comparison sounds a bit contrived, but you get the message:

Two projects with a bus factor of 2 each can vary a lot in the actual risk percentage, because all 4 people have their individual drop out percentage.

It doesn’t have to be spare time activities, by the way. Every person has an individual health risk that can only be improved to a certain degree. Every person simply has “luck” or “misfortune” and can’t do anything about it.

My message is simply that the bus factor number 2 might not be “half the risk” than 1. Or even that two bus factor numbers with the same value denote equal risk.

I don’t think that it is useful to try to quantify the individual “bus factor risk”of a person. Way too many factors come into play and most of them should not be the employer’s concern (like a medical history or spare time activities). What might be useful is to be aware that equal numbers don’t equate equal actual risk.

How to improve this() by using super()

I have a particular programming style regarding constructors in Java that often sparks curiosity and discussion. In this blog post, I want to note my part in these discussions down.

Let’s start with the simplest example possible: A class without anything. Let’s call it a thing:

public class Thing {
}

There is not much you can do with this Thing. You can instantiate it and then call methods that are present for every Object in Java:

Thing mine = new Thing();
System.out.println(
    mine.hashCode()
);

This code tells us at least two things about the Thing class that aren’t immediately apparent:

It inherits methods from the Object class; therefore, it extends Object.
It has a constructor without any parameters, the “default constructor”.

If we were forced to write those two things in code, our class would look like this:

public class Thing extends Object {
    
    public Thing() {
        super();
    }
}

That’s a lot of noise for essentially no signal/information. But I adopted one rule from it:

Rule 1: Every production class has at least one constructor explicitly written in code.

For me, this is the textual anchor to navigate my code. Because it is the only constructor (so far), every instantiation of the class needs to call it. If I use “Callers” in my IDE on it, I see all clients that use the class by name.

Every IDE has a workaround to see the callers of the constructor(s) without pointing at some piece of code. If you are familiar with such a feature, you might use it in favor of writing explicit constructors. But every IDE works out of the box with the explicit constructor, and that’s what I chose.

There are some exceptions to Rule 1:

Test classes aren’t instantiated directly, so they don’t benefit from a constructor. See also https://schneide.blog/2024/09/30/every-unit-test-is-a-stage-play-part-iii/ for a reasoning why my test classes don’t have explicit constructors.
Record classes are syntactic sugar that don’t benefit from an explicit constructor that replaces the generated one. In fact, record classes use much of their appeal once you write constructors for them.
Anonymous inner types are oftentimes used in one place exclusively. If I need to see all their clients by using the IDE, my code is in a very problematic state, and an explicit constructor won’t help.

One thing that Rule 1 doesn’t cover is the first line of each constructor:

Rule 2: The first line of each constructor contains either a super() or a this() call.

The no-parameters call to the constructor of the superclass is done regardless of my code, but I prefer to see it in code. This is a visual cue to check Rule 3 without much effort:

Rule 3: Each class has only one constructor calling super().

If you incorporate Rule 3 into your code, the instantiation process of your objects gets much cleaner and free from duplication. It means that if you only exhibit one constructor, it calls super() – with or without parameters. If you provide more than one constructor, they form a hierarchy: One constructor is the “main” or “core” constructor. It is the one that calls super(). All the other constructors are “secondary” or “intermediate” constructors. They use this() to call the main constructor or another secondary constructor that is an intermediate step towards the main constructor.

If you visualize this construct, it forms a funnel that directs all constructor calls into the main constructor. By listing its callers, you can see all clients of your class, even those that use secondary constructors. As soon as you have two super() calls in your class, you have two separate ways to construct objects from it. I came to find this possibility way more harmful than useful. There are usually better ways to solve the client’s problem with object instantiation than to introduce a major source of current or future duplication (and the divergent change code smell). If you are interested in some of them, leave a comment, and I will write a blog entry explaining some of them.

Back to the funnel:

if you don’t see the funnel yet, let me abstract the situation a bit more:

This is how it looks in source code:

public class Thing {
    
    private final String name;
    
    public Thing(int serialNumber) {
        this(
            "S/N " + serialNumber
        );
    }
    
    public Thing(String name) {
        super();
        this.name = name;
    }
}

I find this structure very helpful to navigate complex object construction code. But I also have a heuristic that the number of secondary constructors (by visually counting the this() calls) is proportional to the amount of head scratching and resistance to change that the class will induce.

As always, there are exceptions to the rule:

Some classes are just “more specific names” for the same concept. Custom exception types come to mind (see the code example below). It is ok to have several super() calls in these classes, as long as they are clearly free from additional complexity.
Enum types cannot have the super() call in the main constructor. I don’t write a comment as a placeholder; I trust that enum types are low-complexity classes with only a few private constructors and no shenanigans.

This is an example of a multi-super-call class:

public class BadRequest extends IOException {

    public BadRequest(String message, Throwable cause) {
        super(message, cause);
    }

    public BadRequest(String message) {
        super(message);
    }
}

It clearly does nothing more than represent a more specific IOException. There won’t be many reasons to change or even just look at this code.

I might implement a variation to my Rule 2 in the future, starting with Java 22: https://openjdk.org/jeps/447. I’m looking forward to incorporating the new possibilities into my habits!

As you’ve seen, my constructor code style tries to facilitate two things:

Navigation in the project code, with anchor points for IDE functionality.
Orientation in the class code with a standard structure for easier mental mapping.

It introduces boilerplate or cruft code, but only a low amount at specific places. This is the trade-off I’m willing to make.

What are your ideas about this? Leave us a comment!

Java enum inheritance preferences are weird

Java enums were weird from their introduction in Java 5 in the year 2004. They are implemented by forcing the compiler to generate several methods based on the declaration of fields/constants in the enum class. For example, the static Enum::valueOf(String) method is only present after compilation.

But with the introduction of default methods in Java 8 (published 2014), things got a little bit weirder if you combine interfaces, default methods and enums.

Let’s look at an example:

public interface Person {

  String name();
}

Nothing exciting to see here, just a Person type that can be asked about its name. Let’s add a default implementation that makes clearly no sense at all:

public interface Person {

  default String name() {
    return UUID.randomUUID().toString();
  }
}

If you implement this interface in a class and don’t overwrite the name() method, you are the weird one:

public class ExternalEmployee implements Person {

  public ExternalEmployee() {
    super();
  }
}

We can make your weirdness visible by creating an ExternalEmployee and calling its name() method:

public class Main {

  public static void main(String[] args) {
    ExternalEmployee external = new ExternalEmployee();
    System.out.println(external.name());
  }
}

This main method prints the “name” of your external employee on the console:

1460edf7-04c7-4f59-84dc-7f9b29371419

Are you sure that you hired a human and not some robot?

But what if we are a small startup company with just a few regular employees that can be expressed by a java enum?

public enum Staff implements Person {

  michael,
  bob,
  chris,
  ;
}

You can probably predict what this little main method prints on the console:

public class Main {

  public static void main(String[] args) {
    System.out.println(
      Staff.michael.name()
    );		
  }
}

But, to our surprise, the name() method got overwritten, without us doing or declaring to do so:

michael

We ended up with the “default” generated name() method from the Java enum type. In this case, the code generated by the compiler takes precedence over the default implementation in the interface, which isn’t what we would expect at first glance.

To our grief, we can’t change this behaviour back to a state that we want by overwriting the name() method once more in our Staff class (maybe we want our employees to be named by long numbers!), because the generated name() method is declared final. From the source code of the enum class:

/**
 * @return the name of this enum constant
 */
public final String name() {
  return name;
}

The only way out of this situation is to avoid the names of methods that are generated in an enum type. For the more obscure ordinal(), this might be feasible, but name() is prone for name conflicts (heh!).

While I can change my example to getName() or something, other situations are more delicate, like this Kotlin issue documents: https://youtrack.jetbrains.com/issue/KT-14115/Enum-cant-implement-an-interface-with-method-name

And I’m really a fan of Java’s enum functionality, it has the power to be really useful in a lot of circumstances. But with great weirdness comes great confusion sometimes.

	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…
	mariuselvert on Creating functors with lambda…
	Nested queries like… on Common SQL Performance Gotchas…
	Nested queries like… on Make your users happy by not c…