A Purpose of Domain-Driven-English-German-Language-Mumbo-Jumbo

Disclaimer: Due to its nature, this blog article needs to make some use of the German language. This is part of its essence and could not be avoided – apologies to all international readers.

Since its conception in 2003, the expression “Domain-Driven Design” might have been tossed around a bit, together with all the other XYZ-Driven Designs that are out there. As usual with such terms, I only try to gather the core points of these ideas; I do not like sticking to any such concept with religious fervor or an otherwise dogmatic understanding. Moreover, these concepts are usually not of the type “you either use them or you don’t”; you have some control over the degree to which you employ them, depending on your requirements as a whole.

This is why, in a new project, I might implement a handful of ideas and see where it goes, always prepared to call it a day and toss any rule out when it endangers my progress. On the other hand, if I only follow principles that instantly convince me, I risk missing out on some practice that is merely unusual, not bad in itself.

Domain-Driven Design, in my understanding, aims at aligning the architectural details of your code base with the domain model, i.e. the technical peculiarities of your (customer’s) specific use case. That doesn’t sound hard or bad per se, but as usual, it takes some practice to shed light on.

Enter the idea of using German words in your code. For variables, methods, classes, and the like – even with umlauts and the eszett (“ß”). If one is not used to that, such code might instantly induce some sort of digestive sickness, or at least that’s what it did to me, because of its sheer look, e.g.:

// just some example to look at – surrounding declarations (anzahlHalbtage, FindNächsteZuordnung) are omitted

var sortedZuordnungen = szenario.SortedZeitplanForArbeitsplatz(arbeitsplatz.Id)
    .ToList();
var gesperrteHalbtage = sperrungen.Where(s => s.AufArbeitsplatz(arbeitsplatz.Id)).Select(s => s.Halbtag);

var nächsteZuordnung = sortedZuordnungen.FirstOrDefault();
Halbtag tryStart = Constants.HeuteVormittag;

while (nächsteZuordnung != default)
{
    tryStart.CreateListFromHere(anzahlHalbtage, gesperrteHalbtage);
    nächsteZuordnung = FindNächsteZuordnung();
}

(Replace “German” with any other language your customer might use; if you’re living in a completely English-speaking environment, this article will probably offer limited insight for you. Sorry again.)

Now, code like this – at first sight: what is this!? That’s not proper! It reads like the speech of some older German politician who never really bothered learning the English language, with some crazy dialect and whatnot!

The advantage behind this concept becomes especially apparent when dealing with a lot of very generic terms. E.g. the word “component” might just mean a button in your UI, or it might mean something very specific to your customer – or, even worse, you might mean something very specific to your customer by it, but in reality he would never refer to that entity with that word, so… you’re left with a chance of awkward bewilderment in every single meeting with the guy.

So, despite its weird look – this is one of the concepts that I haven’t tossed out the window yet. The key point is the overall reduction of friction in your thoughts. When communicating across languages, you always have to do some minor translations in your head. These can be faulty or misleading either way – the nature of the language itself is secondary.

What works for me is the following (a short sketch illustrating these rules follows the list):

  • Pure code fabrications that are close to the programming language get English names as usual
  • Things that a customer might talk about in German get a German name
  • German and English can be mixed in a single word without any shame
  • Thus, words can be long, but you have an IDE that can help with that
  • German compound words get the correct German capitalization, i.e. the equivalent of “componentNumber” would be “komponentennummer”, not “komponentenNummer”
  • The linking of two German parts happens with the correct grammatical standards, i.e. a “workPlace” becomes an “arbeitsplatz” with the “s” in between (the Fugen-s).
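
To make the rules above concrete, here is a minimal sketch. The class and member names are invented for illustration only, not taken from any real project:

using System.Collections.Generic;
using System.Linq;

public class Komponente { public string Komponentennummer { get; set; } }
public class Arbeitsplatz { public int Id { get; set; } }

// Pure code fabrication (controller, list, LINQ): English names as usual.
// Domain terms (Komponente, Arbeitsplatz): German, compounded the way the customer says them.
public class KomponentenController
{
    private readonly List<Komponente> komponenten = new List<Komponente>();

    // one German compound word, so no capital letter in the middle:
    // komponentennummer, not komponentenNummer
    public Komponente FindByKomponentennummer(string komponentennummer) =>
        komponenten.FirstOrDefault(k => k.Komponentennummer == komponentennummer);

    // "arbeitsplatz" keeps its Fugen-s, just like the spoken word
    public void AddArbeitsplatz(Arbeitsplatz arbeitsplatz) { /* ... */ }
}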

For some reason, this has by now resulted in quite an uninterrupted workflow for me. The last two rules were an interesting finding, because I noticed that without them, I made a noticeable pause in my thinking process whenever I thought about these entities. This pause is now gone.

E.g. by now, the cognitive load of talking about a “KomponentenController” – something that is a controller from a software-engineering point of view and deals with components from a domain point of view – feels lower to me than having to talk about a “ComponentController” with the extra translation between Component and Komponente. Mind you, there are enough words that do not sound that similar in our two languages.

I will not use this concept in every single project I start from now on. E.g. for hobby projects (where I’m my own customer), I would still prefer the 100%-English-language solution. But depending on your project, this is worth a try, and I’m positively amazed at how well it can work.

Ignoring YAGNI – 12 years later

Fourteen years ago, we started to build a distributed system to gather environmental data in an automated 24/7 fashion. Our development process was agile and made heavy use of short iterations (at least that was what they were then, today they are normal-sized). So the system grew with many small new features and improvements, giving the customer immediate business value.

One part of the system was the task scheduler. Because the system had to run 24/7 and be mostly independent of human interaction, the task scheduler’s job was to launch different measurement processes at the right time. We had done extensive domain crunching and figured out that all tasks follow a rigid time regime like “start every 10 minutes” or “start every hour”, regardless of the processes’ runtime. This made the scheduler rather easy to develop. You should keep it simple, after all.

But another result of the domain crunching bothered us: the schedule of all tasks originated from the previous software system, built 30 years ago and definitely unfit for the modern software world. The schedules weren’t really rooted in the domain; they all had technical explanations like “the recording of the values is done sequentially and takes up to 8 minutes, we can’t record them more often than that”. For our project, the measurement hardware was changed, so our recording took a couple of milliseconds. We could store and display the values continuously if the need arose.

So we discussed the required simplicity or complexity of the task scheduler with the customer, and they seemed pleased with all the new possibilities. But they decided that the current schedules were sufficient and didn’t need to be changed. We could go ahead and build our simple task scheduler.

And this is when we decided to abandon KISS and make the task scheduler more powerful than needed. “But you ain’t going to need it!” was the enemy. Because we knew that the customer would inevitably come around and make use of their new possibilities. We knew that if we built the system with more complexity, we would be the heroes in a future time, wearing a smug smile and telling the customer: “We’ve already built this, you can use it right away”. Oh, how glorious this prospect of the future shone! Just a few more thoughts going into the code and we were set for a bright future.

Let me tell you a few details about those “few more thoughts”, using the example of an “every hour” task schedule. Instead of hard-coding the schedule, we added a configuration file with a cron-like expression for the schedule. You could now leverage the power of cron expressions to design your schedule as you saw fit. If you wanted to change the schedule from “every hour” to “every odd minute and when the pale moon rises”, you could do so. The task scheduler had to interpret the configuration file and make sure that tasks don’t pile up: if you schedule a task to run “every minute”, but it takes two minutes to process, you’ve essentially built a time bomb for your system load. The scheduler had to rule this out.
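
To give an impression of the kind of logic this bought us, here is a minimal sketch of the “tasks must not pile up” rule, assuming a nextOccurrence function that interprets the configured cron-like expression. All names are mine, not the original system’s:

using System;

public static class ScheduleGuard
{
    // Skip occurrences that would start while the previous run is still busy,
    // instead of queueing them up into a load time bomb.
    public static DateTime NextAllowedStart(DateTime previousStart,
                                            TimeSpan lastRunDuration,
                                            Func<DateTime, DateTime> nextOccurrence)
    {
        DateTime earliestFinish = previousStart + lastRunDuration;
        DateTime candidate = nextOccurrence(previousStart);
        while (candidate < earliestFinish)
        {
            candidate = nextOccurrence(candidate);
        }
        return candidate;
    }
}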

But it doesn’t stop there. A lot of functionality, most of which wasn’t even present or outlined at the time of our decision, relies implicitly on that schedule. Two examples: There are manual operations that must not be performed during the execution of the task. The system goes into a “protected state” around the task execution. It disables these operations a few minutes before the scheduled execution and even some time afterwards. If you had a fixed schedule of “every hour”, you could even hard-code the protected timespan. With a possible dynamic schedule, you have to calculate your timespan based on the current schedule and warn your operator if it isn’t possible anymore to find a time slot to even perform the manual operation.
The second example is a functionality that supervises the completeness of the recorded data. The problem is: This functionality is on another computer (it’s a distributed system, remember?) that doesn’t know about the configuration files. To be able to scan the data archive and say “everything that should be there, is there”, the second computer needs to know about all the schedules of the first computers (there are many of them, recording their data on their own schedules and transferring it to the second computer). And if a schedule changes, the second computer needs to take the change into account and scan the data archive for two areas: one area with the old schedule and one area with the new schedule. Otherwise, there would be false alarms.

You can probably see that the one decision to make the task scheduler a little more complex and configurable than required had quite some impact on the complexity of other parts of the system. But this investment will be worth it as soon as the customer changes the schedule! The whole system is programmed, tested and documented to facilitate schedule changes. We are ready!

It’s been over twelve years since we wrote the first line of code for the more complex implementation (I’ve checked the source control logs). The customer hasn’t changed a single bit of the schedule yet. There are over twenty “first computers” and they all still run the same task schedule as initially planned. Our decision did nothing but add accidental complexity to the system. It probably introduced some bugs along the way, too. It certainly increased our required level of awareness (the “hurdle of understanding”) during the development of features that are somewhat coupled with the task schedule.

In short: it’s been a disaster. The smug smile we thought we’d wear has been replaced by a deep frown. Who wrote all that mess? And why? It wasn’t the customer, it was us. We are never going to need it.

Domain-aligned bugs

Imagine that you are a user of a typical enterprise software system that handles commercial products and their prices. There are different prices in the software that are somehow related to each other. There is the purchase price that indicates your cost if you buy the product. There is the retail price that gets listed in your price lists and is paid by your customers, should they buy the product. You probably already figured out that the retail price should never be lower than the purchase price, because that would mean you lose money with every successful sale.

Let’s say that the enterprise software not only handles products, but also parts. Several parts combined, with some manufacturing effort, result in a product. Each part has a purchase price, the resulting product has a retail price. The retail price of the product should be higher than the sum of purchase prices of the parts. If it isn’t, you lose the costs of the manufacturing effort and some extra money with every successful sale.

If for any reason you cannot clearly estimate your manufacturing effort, the enterprise software has another input field for an amount of money that you can add to the sum of the parts’ costs. We call this field the “sales bonus”. So, if you sell a product made up of parts, your customer has to pay a price that consists at least of the retail prices of the parts and the sales bonus. Of course, your customer has an individual discount percentage that needs to be subtracted from the total price. Are you still following?

You are now thinking in the domain of price determination and financial mathematics. If you were the user of said enterprise software, you’d probably expect some bugs like these:

  • It is possible to enter a retail price lower than the purchase price
  • The price of products manufactured from parts isn’t calculated correctly
  • It is possible to enter a negative sales bonus
  • The total price with discount could be lower than the sum of purchase prices of the parts without a warning

All of them are bugs in the domain. All of them can be explained to a domain expert or a user with terms and concepts from the domain.

But what about the bug where you sell a product that consists of three parts, each with a retail price of 10 €, and a sales bonus of 5 €? You want to create a quote for your customer and the price shows up as 34,99999999998 €. You are a bit bewildered and try to counteract the apparent rounding error by changing the sales bonus to 5,00000000002 €. After this change you get another crazy total price, and the prices in the database differ from what you entered, too. Everything seems to destabilize and deviate further and further from clear-cut prices.

As a programmer, you know what happened. You know what caused this effect of numerical instability: somebody stored monetary values in a floating-point number. You know that is a bad idea and you’d never do it. But this blog post isn’t about you or what you should or shouldn’t do. It is about the user, an expert in his domain, who stumbles over the bug as described and has to make some decision on how to fix it. This user cannot use any knowledge from the domain to even understand the mechanics of the bug. You, as the programmer, cannot explain this bug in terms of things the user already knows. You need to be vague (“the software doesn’t store the exact values, just approximations”) or introduce additional complexity (“we store this value by splitting it into a significand and multiplying it by a factor consisting of a fixed base and an exponent. We can omit the base and just store the significand and the exponent and express a very large numerical range in just a few bits. Think about how cool that is!”).
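
For the programmers among us, a minimal sketch of the mechanism (in C#; the enterprise software in the story is not necessarily written in C#): binary floating point cannot represent most decimal fractions exactly, while a decimal type keeps monetary values exact.

using System;

class MoneyExample
{
    static void Main()
    {
        // summing one hundred 10-cent amounts in binary floating point
        double asDouble = 0.0;
        for (int i = 0; i < 100; i++) asDouble += 0.1;
        Console.WriteLine(asDouble == 10.0);        // False – the binary sum has drifted
        Console.WriteLine(asDouble.ToString("R"));  // something like 9.99999999999998

        // the same sum with a decimal type keeps the value exact
        decimal asDecimal = 0.0m;
        for (int i = 0; i < 100; i++) asDecimal += 0.1m;
        Console.WriteLine(asDecimal == 10.0m);      // True
    }
}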

Read the last explanation again, from the viewpoint of a salesman. We want to add some prices in the range of a few €, slap a moderate discount on top and call it a day. We don’t care about bits or exponential formulas. That is not part of our domain and it shouldn’t affect our domain or software that works in our domain. Confronting us with technical details reflects negatively on your ability to solve our problems. You seem to burden us with your problems in exchange.

As domain experts, we want only domain-aligned bugs.

The Pure Fabrication Tax

Last week, I attended the Maexle event of the C++ user group in Karlsruhe. The Maexle event is basically a programming contest where your program plays the Mia dice game against other programs. You have to implement a simple network protocol to join games, announce rolls and call bluffs. Your program earns points for every game it has participated in and not lost. So there is a strong emphasis on starting early and staying in the game, even if your program doesn’t perform the best.

Since it was an event of the C++ user group, the programming language to be used was C++. I’m certainly no C++ hero and knew I couldn’t compete, so I joined the fun with an espionage role and programmed an observing bot that doesn’t play, but gathers data on the players. I chose Java for the task. My observer was online after two minutes, the first real player joined the server after 20 minutes. It turned out to be written in Python. The first real C++ bot was online after 35 minutes, the last one played its first round after two hours.

I listened closely to the problems the teams around me tackled and noticed something strange: Nobody talked about the actual game (Maexle/Mia). Every task was a technical one. Let’s talk about why that’s a problem.

Three Definitions

Before I dive into the subject, I want to define some terms that I’ll use to help you understand my point. It’s entirely possible to look at the story above and see a bunch of engineers having fun with some engineering tasks.

  • First, I value the economics of my customer. In this case, the customer is a lonely server on the LAN that wants to host some games of Maexle for bots. Like, lots of games. Thousands of games. The customer gives points for early market entry, so time to market is an economic factor (or a key performance indicator, some might say). You can roughly say that being online early means you can make bucks longer. The second key performance indicator is uptime. You want to stay in the game as long as you don’t lose all the time. There are some more KPIs, but the two I’ve listed should have a major impact on your programming approach – if you value the economics.
  • Second, I don’t care about tools. A programming language is a tool. A compiler is a tool. Your IDE or text editor is a tool. Use your preferred tooling as long as it suits your needs. That means, explicitly, as long as it doesn’t actively work against your other values, like the customer’s economics. This blog post is definitely not about Java or Python being “better” or “better suited” than C++. They aren’t. The first two bots (observer and player) were programmed by participants who had prior experience with the event. It wasn’t the tool that made them fast, it was the absence of rookie errors in the domain and its technical structure.
  • Third, I will explain my point with the concept of “pure fabrication”. Pure fabrication is everything that is not specified by the customer, but necessary to fulfill the specification. It’s the code you write because your customer wants to persist some data. He never ordered you to write SQL statements or to “open a connection to the database”; maybe he didn’t even know what a database was. Your customer wanted the data stored somehow. The code that enables you to actually program the storage is “pure fabrication” in terms of the domain. Think of it as scaffolding holding your domain code in place. If you hire a painter to paint your house, he will scaffold the walls to reach every spot with ease. You didn’t hire him to set those structures up, they are just necessary for the task. The difference from most of our code is that the painter removes the scaffolding afterwards.

Pure Fabrication vs. Domain

So, if I had been a customer at the Maexle event, paying for a competitive Maexle bot, I would have been very surprised about the actual construction process. Up to two hours into a three-hour event, my programmer would solve apparently hard and important problems, but not my problems. In fact, I wouldn’t even understand the relation between the attempted problems and my required solution. And for more than half my money, I would have to have blind faith that something usable would come out of this.

This is the effect of too much pure fabrication in the programming approach. I’m all for solving hard programming problems, but I’m not interested in solving them over and over again. After some iterations, they become solved problems or, essentially, tools. And I don’t care about tools as long as they get their job done. If your domain problem requires a better tool, then we can put the programming problem on our to-do list again. Otherwise, we are not valuing our customer’s economics, we are showing off to our peers.

If you program a simple game of Maexle with a heavy emphasis on time to market and, even after the initial ramp-up, aren’t able to reason about your code using language from the domain (like game, dice, roll, bluff, double and, of course, mia), you are staying in pure fabrication land. That’s the level of programming where it matters whether you used an integer or freed that memory. That’s when you pay the Pure Fabrication Tax to the fullest. Because your code now does something valuable in the domain, but the distance between your customer’s language and your code’s language is a hindrance. And this distance will demand its tax with every new feature, every change request and every bug.

Bugs are another area where the distance is measurable. If you can’t explain your bugs to the customer, you’ve made them in the pure fabrication part of your code. If you can never explain your bugs, your domain code is hidden between lines and lines of source code with lots of special characters, brackets and magic numbers. Just imagine your hired painter tries to tell you why your house is now pink instead of white or yellow: “It was a small mishap in the way we constructed the scaffolding, we used an E5 steel beam instead of a rail clamp and forgot to perform a hammer check on it”. The last part is totally made up, but I’m sure that’s how we sound to non-programmers.

Exemptions from the Tax?

What solution would I suggest? I don’t think there is a definite solution to the problem. You can’t go full Domain-Driven Design on a three-hour Maexle event. By the time you’ve built your fancy domain-specific language to write code with the customer beside you, everybody else has gathered their game points and gone home. If you switch to a language that has a string tokenizer in its standard library, you can speed up your programming, but maybe you’ll just produce a bigger heap of slightly less low-level pure fabrication code.

I don’t want to advocate a solution. My attempt is to highlight the problem: The Pure Fabrication Tax. Given the right (or wrong) amount of extrinsic (or intrinsic) motivation, we are able to produce a mess in just a few hours without really connecting to the domain we produce the mess for. If we didn’t program a Maexle bot that night, but a poker bot or a chat bot, most if not all of the problems and bugs would have been the same. This is not a domain-specific problem. It’s our problem. We probably just like to pay the tax.

What are your thoughts on the Pure Fabrication Tax? Can you see it? Do you have an idea for a solution approach? Leave your comment below!

Disclaimer

And to counter everybody who thinks I’m just bashing the other participants at the event: I was the first one online on the server, with a task that required virtually no effort and didn’t even participate directly in the competition, with tools that solved nearly all my pure fabrication problems – and I still managed to create a program that contained fewer than five domain terms and was useless for its intended purpose. I said I value the economics of my customer (even if there was none), so I know that I failed hardest at the event. And I had prior knowledge. There was just nobody to compare my mess to.

A small example of domain analysis

One thing I’ve learned a lot about in recent years is domain analysis and domain modeling. Every once in a while, an isolated piece of code or a separable concept shows me just how much I had missed out on in all the years before. A few weeks ago, I came across such an example and want to share the experience and insight. It’s a story about domain exploration with a heightened degree of difficulty – another programmer had analyzed it before and written code that I was to replace. But first, let’s talk about the domain.

The domain

The project consisted of a machine control software that receives commands and alters the state of a complex electronic circuitry accordingly. The circuitry consists of several digital-to-analog converters (DACs), among other parts. We will concentrate on the DACs in this story. In case you don’t know what a DAC is, let me explain. Imagine a little integrated circuit (IC), one of the black bug-like electronic parts on a circuit board. On one side, you provide it a digital number in binary representation, and on the other side, you get an analog voltage that represents your number. Let’s say you drive an 8-bit DAC and give it a digital zero: the output will be zero volts. If you give the same DAC the number 255, it will output the maximum possible voltage. This voltage is given by the “reference voltage” pin and is usually tied to 5 V in traditional TTL logic circuits. If you drive a 12-bit DAC, the zero will still yield 0 V, while the 255 will now only yield about 0.3 V, because the maximum digital number is now 4095. So the resolution of a DAC, given in bits, is a big deal for the driver.
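
To put the relation into code, here is a small sketch for illustration – the method name and values are mine, not the project’s:

using System;

static class DacMath
{
    // output voltage = digital value / (2^bits - 1) * reference voltage
    static double OutputVoltage(int value, int resolutionBits, double referenceVoltage) =>
        value / (double)((1 << resolutionBits) - 1) * referenceVoltage;

    static void Main()
    {
        Console.WriteLine(OutputVoltage(255, 8, 5.0));   // 5     – full scale on an 8-bit DAC
        Console.WriteLine(OutputVoltage(255, 12, 5.0));  // ~0.31 – the same number on a 12-bit DAC
    }
}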

How exactly you have to provide that digital number, and what additional signals need to be set or cleared to really get the analog voltage, is up to the specific type of DAC. So this is the part of the behaviour that should be encapsulated inside a DAC class. The rest of the software should only be able to change the digital number, using a method on a particular DAC object. That’s our modeling task.

The original implementation

My job was not to develop the machine control software from scratch, but to re-engineer it from existing sources. The code is written in plain C by an electronics technician, and it really shows. For our DAC driver, there was a function that took one argument – an integer value that would be written to the DAC. If the client code was lazy enough not to check the bounds of the DAC, you would see all kinds of overflow effects. It worked, but only if the client code knew about the resolution of the DAC and checked the bounds. One task the machine control software needed to do was to translate the command parameters, given in millivolts, to the correct integer number to feed into the DAC and receive the desired millivolts at the analog output pin. This calculation, albeit not very complicated, was duplicated all over the place.


writeDAC(int value);
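
Every call site had to do the millivolt-to-integer scaling itself, roughly like this (a reconstruction for illustration – the original was plain C, and all names and constants here are invented):

// Reconstruction for illustration only.
// The point: every caller had to know the resolution and the reference voltage itself.
static class LegacyStyleClientCode
{
    const int MaxDacValue = 4095;          // 12-bit resolution, known by every call site
    const int ReferenceMillivolts = 5000;  // 5 V reference, also known by every call site

    static void WriteDac(int value) { /* pokes the hardware register */ }

    static void SetOutputTo(int requiredMillivolts) =>
        WriteDac(requiredMillivolts * MaxDacValue / ReferenceMillivolts);

    static void Main() => SetOutputTo(2500);   // half scale: writes 2047
}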

My original translation

One primary aspect of re-engineering work is not to assume too much and not to change too many places at once. So my first translation was a method on the DAC objects requiring the exact integer value that should be written. The method would internally check for the valid value range, because the object knows about the DAC resolution, while the client code should subsequently lose this knowledge. The original code translated nicely to this new structure and worked correctly, but I wasn’t happy with it. To provide the correct integer value, the client code still needs to know about the DAC resolution and perform the calculation from millivolts to DAC value. Even if you centralize the calculation, there are still calls to it from everywhere.


dac.write(int value);
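
A minimal sketch of this stage (names invented, not the project’s code): the DAC object owns the bounds check, but the caller still computes the raw integer.

using System;

public class Dac
{
    private readonly int maxValue;   // e.g. 4095 for a 12-bit DAC

    public Dac(int resolutionBits) => maxValue = (1 << resolutionBits) - 1;

    public void Write(int value)
    {
        if (value < 0 || value > maxValue)
            throw new ArgumentOutOfRangeException(nameof(value));
        // ... set the hardware pins for this particular DAC type ...
    }
}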

My first revelation

When I had finally translated all the existing code, I knew that every single call to the DAC got its parameter in millivolts, but needed to set the DAC integer. Now I knew that the client code never cared about DAC integers at all; it cared about millivolts. If you find such a revelation, act on it – even if just to see where it might lead you. I acted and replaced the integer parameter of the write method on the DAC object with a voltage parameter. I created the Voltage domain type and had it expose factory methods so it could easily be created from the millivolts that were represented as integers in the commands the machine control software received. Now the client code only needed to create a Voltage object and pass it to the DAC to have that voltage show up at the analog output pin. The whole calculation and checking part happened inside the DAC object, where it belongs.


dac.write(Voltage required);
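
A minimal sketch of the millivolt-based version (types and names are illustrative, not the project’s actual code):

using System;

public readonly struct Voltage
{
    public int Millivolts { get; }
    private Voltage(int millivolts) => Millivolts = millivolts;
    public static Voltage FromMillivolts(int millivolts) => new Voltage(millivolts);
}

public class Dac
{
    private readonly int maxValue;             // e.g. 4095 for a 12-bit DAC
    private readonly int referenceMillivolts;  // e.g. 5000 for a 5 V reference

    public Dac(int resolutionBits, int referenceMillivolts)
    {
        maxValue = (1 << resolutionBits) - 1;
        this.referenceMillivolts = referenceMillivolts;
    }

    public void Write(Voltage required)
    {
        // the calculation and the bounds check now live where the knowledge lives
        int value = required.Millivolts * maxValue / referenceMillivolts;
        if (value < 0 || value > maxValue)
            throw new ArgumentOutOfRangeException(nameof(required));
        // ... set the hardware pins ...
    }
}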

This version of the code was easy to read, easy to reason about and worked like a charm. It went into production and could have been the end of the story.

The second insight

But the customer had other plans. He replaced parts of the original circuitry and upgraded most of the DACs on the way. Now there was only one type of DAC, but with additional amplifier functionality for some output pins (a typical DAC has several output pins that can be controlled by a pin address that is provided alongside the digital number). The code needed to drive the DACs, which were bound to a 5 V reference voltage, but some channels were amplified to double the voltage, providing a voltage range from 0 V to 10 V. If you want to set one of those channels to 5 V output voltage, you need to write half the maximum number to it. If the DAC has 12-bit resolution, you need to write 2047 (or 2048, depending on your rounding strategy) to it. Writing 4095 would yield 10 V on those channels.

Because the amplification isn’t part of the DAC itself, the DAC code shouldn’t know about it. This knowledge should be placed in a wrapper layer around the DAC objects, taking the voltage parameters from the client code and changing them according to the amplification of the channel. The client code would want to write 10 V and pass it to the wrapper layer, which knows about the amplification and reduces it to 5 V, passing this to the DAC object, which transforms it to the maximum reference voltage (5 V) that subsequently gets amplified to 10 V. This sounded so weird that I decided to review my domain analysis.

It dawned on me that the DAC domain never really cared about millivolts or voltages. Sure, the output will be a specific voltage, but it will be your input value in relation to the maximum value. The output voltage is the same percentage of the reference voltage as the input value is of the maximum digital value. It’s all about ratios. The DAC should always demand a percentage from the client code, not a voltage. This way, you can actually give it the ratio of anything and it will express this ratio as a voltage relative to the reference voltage. The DAC is defined by its core characteristics, and the wrapper layer performs the translation from required voltage to percentage. In case of amplification, it is accounted for in this translation – the DAC never needs to know.


dac.write(Percentage required);

Expressiveness of the new concept

Now we can really describe in code what actually happens: a command arrives, requiring us to set a DAC channel to 8 volts. We create the voltage object for 8 volts and pass it on to the DAC wrapper layer. The layer knows about the 2x amplification and the reference voltage. It calculates that 8 volts will be 80% of the maximum DAC value (80% of 5 V being 4 V before and 8 V after amplification) and passes this information to the DAC object. The DAC object, being the only one to know its resolution, writes 0.8 * maximum_DAC_value to the required register and everything works.

The new concept of percentages decouples the voltage information from the DAC resolution information and keeps both pieces of information where they belong. In fact, the DAC chip never really knows about the reference voltage either – it’s the circuit around it that knows.
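
A condensed sketch of this final structure (again with invented names; the real code certainly differs in detail):

using System;

public readonly struct Voltage
{
    public int Millivolts { get; }
    private Voltage(int millivolts) => Millivolts = millivolts;
    public static Voltage FromMillivolts(int millivolts) => new Voltage(millivolts);
}

public readonly struct Percentage
{
    public double Ratio { get; }   // 0.0 .. 1.0
    private Percentage(double ratio) => Ratio = ratio;
    public static Percentage Of(double part, double whole) => new Percentage(part / whole);
}

// The DAC knows only its resolution and turns a ratio into a register value.
public class Dac
{
    private readonly int maxValue;
    public Dac(int resolutionBits) => maxValue = (1 << resolutionBits) - 1;

    public void Write(Percentage required)
    {
        int register = (int)Math.Round(required.Ratio * maxValue);
        // ... set the hardware pins to 'register' ...
    }
}

// The circuit board knows the reference voltage and the per-channel amplification.
public class CircuitBoard
{
    private readonly Dac dac;
    private readonly int referenceMillivolts;  // e.g. 5000
    private readonly int amplification;        // e.g. 2 for the amplified channels

    public CircuitBoard(Dac dac, int referenceMillivolts, int amplification)
    {
        this.dac = dac;
        this.referenceMillivolts = referenceMillivolts;
        this.amplification = amplification;
    }

    // 8 V on a 2x-amplified channel: 8000 / (5000 * 2) = 80 % of the maximum DAC value
    public void Set(Voltage required) =>
        dac.Write(Percentage.Of(required.Millivolts, referenceMillivolts * amplification));
}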

Conclusion

While it is easy to see why the first version with voltages as parameters has its charms, it doesn’t model reality accurately and therefore falls short when flexibility is required. The first version ties DAC resolution and reference voltage together, when in fact the DAC chip only knows the resolution. You can operate the chip with any reference voltage within a valid range. By decoupling those pieces of information and moving the knowledge about reference voltages outside the DAC object, I modeled reality more accurately, and every requirement finds its natural place. This “natural place finding” is what makes a good model useful for reasoning. In our case, the natural place for the reference voltage was outside the DAC, in the wrapper layer. Finding a real name for the wrapper layer was easy: I called it “circuit board”.

Domain analysis is all about having the right abstractions for your model. Your model is suitable for your task when everything fits and falls into place nearly automatically – when names needn’t be invented but kind of suggest themselves from the real domain. The right model (for the given task) feels good and transports a lot of domain knowledge. And domain knowledge is the most valuable knowledge for any developer.