Ignoring YAGNI – 12 years later

Fourteen years ago, we started building a distributed system to gather environmental data in an automated 24/7 fashion. Our development process was agile and made heavy use of short iterations (at least, they were short back then; today they would count as normal-sized). So the system grew through many small new features and improvements, giving the customer immediate business value.

One part of the system was the task scheduler. Because the system had to run 24/7 and be mostly independent of human interaction, the task scheduler’s job was to launch the different measurement processes at the right time. We had done extensive domain crunching and figured out that all tasks followed a rigid time regime like “start every 10 minutes” or “start every hour”, regardless of the processes’ runtime. This made the scheduler rather easy to develop. You should keep it simple, after all.

But another result of the domain crunching bothered us: the schedule of all tasks originated from the previous software system, built 30 years earlier and definitely unfit for the modern software world. The schedules weren’t really rooted in the domain; they all had technical explanations like “the recording of the values is done sequentially and takes up to 8 minutes, so we can’t record them more often than that”. For our project, the measurement hardware had changed, so our recording took a couple of milliseconds. We could have stored and displayed the values continuously, had the need arisen.

So we discussed the required simplicity or complexity of the task scheduler with the customer, and they seemed pleased with all the new possibilities. But they decided that the current schedules were sufficient and didn’t need to be changed. We could go ahead and build our simple task scheduler.

And this is when we decided to abandon KISS and make the task scheduler more powerful than needed. “But you ain’t going to need it!” was the enemy. Because we knew that the customer would inevitably come around and make use of their new possibilities. We knew that if we built the system with more complexity, we would be the heroes in a future time, wearing a smug smile and telling the customer: “We’ve already built this, you can use it right away”. Oh, how glorious this prospect of the future shone! Just a few more thoughts going into the code and we were set for a bright future.

Let me tell you a few details about those “few more thoughts”, using an “every hour” task schedule as the example. Instead of hard-coding the schedule, we added a configuration file with a cron-like expression for the schedule. You could now leverage the power of cron expressions to design your schedule as you saw fit. If you wanted to change the schedule from “every hour” to “every odd minute and when the pale moon rises”, you could do so. The task scheduler had to interpret the configuration file and make sure that tasks didn’t pile up: if you schedule a task to run “every minute”, but it takes two minutes to process, you’ve essentially built a time bomb for your system load. The scheduler had to rule this out.
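To make the pile-up protection concrete, here is a minimal sketch, not our original code: it assumes plain intervals parsed from the configuration file instead of full cron expressions (which only make the problem harder), and it simply skips a run, with a warning, when the previous execution of the same task is still in progress.

```python
# Minimal sketch of a configurable scheduler with pile-up protection.
# Assumption: schedules are plain intervals in seconds; the real system
# used cron-like expressions.
import logging
import threading
import time

logging.basicConfig(level=logging.INFO)

class ScheduledTask:
    def __init__(self, name, interval_seconds, action):
        self.name = name
        self.interval = interval_seconds  # parsed from the config file
        self.action = action
        self._running = threading.Lock()

    def start_run(self):
        # Non-blocking acquire: if the previous run still holds the
        # lock, we skip this cycle instead of stacking executions.
        if not self._running.acquire(blocking=False):
            logging.warning("%s: previous run still active, skipping", self.name)
            return
        try:
            self.action()
        finally:
            self._running.release()

def run_scheduler(tasks):
    # Each task remembers its next due time, so a slow loop iteration
    # cannot make a task fire twice for the same slot.
    next_due = {task: time.monotonic() for task in tasks}
    while True:
        now = time.monotonic()
        for task in tasks:
            if now >= next_due[task]:
                next_due[task] += task.interval
                threading.Thread(target=task.start_run).start()
        time.sleep(0.5)
```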

But it doesn’t stop there. A lot of functionality, most of which wasn’t even present or outlined at the time of our decision, relies implicitly on that schedule. Two examples: First, there are manual operations that must not be performed during the execution of the task. The system goes into a “protected state” around the task execution: it disables these operations a few minutes before the scheduled execution and keeps them disabled for some time afterwards. If you had a fixed schedule of “every hour”, you could even hard-code the protected timespan. With a potentially dynamic schedule, you have to calculate your timespan based on the current schedule and warn your operator if it isn’t possible anymore to find a time slot to even perform the manual operation.
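Here is a hedged sketch of that slot calculation; the guard durations and the schedule representation (an iterable of upcoming run times) are illustrative assumptions, not our original code.

```python
# Illustrative sketch: find a start time for a manual operation that
# avoids all "protected" windows around scheduled task executions.
# The guard durations and the schedule horizon are assumptions.
from datetime import datetime, timedelta

PRE_GUARD = timedelta(minutes=5)   # operations disabled before a run
POST_GUARD = timedelta(minutes=2)  # ... and for a while afterwards

def find_slot(upcoming_runs, now, operation_duration):
    """Return the earliest start time for the manual operation that
    overlaps no protected window, or None if the known schedule
    horizon leaves no gap big enough (time to warn the operator)."""
    candidate = now
    for run_time in sorted(upcoming_runs):
        window_start = run_time - PRE_GUARD
        window_end = run_time + POST_GUARD
        if candidate + operation_duration <= window_start:
            return candidate  # fits entirely before this window
        candidate = max(candidate, window_end)
    return None  # no gap within the examined horizon
```

With the fixed “every hour” schedule, this whole calculation collapses into one hard-coded timespan.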
The second example is a functionality that supervises the completeness of the recorded data. The problem is: this functionality runs on another computer (it’s a distributed system, remember?) that doesn’t know about the configuration files. To be able to scan the data archive and say “everything that should be there, is there”, the second computer needs to know the schedules of all the first computers (there are many of them, each recording data on its own schedule and transferring it to the second computer). And if a schedule changes, the second computer needs to take the change into account and scan the data archive in two areas: one area governed by the old schedule and one by the new schedule. Otherwise, there would be false alarms.
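A sketch of what that supervision has to compute, again with plain intervals standing in for cron expressions and all names being illustrative:

```python
# Illustrative sketch: the supervising computer derives the timestamps
# it expects in the data archive. A schedule change at change_time
# splits the expectation into two areas; comparing against the actual
# archive contents then avoids false alarms.
# Assumption: the first run under the new schedule happens exactly at
# the change.
from datetime import datetime, timedelta

def expected_timestamps(old_interval, new_interval, start, change_time, end):
    expected = []
    t = start
    while t < change_time:          # area governed by the old schedule
        expected.append(t)
        t += old_interval
    t = change_time
    while t <= end:                 # area governed by the new schedule
        expected.append(t)
        t += new_interval
    return expected

def missing_records(archive_timestamps, expected):
    # Everything that should be there but isn't.
    return sorted(set(expected) - set(archive_timestamps))

# Example: a recording switched from hourly to every 10 minutes at noon.
noon = datetime(2024, 1, 1, 12, 0)
expected = expected_timestamps(timedelta(hours=1), timedelta(minutes=10),
                               datetime(2024, 1, 1), noon,
                               datetime(2024, 1, 1, 13))
```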

You can probably see that the one decision to make the task scheduler a little more complex and configurable than required had quite some impact on the complexity of other parts of the system. But this investment will be worth it as soon as the customer changes the schedule! The whole system is programmed, tested and documented to facilitate schedule changes. We are ready!

It’s been over twelve years since we wrote the first line of code for the more complex implementation (I’ve checked the source control logs). The customer hasn’t changed a single bit of the schedule yet. There are over twenty “first computers”, and they all still run the same task schedule as initially planned. Our decision did nothing but add accidental complexity to the system. It probably introduced some bugs along the way, too. It certainly raised the required level of awareness (the “hurdle of understanding”) during the development of features that are somewhat coupled to the task schedule.

In short: it’s been a disaster. The smug smile we thought we’d wear has been replaced by a deep frown. Who wrote all that mess? And why? It wasn’t the customer, it was us. We were never going to need it.

Quick and dirty is a skill

This post is a reminder, first and foremost to myself, that it is important to work according to what’s essential right now: quality or speed.

Being clean coders, we build our software with a focus on quality and reflect on how we do it. We set internal standards in code and UIs, we write tests, we polish.
But there are times when all this focus on quality is obstructive: times when we need to learn something. For example, at the start of a project, when fundamental questions are still unanswered (is it feasible? how should this interaction work? what’s the right order of steps?), learning needs to be as cheap as possible.
Here, quick and dirty is important. The problem is our ego. We want to polish our work, we want to build real software with a sound structure. But quality takes time, and quality is not what matters when answering the fundamental project questions; learning is. Maybe a mockup in PowerPoint is enough? (Not even writing code? Ugh.) A simple sketch on a piece of paper. Or maybe just a quick demo hacked together in an afternoon.
I know these suggestions may insult our pride. But we need to focus on what’s important: sometimes that’s quality, sometimes that’s speed.
Decades ago, when I started coding, quick and dirty wasn’t a problem. Everything I wrote was quick and dirty. I was learning all the time. Over time I got better at developing software, structuring applications and building robust systems. But quick and dirty was lost along the way.
When you write something for the purpose of learning, it can happen that you are wrong and all the code has to be thrown away. If it was just two hours of patching something together, that’s okay; but what if you spent a whole week? Just like writing quality software, quick and dirty is a skill in itself, and as with other skills, we need to practice it.
But beware: this is not only a problem at the start of a project. As developers, we often tend to overthink things; we plan for every possible outcome and imagine scenarios with weirdly acting users or systems. This is the time to stop and implement something in order to learn, to get feedback. Not to overanalyse or overdesign. Just release something and test it with real users; it doesn’t need to be part of the software in production, a demo or a staging environment will do. If you need to learn something, focus on that, not on quality.

There should be a stakeholder for simplicity

You have stakeholders for your product idea, you have stakeholders for your clients and their ideas, you have stakeholders for clean code, for quality, for object-oriented programming, … I think we also need stakeholders for simplicity.

Usually, ideas are abundant in every project. Your client has many ideas about what features he wants. You and your coworkers have a rich background in the problem domain and in the technologies you use, so solutions are never in short supply. But often this experience and confidence leads to abandoning an important trait: simplicity.
Most of the time you are focused on finding good solutions for the problems that arise. From past experiences with clients who could not exactly explain what they really need (which, most of the time, is different from what they want), you tend to include a little extra flexibility in your system. Maybe you will need this or that variation some time in the future, but at this very moment it’s a guess at best. And often this guess costs you. You could argue that you are investing in your project. But how many times did this investment really pay off? And how many times did you have a hard time just because you did not want to make restrictions? At first it seems like little work. But with the next feature you have to keep supporting your little ‘extra’. Over time it infuses your system like leaven infuses bread. In the end it is more work to make it simple than to keep it simple.
So with every project you approach, there should be a stakeholder for simplicity: someone who focuses on simple solutions. Sometimes you have to cut a bit away from a feature, or you have to view the problem from a different angle. Other times you have to dig deeper into the problem domain, or you need real data from your users (which is always better than what you can make up in your mind). Finding simple solutions is work, but it is much more work to support your over-engineered solutions.