Recently, a friend recommended the talk “Love your bugs” by Allison Kaptur to me. It’s a good talk with a powerful message: Every bug is a chance to learn. The only prerequisite for this chance: You need to be aware of the bug. You need to see it and then understand and fix it.
What if you don’t see a bug for over a decade?
You can still learn from it – a lot. Here is the story of a bug that lingered in my code for twelve years and had an impact on the software’s results without anybody, including me, noticing.
Let’s say the software is a measurement system that records physical measurement values that cannot easily be reproduced. The samples are measured in rapid succession and then stored away. The measurement results act as a quality quantifier, as in “the sample was this good at that time”. The sample’s quality diminishes over time, even under perfect storage conditions. And just to be sure that the measurement is accurate, it isn’t performed once, but twice in quick succession: The measurement device traverses the sample in one direction, recording the values, and uses the return path for a second measurement. This results in two measurement streaks that should, in theory, be very similar.
The raw measurement values are aggregated in different ways. In the final report, the maximum value over both measurement streaks is the most prominent and most important figure. There is nothing exciting or error-prone about finding the maximum value in a value series, even twelve years ago. But I tested the code nonetheless. The whole value aggregation code was under test and had good test coverage. All tests gave their thumbs up.
But twelve years ago, at the end of a workweek, on a Friday at 5 p.m., I made a small and easy refactoring to the code. I know this in such detail because of the wonders of version control. Without version control, I probably would have learnt a lot less from this bug.
The code before the refactoring contained two nearly identical sections for the measurement streaks – in other words, duplicated code. There were only two differences: The first section counted from 0 to 99 and stored the values in the first streak’s array. The second section counted from 99 to 0, because the measurement device travels backwards over the sample, and stored the values in the second streak’s array.
My refactoring brought both pieces of code together: A new method with two parameters was introduced and called at both places. The first parameter specified a series of positions (0 to 99 or backwards), the second parameter specified the array to store into. All automated tests approved the change. My manual testing showed no differences. The refactoring went live.
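The original code is long gone, but the shape of the refactoring was roughly the following. This is a minimal sketch in Java; all identifiers, the sample count constant and the hardware stand-in are my inventions, not the actual codebase:

```java
import java.util.stream.IntStream;

class MeasurementRecorder {
    private static final int SAMPLE_COUNT = 100;
    private final double[] firstStreak = new double[SAMPLE_COUNT];
    private final double[] secondStreak = new double[SAMPLE_COUNT];

    // Stand-in for the real measurement hardware.
    private double measureAt(int position) {
        return Math.random();
    }

    // Before: two nearly identical sections, one per streak.
    void measureBothStreaksBefore() {
        for (int position = 0; position < SAMPLE_COUNT; position++) {
            firstStreak[position] = measureAt(position);
        }
        for (int position = SAMPLE_COUNT - 1; position >= 0; position--) {
            secondStreak[position] = measureAt(position);
        }
    }

    // After: one method with two parameters, called at both places.
    void measureBothStreaksAfter() {
        measureStreak(IntStream.range(0, SAMPLE_COUNT).toArray(), firstStreak);
        measureStreak(IntStream.iterate(SAMPLE_COUNT - 1, i -> i - 1)
                               .limit(SAMPLE_COUNT).toArray(), secondStreak);
    }

    private void measureStreak(int[] positions, double[] target) {
        for (int position : positions) {
            target[position] = measureAt(position);
        }
    }
}
```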
Twelve years later, while working on a new engine for the measurement device, the customer took the raw values from the journal and performed some manual calculations. The maximum value calculation was wrong. It wasn’t wrong all of the time, but it wasn’t correct for all measurements either. When it went wrong, it selected the second-greatest or, rarely, the third-greatest value as the maximum.
All the tests still insisted that everything was correct – as they had for twelve long years. There was no other change to the code that could affect the aggregation in any way.
Two different paths led me to the bug’s origin. The first was to inspect the trail of commits in the area of code that performs the aggregation. My finding was that the refactoring described above was the most likely culprit. The second was to write more tests to ensure that yes, the maximum value calculation was indeed correct if given the correct values. Using the examples my customer had examined, I could prove with enough certainty that, given the input of all 200 values, the correct maximum value was returned in every case. The aggregation therefore wasn’t given all the values!
And that led directly to the bug: The first measurement streak used the same array to store its values as the second streak. During the refactoring, I must have copied and pasted the call to the new method and forgotten to change the second parameter. Of 200 measured values, only 100 were stored permanently. The first 100 values were stored and promptly overwritten as the measurement device returned to its park position. Changing the calls to store into both arrays made everything work as intended and as it had before the refactoring.
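Expressed in terms of the sketch above, the slip comes down to one unchanged argument in two otherwise identical calls (again, a reconstruction with invented names):

```java
// The bug: both calls store into the SAME array.
void measureBothStreaksBuggy() {
    measureStreak(IntStream.range(0, SAMPLE_COUNT).toArray(), firstStreak);
    // Copied from the line above; the second parameter was never changed,
    // so the return path overwrites the forward path and secondStreak
    // stays empty:
    measureStreak(IntStream.iterate(SAMPLE_COUNT - 1, i -> i - 1)
                           .limit(SAMPLE_COUNT).toArray(), firstStreak);
}

// The fix: each streak gets its own array again.
void measureBothStreaksFixed() {
    measureStreak(IntStream.range(0, SAMPLE_COUNT).toArray(), firstStreak);
    measureStreak(IntStream.iterate(SAMPLE_COUNT - 1, i -> i - 1)
                           .limit(SAMPLE_COUNT).toArray(), secondStreak);
}
```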
How did no test, automated or manual, catch this bug? It turns out that most automated tests were too focussed to indicate a problem. All unit tests that secured the maximum value calculation used given sets of values; they didn’t care about the origin of these value sets. The integration tests that covered the whole measurement process should have raised objections to the refactoring. But they used given sets of measurement values, too. And by chance, the given maximum value was in the second measurement streak. In fact, the given set of measurement values was produced by a loop that just increased a value. The greatest value was always the last one.
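A small, self-contained demonstration of why ascending test data hides exactly this bug (my reconstruction, not the original test code):

```java
import java.util.Arrays;

public class BlindSpotDemo {
    public static void main(String[] args) {
        // Ascending test data, as the old integration tests generated it:
        // the first streak holds 0..99, the second streak holds 100..199.
        double[] firstStreak = new double[100];
        double[] secondStreak = new double[100];
        for (int i = 0; i < 100; i++) {
            firstStreak[i] = i;
            secondStreak[i] = 100 + i;
        }

        // Correct aggregation sees all 200 values.
        double correctMax = Math.max(
                Arrays.stream(firstStreak).max().getAsDouble(),
                Arrays.stream(secondStreak).max().getAsDouble());

        // With the bug, the first streak's values are lost and the
        // aggregation effectively only sees the second streak. For
        // ascending data the result is identical, so the assertion
        // on the maximum always passed.
        double buggyMax = Arrays.stream(secondStreak).max().getAsDouble();

        System.out.println(correctMax + " == " + buggyMax); // 199.0 == 199.0
    }
}
```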
The manual tests had the same problem: The simulated measurement device produced measurement values that were either fixed or random. If your whole measurement uses the same fixed values, you won’t notice if half of them go missing. And if your measurement uses random values, you’ll have to pay close attention to a detail that isn’t in focus because it is supposedly unchanged. Except that one time when it had been changed recently. Remember the change date? Friday afternoon, only minutes before the weekend? Not the best time to manually scrutinize a change that is, after all, a simple standard refactoring.
So, what have I learnt? First: Automated tests, even with great test coverage, aren’t enough. There is so much leeway in the setup of these tests that they will have blind spots without your knowledge. Second: A code review by another human (or even by the same human, some days later) might have caught the bug. It was painfully obvious in hindsight. The problem? “Might have” is a heuristic, just like your automated tests are.
My guess is that mutation testing would have revealed the blind spot in the existing tests – among several hundred other findings. The heuristic is then your trained eye, sifting through the results and separating false positives from true positives.
For now, I’ve fixed the bug, kept all the additional tests and added one more: An integration test that performs a measurement with fixed values and checks nothing but whether all 200 values are stored at the end of the measurement. It’s oddly specific, but it conveys this story in an automated fashion.
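A sketch of that new test, once more with invented names for the measurement API:

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class AllValuesAreStoredTest {
    // Oddly specific on purpose: run a complete measurement (forward and
    // backward streak) and assert only that all 200 values were stored.
    @Test
    public void fullMeasurementStoresAllTwoHundredValues() {
        Measurement measurement = Measurement.withFixedValue(42.0);
        measurement.performFullRun();
        assertEquals(200, measurement.storedValueCount());
    }
}
```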
Oh, and I made another refactoring: I’ve replaced the arrays with collections. Hopefully, I won’t regret that one twelve years from now. I made it on a Monday.
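One reason the lists feel safer to me here: a streak that appends its values to a list cannot silently overwrite another streak’s entries; if both calls accidentally target the same list again, it ends up with a suspicious 200 entries instead of 100 quietly replaced ones. Continuing the sketch from above:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the follow-up refactoring: appending to lists instead of
// writing into arrays by index. A repeated copy-paste slip would now
// produce a list of the wrong size, a mistake that is much easier to
// detect (and to assert on in a test).
class MeasurementRecorderWithLists {
    private final List<Double> firstStreak = new ArrayList<>();
    private final List<Double> secondStreak = new ArrayList<>();

    // Stand-in for the real measurement hardware.
    private double measureAt(int position) {
        return Math.random();
    }

    private void measureStreak(int[] positions, List<Double> target) {
        for (int position : positions) {
            target.add(measureAt(position));
        }
    }
}
```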


If we were to choose the holy book of software development, we probably couldn’t agree on one or even a dozen titles. And that is a good thing, because there is no one true way of software development. Clean Code by Robert C. Martin would maybe show up among the later contenders. But if we were to choose the most preachy book of software development, well, I have a favorite. This book is so loud that you cannot ignore it. And it is so opinionated that you’re either nodding your head like a heavy metal fan or writhing in aversion. That’s a good thing, too, because it forces you to think. Your immediate emotional answer needs to be supported by rational arguments, and this book will provide you with ample opportunity to gather arguments for your consent or rejection. What this book probably won’t do is leave you unaffected. When it came out in 2008, it was an instant classic. You could spice up any gathering of software developers by making a statement about this book, be it pro or contra. And even today, ten years later, I would say that even if the loudness is deafening, the clarity of the messages makes this book a worthwhile read for every software developer. My main gripe with it is that for a book called “Clean Code”, some examples of actual code are quite dirty or even plain wrong. Read it with an active mind and it will be a cornerstone of your professional career. But be careful: it seems that currently printed copies have physical quality problems.
Ever since Extreme Programming hit the (European) scene in 1999, I had been curious about Test Driven Development (TDD). I tried automated testing and unit tests whenever I could, read books and later watched videos about the topic. But I never grokked it. It just didn’t work for me, and I didn’t even know why. My most feared trap was the one-two-everything syndrome, where you write two simple tests and then have to implement the whole algorithm to fulfill the third test. It was always the third test that broke my rhythm. I tried to exchange experiences with TDD practitioners, but their own examples were mostly trivial and my examples always led nowhere (for reference: try a simple Game of Life in TDD style). I felt dumb and inadequate. When Robert C. Martin (the author of Clean Code) told the developer world that you are either “TDD or not professional” (read
Some years after the GOOS experience, another summer beach holiday was due, and as usual, I included a software development book in my luggage. “Domain Driven Design” by Eric Evans came out in 2003 and was praised by some and ignored by most, including me. It took me ten years to finally read it, and when I did, it hit me hard. Since my early days as a programmer, I had tried to build a meaningful data model with actual types for each program I developed. But it occurred to me that I had done it half-heartedly all the time. It shouldn’t stop at a data model; it should be a complete domain model. And for that to work, you need to grok the domain. I review a lot of my code from before that insight and always find it funny how I invested effort in my models but more often than not stayed in the technical realm. I cannot say that my programming has changed much because of the book, as most of its concepts had meandered through the community since 2003 and were picked up by me under different names. But my software development approach has changed dramatically. I don’t start my thinking from the technical side anymore. And that helps with “business alignment” and all the other magic words that finally have real, tangible benefit. I can now pinpoint when that alignment loosens and employ counter-measures instead of ending up in a special-case hell. The best thing was that this book doesn’t require a laptop, so I got to sit on the beach that summer with the book in my hands and my head in the clouds. It might be old, but it’s still gold.
I anxiously awaited this book’s arrival in print. Not because I had pre-ordered it, but because I had held talks, workshops and lectures about the topic before the book was available, and I wanted to make sure that I wasn’t telling nonsense. But Robert C. Martin took his time and delayed the deadline month after month. Then, nearly a year later, the book reached the stores in late 2017, which meant I would have to wait for my winter holiday to read it. I couldn’t wait and began right away. The book is a slow burner and feels like one long introduction. By the time the central proposition is revealed (and yes, it reads like a good, unagitated spy thriller at times), you’ve probably already figured it out yourself. And that’s a good thing in my mind, because it feels as if it was your idea and Uncle Bob is just there to nod and congratulate you on your intellect. This book is many times less preachy than “Clean Code”. If we compare spy-thriller literature, this is a John le Carré while Clean Code would be an Ian Fleming (James Bond). “Clean Architecture” is not about programming; it talks about software architecture, a topic that I missed greatly in my early developer years. I liked this book so much
All the other books talk about different aspects of programming, software development or related technical topics. But what about a book that raises a simple question: “Why is IT technology so complicated?” And gives the answer: “Because we want it this way.” That’s actually true. In a world without most of the restrictions of the physical realm, we were unable to build solutions that actually helped us and instead came up with machines and software that overwhelmed most people. It took a whole new generation of “digital natives” until concepts like internal operation modes (e.g. insert vs. overwrite) were intuitively understood. Not because they became simpler; we just got used to the complexity. Alan Cooper described the problem and gave at least hints at solutions in 1999, nearly 20 years ago. That’s the timespan of a generation. This book made me think hard about the status quo I had silently accepted with technology. It just was like it was; what else could there be? If I reveal a tiny bit of the different approaches I can think of now, I’m often confronted with incomprehension. Not because I’m particularly clever and everyone else is dumb, but because there seems to be no problem if you’ve grown accustomed to it. If you want to see some of the pain other (older) people feel when interacting with technology and software, read this book. It is an eye-opener to common problems no software developer ever had. It is the first step into the world of UX (user experience), where what matters is not whether the developer feels alright but whether the user feels at least adequate. It might be a classic and feel a bit outdated and weak on the solution side, but understanding the problem properly is the first step to appreciating possible answers. And Alan Cooper didn’t stop there. Read his
More by chance, my co-founder stumbled upon “
In 2004, Michael Feathers wrote a book that contains his 20+ years of experience with software development and named it “
Martin Fowler was a very productive author in the late nineties. I’ve read most of his books from that period, if with a few years’ delay. “
Since the late ’80s, Tom DeMarco and Timothy Lister have written one book after the other. Each book describes a common business-oriented problem and at least one working solution for it. And yet, the very same problems still persist in the business world. It’s as if nobody reads books. “