tests – Schneide Blog

Don’t shoot your messengers

Writing small, focused tests, often called unit tests, is one of the things that look easy at the outset but turn out to be more delicate than anticipated. Writing a three-lines-of-code unit test in the triple-A structure soon became second nature to me, but there were lots of cases that resisted easy testing.

Using mock objects is the typical next step to accommodate this resistance and make the test code more complex. This leads to 5 to 10 lines of test code for easy mock-based tests and up to thirty or even fifty lines of test code where a lot of moving parts are mocked and chained together to test one single method.

So, the first reaction for a more complicated testing scenario is to make the test more complicated.

But even with the powerful combination of mock objects and dependency injection, there are situations where writing suitable tests seems impossible. In the past, I regarded these code blocks as “untestable” and omitted the tests because their economic viability seemed debatable.

I wrote small tests for easy code, long tests for complicated code and no tests for defiant code. The problem always seemed to be the tests that just didn’t cut it.

Until I could recognize my approach in a new light: I was encumbering the messenger. If the message was too harsh, I would outright shoot him.

The tests tried to tell me something about my production code. But I always saw the problem with them, not the code.

Today, I can see that the tests I never wrote because the “test story” at hand was too complicated for my abilities were already telling me something important.

The test you decide not to write because it’s too much of a hassle tells you that your code structure needs improvement. They already deliver their message to you, even before they exist.

With this insight, I can oftentimes fix the problem where it is caused: In the production code. The test coverage increases and the tests become simpler.

Let’s look at a small example that tries to show the line of thinking without being too extensive:

We developed a class in Java that represents a counter that gets triggered and imposes a wait period on every tenth trigger impulse:

public class CountAndWait {
	private int triggered;
	
	public CountAndWait() {
		this.triggered = 0;
	}
	
	public void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			try {
				Thread.sleep(1000L);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
			this.triggered = 0;
		}
	}
}

There is a lot going on in the code for such a simple functionality. Especially the try-catch block catches my eye and makes me worried when thinking about tests. Why is it even there? Well, here is a starter link for an explanation.

But even without advanced threading issues, the normal functionality of our code is worrisome enough. How many lines of code will a test contain that covers the sleep? Should I really use a loop in my test code? Will the test really have a runtime of one second? That’s the same amount of time several hundred other unit tests require for much more coverage. Is this an economically sound testing approach?

The test doesn’t even exist and already sends a message: Your production code should be structured differently. If you focus on the “test story”, perhaps a better structure emerges?

The “story of the test” is the description of the production code path that is covered and asserted by the test. In our example, I want the story to be:

“When a counter object is triggered for the tenth time, it should impose a wait. Afterwards, the cycle should repeat.”

Nothing in the story of this test talks about interruption or exceptions, so if this code gets in the way, I should restructure it to eliminate it from my story. The new production code might look like this:

public class CountAndWait {
	private final Runnable waiting;
	private int triggered;
	
	public static CountAndWait forOneSecond() {
		return new CountAndWait(() -> {
			try {
				Thread.sleep(1000L);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}			
		});
	}
	
	public CountAndWait(Runnable waiting) {
		this.waiting = waiting;
		this.triggered = 0;
	}
	
	public void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			this.waiting.run();
			this.triggered = 0;
		}
	}
}

That’s a lot more code than before, but we can concentrate on the latter half. We can now inject a mock object that attests to how often it was run. This mock object doesn’t need to sleep for any amount of time, so the unit test is fast again.

Instead of making the test more complex, we introduced additional structure (and complexity) into the production code. The resulting unit test is rather easy to write:

class CountAndWaitTest {
	@Test
	@DisplayName("Waits after 10 triggers and resets")
	void wait_after_10_triggers_and_reset() {
		Runnable simulatedWait = mock(Runnable.class);
		CountAndWait target = new CountAndWait(simulatedWait);
		
		// no wait for the first 9 triggers
		Repeat.times(9).call(target::trigger);
		verifyNoInteractions(simulatedWait);
		
		// wait at the 10th trigger
		target.trigger();
		verify(simulatedWait, times(1)).run();
		
		// reset happened, no wait for another 9 triggers
		Repeat.times(9).call(target::trigger);
		verify(simulatedWait, times(1)).run();
	}
}

It’s still different from a simple 3-liner test, but the “and” in the test story hints at a more complex story than “get y for x”, so that might be ok. We could probably simplify the test even more if we got access to the internal trigger count and verify the reset directly.

I hope the example was clear enough. For me, the revelation that test problems more often than not have their root cause in production code is a clear message to improve my ability on writing code that facilitates testing instead of obstructing it.

I don’t shoot/omit my messengers anymore even if their message means more work for me.

Your most precious resource

When I was a young student, living on my own for the first time, my most scarce resource seemed to be money. Money’s too tight to mention was (and probably still is) a motto that every student could understand. So we traded our time for money and participated in experiments and underpaid student assistant jobs.

Soon after I graduated, money began to accumulate. I have a rather frugal lifestyle, so my expenses didn’t suddenly surge. Instead, my perception of time and money shifted: Money isn’t the bottleneck (anymore), time is. Suddenly, time was much more valueable than money and I would gladly pay money if that meant some hours of additional leisure time or one less problem to tend to. It seems that time is the most precious thing there is.

The traditional economic wisdom supports this idea: “Time is money” is true, but the reverse is not: “money is time” doesn’t cut it. The richest man on earth still only has as much time available to him as anybody else.

If time is the most precious resource, the drive to automation as a time-saving effort can be understood directly. Automation also reduces learning costs if you scale horizontally by parallelizing production.

But soon after I had enough money to optimize my time, I hit another resource bottleneck. Suddenly, I had more time on hand than attention to spare. It turns out that attention is the most valueable resource you can spend. It is just entangled enough with time that its hard to distinguish which runs out first. If you reflect a bit, it becomes obvious. The term “to pay attention” is pretty spot on.

Let me take up the thought of automation again, this time in the domain of software development, in the form of automated tests. Here, automation is not in its most profitable shape. You don’t gain much from scaling your tests horizontally. If you don’t change the code, it doesn’t matter if your tests run once or a thousand times in parallel, the result will be the same (except if you run hardware-dependent tests, but even then you probably don’t gain much after covering all hardware variations).

You also don’t gain much from scaling your tests vertically by making them run faster and faster. It sure helps to have them run continuously in the background (think of a user-modded compiler – look into Continuous Test Runners if you are interested), but after one test run per meaningful change, the profit hits a limit.

So why else is automated testing an economically sound practice? My take on it is delegated attention. You write a small software (your test) that augments your attention area onto code that probably fades from your own attention pretty soon. Automated tests provide automated attention in a sustainable manner (except for those tests that cry wolf for no good reason, those are attention sinks and should be removed from your portfolio). Because of the automation, this delegated attention never fades – even after many years, the test has a close eye on “its” code.

If you are a developer, you have automation and zero-cost copying (aka parallelization without upfront costs) intrinsically in your solution portfolio. Look for ways how to make money with those super-powers. Or even better, look for ways to save time. But if you want the best return on investment for your efforts, you should look for ways to expand your attention area.

Do you agree that attention is our most precious resource? What do you do to lower your attention expenses? Perhaps you have experience in the Ops/DevOps area that resonates with this thoughts? Share your opinion by commenting below or writing your own blog entry!

We will pay attention to you.

Twelve years a bug

Recently, a friend recommended the talk “Love your bugs” by Allison Kaptur to me. It’s a good talk with a powerful message: Every bug is a chance to learn. The only prerequisite for this chance: You need to be aware of the bug. You need to see it and then understand and fix it.

What if you don’t see a bug for over a decade?

You can still learn from it – a lot. Here is a story about a bug that lingered in my code for twelve years and had impact on the software results without anybody, including me, noticing.

Let’s say the software is a measurement system that records physical measurement values that cannot be reenacted easily. The samples are measured in rapid succession and then stored away. The measurement results act as a quality quantifier as in “the sample was this good at that time”. The sample’s quality diminishes over time, even with perfect storage conditions. And just to be sure that the measurement is accurate, it isn’t performed once, but twice in quick succession: The measurement device traverses the sample in one direction, recording the values and uses the return path for a second measurement. This results in two measurement streaks that should, in theory, be very similar.

The raw measurement values are aggregated in different ways. In the final report, the maximum value over both measurement streaks is the most prominent and most important aspect. There is nothing exciting or error-prone about finding the maximum value in a value series, even twelve years ago. But I tested the code nonetheless. The whole value aggregation code was under test and had a good test coverage. All tests showed their thumbs up.

But twelve years ago, at the end of a workweek, on a Friday at 17 o’clock, I made a small and easy refactoring to the code. I know this in such detail because of the wonders of version control. Without version control, I probably would have learnt a lot less from this bug.

The code before the refactoring contained two nearly identical sections for the measurement streaks, making it duplicated code. There were only two differences: The first section of code counted from 0 to 99 and stored the values in the first streak’s array. The second section counted from 99 to 0, because the measurement device travels backwards over the sample, and stored the values into the second array.

My refactoring brought both pieces of code together: A new method with two parameters was introduced and called at both places. The first parameter specified a series of positions (0 to 99 or backwards), the second parameter specified the array to store into. All automated tests approved the changes. My manual testing showed no differences. The refactoring was going live.

Twelve years later, while working on a new engine for the measurement device, the customer took the raw values from the journal and performed some manual calculations. The maximum value calculation was wrong. It wasn’t wrong all of the time, but also not correct for all measurements. When it went wrong, it selected the second greatest value or, seldom, the third greatest value as the maximum value.

All the tests still insisted that everything is correct – as it was for twelve long years. There was no other change to the code that could affect the aggregation in any way.

There are two different ways that lead me to find the bug’s origin. The first way was to inspect the trail of commits in the area of code that performs the aggregation. My findings were that the abovementioned refactoring was the most likely culprit. The second way was to write more tests to ensure that yes, the maximum value calculation was indeed correct if given the correct values. Using the examples my customer had examined, I could prove with enough certainty that, given the input of all 200 values, the correct maximum value was returned in all cases. The aggregation therefore wasn’t given all values!

And that lead directly to the bug: The first measurement streak used the same array to store the values as the second streak. During the refactoring, I must have copied and pasted the call to the new method and forgotten to change the second parameter. Of 200 measured values, only 100 got stored permanently. The first 100 values were stored and promptly overwritten as the measurement device returned to its park position. Change the calls to store in both arrays and everything works like intended and like before the refactoring.

How did no test, automated or manual, catch this bug? It turns out that most automated tests were too focussed to indicate a problem. All unit tests that secured the maximum value calculation used given sets of values, they didn’t care about the origin of these value sets. The integration tests that covered the whole measurement process should have raised objections to the refactoring. But they used given sets of measurement values, too. And by chance, the given maximum value was in the second measurement streak. In fact, the given set of measurement values was produced by a loop that just increased a value. The greatest value was always the last one.

The manual tests had the same problem: The simulated measurement device produced measurement values that were either fixed or random. If your whole measurement uses the same fixed values, you don’t see it if half of them went missing. And if your measurement uses random values, you’ll have to pay close attention to a detail that isn’t in focus because it is unchanged. Except that one time when it was changed recently. Remember the change date? Friday afternoon, only minutes before the weekend? Not the best time to manually test a change that is a simple standard refactoring, after all.

So, what have I learnt? First: automated tests, even with great test coverage, aren’t enough. There is so much leeway in the setup of these tests that they will have blind spots without your knowledge. Second: A code review by another human (or even by the same human, some days later) might have caught the bug. It was painfully obvious in hindsight. The problem? “Might have” is an heuristics, just like your automated tests are.

My guess is that mutation testing would have shown the blind spot of the existing tests – among several hundred others. The heuristics is now your trained eye that sifts through the results and separates false positives from true positives.

Right now, I’ve fixed the bug, kept all additional tests and added one more: An integration test that performs a measurement with fixed values and checks nothing else but if all 200 values are stored at the end of the measurement. It’s oddly specific, but conveys this story in an automated fashion.

Oh, and I made another refactoring: I’ve replaced the arrays with collections. Hopefully, I won’t regret this one twelve years in the future. I’ve made it on a Monday.

Integration Tests with CherryPy and requests

CherryPy is a great way to write simple http backends, but there is a part of it that I do not like very much. While there is a documented way of setting up integration tests, it did not work well for me for a couple of reasons. Mostly, I found it hard to integrate with the rest of the test suite, which was using unittest and not py.test. Failing tests would apparently “hang” when launched from the PyCharm test explorer. It turned out the tests were getting stuck in interactive mode for failing assertions, a setting which can be turned off by an environment variable. Also, the “requests” looked kind of cumbersome. So I figured out how to do the tests with the fantastic requests library instead, which also allowed me to keep using unittest and have them run beautifully from within my test explorer.

The key is to start the CherryPy server for the tests in the background and gracefully shut it down once a test is finished. This can be done quite beautifully with the contextmanager decorator:

from contextlib import contextmanager

@contextmanager
def run_server():
    cherrypy.engine.start()
    cherrypy.engine.wait(cherrypy.engine.states.STARTED)
    yield
    cherrypy.engine.exit()
    cherrypy.engine.block()

This allows us to conviniently wrap the code that does requests to the server. The first part initiates the CherryPy start-up and then waits until that has completed. The yield is where the requests happen later. After that, we initiate a shut-down and block until that has completed.

Similar to the “official way”, let’s suppose we want to test a simple “echo” Application that simply feeds a request back at the user:

class Echo(object):
    @cherrypy.expose
    def echo(self, message):
        return message

Now we can write a test with whatever framework we want to use:

class TestEcho(unittest.TestCase):
    def test_echo(self):
        cherrypy.tree.mount(Echo())
        with run_server():
            url = "http://127.0.0.1:8080/echo"
            params = {'message': 'secret'}
            r = requests.get(url, params=params)
            self.assertEqual(r.status_code, 200)
            self.assertEqual(r.content, "secret")

Now that feels a lot nicer than the official test API!

Award your tests

If you want to know more about the gains you get from your tests, you might consider to tag them with their achievements (or failures). Here are some examples of typical awards.

We can probably agree on automated tests decreasing the likelihood of defects in a software. And we all have observed that writing automated tests can mean serious effort, especially if the software under test wasn’t designed with tests in mind. So we can say that automated tests are an investment in low defect rates, among other things. But like any other investment, it isn’t an uniform function of spending money to gain returns, but a portfolio of better and worse little investments. Each test can be seen as a separate micro-investment that should bring a separate micro-return.

A portfolio of micro-investments

We know from experience that this isn’t true. There are tests that have avoided a lot of embarrassment and there are tests that are a constant source of annoyance. And we aren’t very good at predicting which test will turn out which way, because we would avoid writing the frail ones otherwise. But there is a tool that can help us: statistics. In your financial investments, you keep track of your stock prices with elaborate charts and tables. How do you keep track of your test investment?

Track the investments

Can you say which tests broke most often? Which tests prohibited real bugs and which just cost you additional effort without gain? Do you know your most erratic tests? Given a decent amount of tests, it’s not possible to keep track of these characteristics manually. But what if you could look at the test and see its merits and evil deeds of the past? You don’t have a overall statistic, but an individual answer for every test you examine. That would be a first step. And it is relatively easy to make: Award your tests.

Reward your angels

There is the notion of “putting angels on your shoulders” when you write tests. You should listen to your angels. But you should reward those angels that are really helpful and restrain the ones that keep pointing out nonsense. You should just put a medal on your angel test whenever it was helpful and a warning badge whenever it misled you. After a while, you will see that your highest decorated tests are the really good investments. Let’s take a look at typical awards:

@bug: This test stands guard against a certain bug that once lingered in your system and is now fixed. The test didn’t actually catch the bug, but was introduced afterwards to protect against regression. The award is normally given with some kind of bug issue number, like @bug(SYS-132). It’s a relatively low award, because the test doesn’t have to do anything heroic yet, but it is a good indicator for fellow developers to trust this test if it cries wolf: It might actually warn you against recurring regression.
@regression: If a test breaks and points out an actual regression in the system (something worked before and isn’t working any more), it was a good investment. You want to immediately award it for being helpful. This is a very good protection against ever modifying or deleting this test. A @regression veteran should be listened to if it breaks. If a test collects this award several times, it might hint you to a poorly designed part in your code.
@lifesaver: You don’t give out this award easily. It’s the highest award a test can get. If a test points out a regression that would lead to a mission critical failure in production, this test just saved your ass. And if you weren’t even aware that your change would cause such trouble, the test was the best investment in code you can possibly make. Praise it! And if you happen to break a @lifesaver test with a change, chances are you’ve just done something very stupid (again).

These are only three examples of awards you might want to introduce into your project. But there are also some common warning badges you can hand out to troublesome tests:

#erratic: If a test fails with no apparent reason or connection to your recent change, you might label it as erratic (or hysteric, the slightly more troublesome variant). An erratic test loses its credibility to point to real problems and will turn into a burden. This was probably a bad investment. If a test collects several erratic badges, you should consider to put it into quarantine, because unreliable tests tend to spread ignorance about any sort of test failure very fast. Quarantining your tests is fairly easy: you move it into a separate test harness. Your primary test harness has the ability to break the build when unstable, but your secondary test harness just reports failures without breaking the build. You degrade your erratic tests to the reserve guard.
#fragile: If you need to make changes in a test because the production code changed slightly, you are temporarily compromising your safety net. A typical case of a fragile test would be a mock object based test that doesn’t mock a new method call or expects a call that doesn’t happen any more. The test reports a failure, but there isn’t a problem really (well, there’s some violation of the open/closed principle). You have to fix the tests instead of your production code. If you give out this badge often, you have a test harness with high maintenance cost. If your investment is still positive is you decision, but you can now quantify on the stability aspect of your tests compared to the stability aspect of your production code.
#cryptic: If, for whatever reason, you happen to read across a test and then read it again, and again just to understand what’s going on, you might hand out this warning badge. The test isn’t meant to be read, it’s written to be a mystery. In my book, that’s bad style and should be flagged as such. If a test collects several cryptic badges, that means that several developers spent time to grasp this little piece of code. If the test hasn’t earned awards yet, you’re looking at a typical waste of time. It might be poorly invested time.

Append the meta-data

If you start to label and award your tests, don’t forget to add appropriate meta-data like issue numbers, links to the project wiki and, most important, the date of bestowal. Every award fades with time, so you should be able to instantly see when they were added. A test that was quite erratic in its youth can be a great lifesaver right now. If you use a proper version control system, this information can be queried afterwards, but with greater effort. I think its wise to put all relevant information where it can be accessed immediately.

And where to put it exactly? The documentation comment (JavaDoc, Doxygen) above a test is perfectly suited for this kind of information. Have one line for every type of award and list the meta-data of all occurrences in this line. Or, if you want to give a comment with each award, you might use one line per award. But try to make the awards and the test code inseparable and readable as one piece of information, so that your awards survive refactoring and other editing efforts.

Reap your profits

After some time, you might want to visualize and rank your test awards. There’s no magic here, just grep over your test code a few times. What you make of your findings is up to you. It certainly helps to evaluate the investments that go into your tests.

Do you use a similar tagging scheme for your tests or your production code? Let us know in the comment section!

TDD myths: the problems

I take a look at some (in my experience) problems/misconceptions with TDD:
100% code coverage is enough, Debugging is not needed, Design for testability, You are faster than without tests

100% code coverage is enough

Code coverage seems to be a bad indicator for the quality of the tests. Take the following code as an example:

public void testEmptySum() {
  assertEquals(0, sum());
}

public void testSumOfMultipleNumbers() {
  assertEquals(5, sum(2, 3));
}

Now take a look at the implementation:

public int sum(int...numbers) {
  if (numbers.length == 0) {
    return 0;
  }
  return 5;
}

Baby steps in TDD could lead you to this implementation. It has 100% code coverage and all tests are green. But the implementation isn’t finished at all. Our experiment where we investigated how much tests communicate the intend of the code showed flaws in metrics like code coverage.

Debugging is not needed

One promise of TDD or tests in general is that you can neglect debugging. Even abandon it. In my experience when a test goes red (especially an integration test) you sometimes need to fire up the debugger. The debugger helps you to step through code and see the actual state of the system at that time. Tests treat code as a black box, an input results in an output. But what happens in between? How much do you want to couple your tests to your actual implementation steps? Do we need the tests to cover this aspect of software development? Maybe something along the lines as shown in Inventing on principle where the computer shows you the immediate steps your code takes could replace debugging but tests alone cannot do it.

Design for testability

A noble goal. But are tests your primary client? No. Other code is. Design for maintainability would be better. You will need to change your code, fix it, introduce new features, etc. Don’t get me wrong: You need tests and you need testability. But how much code do you write specifically for your tests? How much flexibility do you introduce because of your tests? What patterns do you use just because your tests need them? It’s like YAGNI for code exposure for tests. Code specifically written only for tests couples your code to your tests. Only things that need to be coupled should be. Is the choice of the underlying data structure important? Couple it, test it. If it isn’t, don’t expose it, don’t write a getter. Don’t break the information hiding principle if you don’t need to. If you couple your tests too much to your code every little change breaks your tests. This hinders maintenance. The important and difficult design question is: what is important. Test this.

You are faster than without tests

Some TDD practitioners claim that they are faster with TDD than without tests because the bugs and problems in your code will overwhelm you after a certain time. So with a certain level of complexity you are going faster with TDD. But where is this level? In my experience writing code without tests is 3x-4x faster than with TDD. For small applications. There are entire communities where many applications are written without or with only a few tests. But I wouldn’t write a large application without tests but at least my feeling is that in many cases I go much slower. Cases where I feel faster are specification heavy. Like parsing or writing formats, designing an algorithm or implementing a scientific formula. So the call is open on this one. What are your experiences? Do you feel slowed down by TDD?

Thoughts about TDD

Thoughts and links about test driven development

First a disclaimer: I think tests are a hallmark for professional software development, I like to write tests before the implementation but that’s not always easy or simple (for the difference please refer to Simple made easy). I find it hard to grasp test driven development (TDD) though. The difference between test first and test driven lies in the intention: in both cases tests are written before any implementation code but in TDD the tests drive the design of your implementation.

The problem with opinions of TDD is there are mostly extreme positions: some think “TDD is the (next) holy grail” or the ones which dismissed it. Though reading between the lines there are great discussions about how to do it and what problems arise. Many people (me included) are really trying to get value from TDD. Testing should be fun.
One way in letting the tests drive the way you develop is proposed by Uncle Bob: transformation priority premise. He proposes a list of transformations which introduce new or replace existing constructs like replacing a constant by a variable or adding more logic and gives them a priority. Only if you cannot use a high priority transformation to get the test to pass you look at a transformation with a lower priority.
But how do you determine what you should test next or even which is the first test?
Taking the typical Conway’s game of life kata as an example one thing struck me: I could only get the TDD to work smoothly when I started with the data structure. But why that? Naturally I start with the algorithm (in this case the rules) and write the first test for it. But upon further inspection of the problem and deeper (domain) knowledge it seems the data structure is way more important for solving this kata. So you need to know where the journey goes along beforehand, not every step you will take but the big picture: first the data structure, then the rules in this example. Maybe you should start with the integrations or the functional tests and break them down into units.
What are your experiences using TDD? Do you use or want to use TDD?

A tale of scrap metal code – Part III

The third and last part of a series about the analysis of a software product. This part tries to give some rules of thumb on how to avoid failure like in this project.

In the first part of this tale about an examined software project, I described the initial situation and high-level observations about the project. The second part dove into the actual source code and pointed out what’s wrong on this level. This part will summarize everything and give some hints on how to avoid creating scrap metal code.

About the project

If you want to know more about the project, read the first part of this tale. In short, the project looked like a normal Java software, but unfolded into a nightmare, lacking basic requirements like tests, dependency management or continuity.

A summary of what went wrong

In short, the project failed in every respect except being reasonable functional and delivering business value to the customers. I will repeat this sentence soon, but let’s recall the worst parts again. The project had no tests. The project modularization was made redundant by circular dependencies and hardwired paths. No dependency management was in place, neither through the means of a build tool nor by manual means (like jar versions). The code was bloated and overly complex. The application’s data model was a widely distributed network of arbitrary collections with implicit connections via lookup keys. No effort was spent to grasp exception handling or multithreading. The cleverness was rather invested into wildcard usage of java’s reflection API capabilities. And when the cleverness of the developer was challenged, he resorted to code comments instead of making the code more accessible.

How can this be avoided?

First, you need to know exactly what it is you want to avoid. Let me repeat that the project was sold to happily paying customers who gained profit using it. Many software projects fail to deliver this utmost vital aspect of virtually every project. The problem with this project isn’t apparent yet, because it has a presence (and a past). It’s just that it has no future. I want to give some hints how to develop software projects with a future while still delivering business value to the customer.

Avoid the no-future trap

The most important thing to make a project future-proof is to restrain yourself from taking shortcuts that pay off now and need to be paid back later. You might want to believe that you don’t need to pay back your technical debts (the official term for these shortcuts) or that they will magically disappear sometimes, but both scenarios are quite unlikely. If your project has any chance to keep being alive over a prolonged amount of time, the technical debts will charge interest.

Of course you can take shortcuts to meet tight deadlines or fit into a small budget. This is called prototyping and it pays off in terms of availability (“time to market”) and scope (“trial version”). Just remember that a prototype isn’t meant for production. You definitely need the extra time and/or budget to fix the intentional shortcomings in the code. You won’t feel the difference right now (hey, it works, what else should it do?), but it will return with compound interest in a few years. The project in this tale was dead after three years. The technical debt had added up beyond being repairable.

Analyzing technical debts

It’s always easy to say that you should “do it right” in the first place. What could the developer for project at hands have done differently to be better off now?

1. Invest in automated tests

When I asked why the project has no tests at all, the developer replied that “it surely would be better to have tests, yet there was no time to write them“. This statement implies that tests take more time to write than they save acting as a guideline and a safety net. And it is probably true for every developer just starting to write tests. You will feel uncomfortable, your tests will be cumbersome and everything will slow down. Until you gain knowledge and experience in writing tests. It is an investment. It will pay off in the future, not right now. If you don’t start now, there will be no future payout. And even better: now your investment, not your debt, will accumulate interest. You might get used to writing tests and start being guided by them. They will mercilessly tell you when your anticipated solution is overly complex. And they will stay around and guard your code long after you forgot about it. Tests are a precaution, not an afterthought.

2. Review and refactor your code

The project has a line count of 80,000 lines of ugly code. I’m fairly confident that it can be reduced to 20,000 lines of code without losing any functionality. The code is written with the lowest possible granularity, with higher concepts lurking everywhere, waiting to be found and exposed. Of course, you cannot write correct, concise and considerate code on your first attempt. This is why you should revisit old code in a recurring manner. If you followed advice number one and brought your tests in place, you can apply every refactoring of the book’s catalog and still be sure that you rather fixed this part instead of breaking it. Constantly reviewing and refactoring your code has the additional advantage of a code base that gets more proficient alongside yourself. There are no “dark regions” (the code to never be read or touched again, because it hurts) if you light them up every now and then. This will additionally slow you down when you start out, but put you on afterburner when you realize that you can rescue any code from rotting by applying the refactoring super-powers that you gained through pratice. It’s an investment again, aiming at midterm return of investment.

3. Refrain from clever solutions

The project of this tale had several aspects that the developer thought were “clever”. The only thing with “clever” is that it’s a swearword in software programming. Remember the clever introduction of wildcard runtime classloading to provide a “plugin mechanism”? Pure poison if you ever wanted your API to be stable and documented, just like a plugin interface should be. Magic numbers throughout your code? Of course you are smart enough to handle this little extra obfuscation. Except when you aren’t. You aren’t sure how exception handling works? Be clever and just “empty catch Exception” everywhere the compiler points you to. In this project, the developer knew this couldn’t be the right solution. Yet, he never reviewed the code when he one day knew how to handle exceptions in a meaningful manner. Let me rest my case by stating that if you write your code as clever as you can handle it, you won’t be able to read it soon, as reading code is harder than writing it.

Summary

Over the course of this tale, you learned a lot about a failed project. In this article, I tried to give you some advice (in the form of three basic rules) on how this failure could probably have been avoided. Of course, the advice isn’t complete. There is much more you could do to improve yourself and your project. Perhaps the best self-training program for developer skills is the Clean Code Developer Initiative (it’s mostly german text yet, so here is an english blog post about it), based upon the book “Clean Code” by Robert C. Martin (Uncle Bob).

Invest in the future of your project and stay clean.

	Impressions of Our C… on Computing gets fuzzy again (AI…
	Computing gets fuzzy… on Impressions of Our Current AI…
	mariuselvert on Calculating the Number of Segm…
	Anonymous on Calculating the Number of Segm…
	Anonymous on Avoiding Code Style Discu…