Use duplication to make your single source of truth

Having a single source of truth is one of the big tenets of programming. It is easy to see why. If you want to figure out something about your program, or change something, you just go to the corresponding source.

One of the consequences of this is usually the avoidance of code duplication, but things can get a lot more complicated very fast when you think of knowledge duplication or fragmentation instead of just code. Quite counterintuitively, duplication can actually help in this case.

Consider the case where you serialize an enum value, e.g. to a database or a file. Suddenly, you have two conceptual points that ‘know’ about the translation of your enum literals to a numeric or string value: the mapping in your code and the mapping implicitly stored in the serialization. Neither of these two points can be changed independently. Changing the serialized content means changing the source code and vice versa.

You could still consider your initial enum-to-value mapping the single source of truth, but the problem is that you can easily miss disruptive changes. For example, if you used the numeric value, just reordering the enum literals will break the serialization. If you used the text name of the enum, even a simple rename refactoring will break it.

So to deal with this, I often build my own single source of truth: a unit test that keeps track of such implicit value couplings. That way, the test can tell you when you are accidentally breaking things. Effectively, this means duplicating the knowledge of the mapping to a ‘safe’ space: One that must be deliberately changed, and resists accidentally being broken. And then that becomes my new single source of truth for that mapping.
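A minimal sketch of such a guarding test, written in plain Java so that it runs without a test framework. The enum `OrderStatus`, its literals and the pinned numbers are hypothetical; in a real project, the check would be the body of a regular unit test method.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical example: every name and value here is an assumption for illustration.
class SerializedEnumGuard {

    // The enum whose ordinal values are implicitly stored in a database or file.
    enum OrderStatus { OPEN, PAID, SHIPPED }

    // The deliberately duplicated mapping: the 'safe' single source of truth.
    // It has to be edited by hand whenever the serialized format changes on purpose.
    static Map<String, Integer> pinnedMapping() {
        Map<String, Integer> pinned = new LinkedHashMap<>();
        pinned.put("OPEN", 0);
        pinned.put("PAID", 1);
        pinned.put("SHIPPED", 2);
        return pinned;
    }

    // The mapping as the current source code defines it.
    static Map<String, Integer> actualMapping() {
        Map<String, Integer> actual = new LinkedHashMap<>();
        for (OrderStatus each : OrderStatus.values()) {
            actual.put(each.name(), each.ordinal());
        }
        return actual;
    }

    // Fails on a reorder, a rename, an insertion or a removal of a literal.
    static void serializedValuesAreUnchanged() {
        if (!pinnedMapping().equals(actualMapping())) {
            throw new AssertionError(
                "Enum mapping changed: expected " + pinnedMapping()
                + " but was " + actualMapping()
            );
        }
    }

    public static void main(String[] args) {
        serializedValuesAreUnchanged();
        System.out.println("Serialized enum mapping is intact.");
    }
}
```

The pinned mapping repeats what the enum already says, and that is the point: a reorder or rename in the enum no longer passes silently, because the duplicate has to be changed on purpose as well.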

Highlight Your Assumptions With a Test

There are many good reasons to write unit tests for your code. Most of them are abstract enough that it might be hard to see the connection to your current work:

  • Increase the test coverage
  • Find bugs
  • Guide future changes
  • Explain the code
  • etc.

I’m not saying that these goals aren’t worth it. But they can feel remote and not imperative enough. If your test coverage is high enough for the (mostly arbitrary) threshold, can’t we let the tests slip a bit this time? If I don’t know about future changes, how can I write guiding tests for them? Better wait until I actually know what I need to know.

Just like that, the tests don’t get written, or don’t get written in time. Writing them after the fact feels cumbersome and yields subpar tests.

Finding motivation by stating your motivation

One thing I do to improve my testing habit is to state my motivation why I’m writing the test in the first place. It seemed to boil down to two main motivations:

  • #Requirement: The test ensures that an explicit goal is reached, like a business rule that is spelled out in the requirement text. If my customer wants the value added tax of a price to be 19 % for baby food and 7 % for animal food, that’s a direct requirement that I can write unit tests for.
  • #Bugfix: The test ensures the perpetual absence of a bug that was found in production (or in development and would be devastating in production). These tests are “tests that should have been there sooner”. But at least, they are there now and protect you from making the same mistake twice.

A code example for a #Requirement test looks like this:

/**
 * #Requirement: https://ticket.system/TICKET-132
 */
@Test
void reduced_VAT_for_animal_food() {
    var actual = VAT.addTo(
        new NetPrice(10.00),
        TaxCategory.animalFood
    );
    assertEquals(
        new GrossPrice(10.70),
        actual
    );
}

If you want an example for a #Bugfix test, it might look like this:

/**
 * #Bugfix: https://ticket.system/TICKET-218
 */
@Test
void no_exception_for_zero_price() {
    try {
        var actual = VAT.addTo(
            NetPrice.zero,
            TaxCategory.general
        );
        assertEquals(
            GrossPrice.zero,
            actual
        );
    } catch (ArithmeticException e) {
        fail(
            "You messed up the tax calculation for zero prices (again).",
            e
        );
    }
}

In my mind, these motivations correlate with the second rule of the “ATRIP rules for good unit tests” from the book “Pragmatic Unit Testing” (first edition), which is named “Thorough”. It can be summarized like this:

  • all mission critical functionality needs to be tested
  • for every occurring bug, there needs to be an additional test that ensures that the bug cannot happen again

The first bullet point leads to #Requirement-tests, the second one to #Bugfix-tests.

An overshadowed motivation

But recently, we discovered a third motivation that can easily be overshadowed by #Requirement:

  • #Assumption: The test ensures a fact that is not stated explicitly by the requirement. The code author used domain knowledge and common sense to infer the most probable behaviour of the functionality, but it is a guess to fill a gap in the requirement text.

This is not directly related to the ATRIP rules. Maybe, if one needs to fit it into the ruleset, it might be part of the fifth rule: “Professional”. The rule states that test code should be crafted with care and tidiness, that it is relevant even if it doesn’t get shipped to the customer. But this correlation is my personal opinion and I don’t want my interpretation to stop you from finding your own justification why testing assumptions is worth it.

How is an assumption different from a requirement? The requirement is written down somewhere else, too, and not just in the code. The assumption is necessary for the code to run and exhibit the requirements, but it’s only in the code. In the mind of the developer, the assumption is a logical extrapolation from the given requirements. “It can’t be anything else!” is a typical thought about it. But it is only “written down” in the mind of the developer, nowhere else.

And this is a perfect motivation for a targeted unit test that “states the obvious”. If you tag it with #Assumption, it makes it clear for the next developer that the actual content of the corresponding coded fact is more likely to change than other facts, because it wasn’t required directly.

So if you come across a unit test that looks like this:

/**
 * #Assumption: https://ticket.system/TICKET-132
 */
@Test
void normal_VAT_for_clothing() {
    var actual = VAT.addTo(
        new NetPrice(10.00),
        TaxCategory.clothing
    );
    assertEquals(
        new GrossPrice(11.90),
        actual
    );
}

you know that the original author made an educated guess about the expected functionality, but wasn’t explicitly told and is not totally sure about it.

This is a nice way to make it clear that some of your code is not as rigid or expected as other code that was directly required by a ticket. And by writing a unit test for it, you also make sure that if anybody changes that assumed fact, they know what they are doing and are not just guessing, too.

Every Unit Test Is a Stage Play – Part V

In this series about describing unit tests with the metaphor of a stage play that tells short stories about your system, we already published four parts.

Today, we look at the critics.

An integral part of the theater experience is the appraisal of the critics. A good review of a stage play can multiply the viewer count manyfold, while a bad review can make avid visitors hesitate or even skip the visit.

In our world of source code and unit tests, we can define the whole team of developers as critics. If they aren’t fond of the tests, they will neglect or even abandon them. Tests need to prove their worth in order to survive.

Let us think a little deeper about this aspect: Test code is evaluated more critically than any other source code! Normal production code can always claim to be wanted by the customer. No matter how bad the production code may look and feel, it cannot just be deleted. Somebody would notice and complain.

Test code is not wanted by the customer. You can delete a test and it would not be noticed until a regression bug raises the question why the failing functionality wasn’t secured by a test. So in order to survive, test code needs a stakeholder inside the development team. Nobody outside the team cares about the test.

There is another difference between production code and test code: Production code is inherently silent during development. In contrast to this, test code is programmed to drive the developer’s attention to it in case of a crisis. It is code that tries to steal your focus and cries wolf. It is the messenger that delivers the bad news.

Test code is the code you’ll likely read in a state of irritation or annoyance.

Think about a theater critic who visits and rates a stage play in a state of irritation and annoyance, who wanted to do something else instead and probably has a deadline to meet for that other thing. Their opinion is probably biased towards a scathing critique.

We talked about several things that test code can do to be inviting, concise, comprehensible and plausible. What it can’t do is to be entertaining. Test code is inherently boring. Every test is a short story that seems trivial when seen in isolation. We can probably anticipate the critique about such a play: “it meant well, but was ultimately forgettable”.

What can we do to make test code more meaningful? To convey its impact and significance to the critics?

In the world of theater (and even more so: movies), one strategy is to add “big names” to the production: “From the director of Known Masterpiece” or “Part III of the Successful Series”.

Another strategy is to embellish oneself with other critiques (hopefully good ones): “Nominated for X awards” or “Praised by Grumpy Critic”.

Let’s translate these two strategies into the world of unit tests:

Strategy 1: Borrow a stakeholder by linking to the requirement

I stated above that test code has no direct stakeholder. That’s correct for the code itself, but not for its motivation to exist. We don’t write unit tests just to have them. We write them because we want to assert that some functionality is present or some bug is absent. In both cases, we probably have a ticket that describes the required change in depth. We can add the “big name” of the ticket to the test by adding its number or a full URL as a comment to the test:

/**
 * #Requirement http://issuetracker/TICKET-3096
 */
@Test
public void understands_iso8601_timestamp() {
    final LocalDateTime actual = SomeController.dateTimeFrom(
        "2023-05-24T17:30:20"
    );
    assertThat(
        actual
    ).isEqualTo(
        "2023-05-24T17:30:20"
    );
}

The detail of interest is the comment above the test method. It explains the motivation behind authoring the test. The first word (#Requirement) indicates that this is a new feature that got commissioned by the customer. If it was a bugfix test instead, the first word would be #Bugfix. In both cases, we tell future developers that this test has a meaning in the context of the linked ticket. It isn’t some random test that annoys them, it is the guard for a specific use case of the system.

Strategy 2: Gather visible awards for previous achievements

Once you get used to the accompanying comment to a test method, you can see it as some kind of billboard that displays the merit of the test. Why not display the heroic deeds of the test, too? I blogged about the idea a decade ago, so this is just a quick recap:

/**
 * #Requirement http://issuetracker/TICKET-3096
 * @lifesaver by dsl
 * @regression by xyz
 */
@Test
public void understands_iso8601_timestamp() {
    /* omitted test code */
}

Every time a test does something positive for you, give it a medal! You can add it right below the ticket link and document for everybody to see that this test has earned its place in the code base. Of course, you can also document your frustrating encounters with a specific test in the same way. Over time, the bad tests will exhibit several negative awards, while your best tests will have several lifesaver medals (the highest distinction a test can achieve).

So, to wrap up this part of the metaphor: Pacify the inevitable critics of your test code by not only giving them pleasant code to look at but also context information about why this code exists and why they should listen to it if it happens to have grabbed their attention, even with bad news.

Epilogue

This is the fifth part of a series. All parts are linked below:

Every Unit Test Is a Stage Play – Part IV

In this series about describing unit tests with the metaphor of a stage play that tells short stories about your system, we already published three parts.

Today, we look at the story.

When you visit a theater, you probably expect to be entertained. You expect some level of preparation and presentation. You might not enjoy every aspect of the stage play, but you can cherish the overall experience.

When you read a unit test as a developer, you should not expect to be entertained. But you can expect some level of presentation and you should be able to endure the overall experience.

In both cases, a great factor to success is how the story is presented to you.

Imagine trying to follow a stage play that is in rehearsal mode. Constant interruptions and corrections from outside the stage, repetitions of scenes and single sentences and sometimes omissions that everybody is clued in on except you. And of course, nobody is dressed for their role. It would be hard to follow the plot and piece the story together.

Unit test code often reads like an early rehearsal. The code is stitched together by copy & paste, some details are modified but not emphasized and the point of the story is only revealed at the end, oftentimes told indirectly by convoluted assertions. When the test runs green for the first time, it is abandoned and left as an exercise in improvement for the next reader.

The next reader is a developer that made a change to the production code that got red-flagged by the unit test. He or she tries to find out why the jury of assertions is against the change and what the test is all about. It’s like the first visitor of a stage play has to tell the lighting technician where to point the spotlights without knowing how the story will play out.

If we accept the metaphor and view unit tests as stage plays that tell a short story, we should try to tell the story in a clear and concise manner. Giving standard names to the participating roles is an important first step to clue in the visitor/reader. But the last part of a story is the most crucial one. You are expected to tie the story threads together and provide a resolution that can be followed.

In unit testing, we express the resolution of the unit test’s short story as assertions:

public void parsingOfErroneousODLState() {
    final SerialODL target = new SerialODL(Z, "19");
    
    final ODLState actual = target.getCurrentState();
    
    Assert.assertFalse(
        actual.isNormalOperation()
    );
    Assert.assertEquals(
        3,
        IterableUtil.getSizeFor(actual.getErrorStates())
    );
    Assert.assertEquals(
        ODLErrorState.TEST_ERROR,
        IterableUtil.getElementAt(0, actual.getErrorStates())
    );
    Assert.assertEquals(
        ODLErrorState.INVALID_VALUE_DUE_TO_INITIALIZING,
        IterableUtil.getElementAt(1, actual.getErrorStates())
    );
    Assert.assertEquals(
        ODLErrorState.VALUE_GREATER_ALARM_THRESHOLD,
        IterableUtil.getElementAt(2, actual.getErrorStates())
    );
}

This unit test consists of one line of preparation (“arrange”), one line of code under test that produces the “actual” (“act”) and five assertions on several lines each (“assert”). Nearly 80 percent of this unit test is assertions. And they try to express something, but it gets drowned in noise.

One key to a better story is the usage of a more fitting form of expression, in our case a more natural way to write assertions:

public void parsingOfErroneousODLState() {
    final SerialODL target = new SerialODL(Z, "19");

    final ODLState actual = target.getCurrentState();

    assertThat(
        actual.isNormalOperation()
    ).isFalse();

    assertThat(
        actual.getErrorStates()
    ).containsExactly(
        ODLErrorState.TEST_ERROR,
        ODLErrorState.INVALID_VALUE_DUE_TO_INITIALIZING,
        ODLErrorState.VALUE_GREATER_ALARM_THRESHOLD
    );
}

In this example, we used AssertJ fluent assertions. As you can see, you can shrink the assertions part of your story down to the essence. You can state what you really want to see and not hide it behind indices and size comparisons that only exist because of the indices.

Another way to guide your reader is by structuring your test story into a standard form. From classic storytelling, we know about the hero’s journey that consists of three sections (departure, initiation, return) and can be found in countless books, movies and stage plays.

Our test’s journey is called the AAA pattern. The three sections are:

  • Arrange
  • Act
  • Assert

Whenever you write a unit test, adhere to this pattern. If you find yourself tempted to add a second act or more assertions, break up your one unit test into two. You might want to think about extracting the arrange part into a common utility method (that is placed down below, behind the curtain). The story then says: The hero is in the same position both times, decides differently (the two act sections) and has a different outcome (the two assertions) because of that.
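A minimal sketch of such a split, written in plain Java so that it runs without a test framework. The `Account` class and all method names are hypothetical; with a test framework, the two story methods would be annotated as tests.

```java
// Hypothetical example: Account and every name here are assumptions for illustration.
class AccountStories {

    // First story: arrange, one act, one assertion.
    static void deposit_increases_the_balance() {
        Account target = givenAccountWithBalance(100);

        target.deposit(50);

        expectBalance(150, target);
    }

    // Second story: same starting position, a different decision, a different outcome.
    static void withdrawal_decreases_the_balance() {
        Account target = givenAccountWithBalance(100);

        target.withdraw(30);

        expectBalance(70, target);
    }

    // Behind the curtain: the shared arrange part and the other helpers.
    static Account givenAccountWithBalance(int balance) {
        return new Account(balance);
    }

    static void expectBalance(int expected, Account actual) {
        if (expected != actual.balance()) {
            throw new AssertionError(
                "Expected balance " + expected + " but was " + actual.balance()
            );
        }
    }

    // Minimal production class so that the sketch is self-contained.
    static class Account {
        private int balance;

        Account(int balance) {
            this.balance = balance;
        }

        void deposit(int amount) {
            this.balance += amount;
        }

        void withdraw(int amount) {
            this.balance -= amount;
        }

        int balance() {
            return this.balance;
        }
    }

    public static void main(String[] args) {
        deposit_increases_the_balance();
        withdrawal_decreases_the_balance();
        System.out.println("Both stories end as expected.");
    }
}
```

Each story keeps its own arrange, act and assert sections on stage, while the shared preparation lives in one helper below the test methods.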

There are probably countless more things that you can think of to make the story of your tests more compelling. Remember that tests are not required to be entertaining or surprising. You can tell the same classic tale over and over again. The computer doesn’t mind and the next reader is glad when the test code is accessible right away because the structure and phrasing are on point.

Nobody would pay to see a confusing stage play. And nobody wants to decipher extravagant test code that just broke in an unexpected way. Give your readers what they hope for: Plain short stories about your system.

Epilogue

This is the fourth part of a series. All parts are linked below:

Every Unit Test Is a Stage Play – Part III

In the first and second parts of this series, I talked about describing unit tests with the metaphor of a stage play that tells short stories about your system.

This metaphor is really useful in many aspects of unit testing, as we have seen with naming variables and clearing the test method. In each blog entry of the series, I’m focussing on one aspect of the whole.

Today, we look at the theater.

If you go to the theater as a guest, you are greeted by a pompous entrance with a luxurious stairway that leads you to your comfy seat. You don’t get to see all the people and things behind the heavy curtains. You don’t need to recognize any details of the floor, the walls or the ceiling as you make your way into the auditorium. They don’t matter for the play.

If you enter the theater as an actor or a stage hand, you slip into the building by the back entrance and make your way through a series of storage rooms. Or at least that is the cliché in many movies (I’m thinking about Birdman, but you probably have your own mental image at hand). You need to recognize all the details and position yourself according to your job. Your preparation matters for the play.

I described the test methods as single stage plays in earlier blog posts. Today, I want you to think about a test class as a theater. We need to agree on the position of the entrance. In my opinion, the entrance is where I’m starting to read – at the top of the file.

In my opinion, as a reader of the test class, I’m one of the guests. My expectation is that the test class is designed to be inviting to guests.

This expectation comes from a fundamental difference between production code classes and test classes: Classes in production code are not meant to be read. In fact, if you tailor your modules right and design the interface of a class clearly and without surprises, I want to utilize your class, but not read it. I don’t have to read it because the interface told me everything I needed to know about your class. Spare me the details, I’ve got problems to solve!

Test classes, on the other hand, are meant to be read. Nobody will call a test method from the production code. The interface of a test class is confusing from the outside. To value a test class is to read and understand it.

Production code classes are like government agencies: They serve you at the front, but don’t want you to snoop around the internals. Test classes are like a theater: You are invited to come inside and marvel at the show.

So we should design our test classes like theaters: An inviting upper part for the guests and a pragmatic lower part for the stage hands behind the curtain.

Let’s look at an example:

public class UninvitingTest {
	
    public static class TestResult {
        private final PulseCount[] counts;
        private final byte[] bytes;

        public TestResult(
            final PulseCount count,
            final byte[] bytes
        ) {
            this(
                new PulseCount[] {
                    count
                },
                bytes
            );
        }

        public TestResult(
            final PulseCount[] counts,
            final byte[] bytes
        ) {
            super();
            this.counts = counts;
            this.bytes = bytes;
        }

        public PulseCount[] getCounts() {
            return this.counts;
        }

        public byte[] getBytes() {
            return this.bytes;
        }
    }
	
    private static final TestResult ZERO_COUNT =
        new TestResult(
            new PulseCount(0),
            new byte[] {0x0, 0x0, 0x0, 0x0}
        );
    private static final TestResult VERY_SMALL_COUNT =
        new TestResult(
            new PulseCount(34),
            new byte[] {0x0, 0x0, 0x22, 0x0}
        );
    private static final TestResult MEDIUM_COUNT_BORDER =
        new TestResult(
            new PulseCount(65536),
            new byte[] {0x1, 0x0, 0x0, 0x0}
        );
    
    public UninvitingTest() {
    	super();
    }

    @Test
    public void serializeSingleChannelValues() throws Exception {
        SPEChannelValuesSerializer scv = 
            new SPEChannelValuesSerializer();
        Assertion.assertArrayEquals(
            ZERO_COUNT.getBytes(),
            scv.serializeCounts(ZERO_COUNT.getCounts())
        );
        Assertion.assertArrayEquals(
            VERY_SMALL_COUNT.getBytes(),
            scv.serializeCounts(VERY_SMALL_COUNT.getCounts())
        );
        Assertion.assertArrayEquals(
            MEDIUM_COUNT_BORDER.getBytes(),
            scv.serializeCounts(MEDIUM_COUNT_BORDER.getCounts())
        );
    }
}

In fact, I hope you didn’t read the whole thing. There are lots of problems with this test, but let’s focus on the entrance part:

  • Package declaration (omitted here)
  • Import statements (omitted here)
  • Class declaration
  • Inner class definition
  • Constant definitions
  • Constructor
  • Test method

The amount depends on the programming language, but some ornaments at the top of a file are probably required and can’t be skipped or moved around. We can think of them as a parking lot that we require, but don’t find visually appealing.

The class declaration is something like an entrance door. Behind it, the theater begins. And just by looking at the next three things, I can tell that I took the wrong door. Why am I burdened with the implementation details of a whole other class? Do I need to remember any of that? Are the constants important? Why does a test class require a constructor?

In this test class, I need to travel 50 lines of code before I reach the first test method. Translated into our metaphor, this would be equivalent to three storage rooms filled with random stuff that I need to traverse before I can sit down in my chair to watch the play. It would be ridiculous when encountered in real life.

The solution isn’t that hard: Store your stuff in the back rooms. We just need to move our test method up, right under the class declaration. Everything else is defined at the bottom of our class, after the test methods.

This is a clear violation of the Java code conventions and the usual structure of a class. Just remember this: The code conventions and structures apply to production code and are probably useful for it. But we have other requirements for our test classes. We don’t need to know about the intrinsic details of that inner class because it will only be used in a few test methods. The constants aren’t public and won’t just change. The only call to our constructor lies outside of our code in the test framework. We don’t need it at all and should remove it.

If you view your test class as a theater, you store your stuff in the back and present an inviting front to your readers. You know why they visit you: They want to read the tests, so show them the tests as close to the entrance as possible. Let the compiler travel your code, not your readers.

And just to show you the effect, here is the nasty test class from above, with the more inviting structure:

public class UninvitingTest {
    
    @Test
    public void serializeSingleChannelValues() throws Exception {
        SPEChannelValuesSerializer scv = 
            new SPEChannelValuesSerializer();
        Assertion.assertArrayEquals(
            ZERO_COUNT.getBytes(),
            scv.serializeCounts(ZERO_COUNT.getCounts())
        );
        Assertion.assertArrayEquals(
            VERY_SMALL_COUNT.getBytes(),
            scv.serializeCounts(VERY_SMALL_COUNT.getCounts())
        );
        Assertion.assertArrayEquals(
            MEDIUM_COUNT_BORDER.getBytes(),
            scv.serializeCounts(MEDIUM_COUNT_BORDER.getCounts())
        );
    }

    private static final TestResult ZERO_COUNT = 
        new TestResult(
            new PulseCount(0),
            new byte[] {0x0, 0x0, 0x0, 0x0}
        );
    private static final TestResult VERY_SMALL_COUNT = 
        new TestResult(
            new PulseCount(34),
            new byte[] {0x0, 0x0, 0x22, 0x0}
        );
    private static final TestResult MEDIUM_COUNT_BORDER = 
        new TestResult(
            new PulseCount(65536),
            new byte[] {0x1, 0x0, 0x0, 0x0}
        );

    public static class TestResult {
        private final PulseCount[] counts;
        private final byte[] bytes;

        public TestResult(
            final PulseCount count,
            final byte[] bytes
        ) {
            this(
                new PulseCount[] {
                    count
                },
                bytes
            );
        }

        public TestResult(  
            final PulseCount[] counts,
            final byte[] bytes
        ) {
            super();
            this.counts = counts;
            this.bytes = bytes;
        }

        public PulseCount[] getCounts() {
            return this.counts;
        }

        public byte[] getBytes() {
            return this.bytes;
        }
    }

    public UninvitingTest() {
        super();
    }
}

Show your readers the test methods and don’t burden them with details they just don’t need (yet).

Epilogue

This is the third part of a series. All parts are linked below:

Every Unit Test Is a Stage Play – Part II

In the first part of this series, I talked about describing unit tests with the metaphor of a stage play that tells short stories about your system.

This metaphor holds water in nearly every aspect of unit testing and guides my approach from each single line of code to the whole concept of test coverage. In this series, I’m focussing on one aspect in each part.

Today, we look at the backdrop.

We learnt from the first part that most unit tests are performed by four roles that appear on stage. In a theater, the stage is oftentimes decorated by additional items that facilitate the story. This is the backdrop (or the coulisse) of the play. We have the same thing in unit tests.

In a unit test, the stage is the code inside the test method:

@Test
public void rounds_up_to_the_next_decimal_power() {
    final Configuration given = new Configuration(
        StringVirtualFile.createFileFromContent(
            "report.config",
            "scale.maximum=2E5"
        )
    );
    final ReportConfiguration target = new ReportConfiguration(
        given,
        SuffixProvider.none
    );
    final Optional<Double> actual = target.scaleMaximum();
    final double expected = 1E6;
    assertThat(actual).contains(expected);
}

A good director is very picky about every detail that appears on stage. There should be no incidental item visible for the audience. In the case of a unit test, the audience is the next developer that reads the test code.

The stage doesn’t need to be devoid of “extras” and furniture, but it should be limited to the essential. A theater play isn’t a movie set where eye candy is seen as something positive. During the play, the audience should recognize the actors (and their roles) easily and without searching between all the clutter.

So we need to do two things: Declutter the stage and identify the extras.

Decluttering the stage

In our example above, there is a mismatch between the code required to instantiate the given role and the rest of the whole test. If this were a real play, half the time would be spent on an extra that doesn’t even speak a line. It is implied that the target gets some information from the given, but it isn’t shown. We need to remove some details from the stage that aren’t even relevant to the story, but introduced in detail just like the essential things. It might be fun and suspenseful to guess if the “report.config” detail in line three is more meaningful than the “scale.maximum” specified in line four, but unit test stories are not meant to be mysterious or even entertaining. They are meant to inform the audience about a little fact of the tested system. There will be thousands of little stories about the system. Make them trivial to read and easy to understand.

We need to move the stage props off the stage:

@Test
void rounds_up_to_the_next_decimal_power() {
    Configuration given = givenScaleMaximumOf("2E5");
    final ReportConfiguration target = new ReportConfiguration(
        given,
        irrelevant()
    );
    final Optional<Double> actual = target.scaleMaximum();
    final double expected = 1E6;
    assertThat(actual).contains(expected);
}

private Configuration givenScaleMaximumOf(
    String scaleMaximum
) {
    return new Configuration(
        StringVirtualFile.createFileFromContent(
            "report.config",
            "scale.maximum=" + scaleMaximum
        )
    );
}

private SuffixProvider irrelevant() {
    return SuffixProvider.none;
}

By moving the initialization code of the Configuration object to a new private method, we employ the storytelling device of “conveniently prepared circumstances”. The private method is moved to the “lower decks” or the “backstage” area of our test class. “The stage is on top” might be our motto. Everything “private” or not-Test-annotated should not be visible first and should not be required reading.

Notice how I introduced a parameter to the new “givenScaleMaximumOf” method in order to keep the necessary test details on stage. The audience should not have to look behind the curtains to gather important information about the story. I made the parameter a string (and not a double or integer) because it is just a prop. The story doesn’t benefit from it being typed correctly. And if you look back, it was a string before, too.

Identifying the extras

I’ve also extracted the “magic number” or “silent extra” SuffixProvider.none into its own method. This method adds nothing but a name that conveys the meaning of the value to the story instead of the value itself. If I were a director in a theater, this actor would have a plain and bland costume in contrast to the bright colored main roles. Maybe the stage lighting would illuminate the main roles and keep the extra in the shadowy area, too.

Now, the focus of our test method is back on the main story. The attention of our audience will be on the stage and it will not be burdened by irrelevant details. We will even label props as dispensable if they are.

Keep your stages free from clutter and the eyes of your audience on the story. Test code is boring by nature. It doesn’t have to be plodding, too.

Epilogue

This is the second part of a series. All parts are linked below:

Every Unit Test Is a Stage Play – Part I

At the last dev brunch, I got a recommendation for a talk that tries to explain functional programming differently. What really got me was the effectiveness of the changed vocabulary. I’ve seen this before, in the old talk about test driven development and behaviour driven development. But in my head, I think about unit tests with another overarching metaphor that I’m trying to explain in this blog post series:

Every unit test is a stage play that tells a short story about your system.

And this metaphor really guides my approach to nearly any aspect of unit testing, from each single line of code to the whole concept of test coverage. So I’m breaking my explanation into (at least) five parts, focusing on one aspect in each part.

Today, we look at the actors.

In each classic play, there are well-known roles that can be played by nearly any human, but always stay the same role. There’s the hero, the (comedic) sidekick and of course, the villain or antagonist. In every show of Romeo and Juliet, there will be a Romeo. It might not be the most convincing Romeo ever, but the role stays the same, no matter the cast.

The same thing is true for every well-formed unit test. There are four roles that always appear on stage:

  • target: This is the object under test or the code under test if you don’t use objects. The target is probably different for every unit test you write, but the role is always present. I’ve seen it being called “cut” for “code under test”, but I prefer “target”. If you see a reference named “target” in my test code, you can be sure about the role it plays in the story.
  • actual: If you can design your code to adhere to the simple “parameters in, result out” call pattern, the “result out” is the “actual”. It is what your target produced when challenged by the specific parameters of your test. One trick to testable code is to design the “actual” role as being quite simple and “flat”.
  • expected: This might be the closest thing to an antagonist in your play. The “expected” role is filled with the value (or values) that your “actual” is measured against. If your “actual” is simple, your “expected” will be simple, too. If your “actual” is too complex, the “expected” role will be overbearing. In any case, the “expected” role is what drives your assertions.
  • given: Our hero, the “target”, is often dependent on entry parameters or secondary objects (mocked or not). These sidekicks are part of the “given” role. You might think about the “given-when-then” storytelling structure of behaviour driven design for the name. If you strive for a simple code structure, the required “given” in your unit test should be manageable.

As you can see, the story of a typical unit test is always the same: The target, with the help of the given, produces an actual that competes against the expected.

If this story has a happy end, your test runs green. If the actual fails the expectation, your test runs red. If the target fails to produce an actual at all (think about an exception or error), your whole play falls apart and the test runs red.

Enough theory, let’s look at a unit test that uses the four roles:

@Test
public void rounds_up_to_the_next_decimal_power() {
    final Configuration given = new Configuration(
        StringVirtualFile.createFileFromContent(
            "report.config",
            "scale.maximum=2E5"
        )
    );
    final ReportConfiguration target = new ReportConfiguration(
        given,
        SuffixProvider.none
    );
    final Optional<Double> actual = target.scaleMaximum();
    final double expected = 1E6;
    assertThat(actual).contains(expected);
}

I’ve highlighted the roles for better visibility. Note that for a role to appear in the play, it doesn’t really have to be named explicitly. Most of the time, the last two lines would be combined into one:

assertThat(actual).contains(1E6);

You can still see the “expected” role play its part, but not as prominently as before.

You also probably saw the extra “given” that wasn’t highlighted:

SuffixProvider.none

It might be relevant to the story or really be an uncredited extra that is not crucial in the target’s journey to produce the correct actual. If that’s the case, it seems appropriate not to name it. We will learn about techniques that I use to make these extras more nondescript in a later part. Right now, we can differentiate between main roles that are named and secondary roles that are just there, as part of the scenery. Just don’t fool your audience by having an unnamed actor contribute an important piece to the story’s success. That might be a cool plot twist, but I’m not here to be surprised.

Let your tests perform boring plays, but lots of them.

By using the four roles of test play, you make it clear to the reader (your real audience) what to expect from your test code parts. Don’t name irrelevant test code parts and only omit the role names if there are no extras on stage.

Your audience will still find your play boring (that’s the fate of nearly all test code), but it won’t feel disregarded or, even worse, deceived.

Epilogue

This is the first part of a series. All parts are linked below:

Spicing up the Game of Life Kata – Part I

Conway’s Game of Life is a worthwhile coding kata that I’ve implemented probably hundreds of times. It is compact enough to be completed in 45 minutes, complex enough to benefit from Test First or Test Driven Development and still maintains a low entry barrier so that you can implement it in a foreign programming language without much of a struggle (except if the foreign language is APL).

And despite appearing to be a simple 0-player game with just a few rules, it can lead to deep theory, as John Conway explains nicely in this video. Oh, and it is Turing complete, so you can replicate a Game of Life in Game of Life – of course.

But after a few dozen iterations on the kata, I decided to introduce some extra aspects to the challenge – with sometimes surprising results. This blog series talks about the additional requirements and what I learnt from them.

Additional requirement #1: Add color to the game

The low effort user interface of the Game of Life is a character-based console output of the game field for each generation. It is sufficient to prove that the game runs correctly and to watch some of the more advanced patterns form and evolve. But it is rather unpleasing to the human eye.

What if each living cell in the game is not only alive, but also has a color? The first generation on the game field will be very gaudy, but maybe we can think about “color inheritance” and have the reproducing cells define the color of their children. In theory, this should create areas of different colors that can be tracked back to a few or even just one ancestor.

Let’s think about it for a moment: When all parent cells are red, the child should be red, too. If a parent is yellow and another one is red, the child should have a color “on the spectrum” between yellow and red.

Learning about inheritance rules

One specific problem of reproduction in the Game of Life is that we don’t have two parents, we always have three of them:

Any dead cell with exactly three live neighbors becomes a live cell, as if by reproduction.

Rule #4 of Game of Life

We need to think about a color inheritance rule that incorporates three source colors and produces a target color that is somehow related to all three of them:

f(c1, c2, c3) → cn

A non-harebrained implementation of the function f is surprisingly difficult to come up with if you stay within your comfort zone regarding the representation of colors in programming languages. Typically, we represent colors in the RGB schema, with a number for each of the three color ingredients: red, green and blue. Whether the numbers range from zero to one (using floating-point values), from zero to 255 (using integer values) or over some other value range doesn’t really matter here. Implementing the color inheritance function using RGB colors adds so many intricacies to the original problem that I consider this approach a mistake.

Learning about color representations

When we search around for alternative color representations, the “hue, saturation and brightness” or HSB approach might capture your interest. The interesting part is the first parameter: hue. It is a value between 0 and 360, with 0 and 360 being identical and meaning “red”. 360 is also the number of degrees in a full circle, so this color representation effectively describes a “color wheel” with one number.
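To make the hue wheel tangible, here is a minimal sketch that turns a hue given in degrees into a displayable color. It assumes the standard java.awt.Color class, which expects the hue as a fraction of the full circle rather than in degrees. The class name HueColors and the choice of full saturation and brightness are my assumptions, not part of the kata.

```java
import java.awt.Color;

// Hypothetical helper for illustration, not part of the original kata code.
final class HueColors {

    // Converts a hue in degrees into an opaque color with
    // full saturation and full brightness (an assumption of this sketch).
    static Color fromDegrees(double degrees) {
        // java.awt.Color wants the hue as a fraction of the full circle
        float hue = (float) (degrees / 360.0);
        return Color.getHSBColor(hue, 1.0f, 1.0f);
    }
}
```

With this helper, 0 degrees and 360 degrees both end up as the same red, which is exactly the wheel property the rest of the post relies on.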

This means that for our color inheritance function, the parameters c1, c2 and c3 are degrees between 0 and 360. The whole input might look like this:

Just by looking at the graphics, you can probably already see the color spectrum that is suitable for the function’s result. Instead of complicated color calculations, we pick an angle somewhere between two angles (with the third angle defining the direction).

And this means that we have transformed our color calculation into a geometric formula using angles. We can now calculate the span between the “leftmost” and the “rightmost” angle that covers the “middle” angle. We determine a random angle in this span and use it as the color of the new cell.

Learning about implicit coupling

But there are three possibilities to calculate the span! Depending on what angle you assign the “middle” role, there are three spans that you can choose from. If you just take your parent cells in the order that is given by your data structure, you implement your algorithm in a manner that is implicitly coupled to your technical representation. Once you change the data structure ever so slightly (for example by updating your framework version), it might produce a different result regarding the colors for the exact same initial position of the game. That is a typical effect for hardware-tied software, as the first computer games were, but also a sign of poor abstraction and planning. If you are interested in exploring the effects of hardware implications, the game TIS-100 might be for you.

We want our implementation to be less coupled to hardware or data structures, so we define that we use the smallest span for our color calculation. That means that our available colors will rapidly drift towards a uniform color for every given isolated population on our game field.
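One way to sketch the “smallest span” rule in Java: sort the three hues, find the largest gap between neighbours on the wheel, and take everything outside that gap as the span. This is only one possible reading of the rule. The names HueInheritance, smallestSpan and pick are hypothetical, and the random source is passed in as a DoubleSupplier that is assumed to return values between 0 and 1.

```java
import java.util.Arrays;
import java.util.function.DoubleSupplier;

// Hypothetical helper for illustration, not taken from the original kata code.
final class HueInheritance {

    // Maps any angle in degrees onto the range [0, 360).
    static double normalize(double degrees) {
        return ((degrees % 360.0) + 360.0) % 360.0;
    }

    // Returns {start, length} of the smallest arc that covers all three hues.
    static double[] smallestSpan(double h1, double h2, double h3) {
        double[] sorted = { normalize(h1), normalize(h2), normalize(h3) };
        Arrays.sort(sorted); // result no longer depends on the parent order
        int gapIndex = 0;
        double largestGap = -1.0;
        for (int i = 0; i < 3; i++) {
            // the gap after the last hue wraps around to the first one
            double gap = (i == 2)
                ? sorted[0] + 360.0 - sorted[2]
                : sorted[i + 1] - sorted[i];
            if (gap > largestGap) {
                largestGap = gap;
                gapIndex = i;
            }
        }
        // the span starts right after the largest gap
        double start = sorted[(gapIndex + 1) % 3];
        return new double[] { start, 360.0 - largestGap };
    }

    // Picks a hue inside the smallest span, wrapping at 360 degrees.
    static double pick(double h1, double h2, double h3, DoubleSupplier random) {
        double[] span = smallestSpan(h1, h2, h3);
        return normalize(span[0] + random.getAsDouble() * span[1]);
    }
}
```

Because the hues are sorted first, the parents 350, 10 and 20 yield the same span (starting at 350, 30 degrees long) no matter in which order the data structure hands them over.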

Learning about long-term effects (drifts)

But that is not our only concern regarding drifts. Even if you calculate your color span correctly, you can still mess up the actual color pick without noticing it. The best indicator of this long-term effect is when every game you run ends in the green/cyan/blue-ish region of the color wheel (the 50 % area). This probably means that you didn’t implement the equivalence of 0° and 360° correctly. Or, in other words, that your color wheel isn’t a wheel, but a plain value range from 0 to 360 without wrap-around:

You can easily write a test case that takes the wrap-around into account.
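A wrap-around test case could look like the following sketch. It is written in plain Java instead of JUnit to stay self-contained, and it uses a simplified two-parent midpoint as the code under test. The class HueMidpoint and both methods are hypothetical and only serve to contrast a missing wrap-around with a working one.

```java
// Hypothetical code under test plus a plain-Java check, for illustration only.
final class HueMidpoint {

    // Buggy variant: treats the hue as a plain value range from 0 to 360.
    static double naive(double a, double b) {
        return (a + b) / 2.0;
    }

    // Wrap-aware variant: walks the shorter way around the wheel.
    static double wrapAware(double a, double b) {
        // signed shortest difference from a to b, in [-180, 180)
        double delta = ((b - a) % 360.0 + 540.0) % 360.0 - 180.0;
        return ((a + delta / 2.0) % 360.0 + 360.0) % 360.0;
    }

    // The test case: two reddish parents must produce a reddish child.
    static void redParentsProduceRedChild() {
        double actual = wrapAware(350.0, 10.0);
        double expected = 0.0;
        if (Math.abs(actual - expected) > 1e-9) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }
}
```

The naive variant answers 180 for the same input, which is cyan, right in the 50 % area of the wheel.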

But there are other drifts that might affect your color outcomes and those are not as easily testable. One source of drift might be your random number generator. Every time you pick a random angle from your span, any small tendency of the random number generator influences your long-term results. I really don’t know how to test against these effects.

A more specific source of drift is your usage of the span (or interval). Is it closed (including the endpoints) or open (not including the endpoints)? Both options are possible and don’t introduce drift. But what if the interval is half-open? The most common mistake is to make it left-closed and right-open. This makes your colors drift “counter-clockwise”, but because you wrapped them correctly, you don’t notice from looking at the colors only.

I like to think about possible test cases and test strategies that uncover those mistakes. One “fun” approach is the “extreme values random number generator” that only returns the lowest or highest possible number.
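A minimal sketch of such an “extreme values” generator, assuming that the color pick takes its randomness from a DoubleSupplier. The classes ExtremeValues and HuePicker are hypothetical names for this illustration and the pick formula is a simplification.

```java
import java.util.function.DoubleSupplier;

// Hypothetical test double: alternates between the lowest and highest value.
final class ExtremeValues implements DoubleSupplier {
    private final double lowest;
    private final double highest;
    private boolean nextIsHighest = false;

    ExtremeValues(double lowest, double highest) {
        this.lowest = lowest;
        this.highest = highest;
    }

    @Override
    public double getAsDouble() {
        double value = nextIsHighest ? highest : lowest;
        nextIsHighest = !nextIsHighest;
        return value;
    }
}

// Hypothetical, simplified color pick within a span on the wheel.
final class HuePicker {
    static double pick(double start, double length, DoubleSupplier random) {
        double hue = start + random.getAsDouble() * length;
        return ((hue % 360.0) + 360.0) % 360.0; // wrap at 360 degrees
    }
}
```

Fed with new ExtremeValues(0.0, 1.0), a span that starts at 350 and is 30 degrees long must return exactly 350 and then exactly 20. If the right endpoint can never be reached, the interval is half-open. Using Math.nextDown(1.0) as the highest value would mimic a generator like java.util.Random.nextDouble(), which never returns 1.0.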

Conclusion

Adding just one additional concept to a given coding kata opens up a multitude of questions and learnings. If you add inheritable colorization to your Game of Life, it not only looks cooler, but it also teaches you how a problem can be solved more easily with a suitable representation, given that you look out for typical pitfalls in the implementation.

Writing (unit) test cases for those pitfalls is one of my current kata training areas. Maybe you have an idea how to test against drifts? Write a comment or even a full blogpost about it! I would love to hear from you.

Don’t test details from a distance

The concept described in this blog entry has evoked a lot of different metaphors and descriptions from our team when we discussed it. So don’t take my words or thoughts on it as the one true way to talk about it – the concept of the “testing gap” or the distance between the code under test and the test’s vantage point.

Before I describe my metaphor for it with some weird visuals, let’s look at some code:

public Budget(String denotation, int maximumHours) {
    this.denotation = denotation;
    this.maximumHours = maximumHours;
    this.currentHours = maximumHours;
}

This is the constructor for an entity, a domain class that represents a budget of work hours that gets slowly used up when you work for the customer’s project. There is not much going on in this code except one little detail of the domain: New budgets always start fully “filled up”, in that the currentHours are set to the maximumHours. You can’t create a budget that is already half empty with this code.

Such a domain concept or “business rule” requires a test that ensures it is still in place:

@Test
public void has_initially_current_hours_set_to_maximum() {
    Budget target = new Budget(
        "current is maxed",
        100
    );
    assertThat(target.getMaximumHours()).isEqualTo(100);
    assertThat(target.getCurrentHours()).isEqualTo(100);
}

This is a fairly boring unit test that ensures that freshly created budgets have all their working hours still available.

In our example, the entity lives in the core of a web application that provides an endpoint to create new budgets. We have a test for the endpoint, of course:

@Test
public void stores_new_budget() throws Exception {
    this.web.perform(
        post("/budgets")
        .contentType(MediaType.APPLICATION_JSON)
        .content("{\"denotation\": \"new budget\", \"maximumHours\": 300}")
    )
    .andExpect(status().isOk())
    .andExpect(content().json("{\"denotation\": \"new budget\", \"maximumHours\": 300, \"currentHours\": 300}"));
}

You can shudder at the code formatting or the necessity to escape your JSON data into inscrutability. At least the second problem more or less disappears with current Java versions. But that’s not the point today. The point is that this is effectively the same test as above, but with a gap in between.
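The remark about current Java versions presumably refers to text blocks, which are a standard language feature since Java 15. As a small sketch (the class BudgetJson is a hypothetical holder for the two constants), the request body from the test could be written without escaping:

```java
// Hypothetical holder class, only here to compare both notations.
final class BudgetJson {

    // Classic string literal with escaped quotes
    static final String ESCAPED =
        "{\"denotation\": \"new budget\", \"maximumHours\": 300}";

    // Text block (Java 15 or newer): no escaping necessary.
    // The closing delimiter sits on the content line to avoid a trailing newline.
    static final String TEXT_BLOCK = """
        {"denotation": "new budget", "maximumHours": 300}""";
}
```

Both constants contain the same characters, so the test behaves identically with either notation.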

If you wrote just the second test, your code coverage metrics would probably not decrease. Your business rule would still be tested. So why write the first test if it adds nothing to the safety net?

This can be explained with the idea that there is a considerable “testing gap” between the second test and the business rule. The second test covers the entity’s constructor code and states explicitly that the currentHours property should be set to the same value as the maximumHours property. But it also defines the communication protocol as being HTTP, the data format as being JSON and travels through code that finds an “endpoint” for the given URL, maps the given JSON to a constructor call and serializes the resulting object as JSON back to the requester. That would be a lot of padding just to test the constructor’s third line.

The first test has virtually no testing gap. It knows nothing about the web, data formats or whatever else the application consists of. It just looks at the entity and its behaviour in isolation.

There are perfectly valid reasons to write the second test, but it should not be the only test that ensures the business rule in our example. The second test “sees too much” from its vantage point to pay attention to a little detail like the business rule.

In case you didn’t quite get the concept of the testing gap yet, here is how I imagine it in my head: If your code under test is a mystery box (really try to picture a shoebox made of cardboard that rattles when you move it), then your test is a big floating eye that uses little cracks and holes in the box to get a quick peek inside. If you exhibit state by getter methods like in our example, the eye ensures the internal state of the box by looking at the gauges that are placed on its sides.

If your testing gap is small, the eye hovers up close to the box. It doesn’t see anything else, but it notices every detail of the box.

If you have some testing gap in your test code, the eye is placed at a considerable distance from the mystery box. There are other important things between them. The gauges aren’t directly readable. The eye uses indirect clues and reflections to gather its information. Every time something in the gap’s setup changes, the testing eye needs to adjust its gaze.

Which brings us to the conclusion of this metaphor: If you want things to be looked at in detail, write tests without a testing gap. Otherwise, your tests will have increased execution times, exhibit a strange imprecision in their message (“something in these dozen classes has changed and it might not even be relevant”) and require frequent adjustments that are not related to their testing story.

Or, if said with the words of my imagination, place your testing eye directly at the entrance of your test’s hideout.

You’ve probably thought about this concept already, in your own terms and metaphors. Can you try to describe it in a comment? Just for the name, we discussed “testing distance”, “testing height”, “testing gap” and others. Perhaps we like your description even better.

What dependent types can do for you

In a way, this post is also about Test Driven Development and *Type* Driven Development. While the two share the same acronym, I always thought of them as different concepts. However, as I recently experienced, when the two concepts are used in a dependently typed language, there is something like a fluid transition between them.

While I will talk about programming in the dependently typed language Agda, not much is needed to follow what is going on – I will just walk through an exercise and explain everything along the way.

The exercise I want to use is here. It talks about a submarine, its position and certain commands that change the position. Examples for commands are forward 1, down 2 and up 3. These ‘values’ can be used just like that with the following definition of the type of commands:

      data Command : Set where
        forward : Nat -> Command
        up      : Nat -> Command
        down    : Nat -> Command

Agda can be used in a very mathy way – this should really be read as saying that the type of commands is a Set and there are three constructors (highlighted green) which take a natural number as argument and produce a command. So, since function application is just juxtaposition, we can make the following definitions now:

  justSomeCommand = forward 5
  anotherOne = up 1

Now the exercise text explains how these commands can be applied to the position of the submarine. Working as a software developer, I built the habit of turning specifications like that into tests. Since I don’t know any better, I just wrote ‘tests’ in Agda using equations to translate the exercise text – I’ll explain the syntax below:

  apply (forward 5) (pos 0 0) ≡ pos 5 0
  apply (down 5) (pos 5 0) ≡ pos 5 5
  apply (forward 8) (pos 5 5) ≡ pos 13 5

Note that the triple equal sign is different from what we used above. Roughly, this is because it is the proposition that some things are equal, while the normal equal sign above was used to make definitions. The code doesn’t type check as is. We haven’t defined ‘apply’ and it is not valid Agda to just write down equations like that. Let’s fix the latter problem first, by turning it into declarations and definitions. This will actually define elements of the datatypes of equality proofs – but I’m pretty sure you can accept these changes just as boilerplate we have to add to our equations:

  example1 : apply (forward 5) (pos 0 0) ≡ pos 5 0
  example1 = refl

  example2 : apply (down 5) (pos 5 0) ≡ pos 5 5
  example2 = refl

  example3 : apply (forward 8) (pos 5 5) ≡ pos 13 5
  example3 = refl

Now, to make the examples type-check, we have to define ‘pos’ and ‘apply’. Positions can be defined analogously to commands:

  data Position : Set where
    pos : Nat -> Nat -> Position

(Here, the type of ‘pos’ just tells us that it is a function taking two natural numbers as arguments.) Now we are ready to start with ‘apply’:

  apply : Command → Position → Position
  apply = ?

So apply is a function that takes a ‘Command’ and a ‘Position’ and returns another ‘Position’. For the definition of ‘apply’ I just entered a question mark ‘?’. It is one of my favorite features of Agda that terms can be left out like this before type checking. Agda still checks everything we have given so far and will give us a lot of information about what ‘?’ could be. This is called ‘interacting with a hole’. Because, well, it is a hole in your code and the type checker is there to tell you which things might fit into this hole. After type checking, the hole and what Agda tells us about it will look like this:

  apply : Command → Position → Position
  apply = ?

Goal: Command -> Position -> Position

This was type-checked with a couple of imports – see my final version of the code if you want to reproduce. The first thing Agda tells us is the type of the goal, and then there is some mumbling about constraints with some fragments that look like they have something to do with the examples from above – the latter is actually not information about the hole, but general information about the type checking. So let’s look at them to see if the type checker has anything to say:

The refl terms in the definitions of the examples, are highlighted yellow

Something is yellow! This is Agda’s way to tell us that it does not have enough information to decide if everything is okay. Which makes a lot of sense, since we haven’t given a definition of ‘apply’ and these equations are about values computed with ‘apply’. So let us just continue to define ‘apply’ and see if the yellow vanishes. This is analogous to the stage in TDD where your tests don’t pass because your code does not yet compile.

We will use pattern matching on the given ‘Command’ and ‘Position’ to define ‘apply’ – the cases below were generated by Agda (I only changed variable names), and we now have a hole for each case:

  apply : Command → Position → Position
  apply (forward x) (pos h d) = {!!}
  apply (up x) (pos h d) = {!!}
  apply (down x) (pos h d) = {!!}

There are various ways in which Agda can use the information given by types to help us with filling these holes. First of all, we can just ask Agda to make the hole ‘smaller’ if there is a unique canonical way to do so. This will work here, since ‘Position’ has only one constructor. So we get new holes for arguments of the constructor ‘pos’ and can try to fill those.

Let us focus on the first case and see what happens if we enter something not in line with our tests:

If we ask Agda, if ‘h+d’ fits into the ‘hole’, it will say no and tell us what the problem is in the following way:

While this is essentially the same kind of feedback you would get from a unit test, there are at least two important advantages to note:

  • This is feedback from the type checker and it is combined with other things the type checker can tell you. It means you get a lot of feedback at once when you ask Agda if something you wrote fits into a hole.
  • ‘refl’ is only a simple case of the proofs you can write in Agda. More complicated ones need some training, but you can go way beyond unit tests and ‘check’ infinitely many cases or even better: all cases.

If you want more, just try Agda yourself! One easy way to do that is to use Ingo Blechschmidt’s Agdapad, which lets you try Agda in your browser.