Spicing up the Game of Life Kata – Part I

Conway’s Game of Life is a worthwhile coding kata that I’ve implemented probably hundreds of times. It is compact enough to be completed in 45 minutes, complex enough to benefit from Test First or Test Driven Development and still maintains a low entry barrier so that you can implement it in a foreign programming language without much of a struggle (except if the foreign language is APL).

And despite appearing to be a simple 0-player game with just a few rules, it can yield deep theory, as John Conway explains nicely in this video. Oh, and it is Turing complete, so you can replicate a Game of Life in Game of Life – of course.

But after a few dozen iterations on the kata, I decided to introduce some extra aspects to the challenge – with sometimes surprising results. This blog series talks about the additional requirements and what I learnt from them.

Additional requirement #1: Add color to the game

The low-effort user interface of the Game of Life is a character-based console output of the game field for each generation. It is sufficient to prove that the game runs correctly and to watch some of the more advanced patterns form and evolve. But it is rather unpleasing to the human eye.

What if each living cell in the game is not only alive, but also has a color? The first generation on the game field will be very gaudy, but maybe we can think about “color inheritance” and have the reproducing cells define the color of their children. In theory, this should create areas of different colors that can be traced back to a few or even just one ancestor.

Let’s think about it for a moment: When all parent cells are red, the child should be red, too. If a parent is yellow and another one is red, the child should have a color “on the spectrum” between yellow and red.

Learning about inheritance rules

One specific problem of reproduction in the Game of Life is that we don’t have two parents, we always have three of them:

Any dead cell with exactly three live neighbors becomes a live cell, as if by reproduction.

Rule #4 of Game of Life

We need to think about a color inheritance rule that incorporates three source colors and produces a target color that is somehow related to all three of them:

f(c1, c2, c3) → cn

A non-harebrained implementation of the function f is surprisingly difficult to come up with if you stay within your comfort zone regarding the representation of colors in programming languages. Typically, we represent colors in the RGB schema, with a number for each of the three color ingredients: red, green and blue. Whether the numbers range from zero to one (using floating-point values), from zero to 255 (using integer values) or over some other range doesn’t really matter here. Implementing the color inheritance function using RGB colors adds so many intricacies to the original problem that I consider this approach a mistake.

Learning about color representations

When we search around for alternative color representations, the “hue, saturation and brightness” or HSB approach might capture our interest. The interesting part is the first parameter: hue. It is a value between 0 and 360, with 0 and 360 being identical and meaning “red”. 360 is also the number of degrees in a full circle, so this color representation effectively describes a “color wheel” with one number.

This means that for our color inheritance function, the parameters c1, c2 and c3 are degrees between 0 and 360. The whole input might look like this:

Just by looking at the graphic, you can probably already see the color spectrum that is suitable for the function’s result. Instead of complicated color calculations, we pick an angle somewhere between two angles (with the third angle defining the direction).

And this means that we have transformed our color calculation into a geometric formula using angles. We can now calculate the span between the “leftmost” and the “rightmost” angle that covers the “middle” angle. We determine a random angle in this span and use it as the color of the new cell.

Learning about implicit coupling

But there are three possibilities to calculate the span! Depending on which angle you assign the “middle” role, there are three spans to choose from. If you just take your parent cells in the order given by your data structure, you implement your algorithm in a manner that is implicitly coupled to your technical representation. Once you change the data structure ever so slightly (for example by updating your framework version), it might produce a different result regarding the colors for the exact same initial position of the game. That is a typical effect for hardware-tied software, as the first computer games were, but also a sign of poor abstraction and planning. If you are interested in exploring the effects of hardware implications, the game TIS-100 might be for you.

We want our implementation to be less coupled to hardware or data structures, so we define that we use the smallest span for our color calculation. That means that our available colors will rapidly drift towards a uniform color for every given isolated population on our game field.
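
In code, the smallest-span rule might look like this sketch (in Java; the method and variable names are my own, not from any canonical kata solution, and it needs java.util.Arrays and java.util.Random):

static int inheritedHue(int c1, int c2, int c3, Random random) {
    int[] hues = {c1, c2, c3};
    Arrays.sort(hues);
    // sizes of the three gaps between neighboring hues on the wheel
    int gap01 = hues[1] - hues[0];
    int gap12 = hues[2] - hues[1];
    int gapWrap = 360 - hues[2] + hues[0];
    // the smallest span containing all three hues starts right after the largest gap
    int start, span;
    if (gapWrap >= gap01 && gapWrap >= gap12) {
        start = hues[0];
        span = hues[2] - hues[0];
    } else if (gap01 >= gap12) {
        start = hues[1];
        span = 360 - gap01;
    } else {
        start = hues[2];
        span = 360 - gap12;
    }
    // pick a random hue in the closed interval [start, start + span], wrapping at 360
    return (start + random.nextInt(span + 1)) % 360;
}

The closed interval and the modulo at the end are deliberate – both choices come up again in the following sections.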

Learning about long-term effects (drifts)

But that is not our only concern regarding drifts. Even if you calculate your color span correctly, you can still mess up the actual color pick without noticing it. The best indicator of this long-term effect is when every game you run ends in the green/cyan/blue-ish region of the color wheel (the 50 % area). This probably means that you didn’t implement the equivalence of 0° and 360° correctly. Or, in other words, that your color wheel isn’t a wheel, but a value range from 0 to 360 without wrap-around:

You can easily write a test case that takes the wrap-around into account.
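
Such a test might look like this (a sketch that assumes the inheritedHue function from above and JUnit’s assertTrue):

@Test
void wraps_around_the_color_wheel() {
    Random random = new Random(42L);
    // the parents straddle the 0°/360° boundary, so the child must do so, too
    for (int i = 0; i < 1000; i++) {
        int hue = inheritedHue(350, 0, 10, random);
        assertTrue(hue >= 350 || hue <= 10);
    }
}

If the wheel is implemented without wrap-around, the span is computed as 0 to 350 instead of 350 to 10 and the assertion fails quickly.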

But there are other drifts that might affect your color outcomes and those are not as easily testable. One source of drift might be your random number generator. Every time you pick a random angle from your span, any small tendency of the random number generator influences your long-term results. I really don’t know how to test against these effects.

A more specific source of drift is your usage of the span (or interval). Is it closed (including the endpoints) or open (not including the endpoints)? Both options are possible and don’t introduce drift. But what if the interval is half-open? The most common mistake is to make it left-closed and right-open. This makes your colors drift “counter-clockwise”, but because you wrapped them correctly, you won’t notice just by looking at the colors.

I like to think about possible test cases and test strategies that uncover those mistakes. One “fun” approach is the “extreme values random number generator” that only returns the lowest or highest possible number.
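
A sketch of this idea (the class name is my own; it overrides just enough of java.util.Random for our purpose):

class ExtremeRandom extends Random {
    private final boolean returnHighest;

    ExtremeRandom(boolean returnHighest) {
        this.returnHighest = returnHighest;
    }

    @Override
    public int nextInt(int bound) {
        // always the lowest (0) or the highest (bound - 1) possible value
        return this.returnHighest ? bound - 1 : 0;
    }
}

Running the color inheritance with both variants makes mistakes at the interval borders visible: a left-closed, right-open interval can never produce the “rightmost” angle of the span, no matter how often you try.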

Conclusion

Adding just one additional concept to a given coding kata opens up a multitude of questions and learnings. If you add inheritable colorization to your Game of Life, it not only looks cooler, but it also teaches you about how a problem can be solved easier with a suitable representation, given that you look out for typical pitfalls in the implementation.

Writing (unit) test cases for those pitfalls is one of my current kata training areas. Maybe you have an idea how to test against drifts? Write a comment or even a full blogpost about it! I would love to hear from you.

Don’t test details from a distance

The concept described in this blog entry has evoked a lot of different metaphors and descriptions from our team when we discussed it. So don’t take my words or thoughts on it as the one true way to talk about it – the concept of the “testing gap” or the distance between the code under test and the test’s vantage point.

Before I describe my metaphor for it with some weird visuals, let’s look at some code:

public Budget(String denotation, int maximumHours) {
    this.denotation = denotation;
    this.maximumHours = maximumHours;
    this.currentHours = maximumHours;
}

This is the constructor for an entity, a domain class that represents a budget of work hours that gets slowly used up when you work for the customer’s project. There is not much going on in this code except one little detail of the domain: New budgets always start fully “filled up”, in that the currentHours are set to the maximumHours. You can’t create a budget that is already half empty with this code.

Such a domain concept or “business rule” requires a test that ensures it is still in place:

@Test
public void has_initially_current_hours_set_to_maximum() {
    Budget target = new Budget(
        "current is maxed",
        100
    );
    assertThat(target.getMaximumHours()).isEqualTo(100);
    assertThat(target.getCurrentHours()).isEqualTo(100);
}

This is a fairly boring unit test that ensures that freshly created budgets have all their working hours still available.

In our example, the entity lives in the core of a web application that provides an endpoint to create new budgets. We have a test for the endpoint, of course:

@Test
public void stores_new_budget() throws Exception {
    this.web.perform(
        post("/budgets")
        .contentType(MediaType.APPLICATION_JSON)
        .content("{\"denotation\": \"new budget\", \"maximumHours\": 300}")
    )
    .andExpect(status().isOk())
    .andExpect(content().json("{\"denotation\": \"new budget\", \"maximumHours\": 300, \"currentHours\": 300}"));
}

You can shudder at the code formatting or the necessity to escape your JSON data into inscrutability. At least the second problem more or less disappears with current Java versions. But that’s not the point today. The point is that this is effectively the same test as above, but with a gap in between.

If you wrote just the second test, your code coverage metrics would probably not decrease. Your business rule would still be tested. So why write the first test if it adds nothing to the safety net?

This can be explained with the idea that there is a considerable “testing gap” between the second test and the business rule. It covers the entity’s constructor code and states explicitly that the currentHours property should be set to the same value as the maximumHours property. But it also defines the communication protocol as being HTTP, the data format as being JSON and travels through code that finds an “endpoint” for the given URL, maps the given JSON to a constructor call and serializes the resulting object as JSON back to the requester. That would be a lot of padding just to test the constructor’s third line.

The first test has virtually no testing gap. It knows nothing about the web, data formats or whatever else the application consists of. It just looks at the entity and its behaviour in isolation.

There are perfectly valid reasons to write the second test, but it should not be the only test that ensures the business rule in our example. The second test “sees too much” from its vantage point to pay attention to a little detail like the business rule.

In case you didn’t quite get the concept of the testing gap yet, here is how I imagine it in my head: If your code under test is a mystery box (really try to picture a shoebox made of cardboard that rattles when you move it), then your test is a big floating eye that uses little cracks and holes in the box to get a quick peek inside. If you expose state by getter methods like in our example, the eye checks the internal state of the box by looking at the gauges that are placed on its sides.

If your testing gap is small, the eye hovers up close to the box. It doesn’t see anything else, but it notices every detail of the box.

If you have some testing gap in your test code, the eye is placed at a considerable distance from the mystery box. There are other important things between them. The gauges aren’t directly readable. The eye uses indirect clues and reflections to gather its information. Every time something in the gap’s setup changes, the testing eye needs to adjust its gaze.

Which brings us to the conclusion of this metaphor: If you want things to be looked at in detail, write tests without a testing gap. Otherwise, your tests will have increased execution times, exhibit a strange imprecision in their message (“something in these dozen classes has changed and it might not even be relevant”) and require frequent adjustments that are not related to their testing story.

Or, if said with the words of my imagination, place your testing eye directly at the entrance of your test’s hideout.

You’ve probably thought about this concept already, in your own terms and metaphors. Can you try to describe it in a comment? Just for the name, we discussed “testing distance”, “testing height”, “testing gap” and others. Perhaps we like your description even better.

What dependent types can do for you

In a way, this post is also about Test Driven Development and *Type* Driven Development. While the two share the same acronym, I always thought of them as different concepts. However, as I recently experienced, when the two concepts are used in a dependently typed language, there is something like a fluid transition between them.

While I will talk about programming in the dependently typed language Agda, not much is needed to follow what is going on – I will just walk through an exercise and explain everything along the way.

The exercise I want to use is here. It talks about a submarine, its position and certain commands that change the position. Examples of commands are forward 1, down 2 and up 3. These ‘values’ can be used just like that with the following definition of the type of commands:

      data Command : Set where
        forward : Nat -> Command
        up      : Nat -> Command
        down    : Nat -> Command

Agda can be used in a very mathy way – this should really be read as saying that the type of commands is a Set and that there are three constructors (highlighted green) which take a natural number as argument and produce a command. Since application is just juxtaposition, we can now make the following definitions:

  justSomeCommand = forward 5
  anotherOne = up 1

Now the exercise text explains how these commands can be applied to the position of the submarine. Working as a software developer, I have built the habit of turning specifications like that into tests. Since I don’t know any better, I just wrote ‘tests’ in Agda using equations to translate the exercise text – I’ll explain the syntax below:

  apply (forward 5) (pos 0 0) ≡ pos 5 0
  apply (down 5) (pos 5 0) ≡ pos 5 5
  apply (forward 8) (pos 5 5) ≡ pos 13 5

Note that the triple equal sign is different from what we used above. Roughly, this is because it is the proposition that some things are equal, while the normal equal sign above was used to make definitions. The code doesn’t type check as is. We haven’t defined ‘apply’ and it is not valid Agda to just write down equations like that. Let’s fix the latter problem first by turning the equations into declarations and definitions. This will actually define elements of the datatype of equality proofs – but I’m pretty sure you can accept these changes just as boilerplate we have to add to our equations:

  example1 : apply (forward 5) (pos 0 0) ≡ pos 5 0
  example1 = refl

  example2 : apply (down 5) (pos 5 0) ≡ pos 5 5
  example2 = refl

  example3 : apply (forward 8) (pos 5 5) ≡ pos 13 5
  example3 = refl

Now, to make the examples type-check, we have to define ‘pos’ and ‘apply’. Positions can be defined analogously to commands:

  data Position : Set where
    pos : Nat -> Nat -> Position

(Here, the type of ‘pos’ just tells us that it is a function taking two natural numbers as arguments.) Now we are ready to start with ‘apply’:

  apply : Command → Position → Position
  apply = ?

So apply is a function that takes a ‘Command’ and a ‘Position’ and returns another ‘Position’. For the definition of ‘apply’ I just entered a question mark ‘?’. It is one of my favorite features of Agda that terms can be left out like this before type checking. Agda still checks everything we have given so far and will give us a lot of information about what ‘?’ could be. This is called ‘interacting with a hole’ – because, well, it is a hole in your code and the type checker is there to tell you which things might fit into it. After type checking, the hole and what Agda tells us about it will look like this:

  apply : Command → Position → Position
  apply = ?

Goal: Command -> Position -> Position

This was type-checked with a couple of imports – see my final version of the code if you want to reproduce it. The first thing Agda tells us is the type of the goal, and then there is some mumbling about constraints with some fragments that look like they have something to do with the examples from above – the latter is actually not information about the hole, but general information about the type checking. So let’s look at them to see if the type checker has anything to say:

The refl terms in the definitions of the examples are highlighted yellow

Something is yellow! This is Agda’s way of telling us that it does not have enough information to decide if everything is okay. Which makes a lot of sense, since we haven’t given a definition of ‘apply’ and these equations are about values computed with ‘apply’. So let us just continue to define ‘apply’ and see if the yellow vanishes. This is analogous to the stage in TDD where your tests don’t pass because your code does not yet compile.

We will use pattern matching on the given ‘Command’ and ‘Position’ to define ‘apply’ – the cases below were generated by Agda (I only changed variable names), and we now have a hole for each case:

  apply : Command → Position → Position
  apply (forward x) (pos h d) = {!!}
  apply (up x) (pos h d) = {!!}
  apply (down x) (pos h d) = {!!}

There are various ways in which Agda can use the information given by types to help us with filling these holes. First of all, we can just ask Agda to make the hole ‘smaller’ if there is a unique canonical way to do so. This will work here, since ‘Position’ has only one constructor. So we get new holes for arguments of the constructor ‘pos’ and can try to fill those.
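
After that refinement step, the first case might look like this (a sketch of the intermediate state):

  apply (forward x) (pos h d) = pos {!!} {!!}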

Let us focus on the first case and see what happens if we enter something not in line with our tests:

If we ask Agda whether ‘h+d’ fits into the hole, it will say no and tell us what the problem is in the following way:

While this is essentially the same kind of feedback you would get from a unit test, there are at least two important advantages to note:

  • This is feedback from the type checker and it is combined with other things the type checker can tell you. It means you get a lot of feedback at once when you ask Agda if something you wrote fits into a hole.
  • ‘refl’ is only a simple case of the proofs you can write in Agda. More complicated ones need some training, but you can go way beyond unit tests and ‘check’ infinitely many cases or even better: all cases.
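
For reference, a definition that makes all three examples type-check might look like this (a sketch using the builtin operations on Nat, where ‘-’ is truncated subtraction, so the depth never becomes negative):

  apply : Command → Position → Position
  apply (forward x) (pos h d) = pos (h + x) d
  apply (up x)      (pos h d) = pos h (d - x)
  apply (down x)    (pos h d) = pos h (d + x)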

If you want more, just try Agda yourself! One easy way to do that is to use Ingo Blechschmidt’s Agdapad, which lets you try Agda in your browser.

Redux-Toolkit & Solving “ReferenceError: Access lexical declaration … before initialization”

Last week, I had a really annoying error in one of our React-Redux applications. It started with a believed-to-be-minor cleanup in our code, culminated in four developers staring at our code in disbelief and quite some research, and resulted in some rather feasible solutions that, in hindsight, look quite obvious (as is usually the case).

The tech landscape we are talking about here is a React webapp that employs state management via Redux-Toolkit / RTK, the abstraction layer above Redux to simplify the majority of standard use cases one has to deal with in current-day applications. Personally, I happen to find that useful, because it means a perceptible reduction of boilerplate Redux code (and some dependencies that you would use all the time anyway, like redux-thunk) while maintaining compatibility with the really useful Redux DevTools, and not requiring many new concepts. As our application makes good use of URL routing in order to display very different subparts, we implemented our own middleware that does the data fetching upfront in a major step (sometimes called “hydration”).

One of the basic ideas in Redux-Toolkit is the management of your state in substates called slices that aim to unify the handling of actions, action creators and reducers – essentially what was previously described as the Ducks pattern.

We provide unit tests with the jest framework, and generally speaking, it is more productive to test general logic instead of React components or Redux state updates (although we sometimes make use of that, too). Jest is very modular in the sense that you can add tests for any JavaScript function from wherever in your testing codebase; the only requirement, of course, is that these functions are exported from their respective files. This means that a single jest test only needs to resolve the imports that it depends on, recursively (i.e. the dependency tree), not the full application.

Now my case was as follows: I wrote a test that essentially was just testing a small switch/case decision function. I noticed there was something fishy when this test resulted in errors of the kind

  • Target container is not a DOM element. (pointing to ReactDOM.render)
  • No reducer provided for key “user” (pointing to node_modules redux/lib/redux.js)
  • Store does not have a valid reducer. Make sure the argument passed to combineReducers is an object whose values are reducers. (also …/redux.js)

This meant there was too much going on. My unit test should know about neither React nor Redux, and as the culprit, I found that one of the imports in the test file used another import that marginally depended on a slice definition, i.e.

///////////////////////////////
// test.js
///////////////////////////////
import {helper} from "./Helpers.js"
...

///////////////////////////////
// Helpers.js
///////////////////////////////
import {SOME_CONSTANT} from "./state/generalSlice.js"
...

Now I only needed some constant located in generalSlice, so one could easily move this to some “./const.js”. Or so I thought.

When I removed the generalSlice.js dependency from Helpers.js, the React application broke. That is, in a place totally unrelated:

ReferenceError: can't access lexical declaration 'loadPage' before initialization

./src/state/loadPage.js/</<
http:/.../static/js/main.chunk.js:11198:100
./src/state/topicSlice.js/<
C:/.../src/state/topicSlice.js:140
> [loadPage.pending]: (state, action) => {...}

From my past failures, I instantly recalled: This is a problem with circular dependencies.

Alas, topicSlice.js imports loadPage.js and loadPage.js imports topicSlice.js, and while some cases allow such a circle to be handled by webpack or similar bundlers, in general, such import loops can cause problems. And while I knew that before, this case was just difficult for me, because of the very nature of RTK.

So this is a thing with the RTK way of organizing files:

  • Every action that clearly belongs to one specific slice can directly be defined in this state file as a property of the “reducers” in createSlice().
  • Every action that is shared across files or consumed in more than one reducer (in more than one slice) can be defined as one of the “extraReducers” in that call.
  • Async logic like our loadPage is defined in thunks via createAsyncThunk(), which gives you a place suited for data fetching etc. that always comes with three action creators: loadPage.pending, loadPage.fulfilled and loadPage.rejected.
  • This looks like:
///////////////////////////////
// topicSlice.js
///////////////////////////////
import {loadPage} from './loadPage.js';

const topicSlice = createSlice({
    name: 'topic',
    initialState,
    reducers: {
        setTopic: (state, action) => {
            state.topic= action.payload;
        },
        ...
    },
    extraReducers: {
        [loadPage.pending]: (state, action) => {
              state.topic = initialState.topic;
        },
        ...
    },
});

export const { setTopic, ... } = topicSlice.actions;

And loadPage itself was a rather complex action creator (thunk), as it could trigger state dispatches itself. In simplified form, it was built as:

///////////////////////////////
// loadPage.js
///////////////////////////////
import {setTopic} from './topicSlice.js';

export const loadPage = createAsyncThunk('loadPage', async (args, thunkAPI) => {
    const response = await fetchAllOurData();

    if (someCondition(response)) {
        await thunkAPI.dispatch(setTopic(SOME_TOPIC));
    }

    return response;
});

You clearly see the import loop: loadPage needs setTopic from topicSlice.js, topicSlice needs loadPage from loadPage.js. This was rather old code that worked before, so it appeared to me that this was no problem per se – but solving that completely different dependency in Helpers.js (SOME_CONSTANT from generalSlice.js) made something break.

That was quite weird. It looked like this not-really-required import of SOME_CONSTANT made ./generalSlice.js load first, and along with it a certain set of imports, including some of the dependencies of loadPage.js or topicSlice.js – so that by the time those two were loaded, there was no import loop left to resolve. However, it did not appear advisable to trace that fact to its core because the application has grown a bit already. We needed a solution.

As I said before, it required the brainstorming of multiple developers to find a way of dealing with this. After all, RTK appeared mature enough for me to dismiss “that thing just isn’t fully thought through yet”. Still, code-splitting is such a basic feature that one would expect some answer to that. What we came up with was

  1. One could address the action creators like loadPage.pending (which is created as a result of RTK’s createAsyncThunk) by their string equivalent, i.e. ["loadPage/pending"] instead of [loadPage.pending] as key in the extraReducers of topicSlice. This will be a problem if one ever renames the action from “loadPage” to whatever (and your IDE and linter can’t help you as much with errors), which could be solved by writing one’s own action name factory that can be stashed away in a file with no own imports (see the sketch after this list).
  2. One could re-think the idea that setTopic should be among the normal reducers in topicSlice, i.e. created automatically. Instead, it can be created in its own file and then referenced by loadPage.js and topicSlice.js in a non-circular manner as export const setTopic = createAction('setTopic'); and then accessed in extraReducers as [setTopic]: ... .
  3. One could think hard about the construction of loadPage. This whole thing is actually a hint that loadPage does too many things on too many different levels (i.e. it violates at least the principles of Single Responsibility and Single Level of Abstraction).
    1. One fix would be to at least do away with the automatically created loadPage.pending / loadPage.fulfilled / loadPage.rejected actions and instead define custom createAction("loadPage.whatever") with whatever describes your intention best, and put all these in your own file (as in idea 2).
    2. Another fix would be splitting the parts of loadPage into other thunks, and then being able to react to the automatically created pending / fulfilled / rejected actions of each.
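
As an aside, the action name factory mentioned in idea 1 could be as simple as this (a hypothetical sketch, not the code we actually wrote):

///////////////////////////////
// actionNames.js – deliberately has no imports, so it can never be part of a cycle
///////////////////////////////
export const loadPageAction = (stage) => `loadPage/${stage}`;

// usage in the extraReducers of topicSlice.js:
// [loadPageAction("pending")]: (state, action) => { /* ... */ },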

I chose idea 2 because it was the quickest, while allowing myself to let idea 3.1 rest a bit. I guess that next time, I should favor that because it makes the developer’s intention (as in… mine) more clear and the Redux DevTools output even more descriptive.

In case you’re still lost, my solution looks as follows:

///////////////////////////////
// sharedTopicActions.js
///////////////////////////////
import {createAction} from "@reduxjs/toolkit";
export const setTopic = createAction('topic/set');
//...

///////////////////////////////
// topicSlice.js
///////////////////////////////
import {setTopic} from "./sharedTopicActions";
const topicSlice = createSlice({
    name: 'topic',
    initialState,
    reducers: {
        ...
    },
    extraReducers: {
        [setTopic]: (state, action) => {
            state.topic= action.payload;
        },

        [loadPage.pending]: (state, action) => {
              state.topic = initialState.topic;
        },
        ...
    },
});

///////////////////////////////
// loadPage.js, only change in this line:
///////////////////////////////
import {setTopic} from "./sharedTopicActions";
// ... Rest unchanged

So there’s a simple tool to break circular dependencies in more complex Redux-Toolkit slice structures. It was weird that it never occurred to me before, i.e. up until this day, I was always able to solve circular dependencies by shuffling other parts of the import.

My problem is fixed. The application works as expected and now all the tests work as they should, everything is modular enough and the required change was not a major structural redesign. It required some hard thinking but had a rather simple solution. I have trust in RTK again, and one can be safe again in the assumption that JavaScript imports are at least deterministic. Although I will never do the work to analyse what it actually was about my SOME_CONSTANT import that unknowingly fixed the problem beforehand.

Is there any reason to disfavor idea 3.1, though? Feel free to comment your own thoughts on that issue 🙂

Don’t shoot your messengers

Writing small, focused tests, often called unit tests, is one of the things that look easy at the outset but turn out to be more delicate than anticipated. Writing a three-lines-of-code unit test in the triple-A structure soon became second nature to me, but there were lots of cases that resisted easy testing.

Using mock objects is the typical next step to accommodate this resistance and make the test code more complex. This leads to 5 to 10 lines of test code for easy mock-based tests and up to thirty or even fifty lines of test code where a lot of moving parts are mocked and chained together to test one single method.

So, the first reaction for a more complicated testing scenario is to make the test more complicated.

But even with the powerful combination of mock objects and dependency injection, there are situations where writing suitable tests seems impossible. In the past, I regarded these code blocks as “untestable” and omitted the tests because their economic viability seemed debatable.

I wrote small tests for easy code, long tests for complicated code and no tests for defiant code. The problem always seemed to be the tests that just didn’t cut it.

Until I could recognize my approach in a new light: I was encumbering the messenger. If the message was too harsh, I would outright shoot him.

The tests tried to tell me something about my production code. But I always saw the problem with them, not the code.

Today, I can see that the tests I never wrote because the “test story” at hand was too complicated for my abilities were already telling me something important.

The tests you decide not to write because they’re too much of a hassle tell you that your code structure needs improvement. They already deliver their message to you, even before they exist.

With this insight, I can oftentimes fix the problem where it is caused: In the production code. The test coverage increases and the tests become simpler.

Let’s look at a small example that tries to show the line of thinking without being too extensive:

We developed a class in Java that represents a counter that gets triggered and imposes a wait period on every tenth trigger impulse:

public class CountAndWait {
	private int triggered;
	
	public CountAndWait() {
		this.triggered = 0;
	}
	
	public void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			try {
				Thread.sleep(1000L);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
			this.triggered = 0;
		}
	}
}

There is a lot going on in the code for such a simple functionality. Especially the try-catch block catches my eye and makes me worried when thinking about tests. Why is it even there? Well, here is a starter link for an explanation.

But even without advanced threading issues, the normal functionality of our code is worrisome enough. How many lines of code will a test contain that covers the sleep? Should I really use a loop in my test code? Will the test really have a runtime of one second? That’s the same amount of time several hundred other unit tests require for much more coverage. Is this an economically sound testing approach?

The test doesn’t even exist and already sends a message: Your production code should be structured differently. If you focus on the “test story”, perhaps a better structure emerges?

The “story of the test” is the description of the production code path that is covered and asserted by the test. In our example, I want the story to be:

“When a counter object is triggered for the tenth time, it should impose a wait. Afterwards, the cycle should repeat.”

Nothing in the story of this test talks about interruption or exceptions, so if this code gets in the way, I should restructure it to eliminate it from my story. The new production code might look like this:

public class CountAndWait {
	private final Runnable waiting;
	private int triggered;
	
	public static CountAndWait forOneSecond() {
		return new CountAndWait(() -> {
			try {
				Thread.sleep(1000L);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}			
		});
	}
	
	public CountAndWait(Runnable waiting) {
		this.waiting = waiting;
		this.triggered = 0;
	}
	
	public void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			this.waiting.run();
			this.triggered = 0;
		}
	}
}

That’s a lot more code than before, but we can concentrate on the latter half. We can now inject a mock object that attests to how often it was run. This mock object doesn’t need to sleep for any amount of time, so the unit test is fast again.

Instead of making the test more complex, we introduced additional structure (and complexity) into the production code. The resulting unit test is rather easy to write:

class CountAndWaitTest {
	@Test
	@DisplayName("Waits after 10 triggers and resets")
	void wait_after_10_triggers_and_reset() {
		Runnable simulatedWait = mock(Runnable.class);
		CountAndWait target = new CountAndWait(simulatedWait);
		
		// no wait for the first 9 triggers
		Repeat.times(9).call(target::trigger);
		verifyNoInteractions(simulatedWait);
		
		// wait at the 10th trigger
		target.trigger();
		verify(simulatedWait, times(1)).run();
		
		// reset happened, no wait for another 9 triggers
		Repeat.times(9).call(target::trigger);
		verify(simulatedWait, times(1)).run();
	}
}
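
A quick note on the Repeat helper used in the test: it is not part of JUnit or Mockito. A minimal version might look like this (a sketch):

class Repeat {
	private final int times;

	private Repeat(int times) {
		this.times = times;
	}

	static Repeat times(int times) {
		return new Repeat(times);
	}

	void call(Runnable action) {
		for (int i = 0; i < this.times; i++) {
			action.run();
		}
	}
}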

It’s still different from a simple 3-liner test, but the “and” in the test story hints at a more complex story than “get y for x”, so that might be ok. We could probably simplify the test even more if we got access to the internal trigger count and verify the reset directly.

I hope the example was clear enough. For me, the revelation that test problems more often than not have their root cause in the production code is a clear message to improve my ability to write code that facilitates testing instead of obstructing it.

I don’t shoot/omit my messengers anymore even if their message means more work for me.

It’s not a bug, it’s a missing test

Recently, I changed some wording when I talk about code and code problems. In my opinion, the new words have more correlation with the root cause of the problem, while the old words used to be “nearer” to the symptoms.

Let me explain the changes with a small code example that serves no other purpose than to contain a few problems:

public class InMemoryItemRepository {
	private final Map<String, Item> items;
	
	public InMemoryItemRepository() {
		this.items = new HashMap<>();
	}
	
	public Optional<Item> itemFor(String itemNumber) {
		return Optional.of(this.items.get(itemNumber));
	}
}

This class is a concrete implementation of an “item repository” that stores items by their “item number”. You can retrieve items by calling the itemFor method and giving it a valid itemNumber. Otherwise, you’ll end up with an empty Optional.

The first problem

The first problem in this code is a code smell. The data structure used to store the items is a HashMap. In Java, the HashMap implementation is not thread-safe:

Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally.

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/HashMap.html

With our current implementation, we leak this limitation onto our clients, which will be totally unaware, because nothing in the class design hints towards a HashMap. In most production environments, our in-memory implementation might even be swapped out for a database-based one. And the database implementation hopefully is thread-safe.
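
If concurrent access is an actual requirement, one possible remedy is to swap the map implementation in the constructor (a sketch – other options like Collections.synchronizedMap exist; this needs an import of java.util.concurrent.ConcurrentHashMap):

	public InMemoryItemRepository() {
		this.items = new ConcurrentHashMap<>();
	}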

A code smell is a bug

My change in wording calls this type of code problem a bug (instead of a code smell). It might be possible that right now, this class is only used in a single-threaded manner and the implementation flaw doesn’t matter. In that case, it’s a bug in hibernation. Right now, it sleeps, but the day will come when it wakes up. That doesn’t change its bug-like features, just its immediate damage potential.

If you label a code smell as a bug, you highlight the damage potential instead of the current “nice to have” prioritization.

The second problem

Let’s return to the code example. There is another problem with this implementation: If you ask for an item number that is not stored in the repository, you don’t get an empty Optional, you’ll get a nice and unexpected NullPointerException.

A small unit test that asks for any item number on an empty repository reveals the problem:

	@Test
	void returnsEmptyIfNotStored() {
		var target = new InMemoryItemRepository();
		assertThat(
			target.itemFor("non-existent")
		).isEmpty();
	}

This test will run red with a NullPointerException, pointing to this line in the implementation:

		return Optional.of(this.items.get(itemNumber));

Of course, this is “just a typo” in the way we construct the Optional. Because our HashMap returns null if a given key is not stored in it, we need to call Optional.ofNullable() and have it convert null to Optional.empty:

		return Optional.ofNullable(this.items.get(itemNumber));

A bug is a missing test

This “little fix” makes our unit test pass and the repository useful. My change in wording calls every bug a “missing test”. This points out that the test you write as you fix the bug could have prevented the bug’s damage altogether.

If you were surprised by my assumption that you write tests when you fix bugs, please consider adopting this as a habit. It is the last possible moment to improve your test coverage at a place that has proven to be important enough to have tests and undertested enough to exhibit bugs. If you want your project to be antifragile (and there are good reasons why it should be), start by strengthening it at every place it breaks.

Tests for every code smell?

The combination of both changes would make every code smell a missing test. Right now, I’m not sure if this needs to be an automatism. The existence of a code smell hints towards a more fragile part of the system, but I’m not sure if “potential fragility” is a valid indicator for additional tests.

Of course, if you program completely test driven, this kind of thought probably seems pointless to you.

What are your thoughts on this topic and specifically on compulsory testing for every code smell?

Flexible React-Redux Hook Mocks in jest & React Testing Library

Best practices in mocking React components aren’t entirely unheard of, even in connection with a Redux state, and even in connection with the quite convenient hooks ({ useSelector, useDispatch }).

So, of course the knowledge of a proper approach is at hand. And in many scenarios, it makes total sense to follow their principle of exactly arranging your Redux state in your test as you would in your real-world app.

Nevertheless, there are reasons why one wants to introduce a quick, non-overwhelming unit test of a particular component, e.g. when your system is in a state of high fluctuation because multiple parties are still converging on their interfaces and requirements; a complete mock would be quite premature – more of a major distraction than any real help.

Proponents of strict TDD would now object, of course. Anyway – fortunately, the combination of jest with React Testing Library is flexible enough to give you the tools to drill into any of your state-connected components without much knowledge of the rest of your React architecture*

(*) Of course, these tests presume knowledge of your current Redux store structure. In the long run, I’d also consider this bad style, but during highly fluctuating phases of development, I’d favour the explicit “this is how the store is intended to look” as safety by documentation.

On a basic test frame, I want to show you three things:

  1. Mocking useSelector in a way that allows for multiple calls
  2. Mocking useDispatch in a way that allows expecting a specific action creator to be called.
  3. Mocking useSelector in a way that allows for mocking a custom selector without its actual implementation

(Upcoming in a future blog post: Mocking useDispatch in a way to allow for async dispatch-chaining as known from Thunk / Redux Toolkit. But I’m still figuring out how to exactly do it…)

So consider your component to be e.g. as simple as this:

import {useDispatch, useSelector} from "react-redux";
import {importantAction} from "./place_where_these_are_defined";

const TargetComponent = () => {
    const dispatch = useDispatch();
    const simpleThing1 = useSelector(store => store.thing1);
    const simpleThing2 = useSelector(store => store.somewhere.thing2);

    return <>
        <div>{simpleThing1}</div>
        <div>{simpleThing2}</div>
        <button title={"button title!"} onClick={() => dispatch(importantAction())}>Do Important Action!</button>
    </>;
};

Multiple useSelector() calls

If we had a single call to useSelector, we’d be done as easily as making useSelector a jest.fn() with a mockReturnValue(). But we don’t want to constrain ourselves to that. So what works, in our example, is to construct a mock store as a plain object and give our mocked useSelector a mockImplementation that applies its argument (which, as a selector, is a function of the store) to that store.

Note that for this simple example, I did not concern myself with useDispatch() that much. It just returns a dispatch function of () => {}, i.e. it won’t throw an error but also doesn’t do anything else.

import React from 'react';
import { render, screen, fireEvent } from '@testing-library/react';
import TargetComponent from './TargetComponent';
import * as reactRedux from 'react-redux';
import * as ourActions from './actions';

jest.mock("react-redux", () => ({
    useSelector: jest.fn(),
    useDispatch: jest.fn(),
}));

describe('Test TargetComponent', () => {

    beforeEach(() => {
        useDispatchMock.mockImplementation(() => () => {});
        useSelectorMock.mockImplementation(selector => selector(mockStore));
    })
    afterEach(() => {
        useDispatchMock.mockClear();
        useSelectorMock.mockClear();
    })

    const useSelectorMock = reactRedux.useSelector;
    const useDispatchMock = reactRedux.useDispatch;

    const mockStore = {
        thing1: 'this is thing1',
        somewhere: {
            thing2: 'and I am thing2!',
        }
    };

    it('shows thing1 and thing2', () => {
        render(<TargetComponent/>);
        expect(screen.getByText('this is thing1')).toBeInTheDocument();
        expect(screen.getByText('and I am thing2!')).toBeInTheDocument();
    });

});

This is surprisingly simple, considering that one doesn’t find this example scattered all over the internet. If, for some reason, you require more stuff from react-redux, you can always spread it in there,

jest.mock("react-redux", () => ({
    ...jest.requireActual("react-redux"),
    useSelector: jest.fn(),
    useDispatch: jest.fn(),
}));

but remember that in case you want to build full-fledged test suites, why not go the extra mile to construct your own Test store (cf. link above)? Let’s stay simple here.

Assert execution of a specific action

We don’t even have to change much to look for the call of a specific action. Remember, we presume that our action creator is doing the right thing already; for this example we just want to know that our button actually dispatches it. E.g. you could have connected that to various conditions, the button might be disabled or whatever, … so that could be less trivial than our example.

We just need to know what the original action creator looked like. In jest language, this is known as spying. We add the blue parts:

// ... next to the other imports...
import * as ourActions from './actions';



    //... and below this block
    const useSelectorMock = reactRedux.useSelector;
    const useDispatchMock = reactRedux.useDispatch;

    const importantAction = jest.spyOn(ourActions, 'importantAction');

    //...

    //... other tests...

    it('dispatches importantAction', () => {
        render(<TargetComponent/>);
        const button = screen.getByTitle("button title!"); // there are many ways to get the Button itself. i.e. screen.getByRole('button') if there is only one button, or in order to be really safe, with screen.getByTestId() and the data-testid="..." attribute.
        fireEvent.click(button);
        expect(importantAction).toHaveBeenCalled();
    });

That’s basically it. Remember that we really disfigured our dispatch() function. What we can not do this way is a form of

// arrangement
const mockDispatch = jest.fn();
useDispatchMock.mockImplementation(() => mockDispatch);

// test case:
expect(mockDispatch).toHaveBeenCalledWith(importantAction()); // won't work

Because even if we get a mocked version of dispatch() that way, the spied-on importantAction() call is not the same as the one that happened inside render(). So again: in our limited sense, we just don’t do it. dispatch() doesn’t do anything, importantAction just gets called once inside.

Mock a custom selector

Consider now that there are custom selectors which we don’t care about much; we just need them to not throw any error. I.e. below the definition of simpleThing2, this could look like:

import {useDispatch, useSelector} from "react-redux";
import {importantAction, ourSuperComplexCustomSelector} from "./place_where_these_are_defined";

const TargetComponent = () => {
    const dispatch = useDispatch();
    const simpleThing1 = useSelector(store => store.thing1);
    const simpleThing2 = useSelector(store => store.somewhere.thing2);
    const complexThing = useSelector(ourSuperComplexCustomSelector);
    
    //... whatever you want to do with it....
};

Here, we want to keep it open how exactly complexThing is gained. This selector is considered to already be tested in its own unit test; we just want its value to not fail. And we can really do it like this, blue parts added / changed:

import React from 'react';
import { render, screen, fireEvent } from '@testing-library/react';
import TargetComponent from './TargetComponent';
import * as reactRedux from 'react-redux';
import * as ourActions from './actions';
import {ourSuperComplexCustomSelector} from "./place_where_these_are_defined";

jest.mock("react-redux", () => ({
    useSelector: jest.fn(),
    useDispatch: jest.fn(),
}));

const mockSelectors = (selector, store) => {
    if (selector === ourSuperComplexCustomSelector) {
        return true;  // or whatever we want it to return
    }
    return selector(store);
}

describe('Test TargetComponent', () => {

    beforeEach(() => {
        useDispatchMock.mockImplementation(() => () => {});
        useSelectorMock.mockImplementation(selector => mockSelectors(selector, mockStore));
    })
    afterEach(() => {
        useDispatchMock.mockClear();
        useSelectorMock.mockClear();
    })

    const useSelectorMock = reactRedux.useSelector;
    const useDispatchMock = reactRedux.useDispatch;

    const mockStore = {
        thing1: 'this is thing1',
        somewhere: {
            thing2: 'and I am thing2!',
        }
    };

    // ... rest stays as it is
});

This wasn’t as obvious to me, as you never know what jest is doing behind the scenes. But indeed, you don’t have to spy on anything for this simple test: the ourSuperComplexCustomSelector reference inside TargetComponent and the argument passed to useSelector really are the same function object.

So, yeah.

The combination of jest with React Testing Library is obviously quite flexible in allowing you to choose what you actually want to test. This was good news for me, as testing frameworks in general might try to impose their opinions on your style, which isn’t always bad – but in a highly changing environment as is anything that involves React and Redux, sometimes you just want to have a very basic test case in order to concern yourself with other stuff.

So, without wanting you to lower your style to such basic constructs, I hope this was of some insight for you. In a more production-ready state, I would still go the way of the krawaller.se blog post stated above – it makes sense. I was just here to get you started 😉

Last night, an end-to-end test saved my life

but not with a song, only with an assertion failure. Okay, I’ll admit, this is not the only hyperbole in the blog title. Let’s set things right: The story didn’t happen last night, it didn’t even happen at night (because I’m not working night hours). In fact, it was a well-lit sunny day. And the test didn’t save my life. It did save me from some embarrassment, though. And spared a customer some trouble and hotfixing sessions with me.

The true parts of the blog title are: An end-to-end test reported an assertion failure that helped me to fix a bug before it got released. In other words: An end-to-end test did its job and I felt fine. And that’s enough of the obscure song references from the 1980s, hopefully.

Let me explain the setting. We develop a big system that runs in production 24/7-style and is perpetually adjusted and improved. The system should run unattended and only require human attention when a real incident happens, either in the data or the system itself. If you look at the system from space, it looks like this:

A data source (in fact, lots of data sources) occasionally transmits data to the system that gets transformed and sent to a third-party system that does all kind of elaborate analysis with it. Of course, there is more to the real system than that, but for our story, it is sufficient to know that there are several third-party systems, each with their own data formats. We’ll call them third-party system A (TPS-A) and TPS-B in this story.

Our system is secured by lots and lots of unit tests, a good number of integration tests and a handful of end-to-end tests. All tests run green all the time. The end-to-end tests (E2ET) try to replicate the production environment as closely as possible by running on the target operating system and using data transfer means like in the real case. The test boots up the complete system and simulates the data source and the third-party systems. By configuring the system so that it transfers back to the E2ET, we can write assertions of the kind “given specific input data, we expect this particular output data from the system, however it chooses to produce it”. This kind of broad-brush test is invaluable because it doesn’t care about specifics in the system. It only cares about output from the system.

This kind of E2ET is also difficult to write, complicated to run, prone to brittleness and obscure in its failure statement. If you write a three-line unit test, it is straightforward, fast and probably pretty specific when it breaks: “This one thing that I’m testing is wrong now”. An E2ET as described here is like the Oracle of Delphi:

  • E2ET: “Somewhere in your system, something doesn’t work the way I like it anymore.”
  • Developer: “Can you be more specific?”
  • E2ET: “No, but here are 2000 lines of debug output that might help you on your quest for the cause.”
  • Developer: “But I didn’t change anything.”
  • E2ET: “Oh, in that case, it could as well be the weather or a slow network. Mind if I run again?”

This kind of E2ET is also rather slow and takes its sweet time resetting the workspace state, booting the system again and using prolonged timeouts for every step. So you wait several minutes for the E2ET to fail again:

  • E2ET: “Yup, there is still something fishy with your current code.”
  • Developer: “It’s the same code you let pass yesterday.”
  • E2ET: “Well, today I don’t like it!”
  • Developer: “Okay, let me dive into the debug output, then.”

Ten to twenty minutes pass.

  • Developer: “Is it possible that you just choke on a non-free network port?”
  • E2ET: “Yes, I’m supposed to wait for the system’s output on this port and cannot bind to it.”
  • Developer: “You cannot bind to it because an instance of yourself is stuck in the background and holding onto the port.”
  • E2ET: “See? I’ve told you there is something wrong with your system!”
  • Developer: “This isn’t a problem with the system’s code. This is a problem with your code.”
  • E2ET: “Hey! Don’t blame me! I’m here to help. You can always choose to ignore or delete me if I’m too much of a burden to you.”

This kind of E2ET cries wolf lots of times for problems in the test code itself, for too optimistic timeouts or just oddities in the environment. If such an E2ET fails, it’s common practice to run it again to see if the problem persists.

How many false alarms can you tolerate before you stop listening?

In my case, I chose to reduce the number of E2ETs to the essential minimum and to listen to them every time. If a test raises a false alarm, invest the time to come up with a way to make it more robust without reducing its sensitivity. But listen to the test every time.

Because, in my story, the test insisted that third-party systems A and B both didn’t receive half of their data. I could rule out stray effects in the network or on the hard disk pretty fast, because the other half of the data arrived just fine. So I invested considerable time in understanding the debug output. This led me nowhere – it seemed that the missing data wasn’t sent to the system in the first place. Checking the E2ET disproved this hypothesis. The test sent as much data to the system as ever. But half of it went missing along the way. Now I reviewed all the changes I had made in the last few days. Some of them affected the data export to the TPS. But all unit and integration tests in this area insisted that everything worked as intended.

It is important to have this kind of “multiple witnesses”. Unit tests are like traces in a criminal investigation: They report one very specific piece of information about one very specific part of the code and are only useful in larger numbers. Integration tests testify to the possibility of combining several code parts. In my case, this meant that the unit tests guaranteed that all parts involved in the data export work correctly on their own (in isolation). The integration tests vouched that, given the right combination of data export parts, they will work as intended.

And this left one area of the code as the main suspect: Something in the combination of export parts must go wrong in the real case. The code that wires the parts together is the only code not tested by unit and integration tests, but by the E2ET. Both other test types base their work on the hypothesis “given that everything is wired together like this”. If this hypothesis doesn’t hold true in the real case, no test but the E2ET finds the problem, because the E2ET is based on another hypothesis: “given that the system is started for real”.

I went through all the changes of the wiring code and there it was: A glorious TODO comment, written by myself on a late Friday afternoon, stating: “TODO: Register format X export again after you’ve finished issue Y”. I totally forgot about it over the weekend. I only added the comment into the code, not as an issue comment. Friday afternoon me wasn’t cooperative with future or current me. In effect, I totally played myself. I don’t remember removing the registering code or writing the comment. It probably was a minor side change for an unimportant aspect of the main issue. The whole thing was surely done in a few seconds and promptly forgotten. And the only guardian that caught it was the E2ET, using the most nonspecific words for it (“somewhere, something went wrong”).

If I had chosen to ignore or modify the E2ET, the bug would have made it to production. It would have caused a loss of current data for the TPS. Maybe they would have detected it. But maybe they would have just chosen to operate on the most recent data – data that doesn’t get updated anymore. In short, we don’t want to find out.

So what is the moral of the story?
Always have an end-to-end test for your most important functionality in place. Let it deal with your system from the outside. And listen to it, regardless of the effort it takes.

And what can you do if your E2ET has saved you?

  • Give your test a medal. No, seriously, leave a comment in the test code that it was helpful.
  • Write a unit or integration test that replicates the specific cause. You want to know immediately if you make the same error again in the future. Your E2ET is your last line of defense. Don’t let it be your only one. You’ve been shown a weakness in your test setup; remediate it.
  • Blame past you for everything. Assure future you that you are a better developer than this.
  • Feel good that you were able to neutralize your faux pas in time. That’s impressive!

When was the last time a test saved you? Leave the story or a link to it in the comments. I can’t be the only one.

Oh, and if you find the random links to 80s music videos weird: At least I didn’t rickroll you. The song itself is from 1987 and would fit my selection.

Integrating catch2 with CMake and Jenkins

A few years back, we posted an article on how to get CMake, googletest and Jenkins to play nicely with each other. Since then, Phil Nash’s catch testing library has emerged as arguably the most popular way to write C++ tests. I’m going to show how to set up a small sample project that integrates catch2, CMake and Jenkins nicely.

Project structure

Here is the project structure we will be using in our example: a simple library that implements left-pad, a utility function that expands a string to a minimum length by adding a filler character to the left.

├── CMakeLists.txt
├── source
│   ├── CMakeLists.txt
│   ├── string_utils.cpp
│   └── string_utils.h
├── externals
│   └── catch2
│       └── catch.hpp
└── tests
    ├── CMakeLists.txt
    ├── main.cpp
    └── string_utils.test.cpp

As you can see, the code is organized in three subfolders: source, externals and tests. source contains your production code. In a real-world scenario, you’d probably have a couple of libraries and executables in additional subfolders below it.

The source folder

Its CMakeLists.txt looks like this:

set(TARGET_NAME string_utils)

add_library(${TARGET_NAME}
  string_utils.cpp
  string_utils.h)

target_include_directories(${TARGET_NAME}
  INTERFACE ./)

install(TARGETS ${TARGET_NAME}
  ARCHIVE DESTINATION lib/)

The library is added to the install target because that’s what we typically do with our artifacts.
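
The actual implementation lives in the example repository; just to make the walkthrough concrete, here is a minimal sketch of what the left-pad function could look like. The exact signature, parameter names and file layout are my assumptions, not necessarily what the repository uses:

// string_utils.h
#pragma once
#include <cstddef>
#include <string>

// Expands str to at least min_length characters by
// prepending the filler character on the left.
std::string left_pad(std::string const& str, std::size_t min_length, char filler);

// string_utils.cpp
#include "string_utils.h"

std::string left_pad(std::string const& str, std::size_t min_length, char filler)
{
  if (str.size() >= min_length)
    return str;
  return std::string(min_length - str.size(), filler) + str;
}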

I use externals as a place for libraries that go into the project’s VCS. In this case, that is just the catch2 single-header distribution.

The tests folder

I typically mirror the filename and path of the unit under test and add an extra tag, in this case .test. You should really not need headers here. The corresponding CMakeLists.txt looks like this:

set(UNIT_TEST_LIST
  string_utils)

foreach(NAME IN LISTS UNIT_TEST_LIST)
  list(APPEND UNIT_TEST_SOURCE_LIST
    ${NAME}.test.cpp)
endforeach()

set(TARGET_NAME tests)

add_executable(${TARGET_NAME}
  main.cpp
  ${UNIT_TEST_SOURCE_LIST})

target_link_libraries(${TARGET_NAME}
  PUBLIC string_utils)

target_include_directories(${TARGET_NAME}
  PUBLIC ../externals/catch2/)

add_test(
  NAME ${TARGET_NAME}
  COMMAND ${TARGET_NAME} -o report.xml -r junit)

The list and the loop help me to list the tests without duplicating the .test tag everywhere. Note that there’s also a main.cpp included, which only defines catch’s main function:

#define CATCH_CONFIG_MAIN
#include <catch.hpp>

The add_test call at the bottom tells CTest (CMake’s bundled test runner) how to run catch. The “-o” switch tells catch to direct its output to a file, report.xml. The “-r” switch sets the reporter to JUnit format. We will need both to integrate with Jenkins.
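
To give you an idea what such a test file contains, here is a sketch of a possible string_utils.test.cpp. The test names and assertions build on the hypothetical left_pad signature from above:

#include <catch.hpp>
#include <string_utils.h>

TEST_CASE("left_pad fills short strings from the left")
{
  REQUIRE(left_pad("ab", 5, '*') == "***ab");
}

TEST_CASE("left_pad leaves long enough strings untouched")
{
  REQUIRE(left_pad("abcdef", 5, '*') == "abcdef");
}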

The top-level folder

The CMakeLists.txt in the top-level folder needs to call enable_testing() for our setup. Other than that, it just directs to the subfolders via add_subdirectory().
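
A minimal version could look like this sketch (the project name is a placeholder, and the minimum CMake version is an assumption):

cmake_minimum_required(VERSION 3.5)
project(left_pad_example)

# Required so that CTest picks up the tests registered via add_test()
enable_testing()

add_subdirectory(source)
add_subdirectory(tests)

With this in place, you can also run the tests locally by invoking ctest in your build folder.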

Jenkins

Now all that is needed is to set up Jenkins accordingly. Set up Jenkins to get your code and add a “CMake Build” build step. Hit “Add build tool invocation” and check “Use cmake” to let CMake handle the invocation of your build tool (e.g. make). You also specify the target here, which is typically “install” or “package”, via the “--target” switch.

Now you add another step that runs the tests via CTest. Add another build step, this time “CMake/CPack/CTest Execution”, and pick CTest. The one quirk with this is that it will let the build fail when CTest returns a non-zero exit code – which it does when any test fails. Usually, you want the build to become unstable, not failed, if that happens. Hence, set “1-65535” in the “Ignore exit codes” input.

The final step is to let Jenkins use the report.xml that catch generated for us so it can render the test result charts and tables. To do that, add the post-build action “Publish JUnit test result report” and point it to tests/report.xml.

Done!

That’s it. Now you’ve got your CI running nice catch tests. The code for this example is available on our GitHub.

Functional tests for Grails with Geb and geckodriver

Previously we had many functional tests using the selenium-rc plugin for Grails. Many were initially recorded using Selenium IDE, then refactored to be more maintainable. These refactorings introduced “driver” objects used to interact with common elements on the pages and runners which improved the API for walking through a multipage process.

Selenium-rc was deprecated quite a while ago, and its Firefox support broke every once in a while. Finally, we were forced to migrate to the current state of the art in Grails functional testing: Geb.

Generally, I can say it is a major improvement over the old selenium API. The page concept is similar to our own drivers, with some nice features (see the page object sketch after this list):

  • At-Checkers provide a standardized way of checking if we are at the expected page
  • Default and custom per page timeouts using atCheckWaiting
  • Specification of relevant content elements using a jQuery-like syntax and support for CSS selectors
  • The so-called modules ease the interaction with form elements and the like
  • Much better error messages
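
To illustrate the page concept, here is a small sketch of a hypothetical page object. The page name, URL and selectors are made up for the example:

import geb.Page

class LoginPage extends Page {
  static url = 'login'
  // At-checker: verifies that the browser really is on this page
  static at = { title == 'Login' }

  static content = {
    // jQuery-like content definitions using CSS selectors
    username { $('input[name=username]') }
    password { $('input[name=password]') }
    loginButton { $('button.login') }
  }

  void loginAs(String user, String pass) {
    username.value(user)
    password.value(pass)
    loginButton.click()
  }
}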

While Geb is a real improvement over selenium, it comes with some quirks. Here is some advice that may help you use Geb successfully in the context of your (Grails) web application.

Cross-platform testing in Grails

Geb (or more specifically the underlying webdriver component) requires a geckodriver binary to work correctly with Firefox. This binary is naturally platform-dependent. We have a setup with mostly Windows machines for the developers and Linux build slaves and target systems. So we need binaries for all required platforms and have to configure them accordingly. We simply put them into a folder in our project and added the following configuration to the test environment in Config.groovy:

environments {
  test {
    def basedir = new File(new File('.', 'infrastructure'), 'testing')
    // pick the platform-specific binary name
    def geckodriver = 'geckodriver'
    if (System.properties['os.name'].toLowerCase().contains('windows')) {
      geckodriver += '.exe'
    }
    // tell webdriver where to find the geckodriver binary
    System.setProperty('webdriver.gecko.driver', new File(basedir, geckodriver).canonicalPath)
  }
}

Problems with File-Uploads

If you are plagued by file uploads not working, it may be a problem with certain Firefox versions. Even though the fix has landed in Firefox 56, I want to add the workaround in case you still experience problems. Add the following to your GebConfig.groovy:

import org.openqa.selenium.firefox.FirefoxDriver
import org.openqa.selenium.firefox.FirefoxProfile

driver = {
  FirefoxProfile profile = new FirefoxProfile()
  // Workaround for issue https://github.com/mozilla/geckodriver/issues/858
  profile.setPreference('dom.file.createInChild', true)
  new FirefoxDriver(profile)
}

Minor drawbacks

While the Geb DSL is quite readable and allows for concise tests, the IDE support is not there yet. You do not get much code assist when writing the tests and calling functions of the page objects, unlike in our own, code-based solution.

Conclusion

After taking the first few hurdles, writing new functional tests with Geb really feels good and is a breeze compared to our old selenium tests. Converting the old tests will be a lot of work and will only happen on a case-by-case basis, but our coverage with Geb is ever increasing.