testing – Schneide Blog

An Indicator That You’re Leaving the Realm of Unit Tests

Automated unit tests are the grassroot foundation of a healthy test suite. But they aren’t the only type of automated tests that we need to write in order to test a system thoroughly enough to be confident about its production readiness.

There are things like end-to-end or even GUI based tests that have completely different testing mechanics that unit tests. It is clear just from looking at the test code that they aren’t unit tests.

But for the wide range of integration tests, there is a subtle and nearly impercetible transition from unit test to integration test that is hard to explain. It doesn’t really matter on which side of the diving line between the two test types you are as long as you are close to it. But as tests evolve, you need to apply different advancement strategies to the different types of tests. One goal is to keep unit tests from becoming integration tests over time, which is prevalent when factoring out system parts that were small at first.

When things are hard to explain, we search for indicators that can serve as objective counselors and help with making the decision. For the distinction between unit and integration tests, one such indicator is the distance between motor point and reaction point. Let me explain the concepts:

Let’s pretend we need to test the implementation of a baker (or a baking machine):

			
@Test
void can_produce_bread() {
	Baker target = new Baker();
	
	Bread actual = target.bake();
	
	Bread expected = new Bread();
	assertEquals(
		expected,
		actual
	);
}

		

This is a straight-forward unit test in the AAA (arrange, act, assert) structure:

Arrange: We build the “test world” or the slice of the system that should be tested. We call it the “target” (some call it the “cut”, from “code under test”, which corresponds nicely with the “slice of the system”).
The target contains the motor point, the specific entry point where the code under test is “irritated” by calling a method. It is this irritation that causes the code under test to exhibit a certain behaviour that produces an observable result. The point where this result can be observed is the reaction point.
Act: We enable the motor point by calling the bake() method on our target baker. The code under test works its magic and gives us the result, which we call “actual”. The return value of the bake() method is the reaction point. It has two roles in the context of our test:
1. It provides the observable result of the code under test.
2. It serves as the last step of the code under test. The test framework leaves the code under test by returning the result. The exit point and the reaction point of the code under test are at the same spot (the distance between them is zero).
Assert: We compare the actual result of the code under test with our expected result. In our case, that’s a bit silly because we just want to have a bread, without any further attributes to it. But this blog post is not about the art of assertion, so we keep it simple and silly.

Let’s review the positions of the three named points:

If you read from top to bottom and left to right, the reaction point seems to be placed before the motor point. If you read it like a programmer should, you see that the point are positioned in their execution order: motor point, exit point, reaction point.

You also see that the distance between the points is very small and in the case of exit and reaction point only distinguishable if you look very closely.

That’s the indicator for writing an unit test: If your entrance to the code under test (the motor point) is effectively the same position as your exit from the code under test and the place where you get your actual result (the reaction point), you are unequivocally writing an unit test.

If the distances between the three points get larger, you are drifting away from unit tests and entering the big realm of integration tests. That is not necessarily a bad thing, sometimes it’s a necessity, but it should be a deliberate decision on your part and not an unnoticed accident.

Let’s look at an example where the distances between the points are larger:

			
@Test
void can_sell_prepared_goods() {
	Baker given = new Baker();
	Bakery target = new Bakery(
		given
	);
	
	target.prepareGoods(1);
	
	assertEquals(
		Optional.of(new Bread()),
		target.sell()
	);
	assertEquals(
		Optional.empty(),
		target.sell()
	);
}

		

In this case, our baker now owns his own bakery where he can sell his breads to make a living. But baking breads “just in time” a customer requests one is not a sustainable business model, so the bakery has to prepare in advance and sell from the supply.

To test that we can fill up the supply and it gets emptied correctly, this test (in combination with other tests not shown here) does the AAA structure again:

We arrange our test world by inventing a baker and giving him to the bakery, which is the target in our case. We want to test the functionality of the bakery and a baker is required to do so. We already asserted that the baker knows his trade.

Then we act on our target. This is the motor point moment: We call the code under test to elicit a behaviour. But as you can see, we don’t receive a result right away. The effect seems to happen internally and we need to observe it from a different angle. Our reaction point has moved away from the motor point. And we have several exit points on our test journey. This is getting complicated!

In order to assert that the bakery’s supply holds one bread when told to prepare only one, we just buy two breads consecutively and see what happens. If there is only one bread in supply, we should get a bread the first time and nothing for our second purchase. The reaction point is now the sell() method, a good distance away from the prepareGoods() method we used as the motor point. Both points are (hopefully) connected by internal machinery in the bakery. We don’t want to assert the internal machinery, we want to assert its outcome. This requires the distance between motor point (“pressing a button up here”) and reaction point (“getting a product down here”).

You might argue that this example is still an unit test and I would agree. But we already see mechanics that occur predominantly in integration tests:

Elaborate arrange steps
act step without a return value (“actual” is missing)
Multiple assertions, telling a story with their order

When you imagine that the breads need to be of different kinds (dark bread, wholemeal bread, the whole german bread culture), you can probably see how the small unit test we just wrote kind of explodes with secondary complexity.

A realiable indicator that an automated test is going to be complicated is the distance between motor point and reaction point. Once you know about the concept, you can incorporate it into your testing intuition.

I hope it helps you write better tests or write good tests more deliberately. If you have thoughts about the concept, share them in a comment!

Don’t shoot your messengers

Writing small, focused tests, often called unit tests, is one of the things that look easy at the outset but turn out to be more delicate than anticipated. Writing a three-lines-of-code unit test in the triple-A structure soon became second nature to me, but there were lots of cases that resisted easy testing.

Using mock objects is the typical next step to accommodate this resistance and make the test code more complex. This leads to 5 to 10 lines of test code for easy mock-based tests and up to thirty or even fifty lines of test code where a lot of moving parts are mocked and chained together to test one single method.

So, the first reaction for a more complicated testing scenario is to make the test more complicated.

But even with the powerful combination of mock objects and dependency injection, there are situations where writing suitable tests seems impossible. In the past, I regarded these code blocks as “untestable” and omitted the tests because their economic viability seemed debatable.

I wrote small tests for easy code, long tests for complicated code and no tests for defiant code. The problem always seemed to be the tests that just didn’t cut it.

Until I could recognize my approach in a new light: I was encumbering the messenger. If the message was too harsh, I would outright shoot him.

The tests tried to tell me something about my production code. But I always saw the problem with them, not the code.

Today, I can see that the tests I never wrote because the “test story” at hand was too complicated for my abilities were already telling me something important.

The test you decide not to write because it’s too much of a hassle tells you that your code structure needs improvement. They already deliver their message to you, even before they exist.

With this insight, I can oftentimes fix the problem where it is caused: In the production code. The test coverage increases and the tests become simpler.

Let’s look at a small example that tries to show the line of thinking without being too extensive:

We developed a class in Java that represents a counter that gets triggered and imposes a wait period on every tenth trigger impulse:

public class CountAndWait {
	private int triggered;
	
	public CountAndWait() {
		this.triggered = 0;
	}
	
	public void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			try {
				Thread.sleep(1000L);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}
			this.triggered = 0;
		}
	}
}

There is a lot going on in the code for such a simple functionality. Especially the try-catch block catches my eye and makes me worried when thinking about tests. Why is it even there? Well, here is a starter link for an explanation.

But even without advanced threading issues, the normal functionality of our code is worrisome enough. How many lines of code will a test contain that covers the sleep? Should I really use a loop in my test code? Will the test really have a runtime of one second? That’s the same amount of time several hundred other unit tests require for much more coverage. Is this an economically sound testing approach?

The test doesn’t even exist and already sends a message: Your production code should be structured differently. If you focus on the “test story”, perhaps a better structure emerges?

The “story of the test” is the description of the production code path that is covered and asserted by the test. In our example, I want the story to be:

“When a counter object is triggered for the tenth time, it should impose a wait. Afterwards, the cycle should repeat.”

Nothing in the story of this test talks about interruption or exceptions, so if this code gets in the way, I should restructure it to eliminate it from my story. The new production code might look like this:

public class CountAndWait {
	private final Runnable waiting;
	private int triggered;
	
	public static CountAndWait forOneSecond() {
		return new CountAndWait(() -> {
			try {
				Thread.sleep(1000L);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
			}			
		});
	}
	
	public CountAndWait(Runnable waiting) {
		this.waiting = waiting;
		this.triggered = 0;
	}
	
	public void trigger() {
		this.triggered++;
		if (this.triggered == 10) {
			this.waiting.run();
			this.triggered = 0;
		}
	}
}

That’s a lot more code than before, but we can concentrate on the latter half. We can now inject a mock object that attests to how often it was run. This mock object doesn’t need to sleep for any amount of time, so the unit test is fast again.

Instead of making the test more complex, we introduced additional structure (and complexity) into the production code. The resulting unit test is rather easy to write:

class CountAndWaitTest {
	@Test
	@DisplayName("Waits after 10 triggers and resets")
	void wait_after_10_triggers_and_reset() {
		Runnable simulatedWait = mock(Runnable.class);
		CountAndWait target = new CountAndWait(simulatedWait);
		
		// no wait for the first 9 triggers
		Repeat.times(9).call(target::trigger);
		verifyNoInteractions(simulatedWait);
		
		// wait at the 10th trigger
		target.trigger();
		verify(simulatedWait, times(1)).run();
		
		// reset happened, no wait for another 9 triggers
		Repeat.times(9).call(target::trigger);
		verify(simulatedWait, times(1)).run();
	}
}

It’s still different from a simple 3-liner test, but the “and” in the test story hints at a more complex story than “get y for x”, so that might be ok. We could probably simplify the test even more if we got access to the internal trigger count and verify the reset directly.

I hope the example was clear enough. For me, the revelation that test problems more often than not have their root cause in production code is a clear message to improve my ability on writing code that facilitates testing instead of obstructing it.

I don’t shoot/omit my messengers anymore even if their message means more work for me.

The ALARA principle in software engineering

The ALARA principle originates in radiation protection and means “As Low As Reasonably Achievable”. It means that you have to weigh the purpose of an action dealing with radiation against its disadvantages, like radiation damage or long-term risks. The word “reasonably” means that while some disadvantages are not avoidable, a practicable amount of protection should be in place to lower them. The principle calls for a balancing act: Not without safety measures, but don’t overextend your means by trying to achieve a safety level that isn’t helpful anymore.
To put the ALARA principle in practice with an example: You shouldn’t need a X-ray every time you go to the dentist, but given enough time since the last one and reasonable doubt about a tooth, the X-ray examination will benefit your dental (and overall) health more than if you deny it. It isn’t healthy by itself, but the information gained by it will be used to improve your health.

I learnt about the ALARA principle when my father (a nuclear physicist by heart) explained it to me in context of the current corona pandemic: Use protection like face masks and distance, but don’t stress yourself too much over that one time when you grabbed a pen in the postal office. While preparation and watchfulness is helpful, fear is detrimental to your mental health. And even the most resilient mind has bounded resources that can better be spent on constructive things instead of fear.

A fun fact from radiation protection is that at least three of the four main rules of protection can be applied to corona, too:

Distance yourself from the radiation source
Use appropriate protection gear
Avoid incorporation (keep the thing outside your body)
Limit your exposure time (this doesn’t fit as nicely, because the virus is probably not cumulative)

But how can we apply the ALARA principle to software engineering? I was instantly reminded about the “Thorough” rule of unit testing. In the book “Pragmatic Unit Testing” by Andy Hunt and Dave Thomas, the two original Pragmatic Programmers, good unit tests have to follow the ATRIP-rules. The T stands for “Thorough” which is often misinterpreted as “test everything at least twice”. In reality, the rule states that:

all mission critical functionality needs to be tested
for every occuring bug, there needs to be an additional test that ensures that the bug cannot happen again

The first thing that meets the eye is that the rule doesn’t define a bug as a failure of your testing effort. It takes a bug that probably happened in production and caused some damage as a motivation to strengthen your test coverage in that particular area. The second part of the rule calls for directed, well-aimed testing effort. It is easy to follow because it has a clear trigger: A bug happened, now you have to write a test.

The first part of the rule is more complicated: What is mission critical functionality? And what means “is tested thoroughly” in this context? And here, the ALARA principle can help us. The bug rate in the important parts of your code should be as low as reasonably achievable. “Reasonably achievable” is defined by the resources at your disposal (like time to market), your expertise in testing and the potential damage that could happen if something in your code goes wrong.

If the potential damage is high or even life-threatening, your reasonable effort should be much higher than if the most critical thing that happens is a 15 minute downtime while you restart the server. There are use cases where even 15 minutes mean subsequent damage, but most software is written for a more relaxed context.

I’ve always found the “Thorough” rule of good unit tests pleasant and comforting: If you made reasonable effort to test your most important code and write a test for every bug you or your users encounter, you can say that your bug rate is ALARA – “As Low As Reasonably Achievable”. And that is good enough for most cases.

What was your first thought when you heard about the ALARA principle? Tell us in the comment section!

How the most interesting IT debate is revealing our values as software developers

TDD is dead. Is TDD dead? A question that seems to divide our profession. What does this debate have to do with you?

TDD is dead. Is TDD dead? A question that seems to divide our profession.
On the one side: developers which write their tests first and let them drive their code. They prefer the mockist approach to testing. Code should be tested in isolation, under lab like circumstances. Clean code is their book. Practices and principles guide their thinking. An application should not be bound to frameworks and have a hexagonal architecture. The GOOS book showed how it can be done.
On the other side: developers which focus on readability and clarity. They use their experience and gut to drive their decisions. Because of past experiences they test their the code the classical way. They are pragmatic. Practices and principles are used when they improve the understanding of the code. Code is there to be refactored. Just like a gardener trims bushes and a writer edits his prose they work with their code.

What are your values?

What does this debate have to do with you?

Ask yourself:
What if you could write a proof of your program costing 10 or just 5 times as much as the implementation? It would prove your code would work correctly under all possible circumstances. Would you do it?

Or would you rather improve the existing architecture, design or clarity of your code? So that you remove technical debt and are better positioned for future changes.

Or would you write new features and improve your application for the people using it?

What are your values?

History

At the beginnings of my developer life in the late 80s/early 90s I remember that the industry was focussed on one goal: code reuse. Modules, components, libraries, frameworks were introduced. Then patterns came. All of that was working towards one side of the equation: low coupling.
High cohesion was neglected in pursuit of a noble goal. But what happened? The imbalance produced layer after layer, indirection after indirection, over-separation and over-abstraction. You had to deal with dependency injection (containers), configuration, class hierarchies, interfaces, event buses, callbacks, … just to understand a hello world.
Today we have more computing power and are solving more and more complex things. We think in higher abstractions. Much more people benefit from our skills and our works.
On the user facing side design focusses on simplicity and usability. Even complex relationships can be made understandable and manageable. A wise man once said: design is about intent.
The same with code: Code is about intent. Intent should be the measure of the quality of our code. Not testability, not coupling: intent. If the code (and this includes the code comments) would reveal its intent, you could fix bugs in it, improve it, change it, refactor it. Tests would be your safety net to ensure you are not breaking your intent.
You might say: but this is what TDD is all about! But I think we got it all backwards. The code and its intention revealing nature is more important than the tests. The tests support. But tests should never replace or even harm the clarity of the code.
The quality of the code is important. But most important are the people using your application.
My goal is to delight the people who use my software and my way there is writing intention revealing software. I am not there and I am learning every day but I take step after step.

What are your values?

Should I test this?

Writing software is hard, writing correct software is even harder. So everything that helps you writing better or more correct software should be used to your advantage. But does every test help? And does every code to be automatically tested? How do I decide what to test and how?
Given a typical web CRUD application, take a look at the following piece of functionality:
We have a model class Element which has a Type type:

class Element {
  ...
  Type type
  ...
}

The view contains a select tag which lets you choose a type:

...
<g:select name="filterByTypeId" from="${types}" value="${filterByType?.id}">
...

And finally in the controller we filter the list of shown elements via the selected type:

...
Type filterByType = Type.get(params['filterByTypeId'])
return [elements: filterByType ? Element.findAllByType(filterByType) : Element.list(), types: Type.list(), filterByType: filterByType]
...

Now ask yourself: would you write an automatic test for this? A functional / acceptance or some unit / integration tests? Would you really test this automatically or just by hand? And how do you decide this?

Dogma

According to TDD you should test everything, there does not exist any code without a test (first). If you really live by TDD the choice is already made: you test this code. But is this pragmatic? Effecient? Productive? And what about the aspects you forgot to test? The order of the types for example. The user wanted to list them lexicographically or by a priority or numbered. What if this part changes and your test is so coupled that you need to change it, too. There are some TDD enthusiasts out there but if you are more pragmatic there are other criteria to help you decide.

Cost

If you look at the code in question and think: how much effort is it to create the test(s)? Or to run the test? If the feedback cycle is too long you lose track of it. I need a test for the controller, this is the easy part. Then I need to test that the view passes the correct parameter and accepts and shows the correct list.
I also can write an acceptance test but this seems like a big gun for a small bird. In our case it heavily depends on the framework how easy or difficult and costly it is to write tests for our filter. What do you have to mock or to simulate? You also have to take the hidden costs into account: how much does it cost to maintain this test? When the requirement changes? When there are more filter criteria? Or if an element can have more than one type?

Value

Another question you can ask is: what is the value for the customer? How much does he need it to work? What is the cost of an error? What happens when the code in question does not work? The value for the customer is not only determined by the functionality it provides. Software can be seen as giving your users capabilities, to enable them. The capability is implemented by two things: implementation (your functionality) and affordance (the UI). The value is determined by both parts. So you hardly can decide on the value of a functionality alone. What if you need to change the UI (in our case the select tag) to increase the value? How does this effect your tests? Does the user reach his goal if the functionality part is broken? What is when the code is correct but it is slow? Or the UI isn’t visible on your user’s screen?

Personal / Team profile

You could decide what and if to test by looking at your past: your personal or team mistakes. Typical problems and bugs you made. Habits you have. You could test more when the (business or technical) domain or the underlying technology is new for you. You could write only few tests when you know the area you work in but more when it is unknown and you need to explore it. You can write more tests if you work in a dynamic language and few in a static language. Or vice versa.

Area / Type of code

You can write tests for every bug you find to prevent regression. You could write tests only for algorithms or data structures. For certain core parts or for interaction with other systems. Or only for (public) interfaces. The area or type of code can help you decide if to test or not.

Visibility

Also you could take a look at how easy it is to spot a bug when manually invoke the code. Do you or your user see the bug immediately? Is it hidden? In our case you should easily see when the list is not filtered or filtered by the wrong criteria. But what if it is just a rounding error or an error where cause and effect is separated by time or location?

Conclusion

Do you have or use additional criteria? How do you decide? I have to admit that I didn’t and I wouldn’t test the above code because I can easily spot problems in the code and try it out by hand if it works (visibility). If the code grows more complex and I cannot easily see the problem (again visibility) or the value (or cost of an error) for the customer is high I would write one.

Summary of the Schneide Dev Brunch at 2013-03-03

If you couldn’t attend the Schneide Dev Brunch in March 2013, here are the main topics we discussed summarized as good as I remember them.

Yes, you’ve read it right in the title. The Dev Brunch I want to summarize now is over two month ago. The long delay can only partially be explained by several prolonged periods of illness on my side. So this will be a rather crisp summary, because all the lively details have probably vanished by now. But let me start by explaining what the Dev Brunch is:
The Dev Brunch is a regular brunch on a sunday, only that all attendees want to talk about software development and various other topics. If you bring a software-related topic along with your food, everyone has something to share. This brunch was very well-attended, but we still managed to sit around our main table. Let’s have a look at the main topics we discussed:

XFD presentation

In a presentation of a large german software company, our Extreme Feedback Devices were thoroughly mentioned. We found it noteworthy enough to mention it here.

Industrial Logic’s XP Playing Cards

This is just a deck of playing cards, but not the usual one. One hundred different cards with problems, solutions and values wait for you to make up some game rules and start to play. The inventors have collected a list of possible games on their website. It leads to hilarious results if you just distribute some cards in a group of developers (as we did on the brunch) and start with a problem. Soon enough, your discussion will lead you to the most unexpected topics. We ended with the “Power Distance Index“, but I have no recollection how we got there. These cards are a great facilitator to start technical discussions. They seem to be non-available now, sadly.

Distributed SCRUM

A short report on applying SCRUM to a multi-site team, using desktop sharing and video chat software. The project landscape is driven by an adaption of “scrum of scrums”. I cannot dive into details anymore, but these reports are a great reason to really attend the brunch instead of just reading the summary. The video chat meetings were crucial for team-building, but very time-consuming and wearying due to timezone reasons.

SCRUM User Group Karlsruhe

Speaking of SCRUM, there is a SCRUM User Group in our city, Karlsruhe in Germany. It might not be the biggest user group ever, but one attendant of our brunch reported that all participants are “socially very pleasing”. There are very interesting presentations or gatherings for specific topics. If you have to deal with SCRUM, this should be on your agends.

Retrospectives

We had a prolonged talk about retrospectives and how to apply them. Most retrospective activities tend to be formalized (like “cards and priorities”) and lose effectiveness due to the “comfort aspect”. A hypothesis during the talks was that when moderation isn’t necessary anymore, its more likely to be a negative smell. We talked about moderated vs. non-moderated retrospectives quite a bit, also exploring the question what role should/could be moderator and why. The “Happiness Metric” was mentioned, specifically its application by the swedish company Crisp, as described by Henrik Kniberg. Some sources of ideas for retrospectives were also mentioned: the Facilitator Gathering or some noteworthy books that I forgot to write down (sorry! Please ask for them in the comments).

Internal facilitator

We also discussed some problems that “internal” facilitators face day-to-day. Internal facilitators work within the team they try to facilitate.

Presentation about acceptance testing by Uncle Bob

A big event in February this year were the workshops and the presentation with Robert C. Martin about testing. His talk presented Fitnesse in the context of acceptance testing. There was some confusion about the amount of available seats, so most of us didn’t attend (because we weren’t able to register beforehands). Some of our participants were there, nonetheless and found the presentation worthwile. Only the usual pattern of Uncle Bob’s presentation lacked some virtue this time, but this can easily explained with the flu. Here’s an external summary of the event. Check out the comment section for potential first-hand accounts.

Definition of test types

In the wake of our talk about Uncle Bob’s presentation, we discussed different test categorization schemes. We’ve invented our own, but there is also a widely used definition from the International Software Testing Qualifications Board. We didn’t dive deep into this topic, so lets say it’s still open for discussion.

Book about money counterfeiter

Somehow, I’ve written down a notice about a german book about a famous money counterfeiter, Jürgen Kuhl: “Blütenträume”. This talented artist drew dollar notes by hand so perfectly that even experts couldn’t tell them apart. Regrettably, I don’t remember the context anymore. It might have something to do with Giesecke & Devrient, a manufacturer of money printing machines. But even then, I don’t remember what that context was about.

Traceability of software artifacts

Our last topic circled around the question how software artifacts are registered and traced in our practice. The interesting part of this question is the ability to make connections between different artifacts, like an automatic report about what existing features are tangented by a change and should be tested again (if manual tests are necessary). Or you want to record the specifics of your test environment alongside your tests. Perhaps you are interested in the relation between features and their accompanying tests. The easiest connection can be made between a change (commit) and the issue it belongs to. But changes without issue (like almost all refactorings) are problematic still. It was an interesting discussion with a lot input to think about.

Summary

One thing I’ve learnt from this Dev Brunch is that it isn’t enough to write down some notes and try to remember the details some weeks later. The summaries have to be written in a timely manner. I didn’t succeed with it this time and try to blame it on my lack of health. I promise a better summary next time. The worst part is that I know that I’ve forgotten a lot of important or interesting details (like a youtube channel about ideas – please provide the link in the comment section, Martin!) but cannot recreate the memories.

As usual, the Dev Brunch contained a lot more chatter and talk than listed here. The high number of attendees makes for an unique experience every time. We are looking forward to the next Dev Brunch at the Softwareschneiderei. And as always, we are open for guests and future regulars. Just drop us a notice and we’ll invite you over next time.

	mariuselvert on Calculating the Number of Segm…
	Anonymous on Calculating the Number of Segm…
	Anonymous on Avoiding Code Style Discu…
	Anonymous on What Happens When We Don’t Lis…
	Writing Integration… on Every Unit Test Is a Stage Pla…