Last night, an end-to-end test saved my life

but not with a song, only with an assertion failure. Okay, I’ll admit, this is not the only hyperbole in the blog title. Let’s set things right: The story didn’t happen last night, it didn’t even happen at night (because I don’t work night hours). In fact, it was a well-lit, sunny day. And the test didn’t save my life. It did save me from some embarrassment, though. And it spared a customer some trouble and hotfixing sessions with me.

The true parts of the blog title are: An end-to-end test reported an assertion failure that helped me to fix a bug before it got released. In other words: An end-to-end test did its job and I felt fine. And that’s enough of the obscure song references from the 1980s, hopefully.

Let me explain the setting. We develop a big system that runs in production 24/7-style and is perpetually adjusted and improved. The system should run unattended and only require human attention when a real incident happens, either in the data or the system itself. If you look at the system from space, it looks like this:

A data source (in fact, lots of data sources) occasionally transmits data to the system, which gets transformed and sent to a third-party system that does all kinds of elaborate analysis with it. Of course, there is more to the real system than that, but for our story, it is sufficient to know that there are several third-party systems, each with their own data formats. We’ll call them third-party system A (TPS-A) and TPS-B in this story.

Our system is secured by lots and lots of unit tests, a good number of integration tests and a handful of end-to-end tests. All tests run green all the time. The end-to-end tests (E2ET) try to replicate the production environment as closely as possible by running on the target operating system and using the same data transfer mechanisms as the real case. The test boots up the complete system and simulates the data source and the third-party systems. By configuring the system so that it transfers back to the E2ET, we can write assertions of the kind “given this specific input data, we expect this particular output data from the system, however it chooses to produce it”. This kind of broad-brush test is invaluable because it doesn’t care about specifics inside the system. It only cares about the system’s output.

This kind of E2ET is also difficult to write, complicated to run, prone to brittleness and obscure in its failure statements. If you write a three-line unit test, it is straightforward, fast and probably pretty specific when it breaks: “This one thing that I’m testing is wrong now”. An E2ET as described here is like the Oracle of Delphi:

  • E2ET: “Somewhere in your system, something doesn’t work the way I like it anymore.”
  • Developer: “Can you be more specific?”
  • E2ET: “No, but here are 2000 lines of debug output that might help you on your quest for the cause.”
  • Developer: “But I didn’t change anything.”
  • E2ET: “Oh, in that case, it could as well be the weather or a slow network. Mind if I run again?”

This kind of E2ET is also rather slow and takes its sweet time resetting the workspace state, booting the system again and using prolonged timeouts for every step. So you wait several minutes for the E2ET to fail again:

  • E2ET: “Yup, there is still something fishy with your current code.”
  • Developer: “It’s the same code you let pass yesterday.”
  • E2ET: “Well, today I don’t like it!”
  • Developer: “Okay, let me dive into the debug output, then.”

Ten to twenty minutes pass.

  • Developer: “Is it possible that you just choke on a non-free network port?”
  • E2ET: “Yes, I’m supposed to wait for the system’s output on this port and cannot bind to it.”
  • Developer: “You cannot bind to it because an instance of yourself is stuck in the background and holding onto the port.”
  • E2ET: “See? I’ve told you there is something wrong with your system!”
  • Developer: “This isn’t a problem with the system’s code. This is a problem with your code.”
  • E2ET: “Hey! Don’t blame me! I’m here to help. You can always choose to ignore or delete me if I’m too much of a burden to you.”

This kind of E2ET cries wolf lots of times: for problems in the test code itself, for overly optimistic timeouts or just for oddities in the environment. If such an E2ET fails, it’s common practice to run it again to see if the problem persists.

How many false alarms can you tolerate before you stop listening?

In my case, I chose to reduce the number of E2ETs to the essential minimum and to listen to them every time. If a test raises a false alarm, invest the time to come up with a way to make it more robust without reducing its sensitivity. But listen to the test every time.

Because, in my story, the test insisted that third-party systems A and B both didn’t receive half of their data. I could rule out stray effects in the network or on the hard disk pretty fast, because the other half of the data arrived just fine. So I invested considerable time in understanding the debug output. This led me nowhere – it seemed that the missing data wasn’t sent to the system in the first place. Checking the E2ET disproved this hypothesis: the test sent as much data to the system as ever, but half of it went missing along the way. Next, I reviewed all the changes I had made in the last few days. Some of them affected the data export to the TPS. But all unit and integration tests in this area insisted that everything worked as intended.

It is important to have multiple witnesses like this. Unit tests are like traces in a criminal investigation: they report one very specific piece of information about one very specific part of the code and are only useful in larger numbers. Integration tests testify to the possibility of combining several code parts. In my case, this meant that the unit tests guaranteed that all parts involved in the data export worked correctly on their own (in isolation). The integration tests vouched that, given the right combination of data export parts, they would work as intended.

And this left one area of the code as the main suspect: something in the combination of export parts must be going wrong in the real case. The code that wires the parts together is the only code not covered by unit or integration tests, only by the E2ET. Both other test types base their work on the hypothesis “given that everything is wired together like this”. If this hypothesis doesn’t hold true in the real case, no test but the E2ET finds a problem, because the E2ET rests on a different hypothesis: “given that the system is started for real”.

I went through all the changes to the wiring code and there it was: a glorious TODO comment, written by myself on a late Friday afternoon, stating: “TODO: Register format X export again after you’ve finished issue Y”. I totally forgot about it over the weekend. I had only added the comment to the code, not to the issue tracker. Friday-afternoon me wasn’t cooperative with future or current me. In effect, I totally played myself. I don’t remember removing the registration code or writing the comment. It was probably a minor side change for an unimportant aspect of the main issue. The whole thing was surely done in a few seconds and promptly forgotten. And the only guardian that caught it was the E2ET, using the most nonspecific words for it (“somewhere, something went wrong”).

If I had chosen to ignore or modify the E2ET, the bug would have made it to production. It would have caused a loss of current data for the TPS. Maybe they would have detected it. But maybe, they would have just chosen to operate on the most recent data – data that doesn’t get updated anymore. In short, we don’t want to find out.

So what is the moral of the story?
Always have an end-to-end test for your most important functionality in place. Let it deal with your system from the outside. And listen to it, regardless of the effort it takes.

And what can you do if your E2ET has saved you?

  • Give your test a medal. No, seriously, leave a comment in the test code that it was helpful.
  • Write a unit or integration test that replicates the specific cause (see the sketch after this list). You want to know immediately if you make the same error again in the future. Your E2ET is your last line of defense. Don’t let it be your only one. You’ve been shown a weakness in your test setup; remediate it.
  • Blame past you for everything. Assure future you that you are a better developer than this.
  • Feel good that you were able to neutralize your faux pas in time. That’s impressive!
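In my case, such a test would pin down the wiring itself. Here is a hypothetical sketch in C++ with catch2 (all names, including the header and the factory function, are my invention and not the original system’s API): an integration test that builds the production wiring and asserts that every expected export format is actually registered, so a forgotten registration fails fast and with a specific message instead of only in the E2ET.

#include <catch.hpp>

// Hypothetical stand-in for the production wiring code. In the real
// system, buildProductionExportWiring() would be the same factory that
// the application itself calls at startup.
#include "export_wiring.h"

TEST_CASE("production wiring registers an exporter for every format")
{
  auto const wiring = buildProductionExportWiring();

  REQUIRE(wiring.hasExporterFor(ExportFormat::TPS_A));
  REQUIRE(wiring.hasExporterFor(ExportFormat::TPS_B));
}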

When was the last time a test saved you? Leave the story or a link to it in the comments. I can’t be the only one.

Oh, and if you find the random links to 80s music videos weird: At least I didn’t rickroll you. The song itself is from 1987 and would fit my selection.

Integrating catch2 with CMake and Jenkins

A few years back, we posted an article on how to get CMake, googletest and Jenkins to play nicely with each other. Since then, Phil Nash’s catch testing library has emerged as arguably the most popular thing to write your C++ tests in. I’m going to show how to set up a small sample project that integrates catch2, CMake and Jenkins nicely.

Project structure

Here is the project structure we will be using in our example. It is a simple library that implements left-pad: a utility function to expand a string to a minimum length by adding a filler character to the left.

├── CMakeLists.txt
├── source
│   ├── CMakeLists.txt
│   ├── string_utils.cpp
│   └── string_utils.h
├── externals
│   └── catch2
│       └── catch.hpp
└── tests
    ├── CMakeLists.txt
    ├── main.cpp
    └── string_utils.test.cpp

As you can see, the code is organized in three subfolders: source, externals and tests. source contains your production code. In a real world scenario, you’d probably have a couple of libraries and executables in additional subfolders in this folder.

The source folder

set(TARGET_NAME string_utils)

add_library(${TARGET_NAME}
  string_utils.cpp
  string_utils.h)

target_include_directories(${TARGET_NAME}
  INTERFACE ./)

install(TARGETS ${TARGET_NAME}
  ARCHIVE DESTINATION lib/)

The library is added to the install target because that’s what we typically do with our artifacts.
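For orientation, a minimal left-pad implementation in string_utils.h and string_utils.cpp might look like this (a sketch of my own; the actual code in the example repository may differ):

// string_utils.h
#pragma once

#include <cstddef>
#include <string>

// Expands str to at least min_length characters by prepending filler.
std::string left_pad(const std::string& str, std::size_t min_length, char filler = ' ');

// string_utils.cpp
#include "string_utils.h"

std::string left_pad(const std::string& str, std::size_t min_length, char filler)
{
  if (str.size() >= min_length)
    return str;
  return std::string(min_length - str.size(), filler) + str;
}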

I use externals as a place for libraries that go into the project’s VCS. In this case, that is just the catch2 single-header distribution.

The tests folder

I typically mirror the filename and path of the unit under test and add some extra tag, in this case the .test. You should really not need headers here. The corresponding CMakeLists.txt looks like this:

set(UNIT_TEST_LIST
  string_utils)

foreach(NAME IN LISTS UNIT_TEST_LIST)
  list(APPEND UNIT_TEST_SOURCE_LIST
    ${NAME}.test.cpp)
endforeach()

set(TARGET_NAME tests)

add_executable(${TARGET_NAME}
  main.cpp
  ${UNIT_TEST_SOURCE_LIST})

target_link_libraries(${TARGET_NAME}
  PUBLIC string_utils)

target_include_directories(${TARGET_NAME}
  PUBLIC ../externals/catch2/)

add_test(
  NAME ${TARGET_NAME}
  COMMAND ${TARGET_NAME} -o report.xml -r junit)

The list and the loop help me to list the tests without duplicating the .test tag everywhere. Note that there’s also a main.cpp included, which only defines catch’s main function:

#define CATCH_CONFIG_MAIN
#include <catch.hpp>

The add_test call at the bottom tells CTest (CMake’s bundled test-runner) how to run catch. The “-o” switch commands catch to direct its output to a file, report.xml. The “-r” switch sets the report mode to JUnit format. We will need both to integrate with Jenkins.
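For completeness, a test in string_utils.test.cpp could look like this (assuming the left_pad signature sketched above, which is my invention and not necessarily the code in the repository):

#include <catch.hpp>

#include "string_utils.h"

TEST_CASE("left_pad expands short strings to the minimum length")
{
  REQUIRE(left_pad("42", 5, '0') == "00042");
}

TEST_CASE("left_pad leaves strings that are already long enough untouched")
{
  REQUIRE(left_pad("hello", 3) == "hello");
}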

The top-level folder

The CMakeLists.txt in the top-level folder needs to call enable_testing() for our setup. Other than that, it just directs to the subfolders via add_subdirectory().

Jenkins

Now all that is needed is to set up Jenkins accordingly. Set up Jenkins to get your code and add a “CMake Build” build step. Hit “Add build tool invocation” and check “Use cmake” to let CMake handle the invocation of your build tool (e.g. make). You also specify the target here, which is typically “install” or “package”, via the “--target” switch.

Now you add another step that runs the tests via CTest. Add another Build Step, this time “CMake/CPack/CTest Execution”, and pick CTest. The one quirk with this is that it will let the build fail when CTest returns a non-zero exit code – which it does when any tests fail. Usually, you want the build to become unstable, not failed, if that happens. Hence, set “1-65535” in the “Ignore exit codes” input.

The final step is to let Jenkins use the report.xml that our test executable generates, so it can create the test result charts and tables. To do that, add the post-build action “Publish JUnit test result report” and point it to tests/report.xml.

Done!

That’s it. Now you’ve got your CI running nice catch tests. The code for this example is available on our GitHub.

Functional tests for Grails with Geb and geckodriver

Previously we had many functional tests using the selenium-rc plugin for Grails. Many were initially recorded using Selenium IDE, then refactored to be more maintainable. These refactorings introduced “driver” objects used to interact with common elements on the pages and runners which improved the API for walking through a multipage process.

Selenium-rc got deprecated quite a while ago and support for Firefox broke every once in a while. Finally we were forced to migrate to the current state-of-the-art in Grails functional testing: Geb.

Generally I can say it is really a major improvement over the old selenium API. The page concept is similar to our own drivers with some nice features:

  • At-Checkers provide a standardized way of checking if we are at the expected page
  • Default and custom per page timeouts using atCheckWaiting
  • Specification of relevant content elements using a jQuery-like syntax and support for CSS selectors
  • The so-called modules ease the interaction with form elements and the like
  • Much better error messages

While Geb is a real improvement over Selenium, it comes with some quirks. Here is some advice that may help you use Geb successfully in the context of your (Grails) web application.

Cross-platform testing in Grails

Geb (or more specifically the underlying WebDriver component) requires a geckodriver binary to work correctly with Firefox. This binary is naturally platform-dependent. We have a setup with mostly Windows machines for the developers and Linux build slaves and target systems, so we need binaries for all required platforms and have to configure them accordingly. We simply put them into a folder in our project and added the following configuration to the test environment in Config.groovy:

environments {
  test {
    def basedir = new File(new File('.', 'infrastructure'), 'testing')
    def geckodriver = 'geckodriver'
    if (System.properties['os.name'].toLowerCase().contains('windows')) {
      geckodriver += '.exe'
    }
    System.setProperty('webdriver.gecko.driver', new File(basedir, geckodriver).canonicalPath)
  }
}

Problems with file uploads

If you are plagued by file uploads not working, it may be a problem with certain Firefox versions. Even though the fix landed in Firefox 56, I want to add the workaround in case you still experience problems. Add the following to your GebConfig.groovy:

import org.openqa.selenium.firefox.FirefoxDriver
import org.openqa.selenium.firefox.FirefoxProfile

driver = {
  FirefoxProfile profile = new FirefoxProfile()
  // Workaround for issue https://github.com/mozilla/geckodriver/issues/858
  profile.setPreference('dom.file.createInChild', true)
  new FirefoxDriver(profile)
}

Minor drawbacks

While the Geb DSL is quite readable and allows for concise tests, the IDE support is not there. You do not get much code assist when writing the tests and calling functions of the page objects, unlike in our own code-based solution.

Conclusion

After taking the first few hurdles, writing new functional tests with Geb really feels good and is a breeze compared to our old Selenium tests. Converting the old ones will be a lot of work and will only happen on a case-by-case basis, but our coverage with Geb is ever increasing.

Look at the automated tests to diagnose the project ailments

A cornerstone of modern software development is developer testing. That means that developers are the primary authors of automated test code. In theory, that is a good thing and it might look like the quality assurance department will be out of work soon. In practice, we as a profession have tried for nearly twenty years to install a culture of developer testing in our work and still end up with software projects that feature no automated tests at all (side note: JUnit 1.0 was released in February of 1998).

What we know about automated tests

One piece of common understanding about developer testing is the test pyramid. Let’s quickly recap what we know about it. There are different kinds of automated tests and the test pyramid differentiates three of them:

  • Acceptance tests or UI tests are the heaviest type of automated test. They operate on the software from the outside, with the means of a real user and try to assert that real use cases are accomplishable.
  • Integration tests often use several parts of the system in a test scenario that asserts the correct collaboration of the parts. Integration tests may take some time to come to a conclusion and utilize real hardware like network or disks.
  • Unit tests tend to be small and quick and focus on a particular aspect of a “unit” like a class or entity aggregate. Their reach into the system should be short and might be forcefully restricted by employing mocks.

These three types, the A, I and U of automated tests, should come in different numbers. A good rule of thumb is that for every acceptance test, there might be up to one thousand unit tests. If you draw the quantities as areas, they appear in the form of a pyramid. A small top of acceptance tests rests on a broader seating of integration tests that relies on a groundwork of many unit tests. A healthy test pyramid looks like this:

Take this picture as an orientation, not as an absolute scale. But be sure to count your different test types from time to time.

Outlining the tests

This is actually one of the first things I do when I get introduced to a new and unknown code base, which happens quite often when I do consulting work for existing development teams. Have a look at the automated tests, determine their type and count their numbers. If the result resembles anything close to the test pyramid, you’ve got a chance. If the resulting shape looks different, you might find this blog entry useful:

The Tower

If you have a hard time finding any tests (because there are none) or you find only some half-assed attempts to produce a meaningful automated test suite, you are looking at a tower project. The tower is rather small in diameter; in the case of absent tests it is nothing more than a thin vertical line (the “stick”). If, on the other hand, you find a solid number of tests for every type, you’ve found a “block” project. Block projects usually don’t have a problem, only a history of test effort migration, either from unit to acceptance tests or, more commonly, in the other direction. If you find a block, you are fine.

The tower, though, is a case of neglect. The project team might have started serious efforts to automate their tests, but got demotivated by intrinsic or extrinsic influences and abandoned the tests soon after their creation. Nobody has looked after them since and the only reason they still pass green is that they didn’t really test anything to begin with or only cover an area of the system that is as finished as it is boring. Topics like user management or utility classes are usually the first and only things that get tests in a tower scenario.

Don’t get me wrong, the tower indicates the absence of tests, but not the absence of willingness to write automated tests, unless the tower is really a stick. A team willing to invest in automated tests may only lack knowledge and coaching about the topic. Be sure to lead them bottom-up (unit tests first), though.

The Egg

If you’ve categorized and counted the tests and couldn’t find many acceptance or unit tests, you’ve found an egg. The egg consists mostly of integration tests that may lean into unit testing territory by asserting the smallest bits of functionality here and there (often embedded in an overarching test storyline) or dip their toes into GUI-based testing by asserting presentation-specific properties of widget objects. While they provide ample test coverage for the system, they also tie application logic and presentation details together and don’t help to separate domain code from the use cases.

The project team is probably proud of their test coverage and doesn’t see any value in differentiating the automated test types, because “every test improves the situation”. This blindness to test types is the core problem that may be cured with training and coaching (I’ve found the ATRIP rules to be particularly effective for distinguishing integration and unit tests), but the symptoms, especially the lack of separation of concerns, have to be mitigated soon, too.

One way to start there is to break the tests down into their integration and their unit test parts. You can work from assertion to assertion and ask: is this assertion necessary to ensure the current use case? If not, extract a new unit test focused on only this one assertion.

As soon as you add a pedestal of unit tests to your egg, you are well on your way to a healthy test pyramid.

The Ice Cream Cone

This is the most fearsome automated test outline in existence, even more dramatic than the stick. Usually, the project team is really enthusiastic about writing tests, or at least follows orders to do so, but they cannot test parts of the application in isolation. A really tragic case was a complex system that was so entangled with its database, through countless stored procedures that contributed to the application logic, that it was hopeless to think about tests without the database. And because every automated test had to start the whole system including the database, there was really no need to differentiate between application logic and presentation logic. It all became a Gordian knot of dependencies that enforced the habit of writing elaborate automated GUI-based tests to test the smallest logic bits deep inside the core. It felt like eating single rice grains with overly long, flimsy wooden chopsticks that would break often.

The ice cream cone is problematic because the project team needs to realize that their effort was misled and the tests are all telling the bitter truth: the system’s architecture isn’t fit for proper automated tests. It’s not the tests, it’s you (or your architecture)! Nobody wants to hear that and, even more so, nobody wants to untangle the mess (without the help of a proper safety net consisting of automated tests). Pinning tests are probably helpful in this scenario.

But you need to turn the test pyramid around or the project team will suffocate under the overly costly test tax while increasing technical debt.

Epilogue

Please keep in mind that it’s not a problem in itself that your project doesn’t have a normal test pyramid. It’s great that you have automated tests at all! But your current test type distribution might not be as effective as possible, might be more expensive than necessary and might not be the right automated test setup for your development goals.

What are your stories with automated test setups? Care to share them with us in the comments?

Why I’m not using C++ unnamed namespaces anymore

Well okay, actually I’m still using them, but I thought the absolute statement would make for a better headline. But I do not use them nearly as much as I used to. Almost exactly a year ago, I even described them as an integral part of my unit design. Nowadays, most units I write do not have an unnamed namespace at all.

What’s so great about unnamed namespaces?

Back when I still used them, my code would usually evolve gradually through a few different “stages of visibility”. The first of these stages was the unnamed namespace. Later stages would be either a free function or a private/public member function.

Let’s say I identify a bit of code that I could reuse. I refactor it into a separate function. Since that bit of code is only used in that compile unit, it makes sense to put this function into an unnamed namespace that is only visible in the implementation of that unit.

Okay great, now we have reusability within this one compile unit, and we didn’t even have to recompile any of the unit’s clients. Also, we can just “hack away” on this code. It’s very local and exists solely to provide for our implementation needs. We can cobble it together without worrying that anyone else might ever have to use it.
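As a sketch, that first stage typically looks something like this (all names are made up for illustration):

// some_unit.cpp
namespace {

// Helper extracted for reuse within this compile unit only;
// it is invisible to every other translation unit.
int clamp(int value, int low, int high)
{
  return value < low ? low : (value > high ? high : value);
}

} // unnamed namespace

// Part of the unit's public interface, using the hidden helper.
int normalized_percentage(int raw_value)
{
  return clamp(raw_value, 0, 100);
}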

This all feels pretty great at first. You are writing smaller functions and classes after all.

Whole class hierarchies are defined this way. Invisible to all but yourself. Protected and sheltered from the ugly world of external clients.

What’s so bad about unnamed namespaces?

However, there are two sides to this coin. Over time, one of two things usually happens:

1. The code is never needed again outside of the unit. Forgotten by all but the compiler, it exists happily in its seclusion.
2. The code is needed elsewhere.

Guess which one happens more often. The code is needed elsewhere. After all, that is usually the reason we refactored it into a function in the first place: its reusability. When this is the case, one of these scenarios usually happens:

1. People forgot about it, and solve the problem again.
2. People never learned about it, and solve the problem again.
3. People know about it, and copy-and-paste the code to solve their problem.
4. People know about it and make the function more widely available to call it directly.

Except for the last, that’s a pretty grim outlook. The first two cases are usually the result of bad discoverability. If you haven’t worked with that code extensively, it is pretty certain that you do not even know that it exists.

The third is often a consequence of the fact that this function was not initially written for reuse. This can mean that it cannot be called from the outside because it cannot be accessed. But often, there’s some small dependency on the exact place where it’s defined. People came to this function because they want to solve another problem, not to figure out how to make this function visible to them. Call it laziness or pragmatism, but they now have a case for just copying it. It happens and shouldn’t be incentivised.

A Bug? In my code?

Now imagine you don’t care much about such noble long term code quality concerns as code duplication. After all, deduplication just increases coupling, right?

But you do care about satisfied customers, possibly because your job depends on it. One of your customers provides you with a crash dump and the stacktrace clearly points to your hidden and protected function. Since you’re a good developer, you decide to reproduce the crash in a unit test.

Only that does not work. The function is not accessible to your test. You first need to refactor the code to actually make it testable. That’s a terrible situation to be in.

What to do instead

There are really only two choices. Either make it a public function of your unit immediately, or move it to another unit.

For functional units, it’s usually not a problem to just make them public. At least as long as the function does not access any global data.

For class units, there is a decision to make, but it is simple: will using it preserve all class invariants? If so, you can move it or make it a public function. But if not, you absolutely should move it to another unit. Often, this actually helps with deciding what to create a new class for!
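Continuing the made-up example from above, the first option is as simple as moving the declaration into the unit’s header, which makes the function discoverable, callable and, most importantly, testable from the outside:

// some_unit.h
#pragma once

// Clamps value into the range [low, high].
// Now part of the unit's interface instead of hiding in an unnamed namespace.
int clamp(int value, int low, int high);

// some_unit.cpp
#include "some_unit.h"

int clamp(int value, int low, int high)
{
  return value < low ? low : (value > high ? high : value);
}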

Note that private and protected functions suffer many of the same drawbacks as functions in unnamed-namespaces. Sometimes, either of these options is a valid shortcut. But if you can, please, avoid them.

Monitoring data integrity with health checks

An important aspect of systems that are backed by database storage is maintaining data integrity. Most relational databases offer the possibility to define constraints in order to maintain data integrity, usually referential integrity and entity integrity. Typical constraints are foreign key constraints, not-null constraints, unique constraints and primary key constraints.

SQL also provides the CHECK constraint, which allows you to specify a condition on each row in a table:

ALTER TABLE table_name ADD CONSTRAINT
   constraint_name CHECK ( predicate )

For example:

CHECK (AGE >= 18)

However, these check constraints are limited. They can’t be defined on views, they can’t refer to columns in other tables and they can’t include subqueries.

Health checks

In order to monitor data integrity on a higher level that is closer to the business rules of the domain, we have deployed a technique that we call health checks in some of our applications.

These health checks are database queries, which check that certain constraints are met in accordance with the business rules. The queries are usually designed to return an empty result set on success and to return the faulty data records otherwise.

The health checks are run periodically. For example, we use a Jenkins job to trigger the health checks of one of our web applications every couple of hours. In this case we don’t query the database directly; the application does, and it returns the success or failure states of the health checks in the response of an HTTP GET request.

This way we can detect problems in the stored data in a timely manner and take countermeasures. Of course, if the application is bug-free, these health checks should never fail, and in fact they rarely do. We mostly use the health checks as an addition to regression tests after a bug fix, to ensure and monitor that the unwanted state in the data will never happen again in the future.

Timestamps make horrible identifiers

Not long ago, I struggled with a system that uses timestamps as entity identifiers. What can I say? Timestamps aren’t meant to identify anything other than a specific point in time. Don’t use them as entity identifiers, ever. If you want to know why, I invite you to read on. The blog post is written in Freytag’s dramatic structure for added effect.

Exposition

We’ve designed a system that runs on multiple instances that communicate in all sorts of ways. A central archive instance stores all data related to measurements. The whole network revolves around the notion of a measurement. Measurement data is the most precious data, and all instances will either produce or consume data based on these measurements.

Most important for human operators is an instance that lets you view all existing measurement data. Let’s call it the viewer. The viewer displays an overview list of all measurements in a given context and lets the operator choose to view ever more details of any of them. To be able to provide the overview list as fast as possible, we added a cache that holds the information.

Rising action

This measurement list cache was the source of all kinds of peculiar behaviour of the system. Most, but not all, measurement data was incomplete. The list cache entries were assembled from different sources that were available at different times, so it seemed that while one part of the data got written to the cache, another part couldn’t be written for whatever reason. The operator could load detailed data for some few measurements, but the majority just produced an error message that the data couldn’t be found (despite it being present).
The most obviously broken functionality left the following trace in the log files (paraphrased):

- storing measurement at 2016-02-28T13:25:55.189+01:00 into the list cache
- measurement stored
[...]
- loading measurement at 2016-02-28T13:25:55.189+01:00 from the list cache
- error: measurement not found in list cache

So, the system is essentially telling me that it can’t load some data it just stored. As you can imagine, this may lead to some questions about the sanity of the database product underneath.

Climax

After some investigation and fruitless integration testing, it dawned on me: the problem wasn’t timing or the database. All the bugs could be explained by only one circumstance: measurements were ultimately identified by their timestamp, the moment the measurement was made. There’s also a location, a type and some other information in the identifier of each measurement, but only the timestamp changes between two measurements in the same narrow context. And the timestamp was stored in different precisions, depending on the origin of the measurement identifier. Most identifiers were created at the measurement-producing system instances (let’s call them measurers) and had millisecond precision. As soon as they got stored in the production database (but not our development database), they lost the milliseconds. And some of the most important measurement data got exported to third-party systems, using minute-based precision. So we had one measurement identifier in the system, but with three different types, each mostly incompatible with the others.

Falling action

That’s why the log excerpt above never occurred in development, but did in production: the measurement is stored in the database, the used identifier gets passed around in the software, but a query on the exact same identifier in the database yields no result because the timestamps now differ in the millisecond range. And the strange effect that sometimes everything worked just fine? That’s when the milliseconds were zero by chance. Given that most actions in the system are scheduled and performed automatically exactly on the zero mark, the zero-milliseconds case happened more often than it would in an even distribution.

Our system dealt with three types of measurement identifiers: Millisecond-precise identifiers produced by the measurers, second-precise identifiers used by the measurement list cache and minute-precise identifiers used (and sometimes fed back into the system) by the data export. These identifiers were incompatible even for the same measurement most of the time, but not always. In unit tests, the timestamps were made up and didn’t reveal the problem properly (who thinks about odd milliseconds when making up a timestamp?).

My solution was to pull this incompatibility up into the type system. Instead of one measurement identifier, there are now three: MillisecondPreciseIdentifier, SecondPreciseIdentifier and MinutePreciseIdentifier. An identifier of higher precision can be converted to an identifier of lower precision, but not the other way around. Every time a measurement identifier is created, it needs to explicitly state the precision of its timestamp. This made the compiler highlight the problematic usages clearly as type conflicts and therefore made dealing with the problem much easier.
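A sketch of the idea in C++ (the type names follow the description above, everything else is simplified and made up; the real identifiers also carry location, type and more):

#include <cstdint>

class MinutePreciseIdentifier {
public:
  explicit MinutePreciseIdentifier(std::int64_t minutesSinceEpoch)
    : minutes_(minutesSinceEpoch) {}
  std::int64_t minutes() const { return minutes_; }
private:
  std::int64_t minutes_;
};

class SecondPreciseIdentifier {
public:
  explicit SecondPreciseIdentifier(std::int64_t secondsSinceEpoch)
    : seconds_(secondsSinceEpoch) {}
  // Losing precision is allowed, but always explicit...
  MinutePreciseIdentifier toMinutePrecision() const
  { return MinutePreciseIdentifier(seconds_ / 60); }
private:
  std::int64_t seconds_;
};

class MillisecondPreciseIdentifier {
public:
  explicit MillisecondPreciseIdentifier(std::int64_t millisSinceEpoch)
    : millis_(millisSinceEpoch) {}
  SecondPreciseIdentifier toSecondPrecision() const
  { return SecondPreciseIdentifier(millis_ / 1000); }
  // ...while there is deliberately no way to fabricate a millisecond-precise
  // identifier from a less precise one.
private:
  std::int64_t millis_;
};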

Revelation

Choosing a timestamp as a vital part of a (measurement) identifier was a mistake from the beginning. The greater problem was the omission of the timestamp’s precision. Timestamps behave more like floating-point numbers and less like integers, even if every timestamp can be represented by a long. As soon as I made the precision of each timestamp clear to the compiler, the bugs revealed themselves. The annoying difference between the development and production databases would have been detected much sooner, because a millisecond-precise timestamp now warns in the log files if its millisecond part is zero. As soon as this log entry is seen very often, it’s clear that something is wrong. The new data types not only serve as a clearer API contract definition tool, but also as a runtime sanity check.

If you don’t want to repeat this mistake, keep in mind that each timestamp, date or whatever time-related data type you use will inherently have a maximum precision. As soon as you mix different precisions into the same data type, you’re going to have a bad time. Explicitly state the required precision in your type system and your compiler will keep an eye on it, too.