Every time you write a getter, a function dies

This blog post explores the difference between classic and Tell, don’t Ask-style code without any source code examples.

Don’t be too alarmed by the title. Functions are immortal concepts and there’s nothing wrong with a getter method. Except when you write code under the rules of the Object Calisthenics (rule number 9 directly forbids getter and setter methods). Or when you try to adhere to the ideal of encapsulation, a cornerstone of object-oriented programming. Or when your code would really benefit from some other design choices. So, most of the time, basically. Nobody dies if you write a getter method, but you should make a conscious decision for it, not just write it out of old habit.

One thing the Object Calisthenics can teach you is the immediate effect of different design choices. The rules are strict enough to place a lot of burden on your programming, so you’ll feel the pain of every trade-off. In most of your day-to-day programming, you also make the decisions, but don’t feel the consequences right away, so you get used to certain patterns (habits) that work well for the moment and might or might not work in the long run. You should have an alternative right at hand for every pattern you use. Otherwise, it’s not a pattern, it’s a trap.

Some alternatives

Here is an incomplete list of common alternatives to common patterns or structures that you might already be aware of:

  • if-else statement (explicit conditional): You can replace most explicit conditionals with implicit ones. In object-oriented programming, calling polymorphic methods is a common alternative. Instead of writing if and else, you call a method that is overridden in two different fashions. A polymorphic method call can be seen as an implicit switch-case over the object type.
  • else statement: In the Object Calisthenics, rule 2 directly forbids the usage of else. A common alternative is an early return in the then-block. This might require you to extract the if-statement to its own method, but that’s probably a good idea anyway.
  • for-loop: Loops are one of the basic building blocks of every higher-level programming language. These explicit iterations are so common that most programmers forget their implicit counterpart. Yeah, I’m talking about recursion here. You can replace every explicit loop by an implicit loop using recursion and vice versa. Your only limit is the size of your stack – if you are bound to one. Recursion is an early brain-teaser in every computer science curriculum, but not part of the average programmer’s toolbox. I’m not sure if that’s a bad thing, but it’s an alternative nonetheless.
  • setter method: The first and foremost alternative to a state-altering operation is an immutable object. You can’t alter the state of an immutable, so you have to create a series of edited copies. Syntactic sugar like fluent interfaces fits perfectly in this scenario. You can probably imagine that you’ll need to change the whole code dealing with the immutables, but you’ll be surprised how simple things can be once you let go of mutable state, bad conscience about “wasteful” heap usage and any premature thought about “performance”.
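
If you’d like to see these four alternatives written down, here is a minimal sketch. It is in Python purely for brevity, and every name in it (the customer classes, shipping_cost, total, Point) is made up for illustration. Treat it as one possible shape of each alternative, not as the canonical one.

```python
from dataclasses import dataclass, replace


# 1) Polymorphism instead of if-else: the object type selects the "branch".
class RegularCustomer:
    def discount(self, amount):
        return 0


class PremiumCustomer:
    def discount(self, amount):
        return amount // 10


# 2) Early return instead of else.
def shipping_cost(weight_kg):
    if weight_kg <= 1:
        return 3
    return 3 + 2 * (weight_kg - 1)


# 3) Recursion instead of a for-loop.
def total(values):
    if not values:
        return 0
    return values[0] + total(values[1:])


# 4) Immutable object with edited copies instead of setters.
@dataclass(frozen=True)
class Point:
    x: int
    y: int

    def with_x(self, x):
        return replace(self, x=x)  # returns a new Point, self is untouched


origin = Point(0, 0)
moved = origin.with_x(5)  # origin is still Point(0, 0)
```

The recursive total shows the stack limit quite literally: CPython’s default recursion limit is 1000 frames, so a list much longer than that would need the explicit loop again.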

Keep in mind that most alternatives aren’t really “better”, they are just different. There is no silver bullet; every approach has its own advantages and drawbacks, both short-term and in the long run. Your job as a competent programmer is to choose the right approach for each situation. You should make a deliberate choice and probably document your rationale somewhere (a project-related blog, wiki or issue tracker comes to mind). To be able to make that choice, you need to know about the pros and cons of as many alternatives as you can handle. The two lamest rationales are “I’ve always done it this way” and “I don’t know any other way”.

An alternative for get

In this blog post, you’ll learn one possible alternative to getter methods. It might not be the best or even fitting for your specific task, but it’s worth evaluating. The underlying principle is called “Tell, don’t Ask”. You convert the getter (aka asking the object about some value) to a method that applies a function to the value (aka telling the object to work with the value). But what does “applying” mean and what’s a function?

A function is defined as a conversion of some input into some output, preferably without any side-effects. We might also call it a mapping, because we map every possible input to a certain output. In programming, every method that takes a parameter (or several of them) and returns something (isn’t void) can be viewed as a function as long as the internal state of the method’s object isn’t modified. So you’ve probably programmed a lot of functions already, most of the time without realizing it.

In Java 8 or other modern object-oriented programming languages, functions are an important part of the toolbox. But you have been able to work with functions in Java since the earliest days, just not as conveniently. Let’s talk about an example. I won’t use any code you can look at, so you’ll have to use your imagination for this. So you have a collection of student objects (imagine a group of students standing around). We want to print a list of all these students onto the console. Each student object can say its name and matriculation number if asked by plain old getters. Damn! Somebody already made the design choice for us that these are our duties:

  • Iterate over all student objects in our collection. (If you don’t want to use a loop for this you know an alternative!)
  • Ask each student object about its name and matriculation number.
  • Carry the data over to the console object and tell the console to print both pieces of information.

But because this is only in our imagination, we can go back in imagined time and eliminate the imagined choice for getters. We want to write our student objects without getters, so let’s get rid of them! Instead, each student object knows about its name and matriculation number, but cannot be asked directly. But you can tell the student object to supply this information to the only (or a specific) method of an object that you give to it. Read the previous sentence again (if you’ve not already done it). That’s the whole trick. Our “function” is an object with only one method that happens to have exactly the parameters that can be provided by the student object. This method might return a formatted string that we can take to the console object or it might use the console itself (this would result in no return value and a side effect, but why not?). We create this function object and tell each student object to use it. We don’t ask the student object for data, we tell it to do work (Tell, don’t Ask).

In this example, the result is the same. But our first approach centers the action around our “main” algorithm by gathering all the data and then acting on it. We don’t feel pain using this approach, but we were forced to use it by the absence of a function-accepting method and the presence of getters on the student objects. Our second approach prepares the action by creating the function object and then delegates the work to the objects holding the data. We were able to use it because of the presence of a function-accepting method on the student objects. The absence of getters in the second approach is a by-product, they simply aren’t necessary anymore. Why write getters that nobody uses?
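
The post deliberately asks you to imagine the code. If you would rather check your imagination against something concrete, here is one way the two approaches could look. It is a sketch in Python rather than Java, all class and function names are invented for this illustration, and a plain function plays the role of the “object with only one method”.

```python
class AskedStudent:
    """Classic style: exposes its data through getters."""

    def __init__(self, name, matriculation_number):
        self._name = name
        self._matriculation_number = matriculation_number

    def get_name(self):
        return self._name

    def get_matriculation_number(self):
        return self._matriculation_number


class ToldStudent:
    """Tell, don't Ask style: no getters, only a function-accepting method."""

    def __init__(self, name, matriculation_number):
        self._name = name
        self._matriculation_number = matriculation_number

    def apply(self, function):
        # Supply our data to the given function and pass its result on.
        return function(self._name, self._matriculation_number)


def format_line(name, matriculation_number):
    # The "function": its parameters match what a student can provide.
    return f"{name} ({matriculation_number})"


def print_by_asking(students, console=print):
    # The main algorithm gathers the data, then acts on it.
    for student in students:
        console(format_line(student.get_name(),
                            student.get_matriculation_number()))


def print_by_telling(students, console=print):
    # The main algorithm hands the function over; the data stays put.
    for student in students:
        console(student.apply(format_line))
```

The console parameter only exists so the sketch can be tried with something other than the real console, for example a list’s append method.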

We can observe the following characteristics: In a “traditional”, imperative style with getters, the data flows (gets asked) and the functionality stays in place. In a Tell, don’t Ask style with functions, the data tends to stay in place while the functionality gets passed around (“flows”).

Weighing the options

This is just one other alternative to the common “imperative getter” style. As stated, it isn’t “better”, just maybe better suited for a particular situation. In my opinion, the “functional operation” style is not straightforward and doesn’t pay off immediately, but can be very rewarding in the long run. It opens the door to a whole paradigm of writing source code that can reveal inherent or underlying concepts in your solution domain a lot more clearly than the imperative style. By eliminating the getter methods, you force this paradigm on your readers and fellow developers. But maybe you don’t really need to get rid of the getters, just reduce their usage to the hard cases.

So the title of this blog post is a bit misleading. Every time you write a getter, you’ve probably considered all alternatives and made the informed decision that a getter method is the best way forward. Every time you want to change that decision afterwards, you can add the function-accepting method right alongside the getters. No need to be pure or exclusive, just use the best of two worlds. Just don’t let the functions die (or never be born) because you “didn’t know about them” or found the style “unfamiliar”. Those are mere temporary problems. And one of them is solved right now. Happy coding!

Explicit types – and when to use them

Many modern programming languages offer a way to declare variables without an explicit type if the type can be inferred, either dynamically or statically. Many also allow for variables to be explicitly defined with a type. For example, Scala and C# let you omit the explicit variable type via the var keyword, but both also allow defining variables with explicit types. I’m coming from the C++ world, where “auto” has been available for this purpose since the relatively recent C++11. However, people are still debating whether you should actually use it.

Pros

Herb Sutter popularised the almost-always-auto style. He advocates that using more type inference is good because it is roughly equivalent to programming against interfaces instead of implementations. He says that “Overcommitting to explicit types makes code less generic and more interdependent, and therefore more brittle and limited.” However, he also mentions that you might sometimes want to use explicit types.

Now what exactly is overcommitting here? When is the right time to use explicit types?

Cons

Opponents of implicit typing, many of them experienced veterans, often state that they want the actual type visible in the source code. They don’t want to rely on type inference being right. They want the code to explicitly state what’s going on.

At first, I figured that was just conservatism in the face of a new “scary” feature that they did not fully understand. After all, IDEs can usually infer the type on-the-fly and you can hover on a variable to let it show you the type.

For C++, the function signature is a natural boundary where you often insert explicit types, unless you want to commit to the compile time and physical dependency cost that comes with templates. Other languages, such as Groovy, do not have this trade-off and let you skip explicit types almost everywhere. After working with Groovy/Grails for a while, where the dominant style seems to be to omit types wherever possible, it dawned on me that the opponents of implicit typing have a point. Not only does the IDE often fail to show me the inferred type (even though it still works way more often than I would have anticipated), but I also found it harder to follow and modify code that did not mention explicit types. Seemingly contrary to Herb Sutter’s argument, that code felt more brittle than I would have liked.

Middle-ground

As usual, the truth seems to be somewhere in the middle. I propose the following rule for when to use explicit types:

  • Explicit typing for domain-types
  • Implicit typing everywhere else

Code using types from the problem domain should be as specific as possible. There’s no need for it to be generic – it’s actually counter-productive, as otherwise the code model would be inconsistent with the model of the problem domain. This is also the most important aspect to grok when reading code, so it should be explicit. The type is as important as the action on it.

On the other hand, for pure-fabrication types that do not represent a concept in the domain, the action is important, while the type is merely a means to achieve this action. Typically, most of the elements from a language’s standard library fall into this category: all your containers, iterators and callables. Their types are merely implementation details: an associative container could be an array, or a hash-map or a tree structure. Exchanging it rarely changes the meaning of the code in the problem domain – it just changes its performance characteristics.

Containers will occasionally contain domain-types in their type. What do you do about those? I think they belong in the “everywhere else” category, but you should take extra care to name the contained type when working with it – for example when declaring the variable of the for-each loop on it, or when inserting something into it. This way, the “collection of domain-type” aspect will become clear, but the specific container implementation will stay implicit – like it should.
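
As an illustration of the rule, here is a small sketch. It uses Python, where type annotations are optional hints checked by external tools rather than declarations as in C++ or Scala, so take it as an analogy only. Student and seniors_by_name are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:  # hypothetical domain type
    name: str
    semester: int


def seniors_by_name(students):
    # Pure-fabrication type, left implicit: whether by_name is a dict, a
    # sorted list or a tree changes performance, not the meaning.
    by_name = {}

    # The contained domain type is named where we work with the elements.
    student: Student
    for student in students:
        if student.semester >= 5:
            by_name[student.name] = student
    return by_name


# Domain type, spelled out: the reader should see what this value is.
oldest: Student = max(
    seniors_by_name([Student("Ada", 7), Student("Bob", 2)]).values(),
    key=lambda s: s.semester,
)
```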

What do you think? Is this a useful proposition for your code?

Exiling a legacy COM component

One of our long-standing Java applications has several dependencies on native libraries, which are called via the Java Native Interface (JNI). We usually avoid native library dependencies, but this application must interface with some hardware devices for which the vendors only provide access through native libraries.

From 32 bit to 64 bit

Until recently the application ran in a 32-bit Java VM and the native libraries were 32-bit DLLs as well. Then the time had come to update the application to the 64-bit world. We wanted the application to run in a 64-bit JVM, and 32-bit library code cannot run in a 64-bit process.

We were able to replace the 32-bit libraries with 64-bit libraries, all except one. This particular dependency is not just a native library, but a Windows COM component. We had developed a wrapper DLL, which connected the COM component via JNI to the Java application. While we do have the source code of the wrapper DLL, we don’t have the source code of the COM component, and the vendor does not provide a 64-bit version of the component.

[Image: 32 bit]

So we decided to exile this COM component to a separate process, which we refer to as a container process. The main application runs in a 64-bit JVM and communicates with this process via a simple HTTP API. The API calls from the application do not require a very low latency.

Initially we had planned to implement this container process as a C++ program. However, after a spike it turned out that there is a quicker way to both interface with a COM component and provide a simple self-hosted HTTP service on the Windows platform: The .NET framework provides excellent support for COM interoperability. Using a COM component in .NET is as simple as adding the component as a reference to the project, importing the namespace and instantiating the COM object like a regular .NET object. The event handling of the COM component, which requires quite some boilerplate code to set up in C++, gets automatically mapped to the C#/.NET event handling. The only reminder of the fact that you’re using a COM component is the amount of out and ref parameters.

For the HTTP API we chose the Nancy framework, with which we have had good experiences in previous projects. The architecture now looks like this:

[Diagram: 64-bit]

While it is a drawback that the container process now depends on the .NET runtime, it is outweighed by the benefit for us: The C#/.NET code for interfacing with the COM component is more maintainable than the previous JNI wrapper code was or a native implementation of the container process would have been.

Updating from Grails 2.3 to something newer

We have been developing, running and maintaining a moderately sized Grails web application with more than 120 domain classes since 2008, when Grails was at version 1.0.3. The web application is still in production running on Grails 2.3.8. Just recently we wanted Java 8 support and the usual bugfixes and improvements you get by updating the framework. Since time and budget are very limited (as always…) we decided not to move to 3.x but only to the latest 2.x version. It seemed a safer and easier option and opened up the way to 3.x where many things changed completely.

Trying to go to 2.5.4

The upgrade procedure is generally well documented in Grails. That allowed us to upgrade from 1.0 to 1.3, from 1.3 to 2.2 and finally from 2.2 to 2.3. We skipped 2.0 because of too many problems we faced during the upgrade. As usual the major changes and tasks are mentioned in the upgrade guide. It started smoothly but we finally had to abort the upgrade process because we were bitten by https://github.com/grails/grails-data-mapping/issues/581 . We did not have the time to dig fully into it and resolve the issue.

Trying to go to 2.4.5

Many of the changes and improvements and most notably a Groovy version supporting the Java 8 runtime are already available in Grails 2.4.5. So we gave it a shot hoping for fewer problems than with 2.5.4. Actually we got our application running in less than an hour but quite a few of our unit, integration and functional tests failed. After finding some advice in http://stackoverflow.com/questions/16532631/grails-unit-test-mock-domain-with-assigned-id we changed our unit tests to use the @Mock() mixin instead of mockDomain() which works in 2.3 and is broken in 2.4.

When trying to fix our integration tests we saw that some of our HQL queries failed. Something was wrong navigating/querying multiple association levels so we finally gave up on this one, too.

Conclusion

Even though we managed to keep our Grails application alive for many years and several framework versions, each upgrade carries a significant risk of breakage and requires quite some effort. This time we are stuck again and will have to invest more time to bring the application up-to-date again.

I would advise anyone already using or deciding for Grails as the web framework of choice to start with the latest and greatest release and to budget several person days for upgrades of medium-sized projects. The devil is in the details…

Recap of the Schneide Dev Brunch 2016-04-10

If you couldn’t attend the Schneide Dev Brunch on the 10th of April 2016, here is a summary of the main topics.

Last Sunday, we held another Schneide Dev Brunch, a regular brunch on the second Sunday of every other (even) month, only that all attendees want to talk about software development and various other topics. In case you missed the recap article about the February brunch: It didn’t happen. We all took a break, but are on track again. So if you bring a software-related topic along with your food, everyone has something to share. We were quite a lot of developers this time, so we had enough stuff to talk about. As usual, a lot of topics and chatter were exchanged. This recapitulation tries to highlight the main topics of the brunch, but cannot reiterate everything that was spoken. If you were there, you will probably find this list incomplete:

Why software development conferences?

We began with a curious question: Why are there even conferences about software development? You can read most of the content for free on the internet and even watch the talks afterwards. So why attend one for a lot of money? We discussed the topic a bit and came up with an analysis:
There are (at least) four different interested groups in a conference:

  • The organizer or commercial host is mostly interested in a positive revenue. As long as there’s a possibility for some net gain, somebody will host a conference. The actual topic is a secondary matter for them (this might explain some of the weirder conferences out there, like the boring conference).
  • The developers that really attend a conference are a small subset of all developers. They all have their own personal motives to pay money, invest time and accept inconveniences to be there in person. Some might rely on the quality filter of a conference board, some are looking forward to meeting their peers in an annual ritual. There might be those that can learn best if somebody talk-feeds them the topic. Whatever the reason, a lot of developers enjoy participating in conferences. If it happens to be paid by the employer and booked as worktime, who would not?
  • Then there are the speakers. They have the additional burden to convince a committee of their topic, prepare a talk of high quality and be able to perform on stage (something that is harder than it looks). The speakers seek reputation and credible proof of expertise. Their resumes will probably profit, too.
  • And at last, the companies that sponsor the conference, maintain a booth with big roll-ups and smiling employees and give their developers a chance to attend are in the game to represent, to recruit and build their brand. A lot of traditional marketing effort goes into trade fairs, so why not treat the developer market like any other and be present in the developer fairs?

We can conclude that software development conferences can provide value for every associated stakeholder. As long as this sentence holds true, conferences will be held.
The question didn’t come out of the blue: one of our attendees got accepted as a speaker at the Karlsruher Entwicklertag 2016 and wanted to learn about the different expectations he needs to address. He will give his talk at the next Dev Brunch to practice the flow and to pass the hardest critics. The topic: git internals. We are thrilled!

Stratagems and strategies

The next topic contained another talk, not at a conference, but in the context of a “general topics” series at a local university (the Duale Hochschule in Karlsruhe). The talk introduces the concept of the 36 stratagems and of modern strategies to the audience. We talked a bit about the concept itself and found that the list of logical fallacies is somewhat similar. We even found an application of the stratagems in local history (sorry, only German source found): The Bretten’s Hundle
The talk itself is this Monday, so you’ll need to hurry if you want to attend.

Psychology of deception

As often during the dev brunch, one topic led to the other, and we soon talked about morals and ethics. The concept of micro-expressions to reveal the hidden agenda of others came up, as well as the TV series “Lie to Me” that is inspired by the work of Paul Ekman, a professor of psychology. There even is a commercial training program to improve your skill of “spotting the liar”.

Games with moral aspects

Well, we are nerds. While crime investigation is thrilling, there is the even more enthralling topic of games with psychological and moral aspects. We soon exchanged our experiences with games like “Haze” or “Spec Ops: The Line”. But it doesn’t stop at shooter games, you can have similar insights by playing “Papers, Please” (a strong favorite for one of our next Schneide game nights) or “This War Of Mine”. You can even try some multiplayer games specifically designed for social insights, like “The Ship: Murder Party”.
And if you haven’t got much time but still want to learn something about yourself, little games like “60 Seconds!” are a great start.
This topic led to some ideas for upcoming Schneide game nights in 2016.

Book review: A Tour of C++

One attendee of the brunch provided a summary of the book “A Tour of C++” by Bjarne Stroustrup, which recently got updated to the language possibilities of C++11. In his words, the book is a rather incomplete introduction to the language, with way too many aspects described in a way too short manner. It’s more of a reading list to really grasp the concepts, so it may serve as a source of inspiration. For example, the notion of “move semantics” was explained, but to discover the consequences is up to the developer. The part about template programming was well done and every chapter has a suitable list of advice in the tradition of “Effective XYZ” at the end. So it’s not a bad book, but too short to be satisfying. It’s like a tourist’s tour around C++11, so the title keeps its promise.

The left-pad incident

When we finished the “official” agenda, the topic of the recent left-pad incident came up and left us laughing. We really live in glorious times when the happiness of the (JavaScript) world depends on a few lines of code. Not that this couldn’t happen in any other ecosystem.

Epilogue

As usual, the Dev Brunch contained a lot more chatter and talk than listed here. The number of attendees makes for a unique experience every time. We are looking forward to the next Dev Brunch at the Softwareschneiderei. And as always, we are open for guests and future regulars. Just drop us a note and we’ll invite you over next time.

Modern CMake with target_link_libraries

Dependency hell?

One thing that has eluded me in the past was how to efficiently manage dependencies of different components within one CMake project. I’d use the include_directories, add_definitions and add_compile_options commands in the top-level or in mid-level CMakeLists.txt files just to get the whole thing to compile. Of course, this is all heavily order-dependent – so the build system breaks as soon as you make an ever so subtle change to the directory layout. I’ve seen projects tackle this problem in various ways – for example by defining specifically named variables for each library and using those for their clients. Other projects defined “interface” files for each library that could be included by other targets. All these homegrown solutions work, but they are rather clumsy and don’t work well when integrating libraries not written in that same convention.

target_link_libraries to the rescue!

It turns out there’s actually a pretty elegant solution built into CMake, which centers around target_link_libraries. But information on this is pretty scarce on the web. The ones that initially put me on the right track were The Ultimate Guide to Modern CMake and CMake – Introduction and best practices. Of course, it’s all in the CMake documentation, but mentioned implicitly at best.

The gist is this: Using target_link_libraries to link A to an internal target B will not only add the linker flags required to link to B, but also the definitions, include paths and other settings – even transitively – if they are configured that way.

To do this, you need to use target_include_directories and target_compile_definitions with the PUBLIC or INTERFACE keywords on your targets. There’s also the PRIVATE keyword that can be used to avoid adding the settings to all dependent targets.

A simple example

Here’s a small example of a library that uses Boost in its headers and therefore wishes to have its clients set up those directories as well:

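set(TARGET_NAME cool_lib)

add_library(${TARGET_NAME} STATIC
  cool_feature.cpp cool_feature.hpp)

# Clients get this directory on their include path, so they can
# simply #include "cool_feature.hpp".
target_include_directories(${TARGET_NAME}
  INTERFACE ${CMAKE_CURRENT_SOURCE_DIR})

# Boost is used in the headers, so the library and its clients need it.
# Assumes Boost_INCLUDE_DIR was set earlier, e.g. by find_package(Boost).
target_include_directories(${TARGET_NAME} SYSTEM
  PUBLIC ${Boost_INCLUDE_DIR})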

Now here’s a program that wants to use that:

set(TARGET_NAME cool_tool)

add_executable(${TARGET_NAME} main.cpp)

target_link_libraries(${TARGET_NAME}
  PRIVATE cool_lib)

cool_tool can just #include "cool_feature.hpp" without knowing exactly where it is located in the source tree and without having to worry about setting up the Boost includes for itself! Pretty neat!

PRIVATE, PUBLIC and INTERFACE

Typically, you’d use the PRIVATE keyword for includes and definitions that are exclusively used in your implementation, i.e. your *.cpp and *.c files and internal headers. It’s good practice to favor PRIVATE to avoid “leaking” dependencies so they won’t stack up in the dependent libraries and bring down your compile times. The INTERFACE keyword is a bit more curious: For example, with definitions, you can use it to define your .dll interface differently for compilation and usage. For include directories, one common usage is to set your own source directory with INTERFACE if you keep your headers and source files in the same folder. The PUBLIC keyword is used when definitions and includes are relevant for both your own and dependent libraries. It pretty much is the combination of PRIVATE and INTERFACE – whenever you’re tempted to put something in both of those, put it in PUBLIC instead. It is probably the most common option.

The future!

I hope that all open-source libraries switch to this style sooner rather than later so you can easily include them in your build-trees. Just don’t use the old commands that add properties for all following targets like add_definitions, include_directories etc. and use the commands with the target_ prefix!

All .NET assemblies for one and one for all

Sometimes you have developed a simple utility tool that doesn’t need the directory structure of a full-blown application for resources and other configuration. However, this tool might have a couple of library dependencies. On the .NET platform this usually means that you have to distribute the .dll files for the libraries along with the executable (.exe) file of the tool.

Wouldn’t it be nice to distribute your tool only as a single .exe file, so that users don’t have to drag around a lot of files when they move the tool from one location to another?

In the C++ world you would use static linking to link library dependencies into the resulting executable. For the .NET platform Microsoft provides a command-line tool called ILMerge. It can merge multiple .NET assemblies into a single assembly:

[Diagram: ILMerge]

You can either download ILMerge from Microsoft as an .msi package or install it as a NuGet package from the package manager console (accessible in Visual Studio under Tools: Library Package Manager):

PM> Install-Package ilmerge

The basic command line syntax of ILMerge is:

> ilmerge /out:filename <primary assembly> [...]

The primary assembly would be the original executable of your tool. It must be listed first, followed by the library assemblies (.dll files) to merge. Here’s an example, which represents the scenario from the diagram above:

> ilmerge /out:StandaloneApplication.exe Application.exe A.dll B.dll C.dll

Keep in mind that the resulting executable still depends on the .NET framework being present on the system; it’s not completely independent.

Graphical user interface

There’s also a graphical user interface for ILMerge available. It’s an open-source tool by a third-party developer and it’s called ILMerge-GUI, published on Microsoft’s CodePlex project hosting platform.

[Image: ILMerge-GUI]

You simply drag and drop the assemblies to merge on the designated area, choose a name for the output assembly and click the “Merge!” button.

Timestamps make horrible identifiers

If you think about using a timestamp or date as an identifier for some kind of entity, object or data record – think again. They are horribly ill-equipped to be identifiers due to their dynamic resolution. Here’s the story of how we came to this conclusion.

Not long ago, I struggled with a system that uses timestamps as entity identifiers. What can I say? Timestamps aren’t meant to identify anything else than a specific point in time. Don’t use them as entity identifiers, ever. If you want to know why, I invite you to read on. The blog post is written in Freytag’s dramatic structure for added effect.

Exposition

We’ve designed a system that runs on multiple instances that communicate in all sorts of ways. A central archive instance stores all data related to measurements. The whole network revolves around the notion of measurement. Measurement data is the most precious data and all instances will either produce or consume data based on these measurements.

Most important for human operators is an instance that lets you view all existing measurement data. Let’s call it the viewer. The viewer displays an overview list of all measurements in a given context and lets the operator choose to view ever more details of any of them. To be able to provide the overview list as fast as possible, we added a cache that holds the information.

Rising action

This measurement list cache was the source of all kinds of peculiar behaviour of the system. Most, but not all, measurement data was incomplete. The list cache entries were assembled from different sources that were available at different times, so it seemed that while one part of the data got written to the cache, another part couldn’t be written for whatever reason. The operator could load detailed data of a few measurements, but the majority just produced an error message that the data couldn’t be found (despite it being present).
The most obviously broken functionality left the following trace in the log files (paraphrased):

- storing measurement at 2016-02-28T13:25:55.189+01:00 into the list cache
- measurement stored
[...]
- loading measurement at 2016-02-28T13:25:55.189+01:00 from the list cache
- error: measurement not found in list cache

So, the system is essentially telling me that it can’t load some data it just stored. As you can imagine, this may lead to some questions about the sanity of the database product underneath.

Climax

After some investigation and fruitless integration testing, it dawned on me: The problem wasn’t timing or the database. All the bugs could be explained by a single circumstance: Measurements were ultimately identified by their timestamp, the moment the measurement was made. There’s also a location, type and some other information in the identifier for each measurement, but only the timestamp changes between two measurements in the same narrow context. And the timestamp was stored in different precisions, depending on the origin of the measurement identifier. Most identifiers were created at the measurement-producing system instances (let’s call them measurers) and had millisecond precision. As soon as they got stored in the production database (but not our development database), they lost the milliseconds. And some of the most important measurement data got exported to third-party systems, using a minute-based precision. So we had one measurement identifier in the system, but with three different types, mostly incompatible with each other.

Falling action

That’s why the log excerpt above never occurred in development, but only in production: The measurement is stored in the database, the used identifier gets passed around in the software, but a query on the exact same identifier in the database yields no result because the timestamps now differ in the millisecond range. And the strange effect that sometimes everything worked just fine? That’s when the milliseconds are zero by chance. Given that most actions in the system are scheduled and performed automatically exactly on the zero mark, the zero milliseconds case happened more often than it would in an even distribution.
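The effect can be reproduced in a few lines. The following is only a sketch under assumptions: TruncatingStore is a hypothetical stand-in for the production database, not the real system, and it does nothing but drop the milliseconds on write while looking up the exact key on read.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

using Millis = std::chrono::milliseconds;

// Hypothetical stand-in for the production database: it silently drops
// the millisecond part of every timestamp it stores.
class TruncatingStore {
public:
  void store(Millis timestamp, const std::string& payload) {
    // duration_cast truncates toward zero, so this sketch only
    // handles timestamps at or after the epoch.
    assert(timestamp.count() >= 0);
    rows_[truncate_to_seconds(timestamp)] = payload;
  }

  // The lookup uses the identifier exactly as the caller passes it.
  bool contains(Millis timestamp) const {
    return rows_.count(timestamp) > 0;
  }

private:
  static Millis truncate_to_seconds(Millis timestamp) {
    return std::chrono::duration_cast<std::chrono::seconds>(timestamp);
  }

  std::map<Millis, std::string> rows_;
};
```

Storing a measurement at Millis(1456662355189) and then asking contains() for the very same value yields false, while a timestamp that happens to end in 000 milliseconds is found again.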

Our system dealt with three types of measurement identifiers: Millisecond-precise identifiers produced by the measurers, second-precise identifiers used by the measurement list cache and minute-precise identifiers used (and sometimes fed back into the system) by the data export. These identifiers were incompatible even for the same measurement most of the time, but not always. In unit tests, the timestamps were made up and didn’t reveal the problem properly (who thinks about odd milliseconds when making up a timestamp?).

My solution was to pull this incompatibility up into the type system. Instead of one measurement identifier, there are now three measurement identifiers: MillisecondPreciseIdentifier, SecondPreciseIdentifier and MinutePreciseIdentifier. An identifier of higher precision can be converted to an identifier of lower precision, but not the other way around. Every time a measurement identifier is created, it needs to explicitly state the precision of its timestamp. This made the compiler highlight the problematic usages clearly as type conflicts and therefore made dealing with the problem much easier.
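To make the idea concrete, here is a minimal sketch in C++11. It is not the original implementation: the class template PreciseIdentifier, the method truncated_to and the accessor since_epoch are made-up names, and only the timestamp part of the identifier is modelled (location, type and the rest are left out).

```cpp
#include <cassert>
#include <chrono>
#include <ratio>

// Hypothetical sketch: one identifier type per timestamp precision.
template <typename Precision>
class PreciseIdentifier {
public:
  explicit PreciseIdentifier(Precision since_epoch)
    : since_epoch_(since_epoch) {
    // duration_cast truncates toward zero, which only equals rounding
    // down for timestamps at or after the epoch.
    assert(since_epoch.count() >= 0);
  }

  // Explicit, lossy conversion to a strictly coarser precision.
  // The opposite direction does not compile.
  template <typename Coarser>
  PreciseIdentifier<Coarser> truncated_to() const {
    static_assert(
      std::ratio_less<typename Precision::period,
                      typename Coarser::period>::value,
      "identifiers can only be converted to a lower precision");
    return PreciseIdentifier<Coarser>(
      std::chrono::duration_cast<Coarser>(since_epoch_));
  }

  Precision since_epoch() const { return since_epoch_; }

  // Only identifiers of the same precision can be compared.
  bool operator==(const PreciseIdentifier& other) const {
    return since_epoch_ == other.since_epoch_;
  }

private:
  Precision since_epoch_;
};

using MillisecondPreciseIdentifier =
  PreciseIdentifier<std::chrono::milliseconds>;
using SecondPreciseIdentifier =
  PreciseIdentifier<std::chrono::seconds>;
using MinutePreciseIdentifier =
  PreciseIdentifier<std::chrono::minutes>;
```

With these types, comparing a MillisecondPreciseIdentifier to a SecondPreciseIdentifier is a compile error until one side is converted with truncated_to<std::chrono::seconds>(), which is exactly the kind of type conflict described above.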

Revelation

Choosing a timestamp as a vital part of a (measurement) identifier was a mistake from the beginning. The greater problem was the omission of the timestamp’s precision. Timestamps behave more like floating-point numbers and less like integers, even if every timestamp can be represented by a long. As soon as I made the precision of each timestamp clear to the compiler, the bugs revealed themselves. The annoying difference between development and production database would have been detected much sooner with these types in place, because a millisecond-precise timestamp now warns in the log files if its millisecond part is zero. As soon as this log entry shows up very often, it’s clear that something is wrong. The new datatypes not only serve as a clearer API contract definition tool, but also as a runtime sanity check.
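Such a runtime sanity check could look roughly like the following sketch. The class ZeroMillisecondMonitor and its percentage threshold are assumptions made for illustration; the original system simply writes a warning to its log files and leaves the judgement to the reader of the log.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical monitor: counts how many supposedly millisecond-precise
// timestamps arrive with a millisecond part of exactly zero.
class ZeroMillisecondMonitor {
public:
  void observe(std::chrono::milliseconds timestamp) {
    ++total_;
    if (timestamp.count() % 1000 == 0) {
      ++zero_;
    }
  }

  // True when at least threshold_percent of all observed timestamps
  // had a zero millisecond part, which hints at truncation upstream.
  bool looks_truncated(std::size_t threshold_percent) const {
    assert(threshold_percent <= 100);
    return total_ > 0 && zero_ * 100 >= total_ * threshold_percent;
  }

private:
  std::size_t total_ = 0;
  std::size_t zero_ = 0;
};
```

A single zero is not suspicious, since scheduled actions hit the zero mark legitimately. Only a high share of zeros suggests that some component dropped the milliseconds.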

If you don’t want to repeat this mistake, keep in mind that each timestamp, date or whatever time-related data type you use will inherently have a maximum precision. As soon as you mix different precisions into the same data type, you’re going to have a bad time. Explicitly state the required precision in your type system and your compiler will keep an eye on it, too.

Simple C++11 – Part III – Best friends

Now that we have gotten the whole rigid setup of how to create a compile unit and a class out of the way, we can finally start to write some code. What separates simple modern C++ code from the old ways is the degree of abstraction you can use to write your code. Previously, you had to think in memory and instructions. Now, powerful abstractions and language mechanisms help you to think in values and operations, and still get down to the bare metal of the machine when you need to. Here’s my personal set of “best friend” language and library features that help me be as expressive as possible in the lower-level application code and still leverage the raw power of C++.

std::vector<T>

For all its simplicity, it is still powerful enough to handle the greater part of all memory management issues. Better yet, it maps excellently to modern hardware and even when used naively, it is often extremely efficient. And in the rare cases when it is not, the performance can usually be improved easily by using std::vector::reserve.
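As a small illustration of the reserve hint, here is a sketch with a made-up function first_squares. When the final size is known up front, a single reserve call avoids the repeated reallocations that push_back would otherwise trigger while the vector grows.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical example function: returns the squares of 0 .. count-1.
std::vector<int> first_squares(std::size_t count) {
  std::vector<int> squares;
  squares.reserve(count);  // allocate once, before filling
  for (std::size_t i = 0; i < count; ++i) {
    squares.push_back(static_cast<int>(i * i));
  }
  assert(squares.size() == count);
  return squares;
}
```

Note that reserve only changes the capacity, not the size, so the vector is still filled with push_back afterwards.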

With C++11, you can now even toss it around, nest it and return huge vectors from functions without any performance problems. Also, initializer_lists make it easy to fill them with data.

std::vector<int> my_special_numbers() {
  return {4, 8, 15, 16, 23, 42};
}

Such code is no longer a subtle performance problem, but actually encouraged.

There’s no doubt that whenever you need a container, std::vector should be your first candidate.

for-each

Printing a range like that is now easy. No need to even know about the existence of iterators or use counters:

for (auto&& number : my_special_numbers()) {
  std::cout << number << std::endl;
}

std::unordered_map<K,V>

For the rare cases when a flat vector will just not suffice, this neat hash-map will make your life easier. C++11’s initializer syntax makes it a lot cleaner to fill these than before:

std::unordered_map<std::string, int>
my_icecream_ratings() {
  return {
    {"vanilla", 3},
    {"chocolate", 9},
    {"strawberry", 8},
    {"raspberry", 7},
    {"lemon", 3}
  };
}

auto

And now working with them becomes nice and easy too:

auto ratings = my_icecream_ratings();
ratings.insert({"caramel", 2});
std::cout << "Chocolate was a "
  << ratings["chocolate"];

You can even change the result type to a std::map or a similar container and the code will still work.

std::shared_ptr<T>

In a perfect or, should I say, functional world, shared ownership would not be a thing. Pointers or even references would not exist. Shared ownership just makes things a lot more complex than clear ownership. But it so happens that when requirements change, this or that object is no longer exclusively owned by that other object. Or the lifetime of an object cannot easily be scoped in the presence of multithreading. When this happens, std::shared_ptr will make your task bearable. This is as close as you usually get to completely automatic lifetime management in C++.

void save_image_in_background(
  std::shared_ptr<image const> raw_image) {
  // The lambda captures the shared_ptr by value, so the image
  // stays alive until the detached thread is done with it.
  auto thread = std::thread([raw_image]{
    raw_image->save("raw.png");
  });

  thread.detach();
}

I like to think of pointers as a necessary evil. Sometimes, the alternative just makes things even more confusing, and when that happens, you at least don’t want manual resource management in the way.

Of course, std::unique_ptr seems to be a powerful competitor for shared_ptr’s tasks, but in my experience, you very rarely need a single-ownership pointer in application code. Why not use a moveable type instead? unique_ptr can be useful as a helper to implement library primitives, but you should rarely encounter one in application-level code.

Less is more

Note how many fancy C++11 features did not make my list. For example, lambdas are very useful – and I even used one in my shared_ptr example. But they should be used in moderation. They allow you to define code out of place, to be executed at some later time. This makes it harder to reason about them.
Likewise, things like variadic templates are great for library code, but rarely help at the application level.

This ends my small series on C++ for now. I hope I have shown how concentrating on a few simple features helps you write more maintainable and less obscure C++ code, on a level of abstraction that is not lower than most comparable languages. Do you have other methods to achieve this? Or do you even want to have this? I’d like to hear!

The JavaScript ‘console’ Object

Most JavaScript developers are familiar with these basic functions of the console object: console.log(), .info(), .warn() and .error(). These functions dump a string or an object to the JavaScript console.

However, the console object has a lot more to offer. I’ll demonstrate a selection of the additional functionality, which is less known, but can be useful for development and debugging.

Tabular data

Arrays with tabular structure can be displayed with the console.table() function:

var timeseries = [
 {timestamp: new Date('2016-04-01T00:00:00Z'), value: 42, checked: true},
 {timestamp: new Date('2016-04-01T00:15:00Z'), value: 43, checked: true},
 {timestamp: new Date('2016-04-01T00:30:00Z'), value: 43, checked: true},
 {timestamp: new Date('2016-04-01T00:45:00Z'), value: 41, checked: false},
 {timestamp: new Date('2016-04-01T01:00:00Z'), value: 40, checked: false},
 {timestamp: new Date('2016-04-01T01:15:00Z'), value: 39, checked: false}
];

console.table(timeseries);

The browser will render the data in a table view:

Output of console.table()

This function does not only work with arrays of objects, but also with arrays of arrays.

Benchmarking

Sometimes you want to benchmark certain sections of your code. You could write your own function using new Date().getTime(), but the functions console.time() and console.timeEnd() are already there:

console.time('calculation');
// code to benchmark
console.timeEnd('calculation');

The string parameter is a label to identify the benchmark. The JavaScript console output will look like this:

calculation: 21.460ms

Invocation count

The function console.count() can count how often a certain point in the code is called. Different counters are identified with string labels:

for (var i = 1; i <= 100; i++) {
  if (i % 15 == 0) {
    console.count("FizzBuzz");
  } else if (i % 3 == 0) {
    console.count("Fizz");
  } else if (i % 5 == 0) {
    console.count("Buzz");
  }
}

Here’s an excerpt of the output:

...
FizzBuzz: 6 (count-demo.js, line 3)
Fizz: 25 (count-demo.js, line 5)
Buzz: 13 (count-demo.js, line 7)
Fizz: 26 (count-demo.js, line 5)
Fizz: 27 (count-demo.js, line 5)
Buzz: 14 (count-demo.js, line 7)

Conclusion

The console object does not only provide basic log output functionality, but also some lesser-known, yet useful debugging helper functions. The Console API reference describes the full feature set of the console object.