I have changed my stance on “using” in C++ headers

I used to be pretty strictly against putting either C++ using-directives or using-declarations into header files. It had stuck with me as a no-go. But that has changed in recent years.

There are now good cases where using can go into a header. For example, I do not really like putting things like…

using namespace std::string_literals;
using namespace std::string_view_literals;
using namespace std::chrono_literals;

…at the beginning of each source file. Did you know that you can pull all those (and some more) in with a single using namespace std::literals? Either way, in my newer projects, these usually go into one of the more prominent headers. The same goes for other literal operators, such as those from the SI library, and for using-declarations for common vocabulary types, e.g. 2D or 3D vector types in math-heavy projects. Of course, they always go after the specific #include(s) the using is referencing. The benefits of doing that usually outweigh the danger of name clashes and weird order dependencies.
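
For illustration, such a "prominent" header of mine might look roughly like this (the file name is made up):

// prelude.hpp (hypothetical name): one of the project's prominent headers
#pragma once

#include <chrono>
#include <string>
#include <string_view>

// One directive instead of three: pulls in the "s", "sv", "h", "ms", ...
// literal suffixes for every file that includes this header.
using namespace std::literals;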

There are cases where I still avoid using in headers, however, and that is when the given header is 'public', i.e. consumed by something that is not under my organization's control. In that case, you had better leave that decision to the library consumer.

A Tale of Hidden Variables

Today was one of those frustrating moments that every developer encounters at some point. We were working on a Docker Compose setup and observed behavior that could only happen if a specific environment variable had been set. To ensure that this environment variable wasn't being set, I scoured the Docker Compose file, checked the local environment variables using the export command, and grepped all the relevant files in the project directory. But no matter what I did, this environment variable was still haunting us, wreaking havoc on the setup.

After what felt like an eternity of troubleshooting, we finally uncovered the culprit: an old, hidden .env file left over from a long-forgotten configuration. This file had been silently setting the environment variable I was desperately trying to eliminate.

Here’s how it all unfolded and what I learned from the experience:

When I first suspected that the environment variable might be lurking somewhere in the project, my instinct was to use grep to search for it in all the files within my local directory. I ran something along the lines of:

grep -r 'MY_ENV_VAR' *

To my surprise, nothing relevant showed up. I had expected this command to search through everything in my local directory. However, I had forgotten one important detail: the shell glob * does not match hidden files, so grep never even sees them.

Since .env files are typically hidden (starting with a dot), grep completely skipped over them. Little did I know, that old .env file was sitting quietly in the background, setting the environment variable that was causing all my issues.

After some frustration, my colleague finally had the realization that there might be hidden files at play. In Unix-like operating systems, files whose names start with a dot (.), like .env, are treated as hidden and are not listed or searched by default by common commands. Just as hidden variables in physics can influence particles without being directly observable, the hidden .env file was affecting my environment variables without being immediately visible.

To search the hidden files, you need to modify the grep command to look for them explicitly:

grep -r 'MY_ENV_VAR' . --include=".*"
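
Alternatively, recursing from the directory itself instead of letting the shell expand * already traverses hidden files:

grep -r 'MY_ENV_VAR' .

Because grep walks the directory tree on its own here, dotfiles like .env are no longer filtered out by the shell glob.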

This experience led me to reflect on whether deployment-relevant files like .env should be hidden in the first place, since they can easily be overlooked during debugging. Being hidden also makes them more prone to being forgotten. Hidden files are easy to miss when troubleshooting, especially when you're under pressure.

Given that .env files can have a significant impact on the behavior of applications, containerized setups, and CI/CD pipelines, making them hidden by default might not always be the best approach. After all, if an environment variable has the power to alter how an entire application runs, it’s something we want to be highly visible and readily accessible.

In the end, this experience taught me two important lessons:

Always search for hidden files when troubleshooting issues related to environment variables. If your Docker Compose or other environment-dependent setups aren’t behaving as expected, don’t forget to check for hidden .env files.

Consider the visibility of critical configuration files. Should .env files be hidden by default, or should they be treated as first-class citizens in our directory structures? In many cases, keeping them visible might help avoid unexpected behavior and wasted hours of debugging.

Every Unit Test Is a Stage Play – Part II

In the first part of this series, I talked about describing unit tests with the metaphor of a stage play that tells short stories about your system.

This metaphor holds water in nearly every aspect of unit testing and guides my approach from each single line of code to the whole concept of test coverage. In this series, I'm focussing on one aspect in each part.

Today, we look at the backdrop.

We learnt from the first part that most unit tests are performed by four roles that appear on stage. In a theater, the stage is oftentimes decorated by additional items that facilitate the story. This is the backdrop (or the coulisse) of the play. We have the same thing in unit tests.

In a unit test, the stage is the code inside the test method:

@Test
public void rounds_up_to_the_next_decimal_power() {
    final Configuration given = new Configuration(
        StringVirtualFile.createFileFromContent(
            "report.config",
            "scale.maximum=2E5"
        )
    );
    final ReportConfiguration target = new ReportConfiguration(
        given,
        SuffixProvider.none
    );
    final Optional<Double> actual = target.scaleMaximum();
    final double expected = 1E6;
    assertThat(actual).contains(expected);
}

A good director is very picky about every detail that appears on stage. There should be no incidental item visible to the audience. In the case of a unit test, the audience is the next developer who reads the test code.

The stage doesn't need to be devoid of "extras" and furniture, but it should be limited to the essentials. A theater play isn't a movie set, where eye candy is seen as something positive. During the play, the audience should recognize the actors (and their roles) easily and without searching between all the clutter.

So we need to do two things: Declutter the stage and identify the extras.

Decluttering the stage

In our example above, there is a mismatch between the code required to instantiate the given role and the rest of the test. If this were a real play, half the time would be spent on an extra that doesn't even speak a line. It is implied that the target gets some information from the given, but it isn't shown. We need to remove details from the stage that aren't relevant to the story but are introduced in just as much detail as the essential things. It might be fun and suspenseful to guess whether the "report.config" detail in line three is more meaningful than the "scale.maximum" specified in line four, but unit test stories are not meant to be mysterious or even entertaining. They are meant to inform the audience about a little fact of the tested system. There will be thousands of little stories about the system. Make them trivial to read and easy to understand.

We need to move the stage props off the stage:

@Test
void rounds_up_to_the_next_decimal_power() {
    Configuration given = givenScaleMaximumOf("2E5");
    final ReportConfiguration target = new ReportConfiguration(
        given,
        irrelevant()
    );
    final Optional<Double> actual = target.scaleMaximum();
    final double expected = 1E6;
    assertThat(actual).contains(expected);
}

private Configuration givenScaleMaximumOf(
    String scaleMaximum
) {
    return new Configuration(
        StringVirtualFile.createFileFromContent(
            "report.config",
            "scale.maximum=" + scaleMaximum
        )
    );
}

private SuffixProvider irrelevant() {
    return SuffixProvider.none;
}

By moving the initialization code of the Configuration object to a new private method, we employ the storytelling device of "conveniently prepared circumstances". The private method is moved to the "lower decks" or the "backstage" area of our test class. "The stage is on top" might be our motto. Everything private or not annotated with @Test should not be visible first and should not be required reading.

Notice how I introduced a parameter to the new "givenScaleMaximumOf" method in order to keep the necessary test details on stage. The audience should not have to look behind the curtains to gather important information about the story. I made the parameter a string (and not a double or integer) because it is just a prop. The story doesn't benefit from it having the "correct" type. And if you look back, it was a string before, too.

Identifying the extras

I've also extracted the "magic number" or "silent extra" SuffixProvider.none into its own method. This method adds nothing but a name that conveys the meaning of the value to the story instead of the value itself. If I were a director in a theater, this actor would have a plain and bland costume in contrast to the brightly colored main roles. Maybe the stage lighting would illuminate the main roles and keep the extra in the shadowy area, too.

Now, the focus of our test method is back on the main story. The attention of our audience will be on the stage and it will not be burdened by irrelevant details. We will even label props as dispensable if they are.

Keep your stages free from clutter and the eyes of your audience on the story. Test code is boring by nature. It doesn't have to be plodding as well.

Epilogue

This is the second part of a series. All parts are linked below:

Every Unit Test Is a Stage Play – Part I

At the last dev brunch, I got a recommendation for a talk that tries to explain functional programming differently. What really got me was the effectiveness of the changed vocabulary. I've seen this before, in the old talk about test driven development and behaviour driven development. But in my head, I think about unit tests with another overarching metaphor that I'm trying to explain in this blog post series:

Every unit test is a stage play that tells a short story about your system.

And this metaphor really guides my approach to nearly every aspect of unit testing, from each single line of code to the whole concept of test coverage. So I'm breaking my explanation into (at least) five parts, focussing on one aspect in each part.

Today, we look at the actors.

In each classic play, there are well-known roles that can be played by nearly any human, but always stay the same role. There’s the hero, the (comedic) sidekick and of course, the villain or antagonist. In every show of Romeo and Juliet, there will be a Romeo. It might not be the most convincing Romeo ever, but the role stays the same, no matter the cast.

The same thing is true for every well-formed unit test. There are four roles that always appear on stage:

  • target: This is the object under test or the code under test if you don’t use objects. The target is probably different for every unit test you write, but the role is always present. I’ve seen it being called “cut” for “code under test”, but I prefer “target”. If you see a reference named “target” in my test code, you can be sure about the role it plays in the story.
  • actual: If you can design your code to adhere to the simple “parameters in, result out” call pattern, the “result out” is the “actual”. It is what your target produced when challenged by the specific parameters of your test. One trick to testable code is to design the “actual” role as being quite simple and “flat”.
  • expected: This might be the closest thing to an antagonist in your play. The “expected” role is filled with the value (or values) that your “actual” is measured against. If your “actual” is simple, your “expected” will be simple, too. If your “actual” is too complex, the “expected” role will be overbearing. In any case, the “expected” role is what drives your assertions.
  • given: Our hero, the “target”, is often dependent on entry parameters or secondary objects (mocked or not). These sidekicks are part of the “given” role. You might think about the “given-when-then” storytelling structure of behaviour driven design for the name. If you strive for a simple code structure, the required “given” in your unit test should be manageable.

As you can see, the story of a typical unit test is always the same: The target, with the help of the given, produces an actual that competes against the expected.

If this story has a happy end, your test runs green. If the actual fails the expectation, your test runs red. If the target fails to produce an actual at all (think about an exception or error), your whole play falls apart and the test runs red.

Enough theory, let's look at a unit test that uses the four roles:

@Test
public void rounds_up_to_the_next_decimal_power() {
    final Configuration given = new Configuration(
        StringVirtualFile.createFileFromContent(
            "report.config",
            "scale.maximum=2E5"
        )
    );
    final ReportConfiguration target = new ReportConfiguration(
        given,
        SuffixProvider.none
    );
    final Optional<Double> actual = target.scaleMaximum();
    final double expected = 1E6;
    assertThat(actual).contains(expected);
}

I've highlighted the roles for better visibility. Note that for a role to appear in the play, it doesn't really have to be named explicitly. Most of the time, the last two lines would be collapsed into one:

assertThat(actual).contains(1E6);

You can still see the “expected” role play its part, but not as prominent as before.

You also probably saw the extra “given” that wasn’t highlighted:

SuffixProvider.none

It might be relevant to the story, or it might really be an uncredited extra that is not crucial to the target's journey of producing the correct actual. If that's the case, it seems appropriate not to name it. We will learn about techniques that I use to make these extras more nondescript in a later part. Right now, we can differentiate between main roles that are named and secondary roles that are just there as part of the scenery. Just don't fool your audience by having an unnamed actor contribute an important piece to the story's success. That might be a cool plot twist, but I'm not here to be surprised.

Let your tests perform boring plays, but lots of them.

By using the four roles of the test play, you make it clear to the reader (your real audience) what to expect from your test code parts. Don't name irrelevant test code parts, and only omit the role names if there are no extras on stage.

Your audience will still find your play boring (that’s the fate of nearly all test code), but it won’t feel disregarded or, even worse, deceived.

Epilogue

This is the first part of a series. All parts are linked below:

The algorithm in an algorithm – Builder design pattern

In the following blog post, I would like to explain the builder design pattern to you, why it is an algorithm within an algorithm, and what advantages result from that.

General

The builder is a creational design pattern. It separates the construction of a complex object from its representation, allowing the same construction process to create different representations.

The design pattern consists of a director, the builder interface, and concrete builder implementations. The director is responsible for the abstract construction of the product and has a defined interface with the builder through which it passes the construction instructions. The concrete builders then build the concrete product according to the instructions and can also hand out the generated product.

So in the end, the director defines its own little programming language inside the program, in which the construction instructions can be written as an algorithm. The builder then executes that algorithm. So we have a program in the program, an algorithm in the algorithm. Crazy!

Example cake recipe

We know such procedures from real life. For example, from the kitchen. When you bake a cake, you take your yellow mixing bowl, the ingredients, and the blue mixer and make the dough. Very concrete.

But now, if someone asks for the recipe, we abstract it from our concrete equipment to a general manual. There, it only says to mix the ingredients; your yellow mixing bowl and blue mixer are not mentioned. So someone else can bake the cake in their own kitchen with their own equipment. Should your blue mixer ever fail, you can easily carry out the recipe with a whisk or with a new food processor.

Example file generation

An example from programming is the generation of a file, for example a PDF certificate. If you program everything directly against PDFBox, it works at first. But if you ever want to use a different library, or if you also want the certificate as a plain text document or an image, you need to rewrite everything.

With the design pattern, you would have an algorithm that says: I want "Certificate" as a title, then a dividing line, then a table, and then this paragraph. Exactly how this will be implemented is not known. The PDFBox builder takes these instructions and creates the file with its own library-specific commands.

If the library or file type changes, only one new builder needs to be written. For example, a text file builder, an image builder or an OpenPDF builder. The logic of how the certificate should look at the end remains unchanged.
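
To make this more concrete, here is a rough sketch in Java (all names are made up for illustration, and a real interface would of course also contain the table step):

// The abstract construction steps the director may request.
interface CertificateBuilder {
    void addTitle(String title);
    void addDividingLine();
    void addParagraph(String text);
}

// The director: the "algorithm in the algorithm". It only knows the
// abstract construction steps, not the concrete output format.
class CertificateDirector {
    void construct(CertificateBuilder builder) {
        builder.addTitle("Certificate");
        builder.addDividingLine();
        builder.addParagraph("This certifies that ...");
    }
}

// One concrete builder per library or file type; here a plain text file.
// A PDFBox builder would implement the same interface with its own
// library-specific commands instead.
class TextCertificateBuilder implements CertificateBuilder {
    private final StringBuilder content = new StringBuilder();

    public void addTitle(String title) {
        content.append(title.toUpperCase()).append(System.lineSeparator());
    }

    public void addDividingLine() {
        content.append("----------------------------------------").append(System.lineSeparator());
    }

    public void addParagraph(String text) {
        content.append(text).append(System.lineSeparator());
    }

    public String getResult() {
        return content.toString();
    }
}

The director is used with whichever builder fits the desired output, e.g. new CertificateDirector().construct(new TextCertificateBuilder()), and its construction logic stays untouched when a new builder is added.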

Conclusion

In the end, separating the construction from the production offers some advantages. The program becomes more extensible and modifiable, and it complies with the single responsibility principle. The disadvantage is a close coupling between the product, the concrete builders, and the classes involved in the construction, which makes it difficult to change the basic process.

Unexpected: Web Sockets close when downloading a file

The browsers work in mysterious ways.

That is, they mostly have their reasons for behaving the way they do, but many of the details are too intricate to have been explicitly defined in some specification (…yet).

Recently, we came across a new problem that was surprisingly non-googleable.

Many of our interactive web GUIs are based on Web Sockets for real-time updates, as is usual. Now, one live system showed a strange effect: our Web Socket always broke down when the user clicked on an <a>...</a> link to download a file.

This was strange for multiple reasons, but it boiled down to:

  • It happened under the current Firefox, but not under Chrome
  • It did not happen when the link had target="_blank" (i.e. the link was opening in a new window)
  • The Web Socket closed with the “going away” status code 1001, indicating no further error

And to boil it down further, the solution was to explicitly include the download attribute on this particular <a> element.

The web socket stayed open for both browsers in both of these cases:

<a href="..." target="_blank" rel="noopener noreferrer">
  This leaves the Web Socket intact
</a>

<a href="..." download>
  This also leaves the Web Socket intact
</a>

<a href="...">
  This might close the Web Socket, or not, just as the browser feels today
</a>

(For the rel="noopener noreferrer", see also here.)

The data from the href endpoint was served with the corresponding Content-Disposition and Content-Type HTTP headers. For Chrome, this is enough to identify that link as a download. Firefox was not so sure; it believed that you were actually "going away".

Efficient integer powers of floating-point numbers in C++

Given a floating-point number x, it is quite easy to square it: x = x * x;, or x *= x;. Similarly, to find its cube, you can use x = x * x * x;.

However, when raising it to the 4th power, things get more interesting. There's the naive way: x = x * x * x * x;. And there's the slightly obscure way x *= x; x *= x;, which saves a multiplication.

When raising to the 8th power, the naive way really loses its appeal: x = x * x * x * x * x * x * x * x; versus x *= x; x *= x; x *= x;, that's 7 multiplications versus just 3. This process can easily be extended for raising a number to any power-of-two N, and will only use O(log(N)) multiplications.

The algorithm can also easily be extended to work with any integer power. This works by decomposing the power into a product of power-of-two powers, i.e. the exponent into a sum of powers of two. Luckily, that's exactly the binary representation so readily available on any computer. For example, let us try x to the power of 20. That's 16+4, i.e. 10100 in binary.

x *= x; // x is the original x^2 after this
x *= x; // x is the original x^4 after this
result = x;
x *= x; // x is the original x^8 after this
x *= x; // x is the original x^16 after this
result *= x;

Now let us throw this into some C++ code, with the power being a constant. That way, the optimizer can take out all the loops and generate just the optimal sequence of multiplications when the power is known at compile time.

template <unsigned int y> float nth_power(float x)
{
  auto p = y;
  auto result = ((p & 1) != 0) ? x : 1.f;
  while(p > 0)
  {
    x *= x;
    p = p >> 1;
    if ((p & 1) != 0)
      result *= x;
  }

  return result;
}
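
For illustration, a trivial usage sketch (the expected results are in the comments):

float a = nth_power<4>(3.f);   // 81.f
float b = nth_power<20>(2.f);  // 1048576.f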

Interestingly, the big compilers do a very different job optimizing this. GCC optimizes out the loops with -O2 exactly up to nth_power<15>, but continues to do so with -O3 on higher powers. clang reliably takes out the loops even with just -O2. MSVC doesn’t seem to eliminate the loops at all, nor does it remove the multiplication with 1.f if the lowest bit is not set. Let me know if you find an implementation that MSVC can optimize! All tested on the compiler explorer godbolt.org.

Oracle DB: How to Pick the Right Function for Current Date and Time

When working with date and time in Oracle, you have several functions available to get the current date and time. Three important ones are CURRENT_DATE, CURRENT_TIMESTAMP, and SYSDATE. Let’s see how they are different and when you should use each one.

The CURRENT_DATE function gives you the current date and time based on the time zone of the session you are in. It returns this information in a simple DATE format, which includes the date and time up to the second but does not show fractions of a second or the time zone. For example, if you run:

SELECT CURRENT_DATE FROM dual;

You might get a result like 29-JUL-24 03:43:19 PM. This shows the current date and time according to your session’s time zone.

You can set the session’s time zone to a specific offset from UTC. For example, to set the time zone to UTC+5:30:

ALTER SESSION SET TIME_ZONE = '+05:30';

Use CURRENT_DATE when you need the date and time for tasks that are specific to a certain time zone. It’s good for simple reports or calculations that don’t need to worry about fractions of a second or different time zones.

The CURRENT_TIMESTAMP function provides more detail. It gives you the current date and time, including fractions of a second and the time zone. This function returns the value in the TIMESTAMP WITH TIME ZONE format. For example, if you run:

SELECT CURRENT_TIMESTAMP FROM dual;

You might see something like 29-JUL-24 03.43.19.123456 PM +01:00. This includes the date, time, fractions of a second, and the time zone offset.

Use CURRENT_TIMESTAMP when you need precise time details, such as for logging events, tracking changes, or working across different time zones. It’s useful when you need to know the exact time down to the fraction of a second and the time zone.

The SYSDATE function gives you the current date and time from the database server’s clock. It’s similar to CURRENT_DATE in that it returns the date and time up to the second but not fractions of a second or time zone information. For example, if you run:

SELECT SYSDATE FROM dual;

You might get 29-JUL-24 03:43:19 PM. This shows the current date and time based on the server’s clock.

Use SYSDATE when you need the current date and time according to the server, not the session. This is helpful for server-side operations, scheduling tasks, and ensuring consistency across database operations that rely on the server’s time.
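
To see the differences side by side, you can select all three in one statement (the exact output of course depends on your session and server time zones):

SELECT CURRENT_DATE, CURRENT_TIMESTAMP, SYSDATE FROM dual;

If your session time zone differs from the server's, CURRENT_DATE and SYSDATE will visibly disagree, while CURRENT_TIMESTAMP additionally carries the fractional seconds and the session's time zone offset.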

With this information, you should now be able to choose the right function for your use case.

Defeating TimescaleDB or shooting yourself in the foot

Today, almost everything produces data periodically: a car, cloud-connected smart-home solutions, your "smart" refrigerator and you yourself using a fitness tracker. This data has to be stored in a certain way to be usable. We want fast access, beautiful charts and different aggregations over time.

There are several options, both free and commercial, for storing such time-series data, like InfluxDB, Prometheus or TimescaleDB. They are optimised for this kind of data, taking advantage of several time-series properties:

  • Storing is (mostly) append-only
  • Data points have a (strict) ordering in time
  • Data points often have fixed and variable meta data in addition to the data value itself

In one of our projects we have to store large quantities of data every 200 ms. Knowing PostgreSQL as a powerful relational database management system, we opted for TimescaleDB, which promises time-series features for an SQL database.

Ingestion of the data, probably the biggest problem for traditional (SQL) databases, worked as expected from the beginning. We were able to insert new data at a constant rate without degrading performance.

The problem

However, after some time we got severe performance problems when querying the data…

So one of the key points of using a time-series database strangely did not work for us… Fortunately, since Timescale is only an extension to PostgreSQL, we could just use EXPLAIN/EXPLAIN ANALYZE on our slow queries to find out what was going on.

It directly showed us that *all* chunks of our hypertable were queried instead of only the ones containing the data we were looking for. Checking our setup and hypertable definition showed that Timescale itself was working as expected and chunking the data correctly on the time axis.

After a bit more analysis and thought it was clear: the problem was our query! We used something like start_time <= to AND from <= end_time in our where clause to denote the time interval containing the requested data. Here, start_time is the partition column of our hypertable.

This way we were forcing the database to look at all chunks from long ago until our end-time timestamp.

The solution

Ok, so we have to reformulate our where clause so that only the relevant chunks are queried. Timescale can easily do this when we write something like start_time >= from AND start_time < to, where from and to are the start and end of our desired interval. That way usually only one or a few chunks of the hypertable are searched and everything is lightning-fast, even with billions of rows in our hypertable.

Of course, the results of our two queries are not 100% semantically the same. But you can easily achieve the desired results by correctly calculating the start and end of the time-series interval to fetch.
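
As a sketch, with a hypothetical measurements hypertable partitioned on start_time (the table, its columns and the example interval are made up for illustration):

-- Slow: the planner cannot exclude any chunk below the upper bound,
-- so every chunk from the beginning of time up to the end of the interval is scanned.
SELECT * FROM measurements
WHERE start_time <= '2024-07-02' AND '2024-07-01' <= end_time;

-- Fast: both bounds constrain the partition column, so only the
-- chunks overlapping the requested interval are touched.
SELECT * FROM measurements
WHERE start_time >= '2024-07-01' AND start_time < '2024-07-02';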

Conclusion

Time-series databases are powerful tools for today's data requirements. As with other tools, you need some understanding of the underlying concepts, and sometimes even of the implementation, to not accidentally defeat the purpose and features of the tool.

Used in the correct way, they enable you to work with large amounts of data with the performance today's users expect. We had similar experiences with full-text search engines like Solr and Elasticsearch, too.

A neat trick for finding dates

One of my fundamental driving forces for designing a suitable user experience is the motto of “no scrolling”. It is not a hard principle in the sense of Bret Victor’s “Inventing on Principle”, but it is motivation enough to always look for context-aware ordering of lists and tables.

One of the cornerstones of such “scrolling-free” or “scrolling-reduced” applications is a possibility for the user to define a context of work. You might call it an ordinary search field, but it is highly interactive (immediate feedback, as mandated by Bret Victor) and doesn’t lead to a special result view. It just reduces the amount of data that is presented to the user. The idea behind the context field is that the user has some idea about the piece of work she wants to edit. So, if for example she remembers that the customer of an order was named “Miller”, she would type “mil” in the context field and have the list of orders reduced to just the “Millers” and “Camilles” (the latter contains “mil” in the middle).

This works fine for text-based information, but less so for numerical data. Most users typically remember text like names better than numbers like values or phone numbers, so it fits the natural inclination of humans. And then there is a form of data that is remembered easily and used for orientation, but presented as numbers: dates.

If you store a date in a persistent storage, it is probably stored as a number, several numbers or a short piece of text like “2024-12-24”. To make a date searchable, the textual representation of “2024-12-24” is a good start, but we can do better with a simple trick:

Instead of just using one textual representation for the search index (or whatever search functionality you use), you can append several representations of the same date at once:

“2024-12-24 Tuesday, December 24, 2024”

Here, we've used code like this:

String searchtext = xmas.toString() + " " + xmas.format(DateTimeFormatter.ofLocalizedDate(FormatStyle.FULL));

This enables your user to restrict to a context of “dec” for “December” or even “tue” for “Tuesday”.

This search-extended representation of the date is never shown on screen, so it doesn't have to be readable or brief. It should contain the concepts for managing dates that your users employ during their normal interaction with the data (not with your application!). So it might even be useful to add several more representations like a relative period ("in 5 months, 16 days") and maybe special date names ("christmas, xmas"). It might be useful to cut down on filler words like "months" or "days", because they are included in virtually every date you'll search. So the relative period might come down to "5 five, 16 sixteen". If your search allows for multiple texts that all need to be present (which I would encourage because in my experience, that's how people triangulate: "It was Miller and at the end of the year"), you might add the filler words again, because it allows for a context string like "5 mo" which matches "in 5 months" (and "5 months ago", but that's another topic).
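
A small sketch of how such a combined search text could be assembled (the class name, the locale choice and the relative-period wording are my own illustration, not a fixed recipe):

import java.time.LocalDate;
import java.time.Period;
import java.time.format.DateTimeFormatter;
import java.time.format.FormatStyle;
import java.util.Locale;

class DateSearchText {
    static String of(LocalDate date) {
        StringBuilder text = new StringBuilder();
        // ISO representation: matches contexts like "2024-12" or "12-24"
        text.append(date.toString());
        // Localized long form: matches contexts like "dec" or "tue"
        text.append(" ").append(date.format(
            DateTimeFormatter.ofLocalizedDate(FormatStyle.FULL).withLocale(Locale.ENGLISH)));
        // Relative period from today: matches contexts like "5 mo" or "16 days"
        Period until = Period.between(LocalDate.now(), date);
        text.append(" in ").append(until.getMonths()).append(" months, ")
            .append(until.getDays()).append(" days");
        return text.toString();
    }
}

DateSearchText.of(LocalDate.of(2024, 12, 24)) then yields something like the combined string above, ready to be fed into the search index.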

In a nutshell, my trick is to craft the textual representation of data for the search (not for the visualisation!) in accordance with the navigation patterns of my users. If they can rely on the data being focussed effectively, they won't miss the scrollbars.