mariuselvert

Creating functors with lambda factories in C++

Back before C++11, the recommended way to customize the behavior of an algorithm was to write a functor-struct, e.g. a small struct overloading operator(). E.g. for sorting by a specific key:

struct key_less {
  bool operator()(T const& lhs, T const& rhs) const { return lhs.key < rhs.key; }
};

// ... later
std::sort(c.begin(), c.end(), key_less{});

Of course, the same logic could be implemented with a free function, but that was often advised against because it was harder for the compiler to inline. Either way, if you had to supply some context variables, you were stuck with the verbose functor struct anyways. And it made it even more verbose. Something like this was common:

struct indirect_less {
  indirect_less(std::vector<int> const& indirection)
  : indirection_(indirection) {}

  bool operator()(T const& lhs, T const& rhs) const
  { return indirection_[lhs.key] < indirection_[rhs.key]; }

private:
  std::vector<int> const& indirection_;
};

That all changed, of course, when lambdas were added to the language, and you could write the same thing as:

auto indirect_less = [&](T const& lhs, T const& rhs)
  { return indirection[lhs.key] < indirection[rhs.key]; };

At least as long as the indirection std::vector<> is in the local scope. But what if you wanted to reuse some more complicated functors in different contexts? In that case, I often found myself reverting back to the old struct pattern. Until recently I discovered there’s a lot nicer way, and I ask myself how I missed that for so long: functor factories. E.g.

auto make_indirect_less(std::vector const& indirection) {
  return [&](T const& lhs, T const& rhs) { /* ... */ };
}

Much better than the struct! This has been possible since C++14’s return type deduction, so a pretty long time. Still, I do not think I have come across this pattern before. What about you?

Use duplication to make your single source of truth

Having a single source of truth is one of the big tenets of programming. It is easy to see why. If you want to figure out something about your program, or change something, you just go to the corresponding source.

One of the consequences of this is usually code duplication, but things can get a lot more complicated very fast, when you think of knowledge duplication or fragmentation, instead of just code. Quite unintuitively, duplication can actually help in this case.

Consider the case where you serialize an enum value, e.g. to a database or a file. Suddenly, you have two conceptual points that ‘know’ about the translation of your enum literals to a numeric or string value: The mapping in your code and the mapping implicitly stored in the serialization. None of these two points can be changed independently. Changing the serialized content means changing the source code and vice-versa.

You could still consider your initial enum to value mapping the single source of truth, but the problem is that you can easily miss disruptive changes. E.g. if you used the numeric value, just reordering the enumerated will break the serialization. If you used the text name of the enum, even a simple rename refactoring will break it.

So to deal with this, I often build my own single source of truth: a unit test that keeps track of such implicit value couplings. That way, the test can tell you when you are accidentally breaking things. Effectively, this means duplicating the knowledge of the mapping to a ‘safe’ space: One that must be deliberately changed, and resists accidentally being broken. And then that becomes my new single source of truth for that mapping.

How I accidentally cut my audio-files in half

A couple of weeks ago, I asked my brother to test out my new game You Are Circle (please wishlist it and check out the demo, if that’s up your alley!) and among lots of other valuable feedback, he mentioned that the explosion sound effects had a weird click sound at the end that he could only hear with his headphones on. For those of you not familiar with audio signal processing, those click or pop sounds usually appear when the ‘curvy’ audio signal is abruptly cut off¹. I did not notice it on my setup, but he has a lot of experience with audio mixing, so I trusted his hearing. Immediately, I looked at the source files in audacity:

They looked fine, really. The sound slowly fades out, which is the exact thing you need to do to prevent clicks & pops. Suspecting the problem might be on the playback side of his particular setup, I asked him to record the sound on his computer the next time he tested and then kind of forgot about it for a bit.

Fast-forward a couple of days. Neither of us had followed up on the little clicky noise thing. While doing some video captures with OBS, I noticed that the sound was kind of terrible in some places, the explosions in particular. Maybe that was related?

While building a new version of my game, Compiling resources... showed up in my console and it suddenly dawned on me: What if my home-brew resource compiler somehow broke the audio files? I use it to encode all the .wav originals into Ogg Vorbis for deployment. Maybe a badly configured encoding setup caused the weird audio in OBS and for my brother? So I looked at the corresponding .ogg files, and to my surprise, it indeed had a small abrupt cut-off at the end. How could that happen? Only when I put both the original and the processed file next to each other, did I see what was actually going on:

It’s only half the file! How did that happen? And what made this specific file so special for it to happen? This is one of many files that I also convert from stereo to mono in preprocessing. So I hypothesized that might be the problem. No way I missed all of those files being cut in half though, or did I? So I checked the other files that were converted from stereo to mono. Apparently, I did miss it. They were all cut in half. So I took a look at the code. It looked something like this:

while (keep_encoding)
{
  auto samples_in_block = std::min(BLOCK_SIZE, input.sample_count() - sample_offset);
  if (samples_in_block != 0)
  {
    auto samples_per_channel = samples_in_block / channel_count;
    auto channel_buffer = vorbis_analysis_buffer(&dsp_state, BLOCK_SIZE);
    auto input_samples = input.samples() + sample_offset;

    if (convert_to_mono)
    {
      for (int sample = 0; sample < samples_in_block; sample += 2)
      {
        int sample_in_channel = sample / channel_count;
        channel_buffer[0][sample_in_channel] = (input_samples[sample] + input_samples[sample + 1]) / (2.f * 32768.f);
      }
    }
    else
    {
      for (int sample = 0; sample < samples_in_block; ++sample)
      {
        int channel = sample % channel_count;
        int sample_in_channel = sample / channel_count;
        channel_buffer[channel][sample_in_channel] = input_samples[sample] / 32768.f;
      }
    }

    vorbis_analysis_wrote(&dsp_state, samples_per_channel);
    sample_offset += samples_in_block;
  }
  else
  {
    vorbis_analysis_wrote(&dsp_state, 0);
  }

  /* more stuff to encode the block using the ogg/vorbis API... */
}

Not my best work, as far as clarity and deep nesting goes. After staring at it for a while, I couldn’t really figure out what was wrong with it. So I built a small test program to debug into, and only then did I see what was wrong.

It was terminating the loop after half the file, which now seems pretty obvious given the outcome. But why? Turns out it wasn’t the convert_to_mono at all, but the whole loop. What’s really the problem here is mismatched and imprecise terminology.

What is a sample? The audio signal is usually sampled several thousand times (44.1kHz, 48kHz or 96kHz are common) per second to record the audio waves. One data point is called a sample. But that is only enough of a definition if the sound has a single channel. But all those with convert_to_mono==true were stereo, and that’s exactly were the confusion is in this code. One part of the code thinks in single-channel samples, i.e. a single sampling time-point has two samples in a stereo file, while the other part things in multi-channel samples, i.e. a single sampling time-point has only one stereo sample, that consists of multiple numbers. Specifically this line:

auto samples_in_block = std::min(BLOCK_SIZE, input.sample_count() - sample_offset);

samples_in_block and sample_offset use the former definition, while input.sample_count() uses the latter. The fix was simple: replace input.sample_count() with input.sample_count() * channel_count.

But that meant all my stereo sounds, even the longer music files, were missing the latter half. And this was not a new bug. The code was in there since the very beginning of the git history. I just didn’t hear its effects. For the sound files, many of them have a pretty long fade out in the second half, so I can kind of get why it was not obvious. But the music was pretty surprising. My game music loops, and apparently, it also loops if you cut it in half. I did not notice.

So what did I learn from this? Many of my assumptions while hunting down this bug were wrong:

My brother’s setup did not have anything to do with it.
Just because the original source file looked fine, I thought the file I was playing back was good as well.
The bad audio in OBS did not have anything to do with this, it was just recorded too loud.
The ogg/vorbis encoding was not badly configured.
The convert_to_mono switch or the special averaging code did not cause the problem.
I thought I would have noticed that almost all my sounds were broken for almost two years. But I did not.

What really cause the problem was an old programming nemesis, famously one of the two hard things in computer science: Naming things. There you have it. Domain language is hard.

I think this is because this sudden signal drop equates to a ‘burst’ in the frequency domain, but that is just an educated guess. If you know, please do tell. ↩︎

Project paths in launch.vs.json with CMake presets

Today I was struggling with a relatively simple task in Visual Studio 2022: pass a file path in my source code folder to my running application. I am, as usual, using VS’s CMake mode, but also using conan 2.x and hence CMake presets. That last part is relevant, because apparently, it changes the way that .vs/launch.vs.json gets its data for macro support.

To make things a little more concrete, take a look at this, non-working, .vs/launch.vs.json:

{
  "version": "0.2.1",
  "defaults": {},
  "configurations": [
    {
      "type": "default",
      "project": "CMakeLists.txt",
      "projectTarget": "application.exe (src\\app\\application.exe)",
      "name": "application.exe (src\\app\\application.exe)",
      "env": {
        "CONFIG_FILE": "MY_SOURCE_FOLDER/the_file.conf"
      }
    }
  ]
}

Now I want MY_SOURCE_FOLDER in the env section there to reference my actual source folder. Ideally, you’d use something like ${sourceDir}, but VS 2022 was quick to tell me that it failed evaluation for that variable.

I did, however, find an indirect way to get access to that variable. The sparse documentation really only hints at that, but you can actually access ${sourceDir} in the CMake presets, e.g. CMakeUsersPresets.json or CMakePresets.json. You can then put it in an environment variable that you can access in .vs/launch.vs.json. Like this in your preset:

{
  ...
  "configurePresets": [
    {
      ...
      "environment": {
        "PROJECT_ROOT": "${sourceDir}"
      }
    }
  ],
  ...
}

and then use it as ${env.PROJECT_ROOT} in your launch config:

{
  "version": "0.2.1",
  "defaults": {},
  "configurations": [
    {
      "type": "default",
      "project": "CMakeLists.txt",
      "projectTarget": "application.exe (src\\app\\application.exe)",
      "name": "application.exe (src\\app\\application.exe)",
      "env": {
        "CONFIG_FILE": "${env.PROJECT_ROOT}/the_file.conf"
      }
    }
  ]
}

Hope this spares someone the trouble of figuring this out yourself!

The Estranged Child

One code construct I encounter every now and then is what my colleague appropriately dubbed “the extranged child”, after I said that I do not have a proper name for it. It happens in OOP when a parent object creates child objects, and then later needs to interact with that child:

class Child { /* ... */ }

class Parent
{
  Child Create() { /* ... */ }
  void InteractWith(Child c) { /* ... */ }
}

This is all good, but as soon as inheritance enters the picture, it becomes more complicated:

abstract class BaseChild { /* ... */ }
abstract class BaseParent
{
  public abstract BaseChild Create();
  public abstract void InteractWith(BaseChild child);
}

class RealChild : BaseChild { }
class RealParent : BaseParent
{
  public override BaseChild Create()
  {
    return new RealChild( /* ... */ );
  }

  public override void InteractWith(BaseChild child)
  {
    // We really want RealChild here...
    var realChild = child as RealChild;
  }
}

The interaction often needs details that only the child type associated with that specific parent type has, so that involves a smelly downcast at that point.

Just looking at the type system, this now violates the Liskov-Substitution-Principle also known as the L from SOLID.

One possible solution is adding a precondition for the InteractWith function. Something along the lines of “May only be called with own children”. That works, but cannot be checked by a compiler.

Another solution is to move the InteractWith function into the child, because at the point when it is created, it can know its real parent. That may not be the natural place for the function to go. Also, it requires the child to keep a reference to its parent, effectively making it a compound. But this approach can usually be done, as long as the relation of ‘valid’ child/parent types is one to one.

If you have a parent object that can create different kinds of children that it later needs to interact with, that approach is usually doomed as well. E.g. let the parent be a graphics API like OpenGL or DirectX, and the children be the resources created, like textures or buffers. For drawing, both are required later. At that point, really only the precondition approach works.

On the implementation side, the “ugly” casts remain. Stand-ins for the children can be used and associated with the data via dictionaries, hash-tables or any other lookup. This approach is often coupled with using (possibly strongly typed) IDs as the stand-ins. However, that really only replaces the downcast with a lookup, and it will also fail if the precondition is not satisfied.

Have you encountered this pattern before? Do you have a different name for it? Any thoughts on designing clean APIs that have such a parent-child relationship in a hierarchy? Let me know!

Calculating the Number of Segments for Accurate Circle Rendering

A common way to draw circles with any kind of vector graphics API is by approximating it with a regular polygon, e.g. as a regular polygon with 32 sides. The problem with this approach is that it might look good in one resolution, but crude in another, as the approximation becomes more visible. So how do you pick the right number of sides $N$ for the job? For that, let’s look at the error that this approximation has.

A whole bunch of math

I define the ‘error’ of the approximation as the maximum difference between the ideal circle shape and the approximation. In other words, it’s the difference of the inner radius and the outer radius of the regular polygon. Conveniently, with a step angle $\alpha=\frac{2\pi}{N}$ the inner radius is just the outer radius $r$ multiplied by the cosine of half of that: $r\times\cos\frac{\alpha}{2}$ . So the error is $r-r\times\cos\frac{\pi}{N}$ . I find it convenient to use relative error $\epsilon$ for the following, and set $r=1$ :

$\epsilon=1-cos\frac{\pi}{N}$

The following plot shows that value for $N$ going from 4 to 256:

Plot showing the relative error numbers of subdivision

As you can see, this looks hyperbolic and the error falls off rather fast with an increasing number of subdivisions. This function lets use figure out the error for a given number of subdivisions, but what we really want is he inverse of that: Which number of subdivisions do we need for the error to be less than a given value. For example, assuming a 1080p screen, and a half-pixel error on a full-size ( $r=540$ ) circle, that means we should aim for a relative error of $0.1\%$ . So we can solve the error equation above for N. Since the number of subdivisions should be an integer, we round it up:

$N=\lceil\frac{\pi}{\arccos{(1-\epsilon)}}\rceil$

So for $0.1\%$ we need only 71 divisions. The following plot shows the number of subdivisions for error values from $0.01\%$ to $1\%$ :

Here are some specific values:

Assuming a fixed half-pixel error, we can plug in $\epsilon=\frac{0.5}{radius}$ to get:

$N=\lceil\frac{\pi}{\arccos{(1-\frac{0.5}{radius})}}\rceil$

The following graph shows that function for radii up to full-size QHD circles:

Give me code

Here’s the corresponding code in C++, if you just want to figure out the number of segments for a given radius:

std::size_t segments_for(float radius, float pixel_error = 0.5f)
{
  auto d = std::acos(1.f - pixel_error / radius);
  return static_cast<std::size_t>(std::ceil(std::numbers::pi / d));
}

Experimenting with CMake’s unity builds

CMake has an option, CMAKE_UNITY_BUILD, to automatically turn your builds into unity-builds, which is essentially combining multiple source files into one. This is supposed to make your builds more efficient. You can just enable enable it while executing the configuration step of your CMake builds, so it is really easy to test. It might just work without any problems. Here are some examples with actual numbers of what that does with build times.

Project A

Let us first start with a relatively small project. It is a real project we have been developing, that reads sensor data, transports it over the network and displays it using SDL and Dear ImGui. I’m compiling it with Visual Studio (v17.13.6) in CMake folder mode, using build insights to track the actual time used. For each configuration, I’m doing a clean rebuild 3 times. The steps are the number of build statements that ninja runs.

Unity Build	#Steps	Time 1	Time 2	Time 3
OFF	40	13.3s	13.4s	13.6s
ON	28	10.9s	10.7s	9.7s

That’s a nice, but not massive, speedup of 124,3% for the median times.

Project A*

Project A has a relatively high number of non-compile steps: 1 step is code generation, 6 steps are static library linking, and 7 steps are executable linking. That’s a total of 14 non-compile steps, which are not directly affected by switching to unity builds. 5 of the executables in Project A are non-essential, basically little test programs. So in an effort to decrease the relative number of non-compile steps, I disabled those for the next test. Each of those also came with an additional source file, so the total number of steps decreased by 10. This really only decreased the relative amount of non-compile steps from 35% to 30%, but the numbers changes quite a bit:

Unity Build	#Steps	Time 1	Time 2	Time 3
OFF	30	9.9s	10.0s	9.7s
ON	18	9.0s	8.8s	9.1s

Now the speedup for the median times was only 110%.

Project B

Project B is another real project, but much bigger than Project A, and much slower to compile. It’s a hardware orchestration system with a web interface. As the project size increases, the chance for something breaking when enabling unity builds also increases. In no particular order:

Include guards really have to be there, even if that particular header was not previously included multiple times
Object files will get a lot bigger, requiring /bigobj to be enabled
Globally scoped symbols will name-clash across files. This is especially true for static globals or things in unnamed namespaces, which basically don’t do their job anymore. More subtly, things moved into the global namespace will also clash, such as the classes with the same name moved into the global namespace via using namespace.

In general, that last point will require the most work to resolve. If all fails, you can disable unity build on a target via set_target_properties(the_target PROPERTIES UNITY_BUILD OFF) or even just skip specific files for unity build inclusion via SKIP_UNITY_BUILD_INCLUSION. In Project B, I only had to do this for files generated by CMakeRC. Here are the results:

Unity Build	#Steps	Time 1	Time 2	Time 3
OFF	416	279.4s	279.3s	284,0s
ON	118	73.2s	76.6s	74.5s

That’s a massive speedup of 375%, just for enabling a build-time switch.

When to use this

Once your project has a certain size, I’d say definitely use this on your CI pipeline, especially if you’re not doing incremental builds. It’s not just time, but also energy saved. And faster feedback cycles are always great. Enabling it on developer machines is another matter: it can be quite confusing when the files you’re editing do not correspond to what the build system is building. Also, developers usually do more incremental builds where the advantages are not as high. I’ve also used hybrid approaches where I enable unity builds only for code that doesn’t change that often, and I’m quite satisfied with that. Definitely add an option to turn that off for debugging though. Have you had similar experiences with unity builds? Do tell!

fmt::format vs. std::format

The excellent {fmt} largely served as the blueprint for the C++20 standard formatting library. That alone speaks for its quality. But I was curious: should you now just use std::format for everything, or is fmt::format still a good option? In this particular instance, I wanted to know which one is faster, so I wrote a small benchmark. Of course, the outcome very much depends on the standard library you are using. In my case, I’m using Visual Studio 17.13.0 and its standard library, and {fmt} version 11.1.3.

I started with a benchmark helper function:


template <std::invocable<> F> steady_clock::duration benchmark(std::string_view label, F f)
{
  auto start = steady_clock::now();
  f();
  auto end = steady_clock::now();
  auto time = end - start;
  auto us = duration_cast<nanoseconds>(time).count() / 1000.0;
  std::cout << std::format("{0} took {1:.3f}us", label, us) << std::endl;
  return time;
}

Then I called it with a lambda like this, with NUMBER_OF_ITERATIONS set to 500000:

int integer = 567800;
float real = 1234.0089f;
for (std::size_t i = 0; i < NUMBER_OF_ITERATIONS; ++i)
  auto _ = fmt::format("an int: {}, and a float: {}", integer, real);

… and the same thing with std::format.

Interestingly, fmt::format only needed about 75%-80% of time of std::format in a release build, while the situation reversed for a debug build to about 106%-108%.

It seems hard to construct a benchmark with low overhead of other things, while still avoiding that the compiler can optimize everything away. My code assumes the compiler keeps the formatting even after throwing it away. So take all my results with a grain of salt!

You are mislead about the Big-O notation

One statement I have people say and people repeat a lot, especially in the data-oriented design bubble, is that Big-O notation cannot accurately real-life performance of contemporary computer programs, especially in the presence of multi-tier memory hierarchies like L1/L2/L3-caches for RAM. This is, at best, misleading and gives this fantastic tool a bad reputation.

At it’s core, Big-O is just a way to categorize functions in how they scale. There’s nothing in the formal definition about performance at all. Of course, it is often used to categorize performance of algorithms and implementations of them. But to use it for that, you need two other things: A machine model and a metric for it.

Traditionally, when performance categorization using Big-O is taught, the machine model is either the Turing-machine or the slightly closer-to-reality RAM-machine. The metric is a number of operations. The operation that is counted has a huge impact. For example, insertion sort can easily be implemented in O(n*log(n)) when counting the number of comparisons (by using binary search to find the insertion point), but is in O(n²) when counting the number of memory moves/swaps.

Neither the model nor the metric is intrinsic to Big-O. To use in in the context of memory hierarchies, you just need to start counting what matters to you, e.g. memory accesses, cache misses or branch mispredictions. This is not new either, I learned about cache-aware and cache-oblivious machine models for this in university over 15 years ago.

TL;DR: Big-O is not obsolete, you just have to use it to count the appropriate performance-critical element in your algorithm.

Beware of using Git LFS on Github

In my private game programming projects, I am often using data files alongside my code for all kinds of game assets like images and sounds. So I thought it might be a good idea to use the Git Large File Storage (=LFS) extension for that.

What is Git LFS?

Essentially, if you’re not using it, the file will be in your local .git folder if it was part of your repository at any time in your history. E.g. if you accidentally added&committed a 800mb video files and then deleted it again, they will still be in your local .git folder. This problem multiplies when using a CI with many branches: each branch will typically have a copy of all files ever used in your repository. This is not a problem with source code files, because they are not that big and they can be compressed really well with different versions of themselves, which is what git typically does.

With Git LFS, the big files are only stored as references in the .git folder. This means that you might need an additional request to your remote when checking them out again, but it will save you lots space and traffic when cloning repositories.

In my previous projects on github, I just did not enable LFS for my assets. And that worked fine, as my assets are usually pretty small and I don’t change them often. But this time I wanted to try it.

Sorry, Github, what?

Imagine my suprise when I got an e-mail from github last month warning me that my LFS traffic quota is almost reached and I have to pay to extend it. What? I never had and traffic quota problems without LFS. Github doesn’t even seem to have one, if I just keep my big files in ‘pure’ git. So that’s what I get for trying to safe Github traffic.

Now the LFS quota is a meager 1 gb per month with Github Pro. That’s nothing. Luckily, my current project is not asset heavy: the full repo is very small at ~60mb. But still the quota was reached with me as a single developer. How did that happen? I just enabled CI for my project on my home server and I was creating lots of branches my CI wanted to build. That’s only 12 branches cloned for the 80% warning to be reached.

Workarounds

Jenkins, which I’m using as a CI tool, has the ability to use a ‘reference repository’ when cloning. This can be used to get the bulk of the data from a local remote, while getting the rest from Github. This is what I’m now using to avoid excess LFS traffic. It is a bit of a pain to set up: you have to manually maintain this reference repository, Jenkins will not do it for you, and you have to do that on each agent. I only have one at this point, so that’s an okay trade-off. But next time, Isure won’t use Git LFS on Github, if I can avoid it.

$\epsilon$	$N$
0.01%	223
0.1%	71
0.2%	50
0.4%	36
0.6%	29
0.8%	25
1.0%	23

	Miq on Nested queries like N+1 in pra…
	mariuselvert on Creating functors with lambda…
	Nested queries like… on Common SQL Performance Gotchas…
	Nested queries like… on Make your users happy by not c…
	Anonymous on Creating functors with lambda…