Simple marching squares in C++

Marching squares is an algorithm to find the contour of a scalar field. For example, that can be a height-map and the resulting contour would be lines of a specific height known as ‘isolines’.

At the core of the algorithm is a lookup table that says which line segments to generate for a specific ’tile’ configuration. To make sense of that, you start with a convention on how your tile configuration and the resulting lines are encoded. I typically add a small piece of ASCII are to explain that:

// c3-e3-c2
// |      |
// e0    e2
// |      |
// c0-e1-c1
//
// c are corner bits, e the edge indices

The input of our lookup table is a bitmask of which of the corners c are ‘in’ or above our isolevel. The output is which tile edges e to connect with line segments. That is either 0, 1 or 2 line segments, so we need to encode that many pairs. You could easily pack that into a 32-bit, but I am using a std::vector<std::uint8_t> for simplicity. Here’s the whole thing:

using config = std::vector<std::uint8_t>;
using config_lookup = std::array<config, 16>;
const config_lookup LOOKUP{
  config{},
  { 0, 1 },
  { 1, 2 },
  { 0, 2 },
  { 2, 3 },
  { 0, 1, 2, 3 },
  { 1, 3 },
  { 0, 3 },
  { 3, 0 },
  { 3, 1 },
  { 1, 2, 3, 0 },
  { 3, 2 },
  { 2, 0 },
  { 2, 1 },
  { 1, 0 },
  config{},
};

I usually want to generate index meshes, so I can easily connect edges later without comparing the floating-point coordinates. So one design goal here was to generate each point only once. Here is the top-level algorithm:

using point_id = std::tuple<int, int, bool>;

std::vector<v2<float>> points;
// Maps construction parameters to existing entries in points
std::unordered_map<point_id, std::uint16_t, key_hash> point_cache;
// Index pairs for the constructed edges
std::vector<std::uint16_t> edges;

auto [ex, ey] = map.size();
auto hx = ex-1;
auto hy = ey-1;

// Construct inner edges
for (int cy = 0; cy < hy; ++cy)
for (int cx = 0; cx < hx; ++cx)
{
  std::uint32_t key = 0;
  if (map(cx, cy) > threshold)
    key |= 1;
  if (map(cx + 1, cy) > threshold)
    key |= 2;
  if (map(cx + 1, cy + 1) > threshold)
    key |= 4;
  if (map(cx, cy + 1) > threshold)
    key |= 8;

  auto const& geometry = LOOKUP[key];

  for (auto each : geometry)
  {
    auto normalized_id = normalize_point(cx, cy, each);
    auto found = point_cache.find(normalized_id);
    if (found != point_cache.end())
    {
      edges.push_back(found->second);
    }
    else
    {
      auto index = static_cast<std::uint16_t>(points.size());
      points.push_back(build_point(map, threshold, normalized_id));
      edges.push_back(index);
      point_cache.insert({ normalized_id, index });
    }
  }
}

For each tile, we first figure out the lookup input-key by testing the 4 corners. We then get-or-create the global point for each edge point from the lookup.
Since each edge in a tile can be accessed from two sides, we first normalize it to have a unique key for our cache:

point_id normalize_point(int cx, int cy, std::uint8_t edge)
{
  switch (edge)
  {
  case 3:
    return { cx, cy + 1, false };
  case 2:
    return { cx + 1, cy, true };
  default:
    return { cx, cy, edge == 0 };
  };
}

When we need to create a point an edge, we interpolate to estimate where exactly the isoline intersects our tile-edge:

v2<float> build_point(raster_adaptor const& map, float threshold, point_id const& p)
{
  auto [x0, y0, vertical] = p;
  int x1 = x0, y1 = y0;
  if (vertical)
    y1++;
  else
    x1++;

  const auto s = map.scale();
  float h0 = map(x0, y0);
  float h1 = map(x1, y1);
  float lambda = (threshold - h0) / (h1 - h0);

  auto result = v2{ x0 * s, y0 * s };
  auto shift = lambda * s;
  if (vertical)
    result[1] += shift;
  else
    result[0] += shift;
  return result;
}

For a height-map, that’s about as good as you can get.

You can, however, sample other scalar field functions with this as well, for example sums of distances. This is not the most sophisticated implementation of marching squares, but it is reasonably simple and can easily be adapted to your needs.

Four-way Navigation in UIs

Just yesterday, I was working on the task of enabling gamepad navigation of a graphical UI. I had implemented this before in my game abstractanks but since forgotten how exactly I did it. So I opened the old code and tried to decipher it, and I figured that’d make a nice topic to write about.

Basic implementation

Let’s break down the simple version of the problem: You have a bunch of rectangular controls, and given a specific one, figure out the next one with an input of either left, up, right or down.

This sketch shows a control setup with a possible solution. It also contains an interesting situation: going ‘down’ from B box goes to C, but going up from there goes to A!

The key to creating this solution is a metric that weights the gap for a specific input direction, e.g. neighbor_metric(
box<> const& from, box<> const& to, navigation_direction direction)
. To implement this, we need to convert this gap into numbers we can use. I’ve used a variant of Arvo’s algorithm for that: For both axes, get the difference of the rectangles’ intervals along that axis and store those in a 2d-vector. In code:

template <int axis> inline float difference_on_axis(
box<> const& from, box<> const& to)
{
if (to.min[axis] > from.max[axis])
return to.min[axis] - from.max[axis];
else if (to.max[axis] < from.min[axis])
return to.max[axis] - from.min[axis];
return 0.f;
}

v2<> arvo_vector(box<> const& from, box<> const& to)
{
return {
difference_on_axis<0>(from, to),
difference_on_axis<1>(from, to) };
}

That sketch shows the resulting vectors from the box in the top-left going to two other boxes. Note that these vectors are quite different from the difference of the boxes’ centers. In the case of the two top boxes, the vector connecting the centers would tilt down slightly, while this one is completely parallel to the x axis.

Now armed with this vector, let’s look at the metric I was using. It results in a 2d ‘score’ that is later compared lexicographically to determine the best candidate: the first number determines the ‘angle’ with the selected axis, the other one the distance.

template <int axis> auto metric_on_axis(box<> const& from, box<> const& to)
{
auto delta = arvo_vector(from, to);
delta[0] *= delta[0];
delta[1] *= delta[1];
auto square_distance = delta[0] + delta[1];

float cosine_squared = delta[axis] / square_distance;
return std::make_pair(-cosine_squared, delta[axis]);
}

std::optional<std::pair<float, float>> neighbor_metric(
box<> const& from, box<> const& to, navigation_direction direction)
{
switch (direction)
{
default:
case navigation_direction::right:
{
if (from.max[0] >= to.max[0])
return {};
return metric_on_axis<0>(from, to);
}
case navigation_direction::left:
{
if (from.min[0] <= to.min[0])
return {};
return metric_on_axis<0>(from, to);
}
case navigation_direction::up:
{
if (from.max[1] >= to.max[1])
return {};
return metric_on_axis<1>(from, to);
}
case navigation_direction::down:
{
if (from.min[1] <= to.min[1])
return {};
return metric_on_axis<1>(from, to);
}
}
}

In practice this means that the algorithm will favor connections that best align with the input direction, while ties resolved by using the closest candidate. The metric ‘disqualifies’ candidates going backward, e.g. when going right, the next box cannot start left of the from box.

Now we just need to loop through all candidates and the select the one with the lowest metric.

This algorithm does not make any guarantees that all controls will be accessible, but that is a property that can easily be tested by traversing the graph induced by this metric, and the UI can be designed appropriately. It also does not try to be symmetric, e.g. going down then up does not always result in going back to the previous control. As we can see in the first sketch, this is not always desirable. I think it’s nice to be able to go from B to C via ‘down’, but I’d be weird to go ‘up’ back there instead of A. Instead, going ‘right’ to B does make sense.

Hard cases

But there can be ambiguities that this algorithm does not quite solve. Consider the case were C is wider, so that is is also under B:

The algorithm will connect both A and B down to C, but the metric will be tied for A and B going up from C. The metric could be extended to also include the ‘cross’ axis min-point of the box, e.g. favoring left over right for westerners like me. But going from B down to C and then up to A would feel weird. One idea to resolve this is to use the history to break ties, e.g. when coming from B to C, going back up would go back to C.

Another hard case is scroll-views. In fact, they seem to change the problem domain. Instead of treating the inputs as boxes in a flat plane, navigating in a scroll view requires to navigate to potentially only partially visible or even invisible boxes and bringing them into view. I’ve previously solved this by treating every scroll-view as its own separate plane and navigating only within that if possible. Only when no target is found within the scroll-view, did the algorithm try to navigate to items outside.

My Favorite Pattern

It has become somewhat of an internal meme that I do not like it when programmers use the word “wrapper”. When someone does say it, I usually get a cue from one of the others to start complaining about it. Do not get me wrong, though. I am very much in favor of wrapping things, but with purpose. And my favorite one is the façade.

When simple becomes complex

Many times, APIs start out simple and elegant. This usually works for a while and the API gets used a lot precisely because of its beauty and simplicity. But eventually, a new use case comes along that demands more of the API than it can currently serve. It has to be extended. This usually takes the form of an additional method or function parameter, or an additional function that needs to be called. Using the API now becomes more complex all its users.

Do not underestimate this effect. I have only anecdotal evidence, but in my experience, a lot of unnecessary software complexity can be attributed to this1. The Pareto-Principle applies here: A single use case causes all the users of the previously simple API to deal with new complexity (e.g. 10% of the use cases cause 90% of the complexity in the user-/call-sites).

Façades make it look beautiful

Luckily, it can be dealt with beautifully: using the façade pattern. This pattern abstracts a complex API behind a simple API. The trade-off, of course, is that it is less powerful than the “full API”. In our example though, all of the previous use-cases can keep using the simple API via a façade.

When to apply this

The aforementioned example, extending an API, is a very nice opportunity to apply the façade. Just keep the interface of the old API around, and re-implement it using the new, extended API, which is usually created by modifying the old API’s implementation. Now all the old call-sites can stay the same, yet you can have a more powerful API for those rare cases that need it.

Of course, you can also identify common usage patterns and refactor them using a façade, but that’s usually much harder to do.

What exactly are façades made of?

Façades do not hide the more complex API in the sense that the APIs users are not allowed to use it. Yes, façades make APIs look beautiful, but that is where the metaphor ends. You can still access what is behind the façade. You can even write more façades for the behind. Many APIs have multiple common cases and only very few complex ones.

So… Classes? Functions? Data? Any of those, in fact. Whenever you enable writing something in a simpler way for a common case, you have a façade . Very often, a small function with a simple signature is all the façade you need.

But it makes all the difference.

Now can someone please tell me what that little hook under the c is called?

  1. Façades can, of course, also contribute to creating complexity by growing the codebase and creating ‘variants’. But they rarely do. ↩︎

Finding refactoring candidates using reflection

If some of your types are always used together, that is probably a sign that you are missing an abstraction that bundles them. For example, if I always see the types Rectangle and Color together, it’s probably a good idea to create a ColoredRectangle class that combines the two. However, these patterns tend to emerge over time, so it’s hard to actually find them manually.

Reflection can help find these relationships between types. For example, you can look at all the function/method parameter lists in your code and mark all types appearing there as ‘being used together’. Then count how often these tuples appear, and you might have a good candidate for refactoring.

Here’s how to do that in C#. First pick a few assemblies you want to analyze. One way to get them is using Assembly.GetAssembly(typeof(SomeTypeFromYourAssembly)). Then get all the methods from all the types:

IEnumerable<MethodInfo> GetParameterTypesOfAllMethods(IEnumerable<Assembly> assemblies)
{
  var flags = BindingFlags.Instance | BindingFlags.Static | BindingFlags.Public
    | BindingFlags.NonPublic | BindingFlags.DeclaredOnly;
  foreach (var assembly in assemblies)
  {
    foreach (var type in assembly.GetTypes())
    {
      foreach (var method in type.GetMethods(flags))
      {
        yield return method;
      }
    }
  }
}

The flags are important: the default will not include NonPublic and DeclaredOnly. Without those, the code will not report private methods but give you methods from base classes that we do not want here.

Now this is where things become a little more muddy, and specific to your application. I am skipping generated methods with “IsSpecialName”, and then I’m only looking at non-generic class parameters:

foreach (var method in GetParameterTypesOfAllMethods(assemblies))
{
  if (method.IsSpecialName)
    continue;

  var parameterList = method.GetParameters();

  var candidates = parameterList
      .Select(x => x.ParameterType)
      .Where(x => !x.IsGenericParameter)
      .Where(x => x.IsClass);

  /* more processing here */
}

Then I convert the types to a string using ToString() to get a nice identifier that includes filled generic parameters. I sort and join the type ids to get a key for my tuple and count the number of appearances in a Dictionary<string, int>:

var candidateNames = candidates
    .Select(x => x.ToString())
    .OrderBy(x => x)
    .ToList();

if (candidateNames.Count <= 1)
  continue;

if (candidateNames.Any(string.IsNullOrWhiteSpace))
  continue;

var key = string.Join(",", candidateNames);

if (!lookup.ContainsKey(key))
{
  lookup.Add(key, 1);
}
else
{
  lookup[key]++;
}

Once that is done, you can sort the resulting lookup, print out all the tuples, and see if there are any good candidates.

There’s much room for improvement with a method like this. For example, skipping non-class types is a pretty arbitrary choice. And you will not find new tuples built from built-in types this way. However, because those types offer very little semantic by themselves, it can be hard to correlate multiple occurrences simply by their types.

Even better automated instance construction in C++

In the previous articles on automated instance construction (first and second) I showed how you can use constructor-argument deduction to automatically do dependency injection. While that approach worked nicely in general, one little detail was still nagging me: Since construction of the actual objects happens at the end of a recursion, the stack depth in some of those construction could get quite deep. In fact there are an additional Maxactual number of c’tor parameters functions on the stack before the c’tor is called. This effect is even worse when resolving long dependency chains, were those functions are there for each of the dependencies currently being resolved.

The previous code uses an std::index_sequence of the exactly the right length to inject the same number of mimic parameters that are then used to locate dependencies. If we knew the right length, there wouldn’t have to be any recursion around the construction. And that’s actually easy to refactor out, we can just figure out the std::index_sequence first and return, and then use it outside of the recursion:

template <class T, std::size_t Head, std::size_t... Rest>
constexpr auto
injection_parameter_sequence(std::index_sequence<Head, Rest...>,
  decltype(T{ mimic<T>{ Head }, mimic<T>{ Rest }... })* = nullptr)
{
  return std::index_sequence<Head, Rest...>{};
}

template <class T>
constexpr auto injection_parameter_sequence(std::index_sequence<>)
{
  return std::index_sequence<>{};
}

template <class T, std::size_t... Rest>
constexpr auto
injection_parameter_sequence(std::index_sequence<Rest...>)
{
  return injection_parameter_sequence<T>(std::make_index_sequence<sizeof...(Rest) - 1>{});
}

Starting with a “long” index sequence, this overload set returns the smaller index sequence for the construction. We can use a small tool function to actually create the instance:

template <class T, std::size_t... Params>
constexpr auto make_unique_injected_with_sequence(service_provider const& p, std::index_sequence<Params...>)
{
  return std::make_unique<T>(mimic<T>(p, Params)...);
}

Which can be called like this:

template <class T, std::size_t Max = 16> auto make_unique_injected(service_provider const& p)
{
  return make_unique_injected_with_sequence<T>(p,
    injection_parameter_sequence<T>(std::make_index_sequence<Max>{}));
}

Only these last two function will be added to the call stack for each constructor call, which is not a whole lot. This construction has the additional advantage that only these two need to be changed to support different kinds construction, e.g. using std::make_shared instead of std::make_unique.

Have we made things too easy?

One of the old mantras for API design is “Make doing the right thing easy and the wrong thing hard”. This, of course, applies to much broader topics as well, such as software development or UX.

For software development specifically, are we maybe making “doing the wrong thing” too easy as well? Here are a two examples:

Web Requests

In the old times, requesting data from a web server required first setting up the request, sending it, and then getting the result back to your application either via polling or callbacks. Dave Mark once adequately called this solving the “waiting problem”. It was cumbersome, to say the least. It was clear that making such a request was something to be avoided. You did it when you had to, but you avoided setting up too many different kinds of requests implictly.

Nowadays, with the advent anonymous functions/lambdas in most mainstream programming languages, continuations became the new way handle these things: do_request(...).then(result -> ...) This already made this a lot easier. And even better, now we have some form of coroutines in many languages were you can just do result = await do_request(...). It even looks almost like a normal function call.

With this, programmers can just do requests one after the other. Need one thing from a server? Do one request. Need ten things from a server? Do ten requests. Of course, this is horribly wasteful: each request will incur the full overhead of http/https and a server roundtrip. In the old times, doing the request was painful, so you automatically looked for ways to avoid doing more, and bundle your asks into one request, argueable leading to a better program.

Dependencies

Before nice package-managers where a thing, handling dependencies was a huge pain. You would have to manually get, unpack, configure and install the dependency for each developer and/or consumer system. As a consequence, libraries were big and often duplicated foundational things. But it also caused developers carefully grooming their library selections.

Now with package managers, libraries have started to become small. Duplication within libraries certainly seems to have decreased, and the average library size has decreased. But this also caused developers to be much less cautious when adopting a dependency, with package managers handling thousands of dependencies that no one developer can possibly have a full understanding of. And this then leads to things like the leftpad disaster.

Better or worse?

I am pretty sure that both having nice abstractions to deal with asynchronicity and package managers are good things. But if they make certain things too easy, how can we deal with that? The only thing I can currently think of is figuratively sticking warning-labels on these things during review time, but because those things are now so easy and subtle, it is also easy to miss them.

Are there other examples were we maybe made the wrong thing too easy? Do you have any ideas how to deal with this problem?

Unit-Testing Deep-Equality in C#

In the suite of redux-style applications we are building in C#, we are making extensive use of value-types, which implies that a value compares as equal exactly if all of its contents are equal also known as “deep equality”, as opposed to “reference equality” or “shallow equality”. Both of those imply deep equality, but the other way around is not true. The same object is of course equal to itself, not matter how deep you look. And an object that references the same data as another object also has equal content. But a simple object that contains different lists with equal content will be unequal under shallow comparison, but equal under deep comparison.

Though init-only records already provide a per-member comparison as Equals be default, this fails for collection types such as ImmutableList<> that, against all intuition but in accordance to , only provide reference-equality. For us, this means that we have to override Equals for any value type that contains a collection. And this is were the trouble starts. Once Equals is overridden, it’s extremely easy to forget to also adapt Equals when adding a new property. Since our redux-style machinery relies on a proper “unequal”, this would manifest in the application as a sporadically missing UI update.

So we devised a testing strategy for those types, using a little bit of reflection:

  1. Create a sample instance of the value type with no member retaining its default value
  2. Test, by going over all properties and comparing to the same property in a default instance, if indeed all members in the sample are non-default
  3. For each property, run Equals the sample instance to a modified sample instance with that property set to the value from a default instance.

If step 2 fails, it means there’s a member that’s still at its default value in the sample instance, e.g. the test wasn’t updated after a new property was added. If step 3 fails, the sample was updated, but the new property is not considered in Equals – and it can even tell which property is missing.

The same problems of course arise with GetHashCode, but are usually less severe. Forgetting to add a property just makes collisions more likely. It can be tested much in the same way, but can potentially lead to false positives: collisions can occur even if all properties are correctly considered in the function. In that case, however, the sample can usually be altered to remove the collision – and it is really unlikely. In fact, we never had a false positive.

When laziness broke my code

I was just integrating a new task-graph system for a C# machine control system when my tests started to go red. Note that the tasks I refer to are not the same as the C# Task implementation, but the broader concept. Task-graphs are well known to be DAGs, because otherwise the tasks cannot be finished. The general algorithm to execute a task-graph like this is called topological sorting, and it goes like this:

  1. Find the number of dependencies (incoming edges) for each task
  2. Find the tasks that have zero dependencies and start them
  3. For any finished tasks, decrement the follow-up tasks dependency count by one and start them if they reach zero.

The graph that was failed looked like the one below. Task A was immediately followed by a task B that was followed by a few more tasks.

I quickly figured out that the reason that the tests were failing was that node B was executed twice. Looking at the call-stack for both executions, I could see that the first time B was executed was when A was completed. This is correct as per step 3 in the algorithm. However, the second time it was started was directly from the initial Run method that does the work from step 2: Starting the initial tasks that are not being started recursively. I was definitely not calling Run twice, so how did that happen?

public void Run()
{
    var ready = tasks
        .Where(x => x.DependencyCount == 0);

    StartGroup(ready);
}

Can you see it? It is important to note that many of the tasks in this graph are asynchronous. Their completion is triggered by an IObserver, a C# Task completing or some other event. When the event is processed, StartGroup is used to start all tasks that have no more dependencies. However, A was no such task, it was synchronous, so the StartGroup({B}) call happened while Run was still on the stack.

Now what happened was that when A (instantly!) completed, it set the DependencyCount of B to 0. Since ready in the code snippet is lazily evaluated from within StartGroup, the ‘contents’ actually change while StartGroup is running.

The fix was adding a .ToList after the .Where, a unit test that checked that this specifically would not happen again, and a mental note that lazy evaluation can be deceiving.

Improving my C++ time queue

Another code snippet that can be found in a few of my projects is the “time queue”, which is a simple ‘priority queue’ style data structure that I use to defer actions to a later time.

With this specific data structure, I have multiple implementations that clearly came from the same source. One indicator for that is a snarky comment in both about how std::list is clearly not the best choice for the underlying data structure. They have diverged a bit since then though.

Requirements

In my use case not use time points, but only durations in standard-library nomenclature. This is a pretty restrictive requirement, because otherwise any priority queue (e.g. from boost or even from the standard library) can be used quite well. On the other hand, it allows me to use floating-point durations with predictable accuracy. The queue has two important functions:

  1. insert to insert a timeout duration and a payload.
  2. tick is called with a specific duration and then reports the payloads that have timed out since their insertions.

Typically tick is called a lot more frequently than insert, and it should be fast. The payload is typically something like a std::function or an id for a state-machine that needs to be pulsed.

The basic idea is to only keep the duration difference to the previous item in the list. Only the first item keeps its total timeout. This way, when tick is called, usually only the first item needs to be updated. tick only has to touch more items when they time out.

Simple Implementation

One of the implementations for void insert(TimeType timeout, PayloadType payload) looks like this:

if (tick_active_)
{
  deferred_.push_back({ .remaining = after, .payload = std::move(payload) });
  return;
}

auto i = queue_.begin();
for (; i != queue_.end() && timeout > i->remaining; ++i)
  timeout -= i->remaining;

if (i != queue_.end())
  i->remaining -= timeout;

queue_.insert(i, { .remaining = after, .payload = std::move(payload) });

There is a special case there that guards against inserting into queue_ (which is still a very bad std::list) by instead inserting into deferred_ (which is a std::vector, phew). We will see why this is useful in the implementation for template void tick(TimeType delta, Executor execute):

tick_active_ = true;
auto i = queue_.begin();
for (; i != queue_.end() && delta >= i->remaining; ++i)
{
  delta -= i->remaining;
  execute(i->payload);
}

if (i != queue_.end())
  i->remaining -= delta;

queue_.erase(queue_.begin(), i);
tick_active_ = false;

while (!deferred_.empty())
{
  auto& entry = deferred_.back();
  insert(entry.remaining, std::move(entry.payload));
  deferred_.pop_back();
}

The timed out items are reported via a callback that is supplied as Executor execute. Of course, these can do anything, including inserting new items, which can invalidate the iterator. This is a common use case, in fact, as many deferred actions will naturally want follow ups (let’s ignore for the moment that the implementation is nowhere near exception safe…). The items that were deferred to deferred_ in insert get added to queue_ after the iteration is complete.

This worked well enough to ship, but the other implementation had another good idea. Instead of reporting the timed-out items to a callback, it just returned them in a vector. The whole tick_active_ guard becomes unnecessary, as any processing on the returned items is naturally deferred until after the iteration:

std::vector<PayloadType> tick(TimeType delta)
{
  std::vector<PayloadType> result;
  auto i = queue_.begin();
  for (; i != queue_.end() && delta >= i->remaining; ++i)
  {
    delta -= i->remaining;
    result.push_back(i->payload);
  }

  if (i != queue_.end())
    i->remaining -= delta;

  queue_.erase(queue_.begin(), i);
  return result;
}

This solves the insert-while-tick problem, and lets us use the result neatly in a range-based for-loop like this: for (auto const& payload : queue.tick(delta)) {}. Which I personally always find a little bit nicer than inversion-of-control. However, the cost is at least one extra allocation for timed-out items. This might be acceptable, but maybe we can do better for very little extra complexity.

Return of the second list

Edit: The previous version of this article tried to keep the timed-out items at the beginning of the vector before returning them as a std::span. As commenter Steffen pointed out, this again prevents us from inserting while iterating on the result, as any insert might invalidate the backing-vector.

We can get rid of the allocation for most of the tick calls, even if they return a non-empty list. Remember that a std::vector does not deallocate its capacity even when it’s cleared unless that is explicitly requested, e.g. via shrink_to_fit. So instead of returning a new vector each time, we’re keeping one around for the timed out items and return a const-ref to it from tick:

std::vector<PayloadType> const& tick(TimeType delta)
{
  timed_out_.clear();
  auto i = queue_.begin();
  for (; i != queue_.end() && delta >= i->remaining; ++i)
  {
    delta -= i->remaining;
    timed_out_.push_back(std::move(i->payload));
  }

  if (i != queue_.end())
    i->remaining -= delta;

  queue_.erase(queue_.begin(), i);
  return timed_out_;
}

This solution is pretty similar to the deferred list from the first version, but instead of ‘locking’ the main list while iterating, we’re now separating the items we’re iterating on.

Simple abstractions are good abstractions

I think that a lot of accidental complexity in software is produced by not picking the simplest abstraction for the job. Let me lead with an example: Consider this code from a code generator that generates C++ code:

std::ostringstream extra_properties;
if (!attribute.unit.empty())
{
  extra_properties << fmt::format("\n      properties.set_unit(\"{0}\");", attribute.unit);
}
if (!attribute.min_value.empty())
{
  extra_properties << fmt::format("\n      properties.set_min_value(\"{0}\");", attribute.min_value);
}
if (!attribute.max_value.empty())
{
  extra_properties << fmt::format("\n      properties.set_max_value(\"{0}\");", attribute.max_value);
}

It has a lot of ugly duplication: basically everything but the method names and values. So, how do we get rid of the duplication? Just a couple of years ago, I would probably have used a function for that:

void property_snippet(std::ostringstream& str, std::string const& method_name, std::string const& value)
{
  if (value.empty())
    return;
  str << fmt::format("\n      properties.{0}(\"{1}\");", method_name, value);
}

And then turn the call site code into:

property_snippet(extra_properties, "set_unit", attribute.unit);
property_snippet(extra_properties, "set_min_value", attribute.min_value);
property_snippet(extra_properties, "set_max_value", attribute.max_value);

Back then, I would have said that this is a definite improvement, but nowadays I am not so sure anymore. The call-site is a lot more concise, but we still have about half its code duplicated: the first half of each line. The additional function adds lots of complexity that is not necesarily offset by the gain at the call-site: the declaration with all the parameters. And the code gets separated, which is only really good if the function does a little bit more than this one.

This variant can, however, be made simpler with lambdas that capture extra_properties instead of passing it each time. While that is a better solution, I would argue that function objects and capturing are not necessarily simple either, so this only makes second place.

Nowdays, my first go-to abstraction is an in-place list and a loop:

std::tuple<char const*, std::string> methods_and_values[] = {
  {"set_unit", attribute.unit},
  {"set_min_value", attribute.min_value},
  {"set_max_value", attribute.max_value},
};

for (auto [method_name, value] : methods_and_values)
{
  if (value.empty())
    continue;
  extra_properties << fmt::format("\n      properties.{0}(\"{1}\");", method_name, value);
}

For me, this has the added benefit that is clearly separates the ‘inert’ data part of the code and the ‘active’ transformation. While this example is C++, this works in almost languages that I know of, even such arcane beasts as Xbase++.