Improving my C++ time queue

Another code snippet that can be found in a few of my projects is the “time queue”, which is a simple ‘priority queue’ style data structure that I use to defer actions to a later time.

With this specific data structure, I have multiple implementations that clearly came from the same source. One indicator for that is a snarky comment in both about how std::list is clearly not the best choice for the underlying data structure. They have diverged a bit since then though.

Requirements

In my use case not use time points, but only durations in standard-library nomenclature. This is a pretty restrictive requirement, because otherwise any priority queue (e.g. from boost or even from the standard library) can be used quite well. On the other hand, it allows me to use floating-point durations with predictable accuracy. The queue has two important functions:

  1. insert to insert a timeout duration and a payload.
  2. tick is called with a specific duration and then reports the payloads that have timed out since their insertions.

Typically tick is called a lot more frequently than insert, and it should be fast. The payload is typically something like a std::function or an id for a state-machine that needs to be pulsed.

The basic idea is to only keep the duration difference to the previous item in the list. Only the first item keeps its total timeout. This way, when tick is called, usually only the first item needs to be updated. tick only has to touch more items when they time out.

Simple Implementation

One of the implementations for void insert(TimeType timeout, PayloadType payload) looks like this:

if (tick_active_)
{
  deferred_.push_back({ .remaining = after, .payload = std::move(payload) });
  return;
}

auto i = queue_.begin();
for (; i != queue_.end() && timeout > i->remaining; ++i)
  timeout -= i->remaining;

if (i != queue_.end())
  i->remaining -= timeout;

queue_.insert(i, { .remaining = after, .payload = std::move(payload) });

There is a special case there that guards against inserting into queue_ (which is still a very bad std::list) by instead inserting into deferred_ (which is a std::vector, phew). We will see why this is useful in the implementation for template void tick(TimeType delta, Executor execute):

tick_active_ = true;
auto i = queue_.begin();
for (; i != queue_.end() && delta >= i->remaining; ++i)
{
  delta -= i->remaining;
  execute(i->payload);
}

if (i != queue_.end())
  i->remaining -= delta;

queue_.erase(queue_.begin(), i);
tick_active_ = false;

while (!deferred_.empty())
{
  auto& entry = deferred_.back();
  insert(entry.remaining, std::move(entry.payload));
  deferred_.pop_back();
}

The timed out items are reported via a callback that is supplied as Executor execute. Of course, these can do anything, including inserting new items, which can invalidate the iterator. This is a common use case, in fact, as many deferred actions will naturally want follow ups (let’s ignore for the moment that the implementation is nowhere near exception safe…). The items that were deferred to deferred_ in insert get added to queue_ after the iteration is complete.

This worked well enough to ship, but the other implementation had another good idea. Instead of reporting the timed-out items to a callback, it just returned them in a vector. The whole tick_active_ guard becomes unnecessary, as any processing on the returned items is naturally deferred until after the iteration:

std::vector<PayloadType> tick(TimeType delta)
{
  std::vector<PayloadType> result;
  auto i = queue_.begin();
  for (; i != queue_.end() && delta >= i->remaining; ++i)
  {
    delta -= i->remaining;
    result.push_back(i->payload);
  }

  if (i != queue_.end())
    i->remaining -= delta;

  queue_.erase(queue_.begin(), i);
  return result;
}

This solves the insert-while-tick problem, and lets us use the result neatly in a range-based for-loop like this: for (auto const& payload : queue.tick(delta)) {}. Which I personally always find a little bit nicer than inversion-of-control. However, the cost is at least one extra allocation for timed-out items. This might be acceptable, but maybe we can do better for very little extra complexity.

Return of the second list

Edit: The previous version of this article tried to keep the timed-out items at the beginning of the vector before returning them as a std::span. As commenter Steffen pointed out, this again prevents us from inserting while iterating on the result, as any insert might invalidate the backing-vector.

We can get rid of the allocation for most of the tick calls, even if they return a non-empty list. Remember that a std::vector does not deallocate its capacity even when it’s cleared unless that is explicitly requested, e.g. via shrink_to_fit. So instead of returning a new vector each time, we’re keeping one around for the timed out items and return a const-ref to it from tick:

std::vector<PayloadType> const& tick(TimeType delta)
{
  timed_out_.clear();
  auto i = queue_.begin();
  for (; i != queue_.end() && delta >= i->remaining; ++i)
  {
    delta -= i->remaining;
    timed_out_.push_back(std::move(i->payload));
  }

  if (i != queue_.end())
    i->remaining -= delta;

  queue_.erase(queue_.begin(), i);
  return timed_out_;
}

This solution is pretty similar to the deferred list from the first version, but instead of ‘locking’ the main list while iterating, we’re now separating the items we’re iterating on.

Simple abstractions are good abstractions

I think that a lot of accidental complexity in software is produced by not picking the simplest abstraction for the job. Let me lead with an example: Consider this code from a code generator that generates C++ code:

std::ostringstream extra_properties;
if (!attribute.unit.empty())
{
  extra_properties << fmt::format("\n      properties.set_unit(\"{0}\");", attribute.unit);
}
if (!attribute.min_value.empty())
{
  extra_properties << fmt::format("\n      properties.set_min_value(\"{0}\");", attribute.min_value);
}
if (!attribute.max_value.empty())
{
  extra_properties << fmt::format("\n      properties.set_max_value(\"{0}\");", attribute.max_value);
}

It has a lot of ugly duplication: basically everything but the method names and values. So, how do we get rid of the duplication? Just a couple of years ago, I would probably have used a function for that:

void property_snippet(std::ostringstream& str, std::string const& method_name, std::string const& value)
{
  if (value.empty())
    return;
  str << fmt::format("\n      properties.{0}(\"{1}\");", method_name, value);
}

And then turn the call site code into:

property_snippet(extra_properties, "set_unit", attribute.unit);
property_snippet(extra_properties, "set_min_value", attribute.min_value);
property_snippet(extra_properties, "set_max_value", attribute.max_value);

Back then, I would have said that this is a definite improvement, but nowadays I am not so sure anymore. The call-site is a lot more concise, but we still have about half its code duplicated: the first half of each line. The additional function adds lots of complexity that is not necesarily offset by the gain at the call-site: the declaration with all the parameters. And the code gets separated, which is only really good if the function does a little bit more than this one.

This variant can, however, be made simpler with lambdas that capture extra_properties instead of passing it each time. While that is a better solution, I would argue that function objects and capturing are not necessarily simple either, so this only makes second place.

Nowdays, my first go-to abstraction is an in-place list and a loop:

std::tuple<char const*, std::string> methods_and_values[] = {
  {"set_unit", attribute.unit},
  {"set_min_value", attribute.min_value},
  {"set_max_value", attribute.max_value},
};

for (auto [method_name, value] : methods_and_values)
{
  if (value.empty())
    continue;
  extra_properties << fmt::format("\n      properties.{0}(\"{1}\");", method_name, value);
}

For me, this has the added benefit that is clearly separates the ‘inert’ data part of the code and the ‘active’ transformation. While this example is C++, this works in almost languages that I know of, even such arcane beasts as Xbase++.

Writing windows daemons in C++20

One little snippet I’ve found myself reusing surprisingly often is how to write a daemon program with graceful shutdown in windows. To recap, a daemon is a program that sits and does ‘background work’ until it is explicitly shut down by the user. For my purposes, it is also a console program. Like this one:

int main(int argn, char** argv)
{
  while (true)
  {
    std::cout << "ping!" << std::endl;
    std::this_thread::sleep_for(100ms);
  }
  std::cout << "shutdown!" << std::endl;
  return EXIT_SUCCESS;
}

If you run this program, it will, of course, continuously print “ping!”. And you can kill it by entering ctrl+C on the console. But the shutdown will not be graceful: “shutdown!” will not be printed. It’ll just look like this:

ping!
ping!
ping!
^C

C++20 introduced std::stop_source and std::stop_token, which help to implement a graceful shutdown. We’ll use the following code:

'namespace
{
static std::stop_source exit_source;
static std::atomic<bool> main_exited = false;
static bool already_registered = false;

static void atexit_handler()
{
  main_exited = true;
}

BOOL control_handler(DWORD Type)
{
  switch (Type)
  {
  case CTRL_C_EVENT:
  case CTRL_CLOSE_EVENT:
    exit_source.request_stop();

    while (!main_exited)
      Sleep(10);

    return TRUE;
    // Pass other signals to the next handler.
  default:
    return FALSE;
  }
}
} // namespace

std::stop_token register_exit_signal()
{
  if (!already_registered)
  {
    if (!SetConsoleCtrlHandler((PHANDLER_ROUTINE)control_handler, TRUE))
      throw std::runtime_error("Unable to register control handler");

    atexit(&atexit_handler);
    already_registered = true;
  }
  return exit_source.get_token();
}'namespace
{
static std::stop_source exit_source;
static std::atomic<bool> main_exited = false;
static bool already_registered = false;

static void atexit_handler()
{
  main_exited = true;
}

BOOL control_handler(DWORD Type)
{
  switch (Type)
  {
  case CTRL_C_EVENT:
  case CTRL_CLOSE_EVENT:
    exit_source.request_stop();

    while (!main_exited)
      Sleep(10);

    return TRUE;
    // Pass other signals to the next handler.
  default:
    return FALSE;
  }
}
} // namespace

std::stop_token register_exit_signal()
{
  if (!already_registered)
  {
    if (!SetConsoleCtrlHandler((PHANDLER_ROUTINE)control_handler, TRUE))
      throw std::runtime_error("Unable to register control handler");

    atexit(&atexit_handler);
    already_registered = true;
  }
  return exit_source.get_token();
}

You’re going to have to include both <stop_token> and <Window.h> for this. Now we can adapt our daemon loop slightly:

int main(int argn, char** argv)
{
  auto token = register_exit_signal(); // <-- register the exit signal here
  while (!token.stop_requested()) // ... and test the current state here
  {
    std::cout << "ping!" << std::endl;
    std::this_thread::sleep_for(100ms);
  }
  std::cout << "shutdown!" << std::endl;
  return EXIT_SUCCESS;
}

Note that this requires cooperatively handling the shutdown. But now the output correctly prints “shutdown” when killed with ctrl+C.

ping!
ping!
shutdown!

There’s linux/macOS code for this same interface too. It works by handling SIGINT/SIGTERM. But that information is somewhat easier to come by, so I’ll leave it out for brevity. Feel free to comment if you think that’d be interesting as well.

Improved automated instance construction in C++

In my last blog post, I wrote about how I am automatically deducing constructor parameters in my dependency injection container. The approach had a major drawback: It worked only for 2 or more parameters, since there was an ambiguity with copy- or move-constructors with exactly one parameter.

Right after I wrote that post, I actually found a solution to that problem in the Boost.DI FAQ, which explains how to do that in its CPPnow 2016 slides. It restricts the type conversion operator by using SFINEA on an unused template parameter. I did not even know that was possible! It defines the templated conversion operator very similar to this:

template <class T,
  class = std::enable_if_t<!std::is_same<std::remove_cvref_t<T>, Exclude>{}>>
operator T&() const
{
  return p_->get<std::remove_cvref_t<T>>();
}

Since this is a bit more involved than the bare templated conversion operator from last time, repeating it would be bad. In the last version, I used 3 helper types, the inferred_locator, mimic and the provider_wrapper, but that can be streamlined into one class:

template <typename Exclude> struct mimic
{
  mimic(std::size_t)
  {
  }

  mimic(service_provider const& p, std::size_t)
  : p_(&p)
  {
  }

  template <class T, class = std::enable_if_t<!std::is_same<std::remove_cvref_t<T>, Exclude>{}>> operator T&() const
  {
    return p_->get<std::remove_cvref_t<T>>();
  }

  service_provider const* p_{ nullptr };
};

Note that is uses some unused extra size_t parameters, which make the parameter expansion easier in the next step. Now can use that for the SFINEA in the recursive construction:

// Actual dependency injection..
template <class T, std::size_t Head, std::size_t... Rest> constexpr auto
make_injected_(service_provider const& p, std::index_sequence<Head, Rest...>,
    decltype(T{ mimic<T>{ Head }, mimic<T>{ Rest }... }) * = nullptr)
{
  return std::make_unique(mimic<T>(p, Head), mimic<T>(p, Rest)...);
}

// Trivial no-dependency case
template <class T> constexpr auto
make_injected_(service_provider const& p, std::index_sequence<>)
{
  return std::make_unique<T>();
}

// Fallback to try with fewer parameters
template <class T, std::size_t... Rest> constexpr auto make_injected_(service_provider const& p, std::index_sequence<Rest...>)
{
  return make_injected_<T>(p, std::make_index_sequence<sizeof...(Rest) - 1>{});
}

template <class T, std::size_t Max = 16> auto
make_injected(service_provider const& p)
{
  return make_injected_<T>(p, std::make_index_sequence<Max>{});
}

Just after I found this solution, my former colleague Dirk Reinbach sent me a very neat C++20 variant to restrict the conversion operator via a concept:

template <typename T, typename U>
concept not_is_same = !std::is_same_v<std::remove_cvref_t<T>, std::remove_cvref_t<U>>;

template <typename Exclude> struct mimic
{
  /* other members... */
  template <not_is_same<Exclude> T> operator T&() const
  {
    return p_->get<std::remove_cvref_t<T>>();
  }
};

This works just as well, and is more readable, too. I have not measured, but I guess it’s probably also faster to compile, since all things SFINEA are notoriously slow.

Automated instance construction in C++

I’m currently mostly switching back and forth between C# and C++ projects. One of the things that I’m missing most when switching to C++ is a nice dependency-injection (DI) library. After checking out what was already available, I finally decided I wanted to try to build my own slim type-indexed variant. I quickly started by registering factories and instances in a map on std::type_index, making it possible to both have the DI retain ownership (with std::unique_ptr) or just make a type available via a bare pointer. So I was able to do things like:

// Register an instance
di.insert_unique(std::make_unique<foo_service>());
// Register a factory
di.insert_unique([] {return std::make_unique<bar_service>());
// Register an existing bare pointer
di.insert_bare(my_bare_thingy);

// ... and retrieve them
auto& foo = di.get<foo_service>();

One of the most powerful aspects of a DI library is the ability to transitively setup dependencies. I like constructor injection the most, so I implemented a very naive way like this:

di.insert_unique([](auto& p) { return std::make_unique<complex_service>(
  p.get<base_service1>(), p.get<base_service2>(), p.get<base_service3>());
});

This is pretty verbose and we basically have to repeat all the constructor parameter types. But it’s easy to implement. We can do a little bit better by using a templated type-conversion operator and using it to call the get:

class service_provider
{
  struct inferred_locator
  {
    service_provider const* provider;
    template <class T> operator T&() const
    {
      return provider->get<std::remove_const_t<T>>();
    }
  };
  
  inferred_locator get() const
  {
    return { .provider = this };
  }
  
  /** typed get implementations here... */
};

Now we can reduce the previous registration to:

di.insert_unique([](auto& p) { 
  return std::make_unique<complex_service>(p.get(), p.get(), p.get());
});

That is basically only the number of constructor parameters in a verbose way. We could write a small template that takes the number, creates an std::index_sequence from it and then unpacks each index into an invokation of service_provider::get. But then we would still have to update registrations when adding (or removing) a new dependency to a services’s constructor. With a litte more work, we can actually get this instead:

di.insert_unique<complex_service>();

This partly inspired by Antony Polukhin’s C++ reflection talks, and combines std::index_sequence based unpacking, SFINEA and the templated type-conversion operator:

template <class T, std::size_t Head, std::size_t... Rest>
constexpr auto make_unique_impl(provider_wrapper const& p,
    std::index_sequence<Head, Rest...>,
    decltype(T{ mimic{ Head }, mimic{ Rest }... }) * = nullptr) -> std::unique_ptr<T>
{
    // This next requirement is so we do not accidentally recurse into the copy/move-ctors
    static_assert(sizeof...(Rest) + 1 > 1, "Can only deduce constructors with two or more parameters.");
    return std::make_unique<T>(p(Head), p(Rest)...);
}

template <class T, std::size_t... Rest>
constexpr auto make_unique_impl(provider_wrapper const& p, std::index_sequence<Rest...>) -> std::unique_ptr<T>
{
    // This next requirement is so we do not accidentally recurse into the copy/move-ctors
    static_assert(sizeof...(Rest) > 1, "Can only deduce constructors with two or more parameters.");
    return make_unique_impl<T>(p, std::make_index_sequence<sizeof...(Rest) - 1>{});
}

template <class T, std::size_t Max = 8> auto make_unique(service_provider const& p)
{
    return make_unique_impl<T>(provider_wrapper{ &p }, std::make_index_sequence<Max>{});
}

This uses two new types: mimic, which is only used for SFINEA, takes std::size_t on construction (for the unpacking from the std::index_sequence) and converts to anything (templated type conversion again) and the provider_wrapper, which is a simple adaptor around service_provider that takes an unused std::size_t argument (again, for unpacking). The first overload of make_unique_impl is slightly more specialized (because it has Head and Rest), so the compiler tries it first. If it works, it returns a new instance of the service we want. Otherwise, it will fail without an error due to SFINEA in the unused and defaulted third parameter. The compiler will then try the second overload, which will recurse to a variant with fewer parameters. The outermost make_unique starts this recursion with 8 parameters, because that should be enough for any sane service. I stop this recursion at one constructor parameter, even though that is a useful configuration. This is because I have not yet found a way to avoid calling the copy or move constructors accidentally. If anyone knows how to do that, I’d be very happy to hear how. My workaround right now is to explicitly register a factory in that case.

Reading a conanfile.txt from a conanfile.py

I am currently working on a project that embeds another library into its own source tree via git submodules. This is currently convenient because the library’s development is very much tied to the host project and having them both in the same CMake project cuts down dramatically on iteration times. Yet, that library already has its own conan dependencies in a conanfile.txt. Because I did not want to duplicate the dependency information from the library, I decided to pull those into my host projects requirements programmatically using a conanfile.py.

Luckily, you can use conan’s own tools for that:

from conans.client.loader import ConanFileTextLoader

def load_library_conan(recipe_folder):
    text = Path(os.path.join(recipe_folder, "libary_folder", "conanfile.txt")).read_text()
    return ConanFileTextLoader(text)

You can then use that in your stage methods, e.g.:

    def config_options(self):
        for line in load_library_conan(self.recipe_folder).options.splitlines():
            (key, value) = line.split("=", 2)
            (library, option) = key.split(":", 2)
            setattr(self.options[library], option, value)

    def requirements(self):
        for x in load_library_conan(self.recipe_folder).requirements:
            self.requires(x)

I realize this is a niche application, but it helped me very much. It would be cool if conan could delegate into subfolders natively, but I did not find a better way to do this.

Metal in C++ with SDL2

Metal, Cupertino’s own graphics API, is sort of a middle-ground in complexity between OpenGL and Vulkan. I’ve wanted to try it for a while, but the somewhat tight integration into Apple’s ecosystem (ObjectiveC/Swift and XCode) has so far prevented that. My graphics projects are usually using C++ and CMake, so I wanted a solution that worked with that. Apple released Metal-cpp last year and newer SDL2 versions (since 2.0.14) can create a window that supports drawing to it with metal. Here’s how to weld that together (with minimal ObjectiveC).

metal-cpp

I get the metal-cpp code from the linked website (the download is at step 1). I add a library in CMake that builds a single source file that compiles the metal-cpp implementation with the ??_PRIVATE_IMPLEMENTATION macros as described on the page (see step 3). That target also exports the includes to be used later.

SDL window and view

Next, I use conan to install SDL2. After SDL_Init, I call SDL_CreateWindow to create my window. I do not specify SDL_WINDOW_OPENGL (or in the SDL_CreateWindow‘s flags, or next step will fail. After that, I use SDL_Metal_CreateView from SDL_metal.h to create a metal view. This is where things get a little bit icky. I create a metal device using MTL::CreateSystemDefaultDevice(); but I still need to assign it to the view I just created. I’m doing that in ObjectiveC++. In a new .mm file I add a small function to do that:

void assign_device(void* layer, MTL::Device* device)
{
  CAMetalLayer* metalLayer = (CAMetalLayer*) layer;
  metalLayer.device = (__bridge id<MTLDevice>)(device);
}

I use a small .h file to expose this function to my C++ code like any other free function. There’s another helper I create in the .mm file:

CA::MetalDrawable* next_drawable(void* layer)
{
  CAMetalLayer* metalLayer = (CAMetalLayer*) layer;
  id <CAMetalDrawable> metalDrawable = [metalLayer nextDrawable];
  CA::MetalDrawable* pMetalCppDrawable = ( __bridge CA::MetalDrawable*) metalDrawable;
  return pMetalCppDrawable;
}

At the beginning of each frame, I use that together with SDL_Metal_GetLayer to get a texture to render to:

auto surface = next_drawable(SDL_Metal_GetLayer(view));

Next I create a render pass descriptor that starts by clearing that drawable with our fancy red:

MTL::ClearColor clear_color(152.0/255.0, 23.0/255.0, 42.0/255.0, 1.0);
auto pass_descriptor = MTL::RenderPassDescriptor::alloc()->init();
auto attachment = pass_descriptor->colorAttachments()->object(0);
attachment->setClearColor(clear_color);
attachment->setLoadAction(MTL::LoadActionClear);
attachment->setTexture(surface->texture());

And fire that off to the GPU using a command buffer and render encoder:

auto buffer = queue->commandBuffer();
auto encoder = buffer->renderCommandEncoder(pass_descriptor);
encoder->endEncoding();
buffer->presentDrawable(surface);
buffer->commit();

There you have it, a minimal running metal application. Still a long ways from the traditional “Hello Triangle”, but most metal examples that show how to do that can easily be translated to the C++ API. Note that you probably have to take some extra steps to compile metal shaders (aka MSL). You can either load them from source or precompile them using the command line tools.

My favorite C++20 feature

As I evolved my programming style away from mutating long-lived “big” objects and structures and towards are more functional and data-oriented style based mainly on pure functions, I also find myself needing a lot more structs. These naturally occur as return types for functions with ‘richer’ output if you do not want to use std::tuple or other ad-hoc types everywhere. If you see a program as a sequence of data-transformations, I guess the structs are the immediate representations encoded in the type system.

Let my first clarify what I mean by structs, as opposed to what the language says: A type that has all public data members, obeys the rule of zero, and is valid in any configuration. A typical struct v3 { float x{},y{},z{};}; 3d vector is a struct, std::vector is not.

These types are great. You can copy them around, use them with structured binding, they correctly propagate constness, and they are a great fit to pass them through layers of functions calls. And, when used as function parameters, they are great for evolving your program over time, because you can just change the single struct, as opposed to every function call that uses this parameter combination. Or you can easily batch, or otherwise ‘delay’, calls by recording the function parameters. Just throw the parameters into a container and execute the code later.

And with C++20, they got even better, because now you can use them with my favorite new feature: designated initializers, which allows you to use the member names at the initialization site and use RAII. E.g., for a struct that symbolizes an http request: struct http_request { http_method method; std::string url; std::vector<header_entry> headers; }; You can now initialize it like this:

auto request = http_request{
  .method = http_method::get,
  .uri = "localhost:7634",
  .headers = { { .name = "Authorization", .value = "Bearer TOKEN" } },
};

You can even use this directly as a parameter without repeating the type name, de facto giving your named parameters for a pair of extra curlys:

run_request({
    .method = http_method::get,
    .uri = "localhost:7634",
    .headers = { { .name = "Authorization", .value = "Bearer TOKEN" } },
});

You can, of course, combine this named-parameter style-struct with other function parameters in your API, but like with lambdas, I think they are most readable as the last parameter. Hence, also like with lambdas, you probably never want to have more than one at each call-site. I’m very happy with this new feature and it’s already making the code using my APIs a lot more readable.

return first example

It seems my “return first” post was not as enlightening as I had hoped. It was posted on reddit, and while the majority of commenters completely missed the point, it wasn’t really clear for those that did not just read the title. Either way, I am to blame for that – the examples and my reasoning were not very conclusive. So let me try clearing up the confusion with a better example.

First things first, here’s the mantra again: Whenever you want to call a function, ask yourself:

Can I return first?

But now to the example:

Parsing array braces

The task was to parse a string with a data-type in it. This was already working for single-value types, so we could parse "int", "double", "string" etc, via the function from_input_type. Now I was to extend it to also parse array definitions with one or two fixed dimensions, like "int[5]" or "double[4,7]".

My first attempt, implementing it as a constructor taking the definition string, looked like this:

auto suffix_begin = type_code.find('[');
if (suffix_begin == std::string::npos)
{
  this->type = from_input_type(type_code);
  return;
}

auto suffix_end = type_code.find(']', suffix_begin);
if (suffix_end == std::string::npos)
{
  throw std::invalid_argument("Malformed attribute type suffix: no end brace.");
}

auto type_tag = type_code.substr(0, suffix_begin);
this->type = from_input_type(type_tag);
auto in_brackets = type_code.substr(suffix_begin+1, suffix_end-suffix_begin-1);

auto separator = in_brackets.find(',');
if (separator == std::string::npos)
{
  this->rank = attribute_rank_t::1d;
  this->dim[0] = parse_size(in_brackets);
  return;
}
  
auto first = in_brackets.substr(0, separator);
auto second = in_brackets.substr(separator+1);

this->rank = attribute_rank_t::2d;
this->dim[0] = parse_size(first);
this->dim[1] = parse_size(second);

It’s not pretty, but it passed all the tests I set up for it. And this was pre-refactoring. I knew there was something else coming up: In a different constructor, we wanted to parse type definitions that look similar, but are not quite the same. Instead of 1 or 2 fixed dimensions, the brackets have to be empty there, e.g. "float[]" or "string[]". Note that they are still optional, it can still have single-values as well.
Now I wanted to reuse the code to locate the brackets, but the current structure wasn’t really well suited for that, with the member initialization spread all over the function. Obviously, the code parsing the contents of the brackets (from the auto separator = ... line down) was of no use for the second case, the first half is the interesting bit here. So I was looking at the calls to from_input_type in the upper half and asked myself: Can I return first, before calling this? The answer is, of course, yes.

struct type_with_brackets_t
{
  std::string_view type;
  std::string_view in_brackets;
  // There's a difference between empty brackets (e.g. string[])
  // and no brackets (e.g. string)
  bool has_brackets = false;
};

type_with_brackets_t split_type(std::string_view const& type_code)
{
  auto suffix_begin = type_code.find('[');
  if (suffix_begin == std::string::npos)
  {
    return {type_code, {}, false};
  }

  auto suffix_end = type_code.find(']', suffix_begin);
  if (suffix_end == std::string::npos)
  {
    throw std::invalid_argument("Malformed attribute type suffix: no end brace.");
  }

  auto type_tag = type_code.substr(0, suffix_begin);
  auto suffix = type_code.substr(suffix_begin+1, suffix_end-suffix_begin-1);
  return {type_tag, suffix, true};
}

With this, we can replace the upper half of the first function with:

auto [tag, in_brackets, has_brackets] = split_type(type_code);
this->type = from_input_type(tag);
if (!has_brackets)
  return;

/* continue parsing in_brackets */

The other “int[]” case can obviously be implemented very easiely now:

auto [tag, _, has_brackets] = split_type(type_code);
this->type = from_input_type(tag);
this->is_array = has_brackets;

Of course, when just extracting the code as a function, you could be tempted to also call from_input_type in that function, but return first guided us away from that. I think this is a very good outcome, as it clearly separates splitting the string and interpreting the parts, naturally eliminating the duplicated from_input_type call. You can still have a function that does both, if you want, by adding a small facade araound split_type that also does the conversion.

I hope this example cleared up the method a bit more. One reason why deeply nested function calls are so common is that most languages make it easier to pass parameters than return multiple values. You will often find that this style will require more custom data-types that are just used as function return values. But functions will naturally compose easier because you will bundle smaller pieces, e.g. in this case, you can use the function without from_input_type, and I believe that will pay off in the end.

return first

Let me introduce the “return first” method. Fear not, this is not a treatise on guard-clauses. It is, however, both a code design approach and a refactoring method. It starts with a simple question to ask yourself whenever you want to call a function:

Can I return first?

That’s supposed to be catchy, but it is probably not terribly enlightening. Let me demonstrate with an example. I’ll be using C++, but I dare say that this method can be used in all imperative languages.

void process_all(
  std::vector&lt;input_info_t&gt;&amp; input_list,
  context_t const&amp; context,
  target_t const&amp; target)
{
  for (auto&amp; each : input_list)
  {
    if (some_filter_applies(each, context))
    {
      hand_off_to(each, target);
      continue;
    }

    process_one(each, context);
    hand_off_to(each, target);
  }
}

For the sake of this example, the three functions some_filter_applies, process_one and hand_off_to are immutable. Let us try to improve process_all by extracting a function:

void maybe_process_and_hand_off(
  input_info_t&amp; input,
  context_t const&amp; context,
  target_t const&amp; target)
{
  if (some_filter_applies(input, context))
  {
    hand_off_to(input, target);
    return;
  }

  process_one(input, context);
  hand_off_to(input, target);
}

void process_all(
  std::vector&lt;input_info_t&gt;&amp; input_list,
  context_t const&amp; context,
  target_t const&amp; target)
{
  for (auto&amp; each : input_list)
  {
    maybe_process_and_hand_off(each, context, target);
  }
}

So what do we have now:

  1. 26 instead of 17 lines. 22 instead of 13 if we do not count lines with just braces, a 70% increase.
  2. A pretty clumsy name for the function called in the loop. Can you do better?
  3. We have to pass the target all the way to hand_off_to without really using it directly in maybe_process_and_hand_off.
  4. The complexity is more or less the same.

So that was not great. So let us try to use return first and focus on hand_off_to. What if instead of calling hand_off_to, we just return first, and then do it? In this case, it’s pretty easy, since hand_off_to is a tail-call in each case:

void maybe_process(
  input_info_t&amp; input,
  context_t const&amp; context)
{
  if (some_filter_applies(input, context))
  {
    return;
  }

  process_one(input, context);
}

void process_all(
  std::vector&lt;input_info_t&gt;&amp; input_list,
  context_t const&amp; context,
  target_t const&amp; target)
{
  for (auto&amp; each : input_list)
  {
    maybe_process(each, context);
    hand_off_to(each, target);
  }
}

Now we no longer have to pass target through a function that does not need it, which makes both the call site and the function declaration simpler. Now a few more other refactorings are available. Let’s assume some_filter_applies is pure and process_one only changes its input parameter, as the signatures suggest. We can use loop fission, and inline the function again:

void process_all(
  std::vector&lt;input_info_t&gt;&amp; input_list,
  context_t const&amp; context,
  target_t const&amp; target)
{
  for (auto&amp; each : input_list)
  {
    if (some_filter_applies(each, context))
    {
      continue;
    }

    process_one(each, context);
  }
  for (auto const&amp; each : input_list)
  {
    hand_off_to(each, target);
  }
}

“return first” actually works for all control structures, not just functions. So in this case we returned from the first loop before starting to call hand_off_to multiple times. Often times, the code will not be as easy to refactor, because there is actually some data flowing between the function we’re in and the one we’re calling. The simple solution then is to pack all the parameters into a struct and return that, aka using data as the interface instead.

void hand_off_individually(
  std::vector&lt;input_info_t&gt;&amp; input_list)
{
  for (auto&amp; each : input_list)
  {
    auto target = compute_target(each);
    if (!target.valid())
      continue;

    hand_off_to(each, target);
  }
}

That be turned into this:

void hand_off_individually(
  std::vector&lt;input_info_t&gt; const&amp; input_list)
{
  struct targeted
  {
    input_info_t info;
    target_t target;
  };
  std::vector&lt;targeted&gt; valid;
  for (auto const&amp; each : input_list)
  {
    auto target = compute_target(each);
    if (!target.valid())
      continue;
    valid.push_back({each, target});
  }

  for (auto const&amp; each : valid)
  {
    hand_off_to(each.info, each.target);
  }
}

This is definitely longer, and probably not worth the hassle for this contrived example – this is more to show how to do it, not that is is effective.

Evaluation

In real programs, this applying “return first” is often worthwhile. It makes the code flow “wider instead of deeper”, which is often easier to follow, especially if you try to debug or measure/profile your code. It is also a gold-recipe to enable batching, which is curcial whenever you’re dealing with latency, e.g. when using RAM or building web requests. Have you tried this technique before? Do you, maybe, know it by another name? Do tell!