Automated instance construction in C++

I’m currently mostly switching back and forth between C# and C++ projects. One of the things that I’m missing most when switching to C++ is a nice dependency-injection (DI) library. After checking out what was already available, I finally decided I wanted to try to build my own slim type-indexed variant. I quickly started by registering factories and instances in a map on std::type_index, making it possible to both have the DI retain ownership (with std::unique_ptr) or just make a type available via a bare pointer. So I was able to do things like:

// Register an instance
di.insert_unique(std::make_unique<foo_service>());
// Register a factory
di.insert_unique([] {return std::make_unique<bar_service>());
// Register an existing bare pointer
di.insert_bare(my_bare_thingy);

// ... and retrieve them
auto& foo = di.get<foo_service>();

One of the most powerful aspects of a DI library is the ability to transitively setup dependencies. I like constructor injection the most, so I implemented a very naive way like this:

di.insert_unique([](auto& p) { return std::make_unique<complex_service>(
  p.get<base_service1>(), p.get<base_service2>(), p.get<base_service3>());
});

This is pretty verbose and we basically have to repeat all the constructor parameter types. But it’s easy to implement. We can do a little bit better by using a templated type-conversion operator and using it to call the get:

class service_provider
{
  struct inferred_locator
  {
    service_provider const* provider;
    template <class T> operator T&() const
    {
      return provider->get<std::remove_const_t<T>>();
    }
  };
  
  inferred_locator get() const
  {
    return { .provider = this };
  }
  
  /** typed get implementations here... */
};

Now we can reduce the previous registration to:

di.insert_unique([](auto& p) { 
  return std::make_unique<complex_service>(p.get(), p.get(), p.get());
});

That is basically only the number of constructor parameters in a verbose way. We could write a small template that takes the number, creates an std::index_sequence from it and then unpacks each index into an invokation of service_provider::get. But then we would still have to update registrations when adding (or removing) a new dependency to a services’s constructor. With a litte more work, we can actually get this instead:

di.insert_unique<complex_service>();

This partly inspired by Antony Polukhin’s C++ reflection talks, and combines std::index_sequence based unpacking, SFINEA and the templated type-conversion operator:

template <class T, std::size_t Head, std::size_t... Rest>
constexpr auto make_unique_impl(provider_wrapper const& p,
    std::index_sequence<Head, Rest...>,
    decltype(T{ mimic{ Head }, mimic{ Rest }... }) * = nullptr) -> std::unique_ptr<T>
{
    // This next requirement is so we do not accidentally recurse into the copy/move-ctors
    static_assert(sizeof...(Rest) + 1 > 1, "Can only deduce constructors with two or more parameters.");
    return std::make_unique<T>(p(Head), p(Rest)...);
}

template <class T, std::size_t... Rest>
constexpr auto make_unique_impl(provider_wrapper const& p, std::index_sequence<Rest...>) -> std::unique_ptr<T>
{
    // This next requirement is so we do not accidentally recurse into the copy/move-ctors
    static_assert(sizeof...(Rest) > 1, "Can only deduce constructors with two or more parameters.");
    return make_unique_impl<T>(p, std::make_index_sequence<sizeof...(Rest) - 1>{});
}

template <class T, std::size_t Max = 8> auto make_unique(service_provider const& p)
{
    return make_unique_impl<T>(provider_wrapper{ &p }, std::make_index_sequence<Max>{});
}

This uses two new types: mimic, which is only used for SFINEA, takes std::size_t on construction (for the unpacking from the std::index_sequence) and converts to anything (templated type conversion again) and the provider_wrapper, which is a simple adaptor around service_provider that takes an unused std::size_t argument (again, for unpacking). The first overload of make_unique_impl is slightly more specialized (because it has Head and Rest), so the compiler tries it first. If it works, it returns a new instance of the service we want. Otherwise, it will fail without an error due to SFINEA in the unused and defaulted third parameter. The compiler will then try the second overload, which will recurse to a variant with fewer parameters. The outermost make_unique starts this recursion with 8 parameters, because that should be enough for any sane service. I stop this recursion at one constructor parameter, even though that is a useful configuration. This is because I have not yet found a way to avoid calling the copy or move constructors accidentally. If anyone knows how to do that, I’d be very happy to hear how. My workaround right now is to explicitly register a factory in that case.

Reading a conanfile.txt from a conanfile.py

I am currently working on a project that embeds another library into its own source tree via git submodules. This is currently convenient because the library’s development is very much tied to the host project and having them both in the same CMake project cuts down dramatically on iteration times. Yet, that library already has its own conan dependencies in a conanfile.txt. Because I did not want to duplicate the dependency information from the library, I decided to pull those into my host projects requirements programmatically using a conanfile.py.

Luckily, you can use conan’s own tools for that:

from conans.client.loader import ConanFileTextLoader

def load_library_conan(recipe_folder):
    text = Path(os.path.join(recipe_folder, "libary_folder", "conanfile.txt")).read_text()
    return ConanFileTextLoader(text)

You can then use that in your stage methods, e.g.:

    def config_options(self):
        for line in load_library_conan(self.recipe_folder).options.splitlines():
            (key, value) = line.split("=", 2)
            (library, option) = key.split(":", 2)
            setattr(self.options[library], option, value)

    def requirements(self):
        for x in load_library_conan(self.recipe_folder).requirements:
            self.requires(x)

I realize this is a niche application, but it helped me very much. It would be cool if conan could delegate into subfolders natively, but I did not find a better way to do this.

Metal in C++ with SDL2

Metal, Cupertino’s own graphics API, is sort of a middle-ground in complexity between OpenGL and Vulkan. I’ve wanted to try it for a while, but the somewhat tight integration into Apple’s ecosystem (ObjectiveC/Swift and XCode) has so far prevented that. My graphics projects are usually using C++ and CMake, so I wanted a solution that worked with that. Apple released Metal-cpp last year and newer SDL2 versions (since 2.0.14) can create a window that supports drawing to it with metal. Here’s how to weld that together (with minimal ObjectiveC).

metal-cpp

I get the metal-cpp code from the linked website (the download is at step 1). I add a library in CMake that builds a single source file that compiles the metal-cpp implementation with the ??_PRIVATE_IMPLEMENTATION macros as described on the page (see step 3). That target also exports the includes to be used later.

SDL window and view

Next, I use conan to install SDL2. After SDL_Init, I call SDL_CreateWindow to create my window. I do not specify SDL_WINDOW_OPENGL (or in the SDL_CreateWindow‘s flags, or next step will fail. After that, I use SDL_Metal_CreateView from SDL_metal.h to create a metal view. This is where things get a little bit icky. I create a metal device using MTL::CreateSystemDefaultDevice(); but I still need to assign it to the view I just created. I’m doing that in ObjectiveC++. In a new .mm file I add a small function to do that:

void assign_device(void* layer, MTL::Device* device)
{
  CAMetalLayer* metalLayer = (CAMetalLayer*) layer;
  metalLayer.device = (__bridge id<MTLDevice>)(device);
}

I use a small .h file to expose this function to my C++ code like any other free function. There’s another helper I create in the .mm file:

CA::MetalDrawable* next_drawable(void* layer)
{
  CAMetalLayer* metalLayer = (CAMetalLayer*) layer;
  id <CAMetalDrawable> metalDrawable = [metalLayer nextDrawable];
  CA::MetalDrawable* pMetalCppDrawable = ( __bridge CA::MetalDrawable*) metalDrawable;
  return pMetalCppDrawable;
}

At the beginning of each frame, I use that together with SDL_Metal_GetLayer to get a texture to render to:

auto surface = next_drawable(SDL_Metal_GetLayer(view));

Next I create a render pass descriptor that starts by clearing that drawable with our fancy red:

MTL::ClearColor clear_color(152.0/255.0, 23.0/255.0, 42.0/255.0, 1.0);
auto pass_descriptor = MTL::RenderPassDescriptor::alloc()->init();
auto attachment = pass_descriptor->colorAttachments()->object(0);
attachment->setClearColor(clear_color);
attachment->setLoadAction(MTL::LoadActionClear);
attachment->setTexture(surface->texture());

And fire that off to the GPU using a command buffer and render encoder:

auto buffer = queue->commandBuffer();
auto encoder = buffer->renderCommandEncoder(pass_descriptor);
encoder->endEncoding();
buffer->presentDrawable(surface);
buffer->commit();

There you have it, a minimal running metal application. Still a long ways from the traditional “Hello Triangle”, but most metal examples that show how to do that can easily be translated to the C++ API. Note that you probably have to take some extra steps to compile metal shaders (aka MSL). You can either load them from source or precompile them using the command line tools.

My favorite C++20 feature

As I evolved my programming style away from mutating long-lived “big” objects and structures and towards are more functional and data-oriented style based mainly on pure functions, I also find myself needing a lot more structs. These naturally occur as return types for functions with ‘richer’ output if you do not want to use std::tuple or other ad-hoc types everywhere. If you see a program as a sequence of data-transformations, I guess the structs are the immediate representations encoded in the type system.

Let my first clarify what I mean by structs, as opposed to what the language says: A type that has all public data members, obeys the rule of zero, and is valid in any configuration. A typical struct v3 { float x{},y{},z{};}; 3d vector is a struct, std::vector is not.

These types are great. You can copy them around, use them with structured binding, they correctly propagate constness, and they are a great fit to pass them through layers of functions calls. And, when used as function parameters, they are great for evolving your program over time, because you can just change the single struct, as opposed to every function call that uses this parameter combination. Or you can easily batch, or otherwise ‘delay’, calls by recording the function parameters. Just throw the parameters into a container and execute the code later.

And with C++20, they got even better, because now you can use them with my favorite new feature: designated initializers, which allows you to use the member names at the initialization site and use RAII. E.g., for a struct that symbolizes an http request: struct http_request { http_method method; std::string url; std::vector<header_entry> headers; }; You can now initialize it like this:

auto request = http_request{
  .method = http_method::get,
  .uri = "localhost:7634",
  .headers = { { .name = "Authorization", .value = "Bearer TOKEN" } },
};

You can even use this directly as a parameter without repeating the type name, de facto giving your named parameters for a pair of extra curlys:

run_request({
    .method = http_method::get,
    .uri = "localhost:7634",
    .headers = { { .name = "Authorization", .value = "Bearer TOKEN" } },
});

You can, of course, combine this named-parameter style-struct with other function parameters in your API, but like with lambdas, I think they are most readable as the last parameter. Hence, also like with lambdas, you probably never want to have more than one at each call-site. I’m very happy with this new feature and it’s already making the code using my APIs a lot more readable.

return first example

It seems my “return first” post was not as enlightening as I had hoped. It was posted on reddit, and while the majority of commenters completely missed the point, it wasn’t really clear for those that did not just read the title. Either way, I am to blame for that – the examples and my reasoning were not very conclusive. So let me try clearing up the confusion with a better example.

First things first, here’s the mantra again: Whenever you want to call a function, ask yourself:

Can I return first?

But now to the example:

Parsing array braces

The task was to parse a string with a data-type in it. This was already working for single-value types, so we could parse "int", "double", "string" etc, via the function from_input_type. Now I was to extend it to also parse array definitions with one or two fixed dimensions, like "int[5]" or "double[4,7]".

My first attempt, implementing it as a constructor taking the definition string, looked like this:

auto suffix_begin = type_code.find('[');
if (suffix_begin == std::string::npos)
{
  this->type = from_input_type(type_code);
  return;
}

auto suffix_end = type_code.find(']', suffix_begin);
if (suffix_end == std::string::npos)
{
  throw std::invalid_argument("Malformed attribute type suffix: no end brace.");
}

auto type_tag = type_code.substr(0, suffix_begin);
this->type = from_input_type(type_tag);
auto in_brackets = type_code.substr(suffix_begin+1, suffix_end-suffix_begin-1);

auto separator = in_brackets.find(',');
if (separator == std::string::npos)
{
  this->rank = attribute_rank_t::1d;
  this->dim[0] = parse_size(in_brackets);
  return;
}
  
auto first = in_brackets.substr(0, separator);
auto second = in_brackets.substr(separator+1);

this->rank = attribute_rank_t::2d;
this->dim[0] = parse_size(first);
this->dim[1] = parse_size(second);

It’s not pretty, but it passed all the tests I set up for it. And this was pre-refactoring. I knew there was something else coming up: In a different constructor, we wanted to parse type definitions that look similar, but are not quite the same. Instead of 1 or 2 fixed dimensions, the brackets have to be empty there, e.g. "float[]" or "string[]". Note that they are still optional, it can still have single-values as well.
Now I wanted to reuse the code to locate the brackets, but the current structure wasn’t really well suited for that, with the member initialization spread all over the function. Obviously, the code parsing the contents of the brackets (from the auto separator = ... line down) was of no use for the second case, the first half is the interesting bit here. So I was looking at the calls to from_input_type in the upper half and asked myself: Can I return first, before calling this? The answer is, of course, yes.

struct type_with_brackets_t
{
  std::string_view type;
  std::string_view in_brackets;
  // There's a difference between empty brackets (e.g. string[])
  // and no brackets (e.g. string)
  bool has_brackets = false;
};

type_with_brackets_t split_type(std::string_view const& type_code)
{
  auto suffix_begin = type_code.find('[');
  if (suffix_begin == std::string::npos)
  {
    return {type_code, {}, false};
  }

  auto suffix_end = type_code.find(']', suffix_begin);
  if (suffix_end == std::string::npos)
  {
    throw std::invalid_argument("Malformed attribute type suffix: no end brace.");
  }

  auto type_tag = type_code.substr(0, suffix_begin);
  auto suffix = type_code.substr(suffix_begin+1, suffix_end-suffix_begin-1);
  return {type_tag, suffix, true};
}

With this, we can replace the upper half of the first function with:

auto [tag, in_brackets, has_brackets] = split_type(type_code);
this->type = from_input_type(tag);
if (!has_brackets)
  return;

/* continue parsing in_brackets */

The other “int[]” case can obviously be implemented very easiely now:

auto [tag, _, has_brackets] = split_type(type_code);
this->type = from_input_type(tag);
this->is_array = has_brackets;

Of course, when just extracting the code as a function, you could be tempted to also call from_input_type in that function, but return first guided us away from that. I think this is a very good outcome, as it clearly separates splitting the string and interpreting the parts, naturally eliminating the duplicated from_input_type call. You can still have a function that does both, if you want, by adding a small facade araound split_type that also does the conversion.

I hope this example cleared up the method a bit more. One reason why deeply nested function calls are so common is that most languages make it easier to pass parameters than return multiple values. You will often find that this style will require more custom data-types that are just used as function return values. But functions will naturally compose easier because you will bundle smaller pieces, e.g. in this case, you can use the function without from_input_type, and I believe that will pay off in the end.

return first

Let me introduce the “return first” method. Fear not, this is not a treatise on guard-clauses. It is, however, both a code design approach and a refactoring method. It starts with a simple question to ask yourself whenever you want to call a function:

Can I return first?

That’s supposed to be catchy, but it is probably not terribly enlightening. Let me demonstrate with an example. I’ll be using C++, but I dare say that this method can be used in all imperative languages.

void process_all(
  std::vector&lt;input_info_t&gt;&amp; input_list,
  context_t const&amp; context,
  target_t const&amp; target)
{
  for (auto&amp; each : input_list)
  {
    if (some_filter_applies(each, context))
    {
      hand_off_to(each, target);
      continue;
    }

    process_one(each, context);
    hand_off_to(each, target);
  }
}

For the sake of this example, the three functions some_filter_applies, process_one and hand_off_to are immutable. Let us try to improve process_all by extracting a function:

void maybe_process_and_hand_off(
  input_info_t&amp; input,
  context_t const&amp; context,
  target_t const&amp; target)
{
  if (some_filter_applies(input, context))
  {
    hand_off_to(input, target);
    return;
  }

  process_one(input, context);
  hand_off_to(input, target);
}

void process_all(
  std::vector&lt;input_info_t&gt;&amp; input_list,
  context_t const&amp; context,
  target_t const&amp; target)
{
  for (auto&amp; each : input_list)
  {
    maybe_process_and_hand_off(each, context, target);
  }
}

So what do we have now:

  1. 26 instead of 17 lines. 22 instead of 13 if we do not count lines with just braces, a 70% increase.
  2. A pretty clumsy name for the function called in the loop. Can you do better?
  3. We have to pass the target all the way to hand_off_to without really using it directly in maybe_process_and_hand_off.
  4. The complexity is more or less the same.

So that was not great. So let us try to use return first and focus on hand_off_to. What if instead of calling hand_off_to, we just return first, and then do it? In this case, it’s pretty easy, since hand_off_to is a tail-call in each case:

void maybe_process(
  input_info_t&amp; input,
  context_t const&amp; context)
{
  if (some_filter_applies(input, context))
  {
    return;
  }

  process_one(input, context);
}

void process_all(
  std::vector&lt;input_info_t&gt;&amp; input_list,
  context_t const&amp; context,
  target_t const&amp; target)
{
  for (auto&amp; each : input_list)
  {
    maybe_process(each, context);
    hand_off_to(each, target);
  }
}

Now we no longer have to pass target through a function that does not need it, which makes both the call site and the function declaration simpler. Now a few more other refactorings are available. Let’s assume some_filter_applies is pure and process_one only changes its input parameter, as the signatures suggest. We can use loop fission, and inline the function again:

void process_all(
  std::vector&lt;input_info_t&gt;&amp; input_list,
  context_t const&amp; context,
  target_t const&amp; target)
{
  for (auto&amp; each : input_list)
  {
    if (some_filter_applies(each, context))
    {
      continue;
    }

    process_one(each, context);
  }
  for (auto const&amp; each : input_list)
  {
    hand_off_to(each, target);
  }
}

“return first” actually works for all control structures, not just functions. So in this case we returned from the first loop before starting to call hand_off_to multiple times. Often times, the code will not be as easy to refactor, because there is actually some data flowing between the function we’re in and the one we’re calling. The simple solution then is to pack all the parameters into a struct and return that, aka using data as the interface instead.

void hand_off_individually(
  std::vector&lt;input_info_t&gt;&amp; input_list)
{
  for (auto&amp; each : input_list)
  {
    auto target = compute_target(each);
    if (!target.valid())
      continue;

    hand_off_to(each, target);
  }
}

That be turned into this:

void hand_off_individually(
  std::vector&lt;input_info_t&gt; const&amp; input_list)
{
  struct targeted
  {
    input_info_t info;
    target_t target;
  };
  std::vector&lt;targeted&gt; valid;
  for (auto const&amp; each : input_list)
  {
    auto target = compute_target(each);
    if (!target.valid())
      continue;
    valid.push_back({each, target});
  }

  for (auto const&amp; each : valid)
  {
    hand_off_to(each.info, each.target);
  }
}

This is definitely longer, and probably not worth the hassle for this contrived example – this is more to show how to do it, not that is is effective.

Evaluation

In real programs, this applying “return first” is often worthwhile. It makes the code flow “wider instead of deeper”, which is often easier to follow, especially if you try to debug or measure/profile your code. It is also a gold-recipe to enable batching, which is curcial whenever you’re dealing with latency, e.g. when using RAM or building web requests. Have you tried this technique before? Do you, maybe, know it by another name? Do tell!

3 good uses for the C++ preprocessor in 2020

As this weird year, 2020, comes to a close, I noticed that I am still using the preprocessor in my C++ programs. And not just for #includes which might, at last, slowly fade away with C++20’s modules. The preprocessor’s got a pretty bad rep, and new C++ programmers are usually taught to stay as far away as possible. Justifiably so – some things, like the dreaded X-Macros really should go the way of the dinosaurs.

But there are still some good uses left in the thing, and here’s my top 3 of those:

0. Commenting out big-chunks of code

I’ve often seen people comment out big parts of code with block comments: /* this is not active */. However, that will only work as long as the code does not contain any other block comments, let alone a stray */ in a string. A great alternative is to use the preprocessor:

#if 0
auto i_do_not_want_to_compile_this() -> auto
{
  std::vector<std::deque<std::mutex>> baz{};
  return baz;
}
#endif

This can easiely by wrapped multiple times around bigger parts of code, which is very helpful when refactoring large chunks of legacy code. It can very easiely be toggled on and off while in this state. And the IDE will usually still show a dimmed version of syntax highlighting in the disabled region.

1. Conditionally throw away “cross-cutting” concerns

Some parts of aspects of programs can be “cross-cutting”, which means they cannot easiely be separated from the rest of the code-base by putting them in a separate module. The most prominent example is probably logging. While you can typically modularize the actual implementation, the actual log calls will be all over your code. Another of those concerns is “profiling”. This is also something that you typically want to take out of your application when deploying it, because users will rarely profile the end-product. Again, the preprocessor comes to the rescue. For example, in the excellent Optick, most of the code you insert is actually macros that can be completely eliminated with a simple compile-time switch. Consider this “tag” that add some additional metric to your profile:

OPTICK_TAG("CoolMetric", compute_cool_metric());

When Optick is turned off via the aforementioned compile-time switch, compute_cool_metric() is never called. The call is not even compiled. Just turning Optick off will completely remove it from your source. Now this can be potentially dangerous, if the function has a side effect, but you wouldn’t do that anyways, would you?

2. Making forward declarations more visible

Presumably owing to its history as a continuously-evolved language, C++ has a very limited set of reserved keywords, often avoiding to introduce keywords to not interfere with any working software out there. Do not get me wrong, that is a great reason. But because of this, some language constructs can sometimes be a bit cryptic, for example forward declarations: class will_be_defined;. If you ever worked with a big, old or big and old code-base with lots of those, you probably know that maintaining them can be a bit of a chore and prone to error. So I think it is a great idea to at least make them more visible with your own macro “KEYWORD”:

#define FORWARD_DECL(x) class x

FORWARD_DECL(will_be_defined);

That FORWARD_DECL immediatly stands out visually and helps me keep track of those subtle declarations.

Crashes when returning references to vector elements

Recently, I was experiencing a strange crash that I traced to a piece of C++ code looking more or less like this:

template <class T>
class container
{
public:
  std::vector<T> values_;
  T default_;

  T const& get() const
  {
    if (values_.empty())
      return default_;
    return values.front();
  }
};

This was crashing when calling get(), with a non-empty values_ member. It looks fairly innocent. And it ran in production for a couple of years already. So what changed?

I had, in fact, never instanciated this template with T = bool before. And that was causing the crash, while still compiling without any errors. Now if you’re a little versed in the C++ standard library you might know that std::vector is a special snowflake indeed. In an effort to save space, and, I suspect, prove the usefulness of template specializations, it is not really a “normal” container holding bool values. Instead, it holds some type of integers and packs each pseudo-bool into one of their bits. The consequence is that the accessor functions like operator[], front() and back() cannot return a reference to a bool. Instead, they return a “proxy” object that supports assignment to and from a bool.

Back to the get() function: it tries to return a reference to a bool. Of course, that bool doesn’t really exist except as a temporary, and so this results in a dangling reference that causes a segmentation fault when used.

I suspect there could have been a warning about a dangling reference somewhere there. I have seen clang-tidy especially report things like this (with a few false positives too), but it did not show up for me. To fix it, I am now just returning a bool instead of a bool const& for T = bool. A special case in my case to work around a special case in std::vector.

Data-Oriented Design: Using data as interfaces

A Code Centric World

In main-stream OOP, polymorphism is achieved by virtual functions. To reuse some code, you simply need one implementation of a specific “virtual” interface. Bigger programs are composed by some functions calling other functions calling yet other functions. Virtual functions introduce a flexibility here to that allow parts of the call tree to be replaced, allowing calling functions to be reused by running on different, but homogenuous, callees. This is a very “code centric” view of a program. The data is merely used as context for functions calling each other.

Duality

Let us, for the moment, assume that all the functions and objects that such a program runs on, are pure. They never have any side effects, and communicate solely via parameters to and return values from the function. Now that’s not traditional OOP, and a more functional-programming way of doing things, but it is surely possible to structure (at least large parts of) traditional OOP programs that way. This premise helps understanding how data oriented design is in fact dual to the traditional “code centric” view of a program: Instead of looking at the functions calling each other, we can also look at how the data is being transformed by each step in the program because that is exactly what goes into, and comes out of each function. IS-A becomes “produces/consumes compatible data”.

Cooking without functions

I am using C# in the example, because LINQ, or any nice map/reduce implementation, makes this really staight-forward. But the principle applies to many languages. I have been using the technique in C++, C#, Java and even dBase.
Let’s say we have a recipe of sorts that has a few ingredients encoded in a simple class:

class Ingredient
{
  public string Name { get; set; }
  public decimal Amount { get; set; }
}

We store them in a simple List and have a nice function that can compute the percentage of each ingredient:

public static IReadOnlyList<(string, decimal)> 
    Percentages(IEnumerable<Ingredient> incredients)
{
  var sum = incredients.Sum(x => x.Amount);
    return incredients
      .Select(x => (x.Name, x.Amount / sum))
      .ToList();
}

Now things change, and just to make it difficult, we need a new ingredient type that is just a little more complicated:

class IngredientInfo
{
  public string Name { get; set; }
  /* other useful stuff */
}

class ComplicatedIngredient
{
  public IngredientInfo Info { get; set; }
  public decimal Amount { get; set; }
}

And we definitely want to use the old, simple one, as well. But we need our percentage function to work for recipes that have both Ingredients and also ComplicatedIngredients. Now the go-to OOP approach would be to introduce a common interface that is implemented by both classes, like this:

interface IIngredient
{
  string GetName();
  string GetAmount();
}

That is trivial to implement for both classes, but adds quite a bunch of boilerplate, just about doubling the size of our program. Then we just replace IReadOnlyList<Ingredient> by IReadOnlyList<IIngredient> in the Percentage function. That last bit is just so violating the Open/Closed principle, but just because we did not use the interface right away (Who thought YAGNI was a good idea?). Also, the new interface is quite the opposite of the Tell, don’t ask principle, but there’s no easy way around that because the “Percentage” function only has meaning on a List<> of them.

Cooking with data

But what if we just use data as the interface? In this case, it so happens that we can easiely turn a ComplicatedIngredient into an Ingredient for our purposes. In C#’s LINQ, a simple Select() will do nicely:

var simplified = complicated
  .Select(x => new Ingredient
   { 
     Name = x.Info.Name,
     Amount = x.Amount
   });

Now that can easiely be passed into the Percentages function, without even touching it. Great!

In this case, one object could neatly be converted into the other, which is often not the case in practice. However, there’s often a “common denominator class” that can be found pretty much the same way as extracting a common interface would. Just look at the info you can retrieve from that imaginary interface. In this case, that was the same as the original Ingredients class.

Further thoughts

To apply this, you sometimes have to restructure your programs a little bit, which often means going wide instead of deep. For example, you might have to convert your data to a homogenuous form in a preprocessing step instead of accessing different objects homogenuously directly in your algorithms, or use postprocessing afterwards.
In languages like C++, this can even net you a huge performance win, which is often cited as the greatest thing about data-oriented design. But, first and foremost, I find that this leads to programs that are easier to understand for both machine and people. I have found myself using this data-centric form of code reuse a lot more lately.

Are you using something like this as well or are you still firmly on the override train, and why? Tell me in the comments!

C++ pass-thru parameters

So in ye olde days, before C++11 and move semantics, it was common for functions to use mutable references to pass container-content to the caller, like this:

void random_between(std::vector<int>& out,
  int left, int right, std::size_t N)
{
  std::uniform_int_distribution<> 
    distribution(left, right);
  for (std::size_t i = 0; i < N; ++i)
    out.push_back(distribution(rng));
}

and you would often use it like this:

std::vector<int> numbers;
random_between(numbers, 7, 42, 10);

Basically trading expressiveness and convenience for speed/efficiency.

Convenience is king

Now obviously, those days are over. With move-semantics and guaranteed copy-elision backing us up, it is usually fine to just return the filled container, like this:

std::vector<int> random_between(int left, int right,
  std::size_t N)
{
  std::vector<int> out;
  std::uniform_int_distribution<>
    distribution(left, right);
  for (std::size_t i = 0; i < N; ++i)
    out.push_back(distribution(rng));
  return out;
}

Now you no longer have to initialize the container to use this function and the function also became pure, clearly differentiating between its inputs and outputs.

Mostly better?

However, there is a downside: Before, the function could be used to append multiple runs into the same container, like this:

std::vector<int> numbers;
for (int i = 0; i < 5; ++i)
  random_between(numbers, 50*i + 7, 50*i + 42, 10);

That use case suddenly became a lot harder. Also, what if you want to keep your vector around and just .clear() it before calling the function again later, to save allocations? That’s also no longer possible. I am not saying that these two use cases should make you prefer the old variant, as they tend not to happen very often. But when they do, it’s all the more annoying. So what if we could have your cake and eat it, too?

A Compromise

How about this:

std::vector<int> random_between(int left, int right,
  std::size_t N, std::vector<int> out = {})
{
  std::uniform_int_distribution<>
    distribution(left, right);
  for (std::size_t i = 0; i < N; ++i)
    out.push_back(distribution(rng));
  return out;
}

Now you can use it to just append again:

std::vector<int> numbers;
for (int i = 0; i < 5; ++i)
  numbers = random_between(
    50*i + 7, 50*i + 42, 10, std::move(numbers));

But you can also use it in the straightforward way, for the hopefully more common case:

auto numbers = random_between(
  50*i + 7, 50*i + 42, 10);

Now you should definitely not do this with all your functions returning a container. But it is a nice pattern to have up your sleeve when the need arises. It should be noted that passing a mutable reference can still be faster in some cases, as that will save you two moves. And you can also add a container-returning facade variant as an overload, but I think this pattern is a very nice compromise that can be implemented by moving a single variable to the parameter list and defaulting it. It keeps 99% of the use cases identically to the original container-returning variant, while making the “append” use slightly more verbose, but also more expressive.