The best of both worlds: scoped_flags

C++11 introduced a pretty nice change to enum types in C++, the scoped enumeration. They mostly supersede the old unscoped enumeration, which was inherited from C and had a few shortcomings. For example, the names in the enumeration where added to its parent scope. This means that given an enum colors {red, green blue}; you can simply say auto my_color = red;. This can, of course, lead to ambiguities and people using some weird workarounds like putting the enums in namespaces or prefixing all elements á la hungarian-notation. Also, unscoped enumerations are not particularly type-safe: they can be converted to integer types and back without any special consideration, so you can write things like int x = red; without the compiler complaining.
Scoped enumerations improves both theses aspects: with enum class colors {red, green, blue};, you have to use auto my_color = colors::red; and int x = colors::red; will simply not compile.
To get the second part to compile, you need to insert a static_cast: int x = static_cast(colors::red); which is purposefully a lot more verbose. Now this is a bit of a blessing and a curse. Of course, this is a lot more type-safe, but it make one really common usage pattern with enums very cumbersome: bit flags.

Did this get worse?

While you could previously use the bit operators to combine different bitmasks defined as enums, scoped enumerations will only let you do that if you cast them first. In other words, type-safety prevents us from combining flags because the result might, of course, no longer be a valid enum.
However, we can still get the convenience and compactness of bit flags with a type that represents combinations bitmasks from a specific enum type. Oh, this reeks of a template. I give you scoped_flags, which you can use like this:

enum class window_flags
{
  has_border = 1 << 0,
  has_caption = 1 << 1,
  is_child = 1 << 2,
  /* ... */
};
void create_window(scoped_flags<window_flags> flags);

void main()
{
  create_window({window_flags::has_border, window_flags::has_caption});
}

scoped_flags<window_flags> something = /* ... */

// Check a flag
bool is_set = something.test(window_flags::is_child);

// Remove a flag
auto no_border = something.without(window_flags::has_border);

// Add a flag
auto with_border = something.with(window_flags::has_border);

Current implementation

You can find my current implementation on this github gist. Even in its current state, I find it a niftly little utility class that makes unscoped enumerations all but legacy code.
I opted not to replicate the bitwise operator syntax, because &~ for “without” is so ugly, and ~ alone makes little sense. A non-explicit single-argument constructor makes usage with a single flag as convenient as the old C-style variant, while the list construction is just a tiny bit more complicated.
The implementation is not complete or final yet; for example without is missing an overload that gets a list of flags. After my previous adventures with initializer_lists, I’m also not entirely sure whether std::initializer_list should be used anywhere but in the c’tor. And maybe CTAD could make it more comfortable? Of course, everything here can be constexpr‘fied. Do you think this is a useful abstraction? Any ideas for improvements? Do tell!

std::initializer_list considered evil

I am so disappointed in you, std::initializer_list. You are just not what I thought you were.

Lights out

While on the train to Meeting C++ this year, I was working on the lighting subsystem of the 3D renderer for my game abstractanks. Everything was looking fine, until I switched to the release build. Suddenly, my sun light went out. All the smaller lights were still there, it just looked like night instead of day.
Now stuff working in Debug and not working in Release used to be quite common and happens when you’re not correctly initializing built-in variables. So I went digging, but it was not as easy as I had thought. Several hours later, I tracked the problem down to my global light’s uniform buffer initialization code. This is a buffer that is sent to the GPU so the shaders can read all the lighting information. It looked like a fairly innocent for-loop doing byte-copies of matrices and vectors to a buffer:

using Pair = std::pair;
auto Mapping = std::initializer_list{
  {ShadowMatrix.ptr(), MATRIX_BYTE_SIZE},
  {LightDirection.ptr(), VECTOR4_BYTE_SIZE},
  {ColorAndAmbient.ptr(), VECTOR4_BYTE_SIZE}
};

std::size_t Offset = 0;
for (auto const& Each : Mapping)
{
  mUniformBuffer.SetSubData(GL_UNIFORM_BUFFER, Each.second, Offset, Each.first);
  Offset += Each.second;
}

The Culprit

After mistakenly blaming alignment issues for a while, I finally tried looking at the values of Each.second and Each.first. To my surprise, they were bogus. Now what is going on there? It turns out not writing this in almost-always-auto style, i.e. using direct- instead of copy-initialization fixes the problem, so there’s definitely a lifetime issue here.

Looking at the docs, it became apparent that std::initializer_list is indeed a reference-type that automatically creates a value-type (the backing array) internally and keeps it alive exactly as binding a reference to that array would. For the common cases, i.e. when std::initializer_list is used as a parameter, this is fine, because the original list lives for the whole function-call expression. For the direct-initialization case, this is also fine, since the reference-like lifetime-extension kicks in. But for copy-initialization, the right-hand-side is done after the std::initializer_list is copied. So the backing array is destroyed. Oops.

Conclusion and alternatives

Do not use std::initializer_list unless as a function parameter. It works well for that, and is surprising for everything else. In my case, a naive “extract variable” refactoring of for (auto const& each : {a, b, c}) { /* ... */ } led me down this rabbit hole.
My current alternative is stupidly simple: a built-in array on the stack:

using Pair = std::pair;
Pair Mapping[]{
  {ShadowMatrix.ptr(), MATRIX_BYTE_SIZE},
  {LightDirection.ptr(), VECTOR4_BYTE_SIZE},
  {ColorAndAmbient.ptr(), VECTOR4_BYTE_SIZE}
};

It does the same thing as the “correct” version of the std::initializer_list, and if you try to use it AAA-style, at least clang will give you this nice warning: warning: temporary whose address is used as value of local variable 'Mapping' will be destroyed at the end of the full-expression [-Wdangling]

Non-determinism in C++

A deterministic program, when given the same input, will always result in the same output. This intuitive, albeit quite fuzzily defined, property is often times pretty important for correct program. Sources of non-determinism can be quite subtle – and once they creep into your program, they can propagate and amplify and have enormous consequences. It is pretty much the well-known butterfly effect.

When discussing this problem, it is important to know what exactly makes up the input and the output of the program. For example, when logging times to a logfile, and considering this an actual output, no two runs will ever be the same – so this is usually not considered an output relevant for determinism. Which brings us to the first common source of non-determinism:

Time

If any part of your program depends on the time it is run at, it will be easily be non-deterministic. Common cases are using the time to initializing some variable depending on the time, or using the time for some kind of numerical integration, like computing a value over time. Also, using execution time as an output respective for determinism is hopeless on a normal desktop computer – but can be crucial for a real-time system.

Random number generation

Random number generation seems like an obvious candidate, yet most random number generators are not really random, but only pseudo-random. For example, std::mersenne_twister_engine will generate the same sequence of values every time, when initialized with the same seed. So do not initialize this with a non-deterministic input like the time, and it will be predictable. However, std::random_device might not share this property and give you fresh non-deterministic input. As a weird middle ground, std::default_random_engine will probably give you the same results when compiled with the same compiler/standard-lib, but on another compiler version or OS, it will not. Subtle.

The allocator

Another source of non-determinism that is pretty tricky is the allocator. For example, consider the following piece of code:

template <class T>
T sum(std::set<Thingy*> const& set)
{
  T result{};
  for (auto const& each : set)
    result += each->value();
  return result;
}

Is this deterministic or not? It depends. Now let’s assume that all the Thingys were allocated using standard new. In that case, the actual pointers, Thingy* are non-deterministic, and hence the order of the Thingy*s in the set is random. But does this matter? Well if T is std::uint32_t, it does not. Order in addition does not matter for unsigned integers, even with overflows. However, if T is float, then it does matter and the whole result becomes unpredictable, at least in the general case (it will even be predictable, if e.g. all the numbers in the computation are integers that are exactly representable as floats). Other languages have “insertion-ordered” containers to get around this problem. A sensible approximation in C++ is to use the (unordered_)set and (unordered_)map containers together with another list to iterate on.

The thread scheduler

When you cannot really control the order of instructions, which is really the whole point of threading, you will have a harder time making things deterministic. Like the allocator problem, this is usually also paired with floating-point arithmetic. The workaround here is to make sure that the order of computation does not influence the final result. One common way around this is to sort the output by a unique criteria. For example, if you use multiple threads to report the intersections of a bunch of line segments, you can later sort them by their position in space.

There’s of course the honorable mention for uninitialized variables, but I’m sure your static analyzer will complain about it. Any interaction with the “outside” of your program, any side-effect, be it filesystems, user input, output or cosmic radiation can lead to non-determinism, so be sure to know the context well enough and plan accordingly to your determinism requirements.

Integrating conan, CMake and Jenkins

In my last posts on conan, I explained how to start migrating your project to use a few simple conan libraries and then how to integrate a somewhat more complicated library with custom build steps.

Of course, you still want your library in CI. We previously advocated simply adding some dependencies to your source tree, but in other cases, we provisioned our build-systems with the right libraries on a system-level (alternatively, using docker). Now using conan, this is all totally different – we want to avoid setting up too many dependencies on our build-system. The fewer dependencies they have, the less likely they will accidentally be used during compilation. This is crucial to implement portability of your artifacts.

Setting up the build-systems

The build systems still have to be provisioned. You will at least need conan and your compiler-suite installed. Whether to install CMake is a point of contention – since the CMake-Plugin for Jenkins can do that.

Setting up the build job

The first thing you usually need is to configure your remotes properly. One way to do this is to use conan config install command, which can synchronize remotes (or the whole of the conan config) from either a folder, a zip file or a git repository. Since I like to have stuff readable in plain text in my repository, I opt to store my remotes in a specific folder. Create a new folder in your repository. I use ci/conan_config in this example. In it, place a remotes.txt like this:

bincrafters https://api.bintray.com/conan/bincrafters/public-conan True
conan-center https://conan.bintray.com True

Note that conan needs a whole folder, you cannot read just this file. Your first command should then be to install these remotes:

conan config install ci/conan_config

Jenkins’ CMake for conan

The next step prepares for installing our dependencies. Depending on whether you’re building some of those dependencies (the --build option), you might want to have CMake available for conan to call. This is a problem when using the Jenkins CMake Plugin, because that only gives you cmake for its specific build steps, while conan simply uses the cmake executable by default. If you’re provisioning your build-systems with conan or not building any dependencies, you can skip this step.
One way to give conan access to the Jenkins CMake installation is to run a small CMake script via a “CMake/CPack/CTest execution” step and have it configure conan appropriatly. Create a file ci/configure_for_conan.cmake:

execute_process(COMMAND conan config set general.conan_cmake_program=\"${CMAKE_COMMAND}\")

Create a new “CMake/CPack/CTest execution” step with tool “CMake” and arguments “-P ci/configure_for_conan.cmake”. This will setup conan with the given cmake installation.

Install dependencies and build

Next run the conan install command:

mkdir build && cd build
conan install .. --build missing

After that, you’re ready to invoke cmake and the build tool with an additional “CMake Build” step. The build should now be up and running. But who am I kidding, the build is always red on first try 😉

Using protobuf with conan and CMake

In my last post, I showed how I got my feet wet while migrating the dependencies of my existing code-base to conan. The first major hurdle I saw coming when I started was adding something with a “special” build step, e.g. something like source-preprocessing. In my case, this was protobuf, where a special build-step converts .proto files to sources and headers.

In my previous solution, my devenv build scripts would install the protobuf converter binary to my devenv’s bin/ folder, which I then used to run my preprocessing. At first, it was not obvious how to do this with conan. It turns out that the lovely people and bincrafters made this pretty comfortable. conan_basic_setup() will add all required package paths to your CMAKE_MODULE_PATH, which you can use to include() some bundled CMake scripts that will either let you execute the protobuf-compiler via a target or run protobuf_generate to automagically handle the preprocessing. It’s probably worth noting, that this really depends on how the package is made. Conan does not really have an official way on how to handle this.

Let’s start with some sample code – Person.proto, like the sample from the protobuf website:

message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
}

And some sample code that uses it:

#include "Person.pb.h"

int main(int argn, char** argv)
{
  Person message;
  message.set_name("Hello Protobuf");
  std::cout << message.name() << std::endl;
}

Again, we’re using the bincrafters repository for our dependencies in a conanfile.txt:

[requires]
protobuf/3.6.1@bincrafters/stable
protoc_installer/3.6.1@bincrafters/stable

[options]

[generators]
cmake

Now we just need to wire it all up in the CMakeLists.txt

cmake_minimum_required(VERSION 3.0)
project(ProtobufTest)

include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake)
conan_basic_setup(TARGETS KEEP_RPATHS)

# This loads the cmake/protoc-config.cmake file
# from the protoc_installer dependency
include(cmake/protoc-config)

set(TARGET_NAME ProtobufSample)

# Just add the .proto files to the target
add_executable(${TARGET_NAME}
  Person.proto
  ProtobufTest.cpp
)

# Let this function to the magic
protobuf_generate(TARGET ${TARGET_NAME})

# Need to use protobuf, of course
target_link_libraries(${TARGET_NAME}
  PUBLIC CONAN_PKG::protobuf
)

# Make sure we can find the generated headers
target_include_directories(${TARGET_NAME}
  PUBLIC ${CMAKE_CURRENT_BINARY_DIR}
)

There you have it! Pretty neat, and all without a brittle find_package call.

Migrating an existing C++ codebase to conan

This is a bit of a battle report of migrating the dependencies in my C++ projects to use the conan package manager.
In the past weeks I have started to use conan in half a dozen both work and personal projects. Here’s my experiences so far.

Before

The first real project I started with was my personal game project. The “before” setup used a mixture if techniques to handle dependencies and uses CMake to do most of the heavy lifting.
Most dependencies reside in the “devenv”, which is a separate CMake project that I use to build and bundle the dependencies in a specific installation folder. It uses ExternalProject_Add for most parts (e.g. Boost, SDL, Lua, curl and OpenSSL), add_subdirectory for a few others (pugixml and lz4) and just install(FILES...) for a few header only libs like JSON for Modern C++, Catch2 and spdlog. It should be noted that there are relatively few interdependencies between the projects in there.
Because it is more convenient to update, I keep a few dependencies that I control myself directly in the source tree, either as git externals or just copies of the source files.
I try to keep usage of system dependencies to a minimum so that the resulting binary is more portable to the average gamer who does not want to know about libraries and dependencies and such nonsense. This setup has been has been mostly painless and working for my three platforms Windows, Linux and Mac – at least as long as I did not try to change it significantly.

Baby steps

Since not all my dependencies are available on conan and small iterations are usually more successful, I decided to proceed by changing only a single dependency to conan. For this dependency, it’s a good idea to pick something that does not have many compile-time options and is more or less platform agnostic. So I opted for boost over, e.g. SDL or wxWidgets. Boost was also one of the most painful dependencies to build, if only for the insane amount of files it produces and the time it takes to copy those ten-thousands of files to the install location.

Getting started..

There are currently two popular variants of boost available through conan. The “normal” variant on conan’s main repository/remote “conan-center” and a modular version that splits boost into its component libraries on the bincrafters remote, e.g. Boost.Filesystem. The modular version is more appealing conceptually, and I also had a better time getting it to work in my first tests, so I picked that. I did a quick grep for #include <boost/ through my code for an initial guess which boost libraries I needed to get and created a corresponding conanfile.txt in my project root.

[requires]
boost_filesystem/1.69.0@bincrafters/stable
boost_math/1.69.0@bincrafters/stable
boost_random/1.69.0@bincrafters/stable
boost_property_tree/1.69.0@bincrafters/stable
boost_assign/1.69.0@bincrafters/stable
boost_heap/1.69.0@bincrafters/stable
boost_optional/1.69.0@bincrafters/stable
boost_program_options/1.69.0@bincrafters/stable
boost_iostreams/1.69.0@bincrafters/stable
boost_system/1.69.0@bincrafters/stable

[options]
boost:shared=False

[generators]
cmake

Now conan plays really nice with “single configuration generators” like the new CMake/Ninja support in VS2017 and onward. Basically, just cd into your build dir and call something like conan install -s build_type=Debug -s build_type=x86 whenever you want to update dependencies. More info can be found in the official documentation. The workflow for CLion is essentially the same.

Using it in your build

After the last command, conan will download (or build) the dependencies and generate a file with all the corresponding paths.
To use it, include it from cmake like this:

include(${CMAKE_BINARY_DIR}/conanbuildinfo.cmake)
conan_basic_setup(TARGETS KEEP_RPATHS)

It will then provide targets for all the requested boost libraries that you can link to like this:

target_link_libraries(myTarget
  PUBLIC CONAN_PKG::boost_filesystem
)

I wanted to make sure that the compiler build using the new boost files and not the old ones. Because I have a generic include into my devenv that was still going to be in my compilers include-paths for all the other dependencies, so I just renamed boost’s header include folder on disc. After my first successful compile I felt confident enough to delete them.

First problems

There was one major problem: some of my in-source dependencies had their own claim on using boost via passed CMake variables, Boost_LIBRARY_DIRS and Boost_INCLUDE_DIR. I adapted their CMakeLists.txt to allow for injecting appropriate targets instead. Not the cleanest solution, but it got my builds green again fast.

There’s a still a lot to cover on this: The other platforms had their own quirks and I migrated way more than just this first project. Also, there is still ways to go for a full migration with my game project. But more on that in my next blog post…

A programmer’s report on the Global Game Jam

Last weekend was this year’s global game jam. A game jam is a type of event where the participants – the “jammers” – create a game in a fixed, usually quite short, amount of time with some additional restrictions like a theme. Usually, this means computer games, although you can also make board or pen and paper games.

A worldwide event

As the name implies, the global game jam is a worldwide event. Most bigger cities have “hubs”, i.e. jam locations, where the jammers gather at the beginning of the weekend in their local time zone. Then, a few motivating keynote videos are shown before the theme is announced. Last year’s theme was “Transmission”, while this year’s theme was “What Home means to you”. A game using that theme is to be created within the next 48-hours.

The pitch

People get a few minutes to brainstorm about the theme and maybe get an idea that they want to implement. There is kind of a small stage area, where people with those ideas pitch them to the others in hopes of getting a team assembled. Last year, a guy even made a short power-point presentation to pitch his idea.
While you can make a game on your own, that is pretty hard. You typically need at least programming, graphics, audio, music, project-management and game-design expertise on your team. It is rare to find that in a single person.

Implementation

So if you have a nice idea, or join a team with one – now implementation work starts.

Last year, my team quickly abandoned of using Unity as the engine for our terra-forming game, since no one on the team had any experience with it. We switched to Love2D, a neat little framework based on Lua. It is a very lightweight little framework and Lua is an elegant engine. Given that our team had no dedicated graphics people, I am pretty proud of the little game that we finished: Terraformer

This year, I joined another team and we picked the Godot engine to create our game. Godot comes with its own programming language called “Godot script”. The syntax looks like it borrows heavily from both Python and JavaScript. It is, however, quite pleasant to work with, except that most higher-order functional features that are all so common on most languages seem to be missing. But it does have coroutines, so there’s that.
Either way, working with godot was mostly pretty pleasant, except for its interaction with git. Just opening specific parts of the game in the godot editor would change files, and sometimes we’d get huge nonsensical changes, and merge conflicts, when someone on another platform (i.e. between Mac and Windows) pushed something.
We ended up making this little game: Home God
I’m particularly proud of the character movement controls. I think they turned out quite fluidly.

Not just for game programmers…

The global game jam is an awesome experience for non-game programmers looking to improve their craft. In fact, most participants in my hub where at most hobbyists, from what I could tell. The tight time-budget and the objective of just getting it to work will give many programmers a totally different perspective of how to do things. Things that steal your time will hurt a lot.

How to teach C++

In the closing Keynote of this year’s Meeting C++, Nicolai Josuttis remarked how hard it can be to teach C++ with its ever expanding complexity. His example was teaching rookies about initialization in C++, i.e. whether to use assignment =, parens () or curly braces {}. He also asked for more application-level programmers to participate.

Well, I am an application programmer, and I also have experience with teaching C++. Last year I held a C++ introductory course for experienced C programmers who mostly had never used C++ before. From my experience, I can completely agree with what Nico had to say about teaching C++. The language and its subtlety can be truly overwhelming.

Most of the complexity in C++ boils down to tuning your code for optimal performance. We cannot just leave that out, can we? After all, the sole reason to use C++ is performance, right?

My approach

Performance is one key reason for using C++, no doubt about it. There is a few more, but let us not get distracted. What is even more important is the potential to optimize for performance. But you can do that later! It is actually quite crazy what performance crimes you can get away with in C++ and still have something pretty fast overall, especially with move-semantics and improved RVO.

Given that, I picked a simple subset to start with:

Pass by-value only, do not use pointers nor references.
Structure your programs around simple data-only structs and functions transforming them.
Make good use of the data structures and algorithms from std.

Believe me, you too can write pretty useful programs this way. This is not too far from a data-oriented style anyways.
So, yes, we can actually leave out the performance specific parts for quite some time, while still making sure our programs can be optimized eventually.

And from there?

You can gradually start introducing references and move semantics, when measurement shows copying affecting the performance. This way you can introduce tooling like a profiler and see the effects off passing things around by reference in a language where everything, by default, is passed by value. These things will start to make sense in a context.

The arguably better approach to move semantics is of course types that prevent you from copying, but allow moving. Most resources with side-effects are like this: file handles, locks, threads. You will need to introduce RAII for this to make sense, e.g. writing ctors and dtors.

But you still do not need to write templates, use virtual-function polymorphism, or even pointers. But when you get to those, you will have a good context to use them.

Have you tought C++ and some experience to share? I would like to hear about it!

luabind deboostified tips and tricks

luabind deboostified is a fork of the luabind project that helps exposing APIs to Lua. As the name implies, it replaces the boost dependency with modern C++, which makes it a lot more pleasant to work with.

Here are a few tips and tricks I learned while working with it. Some tricks might be applicable to the original luabind – I do not know.

1. Splitting module registration

You can split the registration code for different classes. I usually add a register function per class, like this:

struct A {
  void doSomething();
  static luabind::scope registerWithLua();
};

struct B {
  void goodStuff();
  static luabind::scope registerWithLua();
};

You can then combine their registration code into a single module on the Lua side:

void registerAll(lua_State* L) {
  luabind::module(L)[
    A::registerWithLua(),
    B::registerWithLua()];
}

The implementation of a registration function looks like this:

luabind::scope A::registerWithLua()
{
  return luabind::class_<A>("A")
    .def("doSomething", &A::doSomething);
}

2. Multiple policies and multiple return values

Unlike C++, Lua has real multiple return values. You can use that by utilizing the return value policies that luabind offers. Lets say, you want to write this in Lua:

local x, y = a.getPosition()

The C++ side could look like this:

void getPosition(A const& a, float& x, float& y);

The deboostified fork needs its policies supplied in a type list. Let’s use a small helper meta-function to build that:

template <typename... T>
using joined = 
  typename luabind::meta::join<T...>::type;

Once you have that, you can expose it like this:

luabind::def("getPosition", &getPosition,
              joined<
                luabind::pure_out_value<2>,
                luabind::pure_out_value<3>
              >());

3. Specialized data structures using `luabind::object`

Using the converters in luabind is not the only way to make Lua values from C++. Almost everything you can do in Lua itself, you can do with luabind::object. Here is a somewhat contrived example:

luabind::object repeat(luabind::object what,
                       int count) {
  // Create a new table object
  auto result = luabind::newtable(
    what.interpreter());
  // Fill it as an array [1..N]
  for (int i = 1; i <= count; ++i)
    result[i] = what;
  return result;
}

This function can then be exported via luabind::def and used just like any other function. This is just the tip of the iceberg, though. For example, you can also write functions that, at runtime, behave differently when a number is passed in as when a table is passed in. You can find out the Lua type with luabind::type(myObject).

Of course, as soon as you want to create new objects to return to Lua, you need the lua_State pointer in that function. Using the interpreter from a passed-in luabind::object is one way, but I have yet to find another pleasant way to do this. It is probably possible to use the policies to do this, and have them pass that in as a special parameter, but for now I am using some complicated machinery to bind lambda functions that capture the Lua interpreter.

That’s it for now..

Keep in mind that these are not thoroughly researched best-practices, but patterns I have used to solve actual problems. There might be better solutions out there – if you know any, please let me know. Hope this helped!

Game Optimization Resolved

In my last blog post, I explained a performance problem in my game abstractanks but not how I solved it.

So I had not done any optimization work in a while, so the first thing I did turned out to be an error. And not only in hindsight – I actually knew how to tackle a problem like that – I just temporarily forgot at that point.

Going down the rabbit hole

Where we left off, my profiler showed FriendlyUnitOccupies as the culprit. That function basically does circle/circle collision detection using a quad-tree as the spatial acceleration structure. Looking at the samples from my profiler, I could see that that it was descending into the tree quite deeply. Like all tree structures, a quad-tree does pointer-chasing which is very bad for modern CPUs. So I figured I should look at how to optimize that. The data structure was implemented in a hurry, so there seemed plenty to do:

Instead of recursing into each node, use tail-call optimization and early culling to speed up traversal.
Pre-cache the query with the max-search radius and the other requirements to the units, e.g. not dead, same team, etc.. and then use that to build a new tree for the actual queries.

Because the data structure was pretty non-generic, I started to basically rewrite it to use it in this scenario. While I was about half way through with that, it dawned on me that I was barking at the wrong tree.

Taking a step back

The excellent book Video Game Optimization has some great advice on which level to attack an optimization problem.

System-level. Can you change the system to do something differently and still solve your problem?
Algorithm-level. Are you using the ~~most efficient~~ right algorithm for the data you have?
Micro-level. Are you not wasting any processing power on the lower levels?

I was already on the algorithm level. So I went back to the systemic level: What if the AI did not try to change the target position that often, maybe just every few seconds? That effectively meant lowering the AIs APM. It’s not a bad solution, especially since that makes the AI behave more human. But on the other hand, real-time games, as the name implies, have a soft real-time requirement. So you generally like to avoid huge workloads that go over your frame budget. With how slow the algorithm was, that could easily be the case. The solution is then to do the work concurrently, either by splitting it up or doing it in the background. Both solutions seemed difficult, since the AI code does currently not allow for easy concurrency. So that idea was out.

What if the parking-positions where cached? Subsequent calls to get parking positions could probably reuse a lot of the positions that were computed in previous frames, given that the target point only moves by a little bit each frame. I figured that might work, but it requires more housekeeping and data-dependencies – the result of the previous query needs to be used for the next. That seemed complex and therefore brittle.

A Solution?

Temporal coherency was a pretty good idea though, but not the scale was to big this time. What if I exploited it within a single frame? Now the original code did obscure this, but maybe it gets a little more clear if I write it like this:

optional<v2> GameWorld::FindFreePosition(v2 Center, std::vector<v2> const& Occupied)
{
  auto CheckPosition = [&](v2 Candiate)
  {
    if (!IsPassable(Candidate))
      return false;

    if (OverlapsWith(Occupied))
      return false;

    return !FriendlyUnitOccupies(Candidate);  
  };
  auto Samples = SampledPositions(Center, SomeRandomness());
  auto Found = find_if(SampledPositions.begin(), SampledPositions.end(), CheckPosition(Position));
  
  return (Found != SampledPositions.end()) ? *Found : none;
}

Now as I explained in the previous post, this was called in a loop for each unit to be parked.

std::vector<v2> GameWorld::FindParkingPositions(v2 Center, std::size_t N)
{
  std::vector<v2> Results;
  for (std::size_t i = 0; i < N; ++i)
  {
    auto MaybePosition = FindFreePosition(Center, Results);
    if (!MaybePosition) // No more free space?
      break;
    Results.push_back(*MaybePosition);
  }
  return Results;
}

Easy to see: counting the number of CheckPosition calls, this algorithm is O(n) in number of sampled positions. The number of sampled positions depends linearly on the number of units to be parked, because more units obviously need more parking positions, essentially making this O(n²) for the unit count! But the positions get resampled for each unit – with the only change being the little bit of randomness that is injected everytime. In other words, each call would just test false for sampled positions roughly corresponding to the units that are already placed.

So what I did was a very small change: only inject the randomness once and merge the loops:

auto Samples = SampledPositions(Center, SomeRandomness());
std::vector<v2> Results;

for (auto const& Sample : Samples)
{
  if (CheckPosition(Sample))
    Results.push_back(Sample);

  if (Result.size() >= N)
    break;
}
return Results;

And this did the trick! The algorithm’s run-time when below the 1ms range, and the smaller variation in randomness is not really visible.

Conslusions

I was thrown off-track be the false conclusion that CheckPositions was too slow when it was in fact just called too often. Context is key! Always approach these things outside-in.
Using less-than-optimal abstractions obscured the opportunity to hoist out the sample generation from me. Iteration is always a separate concern, even when it is not on containers!

	Writing Integration… on Every Unit Test Is a Stage Pla…
	mariuselvert on C# is very strict about modify…
	Anonymous on C# is very strict about modify…
	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…