A Game Optimization War Story

As our customers surely know, I don’t work here on Fridays. That is the time I allocate to my side project, an arcade real-time strategy game called abstractanks. It is a passion project above all else, but of course I am also learning a lot, much of which I can apply to my “day job” here as well. Today I want to share the story of how I optimized a critical bit of code in that game.

The Big Slowdown

While working on scripted missions, one main element I am using is to make a group of units attack when you enter an area (a.k.a. a zone-trigger). This seems easy enough, but was causing massive slowdowns as soon as the enemy group started moving. My average logic frame-time jumped from 0.3 ms to more than 1500 ms, which essentially makes the game unplayable. When seeing a performance problem, your first instinct should always be to profile it. So I booted up WPR/WPA and did just that. Once I had the profile, I followed the most-sampled path in the stack and found my way to the supposed culprit: the parking algorithm.

Context

When optimizing, you need as much context as possible to find the best course of action. So let me explain how that algorithm fits into the broader picture.

Parking

My main game mechanic is moving your units around. You do this by selecting a group and then clicking somewhere on the map to issue the move-order. In addition to the path-finding process, this also runs an algorithm I call park-planning (as in parking a car). It makes sure that the units know to position themselves around the target point in a roughly circular shape once they arrive. It is essential to the interaction of this mechanic with the capturing of objectives, which are circular as well. Before this was implemented, the units would just decelerate after passing the target point. This caused them to “overshoot” and miss the objectives, which frustrated the players: they clicked in the right place, but the units would not stop there, only slightly behind it. To make things worse, units arriving later would bump into those that were already there, pushing them even further away and clumping up.

AI Moving

In my particular case, the AI enemy was repeatedly issuing move-orders to close in on the intruder – the player. Since the player group usually also moved, the AI was trying to adapt by changing the move-order every frame (effectively working at around 2000 APM).

Diving into the code

My park-planning implementation is divided into two steps: finding enough parking spots, and then assigning units to them. The profiler showed that the first part was the problem, while the assignment was negligible in terms of run-time. Historically, the first step reused and extended some code I had first written for spawning units, which worked like this:

optional<v2> GameWorld::FindFreePosition(v2 Center, std::vector<v2> const& Occupied)
{
  auto CheckPosition = [&](v2 Candidate)
  {
    if (!IsPassable(Candidate))
      return false;

    if (OverlapsWith(Candidate, Occupied))
      return false;

    return !FriendlyUnitOccupies(Candidate);   
  };

  if (CheckPosition(Center))
    return Center;

  auto Radius = UNIT_SIZE;
  while (Radius < MAX_SEARCH_RADIUS)
  {
    // Roll a random starting angle
    auto AngleOffset = RandomAngle();
    auto Angle = 0.f;
    while (Angle < 2*Pi)
    {
      auto Candidate = Center + AngleVector(Angle + AngleOffset)*Radius;
      if (CheckPosition(Candidate))
        return Candidate;

      // Move along this circle in steps of roughly UNIT_SIZE / OVERSAMPLING_FACTOR
      Angle += UNIT_SIZE / (Radius * OVERSAMPLING_FACTOR);
    }

    // Increase the Radius
    Radius += UNIT_SIZE;
  }
  return none;  
}

Note that all the functions in the CheckPosition lambda are “size aware” and respect the UNIT_SIZE – so they are slightly more complex than what the pseudo-code here would have you believe.
The Occupied parameter was added for the parking-position search: the caller successively fills up the std::vector with the positions it finds and uses them once it has found enough.
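
A rough sketch of how such a collecting caller might look – the function name and signature here are made up for illustration, not the actual game code:

// Sketch only: collect parking spots by calling FindFreePosition repeatedly,
// feeding the spots found so far back in as the "Occupied" list.
std::vector<v2> FindParkingPositions(GameWorld& World, v2 Center, std::size_t UnitCount)
{
  std::vector<v2> Occupied;
  Occupied.reserve(UnitCount);
  for (std::size_t i = 0; i < UnitCount; ++i)
  {
    auto Spot = World.FindFreePosition(Center, Occupied);
    if (!Spot)
      break; // gave up within MAX_SEARCH_RADIUS

    Occupied.push_back(*Spot);
  }
  return Occupied;
}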

Back to the profiling results: they showed that most of the time was spent in FriendlyUnitOccupies, followed by IsPassable and then OverlapsWith. FriendlyUnitOccupies dominated, taking about 8 times as long as the rest. That function uses a quad-tree to accelerate spatial queries for other units.

Next steps

Obviously, this code takes a pretty simplistic, brute-force approach to the problem. But that is good news: it means there are many different paths to take, many optimization opportunities. My eventual fix was a relatively simple change that got the frame time back down below 1 ms, but before I did that, I considered and tested a few other approaches. I will talk about those in detail in my next post. How would you approach this?

Analysing a React web app using SonarQube

Many developers, especially from the Java world, may know the code analysis platform SonarQube (formerly SONAR). While its focus was mostly on integrating all the great analysis tools for Java, its modular architecture allows plugging in tools for other languages to provide linter results and code coverage under the same web interface.

We are a polyglot bunch and are using more and more React in addition to our Java, C++, .NET and “what not” projects. Of course we would like the same quality overview for these JavaScript projects as we are used to in the other ecosystems. So I tried SonarQube for React.

The start

Using SonarQube to analyse a JavaScript project is as easy as for the other languages: Just provide a sonar-project.properties file specifying the sources and some paths for analysis results and there you go. It may look similar to the following for a create-react-app:

sonar.projectKey=myproject:webclient
sonar.projectName=Webclient for my cool project
sonar.projectVersion=0.3.0

#sonar.language=js
sonar.sources=src
sonar.exclusions=src/tests/**
sonar.tests=src/tests
sonar.sourceEncoding=UTF-8

#sonar.test.inclusions=src/tests/**/*.test.js
sonar.coverage.exclusions=src/tests/**

sonar.junit.reportPaths=test-results/test-report.junit.xml
sonar.javascript.lcov.reportPaths=coverage/lcov.info
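
Once the properties file is in place, submitting the analysis is a matter of running the scanner from the project root – with the standalone scanner CLI, something like sonar-scanner -Dsonar.host.url=https://sonarqube.example.com (the host URL here is just a placeholder; it can also live in the scanner’s global configuration).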

For the coverage you need to add some settings to your package.json, too:

{ ...
  "devDependencies": {
    "enzyme": "^3.3.0",
    "enzyme-adapter-react-16": "^1.1.1",
    "eslint": "^4.19.1",
    "eslint-plugin-react": "^7.7.0",
    "jest-junit": "^3.6.0"
  },
  "jest": {
    "collectCoverageFrom": [
      "src/**/*.{js,jsx}",
      "!**/node_modules/**",
      "!build/**"
    ],
    "coverageReporters": [
      "lcov",
      "text"
    ]
  },
  "jest-junit": {
    "output": "test-results/test-report.junit.xml"
  },
  ...
}
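
To actually produce both reports, the tests have to run with coverage enabled and jest-junit registered as the results processor. For a create-react-app this should amount to something like CI=true npm test -- --coverage --testResultsProcessor=jest-junit, although the exact invocation depends on your react-scripts and jest versions.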

This is all nice, but the set of built-in rules for JavaScript is a bit thin and does not fit React apps nicely.

ESLint to the rescue

But you can make SonarQube use ESLint and thus become more useful.

First you have to install the ESLint plugin for SonarQube from GitHub.

Second you have to set up ESLint to your liking using eslint --init in your project. That results in an .eslintrc.js similar to this:

module.exports = {
  'env': {
    'browser': true,
    'commonjs': true,
    'es6': true
  },
  'extends': 'eslint:recommended',
  'parserOptions': {
    'ecmaFeatures': {
      'experimentalObjectRestSpread': true,
      'jsx': true
    },
    'sourceType': 'module'
  },
  'plugins': [
    'react'
  ],
  'rules': {
    'indent': [
      'error',
      2
    ],
    'linebreak-style': [
      'error',
      'unix'
    ],
    'quotes': [
      'error',
      'single'
    ],
    'semi': [
      'error',
      'always'
    ]
  }
};

Lastly, enable the ESLint ruleset for your project in SonarQube and look at the results. You may need to tune one thing or another, but you will get some useful static analysis that helps you improve your code quality further.

Analyzing gradle projects using SonarQube without gradle plugin

SonarQube makes static code analysis easy for a plethora of languages and environments. In many of our newer projects we use gradle as our build system and jenkins as our continuous integration server. Integrating SonarQube into such a setup can be done in a couple of ways, the most straightforward being

  • Integrating SonarQube into your gradle build and invoking the gradle script in jenkins
  • Letting jenkins invoke the gradle build and execute the SonarQube scanner

I chose the latter one because I did not want to add further dependencies to the build process.

Configuration of the SonarQube scanner

The SonarQube scanner is configured by a property file, called sonar-project.properties by default:

# must be unique in a given SonarQube instance
sonar.projectKey=domain:project
# this is the name and version displayed in the SonarQube UI. Was mandatory prior to SonarQube 6.1.
sonar.projectName=My cool project
sonar.projectVersion=23

sonar.sources=src/main/java
sonar.tests=src/test/java
sonar.java.binaries=build/classes/java/main
sonar.java.libraries=../lib/**/*.jar
sonar.java.test.libraries=../lib/**/*.jar
sonar.junit.reportPaths=build/test-results/test/
sonar.jacoco.reportPaths=build/jacoco/test.exec

sonar.modules=application,my_library,my_tools

# Encoding of the source code. Default is default system encoding
sonar.sourceEncoding=UTF-8
sonar.java.source=1.8

sonar.links.ci=http://${my_jenkins}/view/job/MyCoolProject
sonar.links.issue=http://${my_jira}/browse/MYPROJ

After we have done that we can submit our project to the SonarQube scanner using the jenkins SonarQube plugin and its “Execute SonarQube Scanner” build step.

Optional: Adding code coverage to our build

Even our gradle-based projects aim to be self-contained. That means we usually do not use repositories like mavenCentral for our dependencies but store them all in a lib directory alongside the project. If we want to add code coverage to such a project, we need to add the jacoco jars, in the version corresponding to the gradle jacoco plugin, to our libs and reference them in build.gradle:

allprojects {
    apply plugin: 'java'
    apply plugin: 'jacoco'
    sourceCompatibility = 1.8

    jacocoTestReport {
        reports {
            xml.enabled true
        }
        jacocoClasspath = files('../lib/org.jacoco.core-0.7.9.jar',
            '../lib/org.jacoco.report-0.7.9.jar',
            '../lib/org.jacoco.ant-0.7.9.jar',
            '../lib/asm-all-5.2.jar'
        )
    }
}

Gotchas

Our jenkins build job consists of 2 steps:

  1. Execute gradle
  2. Submit project to SonarQube’s scanner

By default, gradle stops execution on failure. That means later tasks like jacocoTestReport are not executed if a test fails. We need to invoke gradle with the --continue switch to always run all of our tasks.
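
In our setup the gradle build step therefore ends up invoking something along the lines of gradle --continue clean test jacocoTestReport (the exact task list depends on the project), so the test and coverage reports are written even when individual tests fail.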

Analyzing iOS crash dumps with Xcode

The best way to analyze a crash in an iOS app is if you can reproduce it directly in the iOS simulator in debug mode or on a local device connected to Xcode. Sometimes you have to analyze a crash that happened on a device that you do not have direct access to. Maybe the crash was discovered by a tester who is located in a remote place. In this case the tester must transfer the crash information to the developer, and the developer has to import it in Xcode. The iOS and Xcode functionality for this workflow is a bit hidden, so the following step-by-step guide may help.

Finding the crash dumps

iOS stores crash dumps for every crash that occurred. You can find them in the Settings app in the deeply nested menu hierarchy under Privacy -> Analytics -> Analytics Data.

There you can select the crash dump. If you tap on a crash dump you can see its contents in a JSON format. You can select this text and send it to the developer. Unfortunately there is no “Select all” option, so you have to select it manually. It can be quite long because it contains the stack traces of all the threads of the app.

Importing the crash dump in Xcode

To import the crash dump in Xcode you must save it first in a file with the file name extension “.crash”. Then you open the Devices dialog in Xcode via the Window menu:

To import the crash dump you must have at least one device connected to your Mac, otherwise you will find that you can’t proceed to the next step. It can be any iOS device. Select the device to open the device information panel:

Here you find the “View Device Logs” button to open the following Device Logs dialog:

To import the crash dump into this dialog select the “All Logs” tab and drag & drop the “.crash” file into the panel on the left in the dialog.

Initially the stack traces in the crash dump only contain memory addresses as hexadecimal numbers. To resolve these addresses to human readable symbols of the code you have to “re-symbolicate” the log. This functionality is hidden in the context menu of the crash dump:

Now you’re good to go and you should finally be able to find the cause of the crash.

Developer power tools: Big O = big impact

Scalability and performance are often used interchangeably, but they are very different. Performance describes the resource consumption of a program: it looks at a specific part of the program, say the user activating an action, and keeps the input and circumstances fixed.
Scalability describes how the resource consumption of a program changes when additional resources are added or the input size increases. So a program can have good performance (it runs fast) but behave badly when the load increases (bad scalability). Take bubble sort, for example: since the algorithm has low overhead, it runs fast on small arrays, but as the arrays get bigger, the runtime gets much worse. It doesn’t scale.
There are many ways to improve scalability; here we look at one in particular: reducing algorithmic complexity. The time complexity of an algorithm is expressed as a function T(n) that describes how its running time behaves when the input size n changes. Say you have an algorithm that takes n objects and needs 4 units of time per object plus a constant overhead of 2. This would be:

T(n) = 4n + 2

Often the exact numbers are not necessary; you just want to know the magnitude. This is where the big O notation comes into play. It describes the asymptotic upper bound, or simply the behaviour of the fastest-growing part. In the above case this would be:

T(n) = 4n + 2 = O(n)

We call such an algorithm linear because the resource consumption grows linearly with the input size. Why does this matter? Let’s look at an example.

Say we have a list of numbers and want to know the highest. How long does this take? A straightforward implementation would look like this:

int max = Integer.MIN_VALUE;
for (int number : list) {
  if (number > max) {
    max = number;
  }
}

If the size of the number list is n, we need n compare operations, so our algorithm is linear. What if we have two lists of length n? We just run our algorithm twice and compare the two maximums, so it would be

T(n) = 2n + 1 = O(n)

also linear.
Why does all this matter? If our algorithm is quadratic, i.e. O(n^2), and our input size is n = 1000, we need on the order of a million operations. If it is linear, it only needs on the order of a thousand operations, much less. Reducing the complexity of an algorithm can be business critical, or the difference between getting an instant result and losing a customer because they waited too long.
Time complexity isn’t the only complexity you can measure: other resources like space can also be described with big O. The max-finding loop above, for example, needs only constant extra space, O(1), no matter how long the list is. I hope this clears up some questions; if you have further ones, please feel free to leave a comment.

A mindset for inherited source code

One field of expertise our company provides is the continuation of existing software projects. While this sounds very easy to accomplish, in reality, there are a few prerequisites that a software project has to provide to be continuable. The most important one is the source code of the system, obviously. If the source code is accessible (this is a problem more often than you might think!), the biggest hurdle is now the mindset and initial approach of the developers that inherit it.

The mindset

Most developers have a healthy “greenfield” project mindset. There is a list of requirements, so start coding and fulfill them. If the code obstructs the way to your goal, you reshape it in a meaningful manner. The more experience you have with developing software, the better the resulting design and architecture of the code will be. Whether you apply automatic tests to your software (and when) is entirely your decision. In short: You are the master of the code and forge it after your vision. This is a great mindset for projects in the early phases of development. But it will actively hinder you in later phases of your project or in case you inherit foreign code.

For your own late-phase projects and source code written by another team, another mindset provides more value. The “brownfield” metaphor doesn’t describe the mindset exactly. I have three metaphors that describe parts of it for me: You’ll need to be an archeologist, a forensicist (as in “securer of criminal evidence”) and a minefield clearer. If you hear the word archeologist, don’t think of Indiana Jones, but of somebody sitting in the scorching desert, clearing a whole football field from sand with only a shaving brush and his breath. If you think about being a forensicist, don’t think of your typical hero criminalist who rearranges the photos of the crime scene to reveal a hidden hint, but the guy in a white overall who has to take all the photos without disturbing the surrounding (and being disturbed by it). If you think about the minefield clearer: Yes, you are spot on. He has to rely on his work and shouldn’t move too fast in any direction.

The initial approach

This sets the scene for your initial journey inside foreign source code: Don’t touch anything or at least be extra careful, only dust it off in the slightest possible manner. Watch where you step in and don’t get lost. Take a snapshot, mental or written, of anything suspicious you’ll encounter. There will be plenty of temptation to lose focus and instantly improve the code. Don’t fall for it. Remember the forensicist: what would the detective in charge of this case say if you “improved the scenery a bit” to get better photos? This process reminds me so much of a common approach to the game “Minesweeper” that I included the minefield clearer in the analogy. You start somewhere on the field and mark every mine you indirectly identify without ever really revealing them.

Most likely, you won’t find any tests or an issue tracker where you can learn about the development history. With some luck, you’ll have a commit history with meaningful comments. Use the blame view as often as you can. These are your archeological skills at work: separating layer after layer of code, all mingled in one place. A good SCM system can clear up a total mess for you and reveal the author’s intent for it. Without tests, issues and versioning, you cannot distinguish between a problem and a solution, accidental and deliberate complexity, or a bug and a feature. Everything could mean something and even be crucial for the whole system, or just be useless excess code (so-called “live weight”, because the code will be executed, but with no effect in terms of features). To name an example, if you encounter a strange sleep() call (or multiple calls in a row), don’t eliminate or change them! The original author probably “fixed” a nasty bug with it that will come back sooner than you know it.

Walking on broken glass

And this is what you should do: Leave everything in its place, broken, awkward and clumsy, and try to separate your code from “their” code as much as possible. The rationale is to be able to differentiate between “their” mess and “your” mess and make progress on your part without breaking the already existing features. If you cannot wait any longer to clean up some of the existing code, make sure to release into production often and in a timely manner, so you still know what changed if something goes wrong. If possible, try to release two different kinds of new versions:

  • One kind of new version only incorporates refactorings to the existing code. If anything goes wrong or seems suspicious, you can easily bail out and revert to the previous version without losing functionality.
  • The other kind only contains new features, added with as little change to existing code as possible. Hopefully, this release will not break existing behaviour. If it does, you should double-check your assumptions about the code. If reasonably achievable, do not assume anything or at least write an automatic test to validate your assumption.

Personally, I call this approach the “tick-tock” release cycle, modelled after the release cycle of Intel for its CPUs.

Changing gears

A very important aspect of software development is to know when to “change gears” and switch from greenfield to brownfield or from development to maintenance mode. The text above describes the approach with inherited code, where the gear change is externally triggered by transferring the source code to a new team. But in reality, you need to apply most of the practices to your own source code, too. As soon as your system is in “production”, used in the wild and being filled with precious user data, it changes into maintenance mode. You cannot change existing aspects as easily as before.
In his book “Implementation Patterns” (2008), Kent Beck describes the development of frameworks among other topics. One statement is:

While in conventional development reducing complexity to a minimum is a valuable strategy for making the code easy to understand, in framework development it is often more cost-effective to add complexity in order to enhance the framework developer’s ability to improve the framework without breaking client code.
(Chapter 10, page 118)

I not only agree with this statement but think that it partly applies to “conventional development” in maintenance mode, too. Sometimes, the code needs additional complexity to cope with existing structures and data. This is the moment when you’ve inherited your own code.

Clang, The Friendly Compiler

A while back I suggested making friends with your compiler as a basis for developing high-quality code. My focus then was GCC, since it was and still is the compiler I use most of the time. Well, it turns out that although GCC may be a reasonably good companion on the C/C++ development road, there are better alternatives.

Enter Clang: I had heard about Clang a few times in the past but never gave it a real shot. That changed after I watched Chandler Carruth’s talk at GoingNative 2012.

First of all I was stunned by the quote from Richard Stallman about GCC being deliberately designed to make it hard to use it in non-free software. I always wondered why IDEs like KDevelop keep reinventing the wheel all the time by implementing their own C/C++ parsers instead of using already existing and free GCC code. This was the answer: THEY SIMPLY COULDN’T!!

One main point of Chandler’s talk was the quality of Clang’s diagnostic messages. GCC is the kind of friend that, although telling you exactly what’s wrong with your code, often does so in complicated sentences hidden in walls of text.

Clang, on the other hand, tries very hard to comprehend what you really wanted to write; it speaks in much more understandable words and shows you the offending code locations with nice graphics.

You could say that compared to Clang, which is empathic, understanding, pragmatic and always tries to be on the same page with you, GCC comes across more like an arrogant, self-pleasing and I’m-more-intelligent-than-you kinda guy.

Where GCC says: “What? That should be a template instantiation? Guess what, you’re doing it WRONG!!”, Clang is more like: “Ok my friend, now let’s sit down together and analyse step by step what the problem is here. I’ll make us tea.”

You’ll find many examples of Clang’s nice diagnostic output in Chandler’s talk. Here is another one, as a little teaser:

struct A
{
  std::string _str1;
  std::string _str2;
};

struct AHasher
{
  std::size_t operator() (const A& a)
  {
    return std::tr1::hash<std::string>()(a._str1) ^
      std::tr1::hash<std::string>()(a._str2);
  }
};
...
typedef std::tr1::unordered_map<A, int, AHasher> AMap;
...

What’s wrong with this code? Yes, exactly: the operator in AHasher must be const. Errors with const correctness are a typical, easy-to-overlook kind of problem in day-to-day programming. GCC’s reaction to something like that is that something discards qualifiers. This may be perfectly right, and after a while you even get used to it. But as you can see with Clang, you can do much better.
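
For reference, the fix is a single added keyword: std::size_t operator() (const A& a) const.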

The following two screenshots directly compare GCC’s and Clang’s output when compiling the code above. Because there is a template instantiation involved, GCC buries you under its typical wall of text before it actually tells you what’s wrong (last line).

Clang’s output is much better formatted; it shows you the template instantiation steps much more cleanly, and in the last line it tells you to the point what is really wrong: …but method is not marked const. Yeah!