luabind deboostified tips and tricks

luabind deboostified is a fork of the luabind project that helps expose C++ APIs to Lua. As the name implies, it replaces the boost dependency with modern C++, which makes it a lot more pleasant to work with.

Here are a few tips and tricks I learned while working with it. Some tricks might be applicable to the original luabind – I do not know.

1. Splitting module registration

You can split the registration code for different classes. I usually add a register function per class, like this:

struct A {
  void doSomething();
  static luabind::scope registerWithLua();
};

struct B {
  void goodStuff();
  static luabind::scope registerWithLua();
};

You can then combine their registration code into a single module on the Lua side:

void registerAll(lua_State* L) {
  luabind::module(L)[
    A::registerWithLua(),
    B::registerWithLua()];
}

The implementation of a registration function looks like this:

luabind::scope A::registerWithLua()
{
  return luabind::class_<A>("A")
    .def("doSomething", &A::doSomething);
}

2. Multiple policies and multiple return values

Unlike C++, Lua has real multiple return values. You can make use of them via the return value policies that luabind offers. Let's say you want to write this in Lua:

local x, y = a.getPosition()

The C++ side could look like this:

void getPosition(A const& a, float& x, float& y);

The deboostified fork needs its policies supplied in a type list. Let’s use a small helper meta-function to build that:

template <typename... T>
using joined = 
  typename luabind::meta::join<T...>::type;

Once you have that, you can expose it like this:

luabind::def("getPosition", &getPosition,
              joined<
                luabind::pure_out_value<2>,
                luabind::pure_out_value<3>
              >());

3. Specialized data structures using luabind::object

Using the converters in luabind is not the only way to make Lua values from C++. Almost everything you can do in Lua itself, you can do with luabind::object. Here is a somewhat contrived example:

luabind::object repeat(luabind::object what,
                       int count) {
  // Create a new table object
  auto result = luabind::newtable(
    what.interpreter());
  // Fill it as an array [1..N]
  for (int i = 1; i <= count; ++i)
    result[i] = what;
  return result;
}

This function can then be exported via luabind::def (under a name like "rep", since repeat is a reserved word in Lua) and used just like any other function. This is just the tip of the iceberg, though. For example, you can also write functions that, at runtime, behave differently when a number is passed in than when a table is passed in. You can find out the Lua type of an object with luabind::type(myObject).
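
For example, a function that sums either a single number or a whole table of numbers could look something like this (a sketch, assuming luabind's object_cast and iterator facilities):

int sumNumbers(luabind::object value) {
  // Dispatch on the runtime Lua type of the argument
  if (luabind::type(value) == LUA_TNUMBER)
    return luabind::object_cast<int>(value);
  if (luabind::type(value) == LUA_TTABLE) {
    int sum = 0;
    // luabind::iterator walks the table entries;
    // a default-constructed iterator is the end sentinel
    for (luabind::iterator it(value), end; it != end; ++it)
      sum += luabind::object_cast<int>(*it);
    return sum;
  }
  return 0; // any other Lua type: nothing to sum
}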

Of course, as soon as you want to create new objects to return to Lua, you need the lua_State pointer in that function. Using the interpreter from a passed-in luabind::object is one way, but I have yet to find another pleasant way to do this. It is probably possible to use the policies to pass it in as a special parameter, but for now I am using some complicated machinery to bind lambda functions that capture the Lua interpreter.

That’s it for now.

Keep in mind that these are not thoroughly researched best-practices, but patterns I have used to solve actual problems. There might be better solutions out there – if you know any, please let me know. Hope this helped!

Using Ansible vault for sensitive data

We like using Ansible for our automation because it has minimal requirements for the target machines and the surrounding infrastructure: you need nothing more than ssh and Python with some libraries. In contrast to alternatives like Puppet and Chef, you do not need special server and client programs running all the time and communicating with each other.

The problem

When setting up remote machines and deploying software systems for your customers, you will often have to use sensitive data like private keys, passwords and maybe machine or account names. On the one hand you want to put your automation scripts and their data under version control and use them from your continuous integration infrastructure. On the other hand you do not want to spread the secrets of your customers all around your infrastructure, and definitely never ever in your source code repository.

The solution

Ansible supports encrypting sensitive data and using it in playbooks with the concept of vaults and the accompanying commands. Setting it up requires some work, but then usage is straightforward and works seamlessly.

The high-level conversion process is the following:

  1. create a directory for the variables that apply on a host or group basis
  2. extract all sensitive variables into vars.yml
  3. copy vars.yml to vault.yml
  4. prefix variables in vault.yml with vault_
  5. use vault variables in vars.yml

Then you can encrypt vault.yml using the ansible-vault command providing a password.

All you have to do subsequently is to provide the vault password along with your usual playbook commands. Decryption for playbook execution is done transparently and on-the-fly for you, so you do not need to care about decryption and encryption of your vault unless you need to update the data in there.
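
If you do need to update the secrets later, ansible-vault provides subcommands that work on the encrypted file in place, for example:

ansible-vault view vault.yml
ansible-vault edit vault.yml

The first command prints the decrypted content, the second opens it in your editor and re-encrypts it when you save.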

The step-by-step guide

Suppose we want to work on a target machine run by a customer who provides us access via ssh. We do not want to store the ssh user name and password in our repository, but we want to be able to run the automation scripts unattended, e.g. from a Jenkins job. Let us call the target machine ceres.

So first you set up the directory structure by creating a directory for the target machine called $ansible_script_root$/host_vars/ceres.

To log into the machine we need two sensitive variables: ansible_user and ansible_ssh_pass. We put them into a file called $ansible_script_root$/host_vars/ceres/vars.yml:

ansible_user: our_customer_ssh_account
ansible_ssh_pass: our_target_machine_pwd

Then we copy vars.yml to vault.yml and prefix the variables with vault_, resulting in $ansible_script_root$/host_vars/ceres/vault.yml with the following content:

vault_ansible_user: our_customer_ssh_account
vault_ansible_ssh_pass: our_target_machine_pwd

Now we use these new variables in our vars.yml like this:

ansible_user: "{{ vault_ansible_user }}"
ansible_ssh_pass: "{{ vault_ansible_ssh_pass }}"

Now it is time to encrypt the vault using the following command, which asks for the vault password interactively:

ansible-vault encrypt host_vars/ceres/vault.yml

resulting in an encrypted vault that can be put into source control. It looks something like this:

$ANSIBLE_VAULT;1.1;AES256
35323233613539343135363737353931636263653063666535643766326566623461636166343963
3834323363633837373437626532366166366338653963320a663732633361323264316339356435
33633861316565653461666230386663323536616535363639383666613431663765643639383666
3739356261353566650a383035656266303135656233343437373835313639613865636436343865
63353631313766633535646263613564333965343163343434343530626361663430613264336130
63383862316361363237373039663131363231616338646365316236336362376566376236323339
30376166623739643261306363643962353534376232663631663033323163386135326463656530
33316561376363303339383365333235353931623837356362393961356433313739653232326638
3036

Using your playbook looks similar to before, you just need to provide the vault password using one of several options: a password file, the ANSIBLE_VAULT_PASSWORD_FILE environment variable, or interactive input. In our example we pass a password file inline:

ansible-playbook -i inventory --vault-password-file ~/.vault_pass work-on-customer-machines.yml

After setting up your environment appropriately with a password file and the ANSIBLE_VAULT_PASSWORD_FILE environment variable, your playbook commands are exactly the same as without using a vault.
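
For example, with a password file at ~/.vault_pass (the path is just an example):

export ANSIBLE_VAULT_PASSWORD_FILE=~/.vault_pass
ansible-playbook -i inventory work-on-customer-machines.yml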

Conclusion

The Ansible vault feature allows you to safely store and use sensitive data in your infrastructure without changing much about how you work with your automation scripts.

Client-side web development: Drink the Kool-Aid or be cautious?

Client-side web development is a fast-changing world. JavaScript libraries and frameworks come and go monthly. A couple of years ago jQuery was a huge thing, then AngularJS, and nowadays people use React or Vue.js with a state container like Redux. And so do we for new projects. Unfortunately, these modern client-side frameworks are based on the npm ecosystem, which is notorious for its dependency bloat. Even if you only have a couple of direct dependencies, the package manager lock file will list hundreds of indirect dependencies. Experience has shown that lots of dependencies result in a maintenance burden as time passes, especially when you have to do major version updates. Also, as mentioned above, frameworks come and go out of fashion, and the maintainers of a framework move on to their next pet project, leaving you and your project sitting on a barely or no longer maintained base. And frameworks can’t easily be replaced, because they tend to permeate every aspect of your application.

With this frustrating experience in mind, we recently did an experiment for a new medium-sized web project: we avoided frameworks and the npm ecosystem and only used JavaScript libraries that were really necessary and have no or very few transitive dependencies. Browsers have become better at conforming to web standards, at least regarding the basics, so libraries like jQuery and polyfills that paper over the incompatibilities can mostly be avoided – an interesting resource is the website You Might Not Need jQuery.

We still organised our views as components, and they communicate via a very simple event dispatcher. Some things had to be done by hand, but not too much. It works, although the result is not as pure as it would have been with declarative views as facilitated by React and a functional state container like Redux. We’re still fans of the React+Redux approach and we’re using it happily (at least for now) for other projects, but we’re also skeptical regarding the long-term costs, especially from relying on the npm ecosystem. Which approach will result in less maintenance burden? We don’t know yet. Time will tell.
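
To give an idea of how little is needed: a minimal event dispatcher along these lines (a sketch of the pattern, not our actual code) fits in a few lines of JavaScript:

const dispatcher = {
  handlers: {},
  // Register a handler for a named event
  on(event, handler) {
    (this.handlers[event] = this.handlers[event] || []).push(handler);
  },
  // Notify all handlers registered for the event
  emit(event, payload) {
    (this.handlers[event] || []).forEach(handler => handler(payload));
  }
};

Components then only share event names and payloads, not references to each other.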

Books and talks that shaped my mind as a developer

Over the years I’ve read many books and watched many talks, but a few stand out (at least for me) as having influenced my development career.

The inmates are running the asylum by Alan Cooper
This book opened my eyes to the fact that I had approached software development from completely the wrong standpoint: the software should serve the user, not vice versa.

Design Patterns by Erich Gamma et al
Oh, others use the same patterns as me – and what’s more, you can even talk about them without explaining every detail…

Refactoring by Martin Fowler
This book taught me that you can change the structure and the design of the software without changing its function. Cool.

Inventing on Principle by Bret Victor
Seeing a new way of interacting with your software in development blew my mind. Think WYSIWYG on steroids.

Getting real by 37signals
Getting to the core of what is essential and what really needs to be done in software/product development is laid out here so clearly and stripped down that it struck me.

Information visualization by Edward Tufte
Another book which reduces its topic (this time: presenting information) to the core and in doing so identifies so much unnecessary practice that it hurts.

Start with why by Simon Sinek
Purpose. Why do you develop software? Why do I arrange a UI or the architecture of an application the way I do? This is what design is about.

Only openings by Frank Chimero
Do I try to eliminate failures, and with them options, or do I leave the user the possibility to choose…

Web design is 95% typography by Oliver Reichenstein
Concentrate on the main part, the bigger part, the 95%. If you get that right, the rest isn’t so important after all.

Discount usability by Jakob Nielsen
Do what you can do with what you have.

The day the machines took gaming away

August 5th, 2018 was a noteworthy day in the history of mankind. It was a Sunday that had Europe aching in unusual heat and drought. But more importantly, it was the day the machines gently asserted their dominance in the field of gaming: the day our most skilled players lost a Dota 2 tournament against a bunch of self-taught bots.

“Bot” used to be an insult

How did we end up in this situation? Let’s look back at what “bot” used to mean in gaming. Twenty years ago, we were thrilled by games like Starcraft where you control plenty of aggressive, but otherwise dumb units in a battle against another player that also controls plenty of those units. The resulting brawls were bloody, chaotic and ultimately overwhelming with their number of necessary tasks (so-called micromanagement) and the amount of information that needed to be processed at once to react to the opponent. In a human versus human (or pvp for player versus player) game, those battles were usually constrained to a certain area and executed with a certain laissez-faire attitude. Only the best players could stage two or more geographically independent attacks and control every unit to their full potential. We admired those players like astronauts or rockstars.

If you could not play against another human, you would start a game against a bot. Bots usually had four things that worked to their advantage and a lot of things stacked against them. In their favor, they had minimal delay in their reactions, ultimate precision in their commands and full information about everything on the gamefield. And more often than not, they received extra game resources and other invisible cheats, because otherwise they didn’t stand a chance against even moderately skilled humans. Often, their game was defined by a fixed algorithm that couldn’t adapt to human strategy and situational specifics. A very simple war of attrition was enough to defeat them if their resource supply wasn’t unlimited. They didn’t learn from their experience and didn’t cooperate, neither with other bots nor with allied humans. These early bots relied on numbers and reaction speed to overwhelm their human counterparts. They played against our natural biological restrictions, because the programmers that built them knew those restrictions very well.

Barely tolerated fill-ins

Those bots were so dumb and one-dimensional that playing with them against other opponents was even more of a challenge, because you always had to protect them from running into the most obvious traps. They weren’t allies, they were a liability that dictated a certain game style. Everybody preferred human allies, even if they made mistakes and reacted slower.

The turning point

Then, a magical thing happened. An artificial intelligence taught itself the rules of Go, a rather simple game with only two players taking turns on a rather static gamefield. This AI played Go against itself so excessively that it mastered the game on a level that even experts could not easily grasp. In the first half of 2017, the machines took the game of Go out of our hands and continued to play against themselves. It got so bad that an AI named AlphaGo Zero taught itself Go from scratch in three days and outclassed the original bot that had outclassed mankind. And it seemed to play more like a human than the other bots did.

So, in just a few years, we got from dumb bots that were inferior stand-ins for real humans to overly powerful bots that even make it seem as if humans are playing.

The present days

It should be no surprise to you anymore that on that first Sunday of August 2018, a group of bots beat our best players in Dota 2. There are a few noteworthy differences to the victories in Go:

  • Dota 2 is a game where five players battle against five other players, not one versus one. It wasn’t one bot playing five characters, it was five bots cooperating with only in-game communication against humans cooperating with a speech side-channel.
  • Go is an open-map game: both opponents see every detail of the gamefield and have the same level of information. In Dota 2, your line of sight is actually pretty limited. The bots did not see the whole gamefield and needed to reconnoiter just like their human opponents.
  • In Go, the gamefield is static if nobody changes it. In Dota 2, there are lots of units moving independently on the gamefield all the time. This fluidity of the scenario requires a lot of intuition from human players and bots alike.
  • The rules of Go are simple, but the game turns out to be complex. The rules of Dota 2 are very complex, but the game refuses to be simple, because the possibilities to combine all the special cases are endless.
  • Go is mostly about logic, while Dota 2 has an added timing aspect. Your perfect next move is only effective in a certain time window, after that, you should reconsider it.

Just a year after the machines took logic games from us (go and read about AlphaZero if you want to be depressed by how fast they evolve), they have their foot in the real-time strategy sector, too. Within a few years, there will probably be no computer game left without a machine player at the top of the ladder. It turns out the machines are better at leisure activities, too.

The future?

But there is a strange side-note to the story. The Go players reported that at first, the bots played like aliens; later versions (the purely self-taught ones) had a more human-like style. In Dota 2, if you mix bots and humans in one team, the humans actually prefer cooperating with the bots. It seems that bots could be the preferred opponents and teammates of the future. And then it is no longer a game of humans with a few bots as fill-ins, but a game between machines, slowed down so that humans can participate and do their part – as tolerated, inferior fill-ins.

Decoding non-utf8 server responses using the Fetch API

The new JavaScript Fetch API is a really nice addition to the language and my preferred – in fact, the only bearable – way to do server requests. The Promise-based API is a lot nicer than older, purely callback-based approaches.

The usual approach to get a text response from a server using the Fetch API looks like this:

let request = fetch(url)
  .then(response => response.text())
  .then(handleText);

But this has one subtle problem:

I was building a client application that reads weather data from a small embedded device. We could not change the functionality of that device directly, but we could upload static pages to it and use its existing HTML API to query the amount of registered rainfall and lightning strikes.

Using the Fetch API, I quickly got the data and extracted it from the HTML, but some of the identifiers contained screwed-up characters that looked like decoding problems. So I checked whether the HTTP Content-Type was set correctly in the response. To my surprise, it was correctly set as Content-Type: text/html; charset=iso-8859-1.

So why did my JavaScript application not pick that up? After some digging, it turned out that Response’s text() function always decodes the payload as utf-8. The mismatch between that and the declared charset explained the problem!

Obviously, I had to do the decoding myself. The solution I picked was to use the TextDecoder class. It can decode an ArrayBuffer with a given encoding. Luckily, that is easy to get from the response:

let request = fetch(url)
  .then(response => response.arrayBuffer())
  .then(buffer => {
    let decoder = new TextDecoder("iso-8859-1");
    let text = decoder.decode(buffer);
    handleText(text);
  });

Since I only had to support that single encoding, this worked well for me. Keep in mind that TextDecoder is still experimental technology. However, we had a specific browser as a target and it works there. Lucky us!
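
If you have to support arbitrary encodings, a possible generalization (sketched here under the assumption that the server always declares its charset correctly) is to read the encoding from the Content-Type header instead of hard-coding it:

let request = fetch(url)
  .then(response => {
    // Extract the charset from the Content-Type header,
    // falling back to utf-8 if none is declared
    let contentType = response.headers.get("Content-Type") || "";
    let match = /charset=([^;]+)/i.exec(contentType);
    let charset = match ? match[1].trim() : "utf-8";
    return response.arrayBuffer()
      .then(buffer => new TextDecoder(charset).decode(buffer));
  })
  .then(handleText);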

Selecting all columns of a database table with an SQL GROUP BY expression

Suppose we have an SQL database table named “temperatures” with the following contents:

LOCATION  TIME        CELSIUS
inside    2018-08-01  24
inside    2018-08-02  28
inside    2018-08-03  21
inside    2018-08-04  28
outside   2018-08-01  29
outside   2018-08-02  31
outside   2018-08-03  25
outside   2018-08-04  30

We want to find the highest temperature for each location. We use the MAX aggregate function and a GROUP BY expression:

SELECT location, MAX(celsius) celsius
FROM temperatures
GROUP BY location;

As expected, the result is:

LOCATION  CELSIUS
outside   31
inside    28

Now suppose we also want to know when each of these extreme temperatures occurred. Naively, we try the following query:

SELECT location, time, MAX(celsius) celsius
FROM temperatures
GROUP BY location;

The response is an error: “not a GROUP BY expression” (this wording is Oracle’s; other databases complain similarly). In a GROUP BY query, all selected columns must either appear in the GROUP BY clause or be aggregates.
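
Note that simply adding time to the GROUP BY clause would be syntactically valid, but wrong: every (location, time) combination would then form its own group, so the query below just returns all rows of the table:

SELECT location, time, MAX(celsius) celsius
FROM temperatures
GROUP BY location, time;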

To achieve what we want we can use a JOIN:

SELECT
  t.location, t.time, t.celsius
FROM
  temperatures t
JOIN (SELECT location, MAX(celsius) celsius
      FROM temperatures
      GROUP BY location) tmax
ON
  t.location=tmax.location AND t.celsius=tmax.celsius;

This query results in multiple rows per location if the maximum temperature was recorded at different times:

LOCATION  TIME        CELSIUS
outside   2018-08-02  31
inside    2018-08-04  28
inside    2018-08-02  28

If we are only interested in the first occurrence of the maximum temperature per location, we can use the following query (KEEP (DENSE_RANK LAST …) is Oracle-specific syntax):

SELECT
  location,
  MIN(time) KEEP (DENSE_RANK LAST ORDER BY celsius) time,
  MAX(celsius) celsius
FROM
  temperatures
GROUP BY
  location;

The result is:

LOCATION  TIME        CELSIUS
inside    2018-08-02  28
outside   2018-08-02  31

Here we don’t need a JOIN anymore, because the select expression for the time column is an aggregate as well.