3 rules for projects under version control

Checking out a project under version control should be easy and repeatable. Here are a few tips on how to achieve that:

1. Self-containment

You should not need a specifically configured machine to start working on a project. Ideally, you clone the project and get started. Many things can get in the way of that.
Maybe your project needs a specific operating system, installed dependencies, hardware or a database setup to run. I personally draw the line at this:
You get a manual with the code that helps you set up your development environment once, and this setup should be as automated and easy as possible. But you should always be able to run your project without periphery, i.e. without specific hardware that has to be plugged in or databases that need to be installed.
To achieve this for hardware, you can often fake it via polymorphic interfaces and dependency injection, much like mocking it for testing. The same can be done with databases – or you can use in-memory databases as a fallback.
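
As a minimal sketch of that idea (the names here are hypothetical, not taken from a real project), a piece of hardware can hide behind a polymorphic interface that gets injected into the code that uses it:

struct TemperatureSensor {
  virtual ~TemperatureSensor() = default;
  virtual double readCelsius() = 0;
};

// Talks to the real, plugged-in device.
struct HardwareTemperatureSensor : TemperatureSensor {
  double readCelsius() override {
    // ... communicate with the actual hardware here ...
    return 0.0;
  }
};

// Stand-in for machines without the device, e.g. a freshly cloned checkout.
struct FakeTemperatureSensor : TemperatureSensor {
  double readCelsius() override { return 21.5; }
};

// The sensor is injected, so this code runs with either implementation.
class Thermostat {
public:
  explicit Thermostat(TemperatureSensor& sensor) : sensor_(sensor) {}
  bool heatingNeeded() const { return sensor_.readCelsius() < 18.0; }
private:
  TemperatureSensor& sensor_;
};

In production you wire in the hardware-backed implementation; on a developer machine or in tests, the fake is used instead.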

2. Separate build artifacts

Building the project into an executable form should be clearly separated from the source-controlled files. For example, the build should never ever modify a file that is under version control. Ideally, the “build” directory is completely independent of the source – enabling a true out-of-source build. However, it is often a good middle ground to allow building in a few dedicated directories in your source repository – but these need to be listed in .gitignore.

3. Separate runtime data

In the same way, running your project should not touch any source-controlled files. Ideally, the project can be run out-of-source. This is trivial for small programs that do not have data, but once some data needs to be managed by the source control system, it gets a little more tricky for the executables to find the data. For data that needs to be changed by the program (we call these “stores”), it is advisable to maintain templates in the VCS or in code, and copy them to the runtime directory during the build process. For data that is not changed by running the program, such as images, videos, translation tables etc., you can copy them as well, or make sure the program finds them in the source repository.

Following these guidelines will make it easier to work with version control, especially when multiple people are involved.

Consistency over magic, please

The Groovy programming language is a JVM-based scripting language. It is used by the Grails web framework and the Gradle build automation system.

Groovy has a language feature called named argument constructors. This means that, given a class with properties, for example

class Example {
  String text
}

you can initialize the properties directly when calling the constructor:

def example = new Example(text: ' This is an example. ')
assert example.text == ' This is an example. '

This is basically a shortcut for initializing the properties via explicit assignment:

def example = new Example()
example.text = ' This is an example. '
assert example.text == ' This is an example. '

So far so good.

Enter Grails

We use the aforementioned Grails framework for some of our web application projects. It is advertised on its website as featuring “convention-over-configuration” and “sensible defaults”. Grails uses the Groovy programming language, and a simple domain class looks just like a plain old Groovy class, except that it lives under the grails-app/domain directory (this is one of the convention-over-configuration aspects):

class Example {
  String text
}

As expected, you can initialize the property via regular assignment:

def example = new Example()
example.text = ' This is an example. '
assert example.text == ' This is an example. '

So one might expect that you can initialize it via a named argument constructor call as well:

def example = new Example(text: ' This is an example. ')
assert example.text == ' This is an example. '

And indeed, you can. But what’s this? Our assertion fails:

assert example.text == ' This is an example. '
               |    |
               |    false
               This is an example.

It is not directly obvious from the assertion failure output, but the property value is indeed no longer equal to the expected text: the leading and trailing spaces got trimmed!

I was surprised, but after some research in the Grails documentation it turned out that it’s not a bug, but a feature. In the section on Data Binding, you can find the following sentence:

The mass property binding mechanism will by default automatically trim all Strings at binding time. To disable this behavior set the grails.databinding.trimStrings property to false in grails-app/conf/application.groovy.

Groovy’s named argument constructor feature is used by Grails as a data binding mechanism to bind web request parameters to a domain object. For this, the default behavior was modified so that strings are automatically trimmed. I can only guess that this is considered to be an instance of the “sensible defaults” mentioned on the Grails homepage.

To me personally this kind of surprising behavior is not a sensible default, and I think it goes against the principle of least astonishment. I prefer consistency over “magic”.

Using PostgreSQL for time-series data

The number of sensors and other things that periodically collect data is ever growing. This advent of the Internet of Things (IoT) demands a way of storing and analyzing all this so-called time-series data. There are many options for such data – the most prominent being specialized time-series databases like InfluxDB or well-suited, nicely scaling databases like Apache Cassandra.

The problem is that you have to tailor your solution to one of these technologies, whereas SQL already offers mature database management systems (DBMS) and drivers/bindings for almost any programming language.

Why not use a plain SQL database?

Relational SQL databases are a mature and well-understood piece of technology, albeit not as sexy as all those new NoSQL databases. Using them for time-series data may not be a problem for smaller datasets, but sooner or later your ingestion and query performance will degrade massively. So in general it is not a good option to store all your time-series data in a traditional relational DBMS (RDBMS).

Why use PostgreSQL with TimescaleDB?

With the PostgreSQL extension TimescaleDB you get the best of both worlds: a well-known query language, robust tools and scalability.

You access and manage your time-series database just like your ordinary PostgreSQL database. Almost everything including replication and backups will continue to work like before.
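
As an illustration of this point, here is a hedged sketch in C++ using the standard libpq client library; the connection string, table and column names are made up, and create_hypertable is the TimescaleDB function that turns a plain table into a time-series hypertable:

#include <libpq-fe.h>
#include <cstdio>

int main() {
  // Connect exactly like to any other PostgreSQL database
  PGconn* connection = PQconnectdb("host=localhost dbname=metrics user=metrics");
  if (PQstatus(connection) != CONNECTION_OK) {
    std::fprintf(stderr, "connection failed: %s\n", PQerrorMessage(connection));
    return 1;
  }

  // Plain SQL, plus one TimescaleDB call that turns the table into a hypertable
  const char* statements[] = {
    "CREATE TABLE IF NOT EXISTS conditions ("
    "  time        TIMESTAMPTZ NOT NULL,"
    "  device_id   TEXT NOT NULL,"
    "  temperature DOUBLE PRECISION)",
    "SELECT create_hypertable('conditions', 'time', if_not_exists => TRUE)",
    "INSERT INTO conditions VALUES (now(), 'sensor-1', 21.5)"
  };
  for (const char* sql : statements) {
    PGresult* result = PQexec(connection, sql);
    if (PQresultStatus(result) != PGRES_COMMAND_OK &&
        PQresultStatus(result) != PGRES_TUPLES_OK) {
      std::fprintf(stderr, "statement failed: %s\n", PQerrorMessage(connection));
    }
    PQclear(result);
  }

  PQfinish(connection);
  return 0;
}

Everything here is ordinary PostgreSQL client code; the only TimescaleDB-specific part is the single create_hypertable call.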

You do not have to deal with limitations of specialized solutions or learn a completely new ecosystem just for one aspect of your solution.

The future

We are successfully using TimescaleDB in one of our projects and will continue to share tips and experiences with this technology, taking its rising importance into account.


The lorem ipsum in development

We’ve all been guilty of this: showing a new UI mockup to the user with lorem ipsum or typing fake data into forms for testing. That’s bad because fake data does not have the same characteristics as real data. Real data has specific lengths, formats and structure. Fake data is arbitrary. It is even more embarrassing if a lorem ipsum makes it into production. But using fake data is not only a problem in the user interface.
It is also a problem in programming. We see it in the names of classes: *Impl, *Manager or *Holder.
These names do not communicate well. Why not name classes with intent? Take a look at domain-driven design and the ubiquitous language for a start.
Using only fake data in tests is bad as well. Yes, you need to test the extreme cases, but even there you should inform your selection of data from the real world. Build in the constraints that the domain provides. Without the domain, you build highly flexible software, which is a nightmare to maintain. The test data should not be just arbitrary either: you need the corner cases, and to find them, look again into the domain, the real world. To do this, you need to gather real data from users, domain experts or existing systems and reports. Real data has constraints, and constraints drive creativity and decisions. The problem with deferring decisions too long is that you have to maintain your flexibility until then. So make decisions, but base them on real data, not fake data.

Developer Experience means nothing

You’ve probably already heard about the concept of User Experience (UX) and the matching virtue of User Experience Design. If not, you might want to go check it out. I would suggest you start with the excellent book “The Design of Everyday Things” by Donald Norman. It has nothing to do with software development, so you won’t mix up User Experience Design with Graphical User Interface Design or even Graphics Design.

In short, User Experience is what a user of your software experiences while using it. If your software makes them smile, you’re some kind of UX god. If your software makes them curse repeatedly, you’re probably not. You can try to improve the experience of your users by making changes to the software. This is called User Experience Design and is an important part of software development. Most developers know nothing about User Experience Design.

What developers naturally know about is Developer Experience (DX), a concept that isn’t really defined in the literature. I made it up to explain my point to you. Every software developer has a favorite programming language and IDE. Remember, solutions to the same problem in different programming languages all ultimately end up as machine code. The machine is totally agnostic about your preferences for a certain programming language. Your choice of a programming language for a certain problem is more about your Developer Experience than anything else. Developer Experience is everything you feel and think during software development. A bad Developer Experience makes you swear at the tools, the code and everything and everybody else around you. A good Developer Experience gives you a sense of accomplishment and safety while coding and doesn’t make your work harder than necessary.

Because you can’t read about the concept of Developer Experience elsewhere, I want to give two more examples and show how it affects the User Experience. The first example is a big, long-lived software system that is in production and still being developed further. If you have a software system in production, everything requires an additional thought about the matching upgrade strategy. You can’t just modify the database structure, you need to provide migrations. You can’t just add a configuration entry, you need to make it optional or consider the least harmful default value. In the project of our example, the developers had three possible approaches to a new configuration entry:

  • They could add it to the code but leave the configuration files unchanged. This required the code to handle the “absent from configuration” case in a useful manner. It required extra effort from the developer in the code. The user would not know about the new configuration entry unless it was stated in some external documentation.
  • They could add it to the code and write a configuration migration script that added it to the existing configuration files on the customer’s installation. The code could now expect the entry to be present, but the developer had to write and test the migration script code. The user would see the new configuration entry with the default value.
  • They could introduce a new configuration file to the system and place the new configuration entry in it. The code could expect the entry to be present, because new configuration files were added to an existing installation during the upgrade process. The user would see the new configuration file and the new configuration entry with the default value.

You can probably guess which of the three options got used so excessively that users complained about the configuration being all over the place, in a myriad of little one-liner configuration files with ominous names. The developers chose the best option for themselves and, in the short run, for the users. But in the long run, the User Experience declined.

The second example is from a computer game in the mid-2000s. It was a massive multiplayer online shooter with a decent implementation ahead of its time. But one thing was still from the last century: after each update of the game, the key bindings were reset to the default. And so was every other aspect of local modification to the settings. After each update, you had to configure your video, audio, controls and everything else like your in-game equipment again and again. The game didn’t offer any means to copy or reload your settings. It was up to you to maintain a recent backup of the settings files, just in case. And if the file format changed, you needed to combine their changes with yours. WinMerge is a decent tool for that task. But the problem is clear: the game developers couldn’t be bothered to think about how their upgrade strategy would affect their users. They ignored the problem and let the users figure it out. They chose better Developer Experience, free from complexities like user-side modifications, over good User Experience, like a game that can be upgraded without drawbacks for the gamers.

Sadly, this is a common formula in software development: Developer Experience is more important than User Experience. Just look at it from a utilitarian point of view: if you burden thousands or even millions of users with just an easy one-minute task that you could fix during development, you have a budget of two workdays up to 10 person-years. Do the math yourself: one million users, each spending one minute of work on your software, add up to one million minutes, i.e. roughly 16,700 hours or over 2,000 workdays, lost on a thing you could probably fix for everybody in an hour or two.

And this brings me to my central statement: Developer Experience, as opposed to User Experience, means nothing. It’s just not important. At least not for all the users. It is important to the small group of developers and should not be forgotten. But never should a decision lead to a better Developer Experience at the expense of User Experience. It’s a small inconvenience for us developers to think about smooth upgrades, meaningful and consistent control element titles or an easy installation. It’s a whole new game for our users.

Always choose User Experience over Developer Experience.

Call to action: If you have another good example of developers being lazy at the expense of their users, please share it as a comment.

Call to action 2: The initial picture is linked from the excellent website badhtml.com. If you ever feel like an imposter for your latest design, go and visit this website.

luabind deboostified tips and tricks

luabind deboostified is a fork of the luabind project that helps with exposing C++ APIs to Lua. As the name implies, it replaces the boost dependency with modern C++, which makes it a lot more pleasant to work with.

Here are a few tips and tricks I learned while working with it. Some tricks might be applicable to the original luabind – I do not know.

1. Splitting module registration

You can split the registration code for different classes. I usually add a register function per class, like this:

struct A {
  void doSomething();
  static luabind::scope registerWithLua();
};

struct B {
  void goodStuff();
  static luabind::scope registerWithLua();
};

You can then combine their registration code into a single module on the Lua side:

void registerAll(lua_State* L) {
  luabind::module(L)[
    A::registerWithLua(),
    B::registerWithLua()];
}

The implementation of a registration function looks like this:

luabind::scope A::registerWithLua()
{
  return luabind::class_<A>("A")
    .def("doSomething", &A::doSomething);
}

2. Multiple policies and multiple return values

Unlike C++, Lua has real multiple return values. You can use that by utilizing the return value policies that luabind offers. Let’s say you want to write this in Lua:

local x, y = a.getPosition()

The C++ side could look like this:

void getPosition(A const& a, float& x, float& y);

The deboostified fork needs its policies supplied in a type list. Let’s use a small helper meta-function to build that:

template <typename... T>
using joined = 
  typename luabind::meta::join<T...>::type;

Once you have that, you can expose it like this:

luabind::def("getPosition", &getPosition,
              joined<
                luabind::pure_out_value<2>,
                luabind::pure_out_value<3>
              >());

3. Specialized data structures using luabind::object

Using the converters in luabind is not the only way to make Lua values from C++. Almost everything you can do in Lua itself, you can do with luabind::object. Here is a somewhat contrived example:

luabind::object repeat(luabind::object what,
                       int count) {
  // Create a new table object
  auto result = luabind::newtable(
    what.interpreter());
  // Fill it as an array [1..N]
  for (int i = 1; i <= count; ++i)
    result[i] = what;
  return result;
}

This function can then be exported via luabind::def and used just like any other function. And this is just the tip of the iceberg. For example, you can also write functions that, at runtime, behave differently when a number is passed in than when a table is passed in. You can find out the Lua type with luabind::type(myObject).
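
As a small, hedged sketch of such runtime dispatch (the function and its behavior are made up for illustration; luabind::type returns the usual LUA_T* constants from lua.h):

luabind::object describe(luabind::object value) {
  // Build the result with the same interpreter the argument came from
  auto result = luabind::newtable(value.interpreter());
  if (luabind::type(value) == LUA_TNUMBER) {
    result["kind"] = "number";
  } else if (luabind::type(value) == LUA_TTABLE) {
    result["kind"] = "table";
  } else {
    result["kind"] = "something else";
  }
  return result;
}

Exported via luabind::def just like the repeat function above, the Lua side can call it with either a number or a table and gets an answer tailored to the argument’s type.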

Of course, as soon as you want to create new objects to return to Lua, you need the lua_State pointer in that function. Using the interpreter from a passed-in luabind::object is one way, but I have yet to find another pleasant way to do this. It is probably possible to use the policies to do this, and have them pass that in as a special parameter, but for now I am using some complicated machinery to bind lambda functions that capture the Lua interpreter.

That’s it for now.

Keep in mind that these are not thoroughly researched best practices, but patterns I have used to solve actual problems. There might be better solutions out there – if you know any, please let me know. Hope this helped!

Using Ansible vault for sensitive data

We like using Ansible for our automation because it has minimal requirements for the target machines and the surrounding infrastructure. You need nothing more than ssh and Python with some libraries. In contrast to alternatives like Puppet and Chef, you do not need special server and client programs running all the time and communicating with each other.

The problem

When setting up remote machines and deploying software systems for your customers, you will often have to use sensitive data like private keys, passwords and maybe machine or account names. On the one hand, you want to put your automation scripts and their data under version control and use them from your continuous integration infrastructure. On the other hand, you do not want to spread the secrets of your customers all around your infrastructure, and definitely never ever put them in your source code repository.

The solution

Ansible supports encrypting sensitive data and using it in playbooks with the concept of vaults and the accompanying commands. Setting it up requires some work, but then usage is straightforward and works seamlessly.

The high-level conversion process is the following:

  1. create a directory for the data to substitute on a host or group basis
  2. extract all sensitive variables into vars.yml
  3. copy vars.yml to vault.yml
  4. prefix variables in vault.yml with vault_
  5. use vault variables in vars.yml

Then you can encrypt vault.yml using the ansible-vault command providing a password.

All you have to do subsequently is to provide the vault password along with your usual playbook commands. Decryption for playbook execution is done transparently on-the-fly for you, so you do not need to care about decryption and encryption of your vault unless you need to update the data in there.

The step-by-step guide

Suppose we want to work on a target machine run by your customer, who provides you access via ssh. You do not want to store your ssh user name and password in your repository, but you want to be able to run the automation scripts unattended, e.g. from a Jenkins job. Let us call the target machine ceres.

So first you setup the directory structure by creating a directory for the target machine called $ansible_script_root$/host_vars/ceres.

To log into the machine we need two sensitive variables: ansible_user and ansible_ssh_pass. We put them into a file called $ansible_script_root$/host_vars/ceres/vars.yml:

ansible_user: our_customer_ssh_account
ansible_ssh_pass: our_target_machine_pwd

Then we copy vars.yml to vault.yml and prefix the variables with vault_ resulting in $ansible_script_root$/host_vars/ceres/vault.yml with content of:

vault_ansible_user: our_customer_ssh_account
vault_ansible_ssh_pass: our_target_machine_pwd

Now we use these new variables in our vars.yml like this:

ansible_user: "{{ vault_ansible_user }}"
ansible_ssh_pass: "{{ vault_ansible_ssh_pass }}"

Now it is time to encrypt the vault using the following command, which prompts you for the vault password:

ansible-vault encrypt host_vars/ceres/vault.yml

The result is an encrypted vault that can be put into source control. It looks something like

$ANSIBLE_VAULT;1.1;AES256
35323233613539343135363737353931636263653063666535643766326566623461636166343963
3834323363633837373437626532366166366338653963320a663732633361323264316339356435
33633861316565653461666230386663323536616535363639383666613431663765643639383666
3739356261353566650a383035656266303135656233343437373835313639613865636436343865
63353631313766633535646263613564333965343163343434343530626361663430613264336130
63383862316361363237373039663131363231616338646365316236336362376566376236323339
30376166623739643261306363643962353534376232663631663033323163386135326463656530
33316561376363303339383365333235353931623837356362393961356433313739653232326638
3036

Using your playbooks looks similar to before; you just need to provide the vault password using one of several options: a vault password file, the ANSIBLE_VAULT_PASSWORD_FILE environment variable or interactive input. In our example we simply enter it interactively:

ansible-playbook -i inventory --ask-vault-pass work-on-customer-machines.yml

After setting up your environment appropriately, with a password file and the ANSIBLE_VAULT_PASSWORD_FILE environment variable, your playbook commands are exactly the same as without using a vault.

Conclusion

The Ansible vault feature allows you to safely store and use sensitive data in your infrastructure without changing too much about the way you use your automation scripts.