C++ pass-thru parameters

So in ye olde days, before C++11 and move semantics, it was common for functions to use mutable references to pass container contents back to the caller, like this:

void random_between(std::vector<int>& out,
  int left, int right, std::size_t N)
{
  // rng: a random engine (e.g. std::mt19937), assumed to be defined elsewhere
  std::uniform_int_distribution<>
    distribution(left, right);
  for (std::size_t i = 0; i < N; ++i)
    out.push_back(distribution(rng));
}

and you would often use it like this:

std::vector<int> numbers;
random_between(numbers, 7, 42, 10);

Basically trading expressiveness and convenience for speed/efficiency.

Convenience is king

Now obviously, those days are over. With move-semantics and guaranteed copy-elision backing us up, it is usually fine to just return the filled container, like this:

std::vector<int> random_between(int left, int right,
  std::size_t N)
{
  std::vector<int> out;
  std::uniform_int_distribution<>
    distribution(left, right);
  for (std::size_t i = 0; i < N; ++i)
    out.push_back(distribution(rng));
  return out;
}

Now you no longer have to set up the container before using this function, and the function has also become pure, clearly differentiating between its inputs and outputs.

Mostly better?

However, there is a downside: Before, the function could be used to append multiple runs into the same container, like this:

std::vector<int> numbers;
for (int i = 0; i < 5; ++i)
  random_between(numbers, 50*i + 7, 50*i + 42, 10);

That use case suddenly became a lot harder. Also, what if you want to keep your vector around and just .clear() it before calling the function again later, to save allocations? That’s also no longer possible. I am not saying that these two use cases should make you prefer the old variant, as they tend not to happen very often. But when they do, it’s all the more annoying. So what if you could have your cake and eat it, too?

A Compromise

How about this:

std::vector<int> random_between(int left, int right,
  std::size_t N, std::vector<int> out = {})
{
  std::uniform_int_distribution<>
    distribution(left, right);
  for (std::size_t i = 0; i < N; ++i)
    out.push_back(distribution(rng));
  return out;
}

Now you can use it to just append again:

std::vector<int> numbers;
for (int i = 0; i < 5; ++i)
  numbers = random_between(
    50*i + 7, 50*i + 42, 10, std::move(numbers));

But you can also use it in the straightforward way, for the hopefully more common case:

auto numbers = random_between(
  50*i + 7, 50*i + 42, 10);

Now you should definitely not do this with all your functions that return a container. But it is a nice pattern to have up your sleeve when the need arises. It should be noted that passing a mutable reference can still be faster in some cases, as it saves you two moves, and you can also add a container-returning facade variant as an overload. Still, I think this pattern is a very nice compromise that can be implemented by moving a single variable to the parameter list and defaulting it. It keeps 99% of the use cases identical to the original container-returning variant, while making the “append” use case slightly more verbose, but also more expressive.
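For completeness, here is a sketch of that overload-based facade: the reference-taking variant from the beginning does the actual work, and the returning overload merely wraps it for the common case (as in all snippets here, rng is assumed to be a random engine defined elsewhere):

// core variant: appends into an existing container, saving the moves
void random_between(std::vector<int>& out,
  int left, int right, std::size_t N)
{
  std::uniform_int_distribution<>
    distribution(left, right);
  for (std::size_t i = 0; i < N; ++i)
    out.push_back(distribution(rng));
}

// facade variant: returns a freshly filled container
std::vector<int> random_between(int left, int right,
  std::size_t N)
{
  std::vector<int> out;
  random_between(out, left, right, N);
  return out;
}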

The Java Cache API and Custom Key Generators

The Java Cache API allows you to add a @CacheResult annotation to a method, which means that calls to the method will be cached:

import javax.cache.annotation.CacheResult;

@CacheResult
public String exampleMethod(String a, int b) {
    // ...
}

The cache will be looked up before the annotated method executes. If a value is found in the cache it is returned and the annotated method is never actually executed.

The cache lookup is based on the method parameters. By default a cache key is generated by a key generator that uses Arrays.deepHashCode(Object[]) and Arrays.deepEquals(Object[], Object[]) on the method parameters. The cache lookup based on this key is similar to a HashMap lookup.

You can define and configure multiple caches in your application and reference them by name via the cacheName parameter of the @CacheResult annotation:

@CacheResult(cacheName="examplecache")
public String exampleMethod(String a, int b) {
    // ...
}

If no cache name is given, the cache name is derived from the fully qualified method name and the types of its parameters – in this case, for example: “my.app.Example.exampleMethod(java.lang.String,int)”. This way there will be no conflicts with other cached methods that take the same set of parameters.

Custom Key Generators

But what if you actually want to use the same cache for multiple methods without conflicts? The solution is to define and use a custom cache key generator. In the following example both methods use the same cache (“examplecache”), but also use a custom cache key generator (MethodSpecificKeyGenerator):

@CacheResult(
  cacheName="examplecache",
  cacheKeyGenerator=MethodSpecificKeyGenerator.class)
public String exampleMethodA(String a, int b) {
    // ...
}

@CacheResult(
  cacheName="examplecache",
  cacheKeyGenerator=MethodSpecificKeyGenerator.class)
public String exampleMethodB(String a, int b) {
    // ...
}

Now we have to implement the MethodSpecificKeyGenerator:

import org.infinispan.jcache.annotation.DefaultCacheKey;

import javax.cache.annotation.CacheInvocationParameter;
import javax.cache.annotation.CacheKeyGenerator;
import javax.cache.annotation.CacheKeyInvocationContext;
import javax.cache.annotation.GeneratedCacheKey;

import java.lang.annotation.Annotation;
import java.util.Arrays;
import java.util.stream.Stream;

public class MethodSpecificKeyGenerator
  implements CacheKeyGenerator {

  @Override
  public GeneratedCacheKey generateCacheKey(
      CacheKeyInvocationContext<? extends Annotation> context) {
    Stream<Object> methodIdentity =
      Stream.of(context.getMethod());
    Stream<Object> parameterValues =
      Arrays.stream(context.getKeyParameters())
        .map(CacheInvocationParameter::getValue);
    return new DefaultCacheKey(
      Stream.concat(methodIdentity, parameterValues).toArray());
  }
}

This key generator not only uses the parameter values of the method call but also the identity of the method to generate the key. The call to context.getMethod() returns a java.lang.reflect.Method instance for the called method, which has appropriate hashCode() and equals() implementations. Both this method object and the parameter values are passed to the DefaultCacheKey implementation, which uses deep equality on its parameters, as mentioned above.

By adding the method’s identity to the cache key we have ensured that there will be no conflicts with other methods when using the same cache.

Adding a dynamic React page to your classic Grails multi-page application

We are developing and maintaining a more than ten-year-old classic multi-page application based on the Grails web framework. With the advent of HTML5 and modern browsers with fast JavaScript engines, users expect an increasingly dynamic and pleasant user experience (UX) from web applications. Our application is used by hundreds of users, and our customer expects a stable, familiar and feature-rich experience that continues to improve over time. Something like a complete rewrite of the UI is way out of scope, time- and budget-wise.

One of the new feature requests would benefit highly from a client-side JavaScript implementation, so we looked at our options. Fortunately, it is quite easy to integrate a React app with Grails and the Gradle build system. So we implemented the new page almost completely as a React app, while leaving all the other pages as normal server-side rendered Groovy Server Pages (GSP). The result is quite convincing and opens up a transition path to more and more dynamic client-side pages and perhaps even to a complete transformation into a single-page application (SPA) in the distant future.

Integrating a React app into the Grails build process

The Grails react-webpack profile can serve as a great starting point for integrating a React app into an existing Grails project. First you create the React app for the new page in the folder src/main/webapp, for example using the create-react-app scripts. Then you need to add a $GRAILS_PROJECT/webpack.config.js to configure webpack appropriately, like so:

var path = require('path');

module.exports = {
  entry: './src/main/webapp/index.js',
  output: {
    path: path.join(__dirname, 'grails-app/assets/javascripts'),
    publicPath: '/assets/',
    filename: 'bundle.js'
  },
  module: {
    rules: [
      {
        test: /\.js$/,
        include: path.join(__dirname, 'src/main/webapp'),
        use: {
          loader: 'babel-loader',
          options: {
            presets: ["@babel/preset-env", "@babel/preset-react"],
            plugins: ["transform-class-properties"]
          }
        }
      },
      {
        test: /\.css$/,
        use: [
          'style-loader',
          'css-loader'
        ]
      },
      {
        test: /\.(jpe?g|png|gif|svg)$/i,
        use: {
          loader: 'url-loader?limit=10000&prefix=assets/!img'
        }
      }
    ]
  }
};

The next step is to move the package.json to the $GRAILS_PROJECT directory, because we want Gradle tasks to take care of building and bundling it as a Grails asset. To make this convenient, we add some Gradle tasks employing yarn to our build.gradle:

buildscript {
    dependencies {
        ...
        classpath "com.moowork.gradle:gradle-node-plugin:1.2.0"
    }
}

...

apply plugin:"com.moowork.node"

...

node {
    version = '12.15.0'
    yarnVersion = '1.22.0'
    distBaseUrl = 'https://nodejs.org/dist'
    download = true
}

task bundle(type: YarnTask, dependsOn: 'yarn') {
    group = 'build'
    description = 'Build the client bundle'
    args = ['run', 'bundle']
}

task webpack(type: YarnTask, dependsOn: 'yarn') {
    group = 'application'
    description = 'Build the client bundle in watch mode'
    args = ['run', 'start']
}

bootRun.dependsOn(['bundle'])
assetCompile.dependsOn(['bundle'])

...

Now we have integrated our new React app with the Grails build system and packaging. The webpack task allows updating the JavaScript bundle on the fly, so that we get almost the same hot-reloading support during development as with the rest of Grails.

Delivering the React app as a page

Now that we have integrated the React app into the build and packaging process of our Grails application, we need to deliver it when the new page is requested by the browser. This is quite simple and straightforward and can be achieved with a GSP like so:

<html>
<head>
    <meta name="layout" content="main"/>
    <title>
        <g:message code="example.header"/>
    </title>
</head>
<body>
    <div id="react-content">
    </div>
    <asset:javascript src="bundle.js"/>
</body>
</html>

Now you just have to develop the endpoints for the JavaScript app in the form of normal Grails controllers that render JSON instead of GSP views. This is extremely easy using Groovy maps and the Grails JSON converters:

import grails.converters.JSON

class DataApiController {

    def getData = {
        def responseData = [
            name: 'John',
            age: 37
        ]
        render responseData as JSON
    }
}

Conclusion

Grails and its build infrastructure are flexible enough to easily integrate SPA pages into an existing traditional web application. This allows you to deliver the modern UX and features that users expect nowadays without completely rewriting your trusty and proven Grails application. The transition can happen gradually, and individual pages/views can be renewed when needed. That way you can continually add value for your customer while incrementally modernizing your application.

The emoji checksum

This blog article is a story about an idea, not an actual project report. If it were a movie, it would feature the “based on real events” disclaimer.

The warehouse

Imagine the warehouse of a medium-sized company. You would expect a medium-sized warehouse, but in reality, the number of items in this warehouse is nearly as big as in a big company. The difference might be the storage count of each item, but the item count is a big number – so big that each item has its own “item ID”, which is also used as the location identifier in the warehouse. Let’s see three (contrived) examples:

  • 211 725: Retaining screw, 8 mm
  • 413 114: Power transformer, 5 A
  • 413 115: Power transformer, 10 A

As you can see, different item groups have numbers with a large numerical distance while similar items are numerically close. This makes sense for the engineers using these numbers by muscle memory and for the warehouse navigation. If you read the first three digits, you already know where to turn to in the large hall. If you’ve arrived in the general area, the next three digits lead you to the exact storage space.

The operators

But that’s not how it works. The warehouse workers cannot read. Yes, you’ve read that right. The warehouse is operated by humans and the workers are not familiar with digits and numbers. They decipher each digit on their own and cannot cross-check with the article name. They navigate the warehouse with a best-effort approach. The difference between item 413114 and item 413115 is negligible for them. It’s the same thing anyway – unless you can read (and understand) that one of them blows up above 5 Ampere and the other one doesn’t. And this is a problem for the engineers. The difference between a “Power transformer able to take 10 Ampere” and a “Power transformer (5 A), aka molten copper lump” is a successful or a failed project.

So what can you do? Teach the warehouse workers how to read and deal with numbers? That would be a good approach if the turnover rate among them weren’t so high. What else can we do? We can abstract the problem at hand, apply a suitable solution approach and see if it works.

The abstraction

If you think about the situation in abstract terms, you are dealing with unreliable data transmission. You send your item list to the warehouse and receive a collection of loosely related items back. That’s similar to sending data over a faulty cable. To mitigate transmission errors, we’ve invented checksums. Each suitable part of the transmission is validated (or invalidated) by a checksum.

In our case, the “suitable part of the transmission” is each single item. We should add a checksum to the item list! Instead of requesting item 413114, we request 413114/7, while item 413115 is requested as 413115/1. Now, we have a clear indicator for wrong or right. But it is still an indicator in a foreign alphabet. If you ignore the difference between 4 and 5, why not also ignore the difference between 7 and 1?

The emojification

But what if we don’t rely on numbers or characters, but on something every human can understand, regardless of literacy level? What if we transpose the numbers into an emoji alphabet? Let 413114 be 😄🌵☁️🌵🌵😄 and let 413115 be written as 😄🌵☁️🌵🌵🏠. But more importantly: the checksum is in emoji, too:

😄🌵☁️🌵🌵😄 (413114) → checksum 🚗 (7)

vs.

😄🌵☁️🌵🌵🏠 (413115) → checksum 🌵 (1)

Even if you only glance at the emoji series (and fail to notice the difference at the end), you still have to acknowledge that your checksum doesn’t fit. A cactus is no car, regardless of your literacy.

This transposition of numbers into the iconographic realm plays right into every human’s built-in ability to distinguish concrete objects. Numbers, digits and characters are (more) abstract concepts and objects, but a cloud is recognizable as a cloud even if you draw it by hand and without care. The transposition is also quite easily reversible – you only have to remember ten number/emoji pairs (or eleven, if your checksum has an extra character). And nobody stops you from printing both on the item lists and on the warehouse storage boxes.
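To make the transposition concrete, here is a small sketch in C++. The digit/emoji pairs that appear in the examples above (1 = 🌵, 3 = ☁️, 4 = 😄, 5 = 🏠, 7 = 🚗) are kept; the remaining pairs and the check digit scheme (a plain digit sum modulo 10) are arbitrary assumptions for illustration – the story above does not prescribe a particular checksum algorithm.

#include <array>
#include <iostream>
#include <string>

// digit-to-emoji table; only the pairs shown in the examples above are
// taken from the text, the rest are arbitrary placeholders
const std::array<std::string, 10> emoji_digits = {
  "⭐", "🌵", "🐟", "☁️", "😄", "🏠", "🎈", "🚗", "🔔", "🌻"
};

// transpose an item number (given as a digit string) into emoji
std::string to_emoji(const std::string& item_number)
{
  std::string result;
  for (char digit : item_number)
    result += emoji_digits[digit - '0'];
  return result;
}

// one possible check digit: the digit sum modulo 10 (a real warehouse
// might prefer a weighted scheme that also catches swapped digits)
int check_digit(const std::string& item_number)
{
  int sum = 0;
  for (char digit : item_number)
    sum += digit - '0';
  return sum % 10;
}

int main()
{
  const std::string item = "413114";
  std::cout << item << " -> " << to_emoji(item)
            << " / " << emoji_digits[check_digit(item)] << '\n';
}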

And the best thing? You don’t even have to invent the transposition yourself. Just use the existing work of others by checking out emojisum by Vincent Batts or ecoji by Keith Turner.

The only thing that is stopping you is that ancient dot matrix printer that prints the item lists on continuous paper.

The joy of being a student assistant

Lately I heard a lecture by Daniel Lindner about error codes and why you should avoid using them. I had to smile, because it reminded me of my time as a student assistant, when I worked with some people who had a slightly different opinion on that point. Maybe they enjoyed torturing student assistants, but it seems most likely to me that they just did not know any better. But let’s start at the beginning.

One day the leader of my research group sent me an Excel sheet with patient data and asked me to perform some statistical calculations on it with the programming language R, which is perfectly suited for such a task. Therefore I did not expect it to take much time – but it soon turned out that I was terribly wrong. Transforming that Excel sheet into something R could work with gave me a really hard time, so I decided to write down some basic rules you should consider when recording such data, in the hope that at least some future student assistants won’t have to deal with the same problems I had.

First I told my program to read in the Excel sheet with the patient data, which worked as expected. But when I started to perform some simple operations, like calculating the average age of the patients, my computer soon told me things like this:

Warning message: In mean.default(age): argument is not numeric or logical: returning NA

I was a little confused then, because I knew that the age of something or someone is a numeric value. But no matter how often I tried to explain that to my computer: He was absolutely sure that I was wrong. So I had no choice but to have a look at the Excel sheet with about 2000 rows and 30 columns. After hours of searching (at least it felt like hours) I found a cell with the following content: “Died last week”. That is indeed not a numeric value, it’s a comment that was made for humans to read. So here’s my rule number one:

1. Don’t use comments in data files

There is one simple reason for that: The computer, who has to work with that data, does not understand it. And (as sad as it is) he does not care about the death of a patient. The only thing he wants to do is to calculate a mean value. And he needs numeric values for it. If you still want to have that comment, just save it somewhere else in another column for comments or in a separate file the computer does not have to deal with when performing statistical calculations.

So I removed that comment (and some others I found) and tried to calculate the mean value again. This time my computer did not complain, the warning message disappeared and for one moment I felt relieved. In the next moment I saw the result of the calculation: the average age of the patients was 459.76. And again I told my computer that this is not possible, and again he was sure that I was wrong, and again I had to take a look at the Excel sheet with the data. Did I mention that the file contained about 2000 rows and 30 columns? However, after a little searching I found a cell with the value -999999. It was immediately clear to me that this was not the real age of the patient, but I wasn’t able to find out by myself what that value meant. It could have been a typo; however, the leader of my research group told me that some people use -999999 as an error code. It could mean something like: “I don’t know the age of the patient.” Or: “That patient also died.” But that was only a guess. So here is my rule number 2:

2. Document your error codes

If there had been some documentation, I might have known what to do with that value. Instead I secretly deleted it, hoping that it was not important to anyone, because unfortunately to my computer -999999 is just a numeric value, no better or worse than any other. So I had to tell him not to use it. But that was only the beginning.

I learned from my previous mistakes, and before performing any other statistical calculations I had a look at the whole Excel sheet. It was even more horrible than I expected it to be. If every person who worked with that table had used the same error code, I could just have written a script that eliminated all -999999 values from the sheet and been done with it. But it seemed that everyone had their own favorite (undocumented) error code. Or if there had at least been some documentation about the value ranges of each column, I could have told my computer to ignore all values outside that range. For something like the age of a patient this is easy, but for other medical data that a computer science student like me does not know, it can be hard. Is 0 a valid value or does it mean that there is no value? What about 999? So in any case: I had to read the whole table (again: 2000×30 values!), manually guess for each value whether it really was a value or an error code, and then tell my computer to ignore it, so he could calculate the right means. I don’t know exactly how much time that cost, but I’m sure that in the same time I could have read all nine books of The Histories by Herodotus twice, watched every single episode of Gilmore Girls including the four episodes of A Year in the Life, and learned Japanese. So finally here’s my rule number 3 (and the good part about this one is that you can immediately forget about rule number 2):

3. Don’t use error codes in data files

Really. Don’t. The student assistants of the future will thank you.

Some strings are more equal before your Oracle database

When working with customer code based on ADO.NET, I was surprised by a DBConcurrencyException with a German error message.

The message just tells us that some UpdateCommand affected “0” instead of the expected “1” rows of a DataTable. This happened while writing some changes to a table using an OracleDataAdapter. What really surprised me at this point was that there certainly was no other thread writing to the database during my update attempt. Even more confusing was that my method of changing DataTables and using the OracleDataAdapter to write the changes had worked pretty well so far.

In this case, the title “DBConcurrencyException” turned out to be quite misleading. The text message was absolutely correct, though.

The explanation

The UpdateCommand is a prepared statement generated by the OracleDataAdapter. It may be used to write the changes a DataTable keeps track of to a database. To update a row, the UpdateCommand identifies the row with a WHERE clause that matches all of the row’s original values and then writes the updated values. So if we have a table with two columns, a primary ID and a number, the update statement would essentially look like this:

UPDATE EXAMPLE_TABLE
  SET ROW_ID =:current_ROW_ID, 
      NUMBER_COLUMN =:current_NUMBER_COLUMN
WHERE
      ROW_ID =:old_ROW_ID 
  AND NUMBER_COLUMN =:old_NUMBER_COLUMN

In my case, the problem turned out to be caused by string-valued columns and was due to some Oracle weirdness that has already been discussed on this blog (https://schneide.blog/2010/07/12/an-oracle-story-null-empty-or-what/): on writing, empty strings (more precisely: empty VARCHAR2s) are transformed into DBNull. Note, however, that the following two conditions are not equivalent:

WHERE TEXT_COLUMN = ''
WHERE TEXT_COLUMN is null

The first will just never match… (at least with Oracle 11g). So saying that null and empty strings are the same would not be an accurate description.

The WHERE clauses of the generated UpdateCommands look more complicated for (nullable) columns of type VARCHAR2. But instead of trying to understand the generated code, I just guessed that the problem was a bug or inconsistency in the OracleDataAdapter that caused the exception. And in fact, it turned out that the problem occurred whenever I tried to write an empty string to a column that was DBNull before. This explains the message of the DBConcurrencyException: the DataTable thinks there is a difference between empty strings and DBNulls, but due to the conversion there is no difference once the corresponding row is updated. So once understood, the problem was easily fixed by transforming all empty strings to null prior to invoking the UpdateCommand.

The “parameter self-destruction” bug

A few days ago, I got a bug report for a C++ program about a weird exception involving invalid characters in a JSON format. Now, getting weird stuff back from a web backend is not totally unexpected, so my first instinct was to check whether any calls to the parser did not deal with exceptions correctly. To my surprise, they all did. So I did what I should have done right away: just try to use the feature where the client found the bug. It crashed after a couple of seconds. And what I found was a really interesting problem. It was actually the JSON encoder trying to encode a corrupted string. But how did it get corrupted?

Tick, tick, boom...

The code in question logs into a web-service and then periodically sends a keep-alive signal with the same information. Let me start by showing you some support code:


class ticker_service
{
public:
  using callable_type = std::function<void()>;
  using handle = std::shared_ptr<callable_type>;

  handle insert(callable_type fn)
  {
    auto result = std::make_shared<callable_type>(
      std::move(fn));
    callables_.push_back(result);
    return result;
  }

  void remove(handle const& fn_ptr)
  {
    if (fn_ptr == nullptr)
      return;

    // just invalidate the function
    *fn_ptr = {};
  }

  void tick()
  {
    auto callable_invalid =
      [](handle const& fn_ptr) -> bool
    {
      return !*fn_ptr;
    };

    // erase all the 'remove()d' functions
    auto new_end = std::remove_if(
      callables_.begin(),
      callables_.end(),
      callable_invalid);

    callables_.erase(new_end, callables_.end());

    // call the remainder
    for (auto const& each : callables_)
      (*each)();
  }

private:
  std::vector<handle> callables_;
};

This is dumbed down from the real thing, but enough to demonstrate the problem. In the real code, this only runs the functions after a specific time has elapsed, and they are all in a queue. Invalidating the std::function basically serves as “marking for deletion”, which is a common pattern for allowing deletion in queue- or heap-like data structures. In this case, it just allows marking a function for deletion in constant time, while the actual element shifting is “bundled” in the tick() function.

Now for the code that uses this “ticker service”:

class announcer_service
{
public:
  explicit announcer_service(ticker_service& ticker)
  : ticker_(ticker)
  {
  }

  void update_presence(std::string info)
  {
    // Make sure no jobs are running
    ticker_.remove(job_);

    if (!send_web_request(info))
      return;

    // reinsert the job
    job_ = ticker_.insert(
      [=] {
        update_presence(info);
    });
  }
private:
  ticker_service& ticker_;
  ticker_service::handle job_;
};

The announcer service is then used like this:

ticker_service ticker;
announcer_service announcer(ticker);

announcer.update_presence(
  "hello world! this is a longer text.");
ticker.tick();

A subtle change

You might be wondering where the bug is. To the best of my knowledge, there is none. And the real code corresponding to this worked like a charm for years. And I did not make any significant changes to it lately either. Or so I thought.
If I open that code in CLion, Clang-Tidy is telling me that the parameter “info” to update_presence is only used as a reference, and I should consider turning it into one. Well, Clang-Tidy, that’s bad advice. Because that’s pretty much the change I made:

void update_presence(std::string const& info) // <--

And this makes it go boom on the second call to update_presence(), the one from tick(). Whew. But why?

What is happening?

It turns out that even though we are capturing everything by value, the lambda is still at fault here. Or rather, the fault lies in using values captured by the lambda after the lambda has been destroyed. And in this case, the lambda actually destroys itself in the call to ticker_service::remove(). In the first call to update_presence(), the job_ handle is still nullptr, turning remove() into a no-op. On the second call, however, remove() overwrites the std::function that is currently executing – the very lambda that called into update_presence() – with a default-constructed value. This effectively destroys the lambda that was put there by the previous call to update_presence(), and with it the captured info string. If info is copied into update_presence(), this is not a problem, but if you are still referencing the value stored in the lambda, it is a typical use-after-free. Oops. I guess C++ can be tricky sometimes, even if you are using automatic memory management.

How to avoid this

This bug is not unlike modifying a container while iterating over it. Java people know this error from the ConcurrentModificationException. Yes, doing it safely is possible if you are really, really careful. But in general, you are better off deferring the container modification to a later point, after you are done iterating. Likewise, in this example, the std::function that is currently executing is being modified while it is executing.
A good solution is to defer the deletion until after the execution. So I argue the bug is actually in the ticker_service, which is not as safe as it could be. It should make sure that the lambda survives the complete duration of the call. An easy, albeit somewhat inefficient, approach would be copying the std::function before calling it. Luckily, in the real code, the functions are all executed just once, so I could std::move them to a local variable before executing them.
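Here is a minimal sketch of that safer variant, assuming the rest of ticker_service stays as above: tick() invokes every callback through a local copy, so a remove() issued from inside a callback can no longer destroy the lambda (and its captured state) mid-call. Indexing with a pre-computed count instead of iterators also keeps the loop valid if a callback insert()s a new job while we are ticking, as update_presence() does.

void tick()
{
  // erase all the 'remove()d' functions first, as before
  auto callable_invalid =
    [](handle const& fn_ptr) -> bool
  {
    return !*fn_ptr;
  };
  auto new_end = std::remove_if(
    callables_.begin(),
    callables_.end(),
    callable_invalid);
  callables_.erase(new_end, callables_.end());

  // call the remainder through a local copy (or std::move it, if each
  // function only ever runs once), so the currently executing function
  // object stays alive even if remove() clears its slot in callables_
  const std::size_t count = callables_.size();
  for (std::size_t i = 0; i < count; ++i)
  {
    callable_type local_copy = *callables_[i];
    if (local_copy)
      local_copy();
  }
}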