When laziness broke my code

I was just integrating a new task-graph system for a C# machine control system when my tests started to go red. Note that the tasks I refer to are not the same as the C# Task implementation, but the broader concept. Task-graphs are well known to be DAGs, because otherwise the tasks cannot be finished. The general algorithm to execute a task-graph like this is called topological sorting, and it goes like this:

  1. Find the number of dependencies (incoming edges) for each task
  2. Find the tasks that have zero dependencies and start them
  3. For any finished tasks, decrement the follow-up tasks dependency count by one and start them if they reach zero.

The graph that was failed looked like the one below. Task A was immediately followed by a task B that was followed by a few more tasks.

I quickly figured out that the reason that the tests were failing was that node B was executed twice. Looking at the call-stack for both executions, I could see that the first time B was executed was when A was completed. This is correct as per step 3 in the algorithm. However, the second time it was started was directly from the initial Run method that does the work from step 2: Starting the initial tasks that are not being started recursively. I was definitely not calling Run twice, so how did that happen?

public void Run()
{
    var ready = tasks
        .Where(x => x.DependencyCount == 0);

    StartGroup(ready);
}

Can you see it? It is important to note that many of the tasks in this graph are asynchronous. Their completion is triggered by an IObserver, a C# Task completing or some other event. When the event is processed, StartGroup is used to start all tasks that have no more dependencies. However, A was no such task, it was synchronous, so the StartGroup({B}) call happened while Run was still on the stack.

Now what happened was that when A (instantly!) completed, it set the DependencyCount of B to 0. Since ready in the code snippet is lazily evaluated from within StartGroup, the ‘contents’ actually change while StartGroup is running.

The fix was adding a .ToList after the .Where, a unit test that checked that this specifically would not happen again, and a mental note that lazy evaluation can be deceiving.

Composition of C# iterator methods

Iterator methods in C# or one of my favorite features of that language. I do not use it all that often, but it is nice to know it is there. If you are not sure what they are, here’s a little example:

public IEnumerable<int> Iota(int from, int count)
{
  for (int offset = 0; offset < count; ++offset)
    yield return from + offset;
}

They allow you to lazily generate any sequence directly in code. For example, I like to use them when generating a list of errors on a complex input (think compiler errors). The presence of the yield contextual keyword transforms the function body into a state machine, allowing you to pause it until you need the next value.

However, this makes it a little more difficult to compose such iterator methods, and in reverse, refactor a complex iterator method into several smaller ones. It was not obvious to me right away how to do it at all in a ‘this always works’ manner, so I am sharing how I do it here. Consider this slightly more complex iterator method:

public IEnumerable<int> IotaAndBack(int from, int count)
{
  for (int offset = 0; offset < count; ++offset)
    yield return from + offset;

  for (int offset = 0; offset < count; ++offset)
    yield return from + count - offset - 1;
}

Now we want to extract both loops into their own functions. My one-size-fits-all solution is this:

public IEnumerable<int> AndBack(int from, int count)
{
  for (int offset = 0; offset < count; ++offset)
    yield return from + count - offset - 1;
}

public IEnumerable<int> IotaAndBack(int from, int count)
{
  foreach (var x in Iota(from, count))
     yield return x;

  foreach (var x in AndBack(from, count))
     yield return x;
}

As you can see, a little ‘foreach harness’ is needed to compose the parts into the outer function. Of course, in a simple case like this, the LINQ version Iota(from, count).Concat(AndBack(from, count)) also works. But that only works when the outer function is sufficiently simple.

Using a C++ service from C# with delegates and PInvoke

Imagine you want to use a C++ service from contained in a .dll file from a C# host application. I was using a C++ service performing some hardware orchestration from a C# WPF application for the UI. This service pushes back events to the UI in undetermined intervals. Let’s write a small C++ service like that real quick:

#include <thread>
#include <string>

using StringAction = void(__stdcall*)(char const*);

void Report(StringAction onMessage)
{
  for (int i = 0; i < 10; ++i)
  {
    onMessage(std::to_string(i).c_str());
    std::this_thread::sleep_for(std::chrono::seconds(1));
  }
}

static std::thread thread;

extern "C"
{
  __declspec(dllexport) void __stdcall Start(StringAction onMessage)
  {
    thread = std::thread([onMessage] {Report(onMessage);});
  }

  __declspec(dllexport) void __stdcall Join()
  {
    thread.join();
  }
}

Compile & link this as a .dll that we’ll call Library.dll for now. Catchy, no?

Now we write a small helper class in C# to access our nice service:

class LibraryLoader
{
  public delegate void StringAction(string message);

  [DllImport("Library.dll", CallingConvention = CallingConvention.StdCall)]
  private static extern void Start(StringAction onMessage);

  [DllImport("Library.dll", CallingConvention = CallingConvention.StdCall)]
  public static extern void Join();

  public static void StartWithAction(Action<string> action)
  {
    Start(x => action(x));
  }
}

Now we can use our service from C#:

LibraryLoader.StartWithAction(x => Console.WriteLine(x));
// Do other things while we wait for the service to do its thing...
LibraryLoader.Join();

If this does not work for you, make sure the C# application can find the C++ Library.dll, as VS does not help you with this. The easiest way to do this, is to copy the dll into the same folder as the C# application files. When you’re starting from VS 2019, that is likely something like bin\Debug\net5.0. You could also adapt the PATH environment variable to include the target directory of your Library.dll.

If you’re getting a BadImageFormatException, make sure the C# application is compiled for the same Platform target as the C++ application. By default, VS builds C++ for “x86”, while it builds C# projects for “Any CPU”. You can change this to x86 in the project settings under Build/Platform target.

Now if this is all you’re doing, the application will probably work fine and report its mysterious number sequence flawlessly. But if you do other things, e.g. something that triggers garbage collection, like this:

LibraryLoader.StartWithAction(x => Console.WriteLine(x));
Thread.Sleep(2000);
GC.Collect();
LibraryLoader.Join();

The application will crash with a very ominous ExecutionEngineException after 2 seconds. In a more realistic environment, e.g. my WPF application, this happened seemingly at random.

Now why is this? The Action<string> we registered to print to the console gets garbage collected, because there is nothing in the managed environment keeping it alive. It exists only as a dependency to the function pointer in C++ land. When C++ wants to message something, it calls into nirvana. Not good. So let’s just store it, to keep it alive:

static StringAction messageDelegate;
public static void StartWithAction(Action<string> action)
{
  messageDelegate = x => action(x);
  Start(messageDelegate);
}

Now the delegate is kept alive in the static variable, thereby matching the lifetime of the C++ equivalent, and the crash is gone. And there you have it, long-lasting callbacks from C++ to C#.

Who watches the FileSystemWatcher?

One of the ways we sometimes implement communication with legacy systems is via the file-system. The legacy system will write files about some events into a predefined directory, and the other system watches this directory for changes. In C#/.NET, the handy FileSystemWatcher class is a great tool for that.

We had a working solution using that in production for the last two years. Then it suddenly stopped working. There was no apparent change in the related code, so I suspect our upgrade from .NET Core 2.1 to .NET 5.0 triggered the change in behavior. The code looked something like this:

public void BeginWatching()
{
    var filter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
        | NotifyFilters.FileName | NotifyFilters.DirectoryName;

    var watcher = new FileSystemWatcher
    {
        Path = directory,
        NotifyFilter = filter,
        Filter = "*.txt",
        EnableRaisingEvents = true,
    };
    /* Hook up some events here... */
}

And after some debugging, it turned out none of the attached event handlers were firing anymore. The solution was to not let go of the watcher instance, and keep it around in the enclosing class.

this.watcher = new FileSystemWatcher

This makes sense, of course. Before, the watcher was only a local variable, and could be collected by the garbage collector at any moment. That, in turn, disabled the process, causing no more events to be emitted.

Interestingly, a few days later, my student asked about his FileSystemWatcher also no longer working. I immediatly suspected the same problem, but when we looked at the code, he had already moved the watcher into a property of the enclosing class. Turns out, for him the problem was just one level up: the enclosing class was only created as a local variable, and the contained watcher stopped after that went out of scope.

Now the only question is: why did we never observe this before? Either something in the GC changed, or something in the implementation of the watcher changed. Can anyone enlighten the situation?

Some strings are more equal before your Oracle database

When working with customer code based on ADO.net, I was surprised by the following error message:

The german message just tells us that some UpdateCommand had an effect on “0” instead of the expected “1” rows of a DataTable. This happened on writing some changes to a table using an OracleDataAdapter. What really surprised me at this point was that there certainly was no other thread writing to the database during my update attempt. Even more confusing was, that my method of changing DataTables and using the OracleDataAdapter to write changes had worked pretty well so far.

In this case, the title “DBConcurrencyExceptionturned out to be quite misleading. The text message was absolutely correct, though.

The explanation

The UpdateCommand is a prepared statement generated by the OracleDataAdapter. It may be used to write the changes a DataTable keeps track of to a database. To update a row, the UpdateCommand identifies the row with a WHERE-clause that matches all original values of the row and writes the updates to the row. So if we have a table with two rows, a primary id and a number, the update statement would essentially look like this:

UPDATE EXAMPLE_TABLE
  SET ROW_ID =:current_ROW_ID, 
      NUMBER_COLUMN =:current_NUMBER_COLUMN
WHERE
      ROW_ID =:old_ROW_ID 
  AND NUMBER_COLUMN =:old_NUMBER_COLUMN

In my case, the problem turned out to be caused by string-valued columns and was due to some oracle-weirdness that was already discussed on this blog (https://schneide.blog/2010/07/12/an-oracle-story-null-empty-or-what/): On writing, empty strings (more precisely: empty VARCHAR2s) are transformed to a DBNull. Note however, that the following are not equivalent:

WHERE TEXT_COLUMN = ''
WHERE TEXT_COLUMN is null

The first will just never match… (at least with Oracle 11g). So saying that null and empty strings are the same would not be an accurate description.

The WHERE-clause of the generated UpdateCommands look more complicated for (nullable) columns of type VARCHAR2. But instead of trying to understand the generated code, I just guessed that the problem was a bug or inconsistency in the OracleDataAdapter that caused the exception. And in fact, it turned out that the problem occured whenever I tried to write an empty string to a column that was DBNull before. Which would explain the message of the DBConcurrencyException, since the DataTable thinks there is a difference between empty strings and DBNulls but due to the conversion there will be no difference when the corrensponding row is updated. So once understood, the problem was easily fixed by transforming all empty strings to null prior to invoking the UpdateCommand.