threading – Schneide Blog

Python desktop applications with asynchronous features (part 1)

The Python world has this peculiar characteristic that for nearly all ideas that exist out there in the current programming world, there is a prepared solution that appears well-rounded at first and when you then try to “just plug it together”, after a certain while you encounter some most specific cases that make all the “just” go up in smoke and leave you with some hours of research.

That is also, why now, for quite some of these “solutions” there seem to be various similar-but-distinct packages that you better care to understand; which is again, hard, because every Python developer always likes to advertise their package with very easy sounding words.

But that is the fun of Python. If I choose it as suitable for a project, this is the contract I sign 🙂

So I recently ran into a surprisingly non-straightforward case with no simple go-to-solution. Maybe you know one and then I’ll be thankful to try that and discuss it through; sometimes one is just blind from all the options. (It’s also what you sign when choosing Python. I cope.)

Now, I thought a lot about my use case and I will split these findings up into multiple blog posts – thinking asynchronously (“async”) is tricky in many languages, but the Python way, again, is to hide its intricacies in very carefully selected places.

A desktop app with TPC socket HANDLING

As long as your program is a linear script, you do not need async features. But if there is some thing to be done for a longer time (or endless), you cannot afford to wait for it or otherwise intervene.

E.g for our desktop application (employing Python’s very basic tkinter) we sooner or later run into the tk.mainloop() , which, from the OS thread it was called from, is a endless loop that draws the interface, handles input events, updates the interface, repeat. This is blocking, i.e. that thread can now only do other stuff from the event handlers acting on that interface.

You might know: Any desktop UI framework really hates if you try to update its interface from “outside the UI thread”. Just think of any order of unpredictable behaviour if you would e.g. draw a button, then handle its click event and while you’re at it, a background thread removes that button – etc.

The good thing is, you will quickly be told not to do such a thing, the bad thing is, you might end up with an unexpected or difficult to parse error and some question marks about what else to do.

The UI-thread problem is a specific case of “doing stuff in parallel requires you to really manage who can actually access a given resource”. Just google race condition. If you think about it, this holds for managing projects / humans in general, but also for our desktop app that allows simple network connections via TCP socket.

Now the first clarifications have to be done. Python gives you

on any tkinter widget, you have “.after()” to call any function some time in the future, i.e. you enqueue that function call to be executed after a time has passed; so this is nice for making stuff happen in the UI thread.
But even small stuff like writing some characters to a file might delay the interface reaction time and people nowadays have no time for that (maybe I’m just particularly impatient.)
Python’s standard library gives us packages like threading, asyncio and multiprocessing package, now for long enough to consider them mature.
There are also more advanced solutions, like looking into PyQt, or the mindset “everything is a Web App nowadays”, and they might equip you with asynchronous handling from the start, but remember – The Schneiderei in Softwareschneiderei means that we prefer tailor-made software over having to integrating bloated dependencies that we neither know nor want to maintain all year long.

We conclude with shining our light to the general choice in this project, and a few reasons why.

What Python calls “threads” are not the same thing as what operating systems understands as a thread (which also differs) among them. See also the Python Global Interpreter Lock. We do not want to dive into all of that.
multiprocessing is the only way to do anything outside the current sytem process, which is what you need for running CPU-heavy tasks in parallel. It’s out our use case, and while it is more “true parallel” it comes with slower startup, more expensive communication costs and also some constraints for such exchange (e.g. to serializable data).
We are IO-bound, i.e. we have no idea when the next network package arrives, so we want to wait in separate event loops that would be blocking on their own (i.e. “while True”, similar to what tkinter does with our main thread.).
Because we have the main thread stolen by tkinter anyway, we would use a threading.Thread anyway to allow concurrency, but then we face the choice to construct everything with Threads or to employ the newer asyncio features.

The basic block to do that would use a threading.Thread:

class ListenerThread(threading.Thread):

    # initialize somehow

    def run(self):
        while True:
            wait_for_input()
            process_input()
            communicate_with_ui_thread()


# usage like
#   listener = ListenerThread(...) 
#   listener.start(...)

Now to think naively, the pseudo-code communicate_with_ui_thread() could then either lead to the tkinter .after() calls, or employ callback functions passed either in the initialization or the .start() call of the thread. But sadly, it was not as easy as that, because you run several risks of

just masking your intention and still executing your callbacks blockingly (that thread can freeze the UI)
still pass UI widget references to your background loop (throwing bad not-the-main-thread-error exceptions)
have memory leaks in the background gobble up more than you ever wanted
Lock starvation: bad error telling you RuntimeError: can't allocate lock
deadlocks, by interdependent callbacks

This list is surely not exhaustive.

So what are the options? Let me discuss these in an upcoming post. For now, let’s just linger all these complications in your brain for a while.

C++ Coroutines on Windows with the Fiber API

Last week, I had the chance to try out coroutines as a way to cooperatively interleave long tasks with event-processing. Unlike threads, where you can have interaction between between threads at any time, coroutines need to yield control explicitly, whic arguably makes synchronisation a little simpler. Especially in (legacy) systems that are not designed for concurrency. Of course, since coroutines do not run at the same time, you do not get the perks from concurrency either.

If you don’t know coroutines, think of them as functions that can be paused and resumed.

Unlike many other languages, C++ does not have built-in support for coroutines just yet. There are, however, several alternatives. On Windows, you can use the Fiber API to implement coroutines easily.

Here’s some example code of how that works:

auto coroutine=make_shared<FiberCoroutine>();
coroutine->setup([](Coroutine::Yield yield)
{
  for (int i=0; i<3; ++i)
  {
    cout << "Coroutine " 
         << i << std::endl;
    yield();
  }
});

int stepCount = 0;
while (coroutine->step())
{
  cout << "Main " 
       << stepCount++ << std::endl;
}

Somewhat surprisingly, at least if you have never seen coroutines, this will output the two outputs alternatingly:

Coroutine 0
Main 0
Coroutine 1
Main 1
Coroutine 2
Main 2

Interface

Since fibers are not the only way to implement coroutines and since we want to keep our client code nicely insulated from the windows API, there’s a pure-virtual base class as an interface:

class Coroutine
{
public:
  using Yield = std::function<void()>;
  using Run = std::function<void(Yield)>;

  virtual ~Coroutine() = default;
  virtual void setup(Run f) = 0;
  virtual bool step() = 0;
};

Typically, creation of a Coroutine type allocates all the resources it needs, while setup “primes” it with an inner “Run” function that can use an implementation-specific “Yield” function to pass control back to the caller, which is whoever calls step.

Implementation

The implementation using fibers is fairly straight-forward:

class FiberCoroutine
  : public Coroutine
{
public:
  FiberCoroutine()
  : mCurrent(nullptr), mRunning(false)
  {
  }

  ~FiberCoroutine()
  {
    if (mCurrent)
      DeleteFiber(mCurrent);
  }

  void setup(Run f) override
  {
    if (!mMain)
    {
      mMain = ConvertThreadToFiber(NULL);
    }
    mRunning = true;
    mFunction = std::move(f);

    if (!mCurrent)
    {
      mCurrent = CreateFiber(0,
        &FiberCoroutine::proc, this);
    }
  }

  bool step() override
  {
    SwitchToFiber(mCurrent);
    return mRunning;
  }

  void yield()
  {
    SwitchToFiber(mMain);
  }

private:
  void run()
  {
    while (true)
    {
      mFunction([this]
                { yield(); });
      mRunning = false;
      yield();
    }
  }

  static VOID WINAPI proc(LPVOID data)
  {
    reinterpret_cast<FiberCoroutine*>(data)->run();
  }

  static LPVOID mMain;
  LPVOID mCurrent;
  bool mRunning;
  Run mFunction;
};

LPVOID FiberCoroutine::mMain = nullptr;

The idea here is that the caller and the callee are both fibers: lightweight threads without concurrency. Running the coroutine switches to the callee’s, the run function’s, fiber. Yielding switches back to the caller. Note that it is currently assumed that all callers are from the same thread, since each thread that participates in the switching needs to be converted to a fiber initially, even the caller. The current version only keeps the fiber for the initial thread in a single static variable. However, it should be possible to support this by replacing the single static fiber pointer with a map that maps each thread to its associated fiber.

Note that you cannot return from the fiberproc – that will just terminate the whole thread! Instead, just yield back to the caller and either re-use or destroy the fiber.

Assessment

Fiber-based coroutines are a nice and efficient way to model non-linear control-flow explicitly, but they do not come without downsides. For example, while this example worked flawlessly when compiled with visual studio, Cygwin just terminates without even an error. If you’re used to working with the visual studio debugger, it may surprise you that the caller gets hidden completely while you’re in the run function. The run functions stack completely replaces the callers stack until you call yield(). This means that you cannot find out who called step(). On the other hand, if you’re actually doing a lot of processing in the run function, this is quite nice for profiling, as the “processing” call tree seemingly has its own root in the call-tree.

I just wish the visual studio debugger had a way to view the states of the different fibers like it has for threads.

Alternatives

On Linux, you can use the ucontext.

Visual Studio 2015 also has another, newer, implementation.

Coroutines can be implemented using threads and condition-variables.

There’s also Boost.Coroutine, if you need an independent implementation of the concept. From what I gather, they only use Fibers optionally, and otherwise do the required “trickery” themselves. Maybe this even keeps the caller-stack visible – it is certainly worth exploring.

When it comes to multithreading, better be safe than sorry

Writing multithreaded applications in Java is hard. Here are five problems and how to avoid them without much effort (mostly).

Recently, I attended a code review of the core parts of a web application, written in Java. The application is used by a large customer base and occassionally, there are error reports and exceptions in the log files. Some of these exceptions are the dreaded ConcurrentModificationExceptions, indicating conflicting read/write access on an unsynchronized collection data structure. In the code review, we found several threading flaws, but not after an exhaustive reading of the whole module. Here, I want to present the flaws and give some advice on how to avoid them:

The public lock

In some parts of the code, methods were defined as synchronized through the method declaration keyword:

public synchronized String getLastReservation() { [...]

While there is nothing wrong with this approach in itself, it can be highly dangerous in combination with synchronized blocks. The code above effectively wraps a synchronized block using the object instance (this) as a lock. No information of an object is more publicly visible as the object reference (this), so you have to check all direct or indirect clients of this object if they synchronize on this instance, too. If they do, you have chained two code blocks together, probably without proper mentioning of this fact. The least harmful defect will be performance losses because your code isn’t locked as fine grained as it could be.

The easiest way to avoid these situations it to always hide the locks. Try not to share one object’s locks with other objects. If you choose publicly accessible locks, you can never be sure about that.

The subtle lock change

In one class, there were both instance and class (static) methods, using the synchronized keyword:

public synchronized String getOrderNumberOf(String customerID) { [...]
public  synchronized static int getTotalPendingOrders() { [...]

And while they were both accessing the same collection data structure (a static hashmap), they were using different locks. The lock of the instance method is the instance itself, while the lock of the static method is the class object of the type. This is very dangerous, as it can be easily missed when writing or altering the code.

The best way to prevent this problem it to avoid the synchronized modifier for methods completely. State your locks explicitely, all the time.

Partial locking

In a few classes, collection datatypes like lists were indeed synchronized by internal synchronized-blocks in the methods, using the private collection instance as lock. The synchronized blocks were applied to the altering methods like putX(), removeX() and getX(). But the toString() method, building a comma-separated list of the textual list entries, wasn’t synchronized to the list. The method contained the following code:

public String toString() {
    StringBuilder result = new StringBuilder();
    for (String entry : this.list) {
        result.append(entry);
        result.append(",");
    }
    [...]
    return result.toString();
}

I’ve left out some details and special cases, as they aren’t revelant here. The problem with the foreach loop is that an anonymous Iterator over the list is used and it will relentlessly monitor the list for any changes and throw a ConcurrentModificationException as soon as one of the properly synchronized sections changes it. The toString() method was used to store the list to a session dependent data storage. Every once in a while, the foreach loop threw an exception and failed to properly persist the list data, resulting in data loss.

The most straight-forward solution to this problem might be to add the missing synchronization block in the toString() method. If you don’t want to block the user session while writing to disk, you might traverse the list without an Iterator (and be careful with your assumptions about valid indices) or work on a copy of the list, given that an in-memory copy of the list would be cheap. In an ACID system scenario, you should probably choose to complete your synchronized block guards.

Locking loophole

Another problem was a collection that was synchronized internally, but could be accessed through a getter method. No client could safely modify or traverse the collection, because they had the collection, but not the lock object (that happened to be the collection, too, but who can really be sure about that in the future?). It would be ridiculous to also provide a getter for the lock object (always hide your locks, remember?), the better solution is to refactor the client code to a “tell, don’t ask” style.

To prevent a scenario when a client can access a data structure but not its lock, you shouldn’t be able to gain access to the data structure, but pass “command objects” to the data structure. This is a perfect use case for closures. Effectively, you’ll end up with something like Function or Operation instances that are applied to every element of the collection within a synchronized block and perform your functionality on them. Have a look at op4j for inspirational syntax.

Local locking

This was the worst of all problems and the final reason for this blog entry: In some methods, the lock objects were local variables. In summary, these methods looked like this:

public String getData() {
    Object lock = new Object();
    synchronized (lock) {
        [...]
    }
}

Of course, it wasn’t that obvious. The lock objects were propagated to other methods, stored in datastructures, removed from them, etc. But in the end, each caller of the method got his own lock and could henceforth wreck havoc in code that appeared very well synchronized on first look. The error in its clarity is too stupid to be widespread. The problem was the obfuscation around it. It took us some time to really understand what is going on and where all that lock objects really come from.

My final advice is: If you have to deal with multithreading, don’t outsmart yourself and the next fellow programmer by building complex code structures or implicit relationships. Be as concise and explicit as you can be. Less clutter is more when dealing with threads. The core problem is the all-or-none law of thread synchronization: Either you’ve got it all right or you’ve got it all wrong – you just don’t know yet.

Hide your locks, name your locks explicitely, reduce the scope of necessary locking so that you can survey it easily, never hand out your locked data, and, most important, remove all clutter around your locking structures. This might make the difference between “just works” and endless ominous bug reports.

	mariuselvert on Calculating the Number of Segm…
	Anonymous on Calculating the Number of Segm…
	Anonymous on Avoiding Code Style Discu…
	Anonymous on What Happens When We Don’t Lis…
	Writing Integration… on Every Unit Test Is a Stage Pla…