C++ Coroutines on Windows with the Fiber API

Last week, I had the chance to try out coroutines as a way to cooperatively interleave long tasks with event-processing. Unlike threads, where you can have interaction between between threads at any time, coroutines need to yield control explicitly, whic arguably makes synchronisation a little simpler. Especially in (legacy) systems that are not designed for concurrency. Of course, since coroutines do not run at the same time, you do not get the perks from concurrency either.

If you don’t know coroutines, think of them as functions that can be paused and resumed.

Unlike many other languages, C++ does not have built-in support for coroutines just yet. There are, however, several alternatives. On Windows, you can use the Fiber API to implement coroutines easily.

Here’s some example code of how that works:

auto coroutine=make_shared<FiberCoroutine>();
coroutine->setup([](Coroutine::Yield yield)
{
  for (int i=0; i<3; ++i)
  {
    cout << "Coroutine " 
         << i << std::endl;
    yield();
  }
});

int stepCount = 0;
while (coroutine->step())
{
  cout << "Main " 
       << stepCount++ << std::endl;
}

Somewhat surprisingly, at least if you have never seen coroutines, this will output the two outputs alternatingly:

Coroutine 0
Main 0
Coroutine 1
Main 1
Coroutine 2
Main 2

Interface

Since fibers are not the only way to implement coroutines and since we want to keep our client code nicely insulated from the windows API, there’s a pure-virtual base class as an interface:

class Coroutine
{
public:
  using Yield = std::function<void()>;
  using Run = std::function<void(Yield)>;

  virtual ~Coroutine() = default;
  virtual void setup(Run f) = 0;
  virtual bool step() = 0;
};

Typically, creation of a Coroutine type allocates all the resources it needs, while setup “primes” it with an inner “Run” function that can use an implementation-specific “Yield” function to pass control back to the caller, which is whoever calls step.

Implementation

The implementation using fibers is fairly straight-forward:

class FiberCoroutine
  : public Coroutine
{
public:
  FiberCoroutine()
  : mCurrent(nullptr), mRunning(false)
  {
  }

  ~FiberCoroutine()
  {
    if (mCurrent)
      DeleteFiber(mCurrent);
  }

  void setup(Run f) override
  {
    if (!mMain)
    {
      mMain = ConvertThreadToFiber(NULL);
    }
    mRunning = true;
    mFunction = std::move(f);

    if (!mCurrent)
    {
      mCurrent = CreateFiber(0,
        &FiberCoroutine::proc, this);
    }
  }

  bool step() override
  {
    SwitchToFiber(mCurrent);
    return mRunning;
  }

  void yield()
  {
    SwitchToFiber(mMain);
  }

private:
  void run()
  {
    while (true)
    {
      mFunction([this]
                { yield(); });
      mRunning = false;
      yield();
    }
  }

  static VOID WINAPI proc(LPVOID data)
  {
    reinterpret_cast<FiberCoroutine*>(data)->run();
  }

  static LPVOID mMain;
  LPVOID mCurrent;
  bool mRunning;
  Run mFunction;
};

LPVOID FiberCoroutine::mMain = nullptr;

The idea here is that the caller and the callee are both fibers: lightweight threads without concurrency. Running the coroutine switches to the callee’s, the run function’s, fiber. Yielding switches back to the caller. Note that it is currently assumed that all callers are from the same thread, since each thread that participates in the switching needs to be converted to a fiber initially, even the caller. The current version only keeps the fiber for the initial thread in a single static variable. However, it should be possible to support this by replacing the single static fiber pointer with a map that maps each thread to its associated fiber.

Note that you cannot return from the fiberproc – that will just terminate the whole thread! Instead, just yield back to the caller and either re-use or destroy the fiber.

Assessment

Fiber-based coroutines are a nice and efficient way to model non-linear control-flow explicitly, but they do not come without downsides. For example, while this example worked flawlessly when compiled with visual studio, Cygwin just terminates without even an error. If you’re used to working with the visual studio debugger, it may surprise you that the caller gets hidden completely while you’re in the run function. The run functions stack completely replaces the callers stack until you call yield(). This means that you cannot find out who called step(). On the other hand, if you’re actually doing a lot of processing in the run function, this is quite nice for profiling, as the “processing” call tree seemingly has its own root in the call-tree.

I just wish the visual studio debugger had a way to view the states of the different fibers like it has for threads.

Alternatives

  • On Linux, you can use the ucontext.
  • Visual Studio 2015 also has another, newer, implementation.
  • Coroutines can be implemented using threads and condition-variables.
  • There’s also Boost.Coroutine, if you need an independent implementation of the concept. From what I gather, they only use Fibers optionally, and otherwise do the required “trickery” themselves. Maybe this even keeps the caller-stack visible – it is certainly worth exploring.
  • Leave a comment

    This site uses Akismet to reduce spam. Learn how your comment data is processed.