When I was interviewing for my job here at the Softwareschneiderei, I was asked a question:
“If you had one wish for what to add to C++, what would that be?”
I vividly remember not having to give a lot of thought to answer that: modules. And now, it seems modules for C++ are finally materializing. About damn time.
The Past: Hello Preprocessor, my old friend
C++ has a problem with scalability. Traditionally, the only real way to use code from another compile unit is to use header files and “use” them via preprocessor #include directives. This requires splitting your code into a header and an implementation file, which means duplicating a lot of information. And it does not even work for a lot of code: templates need to live in the header, and a lot of modern C++ code is template code. This decreases the uniformity and coherence of the code.
When resolving #include directives, the preprocessor really only copies and pastes code from one file into another. Since this is a transitive process, the actual code that gets analyzed by the compiler quickly becomes huge.
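To make that concrete, here is a tiny sketch with a hypothetical header log.h; the file names and contents are made up purely for illustration:

// log.h – a hypothetical header
void log(const char* message);

// main.cpp – before preprocessing
#include "log.h"
int main() { log("preprocessing is just text substitution"); return 0; }

// main.cpp – what the compiler actually sees after preprocessing:
// the header's text has simply been pasted in verbatim
void log(const char* message);
int main() { log("preprocessing is just text substitution"); return 0; }

Now imagine log.h itself includes other headers: all of their text gets pasted in as well, recursively.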
Hello World?
Do this little experiment: write a simple C++ “Hello World!” program, and look at the preprocessed output.
#include <iostream>

int main()
{
    std::cout << "Hello, World!" << std::endl;
    return 0;
}
I preprocessed this simple version with Visual Studio 2017. The output was about 50500 lines! That is a blow-up of more than 7200x compared to the handful of lines we wrote. Now repeat that while including something from Boost. Still wonder why compilation is so slow?
Pay for what you use?
So if you include a header, you not only get the things you want from it, but also everything else: all the other contents of the header and all the headers it includes transitively. Usually, the number of transitively included headers climbs over 10000 very quickly. This goes directly against C++’s design mantra: pay only for what you use.
The code that gets included is usually orders of magnitude more than the actual contents of your .cpp file, even in examples not as contrived as the “Hello, World!” above. This means a lot of extra code for your toolchain to analyze, and the work is duplicated for each compile unit. This is obviously slow.
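As an illustration, consider a hypothetical widget.h that drags in heavyweight standard headers even though most clients only want the class declaration:

// widget.h – a hypothetical header
#include <iostream>  // only needed for one debug helper...
#include <map>
#include <vector>

class Widget {
public:
    void draw() const;
    void dump_state(std::ostream& out) const;  // ...namely this one
private:
    std::map<int, std::vector<int>> state_;
};

Every compile unit that includes widget.h pays for <iostream>, <map>, <vector> and everything they include in turn, whether it ever calls dump_state or not.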
Leaks everywhere…
But it also means your modules are leaking. Their dependencies, for example: some of your users will inadvertently use the code that you use, and if you change your dependency, they will break. How often have you used std::runtime_error without actually including stdexcept? Many C++ programmers do not even know which header a particular stdlib feature is located in. Not their fault, really – it’s hard enough to memorize the contents alone, without their locations in an arbitrary M:N mapping.
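Here is a sketch of the problem. Whether this compiles depends entirely on your standard library implementation, not on the standard:

#include <iostream>  // note: <stdexcept> is not included

int main()
{
    // std::runtime_error lives in <stdexcept>. This may still compile,
    // because many standard library implementations happen to pull in
    // <stdexcept> transitively from <iostream> – until one day they don't.
    throw std::runtime_error("accidental transitive include");
}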
But dependencies are not the only thing that leaks. By exposing individual headers, you also make clients dependent on the physical structure of your program. Want to move one type from one header to another? You cannot do that unless you are willing to break a couple of clients.
Current workarounds
The C++ community has come up with different approaches to deal with the fallout.
- Forward declarations and the PIMPL idiom let you break the transitive dependencies. But a forward declaration is a very subtle form of code duplication, and a PIMPL even creates runtime overhead (a sketch follows below).
- Unity builds tackle the problem of resolving your include graph multiple times, but at the cost of an obscure extension to your build system and negative impacts on incremental builds.
- Meta-headers tackle the problem of defining module boundaries more clearly, but they make compile times worse and make it harder to explore the modules.
It’s a catch 22.
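Here is a minimal sketch of the first workaround, using hypothetical Car and Engine classes. The forward declaration plus PIMPL keeps engine.h out of the client’s include graph, at the price of an extra allocation and an indirection:

// car.h – clients of Car never see engine.h
#include <memory>

class Engine;  // forward declaration: a subtle duplication of engine.h's contents

class Car {
public:
    Car();
    ~Car();                           // defined in car.cpp, where Engine is complete
    void drive();
private:
    std::unique_ptr<Engine> engine_;  // PIMPL: extra allocation and indirection
};

// car.cpp – only this translation unit pays for engine.h
#include "car.h"
#include "engine.h"

Car::Car() : engine_(std::make_unique<Engine>()) {}
Car::~Car() = default;
void Car::drive() { engine_->run(); }  // assuming Engine has a run() member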
Tool support
Because macros leak in and out of headers, semantic analysis becomes very hard. In fact, a tool needs to understand the program in its entirety, including all source and build files, to refactor properly. After all, each define given on the command line, or even each reordering (!) of #include files, could potentially alter the semantics completely. Every line of code in a header can change its meaning entirely depending on its context.
There are also techniques that abuse this feature, e.g. cross-includes, where an include does something based on a previous #define. Granted, only a small percentage of code is usually directly affected by such subtleties, but there is currently no way to properly isolate yourself from them. That is why refactoring and introspection tools for other languages are so much better.
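A small sketch of how a prior #define changes what a header means. The header name and macro are made up, but the pattern is common, for example in configuration or X-macro headers:

// fast_math.h – a hypothetical header whose contents depend on its context
#ifdef USE_DOUBLE_PRECISION
typedef double real_t;
#else
typedef float real_t;
#endif

// a.cpp
#include "fast_math.h"          // real_t is float here

// b.cpp
#define USE_DOUBLE_PRECISION
#include "fast_math.h"          // the very same header now yields real_t = double

A tool looking at fast_math.h in isolation cannot tell which of the two versions it is analyzing; it needs the including context and the build flags.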
State of the union
The modules proposal is spearheaded by Gabriel Dos Reis at Microsoft. An experimental implementation has been shipping since Visual Studio 2015 and is still being updated regularly, with the most recent iteration in VS 2017. If you want to know more, have a look at this video.
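For a taste of what this looks like, here is a minimal sketch in the Modules TS style that the experimental MSVC implementation is based on. The module name is made up, and the file extension, compiler switches and syntax details have changed between versions, so take it as an illustration rather than a recipe:

// speech.ixx – a module interface unit (hypothetical module name)
export module speech;

// Only exported declarations are visible to importers;
// macros and non-exported implementation details do not leak out.
export const char* get_greeting()
{
    return "Hello, Modules!";
}

// main.cpp – consuming the module instead of textually including a header
#include <iostream>

import speech;

int main()
{
    std::cout << get_greeting() << '\n';
    return 0;
}

Instead of pasting tens of thousands of lines of text into every consumer, the compiler reads a precompiled description of the module’s interface – which is exactly the scalability fix this post has been asking for.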