On the usefulness of volatile memory

Observing ants is a wonderful source of insights. Here are some thoughts about memory and information storage gathered from ants.

I have a strange hobby: raising ants as pets. Not literally pets, because I don’t give them names and they probably don’t know I even exist, but as tended animals in a controlled and restricted environment, the formicarium. This doesn’t sound too strange if you know three additional things about me: I have been fascinated by ants since kindergarten, I play Dwarf Fortress for fun and I’m interested in computers.

My main colony is a Lasius niger nest with about 1500 individuals. Lasius niger, the black garden ant, is a very common species that is easy to raise, small enough not to require too much space, but big enough to be observed and tracked.

I could tell you hours and hours of ant facts, but let’s concentrate on the topic of this blog post: An ant colony can be perceived as one big organism. Each individual ant has a clear role in the hierarchy:

  • The queen ant is the sole egg layer. In an established colony, she won’t do anything else. She will be fed, cleaned and protected by her workers. She was born a queen and cannot be replaced. If she dies, the whole colony will slowly fade away because nobody produces new ants anymore. Worker ants don’t know if they still have a queen and don’t care anyway. If a queen dies, she will be fed to the remaining larvae as soon as her scent disappears.
  • The male ants are born, fed and kicked out of the nest, preferably when the young queens fly out to find a mate. They won’t do anything else and have a lifespan of days.
  • The worker ants are all sterile females and do all the work. Literally all the work. They are born workers and spend every moment of their lives contributing to the hive or just sitting around scrounging food. Other workers are too busy to judge them.

But keep in mind that ants only have short-range means of communication (scent, antennae drumming and ground vibrations), so no single ant has a complete overview of the situation. It wouldn’t have the brains to process that information anyway. Ants have a limited capability to remember things, but no long-term memory. It isn’t necessary for the single worker ant to store any information. The information is stored in the hive – literally.

If you abstract some details away, you can also perceive and describe an ant colony as a computer, or at least as a massively parallel problem solver. The algorithms are ingrained in the worker ants and are executed ruthlessly. The problems are food, water, enemies, cleaning and nurture. If the environment (the problem space) is suitable for the algorithms, the colony will thrive. Otherwise, the colony will ultimately fail. An ant colony at the side of a busy road will lose many workers in sudden “enemy” attacks, while a colony in the middle of a meadow might find fewer insects killed by cars. Not a single ant in either colony is aware of these relations. But they have found a way to share information: They mark their position (and therefore their way) with a special scent fluid. They use their scents to label the environment for other ants.

If you have ever encountered an avid user of printed sticker labels, this is what ants do, too: They put a sticker label on everything they come in contact with. If you seem to be food, they label you as food and rush back to the hive, laying out a “food in this direction” lane. If you seem dead and inedible, they mark you as garbage, and they or some other ant will pick you up and follow the garbage lane (finding the garbage lane often requires them to return near their hive, so it seems they mistook the garbage for food). The garbage area is marked with its own scent, preferably at a cliff. My ants love to throw things down a cliff! If you didn’t get the relationship to Dwarf Fortress yet, it should be clear by now. There are countless more examples, but you get the idea. The environment is not only an area, it is the long-term memory of the colony. The colony’s “memory brain” is scent on dirt, stones and sticks. The ant colony has outsourced its complete knowledge about its surroundings into the surroundings themselves.

The ants have invented something I would call an “inverse cloud”. The cloud in IT is a concept of data storage that exists mostly independent of physical location and provides access to that data from virtually anywhere. The ants’ inverse cloud is a concept of data storage that is tightly coupled to a physical location and provides access to that data only if you are in the immediate vicinity. If you remove a physical data storage part in the IT cloud, it gets replaced by other parts that contain mirrored copies of the data. The cloud never forgets. If you remove a physical data storage part in the ants’ inverse cloud, it is forgotten immediately. Ants accept their environment as it is right now and never look back.

Now think about what happens when it rains and all the scents are washed away.

After every rain, the ant colony enters a “new level”, a fresh environment to be discovered and labeled. They probably never grow tired of rediscovering the same cliff again and again. This is what happens to a computer when the power is lost: It loses its working memory. But keep in mind that the colony retains some memories: the hive is underground and often rain-proof thanks to overlaying stones or plants. So the colony remembers that it is currently in the hive, because it always smells like hive. The colony remembers the queen chamber because it always smells like queen. The colony remembers the brood chambers because they always smell like teen spirit (SCNR).

In my formicarium, it never rains. The ants get enough water from drinking troughs, but the marked lanes are never erased. This leads to all sorts of silly situations that the ants don’t even recognize as such. For example, a strong “food here” scent lane exists long after the food is gone. So a lot of enthusiastic workers run around searching in the target area. And remember that the hive smell never fades? My ants have assimilated area after area as “hive” after enough ants have marked it. So now they react excessively to disturbances because they think they are defending the hive – and therefore the queen! – but are ant miles away from the colony. Even better, a lot of young ants that usually never leave the hive until they are older wander out into the open (“it’s still the hive, just less dark”) and panic as soon as they encounter something that shouldn’t be in a hive. The panic spreads by, you’ve guessed it already, alarm scent, and soon hundreds of battle-ready ants are running around frantically without any one of them knowing why.

So, to speak in computer terms, the main memory never gets erased and the caches are never flushed. A little error can spread like wildfire (like the panic example above) and cause disadvantages like energy consumption without any real gain (no enemy to defend against) or a lot of delicate ants sitting around in plain sight of their predators. The whole system is fragile and erratic and probably wouldn’t survive in the wild. A good measure of rain would remove the odd memories and probably calm the ants, because their hive, the area that needs to be defended at all costs, would shrink again.

I’ve raised neurotic ants in need of a cold shower.

How does this give us any insight into modern computing? Well, my train of thought is this: If modern systems are unable to forget, because memory is cheap and permanent, we might be prone to designing software that acts neurotic and hyped up. The ability to forget, to truly not remember at all, might be crucial in designing resilient parallel systems. There is the cost of losing valuable information, but the benefit of losing all results of errors seems to match it. So volatile memory might be a nuisance for us programmers, but it also provides a “blank slate” every time the system starts and is the reason for the most important question in IT: “Have you tried turning it off and on again?”

Systems that rely on “place-oriented programming” seem to need regular reset phases where the working memory is cleared and the system goes into the next cycle fresh and rested. We might even call it sleep. And in case you wonder: The sleep of ants is an ongoing topic of research.

Disclaimer: I know that not all ants are as dumb as Lasius niger. Some ants even teach each other facts about their environment. The Wikipedia article mentions some wonderful examples. I had ant colonies with more complex ants and they were wonderful. But right now, as I’m typing this, there is a Lasius niger worker that heaves a wasp husk part to the top corner of the formicarium, throws it down (did I mention they love throwing things?) and runs back down to heave it up again, probably because the corner is somehow marked as a garbage dump zone. It has repeated this process at least half a dozen times now. This is what a biological infinite loop looks like. Some ants even parallelize such a loop to exhaustion.

The definition of done

From large to small, from projects to issues, a team needs to define when they are considered done.

This decision differs from team to team: some have several steps to done, others just one state. Even the words used in your issue tracker reflect your choices: what does ‘fixed’ mean, what is ‘closed’ used for…
Even some practices like test-driven development define a state of done: the code is done if all tests are green and it is refactored.

What’s your definition of done?

Let’s take a look at some examples:

  • tests are green and code is refactored
  • QA says ok
  • customer/stakeholder/product owner accepts the issue
  • developer thinks the code reflects the description in the issue
  • a predefined spec, maybe even with an acceptance test, is fulfilled
  • no bugs were found while clicking through
  • the code is merged with the master branch
  • the continuous integration tool has found no errors

The problem with these definitions of done is that they either look to an external person to accept based on their opinion/guideline or concentrate on some output. But the people needing the software do not want the software in its own regard. They want to reach a goal through the software. The software is a means to an end: their goals. Without defining the goals and needs beforehand you are either doomed to guess them and are at the mercy of arbitrariness (from your point of view), or you concentrate on some measurable output like code, tests or a completed feature.

Defining what the user wants to do with this new feature or project should be the first thing in a project, right after the initial introductions. Who will use the app or the feature? (the intended audience, the users) What do they expect from it? (the benefits) What goal do they want to reach?
With these questions and answers you have a target. After completing the issues or the project you can see if the target has been reached, if the goals are met. It might be the same as an acceptance process from a stakeholder, but here you know the target beforehand, not after the fact.

Kotlin and null-safety

This week I installed the Android Studio 3.0 preview in preparation for the development of an Android tablet app. Android Studio is based on JetBrains’ IntelliJ IDE. Google recently announced that Android Studio will support Kotlin as an official programming language for Android starting with version 3.0. The language has been designed and developed by JetBrains since 2010.

Kotlin is a language for the Java ecosystem, like Scala, Groovy or Clojure, that targets both the JVM and Google’s Dalvik VM, which is used for Android. It’s a statically typed language with a feature set similar to Scala and C#. Compared to Java it adds things like operator overloading, a short syntax for properties, type inference, extension functions and string templates, and it has supported lambda expressions since before Java 8. But it also fixes some of Java’s inconsistencies. For example, it provides a unified type system with the Any type at the top of the type hierarchy and without special raw types. Arrays in Kotlin are invariant, and it uses declaration-site variance instead of use-site variance (see my other blog post for an explanation of these terms: Declaration-site and use-site variance explained). However, in my opinion the most interesting of Kotlin’s features is null-safety.

Null-safety

In Kotlin all types are non-nullable by default. You can’t assign null to a variable declared as

var s: String = "hi"
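// assigning null to it later would not compile either:
// s = null  // error: null can not be a value of a non-null type String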

If you really want to be able to assign null to a variable you have to declare it with a question mark after the type, for example

var s: String? = null

However, if you want to access a member of a nullable reference or call a method on it, you have to perform a null check before doing so. This is enforced by the compiler.

if (s != null) {
    return s.toUpperCase()
}

The compiler keeps track of the null checks before accessing a member of a nullable reference. Without the check the code wouldn’t compile. Kotlin offers some additional operators to simplify these null checks, like the safe navigation operator ?. (also known from Groovy and C# 6.0) or the “Elvis” operator ?:

person?.address?.country?.name
s?.toUpperCase() ?: ""
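
Here is a minimal, self-contained sketch of how these operators play together (the function name shout is made up for illustration):

fun shout(s: String?): String {
    // the safe call yields null if s is null; the Elvis operator supplies the fallback
    return s?.toUpperCase() ?: ""
}

fun main(args: Array<String>) {
    println(shout("hello")) // prints HELLO
    println(shout(null))    // prints an empty line
}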

With Kotlin’s null-safety feature, NullPointerExceptions are largely a thing of the past (unless you explicitly opt out with the !! operator or cross the boundary to Java code).

Kotlin has a lot more to offer and we haven’t decided yet if we will use it for our new Android app project, but it’s definitely an option to consider.

Do most languages make false promises?

Some years ago I stumbled over this interesting article about C being the most effective programming language and the one making the fewest false promises. Essentially, Damien Katz argues that the simplicity of C and its flaws lead to simple, fast and easy-to-reason-about code.

C is the total package. It is the only language that’s highly productive, extremely fast, has great tooling everywhere, a large community, a highly professional culture, and is truly honest about its tradeoffs.

-Damien Katz about the C Programming language

I am a Java developer most of the time, but I also have reasonable experience in C, C++, C#, Groovy and Python and some other languages to a lesser extent. Damien’s article really made me think for quite some time about the languages I have been using. I think he is right in many aspects and makes really good points about the tools and communities around the languages.

After quite some thought I do not completely agree with him.

My take on C

At one time I really liked the simplicity of C. I wrote gtk2hack in my spare time as an exercise and definitely see interoperability and a quick “build, run, debug” cycle as big wins for C. On the other hand, I think that while it has a place in hardware and systems programming, many other applications have completely different requirements.

  • A standardized ABI means nothing to me if I am writing a service with a REST/JSON interface or a standalone GUI application.
  • Portability means nothing to me if the target system(s) are well defined and/or covered by the runtime of choice.
  • Startup times mean nothing to me if the system is only started once every few months and development is still fast because of hot-code replacement or other means.
  • etc.

But I am really missing more powerful abstractions and better error handling or resource management features. Data structures and memory management are a lot more painful than in other languages. And this is not (only) about garbage collection!

C++ especially has been making big steps in the right direction over the last few years. Each new standard release provides additional features that make code more readable and less error-prone. With zero-cost abstractions at the core of language evolution and ease of use as a secondary aim, I really like what is coming to C++ in the future. And it has a very professional community, too.

Aims for the C++11 effort:

  • Make C++ a better language for systems programming and library building
  • Make C++ easier to teach and learn

-Bjarne Stroustrup, A Tour of C++

What we can learn from C

Instead of looking down at C and pointing at its flaws we should look at its strengths and our own weaknesses/flaws. All languages and environments I have used to date have their own set of annoyances and gotchas.

Java people should try building simple things and keep a keen eye on dependencies, especially because the ecosystem is so rich and crowded. Also take care of resource management – the garbage collector is only half the deal.

Scala and C++ people should take a look at ABI stability and interoperability in general. Their compile times and “build, run, debug” cycles have much room for improvement, to say the least.

C# people may look at simplicity instead of wildly adding new features, creating a language without opinion and a plethora of ways to implement the same thing. Either you ban features or you have to know them all to understand code in a larger project.

Conclusion

My personal answer to the title of this blog: Yes, they make false promises. But they have a lot to offer, too.

So do not settle with the status quo of your language environment or code style of choice. Try to maintain an objective perspective and be aware of the weaknesses of the tools you are using. Most platforms improve over time and sometimes you have to re-evaluate your opinion regarding some technology.

I have preferred C++ to C for some time now and have not looked back yet. But I also constantly try different languages, platforms and frameworks and try to maintain a balanced view. There are often good reasons to choose one over the other for a particular project.


C++ modules example

Two weeks back, I blogged about C++ modules, and why we so desperately need them. Now I have actually played with the implementation in Visual Studio 2017, and I want to share my findings.

The Files

My example consists of four files in two “components”, i.e. one library and one executable. The executable only has one file, main.cpp:

import pets;
import std.core;
import std.memory;

int main()
{
  std::unique_ptr<Pet> pet = std::make_unique<Dog>();
  std::cout << "Pet says: "
    << pet->pet() << std::endl;
}

The library consists of three files. First is pet.cpp, which contains the abstract base class for all pets:

import std.core;
module pets.pet;

export class Pet
{
public:
  virtual char const* pet() = 0;
};

Then there is dog.cpp – our only concrete implementation of that base class (yes, I’m not a cat person).

module pets.dog;
import std.core;
import pets.pet;

export class Dog : public Pet
{
public:
  char const* pet() override;
};

char const* Dog::pet()
{
  return "Woof!";
}

Notice they each define their own submodule. Finally, there is interface.cpp, which just cobbles those submodules together into one single “parent” module:

module pets;

export module pets.pet;
export module pets.dog;

You can get the full source code including the CMake setup at our github repository. I was not able to get the standard library path setup automated so far, so you probably want to adjust that.

Discussion

There are no headers at all, which was one of my goals of laying it out like this. I think that alone means an enormous increase in productivity.

The information that was previously in the header files is now in .ifc files that the Microsoft compiler generates automatically from the module definitions.
When trying this out, a couple of things stood out to me:

  • Intellisense does not work with the new keywords yet.
  • The way I used it, interface.cpp needs to be compiled after pet.cpp and dog.cpp, so that the appropriate .ifc file exists. Having an order dependency like that within a single library is a new challenge.
  • I was not able to use the standard lib in the library. That would compile, but not link.
  • Not having to duplicate the function declaration feels very strange in C++.
  • There are a lot of paradigm changes required. For example, include paths are a thing of the past – you will need to configure correct module search paths in the future.
  • We will need to get the naming straight: right now, “module” is already used for a “distinct software component”. The new meaning is similar, but still competes with it, since the granularity is no longer so flexible. I have already started using “component” as a new word for the former.

What are your experiences with modules so far? Do you have another way of composing modules? I’d really like to hear about it! I think the biggest challenge right now is how to use these new possibilities to improve the design of bigger C++ projects.

Learning about Class Literals after twenty years of Java

The story about a customer request that led to my discovery of a cool feature of Java: Class Literals (you’ve probably already heard about them).

I’ve programmed in Java nearly every day for twenty years now. At the beginning of my computer science studies, I was introduced to Java 1.0.x and have since accompanied every version of Java. Our professor made us buy the Java Language Specification on paper (it was quite a large book even back then) and I occasionally read it like you would read an encyclopedia – wading through a lot of already-known facts just to discover something surprising and interesting, no matter how small.

With the end of my studies came the end of random research in old books – new books had to be read and understood. It was no longer efficient enough to randomly spend time with something; everything needed to have a clear goal, an outcome that improved my current position. This made me very efficient and quite effective, but I only uncover surprising facts and finds now if work forces me to go there.

An odd customer request

Recently, my work required me to revisit an old acquaintance in the Java API that I never grew fond of: the Runtime.exec() method. One of my customers had a recurring hardware problem that could only be solved by rebooting the machine. My software could already detect the symptoms of the problem and notify the operator, but the next logical step was to enable the software to perform the reboot on its own. The customer was very aware of the risks of such a functionality – I consider it a “sabotage feature” – but asked for it anyway. Because the software is written in Java, the reboot should be written in Java, too. And because the target machines are exclusively running Windows, it was a viable option to implement the feature for that specific platform. Which brings me to Runtime.exec().

A simple solution for the reboot functionality in Java on Windows looks like this:


Runtime.getRuntime().exec("shutdown /r");

With this solution, the user is informed of the imminent reboot and has some time to make a decision. In my case, the reboot needed to be performed as fast as possible to minimize the loss of sensor data. So the reboot command needs to be expanded by a parameter:


Runtime.getRuntime().exec("shutdown /r /t 0");

And this is when the command stops working and politely tells you that you messed up the command line by printing the usage information. Which, of course, you can only see if you drain the output stream of the Process instance that performs the command in the background:


final Process process = Runtime.getRuntime().exec("shutdown /r /t 0");
try (final Scanner output = new Scanner(process.getInputStream())) {
    while (output.hasNextLine()) {
        System.out.println(output.nextLine());
    }
}

The output draining is good practice anyway, because the Process will just stop once the buffer fills up. Which you will never see in development, but definitely in production – in the middle of the night on a weekend when you are on vacation.

Modern thinking

With Java 5, and with further improvements in Java 7, the Runtime.exec() method became less attractive through the introduction of ProcessBuilder, a class that improves the experience of creating a correct command line, among a lot of other things. So let’s switch to the ProcessBuilder:


final ProcessBuilder builder = new ProcessBuilder(
        "shutdown",
        "/r",
        "/t 0");
final Process process = builder.start();

Didn’t change a thing. The shutdown command still informs us that we don’t have our command line under control. And that’s true: The whole API is notorious for not telling me what is really going on in the background. The ProcessBuilder could be nice and offer a method that returns the String as it is issued to the operating system, but all we’ve got is the ProcessBuilder.command() method that gives us back the same command line parts we gave it. The mystery begins with our call to ProcessBuilder.start(), because it delegates to a class called ProcessImpl, and more specifically to the static method ProcessImpl.start().

In this method, Java calls the private constructor of ProcessImpl, which performs a lot of black magic on our command line parts and ultimately disappears into a native method called create(), with the actual command line (called cmdstr) as the first parameter. That’s the information I was looking for! In newer Java versions (starting with Java 7), the cmdstr is built in a private static method of ProcessImpl: ProcessImpl.createCommandLine(). If I could write a test program that calls this method directly, I would be able to see the actual command line for myself.

Disclaimer: I’m not an advocate of light-hearted use of the reflection API of Java. But for one-off programs, it’s a very powerful tool that gets the job done.

So let’s write the code to retrieve our actual command line directly from the ProcessImpl.createCommandLine() method:


public static void main(final String[] args) throws Exception {
    final String[] cmd = {
            "shutdown.exe",
            "/r",
            "/t 0",
    };
    final String executablePath = new File(cmd[0]).getPath();

    final Class<?> impl = ClassLoader.getSystemClassLoader().loadClass("java.lang.ProcessImpl");
    final Method myMethod = impl.getDeclaredMethod(
            "createCommandLine",
            new Class[] {
                    ????, // <-- Damn, I don't have any clue what should go here.
                    String.class,
                    String[].class
            });
    myMethod.setAccessible(true);

    final Object result = myMethod.invoke(
            null,
            2,
            executablePath,
            cmd);
    System.out.println(result);
}

The discovery

You probably noticed the “????” entry in the code above. That’s the discovery part I want to tell you about. This is when I met Class Literals in the Java Language Specification in chapter 15.8.2 (go and look it up!). The signature of the createCommandLine method is:


private static String createCommandLine(
        int verificationType,
        final String executablePath,
        final String cmd[])

Note: I didn’t remove the final keyword of verificationType; it isn’t there in the original code, for unknown reasons.
When I wrote the reflection code above, it occurred to me that I had never attempted to lookup a method that contains a primitive parameter – the int in this case. I didn’t think much about it and went with Integer.class, but that didn’t work. And then, my discovery started:


final Method myMethod = impl.getDeclaredMethod(
        "createCommandLine",
        new Class[] {
                int.class, // <-- Look what I can do!
                String.class,
                String[].class
        });

As stated in the Java Language Specification, every primitive type of Java conceptually “has” a public static field named “class” that contains the Class object for this primitive. We can even type void.class and gain access to the Class object of void. This is clearly written in the language specification and required knowledge for any earnest usage of Java’s reflection capabilities, but I somehow evaded it for twenty years.
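
A two-line sketch to see these class literals in action:

System.out.println(int.class.getName());  // prints "int"
System.out.println(void.class.getName()); // prints "void"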

I love it when moments like this happen. I always feel dumb and enlightened at the same time and assume that everybody around me has known this fact for years; it is just me that didn’t get the memo.

The solution

Oh, and before I forget it, the solution to the reboot command not working is the odd way in which Java adds quote characters to the command line. The output above is:


shutdown /r "/t 0"

The extra quotes around /t 0 make the shutdown command reject all parameters and print the usage text instead. A working, if not necessarily intuitive, solution is to separate the /t parameter and its value, so that no parameter ever contains a space – a space is what provokes Java to try to help you by quoting the whole parameter (which is considered a feature rather than a bug):


final String[] cmd = {
        "shutdown",
        "/r",
        "/t",
        "0",
};

This results in the command line I wanted from the start:


shutdown /r /t 0

And it reboots the computer instantaneously. Try it!

Your story?

What’s your “damn, I must’ve missed the memo” moment in the programming language you know best?

C++ modules and why we need them desperately

When I was interviewing for my job here at the Softwareschneiderei, I was asked a question:

“If you had one wish for what to add to C++, what would that be?”.

I vividly remember not having to give a lot of thought to answer that: modules. And now, it seems modules for C++ are finally materializing. About damn time.

The Past: Hello Preprocessor, my old friend

C++ has a problem with scalability. Traditionally, the only real way to use code from another compile unit is to use header files and “use” them via preprocessor #include directives. This requires splitting your code into a header and an implementation file, which means duplicating a lot of information. And it does not even work for a lot of code: Templates need to be in the header, and a lot of modern C++ code is template code. This decreases the uniformity and coherence of the code.
When resolving #include directives, the preprocessor really only copies and pastes code from one file into another. Since this is a transitive process, the actual code that gets analyzed by the compiler quickly becomes huge.

Hello World?

Do this little experiment: write a simple C++ “Hello World!” program, and look at the preprocessed output.

#include <iostream>

int main()
{
  std::cout << "Hello, World!" << std::endl;
  return 0;
}

I preprocessed this simple version with Visual Studio 2017. The output was about 50500 lines! That’s an expansion of more than 7200x. Now repeat that while including something from Boost. Still wonder why compilation is so slow?
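
If you want to reproduce the experiment, the Microsoft compiler can write the preprocessed output to a file instead of compiling (run this from a developer command prompt; /P is a standard MSVC flag):

cl /P main.cpp

The preprocessed output then lands in main.i, where you can count the lines yourself.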

Pay for what you use?

So if you include a header, you not only get the things you want from it, but also everything else. That means all the other contents of the header and all the headers it includes transitively. Usually, the number of transitively included headers quickly exceeds 10000. This goes directly against C++’s design mantra: pay only for what you use.

The code that gets included is usually orders of magnitude more than the actual contents of your .cpp file, even in examples not as contrived as the “Hello, World!” above.

This means a lot of extra code for your toolchain to analyze.
And the work is duplicated for each compile unit.
This is obviously slow.

Leaks everywhere…

But it also means your modules are leaking. For example, their dependencies: Some of your users will inadvertently use the code that you use, and if you change your dependency, they will break. How often have you used std::runtime_error without actually including stdexcept? Many C++ programmers do not even know which header a particular stdlib feature is located in. Not their fault, really – it’s hard enough to memorize the contents alone without their locations in an arbitrary M:N mapping.

But dependencies are not the only things that are leaking. By exposing individual headers, you make clients dependent on the physical structure of your program as well. Moving one type from one header to another? You cannot do that, unless you want to break a couple of clients.

Current workarounds

The C++ community has had different approaches on how to deal with the fallout.

  • Forward declarations and the PIMPL idiom let you break the transitive dependencies.
    But a forward declaration is a very subtle form of code duplication, and a PIMPL even creates runtime overhead (see the sketch after this list).
  • Unity builds tackle the problem of resolving your include graph multiple times, but at the cost of an obscure extension to your build system and negative impacts on incremental builds.
  • Meta-headers tackle the problem of more clearly defined module boundaries, but they make the compile times worse and make it harder to explore modules.
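
For illustration, here is a minimal sketch of the forward-declaration/PIMPL approach from the first bullet (all names are made up):

// widget.h - the public header needs no heavy includes
class WidgetImpl;  // forward declaration: clients never see the implementation header

class Widget
{
public:
  Widget();
  ~Widget();
  void draw();

private:
  WidgetImpl* impl;  // PIMPL: hides the implementation, at the cost of an extra indirection
};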

It’s a catch-22.

Tool support

Because macros leak in and out of headers, semantic analysis becomes very hard. In fact, a tool needs to understand the program in its entirety, including all source and build files to properly refactor. After all, each define given on a command line, or even each reordering (!) of #include files could potentially alter the semantics completely. Every line of code in a header can change its meaning completely depending on its context.

There are also techniques that abuse this feature, e.g. cross-includes, where an include does something based on a previous #define. Granted, only a small percentage of code is usually directly affected by such subtleties, but there is currently no way to properly isolate yourself from them. That is why refactoring and introspection tools for other languages are so much better.

State of the union

The modules proposal is spearheaded by Gabriel Dos Reis at Microsoft. There’s an in-progress implementation of it since Visual Studio 2015, and they are still regularly updating it, so the most recent one is in VS 2017. If you want to know more, have a look at this video.

Monitoring long running operations in Oracle databases

We regularly work with database tables with hundreds of millions of entries. Some operations on these tables can take a while. Not necessarily queries, but operations in preparation to make queries fast, for example the creation of materialized views or indexes.

The problem with most SQL tools is: once you run your SQL statement you have no indication of how long it will take to complete the operation. No progress bar and no display of the remaining time. Will it take minutes or hours?

Oracle databases have a nice feature I learned about recently that can answer these questions. Operations that take longer than 6 seconds to complete are considered “long operations” and get an entry in a special view called V$SESSION_LONGOPS.

This view contains not only the currently running long operations but also the history of completed long operations. You can query the status of the current long operations like this:

SELECT * FROM V$SESSION_LONGOPS 
  WHERE time_remaining > 0;

This view contains columns like

  • TARGET (table or view on which the operation is carried out)
  • SOFAR (units of work done so far)
  • TOTALWORK (total units of work)
  • ELAPSED_SECONDS (number of elapsed seconds from the start of the operation)

Based on these values the view offers another column, which contains the estimated remaining time in seconds: TIME_REMAINING.
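
You can also compute a progress percentage yourself from SOFAR and TOTALWORK. A small sketch (guarding against division by zero):

SELECT target, sofar, totalwork,
       ROUND(sofar / totalwork * 100, 1) AS percent_done,
       time_remaining
  FROM V$SESSION_LONGOPS
 WHERE totalwork > 0
   AND time_remaining > 0;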

This remaining time is really just an estimate, because it assumes long-running operations to be linear, which is not necessarily true. Also, some SQL statements can spawn multiple consecutive operations, e.g. first a “Table Scan” operation and then a “Sort Output” operation, which will only become visible after the first operation has finished. Nevertheless, I found this feature quite helpful to get a rough idea of how long I will have to wait, or to inform decisions such as whether I really want to run an operation to completion or cancel it.

Personas: The great misunderstanding

Personas are a typical user research and design tool, but they are often misunderstood.

Reminder: What are personas?

Personas were first described by Alan Cooper in his groundbreaking book “The Inmates Are Running the Asylum”:

Our most effective tool is profoundly simple: Develop a precise description of our user and what he wishes to accomplish.

He goes on to define personas as “hypothetical archetypes of actual users” and states that personas “are defined by their goals”.
One of the key points here is that personas are never made up but are grounded in research. They are used to provide condensed information about the results of the user research. Another takeaway is that a persona description should include its goals.

The misunderstanding

In recent times some designers have dumped personas because they are 1) imaginary and 2) defined by attributes that leave out causality. The problem here is that personas are often seen as a collection of mere demographic data (like age, job, income, …). But this describes only marketing personas, not the personas imagined by Alan Cooper. As seen in his books, the data of a persona is never made up but inferred from user research. Also, demographics play only a minor role in creating personas; citing Mr. Cooper again:

Personas are segmented along ranges of user behaviour, not demographics or buying behaviour.

So the behaviour of our users defines the persona, not any demographic trait.

The causality mentioned in the criticism misses a vital part of a persona: the scenario. Personas go hand in hand with scenarios (quoting Alan Cooper, About Face):

Persona-based scenarios are concise narrative descriptions of one or more personas using a product or service to achieve specific goals.

and

Scenario content and context are derived from information gathered during the Research phase and analyzed during the Modeling phase

So with these scenarios personas describe the context and the goals and behaviours of our users.

As we see with the criticism, the context, goals and motivations of our users are important. Personas and scenarios should not be made up but condensed from research. They are used to say ‘no’ to decisions in the process of designing. A word of warning: do not abstract your persona too far away from your users. One goal of personas is to build empathy. If your personas are too artificial, your empathy will suffer. I also like how Jeff Patton uses research findings: for him they are like vacation photos; if you’ve been there, they are a reminder of what happened.

Consensus

The criticism largely comes from designers favoring the jobs-to-be-done (JTBD) methodology. Jobs-to-be-done is a framework to analyse and describe why a user hires a product or service to get something done. It provides a very useful perspective on the context and behaviour of users. Both approaches (personas and jobs) can be combined. Where personas provide a human connection, jobs provide a contextual one. Shahrzad Samadzadeh provides a sketch of how both can be combined with the help of a journey map. All three methods help to balance each other: the personas help to avoid making the jobs too analytic, the jobs help to ground and limit the personas in research valuable to the problem at hand, and journeys can bring it all together.

Packaging Python projects for Debian/Ubuntu

Deployment of software using built-in software management tools is very convenient and provides a nice user experience (UX). For Debian-based Linux distributions like Ubuntu, packaging software in .deb packages is the way to go. So how can we prepare our Python projects for packaging as a deb package? The good news is that Python is supported out of the box by the Debian package build system.

Alternatively, you can use the distutils-extension stdeb if you do not need complete flexibility in creating the packages.
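
With stdeb, building a .deb can be as short as this (a sketch, assuming stdeb has been installed, e.g. via pip):

# let stdeb generate the debian metadata and build the package in one go
python setup.py --command-packages=stdeb.command bdist_deb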

Basic python deb-package

If you are using setuptools/distutils for your Python project, Debian packaging consists of editing the package metadata and adding --with python to the rules file. For a nice head start we can generate templates of the Debian metadata files using two simple commands (dh_make is provided by the dh-make package):

# create a tarball with the current project sources
python setup.py sdist
# generate the debian package metadata files 
dh_make -p ${project_name}_${version} -f dist/${project_name}-${version}.tar.gz 

You have to edit at least the control file, the changelog and the rules file to build the Python package. In the rules file the make target % is the crucial point and should include the flag to build a Python project:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python

After that you can build the package by issuing dpkg-buildpackage.
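
For a local test build you will usually want to skip the signing step; -us and -uc are standard dpkg-buildpackage flags for that:

# build the package without signing the source package and the .changes file
dpkg-buildpackage -us -uc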

The caveats

The Debian packaging system is great at complaining about non-conformant aspects of your package. It demands digital signatures, correct file and directory names including version strings, etc. Unfortunately it is not very helpful when you make packaging mistakes resulting in empty, incomplete or broken packages.

Issues with setup.py

The setup.py build script has to reside on the same level as the debian directory containing the package metadata. The packaging tools will not tell you if they cannot find the setup script. In addition, they will always run setup.py using Python 2, even if you specified --with python3 in the rules file.

Packaging for specific python versions

If you want better control over the target Python versions for the package you should use Pybuild. You can do this with a little change to the rules file, e.g. for a python3-only build using Pybuild:

# main packaging script based on dh7 syntax
%:
	dh $@ --with python3 --buildsystem=pybuild

For Pybuild to work it is crucial to add the needed Python interpreter(s), besides the mandatory build dependency dh-python, to the Build-Depends of the control file. For python3-only it could look like this:

Build-Depends: debhelper (>=9), dh-python, python3-all
...
Depends: ${python3:Depends}

Without the dh-python build dependency, Pybuild will silently do nothing. Getting the build dependencies wrong will create incomplete or broken packages. Take extra care to get this right!

Conclusion

Debian packaging looks quite intimidating at first because there are so many ways to build a package. Many different tools can ease package creation, but they also add confusion. Packaging Python software is done easily if you know the quirks. The Python examples from the Guide for Debian Maintainers are certainly worth a look!