The best of both worlds: scoped_flags

C++11 introduced a pretty nice change to enum types in C++, the scoped enumeration. It mostly supersedes the old unscoped enumeration, which was inherited from C and had a few shortcomings. For example, the names in the enumeration were added to its parent scope. This means that given an enum colors {red, green, blue}; you can simply say auto my_color = red;. This can, of course, lead to ambiguities and people using some weird workarounds like putting the enums in namespaces or prefixing all elements à la Hungarian notation. Also, unscoped enumerations are not particularly type-safe: they convert implicitly to integer types, so you can write things like int x = red; without the compiler complaining.
Scoped enumerations improve both of these aspects: with enum class colors {red, green, blue};, you have to use auto my_color = colors::red; and int x = colors::red; will simply not compile.
To get the second part to compile, you need to insert a static_cast: int x = static_cast<int>(colors::red); which is purposefully a lot more verbose. Now this is a bit of a blessing and a curse. Of course, this is a lot more type-safe, but it makes one really common usage pattern with enums very cumbersome: bit flags.

Did this get worse?

While you could previously use the bit operators to combine different bitmasks defined as enums, scoped enumerations will only let you do that if you cast them first. In other words, type-safety prevents us from combining flags because the result might, of course, no longer be a valid enum.
However, we can still get the convenience and compactness of bit flags with a type that represents combinations of bitmasks from a specific enum type. Oh, this reeks of a template. I give you scoped_flags, which you can use like this:

enum class window_flags
{
  has_border = 1 << 0,
  has_caption = 1 << 1,
  is_child = 1 << 2,
  /* ... */
};
void create_window(scoped_flags<window_flags> flags);

int main()
{
  create_window({window_flags::has_border, window_flags::has_caption});
}

scoped_flags<window_flags> something = /* ... */;

// Check a flag
bool is_set = something.test(window_flags::is_child);

// Remove a flag
auto no_border = something.without(window_flags::has_border);

// Add a flag
auto with_border = something.with(window_flags::has_border);

Current implementation

You can find my current implementation on this GitHub gist. Even in its current state, I find it a nifty little utility class that makes unscoped enumerations all but legacy code.
I opted not to replicate the bitwise operator syntax, because &~ for “without” is so ugly, and ~ alone makes little sense. A non-explicit single-argument constructor makes usage with a single flag as convenient as the old C-style variant, while the list construction is just a tiny bit more complicated.
The implementation is not complete or final yet; for example, without is missing an overload that takes a list of flags. After my previous adventures with initializer_lists, I’m also not entirely sure whether std::initializer_list should be used anywhere but in the c’tor. And maybe CTAD could make it more comfortable? Of course, everything here can be constexpr’fied. Do you think this is a useful abstraction? Any ideas for improvements? Do tell!
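
To make the idea concrete, here is a minimal sketch of what such a class could look like. It is not the code from the gist: the member name mBits, the private raw_tag constructor and the demo function are assumptions made for illustration, and the sketch only covers the operations used in the examples above.

```cpp
#include <cassert>
#include <initializer_list>
#include <type_traits>

// Hypothetical minimal sketch of scoped_flags, not the implementation from the gist.
template <typename Enum>
class scoped_flags
{
  static_assert(std::is_enum<Enum>::value, "scoped_flags requires an enum type");

public:
  using underlying = typename std::underlying_type<Enum>::type;

  constexpr scoped_flags() : mBits(0) {}

  // Intentionally non-explicit, so a single flag converts implicitly.
  constexpr scoped_flags(Enum flag) : mBits(static_cast<underlying>(flag)) {}

  // Combines all listed flags with bitwise or.
  scoped_flags(std::initializer_list<Enum> flags) : mBits(0)
  {
    for (Enum flag : flags)
      mBits |= static_cast<underlying>(flag);
  }

  constexpr bool test(Enum flag) const
  {
    return (mBits & static_cast<underlying>(flag)) != 0;
  }

  // Returns a copy with the given flag set.
  constexpr scoped_flags with(Enum flag) const
  {
    return scoped_flags(
      static_cast<underlying>(mBits | static_cast<underlying>(flag)), raw_tag{});
  }

  // Returns a copy with the given flag cleared.
  constexpr scoped_flags without(Enum flag) const
  {
    return scoped_flags(
      static_cast<underlying>(mBits & ~static_cast<underlying>(flag)), raw_tag{});
  }

private:
  struct raw_tag {};
  constexpr scoped_flags(underlying bits, raw_tag) : mBits(bits) {}

  underlying mBits;
};

enum class window_flags
{
  has_border = 1 << 0,
  has_caption = 1 << 1,
  is_child = 1 << 2
};

inline void demo()
{
  scoped_flags<window_flags> flags{window_flags::has_border, window_flags::has_caption};
  assert(flags.test(window_flags::has_caption));
  assert(!flags.test(window_flags::is_child));
  assert(!flags.without(window_flags::has_border).test(window_flags::has_border));
  assert(flags.with(window_flags::is_child).test(window_flags::is_child));
}
```

The casts to the underlying type are confined to the class, so calling code never needs a static_cast.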

Zero Interaction Tools

Some time ago, a customer called us for a delicate task: to develop a little tool on a very tight budget that aggregates pictures in a specific way. The pictures were from the medical domain and came in sets of thousands of pictures that needed to be combined into one large picture with the help of some mathematics. Developing the algorithm was not a problem, and neither were the huge data sizes, but the budget was a challenge. We could program everything and test it on some sample picture sets (that arrived on several Blu-ray discs) within the budget, but an elaborate graphical user interface (GUI) would be out of scope. On the other hand, the anticipated users of the tool weren’t computer-savvy enough to handle a CLI (command line interface). The tool needed to be simple, fast and cheap. And it only needed to do one thing.

Traditional usage of software tools

In the traditional way, a software tool comes with an installer that entrenches the tool onto the target computer and provides a start menu entry, a desktop icon and perhaps even a boot launcher with a fancy tray icon and some balloon tooltips that inform the user from time to time that the tool is still installed and wants some attention. If you click on the tool’s icon, a graphical user interface appears, perhaps in the form of an application window or just a tray pop-up menu. You then need to interact with the user interface. Let’s say you want to combine several thousand pictures into one: you need to specify directories or collections of files through some file dialogs with “browse” buttons and complex “ingredients” lists. You’ve probably seen this type of user interface while burning a Blu-ray disc or uploading files into the cloud. After you’ve selected all your input pictures, you have to say where to write to (another file dialog) and what name your target file should have. After that, you’ll stare at a progress bar and wait for it to reach the right-hand side of the widget. And then, the tool will beep and proudly present a little message box that informs you that everything has worked out just fine and you can find your result right where you wanted it. Afterwards, the tool will sit there on your screen in anticipation of your next move. What will you do? Do it all again because you love the progress bars and beeps? Combine another several thousand pictures? Shut down the tool? Oh, come on! Are you sure you want to quit now?

None of this could be developed in the budget our customer gave us. The tool didn’t need the self-marketing aspects like a tray icon or launcher, because the customer would only use it internally. But even the rest of the user interface was too much work: the future users would not get a traditional software tool.

Zero interaction tool

So we thought about the minimal user interface that the picture aggregation tool needed to have. And came to the conclusion that no user interface was needed, because it really only needed to do one thing. At least, if certain assumptions hold true:

  • The tool is fast enough to produce no significant delay
  • The input directory holds all pictures that should be aggregated
  • The input directory can be the output directory as well
  • The name of the resulting picture file can contain a timestamp to distinguish between several tool runs

We consulted our customer and checked that the latter three assumptions were valid. So, given we can make the first assumption a reality, the tool could work without any form of user interaction by being copied into the picture directory and then started.

Yes, you’ve read this right. The tool would not be installed, but copied for every usage. This is a highly unusual usage scenario for a program, because it means that every picture directory that should be aggregated holds an identical copy of the program. But if we can make some more assumptions valid, it is a viable way to empower the users:

  • The tool must run on all target machines without additional preparation
  • The tool must only consist of one executable file, no DLLs or configuration files
  • The tool must be small in size, like one megabyte at most

We confirmed with a quick and dirty spike (an embarrassingly inchoate prototype) that we can produce a program that conforms to all three new assumptions/requirements. The only remaining problem was the very first assumption: no hard drive was fast enough to provide the pixel data of thousands of pictures in less than a second. Even if we could aggregate the pixels fast enough (given enough cores, this would be possible), we couldn’t get hold of them fast enough. We needed some kind of progress bar.

Use your information channels

We thought about the information channels our tool would have towards the user. Let’s repeat the scenario: The user navigates to the directory containing the pictures that should be aggregated, copies the executable program into it and double-clicks to start the tool. There are many possibilities to inform the user about progress:

  • Audio (Sound): We can play a little tune or some sound that changes frequency to indicate progress. This is highly unusual and we can’t be sure that the speakers aren’t muted (usage on a notebook was part of the domain analysis results). No sounds, that is.
  • Animation (Graphics): In the most boring case, this would be a little window with a progress bar that runs from left to right and disappears when the work is done. Possible, but boring. Perhaps we can think of something more in tune with the rest of the usage scenario.
  • Text: Well, this was the idea. We produce a result file and give it a name. We don’t need to keep the name static as long as we are working and things change inside the file, anyways. We just update the file name to reflect our progress.

So our tool creates a new result file in the picture directory that is named result_0_percent or something and runs up to result_100_percent and then gets renamed to result_timestamp with the current timestamp. You can just watch your file explorer to keep up with the tool’s completion. This is a bit unusual at first, but all pilot users grasped the concept immediately and were pleased with it.
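
The renaming scheme is simple enough to sketch in a few lines. The following is only an illustration and not the tool’s actual code: the function names progress_name and report_progress are made up, and the file name pattern follows the “result_0_percent or something” description above.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical naming scheme following the description above: result_<n>_percent
inline std::string progress_name(int percent)
{
  return "result_" + std::to_string(percent) + "_percent";
}

// Renames the result file so that its name reflects the new progress.
// Returns the name the file has afterwards: the new name on success,
// the unchanged current name if nothing had to be done or the rename failed.
inline std::string report_progress(const std::string& current_name, int percent)
{
  const std::string next_name = progress_name(percent);
  if (next_name == current_name)
    return current_name;

  return std::rename(current_name.c_str(), next_name.c_str()) == 0
    ? next_name
    : current_name;
}

inline void demo_names()
{
  assert(progress_name(0) == "result_0_percent");
  assert(progress_name(100) == "result_100_percent");
}
```

The worker loop would call report_progress whenever the percentage changes and do one final rename to the timestamped name once the work is done.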

The result

And this is the story of how we developed a highly specialized tool within a very small budget without any graphical or otherwise traditional user interface. The user brings the tool to the data (by copying it into the same directory) and lets it perform its work by simply starting it. The tool reports its progress back via the result file name. As soon as the result file contains a timestamp (and the notebook’s fans cease to go berserk), the user can copy it into the next tool in the tool chain, probably a picture viewer or a printer driver. The users loved the tool for its speed and simplicity.

One funny side note remains to be told: because thousands of pictures aggregated into one produce a picture with a lot of detail, the result file was not too big (about 20-30 megabytes), but could take out any printer for several hours if printed. The tool got informally renamed to “printer-reaper.exe”.

Code duplication is not always evil

Before you start getting mad at me, first a disclaimer: I really think you should adhere to the DRY (don’t repeat yourself) principle. But in my opinion the term “code duplication” is too weak and blurry and should be rephrased.

Let me start with a real-life story from a few weeks ago that led to a fruitful discussion with some colleagues and to my claims.

The story

We are developing a system using C#/.NET Core for managing network devices like computers, printers, IP cameras and so on in a complex network infrastructure. My colleague was working on a feature to sync these network devices with another system. So his idea was to populate our carefully modelled domain entities using the JSON-data from the other system and compare them with the entities in our system. As this was far from trivial we decided to do a pair-programming session.

We wrote unit tests and fixed one problem after another, refactored the code that was getting messy and happily chugged along. In this process it became more and more apparent that the type system was not helping us and that we required quite some special handling like custom IEqualityComparers and the like.

The problem was that certain concepts like AddressPools that we had in our domain model were missing in the other system. Our domain handles subnets whereas the other system talks about ranges. In our system the entities are persistent and have a database id while the other system does not expose ids. And so on…

By using the same domain model for the other system we introduced friction and disabled benefits of C#’s type system and made the code harder to understand: There were several occasions where methods would take two IEnumerables of NetworkedDevices or Subnets and you needed to pay attention which one is from our system and which from the other.

The whole situation reminded me of a blog post I read quite a while ago:

https://www.sandimetz.com/blog/2016/1/20/the-wrong-abstraction

Obviously, we were using the wrong abstraction for the entities we obtained from the other system. We found ourselves somewhere around point 6 in Sandi’s sequence of events. In our effort to reuse existing code and avoid code duplication we went down a costly and unpleasant path.

Illustration by example

If code duplication is on the method level we may often simply extract and delegate like Uncle Bob demonstrates in this article. In our story that would not have been possible. Consider the following model of Price and Discount in an e-commerce system:

public class Price {
    public final BigDecimal amount;
    public final Currency currency;

    public Price(BigDecimal amount, Currency currency) {
        this.amount = amount;
        this.currency = currency;
    }

    // more methods like add(Price)
}

public class Discount {
    public final BigDecimal amount;
    public final Currency currency;

    public Discount(BigDecimal amount, Currency currency) {
        this.amount = amount;
        this.currency = currency;
    }

    // more methods like add(Discount)
}

The initial domain entities for price and discount may be implemented in exactly the same way, but they are completely different abstractions. Depending on your domain it may or may not be ok to add two discounts. Discounts could be modelled in a relative fashion like “30 % off” using a base price and so on. Coupling them early on by using one entity for different purposes in order to avoid code duplication would be a costly error, as you will likely need to disentangle them at some later point.

Another example could be the initial model of a name. In your system persons, countries and a lot of other things could have a name entity attached which may look identical at first. As you flesh out your domain it becomes apparent that the names really are different things: person names should not be internationalized and sometimes obey certain rules. Country names, in contrast, may very well be translated.

Modified code duplication claim

Duplicated code is the root of all evil in software design.

— Robert C. Martin

I would like to reduce the temptation of eliminating code duplication for different abstractions by modifying the well known claim of Uncle Bob to be a bit more precise:

Duplicated code for the same abstraction is the root of all evil in software design.

If you introduce coupling of independent concepts by eliminating code duplication you open up a new possibility for errors and maintenance drag. And these new problems tend to be harder to spot and to resolve than real code duplication.

Duplication allows code to evolve independently. I think it is important to add both concepts, coupling and independent evolution, to your thinking.

Containers allot responsibilities anew

Earlier this year, we experienced a strange bug with our invoices. We often add time tables of our work to the invoices and generate them from our time tracking tool. Suddenly, from one invoice to the other, the dates were wrong. Instead of Monday, the entry was listed as Sunday. Every day was shifted one day “to the left”. But we didn’t release a new version of any of the participating tools for quite some time.

What we did since the last invoice generation though was to dockerize the invoice generation tool. We deployed the same version of the tool into a docker container instead of its own virtual machine. This reduced the footprint of the tool and lowered our machine count, which is a strategic goal of our administrators.

By dockerizing the tool, we also unknowingly decoupled the timezone setting of the container and tool from the timezone setting of the host machine. The host machine is set to the correct timezone, but the docker container was set to UTC, being one hour behind the local timezone. This meant that the time table generation tool didn’t land at midnight of the correct day, but at 23:00 of the day before. Side note: If the granularity of your domain data is “days”, it is not advisable to use 00:00 as the reference time for your technical data. Use something like 12:00 or adjust your technical data to match the domain and remove the time aspect from your dates.
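
The side note can be illustrated with plain integer arithmetic. The sketch below is not taken from our tools; the helper names day_index, at_hour and seen_with_offset are hypothetical, and timestamps are simply seconds since an arbitrary epoch.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::int64_t seconds_per_hour = 60 * 60;
constexpr std::int64_t seconds_per_day = 24 * seconds_per_hour;

// Day index of a timestamp, using floor division so that
// negative timestamps also map to the correct day.
constexpr std::int64_t day_index(std::int64_t seconds)
{
  return seconds >= 0
    ? seconds / seconds_per_day
    : -((-seconds + seconds_per_day - 1) / seconds_per_day);
}

// Timestamp for "day d at the given hour", as the producer intends it.
constexpr std::int64_t at_hour(std::int64_t day, int hour)
{
  return day * seconds_per_day + hour * seconds_per_hour;
}

// The same timestamp interpreted by a consumer whose clock is shifted by offset_hours.
constexpr std::int64_t seen_with_offset(std::int64_t timestamp, int offset_hours)
{
  return timestamp + offset_hours * seconds_per_hour;
}

// Midnight as reference time: a shift of one hour already lands on the previous day.
static_assert(day_index(seen_with_offset(at_hour(100, 0), -1)) == 99,
              "midnight moves to the day before");

// Noon as reference time: the same shift stays within the intended day.
static_assert(day_index(seen_with_offset(at_hour(100, 12), -1)) == 100,
              "noon stays on the same day");

inline void demo_reference_time()
{
  assert(day_index(seen_with_offset(at_hour(100, 12), 11)) == 100);
  assert(day_index(seen_with_offset(at_hour(100, 12), -11)) == 100);
}
```

With noon as the reference, any offset of less than twelve hours in either direction still yields the intended day.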

We needed to adjust the timezone of the docker container by installing the tzdata package and editing some configuration files. This was no big deal once we knew where the bug originated. But it shows perfectly that docker (as a representative of the container technology) rearranges the responsibilities of developers and operators/administrators and partitions them in a clear-cut way. Before the dockerization, the timezone information was provided by the host and maintained by the administrator. Afterwards, it is provided by the container and therefore maintained by the developers. If containers are immutable service units, their creators need to account for all the operation parameters that were part of the “environment” beforehand. And the environment is provided by the operators.

So we see one thing clearly: Docker and container technology per se partition the responsibilities between developers and operators in a new way, but with a clear distinction: Everything is developer responsibility as long as the operators provide ports and volumes (network and persistent storage). Volume backup remains the responsibility of operations, but formatting and upgrading the volume’s content is a developer task all of a sudden. In a containerized world, the operators don’t know you are using a NoSQL database and they really don’t care anymore. It’s just one container more in the zoo.

I like this new partitioning of responsibilities. It assigns them for technical reasons, so you don’t have to find an answer in each organization anew. It hides a lot of detail from the operators who can concentrate on their core responsibilities. Developers don’t need to ask lots of questions about their target environment, they can define and deliver their target environment themselves. This reduces friction between the two parties, even if developers are now burdened with more decisions.

In my example from the beginning, the classic way of communication would have been that the developers ask the administrator/operator to fix the timezone on the production system because they have it right on all their developer machines. The new way of communication is that the timezone settings are developer responsibility and now the operator asks the developers to fix it in their container creation process. And, by the way, every developer could have seen the bug during development because the developer environment matches the production environment by definition.

This new partition reduces the gray area between the two responsibility zones of developers and operators and makes communication and coordination between them easier. And that is the most positive aspect of container technology in my eyes.

Inductive types on the rise

One thing I really got used to when using Agda for academic projects is inductive types. And inductive types are probably what I currently miss most when using mainstream languages to solve practical problems.

This post is aimed at software developers who do not know inductive types (Agda, Coq, Idris), variants (OCaml, F#) or GADTs (Haskell). Other software developers might still be interested in the last section about higher inductive types.

What are inductive types?

I will use Agda’s syntax for the most part. Here is a simple example of an inductive type named ‘Bool’:

[Screenshot: Agda definition of the inductive type ‘Bool’ with the constructors ‘True’ and ‘False’]

The colons are to be read as ‘is of type’ and ‘Set’ is the type of types.  The code defines an inductive type named ‘Bool’ with the constructors ‘True’ and ‘False’. I use ‘constructor’ with a broader meaning than it has in object oriented programming.

The type ‘Bool’ will behave somewhat like the following enum in Java:

[Screenshot: the corresponding Java enum]

The analogy with ‘enum’ works as long as the constructors have zero arguments. Here is an inductive type where one constructor has one argument:

[Screenshot: Agda definition of the inductive type ‘Optional A’]

For any type ‘A’, ‘Optional A’ will be a type that behaves like an immutable version of ‘Optional<A>’ in Java. So for example, ‘Some True’ would be a value of type ‘Optional Bool’ (note that function application is written without parentheses). It is also possible to have constructors with arguments of the type to be defined:

[Screenshot: Agda definition of the natural numbers as an inductive type]

The natural numbers defined in this way will be great for verification and very bad for actual calculations since this representation is unary. For example, the number three can be defined with three constructor calls:

[Screenshot: Agda definition of the number three using three constructor calls]

The really interesting thing to note is that this quite short inductive definition of the natural numbers actually behaves like the natural numbers from mathematics. And you can prove things about those naturals using the same induction-based proofs you learn in mathematics courses. In Agda, those proofs can be done using pattern matching, the topic of the next section.

In Agda, inductive definitions are also supported for dependent types. There are lots of interesting things that can be done using this combination of concepts. One is an inductive definition of equality for all types. This won’t give you a reasonable ‘Equals’-method for all your types, but it provides you with a consistent notion of what such a method should return.

Patterns

The great thing about inductive types is that functions may be defined by pattern matching. A simple case is the negation function on the type ‘Bool’ defined above:

[Screenshot: Agda definition of the function ‘negation’ by pattern matching]

The first line declares the type of the new function ‘negation’ to be ‘Bool -> Bool’ and the lines below are a definition by pattern matching. Agda checks if the pattern covers all cases. If you want the same compile time check in Java (prior to version 12) you would have to use a trick.
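
For readers more at home in C++, a comparable compile-time coverage check can be had with std::variant and std::visit (C++17). This is only an analogy and not a translation of the Agda code: the empty structs True and False stand in for the constructors, and the overloaded helper is the usual visitor idiom rather than something from the standard library.

```cpp
#include <cassert>
#include <variant>

// Each "constructor" becomes its own empty type.
struct True {};
struct False {};
using Bool = std::variant<True, False>;

// Common helper idiom that merges several lambdas into one visitor.
template <typename... Fs>
struct overloaded : Fs...
{
  using Fs::operator()...;
};
template <typename... Fs>
overloaded(Fs...) -> overloaded<Fs...>;

// One lambda per alternative. Removing one of them makes std::visit
// fail to compile, which is the coverage check we are after.
inline Bool negation(const Bool& value)
{
  return std::visit(
    overloaded{
      [](True) -> Bool { return False{}; },
      [](False) -> Bool { return True{}; }},
    value);
}

inline void demo_negation()
{
  assert(std::holds_alternative<False>(negation(Bool{True{}})));
  assert(std::holds_alternative<True>(negation(Bool{False{}})));
}
```

Unlike Agda, C++ offers no termination check, so this only mirrors the coverage part of the story.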

Here is an example using the types defined above, with a more complicated pattern:

[Screenshot: Agda definition of the function ‘IsEven’ by pattern matching]

Note that ‘IsEven’ is recursively used in the last line. The termination checker of Agda makes sure that the recursion doesn’t loop forever and this definition passes this check, since ‘n’ is of a lower ‘height’ than the argument ‘Successor (Successor n)’. So progress will be made on each recursion and the computation will stop eventually.

Those checks are important when pattern matching is used to prove things, which can be done for example in the style of the following pseudo code:

[Screenshot: pseudo code of a proof by pattern matching]

Higher inductive types

In mathematics new sets are often created by identifying elements of some easy to understand set. For example, the rational numbers can be constructed as pairs of integers ‘(a,b)’, where b is not zero, by identification of pairs ‘(a,b)’ and ‘(c,d)’ if ‘c*b = a*d’.

It is now possible in some systems to construct such quotients as inductive types. Agda has a special mode called ‘cubical’ which allows inductive types to have constructors that ‘produce’ equalities in the inductive type. Here is an excerpt from the standard library for Agda’s cubical mode, that defines the rational numbers inductively:

[Screenshot: excerpt from the cubical Agda library defining the rational numbers with the constructors ‘con’, ‘path’ and ‘trunc’]

The first constructor ‘con’ tells us that we can produce a rational number from a pair of integers ‘u’ and ‘a’ provided ‘a’ is not zero. The constructor ‘path’ makes the identification explained above. The third constructor ‘trunc’ has to do with some curious weirdness that comes with having ‘inductive equalities’: some elements of a type might be equal in different ways. ‘trunc’ uses inductive equalities again to ‘truncate’ the possibilities of how rational numbers can be equal back to the expected ‘yes’ and ‘no’.

This appearance of extra equalities between things is by no means a pathology, but a connection to a topic in pure mathematics called homotopy theory. But so far there are not many suggestions for how the homotopy theory we have at our fingertips in Agda can help us with practical programming. If we ‘trunc’ our quotient however, we have a pretty usable way of mimicking the mathematical style described above when defining data types.

As more and more concepts from academic languages pour into the mainstream, I have hopes that I can use at least some inductive techniques some day, saving me from some annoying bugs and hard to read constructions.

std::initializer_list considered evil

I am so disappointed in you, std::initializer_list. You are just not what I thought you were.

Lights out

While on the train to Meeting C++ this year, I was working on the lighting subsystem of the 3D renderer for my game abstractanks. Everything was looking fine, until I switched to the release build. Suddenly, my sun light went out. All the smaller lights were still there, it just looked like night instead of day.
Now stuff working in Debug and not working in Release used to be quite common and happens when you’re not correctly initializing built-in variables. So I went digging, but it was not as easy as I had thought. Several hours later, I tracked the problem down to my global light’s uniform buffer initialization code. This is a buffer that is sent to the GPU so the shaders can read all the lighting information. It looked like a fairly innocent for-loop doing byte-copies of matrices and vectors to a buffer:

using Pair = std::pair<void const*, std::size_t>; // template arguments assumed: data pointer and size in bytes
auto Mapping = std::initializer_list<Pair>{
  {ShadowMatrix.ptr(), MATRIX_BYTE_SIZE},
  {LightDirection.ptr(), VECTOR4_BYTE_SIZE},
  {ColorAndAmbient.ptr(), VECTOR4_BYTE_SIZE}
};

std::size_t Offset = 0;
for (auto const& Each : Mapping)
{
  mUniformBuffer.SetSubData(GL_UNIFORM_BUFFER, Each.second, Offset, Each.first);
  Offset += Each.second;
}

The Culprit

After mistakenly blaming alignment issues for a while, I finally tried looking at the values of Each.second and Each.first. To my surprise, they were bogus. Now what is going on there? It turns out not writing this in almost-always-auto style, i.e. using direct- instead of copy-initialization fixes the problem, so there’s definitely a lifetime issue here.

Looking at the docs, it became apparent that std::initializer_list is indeed a reference type that automatically creates a value type (the backing array) internally and keeps it alive exactly as binding a reference to that array would. For the common cases, i.e. when std::initializer_list is used as a parameter, this is fine, because the original list lives for the whole function-call expression. For the direct-initialization case, this is also fine, since the reference-like lifetime extension kicks in. But for copy-initialization, the temporary on the right-hand side is destroyed once the std::initializer_list has been copied, and the backing array is destroyed along with it. Oops.

Conclusion and alternatives

Do not use std::initializer_list except as a function parameter. It works well for that, and is surprising for everything else. In my case, a naive “extract variable” refactoring of for (auto const& each : {a, b, c}) { /* ... */ } led me down this rabbit hole.
My current alternative is stupidly simple: a built-in array on the stack:

using Pair = std::pair<void const*, std::size_t>; // template arguments assumed: data pointer and size in bytes
Pair Mapping[]{
  {ShadowMatrix.ptr(), MATRIX_BYTE_SIZE},
  {LightDirection.ptr(), VECTOR4_BYTE_SIZE},
  {ColorAndAmbient.ptr(), VECTOR4_BYTE_SIZE}
};

It does the same thing as the “correct” version of the std::initializer_list, and if you try to use it AAA-style, at least clang will give you this nice warning: warning: temporary whose address is used as value of local variable 'Mapping' will be destroyed at the end of the full-expression [-Wdangling]
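
Here is a small, self-contained sketch of the two usages that I consider safe, next to the problematic one as a comment. The element types of Pair and the names total_bytes and example are assumptions made for this illustration; they are not taken from the renderer code.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <utility>

// Assumed element type: pointer to the data and its size in bytes.
using Pair = std::pair<const void*, std::size_t>;

// Fine: as a parameter, the backing array lives until the end of
// the full expression that contains the call.
inline std::size_t total_bytes(std::initializer_list<Pair> parts)
{
  std::size_t total = 0;
  for (auto const& each : parts)
    total += each.second;
  return total;
}

inline std::size_t example()
{
  float matrix[16] = {};
  float vector[4] = {};

  // Fine: the temporary list is passed directly as an argument.
  std::size_t const from_list =
    total_bytes({{matrix, sizeof matrix}, {vector, sizeof vector}});

  // Fine: a built-in array owns its elements for the whole scope.
  Pair mapping[]{{matrix, sizeof matrix}, {vector, sizeof vector}};
  std::size_t from_array = 0;
  for (auto const& each : mapping)
    from_array += each.second;

  // Problematic, as described above: the list can outlive its backing array.
  // auto dangling = std::initializer_list<Pair>{{matrix, sizeof matrix}};

  assert(from_list == from_array);
  return from_list;
}
```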

Working with JSON data in Oracle databases

In my last post I showed how to work with JSON data in PostgreSQL. This time I want to show how it is done with an Oracle database for comparison. I will use the same example scenario: a table named “events” where application events are stored in JSON format.

JSON data types

In Oracle there is no special data type for JSON data. You can use character string datatypes like VARCHAR2 or CLOB. However, you can add a special CHECK constraint to a column in order to ensure that only valid JSON is inserted:

CREATE TABLE events (
  datetime TIMESTAMP NOT NULL,
  event CLOB NOT NULL
  CONSTRAINT event_is_json CHECK (event IS JSON)
);

If you try to insert something other than JSON you will get a constraint violation error:

INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, 'This is not JSON.');

ORA-02290: check constraint (EVENT_IS_JSON) violated

Let’s insert some valid JSON data:

INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_shelf", "payload": {"id": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_shelf", "payload": {"id": 2}}');
INSERT INTO events (datetime, event) VALUES
  (CURRENT_TIMESTAMP, '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

In Oracle you use the JSON_VALUE function to select a value from a JSON structure. It uses a special path syntax for navigating JSON objects where the object root is represented as ‘$’ and properties are accessed via dot notation. This function can be used both in the SELECT clause and the WHERE clause:

SELECT JSON_VALUE(event, '$.type') AS type
  FROM events;

TYPE
add_shelf
add_book
add_book
add_shelf
add_book

SELECT event FROM events
  WHERE JSON_VALUE(event, '$.type')='add_book'
    AND JSON_VALUE(event, '$.payload.shelf')=1;

EVENT
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Constructing JSON objects

JSON objects can be constructed from values via the JSON_OBJECT and JSON_ARRAY functions:

SELECT JSON_OBJECT(
  'id' VALUE 1,
  'name' VALUE 'tree',
  'isPlant' VALUE 'true' FORMAT JSON,
  'colors' VALUE JSON_ARRAY('green', 'brown')
) FROM dual;

{"id":1,"name":"tree","isPlant":true,"colors":["green","brown"]}

Note that you have to use string values with the additional FORMAT JSON clause for boolean values.

Updating

Modifying JSON object fields has become feasible with the introduction of the JSON_MERGEPATCH function in Oracle 19c. It takes two JSON parameters:

1) the original JSON data
2) a JSON “patch” snippet that will be merged into the original JSON data. This can either add or update JSON properties.

It can be used in combination with JSON_VALUE and JSON_OBJECT. In this example we convert all the event “type” fields from lower case to upper case:

UPDATE events SET event=JSON_MERGEPATCH(
  event,
  JSON_OBJECT('type' VALUE UPPER(JSON_VALUE(event, '$.type')))
);

Oracle provides a lot more functions for working with JSON data. This post only covered the most basic ones. See the Oracle JSON reference for more.

Meeting C++ 2019 summary

A colleague and I had the pleasure of attending Meeting C++ 2019 from November 14th to 16th in Berlin. It was my second visit and a quite interesting and insightful one. Therefore I would like to give a short summary and share some of my take-aways.

General impressions

The organization and venue were great, and everything from booking and catering to the talks went smoothly. The C++ community is very professional and communication is very friendly and open. I am once again impressed that they openly addressed diversity problems and promoted and enforced a code of conduct.

The social events, the legendary C++-Quiz (many thanks again to Diego) and the lightning talks provided relaxing counterparts to the hard technical stuff.

The keynotes

Design Rationale for <chrono> (Howard Hinnant)

The author of the new time and date API in <chrono>, coming in C++20, presented its design and showed many examples of how to use it. While this keynote was very technical and lacked the stories and jokes you often see in keynotes, it was extremely interesting and insightful for me. The design and usage of the library are super-elegant, and two elements really stood out for me:

  1. Let the API user decide. More concretely, the <chrono> library lets the programmer decide on a case-by-case basis what to do with overruns and illegal dates when making calculations. For example, what should happen if you add 1 year to February 29th? What if you add 1 month to the last day of October? <chrono> does not make that decision for you but lets you check whether the date is legal and allows you to easily snap to the correct date, overflow into the next month or just throw an error.
  2. Find the essence of your domain. The calendar implementation in <chrono> is based on the insight that a calendar is only a collection of dates with unique names. So the simplest and most canonical calendar (called sys_days) simply counts the days since 01.01.1970. Other calendars only need conversions from/to sys_days to be fully interoperable. Most other calendar APIs include the time of day, which often causes problems when doing calculations.

Can AI replace programmers? (Frances Buontempo)

An entertaining and interesting talk about the history, definition, types and current state of artificial intelligence. The core of today's AI is mostly about the automation of non-trivial tasks. Real people in the feedback loop are still mandatory today, and this will stay so for quite some time. In addition, the resulting code/artifacts are often totally incomprehensible to human beings.

Crazy Code and Crazy Coders (Walter E. Brown)

A very entertaining talk with tons of hair-raising real-life code examples. Walter used them not only to entertain but also to remind us programmers that we all bear a ton of responsibility for our code, because we simply do not know where it will end up in a few years. So we absolutely must deal with it in a professional way, or bad things will happen.

Other noteworthy stuff

There were of course a lot more great and interesting talks, so check out the slides or watch last year's talks on YouTube until this year's are available. I just want to mention a few I personally attended and found worthwhile:

  • Combining C++17 Features in Practice – Nicolai Josuttis
  • The C++20 Synchronization Library – Bryce Adelstein Lelbach
  • CPU design effects that can degrade performance of your programs – Jakub Beranek
  • Value Proposition: Allocator-Aware Software – John Lakos
  • Modules are Coming – Bryce Adelstein Lelbach
  • Better Algorithm Intuition – Conor Hoekstra
  • Squaring the circle: value-oriented design in an object-oriented system – Juan Pedro Bolívar Puente

The following two lightning talks stood out for me and are easily relatable by polyglot programmers:

Conclusion

This year's Meeting C++ was a well-rounded event. I am very glad that I could attend again and got a lot of new input and impulses that will surely affect my day-to-day work – not only in C++ projects.

Working with JSON data in PostgreSQL

Today most common SQL-based relational database management systems (DBMS) like PostgreSQL, MySQL, MariaDB, SQL Server and Oracle offer functionality to efficiently store and query JSON data in one form or another, with varying syntax. While a standard named SQL/JSON is in the works, it is not yet fully supported by all of these DBMS. This blog post is specific to PostgreSQL.

JSON data types

In PostgreSQL there are two data types for JSON columns: json and jsonb. The former stores JSON data as-is with any formatting preserved, while the latter stores JSON in a decomposed binary format. Operations on data in jsonb format are potentially more efficient.

We’ll use the jsonb data type in the following example to store a sequence of events, for example for an event sourcing based application, in a table.

CREATE TABLE events (date TIMESTAMP NOT NULL,
                     event JSONB NOT NULL);

JSON literals look like string literals. Let’s insert some events:

INSERT INTO events (date, event) VALUES
  (NOW(), '{"type": "add_shelf", "payload": {"id": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Ulysses", "shelf": 1}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Moby Dick", "shelf": 1}}'),
  (NOW(), '{"type": "add_shelf", "payload": {"id": 2}}'),
  (NOW(), '{"type": "add_book", "payload": {"title": "Don Quixote", "shelf": 2}}');

Querying

PostgreSQL has two operators for navigating a JSON structure: -> and ->>. The former accesses an object field by key and returns it as a JSON value, while the latter returns the field as text. These operators can be used both in the SELECT clause and the WHERE clause:

SELECT event->>'type' AS type FROM events;
type
add_shelf
add_book
add_book
add_shelf
add_book
SELECT event FROM events
        WHERE event->>'type'='add_book'
          AND event->'payload'->>'shelf'='1';
event
{"type":"add_book","payload":{"shelf":1,"title":"Ulysses"}}
{"type":"add_book","payload":{"shelf":1,"title":"Moby Dick"}}

Note that in the example above the value of "shelf" is compared to a string literal ('1'). In order to treat the value as a number we have to use the CAST function, and then we can use numerical comparison operators:

SELECT event FROM events
        WHERE CAST(
          event->'payload'->>'shelf' AS INTEGER
        ) > 1;
event
{"type":"add_book","payload":{"shelf":2,"title":"Don Quixote"}}

Updating

Updating JSON object fields is a bit more complicated. It is only possible with the jsonb data type and can be done via the JSONB_SET function, which takes four arguments:

1) the original JSON,
2) a path specifying which object fields should be updated,
3) a jsonb value, which is the new value, and
4) an optional boolean flag (default: true) that specifies whether missing fields should be created.

In this example we convert all the event "type" fields from lower case to upper case:

UPDATE events SET event=JSONB_SET(
  event,
  '{type}',
  TO_JSONB(UPPER(event->>'type')),
  false
);

PostgreSQL provides a lot more operators and functions for working with JSON data. This post only covered the most basic ones. See the PostgreSQL JSON reference for more.

Compiling and using Agda through the Windows Linux Subsystem

The Windows Subsystem for Linux (WSL) more or less runs a Linux kernel on Windows 10. In this post, I will describe how to use WSL to compile and run Agda, a dependently typed functional programming language. Compiling Agda yourself makes sense if you want to use the latest features, some of which are quite nice. The approach presented here is just my preferred way of compiling and using Agda on a Linux system, with some minor adjustments.

Prerequisites for compiling

First, you need the “ubuntu app”, which you can install by following this guide. Essentially, you just have to activate WSL and install the app through the Microsoft Store, but following the guide step by step allowed me to do it without creating a Microsoft account.
After installation, the ubuntu app will ask you to create a new account. It will probably also need some updating, which you can do by running:

sudo apt-get update && sudo apt-get upgrade

A usability hint: You can copy-paste to the ubuntu app by copying with CTRL-C and right-clicking into the ubuntu-window. You have to make the ubuntu-window active before the right-click. You can copy stuff in the ubuntu-window by marking and pressing CTRL-C.

There are two tools that can get the dependencies and compile Agda for you, “cabal” and “stack”. I prefer to use stack:

https://docs.haskellstack.org/en/stable/README/

After installing, stack asked me to append something to my PATH, which I did only for the session:

export PATH=$PATH:/home/USER/.local/bin

Getting the sources and compiling

Git is preinstalled, so you can just get the agda sources with:

git clone https://github.com/agda/agda.git

Go into the agda folder. It will contain a couple of files with names like

stack-8.8.1.yaml

These are configuration files for stack. The numbers indicate the version of ghc (the Haskell compiler) that will be used. I always take the newest version (if everything works with it…) and make a copy

cp stack-8.8.1.yaml stack.yaml

– since stack will just use the “stack.yaml” for configuration when run in a folder. Now:

stack setup

will download ghc binaries and install them somewhere below “HOME/.stack/”. Before building, we have to install a dependency (otherwise there will be a linker error with the current ubuntu app):

sudo apt install libtinfo-dev

Then tell stack to build and hope for the best (that took around 5.2GB of RAM and half an hour on my system…):

stack build

On success, it should look like this:
[Screenshot: agda_folders]

If you are not confident that you can find the locations shown in the last lines of the output again, you should note down the path from those lines. We will need “agda” and “agda-mode”, which are in the same folder.

Using Agda

Of course, you can use Agda from the command line, but it is a lot more fun to use it from emacs (or possibly atom, which I have not checked out). Since the ubuntu app does not come with a window system and, on the other hand, our freshly built Agda cannot be invoked easily from Windows programs, I found it most convenient to run emacs in the ubuntu app but use an X server on Windows.

For the latter, you can install Xming and start it. Then install emacs in the ubuntu app:

sudo apt install emacs

Before starting emacs, we should install the “agda-mode” for emacs. This can be done by

.stack-work/install/[LONG PATH FROM ABOVE]/bin/agda-mode setup

Now run emacs with the variable “DISPLAY” set to something which connects it to Xming and “PATH” extended by the long path from above, so emacs can find agda (and agda-mode):

PATH=$PATH:~/agda/.stack-work/[LONG PATH FROM ABOVE]/bin/ DISPLAY=:0 emacs

Then everything should work and you can test the agda-mode, for example with a new file containing the following:

[Screenshot: agda-mode-test]

CTRL-C, CTRL-L tells agda-mode to check and highlight the contents of the file. Here is more on using the agda-mode. Have fun!

Sources: