Compiling and using Agda through the Windows Subsystem for Linux

The Windows Subsystem for Linux (WSL) more or less runs a Linux system on Windows 10. In this post, I will describe how to use WSL to compile and run Agda, a dependently typed functional programming language. Compiling Agda yourself makes sense if you want to use the latest features, of which there are quite a few nice ones. The approach presented here is just my preferred way of compiling and using Agda on a Linux system, with some minor adjustments.

Prerequisites for compiling

First, you need the “Ubuntu app”, which you can install following this guide. Essentially, you just have to activate WSL and install the app through the Microsoft Store, but following the guide step by step allowed me to do it without creating a Microsoft account.
After installation, the Ubuntu app will ask you to create a new user account, and it will probably need some updating, which you can do by running:

sudo apt-get update && sudo apt-get upgrade

A usability hint: You can paste into the Ubuntu app by copying with CTRL-C elsewhere and right-clicking into the Ubuntu window (the window has to be active before the right-click). You can copy text from the Ubuntu window by marking it and pressing CTRL-C.

There are two tools that can get the dependencies and compile Agda for you: “cabal” and “stack”. I prefer to use stack:

https://docs.haskellstack.org/en/stable/README/

After installing, stack asked me to append something to my PATH, which I did only for the session:

export PATH=$PATH:/home/USER/.local/bin
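To make the change permanent instead, you can append the line to your shell configuration (a sketch, assuming the Ubuntu app’s default bash setup):

echo 'export PATH=$PATH:/home/USER/.local/bin' >> ~/.bashrc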

Getting the sources and compiling

Git is preinstalled, so you can just get the Agda sources with:

git clone https://github.com/agda/agda.git

Go into the agda folder. It will contain a couple of files with names like

stack-8.8.1.yaml

These are configuration files for stack. The numbers indicate the version of ghc (the Haskell compiler) that will be used. I always take the newest version (if everything works with it…) and make a copy:

cp stack-8.8.1.yaml stack.yaml

– since stack will just use the “stack.yaml” for configuration when run in a folder. Now:

stack setup

will download ghc binaries and install them somewhere below “~/.stack/”. Before building, we have to install a dependency (otherwise there will be a linker error with the current Ubuntu app):

sudo apt install libtinfo-dev

Then tell stack to build and hope for the best (that took around 5.2GB of RAM and half an hour on my system…):

stack build

On success, it should look like this:
[Screenshot: output of a successful stack build, ending with the paths where the binaries were installed]

If you are not confident you will find the locations from those last lines again, note down the path now. We will need “agda” and “agda-mode”, which are in the same folder.
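If the build output has already scrolled away, you can also locate the two binaries again with a find command (a sketch, assuming you cloned into ~/agda):

find ~/agda/.stack-work/install -name 'agda*' -type f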

Using Agda

Of course, you can use Agda from the command line, but it is a lot more fun to use from Emacs (or possibly Atom, which I have not checked out). Since the Ubuntu app does not come with a window system, and our freshly built Agda, on the other hand, cannot easily be invoked from Windows programs, I found it most convenient to run Emacs in the Ubuntu app, but use an X server on the Windows side.

For the latter, you can install Xming and start it. Then install Emacs in the Ubuntu app:

sudo apt install emacs

Before starting Emacs, we should install the “agda-mode” for Emacs. This can be done by running:

~/agda/.stack-work/[LONG PATH FROM ABOVE]/bin/agda-mode setup

Now run Emacs with the variable “DISPLAY” set to something that connects it to Xming, and with the long path from above appended to “PATH”, so Emacs can find agda (and agda-mode):

PATH=$PATH:~/agda/.stack-work/[LONG PATH FROM ABOVE]/bin/ DISPLAY=:0 emacs

Then everything should work and you can test the agda-mode, for example with a new file containing the following:

[Screenshot: a small test file opened in agda-mode]

CTRL-C, CTRL-L tells agda-mode to check and highlight the contents of the file. Here is more on using the agda-mode. Have fun!

Think of your code as a maintenance minefield

Most of the cost, effort and time of a software project is spent on the maintenance phase, the modification of a software product after delivery. If you think about all these resources as “negative investments” or debt settlement, and try to associate your spendings with specific code areas or even single lines of code, you’ll probably find that the maintenance cost per line is not equally distributed. There are lots of lines of code that stand the test of time without any maintenance work at all, a fair amount of lines that require moderate attention, and some lines that seem to require constant and excessive developer care.

If you transfer this image to another metaphor, your code presents itself like a minefield for maintenance effort: Most of the area is harmless and safe to travel. But there are some positions that will just blow up once touched. The difference is that as a software developer, you don’t tread on the minefield, but you catch the flak if something happens.

You should try to deliver your code free of maintenance mines.

Spotting a maintenance mine

Identifying a line of code as a maintenance mine after the fact is easy. You probably already recognize the familiar code as “troublesome” because you’ve spent hours trying to understand and fix it. The commit history of your version control system can show you the “hottest” lines in your code – the areas that were modified most often. If you add tests for each new bug, you’ll find that the code is probably tested really well, with tests motivated by different bug issues. In hindsight, you can clearly distinguish low-effort code from high-maintenance code.
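For example, a rough churn metric can be computed with plain git commands – this sketch lists the files that were modified most often:

git log --name-only --pretty=format: | grep -v '^$' | sort | uniq -c | sort -rn | head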

But before delivery, all code looks the same. Or does it?

An example of a maintenance mine

Let’s look at an example. Our system monitors critical business data and sends out alerts if certain conditions are met. One implementation of the part sending the alerts is a simple e-mail sender. The code is given here:


import java.io.IOException;

public class SendEmailService {

  public void sendTo(Person person, String subject, String body) throws IOException {
    execCmd(buildCmd(person.email(), subject, body));
  }

  private String buildCmd(String recipientMailAddress, String subject, String body) {
    return "'/usr/bin/mutt -t " + recipientMailAddress + " -u " + subject + " -m " + body + "'";
  }

  private int execCmd(String command) throws IOException {
    // hands the command line to the operating system – see the discussion below
    return Runtime.getRuntime()
                  .exec(command).exitValue();
  }
}

This code has two interesting problems:

  • The first problem is that it is written in Java, a platform-agnostic programming language, but depends on being run on a Linux (or sufficiently similar Unix-like) operating system. The system it runs on needs to supply the /usr/bin/mutt program and have the e-mail sending settings properly configured, or else every attempt to run the send command will result in an error. This implicit dependency on the configuration of the production system isn’t the best way to deal with the situation, but it’s probably a one-time pain. The problem clearly presents itself, and once the system is set up in the right way, it is gone (until somebody tampers with the settings again). And my impression is that this code separates two concerns between development and operations rather nicely: Development provides software that can send specific e-mails, if operations provides a system that is capable of sending e-mails. No need to configure the system for e-mail sending and then do it again for the software on said system.
  • The second problem looks like a maintenance mine. In the line where the code passes the command line to the operating system (by calling Runtime.getRuntime().exec()), a Process object is returned that is only asked for its exitValue(), implying a wait for the termination of the system command. The line looks straightforward and to the point. No need to store and handle intermediate objects if you aren’t interested in them. But perhaps you should care:

By default, the created process does not have its own terminal or console. All its standard I/O (i.e. stdin, stdout, stderr) operations will be redirected to the parent process, where they can be accessed via the streams obtained using the methods getOutputStream(), getInputStream(), and getErrorStream(). The parent process uses these streams to feed input to and get output from the process. Because some native platforms only provide limited buffer size for standard input and output streams, failure to promptly write the input stream or read the output stream of the process may cause the process to block, or even deadlock.

Emphasis by me, see also: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Process.html

This means that the Process object’s stdout and stderr outputs are stored in buffers of unknown (and system-dependent) size. If one of these buffers fills up, the execution of the command just stops, as if somebody had paused it indefinitely. So, depending on the command’s talkativeness, your alert e-mail will not be sent, your system will appear to have failed to recognize the condition, and you’ll never see a stacktrace or error exit value. All other e-mails (with less chatter) will go through just fine. This is a guaranteed source of frantic telephone calls, headaches and lost trust in your system and your ability to resolve issues.

And all the problems originate from one line of code. This is a maintenance mine with a stdout fuse.

The fix for this line might lie in the use of the ProcessBuilder class or your own utility code to drain the buffers, as sketched below.
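Here is a minimal sketch of such a fix, under the assumption that we only want to get rid of the blocked buffers (the class name SafeCommandRunner is made up for this example):

import java.io.IOException;
import java.util.List;

public class SafeCommandRunner {

  // Starts the command and waits for its termination. inheritIO() connects
  // the child's stdin/stdout/stderr to those of our own process, so the
  // child's output buffers are drained continuously and cannot fill up.
  public static int execCmd(List<String> commandAndArguments)
      throws IOException, InterruptedException {
    Process process = new ProcessBuilder(commandAndArguments)
        .inheritIO()
        .start();
    return process.waitFor();
  }
}

If you need the child’s output programmatically instead, read its getInputStream() and getErrorStream() in separate threads before calling waitFor(). But how would you discover the mine before you deliver it?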

Mines often lie at borders

One thing that stands out in this line of code is that it passes control to the “outside”. It acts as a transit point to the underlying operating system and therefore has a lot of baggage to check. There are no safety checks implemented, so the transit must be regarded as unsafe. If you look out for transit points in your code (like passing control to the file system, the network, a database or another external system), make sure you’ve read the instructions and requirements thoroughly. The problems of a maintenance mine aren’t apparent in your code and only manifest themselves during the interaction with the external system. And this is a situation that happens disproportionately often in production and comparably seldom during development.

So, think of your code as a maintenance minefield and be careful around its borders.

What is your minesweeper story? Drop us a comment.

Using parameterized docker builds

Docker is a great addition to your DevOps toolbox. Sometimes you may want to build several similar images using the same Dockerfile. That’s where parameterized docker builds come in:

They let you supply configuration values at image build time. Do not confuse this with environment variables when running the container! We used parameterized builds, for example, to build images for creating distribution-specific packages of native libraries and executables.

Our use case

We needed to package some proprietary native programs for several Linux distribution versions, in our case openSUSE Leap. Build ARGs allow us to use a single Dockerfile, but build several images and run them to build the packages for each distribution version. This can be easily achieved using so-called multi-configuration jobs in the Jenkins continuous integration server. But let us take a look at the Dockerfile first:

ARG LEAP_VERSION=15.1
FROM opensuse/leap:$LEAP_VERSION
ARG LEAP_VERSION=15.1

# add our target repository
RUN zypper ar http://our-private-rpm-repository.company.org/repo/leap-$LEAP_VERSION/ COMPANY_REPO

# install some pre-requisites
RUN zypper -n --no-gpg-checks refresh && zypper -n install rpm-build gcc-c++

WORKDIR /buildroot

CMD rpmbuild --define "_topdir `pwd`" -bb packaging/project.spec

Notice how the ARG instruction defines a parameter name and a default value. That allows us to configure the image at build time using the --build-arg command line flag. Now we can build a Docker image for Leap 15.0 using a command like:

docker build -t project-build --build-arg LEAP_VERSION=15.0 -f docker/Dockerfile .

In our multi-configuration jobs we call docker build with the variable from the axis definition to build several images in one job using the same Dockerfile.
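Without Jenkins, the same effect can be achieved with a simple shell loop (a sketch – the image tags are freely chosen for this example):

for version in 15.0 15.1; do
  docker build -t project-build:leap-$version --build-arg LEAP_VERSION=$version -f docker/Dockerfile .
done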

A gotcha

As you may have noticed, we put the same ARG instruction twice into the Dockerfile: once before the FROM instruction and another time after FROM. This is because the build args are cleared after each FROM instruction. So beware in multi-stage builds, too. For more information see the docker documentation and this discussion. This had cost us quite some time, as it was not as clearly documented at the time.

Conclusion

Parameterized builds allow for easy configuration of your Docker images at image build time. This increases flexibility and reduces duplication, such as maintaining several almost identical Dockerfiles. For runtime container configuration, provide environment variables to the docker run command.
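For example (LOG_LEVEL being a stand-in for whatever variable your application actually reads at runtime):

docker run -e LOG_LEVEL=debug project-build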

Book review: Developer Hegemony

At the end of 2018, I searched for new software development books to read and came across a list that piqued my interest. My impulse was to buy and read all five books. I’ve bought them all, but have only read four of them yet. You can read another book review from this list in a previous blog post.

The book I was most sceptical about came with a black cover and the menacing title “Developer Hegemony” by Erik Dietrich. It’s not a book about software development; it is a book about the industry of software development and why it is fundamentally different from “traditional” industries. And it is a book that promises an outlook on “the future of labor”, at least for us developers. Spoiler: It’s not about taking over the world, as the cover image suggests. It’s about finding your way in an industry that is in very high demand and mostly consists of players that play by the rules of an entirely different game: industrial manufacturing.

Let’s have a short overview of the content: My impression of the book was that it consists of three parts, even if there are five distinct parts in the table of contents:

  • The first part takes a good hard look at the current situation and identifies the losers and “winners” of the game. It introduces a taxonomy of industry employees that all give up on something in exchange for some personal gain. What that something is depends on the worker type. This is like the setting up of a chess board. You get to know the pieces and their characteristics and realize that they are mostly pawns – cannon fodder.
  • The second part puts the taxonomy in action and describes the carnage unfolding on the chess board. The grim message is that the only winning move is to own the board itself and make up the rules, but never participate in the game. And if you find yourself on the board, keep moving sideways like the bishop and change your color often. Don’t associate with any team and don’t engage in any stalemate situations. The author describes the “delivery game” very illustratively: If you are responsible for delivering something, you might succeed, but you can also fail. If you are only responsible for counseling a delivery, you can attach yourself to success and detach from failure more easily. Be the bishop and evade delivery responsibility with an elegant sidestep. This part is especially gruesome because it describes in detail how technical expertise in software development is a recipe for remaining at the center of the delivery game. It makes every passionate developer’s heart ache.
  • The third and last part shows an alternative to the “own the board or be a pawn” dichotomy. Emotionally, it rescues the developer enthusiast. The message is soothing: You can continue to develop software, but you have to step up and own the results of your development, too. This means effectively being self-employed and acting like a business entity. Yes, I said it wrong: I meant being self-employed and being a business entity. You can probably count on being in high demand for years to come, so the step from “developer” to “entrepreneur” is not as big as you are probably afraid of. And you don’t need to be strictly alone: Find partners and associate with them. But don’t stop being your own business entity. Don’t shed your self-confidence: You are the world’s most sought-after expert.

This book left me speechless. I founded my company nearly twenty years ago without Erik Dietrich’s experience, just based on beliefs that I couldn’t even articulate. And now he has spelled them out for me in detail. Don’t get me wrong: My company differs from Erik Dietrich’s ideal of an “efficiencer company” in many details, but at the root of the matter, this book describes my strategic business alignment and my reasons for it perfectly.

But even besides my own affection for the topic, the book provides a crystal-clear view of the software industry and a lot of food for thought. Even if you don’t plan to leave your corporate job anytime soon, you should at least be clear about the mechanics of the game you participate in. The rules might differ from company to company, but the mechanics stay the same.

Do yourself a favor and read this book. You don’t have to change anything about your job situation, but you are invited to think about your stance and position in this industry. For my last sentence in this blog entry, let me spoil one main difference between our industry and others: If all you need to develop first-class software is a decent notebook and some coffee, why do you still depend on your employer to provide both for you in exchange for all the surplus of your work? That’s the world’s most expensive coffee.

A new star in software verification?

F* (pronounced F star) is a functional programming language with a dependent type system that supports verification. What follows is a story about my experiences with F* and a concluding opinion on why languages like F* could be useful for the working software developer.

There is a big project using F* and related tools with the ambitious goal of building a verified implementation of TLS.
So far, judging from the website, there is a verified, efficient implementation of the relevant cryptographic primitives, but their implementation of the protocol is not yet verified. That is certainly already a great achievement and got me interested in F*.

Setting it up

There is an F* tutorial which comes with an online editor, which is great if you just want to play a bit with the language. You don’t have to install anything, provided you have a recent mainstream browser. The tutorial is fun to do, but after I spent some time on it, I wanted to know what it is like to use F* off the prescribed path.

Sooner or later, you will probably want to install it on your machine, which is not a problem at all if you are on a Linux system and an Emacs user. There is a binary and an Emacs mode, which are also easy to install on Windows, but that combination was somehow forbiddingly slow on my (fast) machine. In the following, the pictures show excerpts from the Emacs mode.

Programming in F*

Let me first show you what programming in F* looks like. Here is a definition of the commonly known (higher-order) function “filter”, which takes a predicate (a function to bool in this case) and a list, and returns the “sublist” of all things in the list matching the predicate:

[Screenshot: the recursive pattern-matching definition of “filter”]

The arrows and strange dots are just “->” and “::” rendered by the Emacs mode. Otherwise, it is exactly the pattern matching definition that would also work in OCaml or F# (where you can write exactly the same thing):

The second argument “l” is matched against the two things a list might be: either the empty list “[]” or a list consisting of some first element “x” and a remaining list “xs”. In the first case, we are done, and in the second case we decide if “x” should be in the result list and recursively call “filter” to deal with the “xs”.

The type of “filter” can be inferred by F*, but we could have declared it before the definition like this:

[Screenshot: the declared type of “filter”, with a type variable rendered as α]

The alpha is really fancy Emacs-mode rendering of “'a” and stands for an arbitrary type. This is also the type F* would infer, since what we did does not require anything special from the element type of the lists.

So the declaration says that our function takes some predicate “p” and a list of things p may be applied to, and returns a list of the same type, just with “Tot” written in front of it. The latter tells us that filter is a total function, i.e. it terminates and will not throw an exception (I guess there is a bit more to it, but so far that explanation has worked well for me). There is also a modifier “ML”, which would tell us that the function behaves like a function in OCaml – so it could throw runtime exceptions or loop forever. After introducing you to some verification features, I will tell you why this distinction is important here.

Let us first rewrite our declaration a bit:

[Screenshot: the type of “filter” specialized to natural numbers, with named arguments “p” and “l”]

I replaced the alpha with a fixed type, the natural numbers (that will simplify things…), and introduced variables “p” and “l” that we can now use to construct types. What kind of types? Well, in this case, I want to construct a replacement for “(list N)” which tells us a bit more about the result of filtering a list. One reason to have types is to make sure that the result of some construction meets a specification. Usually these kinds of specifications are quite coarse and might just tell you that the result is an integer. So far, the specification of the filter function only tells us that the result will be a list of natural numbers. In F*, we can do a lot more. The following specifies the result of filter to be the sublist of all elements of “l” that satisfy “p”:

[Screenshot: the refined declaration of “filter”, with the specification between curly braces]

Between the curly braces, I put a formula in predicate logic which describes this specification. The “∧” is a logical “and”, the “∀” is the “forall” from math, and “element x l’” is some function which evaluates to true if x is an element of l’ and to false if it is not.

Now, one of the things that make me happy when playing with F* is that the definition of filter I gave in the beginning checks against this new declaration as is. This is due to F* using a problem solver to prove that our specification is satisfied and the not-so-random coincidence that to prove the formula, you can use the same structural induction on the list as we used to recursively define “filter”.

For completeness, here is my definition of “element”:

[Screenshot: the recursive definition of “element”]

The “#a:eqtype” means that “a” is some type that comes with a total equality function for terms of type “a”. The natural numbers used above are such an “a”. Btw: Those natural numbers are not limited in size, like e.g. “unsigned int”. They behave a lot like “BigInteger” in some languages and are certainly not fast, but they are good for proving things.

And now I should tell you that the declaration of “filter” which so nicely specified what filter does is probably not what you should use. In fact, it is usually better to split things into definitions and “Lemma”s. So here is again the original definition of filter (with inferred type) and a Lemma about it:

[Screenshot: the original definition of “filter” together with the lemma “filter_specification”]

The “Lemma” is again like “Tot” above – it modifies the type. Technically, “filter_specification” is a function, but the only important thing here is whether its definition (“let rec filter_specification = […]”) type checks against the given declaration (“val filter_specification […]”) – because that means the definition is a proof of the proposition encoded in the declaration. In particular, it is not important what this function returns. In fact, “Lemma” produces a type all of whose terms are equal.

Verifying Project Euler exercises

The first thing I usually do when checking out a new programming language is to solve a couple of Project Euler exercises to get fluent in the basic constructions. So I thought: why shouldn’t I try the same with F* and prove something about my solution code?

Problem 1 asks us to add all multiples of 3 or 5 below 1000. Usually, my mind drops the “below” and I get a wrong result. But not so this time! Here is my definition of the natural numbers below 1000, which produces the correct list of numbers, since specifying precisely what it is made me read the problem text very carefully:

[Screenshot: the definition of “naturals_below_1000”, specified by its type]

Now, “naturals_below_1000” is specified by its type, which turned out to work well in this case, since it is not a general purpose function I will reuse.

Now, for the task at hand, we need to filter the numbers that are multiples of 3 or 5. I took the liberty of equivalently saying: numbers divisible by 3 or 5:

[Screenshot: the definition of the divisibility predicate]

I used the modulo arithmetic “%” defined in some library to get an easy definition of “divides”. Also note that I use definitions in proving that I defined the correct list. Both are a bit fishy if I were to claim that I proved that some term in my program meets some mathematical definition commonly used for the concepts in the problem text. So you could call that lazy or efficient, according to taste.

Now, the number which the exercise asks for can be defined:

[Screenshot: the definition of “ex1”, the number the exercise asks for]

F* can not just check if everything is correct, but also evaluate expressions – in particular the expression “ex1”, which turns out to be the correct answer to the problem.

Is it of use for the working software developer?

I used Scala some years ago to develop a large prototype and, while there were some pains, it worked about as well as expected, and I’m quite sure it would have been less fun and less efficient to use Java for the same project. One important point in the decision to use Scala was the possibility to use Java libraries.
F# has the same kind of advantage, and I learned to love its type system from OCaml, which apparently was the basis for F#. So for the same reasons that made Scala work for me when constructing prototypes, I could decide to use F# in similar situations. I don’t know how realistic my hope is that F* can be used on top of F# one day, but that would be a great improvement over just having F#, which is already something I look forward to. So far, it is possible to extract F# code from an F* program.

Usually, tools like F* are thought of and presented as tools to effectively prevent bugs, i.e. to catch all of them no matter the effort. I think what we need for our daily programming tasks are tools for preventing bugs efficiently. My guess is that dependent types in a practical language like F# are such a tool. This is of course a heavily biased view, since I have a background involving some dependent type theory, which makes me blind to the effort of learning how dependent types work.

One problem with dependently typed languages is that small changes can break lots of proofs that took effort to construct. This is particularly bad if you spell out the proofs in all detail, and it means refactoring can become quite expensive.

From what I have seen, the use of a problem solver during type checking reduces this problem quite a bit, so this might not be a real issue here and with other dependently typed languages that do something similar. And of course, substitute technologies that achieve similar goals, like unit tests or comments, also increase refactoring work.

So ultimately, apart from the fun, I also look forward to using F* as a tool. As I mentioned before, I already knew about dependent types and could translate what I saw when learning F*, so I can’t tell you how much effort it takes if you don’t already know those things. But what I can tell you is that learning dependent type theory was certainly among the most rewarding scientific experiences in my life, so maybe you want to do that anyway.

xBase gotchas

For the last year, I have regularly worked on a legacy project written in xBase / Clipper, which is a dialect of the dBase programming language. The language as a whole is surprisingly comfortable for its age, being 40 years old now. But there were a few quite infuriating things I stumbled upon:

Trailing commas in arrays

The language allows you to define array contents inline like this:

myArray := { 1, "foo", .t., 4.0 }

It is also dynamically typed, as you can see in this example – the array holds numbers, strings and boolean (logical) values. Now sometimes you want to spread out an array initialization over multiple lines. xBase lets you do that by using the semicolon – quite contrary to many other languages – to continue a line:

myArray := {;
  "foo",;
  "bar";
}

Now you might be tempted, as I was, to write it like this instead:

myArray := {;
  "foo",;
  "bar",;
}

With a trailing comma. But, gotcha, that does not do what you probably expect. This actually creates an array with 3 elements, where the last element is nil.

Unequal is not not-equals

xBase has both the == and the != operators. But imagine my surprise when the branch of an “if a != b” was not entered, with what had to be unequal strings. I added some logging (yeah, no debugger) and saw that I indeed had the strings "" and "foo". Still, a != b seemed to evaluate to false. Because I could not believe that, I changed it to !(a == b) – and it worked!

It turns out that the != operator is not the opposite of the == operator, but rather of the = operator (xBase uses := for assignment). The = operator, however, implements some kind of “fuzzy” equals by default, also evaluating to true if the left operand is a prefix of the right operand. This is a behaviour that can be changed globally by “SET EXACT ON”. Wow.
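To summarize the gotcha in code (a sketch from memory, with EXACT off, i.e. the default):

a := ""
b := "foo"
? a != b     && prints .f. – "" is a prefix of "foo", so a = b holds
? !(a == b)  && prints .t. – == always compares exactly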

How to disable IP address logging for Apache web server and Tomcat

The General Data Protection Regulation (GDPR) prohibits excessive or unnecessary collection of personal data. IP addresses in server log files are considered personal data.

Logging of IP addresses is usually enabled by default in fresh web server installations. This article describes how to disable it for the Apache web server and the Tomcat application server, which are a common combination for JVM based web applications.

Apache Web Server

The Apache web server logs the HTTP requests from web clients in log files called *access.log, which include IP addresses. Another Apache log file which can contain IP addresses is error.log.

To disable IP address logging for Apache, edit the main configuration file, usually called httpd.conf, and scan it for LogFormat entries, for example:

LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\"" combined

The format ‘verb’ %h stands for the hostname or IP address. This is what you want to remove. You may also want to remove the “Referer” and “User-Agent” components of the log format for privacy:

LogFormat "%l %u %t \"%r\" %>s %O" combined

Restart the server for the changes to take effect.

Tomcat Application Server

For Tomcat you have to configure the so-called AccessLogValve in the server.xml configuration file located in the $CATALINA_HOME/conf directory. The configuration entry looks like this:

<Valve className="org.apache.catalina.valves.AccessLogValve" directory="logs"
       prefix="localhost_access_log." suffix=".txt"
       pattern="%h %l %u %t &quot;%r&quot; %s %b" />

The pattern attribute specifies the log format. Again, the relevant format verb you want to remove is %h for the hostname or IP address:

       pattern="%l %u %t &quot;%r&quot; %s %b" />

Restart the server for the changes to take effect.