How to Lay a Minefield

… in Minesweeper, of course.

One of the basic building blocks of my hands-on software development workshops are Coding Katas. Minesweeper, the classic game that every Microsoft Windows user knows since 1992, is one of the “medium everything” Katas: Medium scope, medium complexity, medium demand. One advantage is that virtually nobody needs any more explanation than “program a Minesweeper”.

The first milestone when programming a Minesweeper game is to lay the mines on the field in a random fashion. And to my continual surprise, this seems to be a hard task for most participants. Not hard in the sense of giving up, but hard in the sense that the solution lacks suitability or even correctness.

So in order to have a reference point I can use in future workshops and to discuss the usual approaches, here is how you lay a minefield in an adequate way.

The task is to fill a grid of tiles or cells with a predefined amount of mines. Let’s say you have a grid of ten rows and ten columns and want to place ten mines randomly.

Spoiler alert: If you want to think about this problem on your own, don’t read any further and develop your solution first!

Task analysis

If you pause for a moment and think about the ingredients that you need for the task, you’ll find three things:

  • A die (in the form of a random number generator)
  • An amount of mines to place (maybe just represented by a counter variable)
  • An amount of tiles (stored in a data structure)

Each solution I’ll present uses one of these things as the primary focus of interest. My language for this effect is that the solution “takes the thing into the hands”, while the other two things “lay on the table”. That’s because if you simulate how the solution works with real objects like paper and dice, you’ll really have one thing in your hands and the others on the table most of the time.

Solution #1: The probability approach

One way to think about the task of placing 10 mines somewhere on 100 tiles is to calculate the probability of a mine being on a tile and just roll the dice for every tile.

Here’s some code that shows how this approach might look like in Java:

private static final int fieldWidth = 10;
private static final int fieldHeigth = 10;

public static Set<Point> placeMinesOn(Set<Point> field) {
	Random rng = new Random();
	final double probability = 0.1D;
	for (int column = 0; column < fieldWidth; column++) {
		for (int row = 0; row < fieldHeigth; row++) {
			if (rng.nextDouble() < probability) {
				field.add(new Point(row, column));
			}
		}
	}
	return field;
}

Before we discuss the effects of this code, let’s have a little talk about the data structure that I use for the tiles:

The most common approach to model a two-dimensional grid of tiles in software is to use a two-dimensional array. There is nothing wrong with it, it’s just not the most practical approach. In reality, it is really cumbersome compared to its alternatives (yes, there are several). My approach is to separate the aspect of “two-dimensionalness” from my data structure. That’s why I use a set of points. The set (like a HashSet) is a one-dimensional data structure that more or less can only say “yes, I know this point” or “no, I never heard of this point”. To determine if a certain point (or tile at this coordinate) contains a mine, it just needs to be included in the set. An empty set represents an empty field. If you remove “cleared” mines from the set, its size is the number of remaining mines. With a two-dimensional array, you probably write several loop-in-loop structures, one of them just to count non-cleared mines.

Ok, now back to the solution. It is the approach that holds the die in the hands and uses it for every tile. The problem with it is that our customer didn’t ask for a probability of 10 percent for a mine, he/she asked for 10 mines. And the code above might place 10 mines, or 9, or 11, or even 14. In fact, the code places somewhere between 0 and 100 mines on the field.

The one thing this solution has going for it is the guaranteed runtime. We roll the dice 100 times and that’s it.

So we can categorize this solution as follows:

  • Correctness: not guaranteed
  • Runtime: guaranteed

If I were the customer, I would reject the solution because it doesn’t produce the outcome I require. A minesweeper contest based on this code would end in a riot.

Solution #2: Sampling with replacement

If you don’t take up the die, but the mines and try to dispense them on the field, you implement our second solution. As long as you have mines on hand, you choose a field at random and place it there. The only exception is that you can’t place a mine above a mine, so you have to check for the presence of a mine first.

Here’s the code for this solution in Java:

public static Set<Point> placeMinesOn(Set<Point> field) {
	Random rng = new Random();
	int remainingMines = 10;
	while (remainingMines > 0) {
		Point randomTile = new Point(
			rng.nextInt(fieldHeigth),
			rng.nextInt(fieldWidth)
		);
		if (field.contains(randomTile)) {
			continue;
		}
		field.add(randomTile);
		remainingMines--;
	}
	return field;
}

This solution works better than the previous one for the correctness category. There will always be 10 mines on the field once we are finished. The problem is that we can’t guarantee that we are able to place the mines in time before our player gets bored. Taking it to the extreme means that this code might run forever, especially if your random number generator isn’t up to standards.

So, the participants of your minesweeper contest might not protest the arbitrary number of mines on their field, but maybe just because they don’t know yet that they’ll always get 10 mines dispersed.

  • Correctness: guaranteed
  • Runtime: not guaranteed

This solution will probably work alright in reality. I just don’t see the need to utilize it when there is a clearly superior solution at hands.

Solution #3: Sampling without replacement

So far, we picked up the die and the mines. But what if we pick up the tiles? That’s the third solution. In short, we but all tiles in a bag and pick one by random. We place a mine there and don’t put it back into the bag. After we’ve picked 10 tiles and placed 10 mines, we solved the problem.

Here’s code that implements this solution in Java:

public static Set<Point> placeMinesOn(Set<Point> field) {
	List<Point> allTiles = new LinkedList<Point>();
	for (int column = 0; column < fieldWidth; column++) {
		for (int row = 0; row < fieldHeigth; row++) {
			allTiles.add(new Point(row, column));
		}
	}
	
	Collections.shuffle(allTiles);
	
	int mines = 10;
	for (int i = 0; i < mines; i++) {
		Point tile = allTiles.remove(0);
		field.add(tile);
	}
	return field;
}

The cool thing about this solution is that it excels in both correctness and runtime. Maybe we use some additional memory for our bag and some runtime for the shuffling, but both can be predefined to an upper limit.

Yet, I rarely see this solution in my workshops. It’s only after I challenge their first solution that people come up with it. I’m biased, of course, because I’ve seen too many approaches (successful and failed) and thought about the problem way longer than usual. But you, my reader, are probably an impartial mind on this topic and can give some thoughts in the comments. I would appreciate it!

So, let’s categorize this approach:

  • Correctness: guaranteed
  • Runtime: guaranteed

If I were the customer, this would be my anticipation. My minesweeper contest would go unchallenged as long as nobody finds a flaw in the shuffle algorithm.

Summary

It is suprisingly hard to solve simple tasks like “distribute N elements on a X*Y grid” in an adequate way. My approach to deconstruct and analyze these tasks is to visualize myself doing them “in reality”. This is how I come up with the “thing in hands” metaphor that help me to imagine new solutions. I hope this helps you sometimes in the future.

How do you lay a minefield and how do you find your solutions? Write a blog post or leave a comment!

PostgreSQL’s auto-explain feature and execution plans

PostgreSQL’s auto-explain is a built-in feature that automatically generates and logs execution plans for SQL statements. It’s a useful tool for developers to understand how the query planner is executing SQL queries.

You enable and configure auto-explain by setting parameters in the PostgreSQL configuration file (postgresql.conf). Set auto_explain.log_analyze to on to log execution plans along with statistics, and set auto_explain.log_min_duration to specify the minimum execution time in milliseconds that a query must take to be logged. For example, if you want to log queries taking longer than 100 milliseconds, set it to 100. Set auto_explain.log_buffers to on if you want to include information about memory usage, and auto_explain.log_timing to log timing information.

Here’s an example of how to configure these parameters in postgresql.conf:

auto_explain.log_analyze = on
auto_explain.log_buffers = on
auto_explain.log_timing = on
auto_explain.log_min_duration = 100

Reading the execution plan

Suppose you have a “recipe database” that stores recipes, ingredients, and chefs. You want to retrieve a list of recipes along with the names of the chefs who created them and the ingredients they contain. Here’s a query that accomplishes this:

SELECT recipes.recipe_name, chefs.chef_name, ingredients.ingredient_name
  FROM recipes
  JOIN chefs ON recipes.chef_id=chefs.chef_id
  JOIN recipe_ingredients ON recipes.recipe_id=recipe_ingredients.recipe_id
  JOIN ingredients ON recipe_ingredients.ingredient_id=ingredients.ingredient_id
WHERE recipes.cuisine='Italian';

This query fetches Italian recipes, their respective chefs, and the ingredients they use.

When you run this query with auto-explain enabled, PostgreSQL will log the execution plan. The query plan might look something like this:

Hash Join  (cost=100.25..350.75 rows=50 width=96)
  Hash Cond: (recipe_ingredients.recipe_id = recipes.recipe_id)
  ->  Hash Join  (cost=50.12..200.37 rows=50 width=60)
        Hash Cond: (recipes.chef_id = chefs.chef_id)
        ->  Seq Scan on recipes  (cost=0.00..100.00 rows=50 width=24)
              Filter: (cuisine = 'Italian'::text)
        ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
              ->  Seq Scan on chefs  (cost=0.00..30.00 rows=1000 width=36)
  ->  Hash  (cost=30.00..30.00 rows=1000 width=36)
        ->  Seq Scan on recipe_ingredients  (cost=0.00..30.00 rows=1000 width=36)
              Filter: (recipe_id IS NOT NULL)

In this query plan Hash Join indicates a join operation using a hash-based algorithm. Seq Scan signifies a sequential scan of the table, which might imply a full table scan. Hash Cond shows the join condition for the respective hash join.

cost represents the estimated execution cost for each operation, and rows indicates the estimated number of rows returned by each operation.

The estimated cost in PostgreSQL query execution plans is typically represented in an abstract unit known as “cost units.” These cost units are used for relative cost estimation and are not expressed in any specific real-world measurement like time or money. They are designed to provide a relative measure of the cost of different query plan operations so that the query planner can make informed decisions about which plan to choose.

Reading this plan, PostgreSQL starts by filtering Italian recipes (a Seq Scan with a filter). It then joins the recipes with chefs using a hash join, and the result is further joined with ingredients using another hash join. The cost values provide relative estimates of resource usage, allowing you to identify potentially expensive parts of the query, and you can consider improving the performance of the SQL statement with optimisations like indexing.

Converting a Grails app from war deployment in Tomcat to docker

A few years ago we had real servers with servlet containers installed and managed by administrators. These machines were bare-metal unicorns and had to be kept alive as long as possible (for many years). With the advent of first virtual machines and then container runtimes like Docker the approach and responsibilities for hosting web applications changed.

Our application not only went from Grails framework version 1 to 5 (atm, and soon to version 6) but also the above voyage regarding the hosting environment.

While the step from bare-metal to virtual machine was negligible from the developer and application side the migration to docker-based hosting is quite a step. Therefore I would like to depict the biggest changes of running a Grails application as a WAR deployment in Tomcat to a standalone application in Docker.

The original setting

We had our Grails application running on a server with a Tomcat servlet container for years. On the server we had also an ElasticSearch instance running and saved documents in a few well-defined directories on the local file system. Our database (Oracle in the past, PostgreSQL for over a year now) was running managed in a data center. We used container features like JNDI for parts of the configuration and datasources.

Deployment was essentially uploading a new WAR and context.xml file to the Tomcat servlet container using an Ansible playbook (where we restarted the servlet container to ensure not leaving any resource leaks…).

This setup worked quite well for years, only upgrades of the host operating system along with Tomcat and Java Virtual Machine (JVM) updates meant some work.

The target setup

Our customer has been in the process of converting most of the services running on virtual machines to dockerized deployments managed using Portainer. Most of the provisioning is automated and there is almost no snow flaking of the virtual machines hosting the docker runtimes. This eases the operation and administration of the services a lot and makes it much easier to setup new instances of a service and to monitor them.

So it was our task to migrate our setup on the the host VM to a docker-based deployment. In the future we basically push a docker image of our application to a docker registry of the customer and update the stack using Portainer.

The journey to a dockerized Grails app

The first thing to do were some changes to build.gradle to build a Jar-application instead of a WAR archive. We decided it was easier and more lightweight to run the Grails app standalone with the embedded tomcat than to use Tomcat in a docker container:

  • Remove the gradle war plugin
  • Change providedCompile dependencies to implementation
  • Add a task to prepare the Dockerfile and context to build our bootable application image:
task prepareDocker(type: Copy, dependsOn: assemble) {
    description = 'Copy files from src/main/docker and application jar to Docker temporal build directory'
    group = 'Docker'

    from 'src/main/docker'
    from project.bootJar

    into dockerBuildDir
}
  • Add an appropiate Dockerfile to our docker context in src/main/docker:
FROM eclipse-temurin:11-jdk-jammy
MAINTAINER Mihael Koep "mihael.koep@softwareschneiderei.de"

EXPOSE 8080

WORKDIR /app
COPY naomi-boot.jar .

ENV GRAILS_ENV development
# Additional env variables for configuration

CMD java -Dgrails.env=$GRAILS_ENV -jar /app/application-boot.jar
  • Remove JNDI usages and convert configuration from context.xml and JNDI to environment variables
  • Add a docker-compose.yml for the stack definition (here outlined for development with a database as part of the stack)
version: '3.7'
services:
  my-app:
    container_name: my-app-application
    image: ${IMAGE_TAG}
    environment:
      - GRAILS_ENV=development
      - DB_URL=jdbc:postgresql://my-app-db:5432/my-app
    volumes:
      - my-appdata:/app/data_home
    ports:
      - 8080:8080
    depends_on:
      - my-app-db
      - my-app-elastic
  my-app-db:
    container_name: my-app-database-stack
    image: postgres:13
    environment:
      - POSTGRES_USER=my-app
      - POSTGRES_PASSWORD=my-db-password
      - POSTGRES_DB=my-app
    ports:
      - 5432:5432
  my-app-elastic:
    container_name: my-app-elastic-stack
    image: docker.elastic.co/elasticsearch/elasticsearch:7.8.0
    environment:
      - node.name=es01
      - cluster.name=my-app-es
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - searchdata:/usr/share/elasticsearch/data
    ports:
      - 9200:9200
      - 9300:9300
volumes:
  searchdata:
  my-appdata:

Conclusion

The journey from classical deployments to a dockerized environment using a docker stack is not that long for a Grails application (or most other Java web applications). It makes it trivial setting up a new instance or migrating it to another host as most configuration is in version controlled files and there is next to none snowflaking. The customer has an easier time running the web application because the requirements are only a container-runtime and explicitly formulated in the stack definition.

In addition, the developers are free to choose the JVM and other details within the containers without having to cross-check with the administrators of the customer if they are supported by their organization.

Subtle Effects of Real Hardware

One key aspect of my work is writing software that interacts directly with hardware that consists of sensors and actors. A typical hardware setting is a machine that moves big steel barrels (or “drums”) around.

In order to being able to develop my code without physically sitting right besides the machine, which might include being in a loud, hazardous or plain dangerous environment, my software architecture consists of “hardware components” that can be the real thing or a simulation that acts as real as possible.

I’m writing this kind of software for over twenty years now. But regardless of how many simulations of real hardware I’ve written, there is always a catch or at least a surprise with every new hardware.

For this story, we need to imagine a machine that can lift and rotate steel barrels on command. The machine interface consists of several status bits and some command flags. Two status bits are of importance:

  • isMoving: Indicates if the machine is changing positions or standing still.
  • isInPosition: Because the machine’s movement is bounded by physical limit switches, this flag indicates if the machine has triggered a limit switch and stopped.

I wrote the simulation for this machine and developed the application code that performs a series of movements by waiting for the location to be at a limit switch and then issuing the next movement command. Right before the command is sent, the following condition is checked:

boolean commandCanBeSent = !isMoving && isInPosition;

My application worked perfectly with the simulated hardware. But when we switched to the real hardware, the series of movements worked oftentimes, but not always. After investigating a lot of possible error sources, we boiled it down to the condition above. The condition evaluated to true most of the times, but resulted in false every time the series of movements got stuck.

Expanding the logging capabilities of the code revealed that in the error cases, the signals showed isInPosition as true and at the same time, isMoving as true, too. This is a peculiar machine state: It is at the limit switch, but still moving around?

The explanation originates from the modularity of the machine. The isInPosition flag is controlled by the physical limit switches. If one of them has contact to the moving part, the flag evaluates to true. The isMoving flag is controlled by the engine activity. As long as there is substantial engine power consumption, the flag evaluates to true. The crucial aspect of this signal is that a negative engine power consumption (or engine power generation) is still considered a deviation from zero and results in isMoving as true. Which is kind of correct, because in both cases, there will be a translocation.

But why is the engine sometimes indicating movement after it was stopped by the limit switch? The answer lies in the mass of the steel barrel. If the machine was tested empty (without a barrel), everything worked fine. But by using a heavy barrel, the stopping wasn’t as instant as before. The deceleration of the engine took longer and converted the electrical engine in a generator. The mass of the barrel produced energy in the electrical engine when stopped, and it did so long enough to see the combination isInPosition=true and isMoving=true.

My simulation of an engine with limit switches had not included the mass of the moved object until now. In my simulation, the limit switch stopped the engine instantaneously, without any residual effects.

The bugfix was only a small change: When deciding if the next movement command can be sent, my application now waits for a small duration that isMoving switches to false when isInPosition is already true.

This kind of “dirty signals” is prevalent when dealing with real hardware. The dependence on the barrel mass for the effect to show up was a new one for me. Maybe previous PLC programmers at other machine had filtered them out in their control interfaces. Maybe other signals didn’t rely on engine power consumption or ignored negative consumption. Regardless, I will be more careful when simulating signals that indicate moving masses.

How to design a dynamic website with flex containers

Every frontend developer will know this situation: You design a frontend so that all components fit together perfectly. You invest your time, heart, and soul, working tirelessly and frequently testing to see the results. And then you deliver it to a customer, only to find that their display presents something different. The beautiful design looks bad, or some parts are no longer accessible.

To avoid these unpleasant surprises, you should make your layout dynamic. One possibility for achieving this is by using the flex container. In the following, I will demonstrate how to dynamically build an example page using flex containers.

Example page

My example page has a header spanning the entire content. The content is divided into a navigation area on the left and the content page on the right. The content page itself has a header, main content and a button bar. Now let’s cast that into HTML.

<!doctype html>
<html>
    <body>
        <div class="whole-page">
            <div class="header">
                Header<br>
                ...
            </div>
            <div class="content">
                <div class="navigation">Navigation</div>
                <div class="content-page">
                    <div class="content-header">
                        Title<br>
                        ...
                    </div>
                    <div class="main-content">
                        Content
                    </div>
                    <div class="button-bar">
                        Button Bar<br>
                        <button>Save</button>
                        <button>Cancel</button>
                    </div>
                </div>
            </div>
        </div>
    </body>
</html>

And now for the design: I want my page to take up the entire screen. The navigation area should take up 20% of the width, and the page should take up the rest. I want the button bar to be at the bottom of the screen and the main content to take up as much space as possible between the heading and the bar.

Let’s implement this with CSS:

html, body {
    font-family: sans-serif;
    font-size: 20pt;
    height: 100%;
    margin: 0;
}
.whole-page{
    display: flex;
    flex-direction: column;
    height: 100%;
    width: 100%;
}

.content {
    display: flex;
    flex-direction: row;

    flex-grow: 1;
    min-height: 0;
}
.navigation {
    min-width: 20%;
    overflow: auto;

}
.content-page {
    display: flex;
    flex-direction: column;
  
    flex-grow: 1;
    min-width: 0;
}

.main-content {
    flex-grow: 1;
    overflow: auto;

}

First, we configure html and body to take up the entire screen. To do this, we set the height to 100% and remove margins.

In the next step, we say that the layout of the whole page should build up flexibly among itself (column) and set the size to the maximum size of the screen.

For the parts with fixed sizes, like headers and the button bar, no further settings are needed.

The content also gets a dynamic layout, but side-by-side (row). Since row is the default, you would not have to write it explicitly, but I did it here for explanation. With “flex-grow: 1” we make the content take the maximum remaining height. With “min-height: 0” we prevent it from hogging more space than is still available and overflowing the screen.

Our navigation area gets the min-width of 20%. In most cases, width is sufficient, but, for example, with tables with large test cells, the table steals more space from the navigation than allowed. With min-width, this does not happen. As a bonus, the area gets a scrollbar with “overflow: auto” if the content is too big.

The content page arranges the elements in columns again and takes the maximum available width. The main content takes all the space left next to the header and button bar and gets a scrollbar if there is not enough space available.

When I fill my containers with background color and example content, I get the following views:

Conclusion

Flex containers are an easy way to create page layouts dynamically. Thus, applications can run independently of the screen to a certain extent. Of course, the whole thing also has its limits, and for very small devices such as smartphones, completely different operating concepts and layouts must be developed.

When custom React Hooks do not rerender Components on their own – make them.

Depending on who you ask, custom React Hooks are

  • a great way to stash away detailed inner workings of your application, making a) them reusable and b) your component cleaner and less complex
  • a horrible invention that hides away all the dreadful complexities one can think of, and by just making it invsible, not reducing any complexity at all

As usual, one has to calculate that balance depending on the use case, but in most cases I prefer my components to have a rather manageable lines-of-code-count (because this makes it easier to visually analyze their actual JSX structure, i.e. their semantics, what they are supposed to do.

However, sometimes an app grows over time and reaches a level of intricacy that seems to “outsmart” React itself, therefore breaking it. I do not know how to describe it otherwise:

I had a case of nested custom Hooks, in which one inner hook was executing a database query, giving a result and also a function to invalidate() and thus re-execute the query. It had been my understanding, that…

const useOurHook = () => {
    const query = useInnerHookWhichExecutesSomeQuery();

    console.log("query returned", query);

    return {
        result: query.result,
        invalidate: query.invalidate
    };
};

const Component1 = () => {
    const {result} = useOurHook();

    return <div>{JSON.stringify(result)}</div>;
};

const Component2 = () => {
    const {invalidate} = useOurHook();

    return (
        <button onClick={() => invalidate()}>
            invalidate
        </button>
    );
};

… pressing the button in Component2 will update the return value of the inner query hook, thus update the return value of the outer hook and finally update Component1.

However, that just did not happen. Even stranger, I could see my updated query result in the console.log statement within useOurHook(), but Component1 was staying as it was.

It took me several attempts in the inner workings of my both hooks, I tried to wrap the return values inside React.useMemo(), or to specifically put them inside a React.useState() that was explicitly set by a React.useEffect() – which should rather have the same outcome, but then again I do not know the actual React source code by heart – and there was just nothing that helped.

If you have any explanation for me that excels “yeah, React was broken” in its level of insight, please tell me. (maybe I have to read some docs, but it wasn’t obvious…)

So this is what helped. Rather than passing the invalidate function to my components, I decided to use the update functionality of the Redux useSelector() hook in such a way:

const useOurHook = () => {
    const lastRequestAt = useSelector(state => state.somewhere.lastRequestAt); // get timestamp from Redux store
    const query = useInnerHookWhichExecutesSomeQuery();

    React.useEffect(() => {
        if (lastRequestAt > 0) {
            query.invalidate();
        }
    }, [lastRequestAt, query.invalidate]);

    console.log("query returned", query);

    return {
        result: query.result,
    };
};

const Component1 = () => {
    const {result} = useOurHook();

    return <div>{JSON.stringify(result)}</div>;
};

const Component2 = () => {
    const dispatch = useDispatch();

    return (
        <button onClick={() => dispatch(updateRequest())}>
            invalidate
        </button>
    );
};

//////// and somewhere in a Redux slice:

...
reducers: {
    updateRequest: (state) => {
        state.lastRequestAt = Date.now();
    }
}
...

and this brought me the desired results. Now, I saw the update of query.result not only in the console.log, but also in Component1.

Now I agree that it appears quite wasteful to employ something as overbearing as Redux just to work around my weird situation, but I had Redux in my project anyway. I guess you couuld also use another state management or custom useContext() solution to work around this, just to give you an idea.

But I found it quite remarkable. It went against what I knew about React that you can have a hook update (visible in the console.log) without actually having React update a component that uses its return value.

Please, please – if any of you has any hint for insight, or is just curious about my concrete use case – I’ll be happy to discuss.

Optional polymorphism by delegation

A code design pattern I’ve used a lot in recent times is the “optional-based polymorphism” that looks like a delegation to another type that might not be available. It might be an implementation of the FCoI-principle (Favour Composition over Inheritance).

Let’s look at an example: An application has several different engines that move stuff around. Some engines are based on limit switches. They move until they are stopped by a physical switch. The application can make these engines move from one predefined position to the next, but not anywhere in between. Another type of engines is based on a relative position. You give the engine the new target position and it positions itself there, without any limit switches or predefined positions.

Traditional approach

A typical implementation using inheritance would be a common supertype “Engine” that provides the functionality both engine types exhibit. From there, we would define two subtypes that extend the functionality in their desired way. One subtype would be the “LimitSwitchEngine”, the other one the “PositionableEngine”.

Our client code that wants to use a particular engine has two possibilities: It only requires the common functionality of an engine and can work with the supertype. Or it needs to perform a downcast after checking the actual type of the engine.

Cast methods

The optional-based polymorphism guides the client code towards the specific subtype by providing all possibilities in the common interface:

public interface Engine {

	/* Common functionality */
	
	boolean isMoving();
	
	void emergencyStop();
	
	/* optional-based polymorphism */
	
	Optional<LimitSwitchEngine> boundToLimitSwitches();
	
	Optional<PositionableEngine> freelyPositionable();
}

The client code uses the Engine’s interface only as a stepping stone for the specific engine that is required for your use case. If the engine object cannot provide that functionality, you’ll get an empty Optional. Else you retrieve your reference to the specific type and work with it.

Disadvantages

One disadvantage of this approach is the fact that the supertype is aware and even dependent on the different subtypes. You limit the scope of your type hierarchy to the types offered in the “entrance interface”. You can still use the traditional downcast way as described in the introduction for all other types, but that separates them into “featured” and “non-featured” subtypes. So this approach will violate the Open/Closed principle by not being open to extension without modification.

Another disadvantage is that your typical navigation in the IDE doesn’t work as well anymore. If you want to know about all the different types of engines in the system, you can’t just look at the type hierarchy of the Engine type anymore. This is because of the first advantage this pattern brings:

Advantages

Not only gets this style rid of the downcast, it frees your type system up in two different dimensions: The LimitSwitchEngine and PositionableEngine don’t need to be subtypes of Engine. They can be totally independent types with no real connection to the Engine. And they can be different instances. Of course, there is no need to use any of these freedoms. You can still inherit PositionableEngine from Engine and implement both types in the same object. But it isn’t mandatory anymore.

Another advantage is discoverability. Your typical type hierarchy lookup in the IDE is replaced with code completion lookup. If you get the names right, this pattern feels like writing code on rails, because your code completion proposals will lead you to the correct place.

Your opinion

What is your opinion on this pattern? What would you expect from a code design that provides those “casting” methods? Tell us in the comments!

Porting an application from Oracle to PostgreSQL

Currently, we are in the process of porting a larger application from an Oracle database to a PostgreSQL database as an additional database system. The aim is for the application to be functional not only with one database system but with both: with Oracle for one customer and with PostgreSQL for another customer.

The first step in this process was to conduct a high-level analysis of what effort this would entail and where the application would need to be adjusted. The application consists of multiple services. A frontend service in Ruby on Rails and a variety of backend services in Java. Rails uses an O/R mapper called Active Record, which is relatively database-agnostic. No major adaptation effort was expected here. Ultimately, only the Active Record adapter needs to be configured per instance.

The Java services do not use an O/R mapper. They make direct SQL queries to the target database system using JDBC. This is where the greatest potential for issues existed. Fortunately, all database queries here are encapsulated in repository classes, making the SQL queries easily accessible and adaptable for different database management systems. However, this has not proven necessary so far. The SQL queries are written in standard SQL, without using database-specific features. The code is limited to basic features such as tables, views, sequences, SELECTs, INSERTs, UPDATEs, and JOINS, all of which are supported by both databases. Particularly, no stored procedures are used, which wouldn’t necessarily be portable.

For implementation, the initial goal was to make the existing Oracle database schema of the application reproducible on a PostgreSQL database, in a way that a new instance can be set up fresh in a Docker container at any time.

The best way to achieve this is with a database migration tool. The options considered were: Active Record Migrations, Liquibase, and Flyway. We ultimately chose Liquibase, which we have had positive experiences with. It can be used standalone and allows for formulating migrations in a database-independent manner. Additionally, it offers the option to export an existing schema as a migrations changelog file using the generate-changelog command.

We only had to slightly adjust the generated changelog to make it work for both database systems. In a Liquibase changelog, you can define variables referred to as properties, which can take on different values for various database systems. One use case for this is having different names for column data types:

<property name="varchar" value="varchar" dbms="postgresql"/>
<property name="varchar" value="VARCHAR2" dbms="oracle"/>

These properties can then be used, for example, in the following way:

<column name="DESCRIPTION" type="${varchar}(4000)">

Here’s a helpful table that maps Oracle datatypes to their equivalent PostgreSQL data types. After some editing of the changelog, mainly through search and replace, we had a migration for the database schema that works for both systems.

Exploring Tango Admin Devices

We are using the open-source control system framework TANGO in several projects where coordinated control of multiple hardware systems is needed.

What is TANGO good for?

TANGO provides uniform, distributed access and control of heterogeneous hardware devices. It is object-oriented in nature and usually one hardware device is represented by one (or more) software devices.

The device drivers can be written in either C++, Java or Python and client libraries exist for all of these languages. Using a middleware adapter like TangoGQL any language can access the devices.

All that makes TANGO useful for building SCADA systems ranging from a handful controlled devices to several hundreds you want to supervise and control.

Lesser know features of TANGO

All of the above is well known in the TANGO and SCADA community and quite straightforward. What some people may not know is that TANGO automatically provides an Admin-device for each TANGO server (an executable running one or more TANGO devices).

These admin devices have an address of the form dserver/<server_name>/<instance_name> and provide numerous commands for controlling and querying the TANGO device server instance:

You can for example introspect the device server to find available device classes, device instances and needed device properties (think of them as configuration settings).

In addition to introspection you can also control some aspects of the TANGO server like polling and logging. The Admin-device also allows restarting individual devices or even the whole server instance. This can be very useful to apply configuration changes remotely without shell access or something similar to the remote machine.

Wrapping it up

Admin-devices automatically exist and run for each TANGO device server. Using them allows clients to explore what devices are available, what they offer and how they can be configured. They also allow some aspects to be changed remotely at runtime.

We use these features to provide a rich web-base UI for managing the control system in a convenient way instead of relying on the basic tools (like Jive and Astor) that TANGO offers out-of-the-box.

The Fragmented Sources of Truth for a Software Project

For each software application, there only exists one single, authoritative source of truth: The source code. If something isn’t in the code, it doesn’t exist. This source of truth is so important that we invented version control (or source control) that allows us to:

  • travel backwards in time
  • create alternative realities
  • progress multiple realities concurrently

That’s pretty awesome and something that not many professions can rely on. It is a “hidden superpower” of software development.

But when you look at a software project and not just the application, there is a lot more “truths” or information available than what fits into the source code. Let’s have a look at a few of them:

The ticket system

The ticket system or issue tracker or bug tracker or whatever you call it is a glorified to-do list that tells people what is lacking in the source code.

One view on the ticket system could be that of a health record system. Each ticket represents an ailment that the software application has. If it’s a bug, it is clearly in an undesirable state that needs to be “healed”. If it’s a new feature, the medical metaphor doesn’t fit perfectly, but we can view our development work as some kind of plastic surgery that makes the software more appealing to the customer.

Either way, the ticket system holds episodical wisdom. It explains the state of our application in hundreds or thousands of more or less independent short stories that are worked on in isolation. To gain a complete vision about the software project from the ticket system alone is possible, but cumbersome.

If you think about it, there is a clear connection between a short story (ticket) and one alternative reality in which the application is told about the story.

The wiki

For each project, there is a lot of information that is fluid, but not episodical. The current state of truth is valid until it gets replaced by a newer truth. Attempting to capture this information in tickets would result in an awkward lack of oversight.

Luckily, there is a tool that was invented specifically for this type of information: The wiki or the editable website graph. Your project can claim an area in this graph and fragment the information according to the mental model of the project team. Every time some outdated information is found, it can be updated in place. Every time some information is not found, it can be added in the place where it was anticipated.

The file storage

Every software project that I know has a lot of accompanying documents that are important for the project, but maybe not so much for the actual day-to-day development. Depending on who you ask, these documents may very well constitute “the project” and everything “below” them are just necessary technicalities.

The nature of a document is that it exists forever once created. There are lots of attempts to bring version control to the document world, but a typical question in this area is: “Is my document still valid?”

Because documents are represented by files (in the digital and the analog world), a file storage is the least we need to manage them. If your file storage entices you to name your documents “_latest”, “_newer”, “_version2” or something like that, you probably want to step up your document versioning game. Document management systems (DMS) might be what you are looking for. For small teams, a central instance of a Nextcloud might already be sufficient.

The derived documentation

If you happen to develop a software product, you need to provide a user manual and additional technical documentation. These documents need to be in eventual synchronisation with your first and central source of truth: The source code. And because your source code adapts, these documents need to adapt constantly, too.

This is the area where our company has the most “room for improvement”. I’m not diving into details here because I know our approaches are not sustainable.

Single source of truth?

The problem with this fragmented approach to capture the whole of a project is that you need to study all the different places and combine the information in your head. And not only you, every team member has to do this.

You can try to combine different sources into one:

If you squint really hard, you might think that a wiki can replace a ticket system, because each ticket can be represented by a graph node and the linking might resemble a grouping mechanism. My uninformed guess would be that this replaces software specialization with the need for human discipline. But maybe it can work and I just don’t know about the proper tooling yet?

One rather obvious integration might be to put the wiki alongside the code. I haven’t seen a good solution for merge conflicts yet, but maybe it is possible somehow.

Putting the file storage into your source repository makes it bigger and unwieldy, but it would be a natural step towards single sourcing – until you want to give your code repository away without revealing your company’s contracts. Suddenly, separate storage areas become important.

The one thing I struggle to integrate into the source repository is the derived documentation. I can think about storing the documents alongside the code and even requiring to update them before a merge request of a feature branch is accepted, but I shudder to think about the inevitable merge requests that need to be resolved.

Maybe there is a suitable solution out there that I’m missing? Leave a hint in the comments!