Pie-Decoration with Python

A while ago, I found out that it’s actually quite easy to write your own decorators in Python – the thing with the “pie” syntax, available since Python 2.4 but still mostly encountered only inside frameworks like Django, Flask or PyTango. As to the “what” and especially the “why??”, you might benefit from some thoughts I recently verified.

Motivation: Several of our software projects support our customers’ experimental research with Supervisory Control and Data Acquisition (SCADA) applications, so we have built a considerable knowledge base around development with the open-source Tango Controls system. For me as a developer, its Python implementation PyTango makes life significantly easier, because I can sketch out Tango applications – Tango device servers, clients for data analysis, plotting live data etc. – with little to no overhead. Less than an hour of development time usually gets you started.

PyTango really does make it that easy to code the basic functionality of a networked device with all the Tango features – well, you have to set up Tango once, but then the programming becomes easy. E.g. a current measurement device that is accessible from anywhere in your network could basically need no more than:

from tango.server import Device, attribute


class PowerSupply(Device):
 
    def init_device(self):
        Device.init_device(self)
        self._current = 0
 
    @attribute(label="Current", unit="mA", dtype=float)
    def current(self):
        return self._current
 
    @current.write
    def current(self, current):
        self._current = current

Compare that to the standard C++ Tango implementation, where all of that would require loads of boilerplate code.
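The client side is similarly terse. Here is a minimal sketch of reading and writing that attribute over the network – the device name is just a placeholder and assumes the server above has been registered in your Tango database:

import tango

# hypothetical device name registered in the Tango database
power_supply = tango.DeviceProxy("lab/powersupply/1")

power_supply.current = 5.0       # calls the write method over the network
print(power_supply.current)      # reads the attribute back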

Now, this is not unusual for Python: hiding the ugly implementation away in libraries or frameworks so you can focus on the content. However, you need to trust it, and depending on what kind of Python code you usually enjoy reading, it can make you pause. What is that @ doing there? Isn’t this Java code??

But to the contrary: in Java, @annotations are a way of essentially influencing how the compiler itself behaves. In Python, the pie is pure syntactic sugar. Both address a similar need – your source code telling you “the next thing is treated in a specific way” – but whereas you would have to dig quite deep in Java to produce useful custom annotations (if you know otherwise, tell me), decorating pies in Python can easily help you abstract layers of code away to places where they read nicer and cleaner.

It turns out: a Python decorator is any function that maps a function to a function, i.e. it takes a function as an argument and returns the same or a different function. Easy enough, since in Python functions are first-class objects: they can be passed around and extended accordingly.

But… Every programming pattern comes with the question: “What do I gain for learning such a new habit?”

Can and should you write your own decorators?

Under the hood, something like this happens. You define a decorator e.g. like

def my_decorator(func):
    print("the decorator is initialized")

    def whatever():
        print("the function is decorated")
        return func()

    return whatever

giving you instant access to your own decorator magic:

@my_decorator
def my_func():
    print("the decorated function is called")

A function call of my_func() now results in the output

the function is decorated
the decorated function is called

while the first line, “the decorator is initialized”, is only executed once – at the point where the @my_decorator decoration itself is applied.

This pie syntax is pure syntactic sugar. It does nothing more than if you had instead defined:

def my_anonymous_func():
    print("the decorated function is called")
my_func = my_decorator(my_anonymous_func)

But it appears somewhat cleaner, avoiding the two occurrences of “my_anonymous_func” that are not useful anywhere else in your code. You stash more irrelevant implementation details away from the actual meaning of “what is this method/function for?”. Now you can pack my_decorator into your own set of utility functions, as you would with any reusable code block. To make the example more sophisticated, you can also process function arguments (*args, **kwargs) this way, or extend previously decorated functions with further functionality… and while the decorator itself might be a bit more complex to read, you can test it separately, and from then on the actual __main__ code is tidy and easy to extend.

def my_decorator(func):
    # do setup stuff here

    def whatever(*args, **kwargs):
        print("The keyword argument par is:", kwargs.get('par', None))
        return func(*args, **kwargs)

    def whatever2(func2):
        def some_inner_function():
            print("Call the second function and do something with its result")

            # note that the _actual_ function call of the decorated function would happen in the very core of the decorating function
            return func2()
        return some_inner_function

    result = whatever
    result.other_thing = whatever2
    return result


if __name__ == '__main__':
    print("START")

    @my_decorator
    def my_func(*args, **kwargs):
        print("original my_func", kwargs.get('par', None))

    @my_func.other_thing
    def my_func2():
        pass

    my_func(par='Some keyword argument')
    my_func2()

So really, we are equipped to do anything. Any scenario in which a function call should be accompanied by one or more extra actions can be nicely packaged in a decorator. This gives you a number of good use cases, e.g. (a small sketch of the first two follows the list):

  • Logging (if only for debug reasons)
  • Additional metrics (e.g. performance evaluation)
  • Type checks (making even simple scripts more robust)
  • Error Handling
  • Registering of any service or entity e.g. in one centralized list structure
  • Updates on a user interface
  • Caching, e.g. to ease some expensive operation or asynchronous action
  • or any such meta-pattern that you find yourself repeating in otherwise unrelated scenarios.
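For illustration, here is a minimal sketch of the first two ideas from the list – a decorator that logs each call and measures its duration. The name log_and_time and the use of the standard logging and time modules are just my choices for this sketch:

import functools
import logging
import time

logging.basicConfig(level=logging.DEBUG)

def log_and_time(func):
    @functools.wraps(func)  # keep the name and docstring of the decorated function
    def wrapper(*args, **kwargs):
        logging.debug("calling %s", func.__name__)
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration_ms = (time.perf_counter() - start) * 1000
        logging.debug("%s took %.1f ms", func.__name__, duration_ms)
        return result
    return wrapper

@log_and_time
def expensive_calculation():
    time.sleep(0.1)

expensive_calculation()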

And more than that, a well-chosen name for your decorator gives your code a nice self-explaining semantic structure. You describe more of the “what” and less of the “how” of a block of code.

I like that pattern, but I always thought the @ looks somewhat cryptic. If you have any objections, want to warn me about unforeseen implications or generally think this was always super-obvious, let me know 🙂

Who watches the FileSystemWatcher?

One of the ways we sometimes implement communication with legacy systems is via the file-system. The legacy system will write files about some events into a predefined directory, and the other system watches this directory for changes. In C#/.NET, the handy FileSystemWatcher class is a great tool for that.

We had a working solution using that in production for the last two years. Then it suddenly stopped working. There was no apparent change in the related code, so I suspect our upgrade from .NET Core 2.1 to .NET 5.0 triggered the change in behavior. The code looked something like this:

public void BeginWatching()
{
    var filter = NotifyFilters.LastAccess | NotifyFilters.LastWrite
        | NotifyFilters.FileName | NotifyFilters.DirectoryName;

    var watcher = new FileSystemWatcher
    {
        Path = directory,
        NotifyFilter = filter,
        Filter = "*.txt",
        EnableRaisingEvents = true,
    };
    /* Hook up some events here... */
}

And after some debugging, it turned out that none of the attached event handlers were firing anymore. The solution was to not let go of the watcher instance, but to keep it around in the enclosing class:

this.watcher = new FileSystemWatcher

This makes sense, of course. Before, the watcher was only a local variable and could be collected by the garbage collector at any moment. That, in turn, stopped the watching process, so no more events were emitted.

Interestingly, a few days later, my student asked about his FileSystemWatcher also no longer working. I immediately suspected the same problem, but when we looked at the code, he had already moved the watcher into a property of the enclosing class. Turns out, for him the problem was just one level up: the enclosing class was itself only a local variable, and the contained watcher stopped after it went out of scope.

Now the only question is: why did we never observe this before? Either something in the GC changed, or something in the implementation of the watcher changed. Can anyone enlighten the situation?

Rounding numbers is not that easy

For many computer programs it is necessary to round numbers. For example, an invoice amount should only have two decimal places, and a tool for time management often does not have to be accurate to the millisecond. Fortunately you don't have to write a method for that yourself. In Java or JavaScript you can use Math.round, Python has a built-in function for rounding, and the Kotlin standard library also contains a method for this purpose. However, some of these functions have a few surprises in store and violate the principle of least astonishment. That principle was first formulated by Geoffrey James in his book The Tao of Programming. It states that a program should always behave in the way the user expects it to, but it can also be applied to source code: a method or a class should have a name that describes its behavior properly.

So, what would you expect a method with the name round to do? The most common way to round numbers is the so-called round half up method: half-way values are always rounded up. For example 4.5 gets rounded to 5 and 3.5 gets rounded to 4. Negative numbers get rounded in the same way, for example -4.5 gets rounded to -4. In fact the Math.round functions in Java and JavaScript use this kind of rounding and thus behave the way most people would expect.

But in other programming languages this can be different. I actually used Python's built-in round function for some time without realizing that it does not always round half-way values up. For example round(3.5) results in 4 as you would expect, but round(4.5) also returns 4. That's because Python uses the so-called round half to even method (also known as banker's rounding): half-way values are always rounded to the nearest even number. The advantage of this kind of rounding is that the error gets minimized when you add multiple rounded values, which can be beneficial for statistical calculations. If you still want to round half-way values up in Python, you can implement your own rounding function:

def round_half_up(number, decimals: int):
    # shift, add 0.5, truncate and shift back - note that this
    # simple version is meant for non-negative numbers
    rounded_value = int(number * (10**decimals) + 0.5) / (10**decimals)

    if rounded_value % 1 == 0:
        rounded_value = int(rounded_value)

    return rounded_value

round_half_up(4.5, decimals=0)    # results in 5

A different way in Python to round half-way values up is to use the decimal module, which contains different rounding modes:

from decimal import Decimal, ROUND_HALF_UP

Decimal("4.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)    # results in 5

It should be noted that the ROUND_HALF_UP mode in this module does not actually use the round half up method explained above, but the also very common round half away from zero method. For positive numbers the results are the same, but -4.5 does not get rounded to -4, it gets rounded to -5.
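A quick check with the decimal module from above illustrates the difference; positive values behave the same, negative half-way values move away from zero:

from decimal import Decimal, ROUND_HALF_UP

Decimal("4.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)     # results in 5
Decimal("-4.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)    # results in -5, not -4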

Python is by the way not the only programming language that uses the round half to even method. Kotlin and R, for example, also round half-way values to the nearest even number. For Kotlin, however, there are several easy ways to round half-way values up: you could use the methods roundToInt or roundToLong from the standard library, or Java's Math.round, instead of the round method.

It should also be noted that the methods explained so far are not the only ways to round. Instead of rounding half-way values up you could also use the round half down method, so rounding 3.5 would result in 3. And instead of rounding half to even you could use the round half to odd method, where 4.5 would get rounded to 5, as would 5.5. There are more methods still, and every one of them has its use case, so you should always choose carefully.
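Some of these alternatives are available as ready-made rounding modes in the decimal module shown above (note that its ROUND_HALF_DOWN resolves ties towards zero, which matches "round half down" for positive numbers):

from decimal import Decimal, ROUND_HALF_DOWN, ROUND_HALF_EVEN

Decimal("3.5").quantize(Decimal("1"), rounding=ROUND_HALF_DOWN)    # results in 3
Decimal("4.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)    # results in 4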

To sum it up, rounding is not as easy as it seems. Although most programming languages have a rounding method in their standard library, you should always take a closer look and check whether the rounding function you want to use behaves the way you expect and want it to.

Sometimes you will be surprised.

Developing remotely for Beckhoff ADS on Linux

Today computers are used to control plenty of different hardware systems, both in laboratories and in the “real” world. Think of simple examples like automatic roller shutters that may be vital for keeping offices cool in summer while letting in the maximum of light when the sun is occluded by clouds.

Most of the time things are way more complicated of course and soon real automation systems come into play providing intricate control and safety-related fail-safe mechanisms. Beckhoff ADS provides a means to communicate with such automation systems, often implemented as programmable logic controllers (PLC).

While many of these systems are Windows-based and provide rich programming environments on Windows, they often offer interoperability with other programming languages and operating systems. In the case of ADS there is a cross-platform open-source C++ library provided by Beckhoff and even a Python library (pyads) based on the C library for easy access to ADS devices.

ADS examples

This is great news because it allows you to choose your platform more freely, and especially in science many organizations prefer Linux machines in their infrastructure. Here is an example using pyads to read a value from an ADS device:

import pyads

# The ip of the PLC
remote_ip = '192.168.0.55'
# This is the AMS network id. Usually consists of the IP address with .1.1 appended
remote_ads = '192.168.0.55.1.1'
# This is the ADS port of the remote PLC (SPS) runtime.
# It has nothing to do with TCP/IP ports!
ads_port = 851
# Set our local AMS network id to the client endpoint
# in the TwinCAT routing configuration
pyads.set_local_address('192.168.11.66.1.1')

with pyads.Connection(remote_ads, ads_port, remote_ip) as plc:
     print(plc.read_by_name('GlobalStructure.live_bit', pyads.PLCTYPE_BOOL))

Remote Access

When developing for our customers using ADS it is often not feasible to have the PLCs and a realistic set of controlled hardware in our own offices. Fortunately it is possible to communicate with the ADS interface of the customer's on-site PLC over VPN and SSH tunneling.

There are some caveats on the way to working remotely against an ADS device, namely the port to be tunneled, the route on the PLC and the correct IPs and NetIds.

SSH-Tunneling the ADS communication

Setting up SSH tunneling is probably the easiest part, using PuTTY on Windows or plain OpenSSH local forwarding via config files. The important thing is that you need to tunnel the TCP port 48898 and not the ADS port 851!

Configuring the PLC route

The ADS endpoint needs an AMS route set up for the machine you SSH into. Otherwise the machine you use to tunnel your requests will not be authorized to communicate with the ADS device. This is well documented and a standard workflow for the automation people, but crucial for the remote access to work. We need the AMS NetId from this step to finally set up the connection.

Connecting remotely using the SSH-Tunnel

After everything is prepared we need to adjust the connection parameters for our ADS client. Taking the example from above, this usually means changing remote_ip and the local AMS NetId:

# The ip the SSH-tunnel is bound to, usually localhost
remote_ip = '127.0.0.1'
# This is the AMS network id of the endpoint. Leave unchanged!
remote_ads = '192.168.0.55.1.1'
# Set our local AMS network id to the client endpoint in the TwinCAT routing config
# This represents our ssh host, not the local machine!
pyads.set_local_address('192.168.0.100.1.1')
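Put together with the first example, the remote connection through the tunnel might then look like this – all addresses are the placeholders used above:

import pyads

remote_ip = '127.0.0.1'            # the SSH tunnel endpoint on this machine
remote_ads = '192.168.0.55.1.1'    # AMS network id of the PLC, unchanged
ads_port = 851                     # ADS port of the PLC runtime

# our local AMS network id represents the SSH host in the TwinCAT routing config
pyads.set_local_address('192.168.0.100.1.1')

with pyads.Connection(remote_ads, ads_port, remote_ip) as plc:
    print(plc.read_by_name('GlobalStructure.live_bit', pyads.PLCTYPE_BOOL))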

Conclusion

Beckhoff ADS provides a state-of-the-art means of communicating with PLCs over the network. With a bit of configuration this can easily be done remotely as well as on-site, in a platform-agnostic way.

The IT architect, Part II: Assess your situation

If you want to work on the scale of an IT landscape, you need to have a plan in the form of a map. In the first part of this series, we talked about creating such a map. This blog entry will give you the basic tools to make sense of all the things on it and how to convey meaning to other people while using the map.

The third part will talk about actionable steps that are a result of our interpretation of the map.

Making sense of the map

You’ve drawn the map of all your IT assets and given all the boxes names that you find useful. You’ve asked around to find relationships between your assets, represented by arrows between the boxes. You’ve moved the boxes around a bit to reduce arrow intersections. The map seems to be as “clean” as it can get at the moment.

Now is the time to apply meaning to the structures you see.

Interpreting loners

The first thing you want to look for are boxes without any relationships. These entities don't interact with other things on your map and are not required by anything either. Let's think of them as independent value sources. If such an asset brings your organization a describable and current advantage, you've found the ideal asset.

An example could be the blue box “L” in our example map. It isn’t coupled to any other asset. Let’s say it is a “customer relationship management” (CRM) system. Remember, boxes are not labeled by their actual implementation (in this case, maybe a vTiger or SugarCRM), but by the value they provide for the organization. If your organization needs a CRM (or benefits from its presence), then you have a “loner”, which is a good thing.

If the CRM stops working, the humans in the organization will be unhappy about it, but the outage itself will be limited to the CRM and not spread to other parts of your IT landscape (given that your map reflects the reality). If the outage lasts longer, your employees will adapt their work processes to circumvent the pothole in your IT. There will be a lot of post-it notes, at least for some time.

If the CRM is updated to a new version, you need to train your employees, but it won’t require other IT entities in your organization to match that update. The CRM can run on ancient hardware and software, as long as the human requirements are met. A loner on your map is a good thing.

Interpreting relicts

If you find a lonely box without a current use case, you’ve found a relict. Be glad that you’ve found it, because relicts tend to remain hidden and not show up on architect maps. If you can make sure that the relict serves no purpose for the organization anymore, you can eliminate it. Removing an asset from the map (and your real IT infrastructure) is a good thing, because you reduce complexity, costs and risks. There is no IT asset without associated costs and risks.

If, for example, the yellow box “P” represents a computer that provides a service nobody uses anymore, the computer itself is still present in the network and can be used as a stepping stone for malicious intent. Let's say the computer is a Raspberry Pi that isn't included in the first tier of workhorse computers; its operating system might be outdated and susceptible to attacks. It doesn't provide value for the organization anymore, but it increases the organization's risk.

Revealing this kind of “dead weight” in your IT landscape is a real advantage, because you can cut it out rather easily.

Interpreting rings

A typical structure on your map could be a circular dependency. In its smallest form, it is just two boxes that both depend on each other. The more elaborate ring consists of several boxes that are connected without a clear start and end. This is the worst thing to find.

A ring in your entities means that you have to consider all elements in the ring as one big entity. You cannot modify them independently, neither on the technological level nor in the temporal dimension. A ring is basically a Mexican standoff situation for all included entities. You can also call it a deadlock. Whatever you call it, it is bad news. You probably want to break the ring as soon as possible.

Breaking a ring would warrant its own blog post altogether. A basic starting point might be the Acyclic dependencies principle of software design. You probably need to split at least one of your entities into smaller parts or introduce a new entity. The least favorable move would be to merge all entities into one bigger entity, creating a monolith. You will regret this move when the inevitable modernization pressure rises.

Interpreting chains

If your entities form “deep” dependency lines where A depends on B, B depends on C, C depends on D and so forth, you have discovered a chain. This structure is less troublesome than a ring, but worth a worry nonetheless. In terms of operational risk, the chain creates a meta-system whose failure rate is roughly the sum of the failure rates of the chain elements: four elements that each fail one percent of the time give you a chain that is down about four percent of the time. To make a long story short, you'll never get a reliable infrastructure with long chains.

The longer your chains are, the more ripple effects an outage will have on your IT landscape. Remember that a chain always breaks at its weakest link, but this link will bring down the whole line.

You can reduce the length of a chain of entities in your IT landscape by inserting buffer elements like read-only copies of central data sources. But more important is to think (and talk) about why the dependencies are there in the first place. Maybe your data storage strategy is too decentralized and you would gain some favorable dependency structures by pooling data together (essentially creating a data monolith if you overdo it).

Introducing zones

Recognizing the basic shapes on your map is important, but you also have to look at the forest, not only the trees. The basic layout of your boxes already tells you a lot about the zones of your IT landscape.

A zone on your map is a region of boxes that you can encircle and give a superordinate name. The basic rule of a zone is that all entities in it should share a common property. The less technology-based this property is, the better your zoning. A zone for “java web services” or “metal computers” may be useful for a while, but won't stand the test of time. Sooner or later, some Java services get replaced by other programming languages and some real machines get virtualized. Do you move them to other zones on your map? What really changed for the users of your IT landscape?

If you concentrate on your users, you might be able to come up with properties that really affect them. Look at this example that takes our initial example and separates it into three zones:

And now, we find a user-oriented name for each zone. In our example, we’ve grouped the entities by user role and are now able to label our zones:

This grouping has the added advantage that the target audience for each modification to the map can be identified nearly immediately. It makes it easier to anticipate the effects of outages or problems and to identify non-cohesive usage of the same tool/entity.

In our example, each box in the “Both” zone is essential to the functioning of the organization. But just because a specific service is used by both other groups doesn’t mean they have overlapping requirements. Maybe it is better for everybody involved to actually divide an entity into two separate boxes in the respective zones, even if both boxes are implemented with the exact same tool/technology at the moment.

Identifying the zones takes your map to the next level. You end up with fewer, but bigger boxes and their dependencies. It’s the same IT landscape, but with less detail. Now you can start your discovery process again.

Conclusion

Your IT landscape map can be interpreted by looking for common structures (like loners, rings and chains) and by defining zones. This allows us to gather a list of problem points that we want to improve. It also allows us to evaluate the expectable ramifications of changes to entities in our IT landscape. And there will be changes. The one (and probably only) constant in IT is that all things change.

In the next part of this series, we look at ways to transform the map from the current state towards a better one. Stay tuned!

Applied User Research on my own pile of synthesizer machines

Last year, I moved. Moving into a new apartment is very much akin to a major rewrite of a complex piece of software. A piece of software with a limited number of users, maybe (well, me, my girlfriend, that's it), but with an immensely sophisticated structure of requirements… despite the decade-long experience of “living somewhere” one usually has at this point.

For example, I have a somewhat peculiar habit of hoarding hardware synthesizers – from Soviet-era monstrosities like the Поливокс to modern machines. They have a few things in common, as they

  1. take quite some space (due to their extensive wiring) and some are heavy
  2. idle around for most of the time (I mean, I have a job)
  3. are versatile enough not to have a clear-cut function in any workflow.

These are somewhat essential traits. All in all, I had some unassigned space in my new home and my machines instantly colonized it. Like the original “work expands to fill the time available”, there seems to be another Parkinson's Law correlating hardware synthesizers and the space you allow them to have. So basically, they now occupied a room of their own.

Which could be the end of the story, and everyone would live happily ever after – if you have, like, a googol amount of rooms. And considering point 2 above, this solution feels quite like a waste.

Now – also last year, I began to deep-dive into the techniques of User Experience. And basically, my problem is pretty much comparable to that of a customer with a few quite diverging requirements: someone who wants his problem solved, but hasn't yet figured out his actual needs to begin with.

Ergo, I should be able to use my insights from the field of UX, or Interaction Design, directly to my advantage, by applying techniques of User Research to my own behaviour.

And the first question should always be: by which approach do we gain the most understanding with the least effort? But the zeroth question is actually: how “wicked” is my problem? Do I have an absolutely ill-defined, ever-changing, non-testable set of challenges? Or is my problem-solving rather a technical feat – implementing the best patterns, clearest details, best documentation?

Indeed, point 3 above makes my problem rather ill-defined. On some wickedness scale, with 1 being a quick YouTube search for a how-to and 10 being a problem that would require a week-long mountain retreat with daily meditation sessions… I would give it a 6-7.

This feels like it should be in the range of difficult, but solvable with a kind of generalized abstract thinking. I chose the technique of the Five Whys: the goal is to follow the initial problem statement with a series of consecutively deeper questions. But generally, I understand this as

  • “Five” is a general number one can aim for. There can be more or fewer questions, and there can be branches in the questioning.
  • “Why” is a placeholder for any qualifier that moves to a more abstract level. It can also be a “What do you want…”, “How about…” or “What's wrong with…” serving that purpose.

So. I consider myself sitting on opposite sides of a table and asking:

  1. What do you want to accomplish, and why?
    • I want to have a truckload of synthesizers, fully functional, and also not to waste my space.
  2. Why do you need your space?
    • Not only do I have other stuff, too – less clutter is generally a way to improve life quality.
  3. Why don‘t you just get rid of the machines, i.e. selling, basement, …?
    • Well. I do want to have access to spontaneous synthesizer jams. The pandemic makes it hard to get that creative input anywhere else.
  4. Why does this need to be spontaneous?
    • Because musical flow, like any creative flow, comes spontaneously. I don't want a great idea to slip away because of lengthy set-up efforts.
  5. Why would setting up have to be slow?
    • Because in a strictly ordered system I need to collect stuff from various places, i.e. the synthesizer itself, the power plug from a box of power plugs, cables for MIDI, audio, control voltages, …
  6. Why would these have to be in their ordered boxes, far apart from the synthesizer?
    • Hm. Well, I guess they wouldn‘t have to be. That‘s just what one would do..?

Now, this example did not happen quite as straightforwardly as I just made it seem, but it helped me channel my focus onto a rule that is actually quite well known in the design of enjoyable user interfaces: “Group things according to their usage, not their nature”. This is a form of reducing cognitive load. UI designers sometimes get this wrong, e.g. grouping “everything that is a filter” in one section, “everything that is a configuration” in another, and so on – focusing more on the technical implementation of what these things are than on their proximity in the actual domain. This gets worse the more technical a domain is in its essence, and that is exactly the mistake I made.

Wiring hardware synthesizers together is a very technical task, but there is much more use in grouping what belongs together when one wants to use it, not when one looks at its definition.

So now I have it. I, as a user, am happy that I, as a researcher, actually asked stupid questions like “Why don't you sell all the garbage?” in order to channel my focus towards a system in which setting up the whole gear is rather quick, reducing the cognitive load – or rather shifting it to the cleanup process, which is fine because I can now trust the system 😉

Upgrade with a twist

A few weeks ago, I heard a nice story about the hidden cost of new features. Imagine a website, driven by a content management system, consisting of text, pictures and fancy styling. When the content management system gets an update, the website developer takes a look at the release notes and finds that a lot of new and cool features are included that you’ll get for free once you update.

So he updates the site, tries it out and publishes it onto the web. A few days later, the customer and owner of the website sends a bug report of some arbitrarily flipped images. There are just short of a hundred images on the website and a handful of them now show up upside down.

Who would update a website and randomly rotate some images?

Why would a content management system decide that exactly these images need a spin?

The answer is not as obvious as one might think.

The proximate cause of the effect was a change of the imaging library the content management system uses to deliver the image content. It got upgraded to a new engine that essentially does the same thing as the old one: take the image file content and put it on the web. But it does so more thoroughly.

One feature of JPEG images is the EXIF metadata properties. Examples of useful properties are the time the photograph was taken, the geolocation or the camera model. Some cameras add even more information to the metadata, like exposure time or the camera's orientation (rotation) during the shot. There are cameras that notice if you hold them upside down and store this circumstance in the picture.

Then there are imaging libraries that just take the pixels and put them on the screen. And there are libraries that know about their domain, read the EXIF metadata, interpret the rotation data and account for it. Because, who would like to look at pictures that are displayed the wrong way around?

The first version of the content management system’s imaging library didn’t care much about metadata. The new version takes rotation into account.
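Just to illustrate what such a domain-aware library does (this is not the content management system's actual code), the Pillow library in Python can both read and apply the orientation flag; the file name is a hypothetical example:

from PIL import Image, ImageOps

image = Image.open("photo.jpg")              # hypothetical example file
orientation = image.getexif().get(0x0112)    # EXIF tag 274 holds the orientation
print("orientation flag:", orientation)

# rotate/flip the pixels so they match the orientation the camera recorded
upright = ImageOps.exif_transpose(image)
upright.save("photo_upright.jpg")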

So the cause of the suddenly rotated pictures originates with a photographer who happened to work during a workout session or in Australia. This fact was registered and stored by the camera, and promptly ignored by the picture editing software and the earlier content management system. It was rediscovered only when the new version went live.

For the customer, this is a random regression. It worked just fine all those years! For the developer, this is a minefield. Every picture could contain an evil rotation information that gets applied someday.

For a security engineer, this is a harmless but perfect example of a persistence attack. You embed malicious payloads into data that do nothing for a long time, but are activated suddenly, without outside intervention, by an unrelated change of system parts towards a “lucky” constellation.

Guess what you can embed into EXIF metadata, too? Javascript or any other form of executable code. And then you wait.

To end this blog entry on a light note, sometimes the payload may just happen to be your last name – True!

return first example

It seems my “return first” post was not as enlightening as I had hoped. It was posted on reddit, and while the majority of commenters completely missed the point, it apparently wasn't really clear even for those who read more than the title. Either way, I am to blame for that – the examples and my reasoning were not very conclusive. So let me try to clear up the confusion with a better example.

First things first, here’s the mantra again: Whenever you want to call a function, ask yourself:

Can I return first?

But now to the example:

Parsing array braces

The task was to parse a string with a data-type in it. This was already working for single-value types, so we could parse "int", "double", "string" etc, via the function from_input_type. Now I was to extend it to also parse array definitions with one or two fixed dimensions, like "int[5]" or "double[4,7]".

My first attempt, implementing it as a constructor taking the definition string, looked like this:

auto suffix_begin = type_code.find('[');
if (suffix_begin == std::string::npos)
{
  this->type = from_input_type(type_code);
  return;
}

auto suffix_end = type_code.find(']', suffix_begin);
if (suffix_end == std::string::npos)
{
  throw std::invalid_argument("Malformed attribute type suffix: no end brace.");
}

auto type_tag = type_code.substr(0, suffix_begin);
this->type = from_input_type(type_tag);
auto in_brackets = type_code.substr(suffix_begin+1, suffix_end-suffix_begin-1);

auto separator = in_brackets.find(',');
if (separator == std::string::npos)
{
  this->rank = attribute_rank_t::one_d;
  this->dim[0] = parse_size(in_brackets);
  return;
}
  
auto first = in_brackets.substr(0, separator);
auto second = in_brackets.substr(separator+1);

this->rank = attribute_rank_t::two_d;
this->dim[0] = parse_size(first);
this->dim[1] = parse_size(second);

It’s not pretty, but it passed all the tests I set up for it. And this was pre-refactoring. I knew there was something else coming up: in a different constructor, we wanted to parse type definitions that look similar, but are not quite the same. Instead of 1 or 2 fixed dimensions, the brackets have to be empty there, e.g. "float[]" or "string[]". Note that the brackets are still optional – single-value types are allowed as well.
Now I wanted to reuse the code to locate the brackets, but the current structure wasn’t really well suited for that, with the member initialization spread all over the function. Obviously, the code parsing the contents of the brackets (from the auto separator = ... line down) was of no use for the second case; the first half is the interesting bit here. So I was looking at the calls to from_input_type in the upper half and asked myself: Can I return first, before calling this? The answer is, of course, yes.

struct type_with_brackets_t
{
  std::string_view type;
  std::string_view in_brackets;
  // There's a difference between empty brackets (e.g. string[])
  // and no brackets (e.g. string)
  bool has_brackets = false;
};

type_with_brackets_t split_type(std::string_view const& type_code)
{
  auto suffix_begin = type_code.find('[');
  if (suffix_begin == std::string::npos)
  {
    return {type_code, {}, false};
  }

  auto suffix_end = type_code.find(']', suffix_begin);
  if (suffix_end == std::string::npos)
  {
    throw std::invalid_argument("Malformed attribute type suffix: no end brace.");
  }

  auto type_tag = type_code.substr(0, suffix_begin);
  auto suffix = type_code.substr(suffix_begin+1, suffix_end-suffix_begin-1);
  return {type_tag, suffix, true};
}

With this, we can replace the upper half of the first function with:

auto [tag, in_brackets, has_brackets] = split_type(type_code);
this->type = from_input_type(tag);
if (!has_brackets)
  return;

/* continue parsing in_brackets */

The other “int[]” case can obviously be implemented very easily now:

auto [tag, _, has_brackets] = split_type(type_code);
this->type = from_input_type(tag);
this->is_array = has_brackets;

Of course, when just extracting the code as a function, you could be tempted to also call from_input_type in that function, but return first guided us away from that. I think this is a very good outcome, as it clearly separates splitting the string from interpreting the parts, naturally eliminating the duplicated from_input_type call. You can still have a function that does both, if you want, by adding a small facade around split_type that also does the conversion.

I hope this example cleared up the method a bit more. One reason why deeply nested function calls are so common is that most languages make it easier to pass parameters than to return multiple values. You will often find that this style requires more custom data types that are just used as function return values. But functions will naturally compose more easily because you bundle smaller pieces – e.g. in this case, you can use the function without from_input_type – and I believe that will pay off in the end.

JDBC’s wasNull method pitfall

Java’s java.sql package provides a general API for accessing data stored in relational databases. It is part of JDBC (Java Database Connectivity). The API is relatively low-level, and is often used via higher-level abstractions based on JDBC, such as query builders like jOOQ, or object–relational mappers (ORMs) like Hibernate.

If you choose to use JDBC directly you have to be aware that the API is relatively old. It was added as part of JDK 1.1 and predates later additions to the language such as generics and optionals. There are also some pitfalls to be avoided. One of these pitfalls is ResultSet’s wasNull method.

The wasNull method

The wasNull method reports whether the database value of the last ‘get’ call for a nullable table column was NULL or not:

int height = resultSet.getInt("height");
if (resultSet.wasNull()) {
    height = defaultHeight;
}

The wasNull check is necessary, because the return type of getInt is the primitive data type int, not the nullable Integer. This way you can find out whether the actual database value is 0 or NULL.

The problem with this API design is that the ResultSet type is very stateful. Its state changes not only with each row (by calling the next method), but also with each ‘get’ method call.

If any other ‘get’ method call is inserted between the original ‘get’ method call and its wasNull check the code will be wrong. Here’s an example. The original code is:

var width = rs.getInt("width");
var height = rs.getInt("height");
var size = new Size(width, rs.wasNull() ? defaultHeight : height);

A developer now wants to add a third dimension to the size:

var width = rs.getInt("width");
var height = rs.getInt("height");
var depth = rs.getInt("depth");
var size = new Size(width, rs.wasNull() ? defaultHeight : height, depth);

It’s easy to overlook the wasNull call, or to wrongly assume that adding another ‘get’ method call is a safe code change. But the wasNull check now refers to “depth” instead of “height”, which breaks the original intention.

Advice

So my advice is to wrap the ‘get’ calls for nullable database values in their own methods that return an Optional:

Optional<Integer> getOptionalInt(ResultSet rs, String columnName) throws SQLException {
    final int value = rs.getInt(columnName);
    if (rs.wasNull()) {
        return Optional.empty();
    }
    return Optional.of(value);
}

Now the default value fallback can be safely applied with the orElse method:

var width = rs.getInt("width");
var height = getOptionalInt(rs, "height").orElse(defaultHeight);
var depth = rs.getInt("depth");
var size = new Size(width, height, depth);

Running Tango-Servers in Docker

Containers are a great way of running your software in an environment that you defined and no one else needs to maintain or even know about. This is especially true if your software provides its service via a network port. All that operators have to provide is the execution platform for the container and the required resources.

Tango servers fit quite well into this type of software: essentially they provide a service over the network. Unfortunately, they need a Tango control system to be fully usable, so that means a few additional services. Luckily, there are Docker images for these, so building a stack of containers to run your Tango server in the scope of a control system is quite easy (sample docker-compose.yml):

version: '3.7'
services:
  my-tango-server:
    container_name: my-tango-server
    image: my-ts
    environment:
      - TANGO_HOST=tango-cs:10000
    depends_on:
      - tango-cs
  tango-cs:
    container_name: tango-database
    image: tangocs/tango-cs:9.3.2-alpha.1-no-tango-test
    ports:
      - "10000:10000"
    environment:
      - TANGO_HOST=localhost:10000
      - MYSQL_HOST=tango-db:3306
      - MYSQL_USER=tango
      - MYSQL_PASSWORD=tango
      - MYSQL_DATABASE=tango
    links:
      - "tango-db:localhost"
    depends_on:
      - tango-db
  tango-db:
    container_name: mysql-database
    image: tangocs/mysql:9.2.2
    environment:
      - MYSQL_ROOT_PASSWORD=root
    volumes:
      - tango_mysql_data:/var/lib/mysql
volumes:
  tango_mysql_data:

Example Dockerfile for the image my-ts used for the my-tango-server container:

FROM debian:buster

WORKDIR /tango-server
COPY ./my-tango-server .

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y ./*.deb

CMD my-tango-server myts

Unfortunately this naive approach has the following issues:

  • The network port of our tango server changes with each restart and is not available from outside the docker stack
  • The tango server communicates the internal IP-Address of its container to the tango database so connection attempts from the outside fail even if the port is correct
  • Even though the tango server container depends on the tango control system container the tango server might start up before the tango control system is available.

Let us tackle the issues one by one.

Expose a fixed port for the Tango server

This is probably the easiest issue to fix because it is well documented for OmniORB and widely used. So let us change docker-compose.yml to expose a port and pass it into the container as an environment variable:

version: '3.7'
services:
  my-tango-server:
    container_name: my-tango-server
    image: my-ts
    ports:
      - "45450:45450"
    environment:
      - TANGO_HOST=tango-cs:10000
      - TANGO_SERVER_PORT=45450
    depends_on:
      - tango-cs
  tango-cs:
 ...

And use it in our Dockerfile for the my-ts image:

FROM debian:buster

WORKDIR /tango-server
COPY ./my-tango-server .

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y ./*.deb

CMD my-tango-server myts -ORBendPoint giop:tcp::$TANGO_SERVER_PORT

Fine, now our Tango server always listens on a defined port and that same port is exposed to the outside of our docker stack.

Publish the correct IP/hostname of the Tango server

I found it hard to find documentation about this and about how to put it on the command line correctly, but here is the solution to the problem. The crucial command line parameter is -ORBendPointPublish. We need to pass a hostname of the host machine into the container and let OmniORB publish that name (tango-server.local in our example):

version: '3.7'
services:
  my-tango-server:
    container_name: my-tango-server
    image: my-ts
    ports:
      - "45450:45450"
    environment:
      - TANGO_HOST=tango-cs:10000
      - TANGO_SERVER_PORT=45450
      - TANGO_SERVER_PUBLISH=tango-server.local:45450
    depends_on:
      - tango-cs
  tango-cs:
 ...

And the corresponding changes to our Dockerfile:

FROM debian:buster

WORKDIR /tango-server
COPY ./my-tango-server .

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y ./*.deb

CMD my-tango-server myts -ORBendPoint giop:tcp::$TANGO_SERVER_PORT -ORBendPointPublish giop:tcp:$TANGO_SERVER_PUBLISH

Waiting for availability of a service

This one is not specific to our Tango server and the whole stack, but a general problem when building stacks where one service depends on another one being up and listening on a network port. Docker Compose takes care of the startup order of the containers, but does not check for the readiness of services running in the containers. In Linux containers this can easily be achieved by using the small shell utility wait-for-it. Here we only need to make some changes to our Dockerfile:

FROM debian:buster
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get install -y wait-for-it

WORKDIR /tango-server
COPY ./my-tango-server .

RUN DEBIAN_FRONTEND=noninteractive apt-get install -y ./*.deb

CMD wait-for-it $TANGO_HOST -- my-tango-server myts -ORBendPoint giop:tcp::$TANGO_SERVER_PORT -ORBendPointPublish giop:tcp:$TANGO_SERVER_PUBLISH

Depending on your distribution/base image, installation of the tool may differ slightly (there are also similar alternatives like dockerize or docker-compose-wait).

Summing it up

After moving some rocks out of the way it is dead easy to deploy your Tango servers together with a complete Tango control system to any host platform providing a container runtime. You do not need to deliver everything in one single stack if you do not want to. Most of the stuff described above works the same when deploying the Tango control system and the Tango servers in independent stacks or as separate containers, even on multiple hosts.