Using the File System as an Interaction Device

In a recent project, my job was to build a scientific data processing pipeline for a new algorithm that wasn’t set in stone yet. Part of my work would be to explore different mathematical formulas interactively with the customer.

My usual approach to projects is a “risk first” strategy. I try to identify the riskiest or most demanding part of the project and deal with it first. This approach closely resembles the “fail fast” mindset, except that nothing has failed yet.

In the case of the calculation pipeline, the riskiest part, and at the same time the functionality that mattered most to the customer, was the pipeline itself. If we were able to implement a system that could transform the given entry data into the desired results, we would have an end-to-end prototype and the means to explore different mathematical approaches.

The pipeline consists of several steps, each of which can be described as a complex transformation. The first step takes a file in a proprietary data format and converts it into a big JSON file. The main effort of this step is a deep physical analysis of the data contained in the proprietary format. This analysis requires a lot of thought, exploration and work, but can be treated as a black box that the data passes through on its way from the proprietary format to JSON.

The next step takes the JSON input and extracts the information required by the following step. It is essentially a data reduction operation.

The third step feeds the analyzed, reduced data into the formulas and stores the calculation result.

The fourth step aggregates the calculation results into a daily time series report in a format that can be read by a spreadsheet application. This report is the end product of the pipeline and will be used to make decisions and to rule out certain environmental hazards.

The main difference between this project and virtually every previous one is that I didn’t write any user interface code. The application’s main window is still blank. The whole interaction of the system with the other systems that provide the entry data, among the pipeline steps, and with the human user is based on files in the file system.

The system periodically checks for the existence of new entry data. If some is found, it is copied into the “inbox” directory of the first step. The first step periodically checks for files in its inbox and processes them into its “outbox”, which conveniently serves as the inbox of the second step. You probably get the idea by now. All the steps in the system, including the upstream data fetching routine, are actors in a file-based actor model. The files serve as messages from one actor to another. The file system and its directory structure are the common communication channel that passes the messages around.

Each processing step is an actor node with input and output storage

One advantage of this approach is that the operating system’s file browser can be used as the (graphical) user interface. By opening the appropriate directories and viewing their content, the user can supervise the operating state of the system. The system can report problems by moving an incoming message into the step’s “failed” or “problem” directory instead of its “done” directory. If several directories are on display at once, the user can follow a specific piece of data through the pipeline and view the intermediate results. For domain-specific reasons, the actors in this project also have a result directory called “omitted” for data that will not be processed any further because some domain rule has called for a cancellation.
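A minimal sketch of one such actor in Python, assuming text messages in a flat inbox directory and a hypothetical transform callable. The names run_step_once, poll_forever and Omit are illustrative, not taken from the project; a real first step would read the proprietary binary format instead of text.

```python
import os
import time
from pathlib import Path
from typing import Callable, Dict

# Result directories named in the post; "omitted" is this project's extra one.
DONE, FAILED, OMITTED = "done", "failed", "omitted"


class Omit(Exception):
    """Raised by a transform when a domain rule cancels further processing."""


def run_step_once(step_dir: Path, outbox: Path,
                  transform: Callable[[str], str]) -> Dict[str, int]:
    """Process every message currently in step_dir/inbox exactly once.

    The transform's result goes to `outbox` (the next step's inbox); the
    input message is then moved to done/, failed/ or omitted/, so the file
    browser shows what happened to it.
    """
    inbox = step_dir / "inbox"
    for d in (inbox, outbox, *(step_dir / n for n in (DONE, FAILED, OMITTED))):
        d.mkdir(parents=True, exist_ok=True)

    counts = {DONE: 0, FAILED: 0, OMITTED: 0}
    for message in sorted(p for p in inbox.iterdir() if p.is_file()):
        try:
            (outbox / message.name).write_text(transform(message.read_text()))
            target = DONE
        except Omit:
            target = OMITTED
        except Exception:
            target = FAILED  # keep the input around for inspection
        # Same volume, so os.replace is a cheap rename (overwrites old copies).
        os.replace(message, step_dir / target / message.name)
        counts[target] += 1
    return counts


def poll_forever(step_dir: Path, outbox: Path,
                 transform: Callable[[str], str], interval_s: float = 5.0) -> None:
    """The actor's main loop: look into the inbox, work, sleep, repeat."""
    while True:
        run_step_once(step_dir, outbox, transform)
        time.sleep(interval_s)
```

Each pipeline step would run its own poll_forever loop; because the outbox of step N is the inbox of step N+1, chaining steps is only a matter of passing the right directories.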

A user can even manipulate the data flow by moving files out of or into a specific directory. If we want to recalculate a certain batch of data, we can just copy the files from the “done” directory of the first step into its “inbox”, and the system will process them again.
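The recalculation trick needs nothing but a copy. A tiny helper, sketched here under an assumed step directory layout with “inbox” and “done” subdirectories (the name requeue and its glob pattern parameter are hypothetical), could look like this:

```python
import shutil
from pathlib import Path


def requeue(step_dir: Path, pattern: str = "*") -> int:
    """Copy finished messages from step_dir/done back into step_dir/inbox.

    Copying rather than moving keeps done/ as a record of what ran; the
    polling actor picks up the copies on its next pass. Returns the count.
    """
    inbox = step_dir / "inbox"
    inbox.mkdir(parents=True, exist_ok=True)
    queued = 0
    for f in sorted((step_dir / "done").glob(pattern)):
        if f.is_file():
            shutil.copy2(f, inbox / f.name)  # copies content and timestamps
            queued += 1
    return queued
```

A plain copy is not atomic, so a polling step could pick up a half-copied file; if that matters, stop the step while requeuing.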

Because the analysis step takes some time while the calculation step is surprisingly fast, we can rerun just the calculation by moving not the initial data files, but the analyzed and reduced entry files into the calculation step’s inbox. Using this approach, we can try different mathematical formulas by stopping the system, swapping the calculation step for a new version, starting the system again and moving the desired entry files into its inbox.

Using the file system as an interaction device for the user and the system’s parts has many immediate advantages, but some drawbacks, too. One drawback is performance. Using the hard disk for data transfer is one of the slowest ways to bring data from step X to step X+1. If your system requires high throughput or low latency, this approach isn’t suitable. My project has a low, predictable throughput and a latency requirement that is measured in minutes or seconds, not in milliseconds or even nanoseconds. The data can afford to spend some time in the file system, because the first step alone takes several seconds per file.

Another drawback is a certain fragility of the communication medium, the file system. You have to account for concurrent reads, writes or even deletes. The target platform of my system (Microsoft Windows) shows signs of exhaustion if the number of files in one directory grows too large. This means that selecting files, already a costly operation, becomes even more costly when the system is put under pressure. If your throughput is usually steady, which is the case in my project, this won’t be a problem. Until you manually copy 100k files into an inbox for a swift recalculation and discover that the file copy process alone takes several minutes.
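One common way to cope with concurrent readers and writers (not necessarily the one used in this project) is to never let a half-written message appear under its final name. A sketch using only Python’s standard library, assuming a hypothetical “.partial” suffix that all reading actors agree to skip:

```python
import os
import tempfile
from pathlib import Path
from typing import List

TMP_SUFFIX = ".partial"  # hypothetical marker that readers must ignore


def deliver(inbox: Path, name: str, data: bytes) -> Path:
    """Write data so that a polling reader never sees a half-written file.

    The bytes go to a temporary file in the *same* directory (so the final
    rename stays on one volume), then os.replace() moves it into place,
    overwriting an older message of the same name.
    """
    inbox.mkdir(parents=True, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=inbox, suffix=TMP_SUFFIX)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit the disk first
        final = inbox / name
        os.replace(tmp_path, final)
        return final
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def ready_messages(inbox: Path) -> List[Path]:
    """List only complete messages, skipping in-flight temporary files."""
    return sorted(p for p in inbox.iterdir()
                  if p.is_file() and not p.name.endswith(TMP_SUFFIX))
```

The Python documentation only guarantees that os.replace is atomic on POSIX systems. On Windows it is still a single rename, but without that guarantee, and a reader that holds the target file open can make the rename fail, so a retry may be needed.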

Of course, the system cannot operate without a graphical user interface forever. But some basic interactions with the system will probably just result in some files being copied from one directory to another one in the background.

Use real(istic) data from early on

When developing software in general, and user interfaces (UIs) in particular, one important aspect is often neglected: the form, shape and especially the amount of data.

One very common practice is to fill unknown texts with fragments of the famous Lorem ipsum placeholder text. This may be a good idea if you are designing software for displaying articles similar in size and structure to your placeholder text. In all other cases, I would regard using Lorem ipsum as a smell.

My recommendation is to collect as many samples of real or at least realistic data as feasible. Use them to build and test your application. Why do I think it matters? Let me elaborate a bit in the following sections.

Data affects the layout

You can only choose a fitting layout if you know the length of certain texts, the size of images etc. The width of columns can be chosen more appropriately, and you can decide whether you need scrollbars, whether you want them permanently visible for a more stable and calm layout, how large panels or text areas have to be for optimum readability and so on.

Data affects the choice of UI controls

The data your application has to handle should be reflected not only in the layout but also in the type of controls used.

For example, the number of options the user has to choose from drastically affects the selection of an adequate UI control. If you have only 2 or 3 options, toggle buttons, checkboxes or radio buttons next to each other or laid out in one column may be a good fit. If there are more options, dropdowns may be better. At some point, a full-blown list with filters, sorting and search may be necessary.

To make a good decision, you have to know the expected amount and shape of your data.

Data affects algorithms and technical decisions regarding performance

The data your system has to work with and to present to the user also has technical impact. If the datasets are moderate in size, you may be able to transfer them all to the frontend and do presentation, filtering etc. there. That has the advantage of reducing backend stress and putting computational effort in the hands of the clients.

Often this becomes unfeasible when the system and its data pool grow. Then you have to think about backend search and filtering, data compression and the like.

Algorithms and data structures may also have to change, from simple lists and linear search to search trees, indexes and lookup tables.
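To make this concrete, here is a small sketch of the same lookup done two ways, over a hypothetical Record type whose ids are assumed to be unique. Which variant is worth it depends on exactly the data volumes discussed above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    id: int     # assumed unique
    name: str


def find_linear(records: List[Record], record_id: int) -> Optional[Record]:
    """Scan the list: O(n) per lookup, perfectly fine for a few hundred items."""
    for r in records:
        if r.id == record_id:
            return r
    return None


def build_index(records: List[Record]) -> Dict[int, Record]:
    """One O(n) pass up front; afterwards each lookup is O(1) on average."""
    return {r.id: r for r in records}
```

With k lookups over n records, the list costs roughly k·n comparisons, while the index costs about n + k operations on average. The index pays off once lookups repeat; for a handful of records it is a candidate for YAGNI.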

The better you know the scope of your system and the data therein, the better your technical decisions can be. You will also be able to judge whether the YAGNI principle applies or not.

Conclusion

To quickly sum up the essence of the advice above: get to know the expected amount and shape of data your application has to deal with, so that you can design your system and the UI/UX accordingly.

The charged charging switch

In this blog post, I’ll describe my experiences with a certain product (a computer monitor) and its manual. It might serve as an example of how ridiculous a poorly designed customer experience feels on the receiving end. Hopefully, it inspires some readers to think about sensible defaults and how to communicate them.

Let’s start with the context. In a previous blog post, I described my journey from one small monitor to four monitors in total (three big ones, one small additional one). Well, it is not just my journey – all of my co-workers now have four computer monitors at their office workplace.

This meant that we bought a lot of smaller monitors over the last few months. We decided to go the monoculture route and bought one unit of our favorite model.

It arrived faulty. The only thing that this device did was to indicate “battery full” when the battery status button was pressed (yes, this particular monitor has its own battery for mobile usage). Everything else didn’t work, especially not the power button. The device was a dead fish. I returned it to the supplier.

The replacement unit was also dead on arrival. This puzzled me, because the odds of having two duds in a row seem very small. So I investigated and found an interesting fact: The unpacking and assembly instruction sheet is incomplete. Well, even more than that. It’s plain misleading.

It starts with a big lettered alert that reads “Please follow the illustration and text description strictly when opening the package and installing the display.” It then shows three illustrations of a totally different monitor and ends the instructions at the step when the styrofoam is removed (and no cables attached). At the bottom of the sheet, there’s an explanation: “The machine picture and styrofoam shown are for illustration purpose only and may differ from the actual product”. You can’t make this up.

The manual urges me to follow it “strictly” and then vaguely tells me how to unwrap the monitor from the styrofoam and nothing more. Even better, in the illustrations, there are different options given like “For binding-less, please ignore the untying action” (actual quote!). You can’t follow strictly if given multiple options and hand-wavey instructions. “Unpack the monitor correctly” is more actionable than this manual.

But that was just the beginning. The user manual actually references the correct monitor and gives usage instructions for common use cases, but it lacks a troubleshooting section. The user manual starts with a working device – and my device(s) don’t work. They don’t turn on if the power button is pressed – and it has to be pressed for 3 seconds to turn on the monitor! Yes, the manual is clear on this one: To turn the monitor on by using its power button, you have to press for three, long, “twenty-two”, tedious, “twenty-three”, seconds. That’s like having a light switch, but if you press it in the dark, it requires you to keep pressing because it could be a mistake – do you really want to have the lights on?

The device is still dead and the manual is no help for my situation, so I inspect the material a little more thoroughly. There is a sticker at the bottom of the monitor (at the opposite side from the power plug and the power button) that catches my eye. I have photographed it, because nobody would believe me otherwise. Here it is:

The first sentence is a no-brainer. But the second one is a head-scratcher: “Please turn on the charging switch for the first time”.

There is no mention of a “charging switch” in the manual. There is no switch labeled “charging” on the device. All the buttons/switches and ports that are present are described in the manual and can’t be interpreted as a “charging switch”.

But if you look at the sticker more closely, you’ll see the illustration at the right side. In reality, it is 3 mm wide and 18 mm in height. It is very small. Even smaller are the depicted things – they resemble the input ports on the right side! From the bottom up, there is a USB-C port, a micro-HDMI port and something that is encircled in the illustration. The circle is probably our hint that this is indeed the “charging switch” mentioned on the sticker.

I searched for the switch and only found a notch in the plastic, about 3 mm wide. Only by using a magnifying glass did I find a small black plastic knob at the bottom of the notch (2 mm deep). The knob is about one square millimeter in size. It was situated more to the top of the notch.

I have built electronics since the early nineties. I know how to solder and recognize all kinds of electronic parts. This thing was a DIP-switch, but one of the smallest ones I’ve ever seen. And it wasn’t labeled at all. The only hint we get to search for it is the illustration on the sticker.

So – is it in the “on” position? I decided to find out by moving it down. A paper clip wire was too big to fit, so I used the smallest screwdriver my micro-mechanic screwdriver set would offer. Just a bit smaller and I would have resorted to an actual hair. The DIP-switch moved half a millimeter down and got stuck more to the bottom of the notch.

The monitor suddenly worked – after the three second pressing. The unlabeled “on” position of the unlabeled “charging switch” that you have to manipulate by using the smallest metal rod that you can find in an electronics lab is at the bottom. Good to know.

I won’t reiterate the madness that we just experienced. It gets even worse, so buckle up.

Right now, I have a working monitor that is actually pleasing to use. I buy it again – the same routine. I wonder if I should report the trick to the supplier.

We have more than two workplaces, so I buy the monitor – the same product for the same price – again, but five times now.

I get five packages with identical content. Well, nearly identical. The stickers are different!

Three monitors have the same sticker as seen above. One of them needs to be switched to turn on, the other two were already in the “on” position.

But the other two monitors have a different sticker:

Both monitors were already in the “on” position, so nothing needed to be done. But this sticker tells you to leave the charging switch alone: a switch that is never mentioned in the manual, that is so small that you will probably miss it even if you search for it, and that needs special equipment to be changed. That’s as if my refrigerator came with a warning sticker not to disable a particular fuse when this fuse is safely hidden away in the internals of the refrigerator’s electronics and never mentioned in the manual. Why point it out if my only job is to ignore it?

Remember the first manual that “strictly” tells a vague story? This is the same logic. And it gets even better with the second sentence, the one with an exclamation mark! “Let it keep the factory state!” Does that mean the switch is turned off when it leaves the factory? Or does it mean to keep it in the state it was delivered in, regardless of whether the monitor is functional or disabled by it?

I still don’t know what the “on” position of this switch really is and now I’m even more confused than before.

My mind invented this elaborate fantasy story about a factory that produces monitors. One engineer is tasked with designing the charging functionality and adds the “charging switch” to enable or disable the whole feature. But she/he forgets to remove it before the blueprint is committed into production and now the switch is part of the consumer product. The DIP switch is on the “off” position by default from its producer. This renders the first batches of monitors useless because the documentation doesn’t mention the magic switch that needs to be flipped once to have the monitors turn on. The return rates are horrendous and management gets involved. They decide to get rid of the problem by applying a quick fix – the first sticker. This sentences their customers to perform a scavenger hunt of subtle hints to have the monitors work. They also install a new production line station – the switch flipper. This person needs training and is only available for the day shift – Half of the monitors leave the factory with the switch in the “on” position, the other half is in the “off” position. The first sticker remains, it is still a mystery, but the return rates are cut in half nearly overnight.

In my story, the original engineer recognizes her/his error and tries to correct it – by reversing the switch positions. The default position (“off”) now enables the feature, while the “on” position disables it. Just by turning the (still unlabeled) positions around, the factory produces ready-to-use monitors without requiring intervention from the customer.

The problem? A lot of customers have now learned the switch-flip trick and deactivate their product. And the switch flipper still deactivates half of the production without noticing. They need to inform their customers! They apply the second sticker, hoping to clear this matter once and for all.

And here I am, having bought 7 monitors so far and received nearly every possible combination of sticker and initial switch position. I am more confused and wary than if they had stuck to their original approach and just updated their manual.

But there is one indicator that might be helpful: the serial numbers of the monitors start with some letters followed by two digits:

  • 79: You get sticker 1 and need to flip the switch
  • 99: You get sticker 2 and need not flip the switch
  • 69: You get sticker 1, but the switch is already flipped

At least that was my observation with the samples at hand.

What can we, as software developers, learn from this disaster?

First, keep an eye on your feature switches! One nonsensical default and you chase that error forever.

Second, don’t compensate for the first error by making the complementary error, too. Sometimes, the cure is worse than the disease.

Third, don’t ever not avoid negative logic! Boolean logic is hard enough by itself; if you complicate it further, people like me will just resort to guessing and trial-and-error.

Fourth, and that is the most important one for me: Don’t explain things that need no attention from the user. I’m definitely guilty of that one. Often, I want my documentation to be “complete” and to “show all opportunities” when all I do is confuse my users with sentences like “Do not turn on the charging switch. Let it keep the factory state!” and then never mention the “charging switch” anywhere again.
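The first and third lessons translate directly into code. A hypothetical sketch (the setting names are invented for illustration; nothing is known about the monitor’s actual firmware) contrasting a positively named switch with a sensible default against a negated one:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorSettings:
    # Positive name, and the default is what a buyer expects out of the box.
    charging_enabled: bool = True


@dataclass(frozen=True)
class ConfusingMonitorSettings:
    # Negative name: "disable_charging=False" means charging is ON,
    # so every reader has to untangle a double negative.
    disable_charging: bool = False


def charges(s: MonitorSettings) -> bool:
    return s.charging_enabled


def charges_confusing(s: ConfusingMonitorSettings) -> bool:
    return not s.disable_charging
```

Both classes behave the same; only the first one can be read without mentally flipping a boolean.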

Always apply the Principle Of Least Astonishment to yourself, too

Great principles have the property that, while they can be stated in a concise form, they have far-reaching consequences that one can fully appreciate only after encountering them for many years.

One of these principles is what is known as the Principle of Least Astonishment / Principle of Least Surprise (see here or here). As stated there, in the context of user interface design, its upshot is “Never surprise the user!”. Within that context, it is easy to understand for anyone who has ever used a piece of software: never once were you glad that it didn’t work as suggested. Or did you ever feel that way?

Surprise is a tool for willful suspension, for entertainment, a tool of unnecessary complication; exactly what you do not want in the things that are supposed to make your job easy.

Now we can all agree on that and go home. Right? But of course, there’s a large difference between grasping a concept in its most superficial manifestation and grasping its elusive, underlying sense.

Consider any software that is not just a script, i.e. that cannot be reduced to a single-purpose module with a clear progression. You might have a backend component with loads of requirements, a database, some caching functionality, then you want a new frontend in some fancy fresh web technology, and there’s going to be some conflict of interests in your developer team.

There will be rather smart ways of accomplishing something and rather unsmart ways. How do you know which is which? So there, follow your principle: never surprise anyone. Not only your end user. Do not surprise any other team member with something “clever”. In most situations,

  1. it’s probably not clever at all
  2. the team member being fooled by you is yourself

Collaboration is a good tool to let that conflict naturally arise. I mean the good kind of conflict, not the mistrust, denial of competency, “Ctrl+A and Delete everything you ever wrote!”-kind of conflict. Just the one where someone would tell you “hm. that behaviour is… astonishing.”

But you don’t have a team member in every small project you do. So just remember to account for the factor of surprise in everything you leave behind. Do not think “right now I understand this thing, ergo it is no surprise to anyone, ever”. Think “when I leave this code for two months and return, will there be anything… surprising?”

This principle has many manifestations. As one of Jakob Nielsen’s usability heuristics, it’s called “Recognition rather than Recall”. In a more universal way of improving human performance and clarity, it’s called “Reduce Cognitive Load”. It has a wide range of applicability, from user interfaces to state management, database structures, or general software architecture. I like the focus on “surprise”, because it should be rather easy for you to admit feeling surprised, even by your own doing.
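A small, hypothetical illustration of the kind of “clever” code that surprises its own author two months later: both functions return the same result, but only one of them also changes its input behind the caller’s back.

```python
from typing import List


def top_scores_clever(scores: List[int], n: int) -> List[int]:
    # "Clever": sorts the caller's list in place, which saves a copy
    # but silently reorders data the caller still owns.
    scores.sort(reverse=True)
    return scores[:n]


def top_scores(scores: List[int], n: int) -> List[int]:
    # Unsurprising: the input stays untouched, a new list comes back.
    return sorted(scores, reverse=True)[:n]
```

Nothing in the name top_scores_clever hints at the side effect, which is exactly what makes it astonishing.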

Improving Windows Terminal

As mentioned in my earlier post about hidden gems in the Windows 10 ecosystem, a very welcome addition is Windows Terminal. Finally we get a well-performing and capable terminal program that not only supports our beloved tabs and Unicode/UTF-8 but also a whole bunch of shells: CMD, PowerShell, WSL and even Git Bash.

See this video of a small ASCII-art code golf written in Julia and executed in PowerShell inside Windows Terminal. The really curious may try running the code in the standard CMD terminal or the built-in PowerShell terminal…

But now on to some more productive tips for getting even more out of the already great Windows Terminal.

Adding a profile per Shell

One great thing in Windows Terminal is that you can provide different profiles for all of the shells you want to use in it. That means you can provide visual cues like icons, fonts and color schemes to instantly recognize what shell you are in (or what shell hides behind which tab). You can also set a whole bunch of other parameters like transparency, starting directory and behaviour of the tab title.

Nowadays, most of this profile stuff can simply be configured using the built-in Windows Terminal settings GUI, but you also have the option to edit the JSON configuration file directly or copy it to a new machine for faster setup.

Here is my settings.json provided for inspiration. Feel free to use and modify it as you like. You will have to fix some paths and provide icons yourself.

Pimping it up with oh-my-posh

If that is still not enough for you, you can install a prompt theme engine like oh-my-posh using a command like

Install-Module oh-my-posh -Scope CurrentUser

and try different themes with Set-PoshPrompt -Theme <name>. To use your customized prompt in a specific Windows Terminal profile, set the profile’s command line so that it executes the commands defined in a file (the backslash-escaped quotes are needed because this string lives inside settings.json):

powershell.exe -noprofile -noexit -command \"invoke-expression '. ''C:/Users/mmv/Documents/PowerShell/PoshGit.ps1'' '\"

where PoshGit.ps1 contains the commands to set up the prompt:

Import-Module oh-my-posh

$DefaultUser = 'Your Name'

Set-PoshPrompt -Theme blueish

Even Microsoft has some tutorials for highly customized shells and prompts.

What does my Windows Terminal look like?

Because seeing is believing, take a look at my setup below, which is based on the instructions and settings.json above:

I hope you will give Windows Terminal a try, and I wish you a lot of fun customizing it to fit your needs. I feel it makes working with a command prompt on Windows much more enjoyable than before and helps to speed you up when using many terminal windows/tabs.

A final hint

You may think that you cannot run Windows Terminal as an administrator, but the option appears if you click the downward arrow in the start menu:

CSS: z-index can be weird.

Before I start this post, there are three things I want to state:

  1. If you think the “z-index” is quite simple, you probably never bothered to care.
  2. When in doubt, one can always read the official specification
  3. There are multiple good elaborations available already (see bottom of this post), but I was missing a comprehensive list of the most important points.

Quick Motivation (skip this if you only want the facts)

Yes, we know: the web has become a place it was never intended to be. Nowadays, it seems to accommodate everything. You want live control of measurement devices? 3D camera applications? Advanced data wizardry? Or in my case, a kind of sophisticated layout engine? … Web Dev in 2021 gives you the impression that it is all merely a matter of time (or cost).

But then there are always the caveats. Some seemingly straightforward idea turns out to be not that accessible at all. Our user experience considerations made us implement “kind of basic” windows (in the operating system sense) that appear at times and disappear at other times, and give the user maximum information while maintaining minimum clutter.

Very early on, I noticed that I had to implement my own drag’n’drop functionality, because HTML5 isn’t really there yet. But I consider that something advanced, which also has its idiosyncrasies in every conceivable use case, so that’s ok.

But then again, a somewhat-native-feeling windowing system (even if the windows are only rectangles with text) relies on a seemingly simple thing: that stuff gets drawn over other stuff in the right order. And this comes with certain peculiarities.

The painting order of HTML elements is divided into stacking contexts. Stacking contexts can be stacked above or below each other, and most of the time they behave as expected, but sometimes they don’t. So, for the roundup…

Stacking Context – Essential Rules

(This assumes CSS knowledge, but don’t hesitate to comment if you have any questions.)

  • If you set any z-index, you set that z-index within the current stacking context.
    • You can never enter an outside stacking context, only create new ones inside
  • One stacking context as a whole is always either above or below other stacking contexts as a whole
  • The root stacking order (from the <html> element) is as you expect:
    • Further down in the HTML source means more upfront
  • Higher z-index means “more upfront”, but
    • z-index doesn’t mean a thing if your neighboring elements do not live inside the same stacking context!
  • Within any parent stacking context, new child stacking contexts are created by (among other things)
    • Setting CSS “position” to “fixed” or “sticky”
    • Setting CSS “position” to “relative” or “absolute” together with a z-index different from the default value “auto”
      • For clarity: “z-index: 0;” is nearly the same as “z-index: auto;”, but the latter doesn’t open up a child stacking context, while the former does.
    • Setting a z-index other than “auto” on a child of a “display: flex;” or “display: grid;” container (the flex or grid container itself does not get a new stacking context)
    • Setting CSS: “isolation: isolate;” (what is that even?)
    • Setting CSS: “will-change” to a property that would itself create a stacking context (what is that even?)
    • Setting one of the “graphically advanced” CSS properties like
      • opacity below 1, transform, filter, mix-blend-mode, clip-path, mask, …

There is more, but maybe this can help you with bug tracing. And two meta-points:

  • CSS evolves, so with new features, always have stacking context in mind
  • In a framework context (like the React biosphere), you might not know what your imported dependencies do under the hood. Maybe better isolate them.
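To check these rules against a concrete element, here is a simplified model in Python, following MDN’s list of stacking-context triggers. It takes computed style values as strings (roughly what getComputedStyle would report) plus the parent’s display value. The function and constant names are hypothetical, and the model deliberately omits rarer triggers such as contain, container-type and the top layer. Note that a flex or grid container does not start a stacking context by itself; its children with a z-index do.

```python
from typing import Mapping

# Properties that start a stacking context whenever they differ from their
# initial value (simplified: vendor prefixes and rarer properties omitted).
GRAPHICAL_INITIAL = {
    "transform": "none",
    "filter": "none",
    "backdrop-filter": "none",
    "perspective": "none",
    "clip-path": "none",
    "mask": "none",
    "mix-blend-mode": "normal",
}
# Properties that trigger a stacking context when named in will-change.
WILL_CHANGE_TRIGGERS = set(GRAPHICAL_INITIAL) | {"opacity", "isolation"}
FLEX_OR_GRID = {"flex", "inline-flex", "grid", "inline-grid"}


def creates_stacking_context(style: Mapping[str, str],
                             parent_display: str = "block",
                             is_root: bool = False) -> bool:
    """Simplified check whether an element starts a new stacking context."""
    if is_root:  # the <html> element always forms the root stacking context
        return True
    position = style.get("position", "static")
    z_auto = style.get("z-index", "auto") == "auto"
    if position in ("fixed", "sticky"):
        return True
    if position in ("relative", "absolute") and not z_auto:
        return True
    # A flex/grid *item* with a z-index counts, even if it is not positioned.
    if parent_display in FLEX_OR_GRID and not z_auto:
        return True
    if float(style.get("opacity", "1")) < 1:
        return True
    if style.get("isolation", "auto") == "isolate":
        return True
    if any(style.get(p, init) != init for p, init in GRAPHICAL_INITIAL.items()):
        return True
    named = {p.strip() for p in style.get("will-change", "auto").split(",")}
    return bool(named & WILL_CHANGE_TRIGGERS)
```

For example, {"position": "relative"} alone yields False, while adding "z-index": "0" yields True, matching the z-index: 0 versus auto distinction.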

Reading recommendations:

They have nice illustrations, too.

If everything fails, go back to square one:

https://www.w3.org/TR/CSS2/visuren.html#propdef-z-index

Be aware.

Applied User Research on my own pile of synthesizer machines

Last year, I moved. Moving into a new apartment is very much akin to a major rewrite of a complex piece of software. A piece of software with a limited amount of users, maybe (well, me, my girlfriend, that’s it), but with an immensely sophisticated structure of requirements… despite the decade-long experience of “living somewhere” one usually has at this point.

For example, I have a slight habit of hoarding hardware synthesizers – from Soviet-era monstrosities like the Поливокс to modern machines, they have a few things in common, as they

  1. take quite some space (due to their extensive wiring) and some are heavy
  2. idle around for most of the time (I mean, I have a job)
  3. are versatile enough not to have a clear-cut function in any workflow.

These are somewhat essential. All in all, I had some unassigned space in my new home, and my machines instantly colonized it. Like Parkinson’s original law, “work expands to fill the time available”, there seems to be another Parkinson’s law correlating hardware synthesizers with the space you allow them to have. So basically, they now occupied a room of their own.

Which could be the end of the story, and everyone would live happily ever after. If you have, like, a googol of rooms. And considering point 2 above, this solution feels quite wasteful.

Now, also last year, I began to deep-dive into the techniques of User Experience. And basically, my problem is pretty comparable to that of a customer with a few quite diverging requirements, who wants his problem solved but hasn’t yet figured out his actual needs to begin with.

Ergo, I should be able to use my insights from the field of UX, or Interaction Design, directly to my advantage by applying techniques of User Research to my own behaviour.

And the first question should always be: which approach gives us the most understanding for the least effort? But the zeroth question is actually: how “wicked” is my problem? Do I have an absolutely ill-defined, ever-changing, non-testable set of challenges? Or is my problem-solving rather a technical feat: implementing the best patterns, clearest details, best documentation?

Indeed, point 3 above makes my problem rather ill-defined. On some wickedness scale, with 1 being a quick YouTube search for a how-to and 10 being a problem for which I would need to go on a week-long mountain retreat with daily meditation sessions… I would give it a 6-7.

This feels like it should be in the range of difficult, but solvable with a kind of generalized abstract thinking. I opt for the technique of the Five Whys: the goal is to ask a series of successively deeper questions about my problem. But generally, I understand this as follows:

  • “Five” is a general number one can aim for. There can be more, less, and there can be branches in the questioning.
  • “Why” is a placeholder for any qualifier that goes to a more abstract level. It can also be a “What do you want…”, “How about…” or “What’s wrong with…” serving that purpose

So I imagine myself sitting on both sides of a table and asking:

  1. What do you want to accomplish, and why?
    • I want to have a truckload of fully functional synthesizers without wasting my space.
  2. Why do you need your space?
    • I have other stuff, too. And less clutter generally improves the quality of life.
  3. Why don’t you just get rid of the machines, e.g. sell them or put them in the basement?
    • Well, I do want to be able to jam spontaneously on my synthesizers. The pandemic makes it hard to get that creative input anywhere else.
  4. Why does this need to be spontaneous?
    • Because musical flow, like any creative flow, comes spontaneously. I don’t want a great idea to slip away because setting up takes too long.
  5. Why would setup times have to be long?
    • Because in a strictly ordered system, I need to collect stuff from various places: the synthesizer itself, the power plug from a box of power plugs, cables for MIDI, audio, control voltages, …
  6. Why would these, apart from the synthesizer, have to be in their ordered boxes?
    • Hm. Well, I guess they wouldn’t have to be. That’s just what one does…?

Now, this didn’t actually happen as straightforwardly as I just made it seem, but it helped me focus on a rule that is well known in the design of enjoyable user interfaces: “Group things according to their usage, not their nature.” This is a way of reducing cognitive load. UI designers sometimes make this mistake, e.g. grouping “everything that is a filter” in one section, “everything that is a configuration” in another, and so on, focusing on what these things are technically rather than on how close they are in the actual domain. The more technical a domain is at its core, the worse this gets, and it is exactly the mistake I made.

Wiring hardware synthesizers together is a very technical task, but it is much more useful to group what belongs together when you use it than what belongs together by definition.

So now I have it. I, as a user, am happy that I, as a researcher, asked stupid questions like “Why don’t you sell all that garbage?” in order to focus on a system in which setting everything up is quick and the cognitive load is reduced, or rather shifted to the cleanup process, which is fine because I can now trust the system 😉

Changing the keyboard navigation behaviour of form inputs

The default behaviour in HTML forms is that you can move the focus from one input element to the next via the tab key and submit the form via the enter key. This is also how dialogs work on most operating systems when using the native UI components. This behaviour is consistent across all browsers, and changing it messes with the user’s expectations and reduces accessibility. So I would normally advise against changing this behaviour without good reasons.

However, one of our customers wanted a different behaviour for an application we developed for them. This application replaced an older application in which the enter key did not submit the form but moved the focus to the next input element. The ‘muscle memory’ effect made users accidentally submit the form by hitting the enter key, causing frustration. Since this application is not a public web site but a web-technology-based intranet application with a small and specialized user base, changing the default behaviour is acceptable if the users want it.

So here’s how to do it. The following JavaScript function focusNextInputOnEnter takes a form element as a parameter and changes the focus behaviour on the input elements within this form.

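function focusNextInputOnEnter(form) {
  // All elements that take part in the Enter-to-next navigation, in document order
  const inputs = Array.from(form.querySelectorAll('input, select, textarea'));

  inputs.forEach(function(input, index) {
    input.addEventListener('keydown', function(event) {
      if (!isEnter(event)) {
        return;
      }
      // Suppress the default action so Enter never submits the form
      event.preventDefault();
      // Move the focus to the next element that can actually receive it
      for (let nextIndex = index + 1; nextIndex < inputs.length; nextIndex++) {
        const nextInput = inputs[nextIndex];
        if (nextInput.disabled || nextInput.type === 'hidden') {
          continue;
        }
        nextInput.focus();
        break;
      }
    });
  });

  function isEnter(event) {
    return event.key === 'Enter';
  }
}

It works by handling the keydown events on the input elements and checking whether the pressed key is Enter (the older keypress event and the numeric key code 13 are deprecated). The default action is suppressed, so the browser neither submits the form nor, in a textarea, starts a new line, and the focus moves to the next element. Disabled and hidden inputs are skipped because they cannot receive the focus.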
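The part of the handler that decides where the focus goes does not depend on the DOM at all. As a minimal sketch, it can be pulled out into a pure function and tested on its own; the name `nextFocusableIndex` is hypothetical, and plain objects with a `disabled` flag stand in for the form elements (mirroring the form in the following example).

```javascript
// Hypothetical helper: returns the index of the next element after
// `currentIndex` that is not disabled, or -1 if there is none.
function nextFocusableIndex(elements, currentIndex) {
  for (let i = currentIndex + 1; i < elements.length; i++) {
    if (!elements[i].disabled) {
      return i;
    }
  }
  return -1;
}

// Stand-ins for: text, disabled text, checkbox, select, textarea, text, text
const demoInputs = [
  { disabled: false }, { disabled: true }, { disabled: false },
  { disabled: false }, { disabled: false }, { disabled: false },
  { disabled: false },
];

console.log(nextFocusableIndex(demoInputs, 0)); // 2: the disabled field is skipped
console.log(nextFocusableIndex(demoInputs, 6)); // -1: the last field has no successor
```

In the event handler, a call like this would replace the search loop, followed by `inputs[next].focus()` whenever the result is not -1.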

To apply this change in behaviour to a form, we have to call the function once the DOM content is loaded:

<form id="demo-form">
  <input type="text">
  <input type="text" disabled="disabled">
  <input type="checkbox">
  <select>
    <option>A</option>
    <option>B</option>
  </select>
  <textarea></textarea>
  <input type="text">
  <input type="text">
</form>

<script>
  document.addEventListener('DOMContentLoaded', function() {
    focusNextInputOnEnter(document.getElementById('demo-form'));
  });
</script>

I want to reiterate my warning that you should definitely not do this for public web sites, and elsewhere only if you know that this is what your users want.

The emoji checksum

This blog article is a story about an idea, not an actual project report. If it were a movie, it would feature the “based on real events” disclaimer.

The warehouse

Imagine the warehouse of a medium-sized company. You would expect a medium-sized warehouse, but in reality, the number of different items in it is nearly as large as in a big company. The difference might lie in the stock level of each item, but the number of distinct items is large. So large that each item has its own “item ID”, which also serves as its location identifier in the warehouse. Let’s look at three (contrived) examples:

  • 211 725: Retaining screw, 8 mm
  • 413 114: Power transformer, 5 A
  • 413 115: Power transformer, 10 A

As you can see, different item groups have numbers with a large numerical distance, while similar items are numerically close. This makes sense for the engineers, who use these numbers from muscle memory, and for navigating the warehouse. If you read the first three digits, you already know where to turn in the large hall. Once you’ve arrived in the general area, the next three digits lead you to the exact storage space.
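The navigation rule just described fits into a tiny function. A minimal sketch, assuming every ID has exactly six digits (optionally written with a space, as in the list above); the name `locateItem` and the `area`/`space` fields are hypothetical.

```javascript
// Hypothetical helper: splits a six-digit item ID into the hall area
// (first three digits) and the storage space within that area (last three).
function locateItem(itemId) {
  const digits = itemId.replace(/\s+/g, ''); // "413 114" -> "413114"
  if (!/^\d{6}$/.test(digits)) {
    throw new Error(`Not a six-digit item ID: ${itemId}`);
  }
  return { area: digits.slice(0, 3), space: digits.slice(3) };
}

console.log(locateItem('413 114')); // { area: '413', space: '114' }
console.log(locateItem('413 115')); // { area: '413', space: '115' }
```

The two power transformers share their area and differ only in the last digit, which is exactly what makes them easy to confuse.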

The operators

But that’s not how it works. The warehouse workers cannot read. Yes, you’ve read that right. The warehouse is operated by humans, and the workers are not familiar with digits and numbers. They decipher each digit individually and cannot cross-check with the article name. They navigate the warehouse on a best-effort basis. To them, the difference between item 413114 and item 413115 is negligible. It’s the same thing anyway, unless you can read (and understand) that one of them blows up above 5 amperes and the other one doesn’t. And this is a problem for the engineers. The difference between a “power transformer able to take 10 amperes” and a “power transformer (5 A), aka molten copper lump” is the difference between a successful and a failed project.

So what can you do? Teach the warehouse workers to read and deal with numbers? That would be a good approach if the turnover rate among them weren’t so high. What else can we do? We can abstract the problem at hand, apply a suitable solution approach and see if it works.

The abstraction

If you think about the situation in abstract terms, you are dealing with an unreliable data transmission. You send your item list to the warehouse and receive a collection of loosely related items. That’s similar to sending data over a faulty cable. To detect transmission errors, we’ve invented checksums: each suitable part of the transmission is validated (or invalidated) by a checksum.

In our case, the “suitable part of the transmission” is each single item. We should add a checksum to the item list! Instead of requesting item 413114, we request 413114/7, while item 413115 is requested as 413115/1. Now, we have a clear indicator for wrong or right. But it is still an indicator in a foreign alphabet. If you ignore the difference between 4 and 5, why not also ignore the difference between 7 and 1?
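The text does not say how check digits like 7 and 1 are computed, and they read as illustrative. As a stand-in, here is a minimal sketch using the Luhn algorithm, a well-known check-digit scheme that catches every single-digit error and most swaps of adjacent digits. Note that it yields different digits than the example above (0 for 413114, 7 for 413115); the helper name `withChecksum` is hypothetical.

```javascript
// Luhn check digit for a string of decimal digits (without the check digit).
function luhnCheckDigit(payload) {
  let sum = 0;
  // Walk from right to left and double every second digit, starting with
  // the rightmost one, because the check digit is appended to its right.
  for (let i = 0; i < payload.length; i++) {
    let digit = Number(payload[payload.length - 1 - i]);
    if (i % 2 === 0) {
      digit *= 2;
      if (digit > 9) {
        digit -= 9; // same as adding the two digits of the product
      }
    }
    sum += digit;
  }
  return (10 - (sum % 10)) % 10;
}

// Hypothetical helper: formats an item ID as "ID/checksum" for the item list.
function withChecksum(itemId) {
  const digits = itemId.replace(/\s+/g, '');
  if (!/^\d+$/.test(digits)) {
    throw new Error(`Not a numeric item ID: ${itemId}`);
  }
  return `${digits}/${luhnCheckDigit(digits)}`;
}

console.log(withChecksum('413 114')); // 413114/0
console.log(withChecksum('413 115')); // 413115/7
```

A worker who fetches 413115 for an order line reading 413114/0 would find a box labelled 413115/7: the IDs look alike, but the check digits disagree.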

The emojification

But what if we don’t rely on numbers or characters, but on something every human can understand, regardless of literacy level? What if we transpose the numbers into an emoji alphabet? Let 413114 be 😄🌵☁️🌵🌵😄 and 413115 be 😄🌵☁️🌵🌵🏠. More importantly, the checksum is in emoji, too:

😄🌵☁️🌵🌵😄 🚗 (413114/7)
vs.
😄🌵☁️🌵🌵🏠 🌵 (413115/1)

Even if you only glance at the emoji series (and fail to notice the difference at the end), you still have to acknowledge that your checksum doesn’t fit. A cactus is no car, regardless of your literacy.

This transposition of numbers into the iconographic realm plays right into every human’s built-in ability to distinguish concrete objects. Numbers, digits and characters are (more) abstract concepts, but a cloud is recognizable as a cloud even if you draw it by hand and without care. The transposition is quite easily reversible: you only have to remember ten number/emoji pairs (or eleven, if your checksum has an extra character). And nothing stops you from printing both on the item list and on the warehouse storage boxes.
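The transposition itself is just a lookup table used in both directions. A minimal sketch: the emoji for 1, 3, 4, 5 and 7 are taken from the example above, while those for 0, 2, 6, 8 and 9 are placeholders chosen for illustration; the function names are hypothetical.

```javascript
// Digit -> emoji table. 1, 3, 4, 5 and 7 follow the example in the text;
// 0, 2, 6, 8 and 9 are arbitrary placeholders.
const EMOJI_DIGITS = ['🍩', '🌵', '🍎', '☁️', '😄', '🏠', '🐟', '🚗', '🌙', '🔑'];

// Encode a string of decimal digits as emoji.
function toEmoji(digits) {
  return Array.from(digits, (d) => {
    if (!/^\d$/.test(d)) {
      throw new Error(`Not a digit: ${d}`);
    }
    return EMOJI_DIGITS[Number(d)];
  }).join('');
}

// Decode emoji back to digits. Some emoji (like the cloud) span several
// UTF-16 code units, so whole table entries are matched, not single characters.
function fromEmoji(text) {
  let digits = '';
  let pos = 0;
  while (pos < text.length) {
    const index = EMOJI_DIGITS.findIndex((e) => text.startsWith(e, pos));
    if (index === -1) {
      throw new Error(`Unknown symbol at position ${pos}`);
    }
    digits += String(index);
    pos += EMOJI_DIGITS[index].length;
  }
  return digits;
}

console.log(toEmoji('413114'), toEmoji('7')); // 😄🌵☁️🌵🌵😄 🚗
console.log(fromEmoji(toEmoji('413115'))); // 413115
```

Matching whole table entries only works because no emoji in the table is a prefix of another; a hand-picked table should keep that property.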

And the best thing? You don’t even have to invent the transposition yourself. Just use the existing work of others by checking out emojisum by Vincent Batts or ecoji by Keith Turner.

The only thing that is stopping you is that ancient dot matrix printer that prints the item lists on continuous paper.

Zero Interaction Tools

Some time ago, a customer called us with a delicate task: develop a little tool on a very tight budget that aggregates pictures in a specific way. The pictures were from the medical domain and came in sets of thousands that needed to be combined into one large picture with the help of some mathematics. Developing the algorithm was not a problem, and neither were the huge data sizes, but the budget was a challenge. We could program everything and test it on some sample picture sets (which arrived on several Blu-ray discs) within the budget, but an elaborate graphical user interface (GUI) would be out of scope. On the other hand, the anticipated users of the tool weren’t computer-savvy enough to handle a CLI (command line interface). The tool needed to be simple, fast and cheap. And it only needed to do one thing.

Traditional usage of software tools

In the traditional way, a software tool comes with an installer that entrenches the tool on the target computer and provides a start menu entry, a desktop icon and perhaps even a boot launcher with a fancy tray icon and some balloon tooltips that inform the user from time to time that the tool is still installed and wants some attention. If you click on the tool’s icon, a graphical user interface appears, perhaps in the form of an application window or just a tray pop-up menu. Then you need to interact with the user interface. Let’s say you want to combine several thousand pictures into one: you need to specify directories or collections of files through some file dialogs with “browse” buttons and complex “ingredients” lists. You’ve probably seen this type of user interface while burning a Blu-ray disc or uploading files to the cloud. After you’ve selected all your input pictures, you have to say where to write to (another file dialog) and what name your target file should have. After that, you’ll stare at a progress bar and wait for it to reach the right-hand side of the widget. And then, the tool will beep and proudly present a little message box informing you that everything has worked out just fine and you can find your result right where you wanted it. Afterwards, the tool will sit there on your screen in anticipation of your next move. What will you do? Do it all again because you love the progress bars and beeps? Combine another several thousand pictures? Shut down the tool? Oh, come on! Are you sure you want to quit now?

None of this could be developed within the budget our customer gave us. The tool didn’t need self-marketing features like a tray icon or launcher, because the customer would only use it internally. But even the rest of the user interface was too much work: the future users would not get a traditional software tool.

Zero interaction tool

So we thought about the minimal user interface that the picture aggregation tool needed to have. We came to the conclusion that no user interface was needed at all, because it really only needed to do one thing. At least, if certain assumptions hold true:

  • The tool is fast enough to produce no significant delay
  • The input directory holds all pictures that should be aggregated
  • The input directory can be the output directory as well
  • The name of the resulting picture file can contain a timestamp to distinguish between several tool runs

We consulted our customer and confirmed that the latter three assumptions were valid. So, provided we could make the first assumption a reality, the tool could work without any form of user interaction: it would simply be copied into the picture directory and then started.

Yes, you’ve read this right. The tool would not be installed, but copied for every usage. This is a highly unusual usage scenario for a program, because it means that every picture directory that should be aggregated holds an identical copy of the program. But if we can make some more assumptions valid, it is a viable way to empower the users:

  • The tool must run on all target machines without additional preparation
  • The tool must only consist of one executable file, no DLLs or configuration files
  • The tool must be small in size, like one megabyte at most

We confirmed with a quick and dirty spike (an embarrassingly inchoate prototype) that we could produce a program that conforms to all three new assumptions/requirements. The only remaining problem was the very first assumption: no hard drive was fast enough to deliver the pixel data of thousands of pictures in less than a second. Even if we could aggregate the pixels fast enough (given enough cores, this would be possible), we couldn’t read them fast enough. We needed some kind of progress bar.

Use your information channels

We thought about the information channels our tool would have towards the user. Let’s repeat the scenario: The user navigates to the directory containing the pictures that should be aggregated, copies the executable program into it and double-clicks to start the tool. There are many possibilities to inform the user about progress:

  • Audio (Sound): We can play a little tune or some sound that changes frequency to indicate progress. This is highly unusual and we can’t be sure that the speakers aren’t muted (usage on a notebook was part of the domain analysis results). No sounds, that is.
  • Animation (Graphics): In the most boring case, this would be a little window with a progress bar that runs from left to right and disappears when the work is done. Possible, but boring. Perhaps we can think of something more in tune with the rest of the usage scenario.
  • Text: Well, this was the idea. We produce a result file and give it a name. We don’t need to keep the name static while we are working, since the contents of the file keep changing anyway. We just update the file name to reflect our progress.

So our tool creates a new result file in the picture directory that is named result_0_percent or something and runs up to result_100_percent and then gets renamed to result_timestamp with the current timestamp. You can just watch your file explorer to keep up with the tool’s completion. This is a bit unusual at first, but all pilot users grasped the concept immediately and were pleased with it.
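The renaming scheme fits in a few lines. A minimal Node.js sketch, assuming the work is split into a known number of batches; `runWithFileNameProgress`, the `processBatch` callback and the timestamp format are hypothetical stand-ins, not the original tool’s code. Renaming a file within the same directory does not copy its data, so updating the name after every batch is cheap.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Runs `batchCount` batches and reports progress through the name of the
// result file: result_0_percent, ..., result_100_percent, result_<timestamp>.
function runWithFileNameProgress(dir, batchCount, processBatch) {
  let current = path.join(dir, 'result_0_percent');
  fs.writeFileSync(current, ''); // the result file is visible from the start

  for (let batch = 0; batch < batchCount; batch++) {
    fs.appendFileSync(current, processBatch(batch));
    const percent = Math.floor(((batch + 1) * 100) / batchCount);
    const next = path.join(dir, `result_${percent}_percent`);
    if (next !== current) {
      fs.renameSync(current, next); // the file explorer now shows the new name
      current = next;
    }
  }

  // Final name: a UTC timestamp without colons, so it is valid on Windows too
  const stamp = new Date().toISOString().slice(0, 19).replace(/:/g, '-');
  const finalName = path.join(dir, `result_${stamp}`);
  fs.renameSync(current, finalName);
  return finalName;
}

// Usage with a throwaway directory and a dummy batch function
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'aggregate-'));
const resultFile = runWithFileNameProgress(demoDir, 4, (batch) => `batch ${batch}\n`);
console.log(path.basename(resultFile)); // "result_" followed by the current UTC time
```

With four batches, the file passes through result_0_percent, result_25_percent, result_50_percent, result_75_percent and result_100_percent before it receives its timestamp.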

The result

And that is the story of how we developed a highly specialized tool on a very small budget without any graphical or otherwise traditional user interface. The user brings the tool to the data (by copying it into the same directory) and lets it do its work by simply starting it. The tool reports its progress via the result file name. As soon as the result file name contains a timestamp (and the notebook fans stop going berserk), the user can copy the file into the next tool in the tool chain, probably a picture viewer or a printer driver. The users loved the tool for its speed and simplicity.

One funny side note remains to be told: because aggregating thousands of pictures into one produces a picture with a lot of detail, the result file was not particularly big (about 20-30 megabytes), but it could take out any printer for several hours if printed. The tool was informally renamed “printer-reaper.exe”.