The charged charging switch

In this blog post, I’ll describe my experiences with a certain product (a computer monitor) and its manual. It might serve as an example of how ridiculous a poorly designed customer experience looks from the receiving end. Hopefully, it inspires some readers to think about sensible defaults and how to communicate them.

Let’s start with the context. In a previous blog post, I described my journey from one small monitor to four monitors in total (three big ones, one small additional one). Well, it is not just my journey – all of my co-workers now have four computer monitors at their office workplace.

This meant that we bought a lot of smaller monitors in the last months. We decided to go the monoculture route and bought one piece of our favorite model.

It arrived faulty. The only thing that this device did was to indicate “battery full” when the battery status button was pressed (yes, this particular monitor has its own battery for mobile usage). Everything else didn’t work, especially not the power button. The device was a dead fish. I returned it to the supplier.

The replacement unit was also dead on arrival. This puzzled me, because the odds of having two duds in a row seem very small. So I investigated and found an interesting fact: The unpacking and assembly instruction sheet is incomplete. Well, even more than that. It’s plain misleading.

It starts with a big lettered alert that reads “Please follow the illustration and text description strictly when opening the package and installing the display.” It then shows three illustrations of a totally different monitor and ends the instructions at the step when the styrofoam is removed (and no cables attached). At the bottom of the sheet, there’s an explanation: “The machine picture and styrofoam shown are for illustration purpose only and may differ from the actual product”. You can’t make this up.

The manual urges me to follow it “strictly” and then vaguely tells me how to unwrap the monitor from the styrofoam and nothing more. Even better, in the illustrations, there are different options given like “For binding-less, please ignore the untying action” (actual quote!). You can’t follow strictly if given multiple options and hand-wavey instructions. “Unpack the monitor correctly” is more actionable than this manual.

But that was just the beginning. The user manual actually references the correct monitor and gives usage instructions for common use cases, but it lacks a troubleshooting section. The user manual starts with a working device – and my device(s) don’t work. They don’t turn on if the power button is pressed – and it has to be pressed for 3 seconds to turn on the monitor! Yes, the manual is clear on this one: To turn the monitor on by using its power button, you have to press for three, long, “twenty-two”, tedious, “twenty-three”, seconds. That’s like having a light switch, but if you press it in the dark, it requires you to keep pressing because it could be a mistake – do you really want to have the lights on?

The device is still dead, the manual is no help for my situation, so I inspect the material a little more thoroughly. There is a sticker at the bottom of the monitor (on the opposite side from the power plug and the power button) that catches my eye. I have photographed it, because nobody would believe me otherwise. Here it is:

The first sentence is a no-brainer. But the second one is a head-scratcher: “Please turn on the charging switch for the first time”.

There is no mention of a “charging switch” in the manual. There is no switch labeled “charging” on the device. All the buttons/switches and ports that are present are described in the manual and can’t be interpreted as a “charging switch”.

But if you look at the sticker more closely, you’ll see the illustration at the right side. In reality, it is 3 mm wide and 18 mm in height. It is very small. Even smaller are the depicted things – they resemble the input ports on the right side! From the bottom up, there is a USB-C port, a micro-HDMI port and something that is encircled in the illustration. The circle is probably our hint that this is indeed the “charging switch” mentioned on the sticker.

I searched for the switch and only found a notch in the plastic, about 3 mm wide. Only by using a magnifying glass did I find a small black plastic knob at the bottom of the notch (2 mm deep). The knob is tiny, probably one square millimeter. It was situated more to the top of the notch.

I have built electronics since the early nineties. I know how to solder and recognize all kinds of electronic parts. This thing was a DIP-switch, but one of the smallest ones I’ve ever seen. And it wasn’t labeled at all. The only hint we get to search for it is the illustration on the sticker.

So – is it in the “on” position? I decided to find out by moving it down. A paper clip wire was too big to fit, so I used the smallest screwdriver my micro-mechanic screwdriver set would offer. Just a bit smaller and I would have resorted to an actual hair. The DIP-switch moved half a millimeter down and got stuck more to the bottom of the notch.

The monitor suddenly worked – after the three-second press. The unlabeled “on” position of the unlabeled “charging switch” that you have to manipulate by using the smallest metal rod that you can find in an electronics lab is at the bottom. Good to know.

I won’t reiterate the madness that we just experienced. It gets even worse, so buckle up.

Right now, I have a working monitor that is actually pleasing to use. I buy it again – the same routine. I wonder if I should report the trick to the supplier.

We have more than two workplaces, so I buy the monitor – the same product for the same price – again, but five times now.

I get five packages with identical content. Well, nearly identical. The stickers are different!

Three monitors have the same sticker as seen above. One of them needs to be switched to turn on, the other two were already in the “on” position.

But the other two monitors have a different sticker:

Both monitors were already in the “on” position, so nothing needed to be done. But this sticker tells you to leave the charging switch alone – a switch that is never mentioned in the manual, that is so small that you will probably miss it even if you search for it and that needs special equipment to be changed. That’s as if my refrigerator came with a warning sticker not to disable a particular fuse when this fuse is safely hidden away in the internals of the refrigerator’s electronics and never mentioned in the manual. Why point it out if my only job is to ignore it?

Remember the first manual that “strictly” tells a vague story? This is the same logic. And it gets even better with the second sentence, the one with an exclamation mark! “Let it keep the factory state!” means that it is turned off when coming from the factory? Or does it mean to keep it in the state that is delivered, regardless of the monitor being functional or disabled by it?

I still don’t know what the “on” position of this switch really is and now I’m even more confused than before.

My mind invented this elaborate fantasy story about a factory that produces monitors. One engineer is tasked with designing the charging functionality and adds the “charging switch” to enable or disable the whole feature. But she/he forgets to remove it before the blueprint is committed into production and now the switch is part of the consumer product. The DIP switch is on the “off” position by default from its producer. This renders the first batches of monitors useless because the documentation doesn’t mention the magic switch that needs to be flipped once to have the monitors turn on. The return rates are horrendous and management gets involved. They decide to get rid of the problem by applying a quick fix – the first sticker. This sentences their customers to perform a scavenger hunt of subtle hints to have the monitors work. They also install a new production line station – the switch flipper. This person needs training and is only available for the day shift – Half of the monitors leave the factory with the switch in the “on” position, the other half is in the “off” position. The first sticker remains, it is still a mystery, but the return rates are cut in half nearly overnight.

In my story, the original engineer recognizes her/his error and tries to correct it – by reversing the switch positions. The default position (“off”) now enables the feature, while the “on” position disables it. Just by turning the (still unlabeled) positions around, the factory produces ready-to-use monitors without requiring intervention from the customer.

The problem? A lot of customers have now learned the switch-flip trick and deactivate their product. And the switch flipper still deactivates half of the production without noticing. They need to inform their customers! They apply the second sticker, hoping to clear this matter once and for all.

And here I am, having bought 7 monitors so far and received nearly every possible combination of sticker and initial switch position. I am more confused and wary than if they had stuck to their original approach and just updated their manual.

But there is one indicator that might be helpful: The serial numbers of the monitors start with some letters and then two digits:

  • 79: You get sticker 1 and need to flip the switch
  • 99: You get sticker 2 and need not flip the switch
  • 69: You get sticker 1, but the switch is already flipped

At least that was my observation with the samples at hand.

What can we, as software developers, learn from this disaster?

First, keep an eye on your feature switches! One non-sensible default and you chase that error forever.

Second, don’t compensate for the first error by making the complementary error, too. Sometimes, the cure is worse than the disease.

Third, don’t ever not avoid negative logic! Boolean logic is hard enough itself, if you further complicate it, people like me will just resort to guessing and trial-and-error.
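To make the first and third lesson a bit more tangible, here is a minimal sketch of a feature switch with a positive name and a sensible default. The class `FeatureSwitches`, the field `charging_enabled` and the function `load_switches` are invented for illustration and have nothing to do with the monitor’s actual firmware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureSwitches:
    # Positive name: True means "the feature is on". This avoids double
    # negatives like "disable_charging_lockout = False".
    # Sensible default: a fresh installation works out of the box.
    charging_enabled: bool = True


def load_switches(stored: dict) -> FeatureSwitches:
    """Build the switches from stored settings (hypothetical helper).

    A missing entry falls back to the working default instead of
    silently disabling the product.
    """
    return FeatureSwitches(
        charging_enabled=stored.get("charging_enabled", True)
    )
```

With this shape, an empty configuration yields a working product, and anybody who reads `charging_enabled=False` knows what it does without a sticker.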

Fourth, and that is the most important one for me: Don’t explain things that need no attention from the user. I’m definitely guilty of that one. Often, I want my documentation to be “complete” and to “show all opportunities” when all I do is confuse my users with sentences like “Do not turn on the charging switch. Let it keep the factory state!” and then never mention the “charging switch” anywhere again.

Effective computer names with DNS aliases

If you have a computer in a network, it has a lot of different names and addresses. Most of them are chosen by the manufacturer, like the MAC address of the network device. Some are chosen by you, like the IP address in the local network. And some need to be chosen by you, like the computer’s name in your local DNS (domain name service).

A typical indicator for an under-managed network is the lack of sufficiently obvious computer names in it. You want to connect to the printer? 192.168.0.77 it is. You need to access the network drive? It is reachable under nas-producer-123.local. You can be sure that either of these names change as soon as anything gets modified in the network.

Not every computer in a network needs a never-changing, obvious name. If you connect a notebook for some hours, it can be addressable only by 192.168.0.151 and nobody cares. But there will be computers and similar network devices like printers that stay longer and provide services to others. These are the machines that require a proper name, and probably not only one.

Our approach is a layered one, with four layers:

  • MAC-address, chosen by the manufacturer
  • IP address, chosen by our DHCP
  • Device name, chosen by our DNS
  • Device aliases, chosen by our DNS

Of course, our DHCP and our DNS are told by our administrator what addresses and names to give out. Our IP addresses are partitioned into sections, but that is not relevant to the users.

The device name is a mapping of a name to an IP address. It is chosen by the administrator in case of a server/service machine. It will tell you about the primary service, like “printer0”, “printer1” or “nas0”. It is not a creative name and should not be remembered or used directly. If the machine has a direct user, like a workstation or a notebook, the user gets to choose the name. The only guideline is to keep it short, the rest is personal preference. This name should only be remembered by the user.

On top of the device name, each machine gets one or several additional DNS names, in the form of DNS aliases (CNAME records). These are the names we work with directly and should be remembered. Let’s see some examples:

I want to print on the laser printer: “laserprinter.local” is the correct address. It is an alias to printer0.local which is a mapping to 192.168.0.77 which resolves to a specific MAC address. If the laser printer gets replaced, every entry in this chain will probably change, except for one: the alias will point to the new printer and I don’t have to care much about it (maybe I need to update my driver).

I want to access the network drive: “nas.local” is one possibility. “networkdrive.local” is another one. Both point to “nas0” today and maybe “nas1” tomorrow. I don’t need to care which computer provides the service, because the service alias always points to the correct machine.

I want to connect to my colleague’s workstation: Because we have different naming preferences, I cannot remember that computer’s name. But I also don’t have to, because the computer has an alias: If my colleague’s name is “Joe”, the computer’s alias is “joe.local”, which resolves to his “totallywhackname.local”, which points to the IP address, etc. There is probably no more obvious DNS name than “joe.local”.

Another thing that we do is give a service its purpose as a name. This blog is run by wordpress, so we would have “wordpress.local”, but also “blog.local” which is the correct address to use if you want to access the blog. Should we eventually migrate our blog to another service, the “blog.local” address would point to it, while the “wordpress.local” address would still point to the old blog. The purpose doesn’t change, while the product that provides it might some day.
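The alias chains from the examples above can be sketched as two small lookup tables. This is only a toy model of what a DNS resolver does, not our actual configuration: the IP addresses for “nas0” and for Joe’s workstation are made up, and `resolve` is a hypothetical helper.

```python
# Alias layer (CNAME records): the names people actually use.
CNAME_RECORDS = {
    "laserprinter.local": "printer0.local",
    "nas.local": "nas0.local",
    "networkdrive.local": "nas0.local",
    "joe.local": "totallywhackname.local",
}

# Device name layer (A records): device name -> IP address.
A_RECORDS = {
    "printer0.local": "192.168.0.77",
    "nas0.local": "192.168.0.80",              # assumed address
    "totallywhackname.local": "192.168.0.42",  # assumed address
}


def resolve(name: str) -> str:
    """Follow aliases until a device name is reached, then return its IP."""
    visited = set()
    while name in CNAME_RECORDS:
        if name in visited:
            raise ValueError(f"alias loop detected at {name}")
        visited.add(name)
        name = CNAME_RECORDS[name]
    return A_RECORDS[name]
```

Replacing the laser printer means changing one entry in each table, while `resolve("laserprinter.local")` keeps working for everybody who only knows the alias.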

Of course, maintaining such a rich ecosystem of names and aliases is a lot of work. We don’t type our zone files directly, we use generators that supply us with the required level of comfort and clarity. This is done by one of our internal tools (if you remember the Sunzu blog post, you now know 2 out of our 53 tools). In short, we maintain a table in our wiki, listing all IP addresses and their DNS aliases and linking to the computer’s detail wiki page. From there, the tool scrapes the computer’s name and MAC address and generates configuration files for both the DHCP and DNS services. We can define our whole network in the wiki and have the tool generate the actual settings for us.
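To give an impression of what such a generator does, here is a heavily simplified sketch. It is not our tool: the `Device` structure stands in for the scraped wiki table, the MAC address is a placeholder from the documentation range, and the output assumes BIND-style zone records and ISC-dhcpd-style host entries, which may differ from what your servers expect.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    name: str                 # device name, e.g. "printer0"
    ip: str
    mac: str
    aliases: list[str] = field(default_factory=list)


def dns_records(devices):
    """One A record per device, one CNAME record per alias."""
    lines = []
    for device in devices:
        lines.append(f"{device.name} IN A {device.ip}")
        for alias in device.aliases:
            lines.append(f"{alias} IN CNAME {device.name}")
    return lines


def dhcp_hosts(devices):
    """One fixed-address host entry per device."""
    return [
        f"host {d.name} {{ hardware ethernet {d.mac}; fixed-address {d.ip}; }}"
        for d in devices
    ]


# Example row as it might come out of the wiki table (MAC is a placeholder).
devices = [
    Device("printer0", "192.168.0.77", "00:00:5e:00:53:01", ["laserprinter"]),
]
```

The important design choice is the single source of truth: both configuration files are derived from the same table, so they cannot drift apart.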

That way, the extra effort for the DNS aliases is negligible, while the positive effects are noticeable. Most network modifications can be done without much reconfiguration of dependent services or machines. And it all starts with alias names for your computers.

Applying the KonMari method to your IT supplies room

Our company is rather small, with fewer than ten people working in one big room on two floors (yes, the room is divided vertically, not horizontally). There are a few additional rooms, like a bathroom or a kitchen, but everything else has to find a place in our working space.

There are two exceptions to this rule:

  • A small room holds all cleaning utilities
  • A bigger room holds all things IT, like our servers and our IT supplies

Neither of these rooms “sparks joy”, as Marie Kondo would put it. You open the door, search around while ignoring the mess, grab the thing you came for and close the door again. When it is time to put the thing back, you more or less place it where you’ve found it. The state of these rooms is slow deterioration, because it can only get worse, but not better.

The situation became unfortunate for the IT room, because it contained far more things than storage space. Cables piled up on shelves, hard disks lingered on tables at specific locations that probably indicated something. A huge collection of CDs and DVDs waited in boxes for a second installation – most of our computers don’t even have a drive for them anymore. Every drawer contained some kind of main theme (manuals, adapters, cables), but a lot of surprises, too. The time it took to find something only went up and most of the time, it was cheaper to just buy the device (again) than search for it. And if you don’t use it anymore? Put it in the IT room.

A few years back, the KonMari method of cleaning up and organizing things was promoted by Marie Kondo. It is intended for your wardrobe and kitchen, but the guidelines can also be applied to your toolshed – and your IT room:

  • Not keeping a thing is the default
  • Concentrate on only keeping useful things (things that you use regularly or that make you happy)
  • If you keep a thing, it needs a dedicated place
  • Dedicate places by “category” and don’t deviate from your categorization
  • Provide a container for each category
  • Try to stack upright in horizontal direction, not vertically

The last guideline was really eye-opening for me: Every time I dedicated a box for things, like software CDs, the stacks grew upwards. This means that “lower layers” aren’t in direct access anymore and tend to be forgotten. If you dig to the ground of the box, you find copies of obscure software like “Windows 2000” or “Nero burning rom” that you’ve not thought about in ten years or even longer.

At the bottom of our cables box, we found a dozen cables for the parallel port, an interface that was forgotten the minute USB came around in 1996. The company was founded in 2000 and we never owned a device that used this port. We also found disks for the Zip 100 drive, which might have used it – we don’t remember.

These things spark nostalgia (something other than joy), but serve no practical purpose anymore. And even if somebody came around with a Zip disk, we wouldn’t remember that we have the cables at the bottom of our box.

If you try to stack your things upright, everything is visible and in fast access. There is no bottom layer anymore. Applied to CDs, this means that every CD case’s spine is readable. Every CD that you want to keep needs to be in a labeled case. The infamous mainboard driver CD in a paper box with drivers from 2002 for a mainboard you scrapped in 2009 has no place in this collection.

The fitting categorization of things is the most important part of the process, in my opinion. Let me explain it by a paradigm shift that made all the difference for me:

In the early days, our categories were like manual, CD, cable, screw, etc. Every time a new computer was bought, the accompanying utilities box (often the mainboard carton) got looted for these categories – manuals to the manuals, CDs to the CDs. It was easy to find the place where the CDs were stored, but hard to find the right CD.

Now, we provide a small carton for each computer and put everything related to it in this carton. It is labeled with the computer’s number and stored like a book on the shelf. If you search anything for this computer – a CD, a screw, whatever – it is in this carton. If we get rid of the computer, the carton follows suit.

We now categorize by device and not by item type. This means that the collection of 10,000 screws that were collected over the years can be discarded. They simply aren’t needed anymore. They never sparked joy.

Another topic is the cables. While most cables can be associated with a computer or a specific device, there are lots of cables that are “unbound”. Instead of lumping them all together (and forming the aforementioned layers of parallel, serial and USB1 cables), we sort them by main connector and dedicate a box for this connection type. If you search for a DisplayPort cable, you grab the DisplayPort box. If you require a VGA cable – well, we threw this specific box out last year. Look in the “exotic” box.

Each box is visible and clearly labeled. Inside each box are only things that you would expect. This means that there is a lot of boxed air. But it also means that you have to think about what to store and what not – simply because the number of boxes is limited.

And this is where “sparking joy” comes into play. The IT room is not an archive for all things digital. It is also not a graveyard for discarded electronics. If you can’t see yourself using the part in the future and having joy using it, don’t keep it.

We have a box labeled “random loot” that defies this filter. It contains things that we can’t categorize, don’t have an immediate use case for, but hesitate to throw away. Every household has a similar thing with “that drawer”. Our plan is to add a year label to the box and just throw it away unopened if it is older than X years.

We need to evolve the categories of the room to keep it useful. One example is the USB cables, which are all stored in one cable box. With USB-C on the rise, the need to separate into different USB “layers” became apparent. We will soon have at least two USB cable boxes. And perhaps, one day in the future, we might throw the non-USB-C box away.

The IT room was transformed from a frustrating mess to a living and evolving storage space that solves your concern in an efficient way. The typical use cases of the room are addressed right away, with a structure that is maintainable without too much effort.

The inspiration and guidelines of Marie Kondo and the thoughts about proper categorization helped us to have an IT room that actually sparks joy.

Basic business service: Sunzu, the list generator

This might be the start of a new blog post series about building blocks for an effective business IT landscape.

We are a small company that strives for a high level of automation and traceability, the latter often implemented in the form of documentation. This has the amusing effect that we often automate the creation of documentation or at least the creation of reports. For a company of fewer than ten people working mostly in software development, we have lots of little services and software tools that perform tasks for us. In fact, we work with 53 different internal projects (this is what the blog post series could cover).

Helpful spirits

Some of them are rather voluminous or at least too big to replace easily. Others are just a few lines of script code that perform one particular task and could be completely rewritten in less than an hour.

They all share one goal: To make common or tedious tasks that we have to do regularly easier, faster, less error-prone or just more enjoyable. And we discover new possibilities for additional services everywhere, once we’ve learnt how to reflect on our work in this regard.

Let me take you through the motions of discovering and developing such a “basic business service” with a recent example.

A fateful Friday

The work that led to the discovery started abruptly on Friday, 10th December 2021, when a zero-day vulnerability with the number CVE-2021-44228 was publicly disclosed. It had a severity rating of 10 (on a scale from 0 to, well, 10) and was promptly nicknamed “Log4Shell”. From one minute to the next, we had to scan all of our customer projects, our internal projects and products that we use, evaluate the risk and decide on actions that could mean disabling a system in live usage until the problem is properly understood and fixed.

Because we don’t only perform work but also document it (remember the traceability!), we created a spreadsheet with all of our projects and a criteria matrix to decide which projects needed our attention the most and what actions to take. An example of this process would look like this:

  • Project A: Is the project at least in parts programmed in Java? No -> No attention required
  • Project B: Is the project at least in parts programmed in Java? Yes -> Is log4j used in this project? Yes -> Is the log4j version affected by the vulnerability? No -> No immediate attention required
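The question chain from these two examples can be written down as a tiny decision function. This is a sketch for illustration, not the spreadsheet we used: the `Project` fields are invented, and the outcomes for the branches that the examples don’t cover (no log4j at all, affected version) are my assumptions.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    uses_java: bool
    uses_log4j: bool = False
    log4j_version_affected: bool = False


def triage(project: Project) -> str:
    """Walk through the criteria in the same order as the examples."""
    if not project.uses_java:
        return "no attention required"
    if not project.uses_log4j:
        # Assumed outcome, not covered by the examples above.
        return "no attention required"
    if not project.log4j_version_affected:
        return "no immediate attention required"
    # Assumed outcome, not covered by the examples above.
    return "immediate attention required"


project_a = Project("Project A", uses_java=False)
project_b = Project("Project B", uses_java=True, uses_log4j=True)
```

The function is trivial. The hard part on that Friday was getting a complete list of `Project` entries with trustworthy answers to feed into it.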

Our information situation changed from hour to hour as the whole world did two things in parallel: The white hats gathered information about possible breaches and unaffected versions while the black hats tried to find and exploit vulnerable systems. This process happened so fast that we found ourselves lagging behind because we couldn’t effectively triage all of our projects.

One bottleneck was the creation of the spreadsheet. Even just the process of compiling a list of all projects and ruling out the ones that are obviously not affected by the problem was time-consuming and not easily distributable.

Post mortem

After the dust settled, we had switched off one project (which turned out to be not vulnerable on closer inspection) and confirmed that all other projects (and products) weren’t affected. We fended off one of the scariest vulnerabilities in recent times with barely a scratch. We could celebrate our success!

But as happy as we were, the post mortem of our approach revealed a weak point in our ability to quickly create spreadsheets about typical business/domain entities for our company, like project repositories. If we could automate this job, we would have had a complete list of all projects in a few seconds and could have worked from there.

This was the birth hour of our list generator tool (we called it “sunzu” because – well, that would require the explanation of a German word play). It is a simple tool: You press a button, the tool generates a new page with a giant table in the wiki and forwards you to it. Now you can work with that table, remove columns you don’t need, add additional ones that are helpful for your mission and fill out the cells that are empty. But the first step, a complete list of all entities with hyperlinks to their details, is a no-effort task from now on.
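The core of such a list generator fits into a few lines. The following is a rough sketch and not the real sunzu code: the function `render_table`, the pipe-separated table syntax and the example data are all assumptions, and your wiki will probably want a different markup.

```python
def render_table(entities, columns, empty_columns=()):
    """Render entities as a pipe-separated table (assumed wiki syntax).

    `columns` are filled from the entity data, `empty_columns` are added
    as blank cells that a human fills in while working with the list.
    """
    header = list(columns) + list(empty_columns)
    lines = ["| " + " | ".join(header) + " |"]
    for entity in entities:
        cells = [str(entity.get(column, "")) for column in columns]
        cells += ["" for _ in empty_columns]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Hypothetical data; the real tool would collect this from its data sources.
projects = [
    {"name": "Project A", "languages": "Python"},
    {"name": "Project B", "languages": "Java"},
]

page = render_table(projects, ["name", "languages"], ["affected?", "action"])
```

The blank columns are the point: the tool delivers the complete, boring part of the list and leaves the judgment calls to the person who pressed the button.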

No-effort chores

If Log4Shell happened today, we would still have to scan all projects and decide for each. We would still have to document our evaluation results and our decisions. But we would start with a list of all projects, a column that lists their programming languages and other data. We would be certain that the list is complete. We would be certain that the information is up-to-date and accurate. We would start with the actual work and not with the preparation for it. The precious minutes at the beginning of a time-critical task would be available and not bound to infrastructure setup.

Since the list generator tool can generate a spreadsheet of all projects, it has accumulated additional entities that can be listed in our company. For some, it was easy to collect the data. Others require more effort. There are some that don’t justify the investment (yet). But it had another effect: It is a central place for “list desires”. Any time we create a list manually now, we pose the important question: Can this list be generated automatically?

Basic business building blocks

In conclusion, our “sunzu” list generator is a basic business service that might be valuable for every organization. Its only purpose is to create elaborate spreadsheets about the most important business entities and present them in an editable manner. Whether the spreadsheet is created as an Excel file, as an editable website like tabble or as a wiki page like in our case is secondary.

The crucial effect is that you can think “hmm, I need a list of these things that are important to me right now” and just press a button to get it.

Sunzu is a web service written in Python, with a total of fewer than 400 lines of code. It could probably be rewritten from scratch on one focussed workday. If you work in an organization that relies on lists or spreadsheets (and which organization doesn’t?), think about which data sources you tap into to collect the lists. If a human can do it, you can probably teach it to a computer.

What are the entities/things in your domain or organization that you would like to have a complete list/spreadsheet generated automatically about? Tell us in the comments!

My own little Y2K22 bug

Ever since the year 2000 (or Y2K), software developers have dreaded the start of a new year. You never know which arbitrary limit will affect the fitness of your projects. Sometimes, it isn’t even the new year (see the year 2038 problem that will manifest itself in late January). But more often than not, the first day of a new year is a risky time.

Welcome, 2022!

The year 2022 started with Microsoft Exchange quarantining lots of e-mails for no apparent reason other than that it is no longer 2021. I was amused by this “other people’s problem” until my phone rang.

A customer reported that one of my applications doesn’t start anymore, although it ran perfectly a few days ago – in 2021. My mind began to race:

The application in question wasn’t updated recently. It has to be something in the code that parses a current date with an unfortunate date/time format. My search for all format strings (my search term was “MMddHH” without the quotes) in the application source code brought some expected instances like “yyyyMMddHHmmss” and one of a very suspicious kind: “yyMMddHHmm”.

The place where this suspicious format was used took a version information file and reported a version number, some other data and a build number. The build number was defined as an integer (32 bit). Let me explain why this could be a problem:

2G should be enough for everyone!

A signed 32-bit integer has an arbitrary value limit just below 2^31 = 2.147.483.648. If you represent the last minute of 2021 in the format above, you get 2.112.312.359 which is beneath the limit, but quite close.

If you add one minute and count up the year, you’ll be at 2.201.010.000 which is clearly above the value limit and results in either an integer overflow ending in a very negative number or an arithmetic exception.

In my case, it was the arithmetic exception which halted the program in its very first steps while figuring out what, where and when it is.
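If you want to check the arithmetic yourself, here is a small sketch. It uses Python’s `strftime` equivalent of the “yyMMddHHmm” pattern. The helper names are mine, and because Python integers never overflow, `wrap_int32` simulates what a silent 32-bit wrap-around would produce.

```python
from datetime import datetime

INT32_MAX = 2**31 - 1  # largest value of a signed 32-bit integer


def build_number(moment: datetime) -> int:
    """Format the moment as yyMMddHHmm and read the digits as a number."""
    return int(moment.strftime("%y%m%d%H%M"))


def wrap_int32(value: int) -> int:
    """Simulate a silent signed 32-bit overflow (two's complement wrap)."""
    return (value + 2**31) % 2**32 - 2**31


last_minute_2021 = build_number(datetime(2021, 12, 31, 23, 59))
first_minute_2022 = build_number(datetime(2022, 1, 1, 0, 0))

print(last_minute_2021 <= INT32_MAX)   # still fits
print(first_minute_2022 <= INT32_MAX)  # no longer fits
print(wrap_int32(first_minute_2022))   # the "very negative number"
```

A runtime that checks the conversion raises an exception at this point instead of wrapping, which is what happened to my application.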

This is a rookie mistake that can only be explained by “it evolved that way”. The mistake has been in the source code since 2004. I wrote it myself, so it is my mistake. But I didn’t just think up a weird date format that won’t spark joy 18 years later. I started with a build number from continuous integration. The first build of the project is “build 1”, the next is “build 2”, and so on. You really have to commit early, commit often (and trigger builds) to reach the integer limit that way. This is true for a linear series of builds. But what if you decide to use feature branches? The branches can happen in parallel and each have their own build number series. So “build 17” could be the 17th build of your main branch and go in production or it could be a fleeting build result on a feature branch that gets merged and deleted a few days later. If you want to use the build number as a chronological ordering, perhaps to look for updates, you cannot rely on the CI build numbering. Why not use time for your chronological ordering?

Time as an integer

And how do you capture time in an integer? You invent a clever format that captures the essence of “now” in a string that can be parsed as an integer. The infamous “yyMMddHHmm” is born. The year 2022 is a long time down the road if you apply a quick and clever fix in 2004.

But why did the application crash in 2022 without any update? The build number had to be from 2021 and would still pass the conversion. Well, it turned out that this specific application had no build number set, because we changed our build system and deemed this information not important for this application. So the string in the version file was empty. How is an empty string interpreted as today?

Well, there was another clever piece of code, written by another developer in 2008, that took a string being null or empty and replaced it with the current date/time. The commit message says “Quickfix for new version format”.
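To make the effect tangible, here is a hedged reconstruction of what such a fallback might look like. The function name `effective_version_string` and its parameters are hypothetical. The original code is not shown in this post, so treat this as a sketch of the idea only.

```python
from datetime import datetime
from typing import Optional


def effective_version_string(raw: Optional[str], now: datetime) -> str:
    """Hypothetical quick fix: a missing build number silently becomes 'now'."""
    if raw is None or raw == "":
        return now.strftime("%y%m%d%H%M")
    return raw


# A fixed build number stays what it is, no matter when the program runs.
fixed = effective_version_string("0801151200", datetime(2022, 1, 1, 0, 0))

# An empty build number changes its value with every start of the program.
empty_in_2021 = effective_version_string("", datetime(2021, 12, 31, 23, 59))
empty_in_2022 = effective_version_string("", datetime(2022, 1, 1, 0, 0))
```

With an empty string in the version file, the value that gets parsed as an integer is no longer a property of the build, but of the wall clock.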

Combined cleverness

Combine these three things and you have the perfect timebomb:

  1. A clever way to store a date/time as an integer
  2. A clever way to interpret missing settings
  3. A lazy way to introduce a new build process

The problem described above was present in a total of five applications. Four applications had fixed build numbers/dates and would have broken with the next version in 2022 or later. The fifth application had an empty build number and failed exactly as programmed after 01.01.2022.

Lessons learnt

What can we learn from this incident?

First: clever code or a quick fix is always a bad idea.

Second: cleverness doesn’t stack. One clever workaround can neutralize another clever hack even if both “solutions” would work on their own.

Third: If your solution relies on a certain limit to never be reached, it is only a temporary solution. The limit will be reached eventually. At least leave an automated test that warns about this restriction.
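Such a warning test could look roughly like the following sketch. It assumes the “yyMMddHHmm” format and a signed 32-bit limit. The function names and the five-year margin are arbitrary choices for this illustration.

```python
from datetime import datetime, timedelta
from typing import Optional

INT32_MAX = 2**31 - 1  # 2147483647


def fits_in_int32(moment: datetime) -> bool:
    """True if the yyMMddHHmm number for this moment fits a signed 32-bit integer."""
    return int(moment.strftime("%y%m%d%H%M")) <= INT32_MAX


def limit_warning(today: datetime, margin_days: int = 5 * 365) -> Optional[str]:
    """Return a warning text if the format stops fitting within the margin, else None.

    Only the horizon itself is checked. That is sufficient as long as the
    two-digit year does not wrap around within the margin.
    """
    horizon = today + timedelta(days=margin_days)
    if fits_in_int32(horizon):
        return None
    return f"yyMMddHHmm exceeds the 32-bit limit on or before {horizon:%Y-%m-%d}"


print(limit_warning(datetime(2004, 6, 1)))  # still far away from the limit
print(limit_warning(datetime(2018, 1, 1)))  # the limit is inside the margin
```

Run as part of the regular test suite, a check like this would have turned red years before the actual failure.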

Fourth: Don’t mitigate a hack with another hack. You only make your situation worse in the long run.

The fourth take-away is important. You could fix the problem described above in at least two ways:

  • Replace the integer with a long (64 bit) and hope that your software isn’t in production anymore when the long wraps around. Replace the date/time format with the usual “yyyyMMddHHmmss”.
  • Leave the integer in place and change the date/time format to “yyDDDHHmm” with “DDD” being the day of the year. With this approach, you shorten the string by one digit and keep it below the limit. You also make the build number even less readable and leave a timebomb for the year 2100.
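Both options can be compared with a few lines of arithmetic. This is a sketch under the assumption of signed 32-bit and 64-bit limits. The function names are illustrative, and Python’s `%j` stands in for the “DDD” day-of-year field.

```python
from datetime import datetime

INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1


def long_format(moment: datetime) -> int:
    """Option 1: yyyyMMddHHmmss, meant to be stored in a 64-bit long."""
    return int(moment.strftime("%Y%m%d%H%M%S"))


def day_of_year_format(moment: datetime) -> int:
    """Option 2: yyDDDHHmm, with DDD being the zero-padded day of the year."""
    return int(moment.strftime("%y%j%H%M"))


# Option 1 has plenty of headroom in 64 bits.
print(long_format(datetime(2022, 1, 1)), long_format(datetime(2022, 1, 1)) <= INT64_MAX)

# Option 2 stays below the 32-bit limit for every two-digit year.
end_of_2099 = day_of_year_format(datetime(2099, 12, 31, 23, 59))
print(end_of_2099, end_of_2099 <= INT32_MAX)

# In 2100, the two-digit year wraps to 00 and the numbers stop increasing.
start_of_2100 = day_of_year_format(datetime(2100, 1, 1))
print(start_of_2100, start_of_2100 < end_of_2099)
```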

You can probably guess which route I took, even if it was a lot more work than expected. The next blog entry about this particular code can be expected on 01.01.10000.

The four stages of automation – Part II

One of the core concepts of software development and IT in general is “automation”. By delegating work to machines, we hope to reduce costs and save time while maintaining the quality of results. But automation is not an all-or-nothing endeavor: there are at least four different stages of automation that can be distinguished.

In the first part of this blog series, we looked at the first two stages, namely “documentation” and “recurring reminders”. Both approaches are low tech, but high effect. Machines only played a minor role – this will change in this part of the series. Let’s look at the remaining two stages of automation:

Stage 3: Semi-automatic

If you have a process that is properly documented and you are reminded in a regular fashion, like once a month, you’ll soon find that some steps of the process could be done by a machine, while you as the “human in duty” still pull all the strings that orchestrate the whole thing.

You might know the term “semi-automatic” from firearms: a semi-automatic firearm doesn’t aim or shoot by itself, it just reloads automatically after each shot. The shooter still has to pull (and release) the trigger for each single shot. The shooter is in full control of the weapon, it just automates the mundane and repetitive task of chambering the next round.

This is the kind of automation we are talking about for stage 3. It is the most common type of automation. We know it from our cars, our coffee machines and other consumer electronics. The car manages a lot of different tasks under the hood while we are still in control of the overall task of driving from A to B.

What does it look like for business processes? One class of stage 3 utilities consists of reporting tools that gather and aggregate data from different sources and present the result in a suitable manner. In our company, these tools make up the majority of stage 3 services. There are reporting tools for the most important numbers (the key performance indicators – KPIs) and even some for data that is less important, but cumbersome to acquire. Most tools just present a nice website with the latest results while others send e-mails or create pages in our wiki. If you need a report, just press a button or visit a URL and the machine comes up with the answer. I tend to call this class of tools “sensors”, because they acquire data and process it, but don’t decide on the results.

The other class of stage 3 utilities that are common are “actuators” in the sense that they perform tasks on command. We have scripts in place to shut down whole clusters of computers, clear wiki spaces or reset custom fields on important data objects, but those scripts are only triggered by humans.

A stage 3 actuator could even be something as small as a mailto link. Let’s say you have to send a standardized e-mail to a known recipient as part of a monthly process. Sure, you can save a draft in your e-mail application, but you can also prepare the whole mail in a URL directly in the documentation of the process:

mailto:nobody@softwareschneiderei.de?subject=The%20schneide%20blog%20rocks!&body=I%20read%20your%20blog%20post%20about%20automation%20and%20tried%20the%20mailto%20link.%20This%20thing%20is%20awesome%2C%20thank%20you!

If you click the link above, your e-mail application will prompt you to send an e-mail to us. You don’t need to follow through – we won’t read it on that address.

You can read about the format of mailto links here, but you probably want to create working mailto links right away, which is possible with this nifty stage 3 service utility written by Michael McKeever (buy him a beer!).

Be aware that this is a classic example of chaining stage 3 tools together: You use a tool to create the mailto link that you use subsequently to write, but not send, e-mails. You, as the human coordinator, decide when to write the e-mail, if you want to adapt it to current circumstances and when to send it. The tools only speed you up, but don’t act or decide on their own.
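If you prefer to build such links yourself, a minimal sketch could look like this. The function name `mailto_link` is made up for this example. It percent-encodes subject and body with the standard library, which encodes a few more characters (like the exclamation mark) than the hand-written link above.

```python
from urllib.parse import quote


def mailto_link(recipient: str, subject: str, body: str) -> str:
    """Build a mailto URL with a percent-encoded subject and body."""
    return (
        f"mailto:{recipient}"
        f"?subject={quote(subject, safe='')}"
        f"&body={quote(body, safe='')}"
    )


link = mailto_link(
    "nobody@softwareschneiderei.de",
    "The schneide blog rocks!",
    "This thing is awesome, thank you!",
)
print(link)
```

The function only prepares the link. Clicking it, adapting the text and sending the e-mail remain human tasks.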

An important aspect of this type of automation is the human duty of orchestration (which service does its thing when) and the possibility of inspection and adaptation. The mailto link doesn’t send an e-mail, it just prepares it for you to send. You have the final word on the things that happen.

If you require this level of control, stage 3 automation is where your automation journey ends. It still needs the competent human operator (what, when, why) – but given a decent documentation (as outlined in stage 1), this competence can be delegated quickly. It is also the first automation stage that enables higher effectiveness through speedup and error reduction. The speedup is capped by the maximum speed of the human operator, though.

Stage 4: Full automation

The last stage of automation is “full automation”, which means that a machine gathers the data on its own, comes up with a decision based on the data and acts on its own. This is a powerful tool, but a dangerous one, too.

It is powerful because you just employed an additional worker. Not a human worker, but a machine. It doesn’t go on holiday, it doesn’t lose interest and won’t ask for a bonus.

It is dangerous, because your additional worker does exactly what it is told (programmed) to do, even if it doesn’t make sense or needs just the slightest adaptation to circumstances.

Another peril lies in the fact that the investment to reach the fully automated stage is maximized. As with nearly everything related to IT, there is a relevant xkcd comic for this:

https://xkcd.com/1319/

The problem is that machines are not aware of their context. They don’t deal well with slight deviations (like “1,02” instead of “1.02”) and cannot weigh the consequences of task failure. All these things are done by a competent human operator, even without specific training. You need to train a machine for every eventuality, down to the dots.

This means that you can’t just program the happy path, as you do in stage 3, when a human operator will notice the error and act accordingly. You have to implement special case behaviour, failure detection, failure handling and problem reporting. You have to adapt the program to changes in the environment in a timely manner (this work is also present in stage 3, but can be delayed more often).
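As a tiny illustration of the difference, here is a sketch of a number parser that goes beyond the happy path. It is built on the assumption that a comma is always meant as a decimal separator, which will not hold for every data source. The function name `parse_amount` is hypothetical.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text: str) -> Decimal:
    """Parse '1.02' as well as '1,02' and report everything else as a clear failure."""
    # Assumption: a comma is a decimal separator, never a thousands separator.
    normalized = text.strip().replace(",", ".")
    try:
        value = Decimal(normalized)
    except InvalidOperation:
        raise ValueError(f"cannot interpret {text!r} as a number") from None
    if not value.is_finite():
        raise ValueError(f"{text!r} is not a finite number")
    return value


print(parse_amount("1.02"))
print(parse_amount("1,02"))
```

The happy path is the single `Decimal(...)` call. Everything around it is the special case behaviour, failure detection and problem reporting that a human operator would have provided for free.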

If the process consists mostly of routine and recurs often enough to warrant full automation, it is a rewarding investment that pays off quickly. It will take your human-based work to a new level: designing and maintaining an automation platform that is cost-efficient, scalable and adjustable. The main problem will be time-critical adjustments and their overall effects on the whole system. You don’t need routine workers anymore, but you’ll need competent technicians on stand-by.

Examples of fully automated processes in our company are data backups, operating system upgrades, server monitoring and the recurring reminder system that creates the issues for our stage 2 automation. All of these processes have increased reporting capabilities that highlight problems or just anomalies in a direct manner. They all have one thing in common: They are small, work on only one thing and try to do so with minimal dependencies and interaction.

Conclusion

There are four distinguishable stages of automation:

  1. documentation
  2. recurring reminders
  3. semi-automatic
  4. full automation

The amount of human work for the actual process decreases with each stage, while the amount of human work for the automation increases. For most processes in an organization, there will be a sweet spot between process cost and automation cost somewhere on that spectrum. Our job as automators is to find the sweet spot and not apply too much automation.

If you have a good story about not enough automation or too much automation or even about automation being just right – tell us in the comments!

The four stages of automation – Part I

One of the core concepts of software development and IT in general is “automation”, the “creation and application of technologies to produce and deliver goods and services with minimal human intervention” (definition from techopedia).

The problem is that “minimal human intervention” is often misunderstood as “no human intervention”, which is the most laborious and expensive stage of automation that might not have the most economic return on investment. It might be more efficient to have some degree of intervention left while investing only a fraction of the automation work and duration.

In order to decide “how much” automation is the most profitable for the foreseeable future, I’ve established a model with four stages of automation that I can quickly check against the circumstances. In this blog post, I describe the first two stages and give some ideas on how to implement them.

Stage 1: Documentation

The first step to automation is to just describe the process in a manner that can be repeated. The documentation itself does nothing, but it enables repetition and scalability, two fundamental aspects of automation.

Think about baking a pie. If you just mix some ingredients and put it in the oven for an arbitrary amount of time, you might produce the most delicious pie ever, but you cannot do it again if you don’t remember all details and, even more tragic, nobody else can bake your pie. In order to give others the secret to your special pie, you have to give them the recipe – the documentation of its production process. Once the recipe is written down (and published), it can be read by many bakers in parallel and enables all of them to recreate your invention (to some degree at least, there are probably still some tricks and secrets left out of the recipe).

While the pie baking process still needs human intervention (the bakers that read the recipe and transform it into a series of actions), it is automated in the sense that it can be repeated with roughly the same result and these repetitions, given enough bakers and ovens, can be performed in parallel.

The economic evaluation of documentation shows that it is really easy to create, fast to change and, given some quality of content, nearly universally understood. If you don’t want to invest a lot of time and money, documenting your processes is the first and most important step towards automation. For a lot of your processes, it will also be the last possible stage of automation, at least until artificial intelligence learns your tricks and interpretations.

Documenting your processes is (no surprises here) the foundation of most quality assurance standards. But it is surprisingly hard to start with. This is not a matter of tools – pen and paper will do in the beginning. It is a shift in your mindset. The goal is no longer to bake a pie. It is to write a recipe while you bake the pie as a reference piece for it. If you want to start documenting your processes, here are three tips that might help you:

  • Choose a digital tool that doesn’t obstruct you. It should be digital because this facilitates distribution and collaboration. It should not hinder you because every time you need to think about the tool, you lose the focus on your process. I’m using a Wiki that lets me type the things I want to say without interference. In my case, that’s Confluence, but Obsidian or other tools are just as good.
  • Try to adopt a narrative structure to describe your processes. Think about the established structure of a baking recipe. For example, there is an ingredients list separate from the preparation instructions. If you find a structure that works for you, repeat and evolve it. It helps you and your readers to stay on track and keeps the information from being scattered all over the place. In my case, the structure consists of four paragraphs:
    1. Event/Trigger – The circumstance(s) that should be present at the beginning of the process
    2. Actions/Steps – The things you have to do, described in the necessary details for the target audience. This is often the paragraph with the most content.
    3. Result – Description of the circumstance(s) that should be present once you’ve done all steps. In recipes, this is often a photo of the meal/pastry. For first-time performers, this description is important to be able to declare success.
    4. Report – Who needs to be informed? This paragraph is often missing in descriptions, but crucial for collaboration. If nobody knows there is a fresh and delicious pie in the kitchen, it will not be eaten. Ok, that’s a bad example: Pies in the oven announce themselves with their smell. Digital products often have no smell – inform your peers!
  • Iterate over your documentation any chance you get. It is easy to bake your signature pie from memory. But is the recipe still accurate? Are there details that are important, but missing from the description? Your digital tool probably allows immediate modification of your documentation and maybe even informs interested readers about your update. Unchanged documentation is dead documentation. In my case, I always open my process description on a secondary monitor whenever I perform the process. Sometimes, I invite others to perform the process for me to review the accuracy and fidelity of the documentation.
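If you like to think in data structures, the four-paragraph structure from the second tip can be sketched as a small template. The class name, the field names and the example content are made up for this illustration. A plain wiki page works just as well.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProcessDescription:
    """One documented process with the four paragraphs described above."""
    title: str
    trigger: str
    steps: List[str] = field(default_factory=list)
    result: str = ""
    report: str = ""

    def render(self) -> str:
        """Render the description as plain text, one paragraph per section."""
        lines = [self.title, "", "Event/Trigger", self.trigger, "", "Actions/Steps"]
        lines += [f"{number}. {step}" for number, step in enumerate(self.steps, start=1)]
        lines += ["", "Result", self.result, "", "Report", self.report]
        return "\n".join(lines)


page = ProcessDescription(
    title="Bake the signature pie",
    trigger="A birthday is coming up",
    steps=["Mix the ingredients", "Bake until golden"],
    result="A pie that looks like the reference photo",
    report="Tell the team there is pie in the kitchen",
)
print(page.render())
```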

If you can open the process description of many of your routine tasks, you have reached the first stage of automation for your work. Of course, there will be lots of things you do that are not “routine” – yet. With good documentation, you can even think about delegation – the art of maximizing the amount of work done by others – without sacrificing essential quality.

In later stages, the delegation target (the “others”) will be machines.

Stage 2: Recurring reminders

If you’ve documented a process with a structure similar to mine, you specified a trigger or event that requires the process to be performed. Perhaps it’s the first day of the month and you need to update your timesheet or send out the appointment overview for the next weeks. Maybe your office plants silently thirst for some water. Whatever it is, if your process is recurrent, you might think about recurring reminders.

This will not automate the performance of the process, but unburden you of thinking about the triggering event. The machines will now remind you about certain tasks. This can be a simple series of reminders in your scheduling app or, like in my case, the automated creation of issues (or todo items, tickets) in your work planning application.

For example, once every few weeks, a friendly machine creates an issue for me to write a blog entry on this blog. It does the same for my colleagues and even sets a “due date” (The due date for this post is today). With this simple construct, some discipline and coordination, we’ve managed to write one blog post every week for more than ten years now.

The machine that creates the issues doesn’t check them. It doesn’t supervise their progress and isn’t offended if we “won’t fix” issues because we are on holiday or the plants are still wet. It will just create the next issue according to the rhythm. It is our duty as humans to check if that rhythm fits or if it should be sped up or slowed down.
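The core of such a reminder machine is small. The following sketch only calculates due dates in a fixed rhythm. The function name, the start date and the three-week interval are assumptions for this example, and the actual creation of issues in a work planning application is left out.

```python
from datetime import date, timedelta
from itertools import islice
from typing import Iterator


def reminder_dates(start: date, every_days: int) -> Iterator[date]:
    """Yield due dates in a fixed rhythm, without checking any earlier reminder."""
    current = start
    while True:
        yield current
        current += timedelta(days=every_days)


# The next three due dates for a task that recurs every three weeks.
upcoming = list(islice(reminder_dates(date(2022, 1, 3), every_days=21), 3))
print(upcoming)
```

Note that the generator neither knows nor cares whether a previous reminder was handled. Adjusting `every_days` remains a human decision.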

If you want to employ really elaborate triggers for your reminders, a platform like “If this then that (IFTTT)” might be the right choice. Just keep in mind that with complexity, there often comes rigidity, which isn’t always desired.

By automating the aspect of reminding us about the routine tasks, we can concentrate on doing them. We don’t forget to write blog posts or to water the plants because the machine doesn’t forget. Another improvement is that this clearly distinguishes between routine (has a recurring reminder) and anomaly. If the special one-time task occurs again, we give it a recurring reminder and adopt it as a new routine task. If a reminder about a routine task is “won’t fixed” often enough without any indication that it will be required again, we delete the reminder.

Conclusion for part I

If you combine automated recurring reminders with structured documentation, you already gain a lot of advantages and can free your mind from the mundane details and intervals of your routine tasks. You haven’t automated any aspect of your real work yet, which means that these two stages can be applied to most if not all workplaces.

In the next part of this series, we will look at the two stages that become integrated with your actual work. Stay tuned!

How my display usage changed over time

When I was eight years old, my parents bought our first computer. With it came a tiny monochrome display that could be used to show 80×25 characters in amber yellow. I’m typing this text on my most recent computer that is equipped with several displays that show a combined total of nearly 27 million pixels with at least 2^24 colors each. I don’t dare to count the number of characters that are on screen right now. Something happened along the way.

The formative years

As an eight-year-old boy, I immediately “clicked” with that first computer. It became my destiny to unlock its full potential. I was delighted when my parents upgraded to a much better PC years later with a color display that could actually show 256 colors at once on a 14″ frame. It was still a CRT monitor, so the refresh rate was probably around 30 Hz and I remember the “fishbowl eyes” you got from longer computer sessions.

If we want to have a visual representation of this monitor, it looks like this:

14″ CRT, 4:3

My first own computer came with a 17″ CRT monitor, which was considered a luxury size and didn’t really fit on the desk. I used this monitor up until the first year of my studies. Nothing in my world suggested that using more than one monitor per computer was even possible. This computer had a mouse without scroll wheel and no internet access:

17″ CRT, 4:3

When I studied computer science, I came in contact with a lot of people who all took computing and programming seriously. Some had monitors the size of a freezer, which hinted to me (and my peers) that 17″ is not as lavish as we thought. But still, a computer had one CPU and one monitor. I scraped my money together and bought a 19″ flat panel CRT monitor. Flat panel just meant that the display area didn’t resemble a fish bowl by itself. It could run up to 60 Hz:

19″ CRT, 4:3

The professional setup

That was my personal computing situation when I founded my company in the third year of my studies. I knew that the equipment had to be better and more professional. Our first work computers still had one CPU and one monitor. The monitors just happened to be gigantic 21″ CRTs. Our desks had extra depth to provide a healthy distance between eyes and display area. Those monitors were delicate enough to provide a “de-gauss” button that would unhinge random electronics around it if pressed:

21″ CRT, 4:3

This was how software was developed in the early 2000s. A “fast” computer with lots of RAM (1 GB was not unheard of), a magnetic hard disk with 160 GB of storage and still one CPU and one monitor. At least, they had internet access and a mouse with a scroll wheel now.

Everything we did, we did in the same place

This setup lasted for four or five years, with better computers, but still the same old monitors. Then, virtually overnight, the prices for the new and very cool TFT “flat panel” monitors dropped to reasonable numbers. These monitors were really flat and very thin compared to the CRT fish bowls that hogged our desks. We were thrilled and replaced all of our monitors within one year.

The double setup

But just as the CPUs now got two “cores”, we didn’t just replace our one monitor, we doubled it. Every workplace now had two monitors:

2x 24″ TFT, 16:10

And not just that. The monitors were bigger, better, smaller, easier on the eye and had a greater resolution (called WUXGA, essentially FullHD with some extra pixels on the vertical axis).

And we had two of them! This was a game changer because things that used to be done one after another could now be done in parallel – on the CPU and on the monitors. We began to dedicate screen real estate to fixed activities:

The monitors are now assigned to certain tasks

The left monitor was the “coding space” while the right monitor was the “tryout space”. The actual distribution of activity to screen location differed from developer to developer, but we all agreed that we would not go back to single monitoring.

During that time, I was sometimes asked if two monitors “are worth the investment”. I blogged about it and I’m still convinced that a second monitor is the single most profitable investment you can make for a developer.

The triple setup

In the blog post above, I made one statement that I soon took back: A third monitor is not a game changer like the transition from one to two monitors, but – if the hardware issues are solved – the next step in the evolution that truly separates work, work result and communication:

3x 27″ TFT, 16:9

This setup is probably wider than your standard desk and requires a dedicated monitor stand, but it is the first time you can do the three essential things a programmer does in parallel:

  • Browse the internet (like stackoverflow or an API documentation)
  • Edit your source code in a fullscreen IDE
  • Watch the result of your changes live (with hot reloading)

Your workflow essentially moves your head from the left (gather new knowledge) over the middle (apply the new knowledge) to the right (evaluate the result of the new knowledge) and back again for the next step:

A typical left-to-right setup

This has been our default workplace setup since 2018, with two possible resolution levels:

  • QHD: 3x 2560 x 1440 pixels. This results in 11 million pixels per computer
  • UHD: 3x 3840 x 2160 pixels. You now have almost 25 million pixels at your disposal
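The pixel counts are simple multiplication and can be verified with a short sketch (the function name is arbitrary):

```python
def total_pixels(monitor_count: int, width: int, height: int) -> int:
    """Total number of pixels for several monitors of the same resolution."""
    return monitor_count * width * height


qhd_setup = total_pixels(3, 2560, 1440)
uhd_setup = total_pixels(3, 3840, 2160)

print(f"QHD: {qhd_setup:,} pixels")
print(f"UHD: {uhd_setup:,} pixels")
```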

There is a biological limit to what a human can see at once. This setup nearly fills your complete field of view. You cannot fit a fourth monitor to the sides that you can really see. The only possibility to expand is now the vertical axis, with additional monitors above and maybe below.

The pandemic setup

I would probably still use the triple monitor setup if there hadn’t been a fundamental change in the way we develop software in early 2020. In March 2020, we decided within days to abandon our office desks and retreat into home office workplaces that were improvised at first. Now, nearly two years later, all these workplaces are fully equipped and still continually improved. But not only our places changed, our communication did as well. Video calls are a natural component of our workday now. And in my case, they happen in parallel to my normal work. So I had to dedicate screen space to videoconferencing. And I’ve done it by adding a fourth monitor:

3x 27″ TFT, 1x 10″ TFT, 16:9

This small monitor sits right next to the webcam, so if I look at my dialog partner, I also seem to look right into the camera. This setup adds a new distinctive activity to the mix:

You can guess what I’m doing by following my gaze

I’ve described the other ingredients for a fully equipped home office in a previous blog post. You can see an early photo of my setup in this post.

Conclusion

And this is the setup I’m writing this blog post on. 27 million pixels that I can use to speed up my workflow by assigning dedicated working zones. If you had asked me in 2009 whether I could imagine doubling the number of monitors and having nearly six times more pixels available, I would have said no way.

But by looking back to the beginning, I can see how the fundamentals of personal computing changed in every aspect. A computer is no longer “one CPU” and it doesn’t have only one monitor. Today’s display technology is capable of providing a lot of screen real estate. The main limiting factor is our own imagination. Reaping the benefits of dedicated display areas is satisfying and increases your work throughput effortlessly.

If you ask yourself what your ideal monitor setup should look like, try to reflect on how you move your application windows around or how often you switch applications without moving your head. If you would like to make the switch without hiding the previous context, you’ve just found a use case for an additional monitor.

What is your monitor setup and your usage pattern with it? Tell us in the comments!

Hyperfocus on Non-Essentials

When tasked with managing a complex and potentially overwhelming project, a common behaviour of inexperienced managers/developers is to focus on things that are easy to achieve (“low-hanging fruits”), fun to produce (“cherry-picking”) or within the comfort zone.

This means that in the extreme, the developer exclusively focusses on things that are of no interest for the business client but can simulate progress and results.

This behaviour is an application of the “path of least resistance” and I know exactly what it feels like. Here’s the story why:

When I was fourteen years old, my programming career was already 6 years in the making. Of course, I only wrote code for myself, teaching myself new concepts and new errors alike. My only scale of success was “does it run?” and “is it still fun for me?”. My only programming language was BASIC, first the dialect GW-BASIC (still with line numbers!), then the more advanced QBasic (with named jump markers instead of line numbers).

I grew up in small cities and was basically alone with my hobby. But a friend’s parent owned an optometrist shop and was interested in using computers for its day-to-day operations. I was asked to write a program to handle the shop’s inventory and sales. The task was interesting, but I had no idea how any shop, let alone this particular one, handles their business. I agreed to build a prototype and work from there.

I knew that this project was bigger and more ambitious than any hobby project of my own before, but it was programming after all – how hard could it be?

My plan was to do two things in parallel: Buy and read a book about real software development with BASIC and try to sketch out the application as a “coded paper prototype”.

The book turned out to be the confessions of a frustrated software developer that basically assured the reader on every page that BASIC was not dead and appended dozens of pages with code listings to every chapter. There was probably a lot of wisdom in this book, too, but it missed me by miles.

The sketch of the application began with a menu of all the things I thought would be necessary, like “inventory” or “sales process”. I also included an “Extras” menu and one thing in the menu should be a decent screen saver. Back in those days, the CRT monitors suffered from burn-in if the same image was shown for a long time and I figured that this application would run all day every day, so it seemed logical and important to have a screen saver that is automatically turned on after some period of inactivity.

Which presented itself as a really hard problem, because BASIC was essentially single-threaded (or at least it was to my knowledge back then) and I had to invent some construct that can perhaps be described as “obscure co-routines”. That was some fun programming!

After I solved the automatic activation of the screensaver functionality, I discovered that I could easily make the actual screensaver that gets shown a parameter. So I programmed not one, but several cool and innovative ASCII art screensavers that you could choose from in the extras menu. One screen saver was inspired by the snake game, another one was “colored blocks” that would appear and disappear to form a captivating mood picture.

That was the state of the application when my friend’s parent asked for a demo. I had:

  • No additional knowledge about application design
  • A menu of things I invested no second thought in
  • Several very cool screensavers that activated themselves automatically. Isn’t that great?

You can probably guess how that demo went. None of the things I had developed mattered in the slightest for the optometrist shop. My passion for my creation didn’t translate to the business very well.

I had worked intensively on this project. I hyperfocused on totally non-essential stuff and stayed mostly in my comfort zone, even if I felt as if I had made great progress.

It is easy to fall into this trap. It is easy to mistake one’s own feelings of progress and success for the external (real) ones. It feels very good to work frantically on things that matter to oneself. It becomes a tragedy if the things only matter to oneself and nobody else.

So what can we do to avoid this trap? If you have an idea, write a comment about it! I hope to hear lots of different takes on this problem.

Here is my solution: “Risk first”. With this project strategy, the first task in a project is to solve the hardest part, to cut the biggest knot or to chart the most relevant area. It means that after the first milestone is a success, the project will gradually become easier. It’s the precursor to “fail fast”, which is a “risk first” project that didn’t meet its first milestone.

It is almost guaranteed that the first milestone in a “risk first” project will not be in your comfort zone, is no low-hanging fruit that you can pick without effort and while it might be fun to work on, it’s probably something your customer has a real interest in.

By starting a project “risk first”, I postpone my tendency to focus on non-essentials towards the end of the project. And with concepts like “business value”, I can see very clearly when my work becomes irrelevant for the customer. That’s when I stop my professional work and my hobby begins.

Wear parts in software

I want to preface my thoughts with the story that originally sparked them (and yes, I oftentimes think about software development when unrelated things happen in the real world).

I don’t own a car myself, but I’m an unhesitating user of rental cars and car sharing services. So when I have to drive long distances, I use many different models of cars. One model family is the Opel Corsa line of compact cars, of which I’ve driven the models A to C and, in this story, model D.

It was on the way back, on the highway, when darkness settled in. I switched on the headlamps and noticed that one of them was not working. In Germany, this means that your car is unfit for travel and you should stop. You cannot stop on the highway, so I continued driving towards the next gas and service station.

Inside the station, I headed to the shelf with car spare parts and searched for a lightbulb for a Corsa model D. Finding the lightbulbs for A, B and C was easy, but the bulbs for D weren’t there. In fact, there wasn’t even a place for them on the shelf. I asked the clerk for help and he laughed. They didn’t sell lightbulbs for the Corsa model D because changing them wasn’t possible for the layman.

To change a lightbulb in my car, you have to remove the engine block, exchange the lightbulb and install the engine block again. You need to perform this process in a repair shop and be attentive to accidental leakage and connector damage.

Let me summarize the process: To replace an ordinary wear part, you have to perform delicate expert work.

This design paradigm seems to be on the rise with consumer products. If you know how to change the battery on your smartphone or laptop, you probably explicitly chose the device because of this feature.

Interestingly, the trend is reversed in software development. Our architectures and design efforts try to separate primary code from wear part code. Development principles like SRP (Single Responsibility Principle) or OCP (Open/Closed Principle) have the “wear part code” metaphor in mind, even if it isn’t communicated with such clarity.

At the architecture level, a microservice paradigm maps a complex mechanism onto several small and isolated parts. The isolation aspect is crucial because it promotes replaceability – you don’t need to remove and reinstall a central microservice if you want to replace a more peripheral one. And even the notion of “central and peripheral” services indicates the existence and consideration of an abrasion effect.

For a single application, the clean, hexagonal or onion architecture layout makes the “wear part code” metaphor the central aspect of your code positioning. The goal is to prepare for the inevitable technology replacement and not to act surprised if the thing you chose as your baseplate turns out to behave like rotting wood.
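To make the metaphor a bit more tangible, here is a minimal sketch of how such a layout might look in Python. All names (Order, OrderStore, InMemoryOrderStore, place_order) are hypothetical and only serve the illustration; the point is merely that the primary code talks to a small interface, and the technology behind it is the part that gets swapped.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Order:
    # Hypothetical domain object, invented for this sketch.
    order_id: str
    total_cents: int


class OrderStore(Protocol):
    """The socket: all the primary code knows about persistence."""

    def save(self, order: Order) -> None: ...

    def find(self, order_id: str) -> Optional[Order]: ...


class InMemoryOrderStore:
    """A wear part: can be replaced by a database-backed store later."""

    def __init__(self) -> None:
        self._orders: Dict[str, Order] = {}

    def save(self, order: Order) -> None:
        self._orders[order.order_id] = order

    def find(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)


def place_order(store: OrderStore, order_id: str, total_cents: int) -> Order:
    """Primary code: a business rule plus one call through the socket."""
    if total_cents <= 0:
        raise ValueError("an order needs a positive total")
    order = Order(order_id, total_cents)
    store.save(order)
    return order
```

Replacing the storage technology then means writing another class with the same two methods. The function place_order does not need to be touched, which is the software equivalent of changing the lightbulb without removing the engine block.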

A good product design (at least for the customer/user) facilitates maintainability by making simple upkeep tasks easy.

We software developers weren’t expected to produce good products because the technological environment moved faster than the wear and nobody but ourselves could inspect the product anyway.

If a field moves faster than the abrasion can occur, longevity of a product is not a primary concern. Your smartphone will be outdated and replaced long before the battery is worn out. There is simply no need to choose wear parts that live longer than the main product. My hypothesis is that software development as a field has slowed down enough to make the major abrasive factors and areas discernible.

If nobody can inspect the software product and evaluate its sustainability, at least the original developer can, right? You can check for yourself with a simple experiment. Print the source code of your software (or parts of it), take two text markers (my favorite colors for this kind of approach are green and blue) and mark the code you deem primary with the first text marker. Any code you consider a wear part gets colored with the second marker. If you find it difficult to make the distinction or if the colors are mingled all over the place, this might be an indication that you could improve things.

What is a wear part in software? I would love to hear your thoughts and definitions in the comment section! My description, with no claim to be complete, would be any code that has a high probability of changing for one of the following reasons:

  • The customer/user is forced to make a change request by external forces like legal regulation
  • Another software/system/service changes, forcing your software to adjust its understanding of its surroundings
  • The technical field has moved, changing your perception of the code
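As a small illustration of the first reason, consider a price calculation where a tax rate is dictated by regulation. The sketch below is an assumption-laden example, not a recipe: TaxRule and gross_cents are hypothetical names, and the rates are example values only. The rule that is likely to change sits in one small, replaceable place, while the calculation that uses it stays put.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaxRule:
    """Wear part: expected to be replaced when the regulation changes."""

    rate_percent: int  # example value, supplied by whoever configures the rule


def gross_cents(net_cents: int, rule: TaxRule) -> int:
    """Primary code: stays the same whichever rule is plugged in.

    The tax is rounded half up to whole cents using integer arithmetic.
    """
    tax_cents = (net_cents * rule.rate_percent + 50) // 100
    return net_cents + tax_cents


# Swapping the rule does not touch the calculation itself.
old_rule = TaxRule(rate_percent=19)
new_rule = TaxRule(rate_percent=16)
print(gross_cents(1000, old_rule), gross_cents(1000, new_rule))
```

In the text marker experiment from above, TaxRule would get one color and gross_cents the other, with a clear border between them.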

If you plan for maintainability in software development, you always plan for obsolescence and replacement. Our wear parts are different from mechanical ones in their uniqueness – we don’t replace a lightbulb with the same model, we replace unique code with different, but also unique code. But the concept of wear parts is the same:

Things that are likely to be replaced are designed for easy replacement.