When everything’s an issue

Years ago, I read the novel “Manna” by Marshall Brain. It’s a science fiction story about the robotic takeover and it features “Manna”, an (artificially) intelligent work management software that replaces human managers and runs the shop. The story starts with “Manna” and goes on to explore the implications of such a system on mankind. It’s a good read and contains a lot of thoughts about what kind of labor we want to do.

The idea that really captivated me was the company that runs itself. Don’t get me wrong: Most organizations are so big that the individual employee cannot see the big picture anymore. To the untrained eye, those organizations seem to “run themselves”, but it is still humans who manage the workload. And like all humans, they make mistakes and, perhaps very subtly, infuse their own selfish goals into the process. But an organization that has its own goals and instructs its workers (humans and machines alike) directly is an interesting thought to me.

It is also totally unrealistic with today’s technology and probably carries some risks that should be explored carefully before such a system is deployed in the wild.

But what about a more down-to-earth approach that achieves the core advancement of “Manna” without most or all of its risks? What if the organization doesn’t instruct, but makes its needs visible and relies on humans to interpret, schedule and fulfill those needs? In essence, a “Manna” system without the sensors and decision-making, and certainly without the creepy snooping tendencies. Built with today’s technology, such a system is called an automated work scheduler.

And that is what we’ve built at our company. We already use an issue tracking system to manage and schedule our project work, and we extended its usage to manage and schedule our administrative work, too. Now, every work unit in the company is (or could be) accompanied by an issue in the issue tracker. And just like software developers don’t change code without an issue, we don’t change our company’s data or decisions without an issue, which also provides a place to document the process. We’ve come to the conclusion that most of those administrative issues are recurring, so we automated their creation.

Our very early stage “Manna” system is called “issue scheduler”, a highly creative name on its own. It is a system that basically contains a lot of glorified cron expressions and just enough data to create a meaningful issue in the issue tracker, should a cron expression fire. So basically, our company creates issues for us on a fixed schedule. Let’s look at some examples:

  • We add a new article to our developer blog (you’re reading it right now) every week. This means that every week, our “issue scheduler” creates a blog issue and assigns it to the next author in line. This is done some time in advance to give the author enough time to prepare and possibly trade with other authors. Our developer blog has the “need” for one article each week, but it doesn’t require a particular topic or author. This need is made visible by the automatic blog issues and it is our duty to fulfill this need. On a side note: Maybe you’ve noticed that I wrote two blog articles in direct succession. There is definitely some issue trading going on behind the scenes right now!
  • We tend to have many plants in our office. Looking at something green and living adds to our comfort. But those plants have needs, too. They probably make their needs pretty visible, but we aren’t expert plant caregivers. So we gave the “issue scheduler” some entries to inform us about the regular watering and fertilization duties for our office plants. A detailed description of the actual work exists in our company wiki, and a link to it gives the caregiver of the week all the information that’s needed.
  • Every month, we are required to file a sales tax summary report. This is a need of the German government agencies that we incorporate into our company’s needs. Working on this issue requires more information and security clearances than fit on a wiki page, but the process is documented nonetheless. So once a month, our company automatically creates an issue that says “do your taxes now!” and assigns it to our administrative employees.
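To make the “glorified cron expressions” a little more tangible, here is a minimal Python sketch of what such a rule and its firing could look like. It is an illustration, not the actual implementation of our “issue scheduler”: the names `Rule`, `Issue`, `cron_matches` and `issues_due` are hypothetical, the cron dialect is deliberately simplified (no ranges, and the day and weekday fields are combined with AND instead of cron’s OR), and a plain round-robin rotation stands in for “the next author in line”.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


def _field_matches(spec: str, value: int, low: int) -> bool:
    """Match one cron field: '*', '*/n', a plain number, or a comma list of those."""
    for part in spec.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # Steps count from the field's lowest value, e.g. days 1, 3, 5, ...
            if (value - low) % int(part[2:]) == 0:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Simplified check for 'minute hour day month weekday' (weekday 0 = Sunday).

    Unlike real cron, day and weekday are ANDed and ranges are not supported.
    """
    minute, hour, day, month, weekday = expr.split()
    cron_weekday = (when.weekday() + 1) % 7  # Python counts Monday as 0
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(day, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(weekday, cron_weekday, 0)
    )


@dataclass
class Rule:
    """Hypothetical rule: a cron expression plus just enough issue data."""
    cron: str
    title: str
    assignees: list[str]
    lead_time: timedelta = timedelta(0)  # how far ahead of the due date we create it
    fired: int = 0  # number of issues created so far; drives the rotation

    def next_assignee(self) -> str:
        # Round-robin: hand the issue to the next person in line.
        return self.assignees[self.fired % len(self.assignees)]


@dataclass
class Issue:
    title: str
    assignee: str
    due: datetime


def issues_due(rules: list[Rule], now: datetime) -> list[Issue]:
    """Create one issue per rule whose cron expression fires at `now`."""
    created = []
    for rule in rules:
        if cron_matches(rule.cron, now):
            created.append(Issue(rule.title, rule.next_assignee(), now + rule.lead_time))
            rule.fired += 1
    return created
```

Called once a minute with the current time, a rule like `Rule("0 9 * * 1", "Weekly blog article", ["alice", "bob"], timedelta(days=14))` would create an issue every Monday at 9:00, due two weeks later and alternating between the two (made-up) authors.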

These are three examples of recurring tasks covered by our poor man’s “Manna” system. To give you a sense of the scale of this system for a small company like ours: we currently have about 140 distinct rules for recurring issues. Some of them fire almost every day, others sleep for years and wake up just in time to express a need of the company that would otherwise surely be forgotten, or rediscovered only after the fact.

This approach relieves us of the burden of remembering all the tasks and their schedules and lets us concentrate on completing them. And our system, in contrast to “Manna” in the story, isn’t judging or controlling. If you don’t think the plants need any more water, just resolve the issue with “won’t fix”. Perhaps you can explain your decision in a short comment for other humans, but our “issue scheduler” won’t notice.

This isn’t the robotic takeover, after all. It’s just automated scheduling of recurring work. And it works great.

Ignoring YAGNI – 12 years later

Fourteen years ago, we started to build a distributed system to gather environmental data in an automated 24/7 fashion. Our development process was agile and made heavy use of short iterations (at least they were considered short back then; today they are normal-sized). So the system grew with many small new features and improvements, giving the customer immediate business value.

One part of the system was the task scheduler. Because the system had to run 24/7 and be mostly independent of human interaction, the task scheduler’s job was to launch different measurement processes at the right time. We had done extensive domain crunching and figured out that all tasks follow a rigid time regime like “start every 10 minutes” or “start every hour”, regardless of the processes’ runtime. This made the scheduler rather easy to develop. You should keep it simple, after all.
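A fixed schedule of this kind really is simple. The following Python sketch, assuming that every interval divides a day evenly, computes the next clock-aligned start of a task; `TASK_INTERVALS` and `next_start` are hypothetical names, not taken from the actual system.

```python
from datetime import datetime, timedelta

# Hypothetical hard-coded schedule: every task starts on a fixed, clock-aligned
# interval, no matter how long the previous run took.
TASK_INTERVALS = {
    "measurement": timedelta(minutes=10),
    "hourly_summary": timedelta(hours=1),
}


def next_start(now: datetime, interval: timedelta) -> datetime:
    """Return the next clock-aligned start strictly after `now`."""
    midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = now - midnight
    # Whole intervals already elapsed today, plus one.
    return midnight + (elapsed // interval + 1) * interval
```

Because the start times depend only on the clock, a measurement that starts at 10:00 and runs for three minutes has no influence on the next start at 10:10.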

But another result of the domain crunching bothered us: The schedules of all tasks originated from the previous software system, built 30 years earlier and definitely unfit for the modern software world. The schedules weren’t really rooted in the domain; they all had technical explanations like “the recording of the values is done sequentially and takes up to 8 minutes, we can’t record them more often than that”. For our project, the measurement hardware was replaced, and our recording took only a couple of milliseconds. We could store and display the values continuously, should the need arise.

So we discussed the required simplicity or complexity of the task scheduler with the customer, and they seemed pleased with all the new possibilities. But they decided that the current schedules were sufficient and didn’t need to be changed. We could go ahead and build our simple task scheduler.

And this is when we decided to abandon KISS and make the task scheduler more powerful than needed. “But you ain’t going to need it!” was the enemy, because we knew that the customer would inevitably come around and make use of their new possibilities. We knew that if we built the system with more complexity, we would be the heroes of the future, wearing a smug smile and telling the customer: “We’ve already built this, you can use it right away”. Oh, how gloriously this prospect of the future shone! Just a few more thoughts going into the code and we’d be set for a bright future.

Let me tell you a few details about the “few more thoughts”, using the example of an “every hour” task schedule. Instead of hard-coding the schedule, we added a configuration file with a cron-like expression for the schedule. You could now leverage the power of cron expressions to design your schedule as you see fit. If you wanted to change the schedule from “every hour” to “every odd minute and when the pale moon rises”, you could do so. The task scheduler had to interpret the configuration file and make sure that tasks don’t pile up: If you schedule a task to run “every minute”, but it takes two minutes to process, you’ve essentially built a time bomb for your system load. This must not be possible.
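There are at least two ways to prevent such a pile-up, and which one fits depends on the domain. This Python sketch shows both under hypothetical names: `validate_schedule` rejects a configuration whose fire times are closer together than the task’s maximum runtime, while `runs_without_pileup` accepts the configuration but skips every fire time that arrives while the previous run is still busy.

```python
from datetime import datetime, timedelta


def validate_schedule(fire_times: list[datetime], max_runtime: timedelta) -> None:
    """Reject a schedule in which a run could still be busy at the next fire time."""
    times = sorted(fire_times)
    for earlier, later in zip(times, times[1:]):
        if later - earlier < max_runtime:
            raise ValueError(f"runs at {earlier} and {later} would overlap")


def runs_without_pileup(fire_times: list[datetime], runtime: timedelta) -> list[datetime]:
    """Return the fire times at which the task actually starts.

    A fire time that arrives while the previous run is still busy is skipped,
    so a task fired every minute that takes two minutes runs every other
    minute instead of queueing up an ever-growing backlog.
    """
    started = []
    busy_until = None
    for fire in sorted(fire_times):
        if busy_until is not None and fire < busy_until:
            continue  # previous run still active: skip rather than queue
        started.append(fire)
        busy_until = fire + runtime
    return started
```

With fire times every minute and a runtime of two minutes, `runs_without_pileup` starts the task at minutes 0, 2 and 4, and `validate_schedule` raises a `ValueError`.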

But it doesn’t stop there. A lot of functionality, most of which wasn’t even present or outlined at the time of our decision, relies implicitly on that schedule. Two examples: There are manual operations that must not be performed during the execution of the task. The system goes into a “protected state” around the task execution. It disables these operations a few minutes before the scheduled execution and even some time afterwards. If you had a fixed schedule of “every hour”, you could even hard-code the protected timespan. With a possible dynamic schedule, you have to calculate your timespan based on the current schedule and warn your operator if it isn’t possible anymore to find a time slot to even perform the manual operation.
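Deriving the protected state from a dynamic schedule boils down to interval arithmetic. Here is a minimal Python sketch, assuming the protected timespan is a fixed margin `before` and `after` each execution; the function name `free_slots` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def free_slots(fire_times: list[datetime], before: timedelta, after: timedelta,
               start: datetime, end: datetime,
               needed: timedelta) -> list[tuple[datetime, datetime]]:
    """Return (begin, end) windows in [start, end) where manual operations are allowed.

    Each scheduled execution puts the system into a protected state from
    `before` ahead of it until `after` behind it. Only gaps of at least
    `needed` are returned; an empty list means the operator has to be warned.
    """
    blocked = sorted((t - before, t + after) for t in fire_times)
    slots = []
    cursor = start  # earliest moment not yet known to be protected
    for block_start, block_end in blocked:
        gap_end = min(block_start, end)
        if gap_end - cursor >= needed:
            slots.append((cursor, gap_end))
        cursor = max(cursor, block_end)
    if end - cursor >= needed:
        slots.append((cursor, end))
    return slots
```

With an hourly schedule, a 5-minute margin before and a 10-minute margin after, each hour leaves a 45-minute slot for manual operations. Switch to a 10-minute schedule and the protected windows overlap completely, so the list comes back empty: exactly the case in which the operator has to be warned.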

The second example is a functionality that supervises the completeness of the recorded data. The problem is: This functionality runs on another computer (it’s a distributed system, remember?) that doesn’t know about the configuration files. To be able to scan the data archive and say “everything that should be there, is there”, the second computer needs to know the schedules of all the first computers (there are many of them, each recording its data on its own schedule and transferring it to the second computer). And if a schedule changes, the second computer needs to take the change into account and scan the data archive in two areas: one area with the old schedule and one area with the new schedule. Otherwise, there would be false alarms.
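The completeness check on the second computer can be sketched in the same way. This Python example assumes fixed-interval schedules aligned to midnight and a schedule history per recording computer, given as a sorted list of `(valid_from, interval)` pairs; `expected_times` and `missing_records` are hypothetical names. Each part of the scanned range is checked against the schedule that was active at that time:

```python
from datetime import datetime, timedelta


def expected_times(history: list[tuple[datetime, timedelta]],
                   start: datetime, end: datetime) -> list[datetime]:
    """Timestamps in [start, end) at which a record should exist.

    `history` lists (valid_from, interval) pairs sorted by valid_from. Each
    segment of the range is scanned with the schedule active during it.
    """
    times = []
    for i, (valid_from, interval) in enumerate(history):
        valid_to = history[i + 1][0] if i + 1 < len(history) else end
        seg_start, seg_end = max(start, valid_from), min(end, valid_to)
        # First clock-aligned slot at or after seg_start (ceiling division).
        midnight = seg_start.replace(hour=0, minute=0, second=0, microsecond=0)
        steps = -((midnight - seg_start) // interval)
        t = midnight + steps * interval
        while t < seg_end:
            times.append(t)
            t += interval
    return times


def missing_records(history: list[tuple[datetime, timedelta]],
                    archive: list[datetime],
                    start: datetime, end: datetime) -> list[datetime]:
    """Expected timestamps without a matching record in the archive."""
    present = set(archive)
    return [t for t in expected_times(history, start, end) if t not in present]
```

Scanning a period in which the schedule changed from hourly to every 30 minutes with only the new schedule would report the half-hour timestamps of the old period as missing, which are exactly the false alarms mentioned above.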

You can probably see that the one decision to make the task scheduler a little more complex and configurable than required had a considerable impact on the complexity of other parts of the system. But this investment will be worth it as soon as the customer changes the schedule! The whole system is programmed, tested and documented to facilitate schedule changes. We are ready!

It’s been over twelve years since we wrote the first line of code for the more complex implementation (I’ve checked the source control logs). The customer hasn’t changed a single bit of the schedule yet. There are over twenty “first computers” and they all still run the same task schedule as initially planned. Our decision did nothing but add accidental complexity to the system. It probably introduced some bugs along the way, too. It certainly raised the level of awareness required (the “hurdle of understanding”) when developing features that are somewhat coupled with the task schedule.

In short: It’s been a disaster. The smug smile we thought we’d wear has been replaced by a deep frown. Who wrote all that mess? And why? It wasn’t the customer, it was us. We are never going to need it.