Applying the Golden Circle to software development

When I was a young and impressionable software developer (1998), the slogan of the year was “the code is the documentation”. This essentially meant that comments, and (inline) code comments in particular, were a sign of bad code. The reasoning was (and still is) that if you can’t articulate your ideas clear enough in source code, adding additional text won’t rescue your communication. “Communication” is the conveying of what you want to accomplish with your code towards two listeners: the computer and the next human that works with your code. The computer is often the easier part, because it just does as it is told without interpretation.

Some years later (2004), Jeff Atwood, the author of the influential codinghorror blog, still condemns most of the comments we could find in contemporary source code. But there is more nuance than just “comments are bad” and it is cleared up later (2006) that there is a difference in what should be expressed in source code and what should be expressed in comments. Yes, you should write comments, but the “right kind”.

According to Jeff Atwood, the source code contains the “how”. It tells the story in all its details for the next human and the computer. The comments, on the other hand, are not intended for the computer and shouldn’t contain details. They should contain the “why”, the high-level picture and the motivation behind the specific “how” that we find.

The code tells you how, the comments tell you why” (2006) is a great way to describe the expected content of both “layers”.

But my feeling was (and still is) that something is missing in that description. My programs aren’t just code and comments, there are more things that I try to tell my story with. And just a few weeks ago, I had a sudden idea that I might be able to describe what that missing piece could be. It is just an idea and the puzzle probably is still missing pieces, but it feels “right” enough that I want to write this blog post to discuss the idea. But I need to talk about something else first.

In 2009, Simon Sinek presents the Golden Circle to the world. The TED talk is probably the most energetic piece of explanation in human history. The Golden Circle is “the world’s simplest idea”, defining three layers of “clarity” to actions:

  • What: The “lowest” level, meaning that every person knows “what they are doing”.
  • How: The “intermediate” level. Some people know “how they do it”. They explicitly choose their method of doing and can reason about it.
  • Why: The “highest” level. According to Simon Sinek, only a few people know “why they do it”. He talks about “purpose, cause and belief”.

If you don’t know about the Golden Circle yet, please watch the 20 minutes TED talk while I wait here for you. If you want to think about it first – I’m patient. The Golden Circle has inspired and guided me ever since. Not that I’m very good at applying it in my business or personal life, but it stayed with me and gave me a coordinate system to categorize things.

And with that categorization practice, I feel as if the layers in Jeff Atwood’s blog post from 2006 are misnamed and one crucial layer is missing:

  • What: The code tells you what (not how!). It is the detailed step-by-step recipe to replicate a behaviour. It is so simple that even a computer can do it, without being aware of it.
  • How: This is the missing layer. I want to talk about it in a minute.
  • Why: That’s the part that is correctly placed and named: The why of a story requires the spoken word. Only comments are viable for this kind of information. The computer (as of today) has no understanding what any of it means.

The missing layer is the “How”. In my idea, I envisioned that everything that is deliberately put in the code, but not readily understood by a computer, like structures, patterns, idioms or even names (I call them “activated comments”) are there for the “How”. We structure our code not because the computer requires it, the compiling stages of our programming languages even interfold and compact it until it is a binary blob. We don’t name our variables and types because the computer would learn something from them. The first thing a compiler does is to shorten our names to unreadable “symbols”. Most of our patterns get replaced by other, more minute patterns that the compiler puts into place. We put these things in our code because it helps us understand “how the story goes”. It provides us with guidance how to make sense of the mess. The structure tells you how to approach the program.

The computer only knows about “What”. There are maybe some indications about a simple “How”-awareness in some technologies, but most of the time, the computer is deaf to human communication.

Which brings me to my description of code and comment, as an updated version of Jeff Atwood’s motto:

“The code tells you what, the structure tell you how and the comments tell you why”

By “structure”, I mean everything that gets lost during translation for the computer, but is visible for the human reader. It entails high-level things like “architecture” or “code design” and lower-end decisions like names and formatting. If you have a better word for it, let me know!

I hope that this blog post inspired you to have a thought yourself. Don’t hesitate and tell us your thought in the comment below.

Renaming is hard work

Imagine you’ve developed a software solution for one of your customers, for example a custom CRM. Everything in this system revolves around the concept of a customer as the key entity. Of course, this is reflected in your code as well. The code is crowded with variable and class names like customerThis and CustomerThat. But now your customer wants you to change the term customer to client.

The minimal and lazy solution would be to change only those occurrences of the word, which are visible to the users. This includes:

  • Texts displayed in the user interface. If your application is internationalized or at least prepared for internationalization, they are usually neatly contained in translation files and can be easily updated.
  • User documentation like manuals. Don’t forget to update the screenshots.

However, if you do not want to let the code diverge from the language of the domain, you have to rename and update a lot more:

  • identifiers in the code (types, namespaces/packages, variables, constants)
  • source files and directories
  • code comments
  • log output strings
  • internationalization keys
  • text protocol strings (e.g. JSON keys) and HTTP API routes/parameters. Of course you can only do that if you are allowed to break the protocol! Don’t accidentaly break your API or formats, e.g. if JSON keys or XML tags are generated via reflection from code identifiers.
  • IDs of HTML tags in views, CSS class names and selector strings in referencing Javascript code
  • internal documentation (e.g. Wiki)

IDE support for renaming can help a lot. Especially the renaming of identifiers is reliable in statically typed languages. However, local variables, function parameters, and fields usually have to be renamed separately. In dynamically typed languages automated renaming of identifiers is often guesswork. Here you must rely on a good test coverage. Even in statically typed languages identifiers can be referenced via strings if reflection is used.

If you use a grep-like tool or the search feature of your IDE or editor to assist you, keep in mind that composite terms can occur in many different forms like FooBar, fooBar, foo-bar, foo_bar, foo.bar, foo bar or foobar. Don’t forget about irregular plural forms, e.g. technology – pl. technologies. Also think of abbreviations, otherwise a

for (Option opt : options)

might end up as

for (Alternative opt : alternatives)

But this is not the end of the story. A lot of renamings require migration scripts or update scripts to be executed at the time of the next update or deployment:

  • database tables and columns
  • some ORMs like Hibernate store class names in the database for the “table-per-hierarchy” mapping
  • string-based enum values in database entries
  • configuration keys and values
  • keys and values in data formats (JSON, XML, …) of stored data. You probably have to provide backward-compatibility for the old formats if the data files are not stored all in one place and can’t be converted in one go
  • names of data folders

If you don’t rename everything all at once, but decide to use the new term only in newly written code and to leave the old term in the existing code until it eventually gets replaced, you will probably end up with a long transition phase and lots of confusion for new project members who don’t know about the history of the project.

Software development is code organization

The biggest problem in developing and maintaining software is understanding code. Software developers should get good training in crafting code which can be understood. To make sense of the mess that is code we need to organize it.

The biggest problem in developing and maintaining software is understanding code. Software developers should get good training in crafting code which can be understood. To make sense of the mess we need to organize it.

In 2000 Edsger Dijkstra wrote about our problems organizing and designing software:

I would therefore like to posit that computing’s central challenge, “How not to make a mess of it”, has not been met. On the contrary, most of our systems are much more complicated than can be considered healthy, and are too messy and chaotic to be used in comfort and confidence.

Our code bases get so big and complicated today that we cannot comprehend them all at once. Back in the days of UNIX technical constraints led to smaller code. But the computer is not the limiting factor anymore. We are. Our mind cannot comprehend what we create. Brian Kernighan wrote:

Everyone knows that debugging is twice as hard as writing a program in the first place. So if you’re as clever as you can be when you write it, how will you ever debug it?

Writing code that we (or other developers) can understand is crucial. But why do we fail?

Divide and lose

Usually the first argument when tackling code is to decouple it. Make it clean. Use clean code principles like DRY, SOLID, KISS, YAGNI and what other acronyms you know. These really help to decouple. But they are missing the point. They are the how not the why.
Take a look at your codebase and tell me where are the classes which constitute a subdomain or a specific feature? In which project or part do they live?
Normally you cannot. We only know how to divide code by technical aspects. But features and changes often come from the domain, not from the technology.
How can we understand our creations when we cannot understand its structure? Its architecture? How can we understand something we do not see.

But it does work

The next argument is not much better. Our code might work now. But what if a bug is found or a new feature is about to be implemented? Do you understand the code and its structure? Even weeks, months or years later? Working code is good but you can only change code reliably that you understand.

KISS

Write simple code. Write simple and small methods. Write cohesive classes. The dream of components. But the whole is more than the sum of its parts. You can write simple classes but the communication and threading issues between them can be very complex. Even if the interfaces are sound and simple. Understanding a simple class can be easy in isolation. But understanding a system of simple classes can be difficult and complex. Things are complex. Domains are complex. We cannot ignore that.

Code as an interface

When writing code we have to take the reader and the domain into account. Treat code as an interface. An interface to the system and the domain. It is an opinionated view of the world. The computer does not care about the code we use. Just like the printer who prints our favorite book. But the reader does.
This isn’t just nice thinking, understanding code is key to successfully crafting and changing software.