Unit-Testing Deep-Equality in C#

In the suite of redux-style applications we are building in C#, we are making extensive use of value-types, which implies that a value compares as equal exactly if all of its contents are equal also known as “deep equality”, as opposed to “reference equality” or “shallow equality”. Both of those imply deep equality, but the other way around is not true. The same object is of course equal to itself, not matter how deep you look. And an object that references the same data as another object also has equal content. But a simple object that contains different lists with equal content will be unequal under shallow comparison, but equal under deep comparison.

Though init-only records already provide a per-member comparison as Equals be default, this fails for collection types such as ImmutableList<> that, against all intuition but in accordance to , only provide reference-equality. For us, this means that we have to override Equals for any value type that contains a collection. And this is were the trouble starts. Once Equals is overridden, it’s extremely easy to forget to also adapt Equals when adding a new property. Since our redux-style machinery relies on a proper “unequal”, this would manifest in the application as a sporadically missing UI update.

So we devised a testing strategy for those types, using a little bit of reflection:

  1. Create a sample instance of the value type with no member retaining its default value
  2. Test, by going over all properties and comparing to the same property in a default instance, if indeed all members in the sample are non-default
  3. For each property, run Equals the sample instance to a modified sample instance with that property set to the value from a default instance.

If step 2 fails, it means there’s a member that’s still at its default value in the sample instance, e.g. the test wasn’t updated after a new property was added. If step 3 fails, the sample was updated, but the new property is not considered in Equals – and it can even tell which property is missing.

The same problems of course arise with GetHashCode, but are usually less severe. Forgetting to add a property just makes collisions more likely. It can be tested much in the same way, but can potentially lead to false positives: collisions can occur even if all properties are correctly considered in the function. In that case, however, the sample can usually be altered to remove the collision – and it is really unlikely. In fact, we never had a false positive.

Partitioning in Oracle Database: Because Who Wants to Search an Endless Table?

As data volumes continue to grow, managing large database tables and indexes can become a challenge. This is where partitioning comes in. Partitioning is a feature of database systems that allows you to divide large tables and indexes into smaller, more manageable parts, known as partitions. This can improve the performance and manageability of your database. Aside from performance considerations, maintenance operations, such as backups and index rebuilds, can become easier by allowing them to be performed on smaller subsets of data.

This is achieved by reducing the amount of data that needs to be scanned during query execution. When a query is executed, the database can use the partitioning information to skip over partitions that do not contain the relevant data, instead of having to scan the entire table. This reduces the amount of I/O required to execute the query, which can result in significant performance gains, especially for large tables.

There are several types of partitioning available in Oracle Database, including range partitioning, hash partitioning, list partitioning, and composite partitioning. Each type of partitioning is suited to different use cases and can be used to optimize the performance of your database in different ways. In this blog post we will look range partitioning.

Range partitioning

Here is an example of range-based partitioning in Oracle:

CREATE TABLE books (
  id NUMBER,
  title VARCHAR2(200),
  publication_year NUMBER
)

PARTITION BY RANGE (publication_year) (
  PARTITION p_before_2000 VALUES LESS THAN (2000),
  PARTITION p_2000s VALUES LESS THAN (2010),
  PARTITION p_2010s VALUES LESS THAN (2020),
  PARTITION p_after_2020 VALUES LESS THAN (MAXVALUE)
);

In this example, we have created a table called books that stores book titles, partitioned by the year of publication. We have defined four partitions, p_before_2000, p_2000s, p_2010s, and p_after_2020.

Now, when we insert data into the books table, it will automatically be placed in the appropriate partition based on the year of publication:

INSERT INTO books (id, title, publication_year)
  VALUES (1, 'Nineteen Eighty-Four', 1949);

This book will be inserted into partition p_before_2000, as the year of publication is before 2000. The following book will be placed into partition p_2000s:

INSERT INTO books (id, title, publication_year)
  VALUES (2, 'The Hunger Games', 2008);

When we query the books table, the database will only access the partitions that contain the data we need. For example, if we want to retrieve data for books published in 2015 and 2016, the database will only access partition p_2010s.

SELECT * FROM books WHERE publication_year>=2015 AND publication_year<=2016:

However, you should be aware that while partitioning can improve query performance for some types of queries, it can also negatively impact query performance for others, especially if the partitioning scheme does not align well with the query patterns. Therefore, you should tailor the partitioning to your needs and check if it brings the desired effect.

Format-based sorting looks clever, but is dangerous

A neat trick I learnt early in my career, even before I learnt about version control, was how to format a date as a string so that alphabetically sorted lists would contain them in the “correct” order:

“YYYYMMDD” is the magic string.

If you format your dates as 20230122 and 20230123, the second name will be sorted after the first one. With nearly any other format, your date strings will not be sorted chronologically in the file system.

I’ve found out that this is also nearly the only format that most people cannot intuitively recognize as a date. So while it is familiar with me and conveniently sorted, it is confusing or at least in need of explanation for virtually every user of my systems.

Keep that in mind when listening to the following story:

One project I adopted is a custom enterprise resource planning system that was developed by a single developer that one day left the company and the code behind. The software was in regular use and in dire need of maintenance and new features.

One concept in the system is central to its users: the list of items in an invoice or a bill of delivery. This list contains items in a defined order that is important to the company and its customers.

To my initial surprise, the position of an item in the list was not defined by an integer, but a string. This can be explained by the need of “sub-positions” that form a hierarchy of items, like in this example:

1 – basic item

1.1 – item upgrade #1

1.2 – item upgrade #2

Both positions “1.1” and “1.2” are positioned “underneath” position “1” and should be considered glued to it. If you move position “1” to position “4”, you also move 1.1 to 4.1 and 1.2 to 4.2.

But there was a strange formatting thing going on with the positions: They were stored as strings in the database, but with a strange padding in front. Instead of “1”, “2” and “3”, the entries contained the positions ” 1″, ” 2″ and ” 3″. All positions were prefixed with two space characters!

Well, nearly all positions. As soon as the list grew, the padding turned out to be dependent on the number of digits in the position: ” 9″, but then ” 10″ and “100”.

The reason can be found relatively simple: If you prefix with spaces (or most other characters, maybe “0”), your strings will be ordered in a numerical way. Without the prefixes, they would be sorted like “1”, “10”, “11”, “2”.

That means that the desired ordering of the positions is hardcoded in the database representation! You probably already thought about the case of a position greater than 999. That’s when trouble begins! Luckily, an invoice with a thousand items on the list is unheard of in the company (yet!).

Please note that while the desired ordering is hardcoded in the database, the items are still loaded in a different order (as they were entered into the system) and need to be sorted by the application. The default sorting for strings is the alphabetical order, so the original developer probably was clever/lazy, went with it and formatted the data in a way that would produce the result and not require additional logic during the sorting.

If you look at the code, you see seemingly strange formatting calls to the position all over the place. This is necessary because, for example, every time a user enters a position into the system, it needs to be reformatted (or at least sanitized) in order to adhere to the “auto-sortable” format.

If you wonder how a hierarchical sub-position looks like with this format, its ” 1. 1″, ” 1. 10″ and even ” 1. 17. 2. 4″. The database stores mostly blanks in this field.

While this approach might seem clever at the moment, it is highly dangerous. It conflates several things that should stay separated, like “storage format” and “display format”, “item order” or just “valid value range”. It is a clear violation of the “separation of concerns” principle. And it broke the application when I missed one place where the formatting was required, but not present. Of course, this only manifests in a problem when your test cases (or manual tries) exceed a list of 9 entries – lesson learnt here.

I dread the moment when the company calls to tell me about this “unusually large invoice” that exceeds the 999 limit. This would mean a reformatting of all stored data or another even more clever hack to circumvent the problem.

Did you encounter a format that was purely there for sorting in the wild? What was the story? Tell us in the comments!

Try ending the workday with a beneficial ritual

One thing that is important to me is to start and end the workday with a proven and familiar routine – lets call it a ritual. There are some advantages to this approach. First, you have a defined starting point. No matter what the day may throw at you, there are some anchors in your structure or environment that you can rely on. For example, I don’t start my work without a (big) filled glass of water on my desk. It might get hectic, but my supply of water is secured until lunch. I make it a habit to empty that glass before lunch, too, but that’s not as important as the ritual of supplying myself with a beverage and only then starting my work.

My guess is that most of you already do this, too. The start of a workday is the natural point in time to install habits or even rituals. But what about the end of your workday? Sure, there is a point in time when you “drop the pen” and rush out the door. But right before this moment, there is a possibility to introduce a beneficial ritual that might only cost minutes, but brings value that furthers your career and even your current work.

My usual ritual is a short daily reflection. That’s not exactly my own idea, I just borrowed it from the Clean Code Developer Initiative. My problem with the CCDI version is the focus on software development alone, which is probably a good start, but too narrow for my work profile.

My adaption is to have three basic questions that I ask myself at the end of each workday and answer in “articulated thoughts”. You may prefer to say it out loud or write your answer down (Obsidian or similar tools might be a suitable tool for that). My questions to myself are:

  • How do you feel right now?
  • What surprised you today?
  • What do you want to remember from today’s work?

Note how these questions don’t deal with details of your current work. If you have specific topics that you want to reflect on, you can always add some more questions for a period. I have found it important not to skip or replace the three basic questions, though.

“How do you feel?” is a complicated question because it leads to your motivation for work. Of course, “tired” or “stressed” is always a valid answer. But what if you legitimately feel “proud” or “fulfilled”? Can you identify what aspect of today’s work made you proud? Can you think of a way to have more of that without neglecting other important duties?

“What surprised you today?” tries to carve out your latest learning experience. It is possible that your day was dull enough to have no surprises, but if there were, you’ve probably expanded your knowledge on a topic you didn’t expect. If the surprise was a negative one, maybe you can think about a way to make it less surprising, more rare or downright impossible in the future. In my case, this lead to some unusual gadgets like the “bad idea commands” list that hangs right besides the admininstation console. The most infamous command on this list is “mdadm –create”, by the way (I meant “mdadm –assemble” and was very surprised by the result).

“What do you want to remember?” is an explicit appeal to write your answer down. You don’t need to tell an elaborate story. Just give your future self some cues, preferably from outside your brain (Obsidian’s market claim of “a second brain” is no coincidence). Make a small note or write your future self an e-mail (this is my typical way of offloading things to future me). But persist this information now or it will be gone.

After this daily reflection, I shut down my computer and put the (probably empty) glass of water into the dishwasher. Then I switch into leisure mode.

Of course, my three questions are inspired from other sources, too. One is the workshop hosting manual for code retreats, which has a great section about the “closing circle”, a group reflection on a probably awesome day.

If you have a similar ritual, let us know about it! Write a blog entry or drop a comment below.

How comments get you through a code review

Code comments are a big point of discussion in software development. How and where to use comments. Or should you comment at all? Is the code not enough documentation if it is just written well enough? Here I would like to share my own experience with comments.

In the last months I had some code reviews where colleagues looked over my merge requests and gave me feedback. And it happened again and again that they asked questions why I do this or why I decided to go this way.
Often the decisions had a specific reason, for example because it was a customer requirement, a special case that had to be covered or the technology stack had to be kept small.

That is all metadata that would be tedious and time-consuming for reviewers to gather. And at some point, it is no longer a reviewer, it is a software developer 20 years from now who has to maintain the code and can not ask you questions any more . The same applies if you yourself adjust the code again some time later and can not remember your thoughts months ago. This often happens faster than you think. To highlight how fast details disappear here is a current example: This week I set up a new laptop because the old one had a hardware failure. I did all the steps only half a year ago. But without documentation, I would not have been able to reconstruct everything. And where the documentation was missing or incomplete, I had to invest effort to rediscover the required steps.

Example

Here is an example of such a comment. In the code I want to compare if the mixer volume has changed after the user has made changes in the setup dialog.

var setup = await repository.LoadSetup(token);

var volumeOld = setup.Mixers.Contents.Select(mixer=>mixer.Volume).ToList();

setup = Setup.App.RunAsDialog(setup, configuration);

var volumeNew = setup.Mixers.Contents.Select(mixer=>mixer.Volume).ToList();
if (volumeNew == volumeOld)
{
     break;
}
            
ResizeToMixerVolume(setup, volumeOld);

Why do I save the volume in an additional variable instead of just writing the setup into a new variable in the third line? That would be much easier and more elegant. I change this quickly – and the program is broken.

This little comment would have prevented that and everyone would have understood why this way was chosen at the moment.

// We need to copy the volumes, because the original setup is partially mutated by the Setup App.
var volumeOld = setup.Mixers.Contents.Select(mixer=>mixer.Volume).ToList();

If you annotate such prominent places, where a lot of brain work has gone into, you make the code more comprehensible to everyone, including yourself. This way, a reviewer can understand the code without questions and the code becomes more maintainable in the long run.



The year 2022 in tickets

We are a company of software developers that decided to run the company itself similar to a typical software project. All company documents are put under version control, most things that can be automated are automated (or listed on a backlog and estimated for their business value), a wiki contains all relevant information and is continually updated and extended and, most important, everything is an issue. The word “issue” is the developer synonym for “ticket”, so what I’m really saying is: “Every activity in our company has a ticket number”. Just like you don’t change the source code of a software project without an issue that motivates the change, we don’t perform work for the company without a motivating ticket. This means that you can review the company’s progress, performance and efforts by at least three activity streams or history tracks:

  • The commit history of the version control system tells the story from the viewpoint of documents. Company documents are mostly the beginning or the result of activity of our administration department. Typical documents that start processes are project orders or letters from official agencies. Typical documents that are created as a result of processes include invoices, filled in forms and more letters.
  • The edit history of the wiki tells the story from the viewpoint of process learning. We document our actual administrative processes in a structured way that might be seen as “source code for humans”. Everything we change in our approach to process and create documents can be traced in this source code. Additionally created business processes indicate a growth in business scope or complexity – or the payback of “business process debt”, the administrative equivalent to “technical debt” in a software project.
  • The resolution history of the ticket system tells the story from the actual footwork. Every activity has its ticket, so we can measure how much activity was necessary to run the company, where this activity was invested and how much regular versus extraordinary work occurred.

There are more “story lines” in our company and I could probably talk for days about how to read them and set them into context with one another, but in this blog post, I try to visualize only the footwork of the year 2022 for our small company by showing the ticket numbers. But before I can do that, we need one more piece of theory about our tickets:

We have two kind of tickets in our system:

  • Manually created tickets accompany activities that occurred “without a schedule”. A human being recognized the need for some work and wrote a ticket to document the motivation and track the progress of this work.
  • Automatically created tickets denote recurring activities that are handled by some form of automation in our company. We have developed a tool to manage the schedules of these “recurring activities”. Our job as humans is to recognize the recurring character of some of our activities, estimate a suitable schedule for it and tell the tool about it. The tool then creates tickets based on that schedule and we need to deal with them. The simplest form to deal with a recurring ticket is to close it as “won’t fix” because there is nothing to do in its regard yet.

Just keep in mind that manually created tickets always denote required activity while automatically created tickets “only” denote the need to check for required activity, but not always to perform it.

Let’s look at some numbers!

The most obvious ticket section is of course the tickets for our blog entries (you are reading BLOG-368). Because we publish one entry each week, there will be at least 52 tickets for 2022. In fact, we have fixed 54 tickets this year, with only one ticket manually created. There are not many surprises with such a strict schedule.

A less predictable topic is the purchase department (don’t be too impressed, the “department” is just its own section in the ticket system). Every purchase is tracked by its own ticket. In 2022, we had 113 purchase activities, with 94 of them manually created. This means that the non-automation ratio for our purchases is above 80 percent. We bought two different things every week and most of it was “on demand”.

The most important section of tickets for me as the CEO is the “business administration” section which encompasses all necessary non-specialized work to keep our company afloat. Let’s dissect it for the year 2022:

956 tickets were resolved over the course of the year. That’s a lot of work for a small company! Luckily, 865 tickets were created by our tool, so they don’t always require actual activity. But 91 tickets were things we needed to do but couldn’t anticipate this need (or else we would have created a schedule for it). This is two things per week that “surprised” us.
If you look at these numbers with a different mindset, you can see the effects of consequent automation: Our administration has an automation factor of 90 percent! We mostly deal with routine tasks and can rely on defined, documented and automated processes. That’s quite an achievement and I still remember the times when we had lower factors. They were more “interesting” (in the asian curse sense).
I want to add another perspective to these numbers: We also track our work time and assign it to different projects, with “administration” being one of them. In the year 2022, we booked approximately 600 person hours of work for “administration” of the company. So we spent circa 35 minutes on each ticket. This is a misleading number, because there were lots of tickets that require no more than a few minutes and some that can hog our attention for days. We also track detailed “time per ticket” numbers, but only use this data to extract “expected durations” for the most important and time-consuming tasks. This helps us to plan our administrative work around the customer project schedules.

I could write a lot more about our different topics in the adminstration section of our ticket system. We have identified around 20 distinguishable topics. But it would become more boring over time, so I close this blog entry with one last topic that is very important for an IT company: the “IT administration” or “operations”.

To keep our IT systems up and running, we worked on 48 different tickets, but only 13 of them were automatically created. This seems rather low when compared to the adminstration with around 1000 tickets, but it is very misleading. Our IT administration is nearly fully automated, so that routine work doesn’t create tickets, but starts automation runs (Jenkins builds, Ansible playbooks and such). The 48 tickets were additional work or, in the case of the 13 tickets, recurring work that requires human oversight and interference.
I’m glad that this number is as low as it is. It means that the IT runs smooth and rather silent. The 115 person hours booked for it tell the same story: Our IT is low maintenance. A tad more than 2 hours per week is an affordable price.

I hope this blog entry was entertaining enough to give you an idea of how we make things visible in our company. We use the data to test hypotheses, expose problems and track our improvement efforts. Without this data, we could only rely on assumptions, feelings and spotty memory. By reading the numbers, we (or at least I) get a feeling for the intricacies of the company that translate down to the day-to-day work and makes intuitive and appropriate management possible.

If you want to know more, feel free to leave a comment!