fmt::format vs. std::format

The excellent {fmt} largely served as the blueprint for the C++20 standard formatting library. That alone speaks for its quality. But I was curious: should you now just use std::format for everything, or is fmt::format still a good option? In this particular instance, I wanted to know which one is faster, so I wrote a small benchmark. Of course, the outcome very much depends on the standard library you are using. In my case, I’m using Visual Studio 17.13.0 and its standard library, and {fmt} version 11.1.3.

I started with a benchmark helper function:


#include <chrono>
#include <concepts>
#include <format>
#include <iostream>
#include <string_view>

using namespace std::chrono;

template <std::invocable<> F> steady_clock::duration benchmark(std::string_view label, F f)
{
  auto start = steady_clock::now();
  f();
  auto end = steady_clock::now();
  auto time = end - start;
  auto us = duration_cast<nanoseconds>(time).count() / 1000.0;
  std::cout << std::format("{0} took {1:.3f}us", label, us) << std::endl;
  return time;
}

Then I called it with a lambda like this, with NUMBER_OF_ITERATIONS set to 500000:

int integer = 567800;
float real = 1234.0089f;
for (std::size_t i = 0; i < NUMBER_OF_ITERATIONS; ++i)
  auto _ = fmt::format("an int: {}, and a float: {}", integer, real);

… and the same thing with std::format.

Interestingly, fmt::format only needed about 75%-80% of the time of std::format in a release build, while the situation reversed in a debug build, where it took about 106%-108%.

It seems hard to construct a benchmark with little overhead from other things while still preventing the compiler from optimizing everything away. My code assumes the compiler keeps the formatting work even though the result is thrown away. So take all my results with a grain of salt!
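
One way to lower the risk of the formatting being optimized away, without pulling in a benchmarking framework, is to route the result through a volatile sink. This is only a sketch under that assumption; the helper names are mine and not part of the original benchmark:

#include <cstddef>
#include <string>

#include <fmt/format.h>

// Reads one byte of the result through a volatile object, so the compiler
// cannot prove the formatted string is unused and discard the work entirely.
inline void keep(const std::string& s)
{
  volatile char sink = s.empty() ? '\0' : s[0];
  (void)sink;
}

void run_fmt_case(std::size_t iterations, int integer, float real)
{
  for (std::size_t i = 0; i < iterations; ++i)
    keep(fmt::format("an int: {}, and a float: {}", integer, real));
}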

Surviving the “Big One” in IT – Part I

For every kind of natural disaster, there is a “Big One”. Everybody who lived through it remembers the time, everybody else has at least heard stories about it. Every time a similar natural disaster occurs, it gets compared to this one.

We just commemorated the “Boxing Day Tsunami” of twenty years ago. Another example might be “The Big One”, the devastating San Francisco earthquake of 1906. From today’s viewpoint, it wasn’t the strongest earthquake we have seen since, but it was one of the first to be extensively covered by “modern” media. It predates the Richter scale, so we can’t directly compare it to current events.

In the rather young history of IT, we have had our fair share of “natural” disasters as well. We tend to give the really bad ones nicknames. The first vulnerability that came equipped with a logo and its own domain was Heartbleed in 2014, ten years ago.

Let’s name-drop some big incidents:

The first entry in this list is different from the others in that it was a “near miss”. It would have been a veritable catastrophe, with millions of potentially breached and compromised systems. It was discovered and averted just before it would have been distributed worldwide.

Another thing we can deduce from the list is the number of incidents per year:

https://www.cve.org/about/Metrics

From around 5k published vulnerabilities per year until 2014 (roughly one every two hours), the number rose to 20k in 2021 and 30k in 2024. That’s more than 80 reports per day, or roughly three per hour. A single human cannot keep up with these numbers. We need to rely on filters that block out the noise and highlight the relevant issues for us.

But let’s assume that the next “Big One” happens and demands our attention. There is one characteristic, common to all incidents I witnessed, that makes them similar to earthquakes or floods: it happens everywhere at once. Let me describe the situation using Log4Shell as an example:

The first reports indicated a major vulnerability in the log4j package. That seemed bad, but it was a logging module, what could possibly happen? We could lose the log files?

It soon became clear that the vulnerability could be exploited remotely, by just sending over a malicious request that gets logged. Like a web request without proper authentication to a route that doesn’t exist. That’s exactly what logging is for: capturing the outliers and preserving them for review.

Right at the moment it dawned on us that every system with any remote accessibility was at risk, the first reports of automated attacks emerged. It was now Friday late evening, the weekend had just started, and you realized you were in a race against bots. The one thing you cannot do is call it a week and relax for two days. In those 48 hours, the war would be lost and the systems compromised. You know that you have at most 4 hours to:

  • Gather a list of affected projects/systems
  • Assess the realistic risk based on current knowledge
  • Hand over concrete advice to the system’s admins
  • Or employ the countermeasures yourself

In our case, that meant reviewing nearly 50 projects, documenting the decisions and communicating with the operators.

While we did that, during Friday night, new information emerged that not only log4j 2.x, but also 1.x, was susceptible to similar attacks.

We had to review our list and decisions based on the new situation. While we were doing that, somebody on the internet refuted the claim and proclaimed the 1.x versions safe.

We had to split our investigation into two scenarios that both got documented:

  • scenario 1: Only log4j 2.x is affected
  • scenario 2: All versions of log4j are vulnerable

We employed actions based on scenario 1 and held our breath that scenario 2 wouldn’t come true.

One system with log4j 1.x was deemed “low impact” if down, so we took it off the net as a precaution. Spoiler: scenario 2 was not true, so this was an unnecessary step in hindsight. But in the moment, it was one problem off the list, regardless of scenario validity.

The thing to recognize here is that the engagement with the subject is neither linear nor fixed. The scope and details of the problem change while you work on it. Uncertainties arise and need to be taken into account. When you look back on your work, you’ll notice all the unnecessary actions you took. They didn’t appear unnecessary in the moment, or at least you weren’t sure.

After we completed our system review and had carried out all the necessary actions, we switched to “survey and communicate” mode. We monitored the internet chatter about the vulnerability and stayed in contact with the admins that were online. I remember an e-mail from an admin that copied some excerpts from the server logfiles with the caption: “The attacks are here!”.

And that was the moment my heart sank, because we had totally forgotten about the second front: Our own systems!

Every e-mail is processed by our mailing infrastructure, and one piece of it is the mail archive. And this system is written in Java. I raced to find out which specific libraries are used in it. Because if a log4j 2.x library were included, the friendly admin would have just inadvertently performed a real attack on our infrastructure.

A few minutes after I finished my review (and found a log4j 1.x library), the vendor of the product sent an e-mail validating my result by declaring the product not at risk. But those 30 minutes of uncertainty were pure panic!

In case of an airplane emergency, they always tell you to make sure you are stable first (i.e. put on your own oxygen mask first). The same can be said about IT vulnerabilities: mind your own systems first! We would have secured our clients’ systems and then fallen prey to friendly fire if the mail archive had been vulnerable.

Let’s re-iterate the situation we will find ourselves in when the next “Big One” hits:

  • We need to compile a list of affected instances, both under our direct control (our own systems) and under our ministration.
  • We need to assess the impact of immediate shutdown. If feasible, we should take as many systems as possible out of the equation by stopping or airgapping them.
  • We need to evaluate the risk of each instance in relation to the vulnerability. These evaluations need to be prioritized and timeboxed, because they need to be performed as fast as possible.
  • We need to document our findings (for later revision) and communicate the decision or recommendation with the operators.

This situation is remarkably similar to real-world disaster mitigation:

  • The lists of instances are disaster plans
  • The shutdowns are like evacuations
  • The risk evaluation is essentially a triage task
  • The documentation and delegation phase is the command and control phase of disaster relief crews

This helps a lot to see which elements can be prepared beforehand!

The disaster plans are the most obvious element that can be constructed during quiet times. Because no disaster occurs according to plan and plans tend to get outdated quickly, they need to be intentionally fuzzy on some details.

The evacuation itself cannot be fully prepared, but it can be facilitated by plans and automation.

The triage cannot be prepared either, but supported by checklists and training.

The documentation and communication can be somewhat formalized, but will probably happen in a chaotic and unpredictable manner.

With this insight, we can look at possible ideas for preparation and planning in the next part of this blog series.

String Representation and Comparisons

Strings are a fundamental data type in programming, and their internal representation has a significant impact on performance, memory usage, and the behavior of comparisons. This article delves into the representation of strings in different programming languages and explains the mechanics of string comparison.

String Representation

In many programming languages, such as Java and Python, strings are immutable. To optimize performance in string handling, techniques like string pools are used. Let’s explore this concept further.

String Pool

A string pool is a memory management technique that reduces redundancy and saves memory by reusing immutable string instances. Java is a well-known language that employs a string pool for string literals.

In Java, string literals are automatically “interned” and stored in a string pool managed by the JVM. When a string literal is created, the JVM checks the pool for an existing equivalent string:

  • If found, the existing reference is reused.
  • If not, a new string is added to the pool.

This ensures that identical string literals share the same memory location, reducing memory usage and enhancing performance.

Python also supports the concept of string interning, but unlike Java, it does not intern every string literal. Python supports string interning for certain strings, such as identifiers, small immutable strings, or strings composed of ASCII letters and numbers.
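
A quick sketch of what that means in practice on CPython (interning details are implementation-specific, so treat this as illustrative only):

import sys

a = "hello"
b = "".join(["he", "llo"])   # built at runtime, not interned automatically

print(a == b)   # True  - same content
print(a is b)   # False - different objects on CPython

c = sys.intern("".join(["he", "llo"]))
print(a is c)   # True on CPython - both now reference the interned "hello"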

String Comparisons

Let’s take a closer look at how string comparisons work in Java and other languages.

Comparisons in Java

In the following example, we compare three strings with the content “hello”. While the first comparison returns true, the second does not. What’s happening here?

String s1 = "hello";
String s2 = "hello";
String s3 = new String("hello");

System.out.println(s1 == s2); // true
System.out.println(s1 == s3); // false

In Java, the == operator compares references, not content.

First Comparison (s1 == s2): Both s1 and s2 reference the same object in the string pool, so the comparison returns true.

Second Comparison (s1 == s3): s3 is created using new String(), which allocates a new object in heap memory. By default, this object is not added to the string pool, so the object reference is unequal and the comparison returns false.

You can explicitly add a string to the pool using the intern() method:

String s1 = "hello";
String s2 = new String("hello").intern();

System.out.println(s1 == s2); // true

To compare the content of strings in Java, use the equals() method:

String s1 = "hello";
String s2 = "hello";
String s3 = new String("hello");

System.out.println(s1.equals(s2)); // true
System.out.println(s1.equals(s3)); // true

Comparisons in Other Languages

Some languages, such as Python and JavaScript, use == to compare content, but this behavior may differ in other languages. Developers should always verify how string comparison operates in their specific programming language.

s1 = "hello"
s2 = "hello"
s3 = "".join(["h", "e", "l", "l", "o"])

print(s1 == s2)  # True
print(s1 == s3)  # True

print(s1 is s2)  # True
print(s1 is s3)  # False

In Python, the is operator is used to compare object references. In the example, s1 is s3 returns False because the join() method creates a new string object.

Conclusion

Different approaches to string representation reflect trade-offs between simplicity, performance, and memory efficiency. Each programming language implements string comparison differently, requiring developers to understand the specific behavior before relying on it. For example, some languages differentiate between reference and content comparison, while others abstract these details for simplicity. Languages like Rust, which lack a default string pool, emphasize explicit memory management through ownership and borrowing mechanisms. Languages with string pools (e.g., Java) prioritize runtime optimizations. Being aware of these nuances is essential for writing efficient, bug-free code and making informed design choices.

Python desktop applications with asynchronous features (part 1)

The Python world has this peculiar characteristic: for nearly every idea out there in the current programming world, there is a ready-made solution that appears well-rounded at first. But when you try to “just plug it together”, after a certain while you encounter some very specific cases that make all the “just” go up in smoke and leave you with some hours of research.

That is also why, for quite a few of these “solutions”, there now seem to be various similar-but-distinct packages that you had better take care to understand; which is hard again, because every Python developer likes to advertise their package in very easy-sounding words.

But that is the fun of Python. If I choose it as suitable for a project, this is the contract I sign 🙂

So I recently ran into a surprisingly non-straightforward case with no simple go-to solution. Maybe you know one, and then I’ll be thankful to try it and discuss it through; sometimes one is just blind from all the options. (It’s also part of what you sign up for when choosing Python. I cope.)

Now, I thought a lot about my use case and I will split these findings up into multiple blog posts – thinking asynchronously (“async”) is tricky in many languages, but the Python way, again, is to hide its intricacies in very carefully selected places.

A desktop app with TCP socket handling

As long as your program is a linear script, you do not need async features. But as soon as something has to run for a longer time (or endlessly), you cannot afford to simply wait for it, or you won’t be able to do anything else in the meantime.

E.g. for our desktop application (employing Python’s very basic tkinter), we sooner or later run into tk.mainloop(), which is an endless loop running in the OS thread it was called from: it draws the interface, handles input events, updates the interface, and repeats. This is blocking, i.e. that thread can now only do other things from within the event handlers acting on that interface.

You might know: any desktop UI framework really hates it if you try to update its interface from “outside the UI thread”. Just think of the unpredictable behaviour if you were to e.g. draw a button, then handle its click event, and while you’re at it, a background thread removes that button – etc.

The good thing is, you will quickly be told not to do such a thing; the bad thing is, you might end up with an unexpected or difficult-to-parse error and some question marks about what else to do.

The UI-thread problem is a specific case of “doing stuff in parallel requires you to really manage who can actually access a given resource”. Just google race condition. If you think about it, this holds for managing projects / humans in general, but also for our desktop app that allows simple network connections via TCP socket.

Now the first clarifications are in order. Python gives you:

  • On any tkinter widget, you have “.after()” to call any function some time in the future, i.e. you enqueue that function call to be executed after a given time has passed; this is nice for making things happen in the UI thread (see the sketch after this list).
  • But even small stuff like writing some characters to a file might delay the interface reaction time, and people nowadays have no time for that (maybe I’m just particularly impatient).
  • Python’s standard library gives us packages like threading, asyncio and multiprocessing, which have been around long enough to be considered mature.
  • There are also more advanced solutions, like looking into PyQt, or the mindset of “everything is a Web App nowadays”, and they might equip you with asynchronous handling from the start. But remember – the Schneiderei in Softwareschneiderei means that we prefer tailor-made software over having to integrate bloated dependencies that we neither know nor want to maintain all year long.
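
To make the .after() idea from the list above concrete, here is a minimal sketch (all names are mine, not from the actual project): a background thread pushes results into a queue, and the UI thread polls that queue via .after(), so all widget updates stay in the UI thread.

import queue
import threading
import time
import tkinter as tk

results = queue.Queue()

def background_work():
    while True:
        time.sleep(1.0)                   # stand-in for waiting on a TCP socket
        results.put("something arrived")

def poll_results():
    try:
        while True:
            label.config(text=results.get_nowait())  # safe: we are in the UI thread
    except queue.Empty:
        pass
    root.after(100, poll_results)         # re-schedule ourselves on the Tk event loop

root = tk.Tk()
label = tk.Label(root, text="waiting...")
label.pack()

threading.Thread(target=background_work, daemon=True).start()
root.after(100, poll_results)
root.mainloop()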

Let me conclude by shining a light on the general choice made in this project, and a few reasons why.

  1. What Python calls “threads” is not the same thing as what operating systems understand as a thread (which also differs among them). See also the Python Global Interpreter Lock. We do not want to dive into all of that here.
  2. multiprocessing is the only way to do anything outside the current system process, which is what you need for running CPU-heavy tasks in parallel. That is not our use case, and while it is more “truly parallel”, it comes with slower startup, more expensive communication and some constraints on that exchange (e.g. only serializable data).
  3. We are IO-bound, i.e. we have no idea when the next network packet arrives, so we want to wait in separate event loops that would be blocking on their own (i.e. “while True”, similar to what tkinter does with our main thread).
  4. Because tkinter steals our main thread anyway, we would use a threading.Thread in any case to allow concurrency, but then we face the choice of constructing everything with threads or employing the newer asyncio features.

The basic building block for that would be a threading.Thread:

class ListenerThread(threading.Thread):

    # initialize somehow

    def run(self):
        while True:
            wait_for_input()
            process_input()
            communicate_with_ui_thread()


# usage like
#   listener = ListenerThread(...)
#   listener.start()

Now, thinking naively, the pseudo-code communicate_with_ui_thread() could then either lead to the tkinter .after() calls, or employ callback functions passed in during the initialization of the thread. But sadly, it was not as easy as that, because you run several risks of

  • just masking your intention and still executing your callbacks blockingly (that thread can then freeze the UI)
  • still passing UI widget references to your background loop (throwing nasty not-the-main-thread exceptions)
  • having memory leaks in the background gobble up more than you ever wanted
  • lock starvation: a nasty error telling you RuntimeError: can't allocate lock
  • deadlocks caused by interdependent callbacks

This list is surely not exhaustive.

So what are the options? Let me discuss these in an upcoming post. For now, just let all these complications linger in your brain for a while.

Trait-queries for my C++ dependency injection container

This post builds upon my previous posts on my C++ dependency-injection container: Automated instance construction in C++, Improved automated instance construction in C++ and Even better automated instance construction in C++. I was actually quite happy with the version from the last post and didn’t really touch the implementation for a good long while. But lately, I identified a few related requirements that could be solved elegantly by an extension to the container, so I decided to give it a go.

New Requirements

  1. Sometimes I need to get services from the DI just to create them. They would then register themselves with an event bus or some other system. I would not really call into them actively, and therefore I did not need access to the instances created. This could previously be done via something like (void)provider.get<my_autonomous_system>(), after all the services were registered. That works, but doesn’t scale up very well once you have a few of those. It would be much better to have something like provider.instantiate_all_autonomous_systems().
  2. Some groups of systems I would instantiate and keep around just to call them in a totally homogeneous way, like system_one.update(), system_two.update(), etc.. Again it would be better to not require the concrete types at the call site and instead just get the requested systems and call their update() in a loop.

Query Interface

It turns out that both requirements can be solved by requesting instances for “a group” of registered services. In the case of the first requirement, that’s actually all that is needed, but for the second requirement, the instances also need to be processed in some way, e.g. upcasting or other forms of type-erasure. Here’s how I wanted it to look:

di di;
di.insert_unique<actual_update_thing_one>().trait<update_trait>();
di.insert_unique<actual_update_thing_two>().trait<update_trait>();

auto updaters = di.query_trait<update_trait>();
for (auto const& each : updaters)
  each->update();

After registration with the DI, types can be marked with one or many traits, which can later be queried. For this example, the trait looks like this:

struct update_trait
{
  using type = update_service*;
  static update_service* type_erase(update_service* x)
  {
    return x;
  }
};

It really just does an upcast to update_service, which both of the types derive from. But it would be equally possible to use std::function<> in case the types are only compatible via duck-typing:

struct update_trait
{
  using type = std::function<void()>;
  template <class T> static std::function<void()> type_erase(T* x)
  {
    return [x]
    {
      x->update();
    };
  }
};

Of course, that changes the final loop in the example to:

for (auto const& each : updaters)
  each();

So a trait type needs to contain a type alias for the target type and a function to process the instance pointer into that target type, be it by wrapping it in some sort of adaptor or via upcasting. The type is kept separate, and is not just the return type of the function, because it has to be independent of the instance type that goes in, while the function can be a template and thus have different return types (which is fine as long as they all convert to the target type).

Implementation

When you add a trait for a type T via the .trait<Trait>() template, I register what I call a ‘resolver’, which is just a std::function<typename Trait::type()> that invokes Trait::type_erase(get_ptr<T>()). These are all put into a std::vector<>:

template <typename Trait> using trait_resolvers =
  std::vector<std::function<typename Trait::type()>>;

For all the traits, these are stored in an std::unordered_map<std::type_index, std::any> where the key is typeid(Trait).

On query_trait<Trait>, I look into that map, get the trait_resolvers<Trait> out of it, and call each resolver to fill a new std::vector<typename Trait::type>, which is then returned and can be iterated by the user.
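
For illustration, here is a stripped-down, self-contained sketch of that storage and lookup (mini_di and add_trait are my own names; the real container resolves the instances itself instead of taking raw pointers):

#include <any>
#include <functional>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>
#include <vector>

template <typename Trait> using trait_resolvers =
  std::vector<std::function<typename Trait::type()>>;

class mini_di
{
public:
  // In the real container, .trait<Trait>() registers a resolver for the
  // previously inserted type T; here we pass the instance pointer directly.
  template <typename Trait, typename T>
  void add_trait(T* instance)
  {
    resolvers_for<Trait>().push_back(
      [instance] { return Trait::type_erase(instance); });
  }

  template <typename Trait>
  [[nodiscard]] std::vector<typename Trait::type> query_trait()
  {
    std::vector<typename Trait::type> result;
    for (auto const& resolve : resolvers_for<Trait>())
      result.push_back(resolve());
    return result;
  }

private:
  template <typename Trait>
  trait_resolvers<Trait>& resolvers_for()
  {
    // One std::any slot per trait, keyed by its type_index.
    std::any& slot = resolvers_[std::type_index(typeid(Trait))];
    if (!slot.has_value())
      slot = trait_resolvers<Trait>{};
    return std::any_cast<trait_resolvers<Trait>&>(slot);
  }

  std::unordered_map<std::type_index, std::any> resolvers_;
};

// usage roughly like in the example above:
//   mini_di di;
//   di.add_trait<update_trait>(&thing_one);
//   di.add_trait<update_trait>(&thing_two);
//   for (auto const& each : di.query_trait<update_trait>())
//     each->update();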

This implementation maps better to the second use-case, but the first can be handled with a bogus type_erase function in the trait, like this:

struct auto_create_trait
{
  using type = int;
  template <class T>
  static int type_erase(T* x)
  {
    return 0;
  }
};

This creates a std::vector<int> that isn’t needed, which is not ideal but not a deal-breaker either. On the other hand, it is not too hard to properly support void as the type with just two if constexpr (std::is_same_v<typename Trait::type, void>) branches, one in the resolver lambda that omits the type_erase call and one in query_trait that omits storing the resolver result. This way, I can also use [[nodiscard]] on query_trait, and the trait can be written as just struct auto_create_trait { using type = void; };.

Dependent Subqueries and LATERAL in PostgreSQL

When working with databases, you often need to run a query that depends on results from another query. PostgreSQL offers two main ways to handle this: Dependent Subqueries and LATERAL joins.

A dependent subquery is like a mini-query inside another query. The inner query (subquery) depends on the outer query for its input. For every row in the outer query, the subquery runs separately.

Imagine you have two tables: customers, which holds information about customers (e.g., their id and name), and orders, which holds information about orders (e.g., customer_id, order_date, and amount). Now, you want to find the latest order date for each customer. You can use a dependent subquery like this:

SELECT id AS customer_id,
  name,
  (SELECT order_date
     FROM orders
    WHERE customer_id=customers.id
    ORDER BY order_date DESC
    LIMIT 1) AS latest_order_date
FROM customers;

For each customer in the customers table, the subquery looks for their orders in the orders table. The subquery sorts the orders by date (ORDER BY order_date DESC) and picks the most recent one (LIMIT 1).

This works, but it has a drawback: If you have many customers, this approach can be slow because the subquery runs once for every customer.

LATERAL join

A LATERAL join is a smarter way to solve the same problem. It allows you to write a subquery in the FROM clause that depends on the outer query, and PostgreSQL can often handle it more efficiently than a scalar subquery in the SELECT list.

Let’s solve the “latest order date” problem using LATERAL:

SELECT
  c.id AS customer_id,
  c.name,
  o.order_date AS latest_order_date
FROM customers c
LEFT JOIN LATERAL (
  SELECT order_date
   FROM orders
  WHERE orders.customer_id=c.id
  ORDER BY order_date DESC
  LIMIT 1
) o ON TRUE;

For each customer (c.id), the LATERAL subquery finds the latest order in the orders table. The LEFT JOIN ensures that customers with no orders are included in the results, with NULL for the latest_order_date.

It’s easier to read and often faster, especially for large datasets, because PostgreSQL optimizes it better.
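
A nice side effect is that the LATERAL subquery is not limited to a single value per row. Sticking with the same two tables, here is a sketch of the three most recent orders per customer, something the scalar subquery above could not return:

SELECT
  c.id AS customer_id,
  c.name,
  o.order_date,
  o.amount
FROM customers c
LEFT JOIN LATERAL (
  SELECT order_date, amount
    FROM orders
   WHERE orders.customer_id = c.id
   ORDER BY order_date DESC
   LIMIT 3
) o ON TRUE;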

Both dependent subqueries and LATERAL joins allow you to handle scenarios where one query depends on the results of another. Dependent subqueries are straightforward and good enough for simple tasks with small datasets. However, you should consider using LATERAL for large datasets where performance matters.

Working with JSON-DOM mapping in EntityFramework and PostgreSQL

A while ago, one of my colleagues covered JSON usage in PostgreSQL on the database level in two interesting blog posts (“Working with JSON data in PostgreSQL” and “JSON as a table in PostgreSQL 17”).

Today, I want to show the usage of JSON in EntityFramework with PostgreSQL as the database. We have an event sourcing application, similar to the one in my colleague’s first blog post, written in C#/AspNetCore using EntityFramework Core (EF Core). Fortunately, EF Core and the PostgreSQL database driver have relatively easy-to-use JSON support.

You have essentially three options when working with JSON data and EF Core:

  1. Simple string
  2. EF owned entities
  3. System.Text.Json DOM types

Our event sourcing use case requires query support on the JSON data and the data has no stable and fixed schema, so the first two options are not really appealing. For more information on them, see the npgsql documentation.

Let us have a deeper look at the third option which suits our event sourcing use-case best.

Setup

The setup is ultra-simple. Just declare the relevant properties in your entities as JsonDocument and make them disposable:

public class Event : IDisposable
{
    public long Id { get; set; }

    public DateTime Date { get; set; }
    public string Type { get; set; }
    public JsonDocument Data { get; set; }
    public string Username { get; set; }

    public void Dispose() => Data?.Dispose();
}

Running dotnet ef migrations add EventJsonSupport should generate the corresponding changes for the database migrations. Now we are good to start querying and deserializing our JSON data.

Saving our events to the database does not require additional changes!

Writing queries using JSON properties

With this setup we can use JSON properties in our LINQ database queries like this:

var eventsForId = db.Events.Where(ev =>
  ev.Data.RootElement.GetProperty("payload").GetProperty("id").GetInt64() == id
)
.ToList();

Deserializing the JSON data

Now that our entities contain JsonDocument (or JsonElement) properties, we can of course use the System.Text.Json API to create our own domain objects from the JSON data as we need them:

// ev is an Event entity loaded via EF Core (see the query above)
var eventData = ev.Data.RootElement.GetProperty("payload");
return new HistoryEntry
{
    Timestamp = ev.Date,
    Action = new Action
    {
        Id = eventData.GetProperty("id").GetInt64(),
        Action = eventData.GetProperty("action").GetString(),
    },
    Username = ev.Username,
};

We could, for example, deserialize different domain objects depending on the event type, or deal with the evolution of our JSON data over time to accommodate new features or refactorings on the data side.
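
As a sketch of the first idea (the event type strings and payload classes like OrderCreated are hypothetical, and JsonSerializer.Deserialize(JsonElement) requires .NET 6 or newer):

// ev is an Event entity as defined above; OrderCreated/OrderCancelled are
// hypothetical payload classes used only for this illustration.
object ToDomainObject(Event ev) => ev.Type switch
{
    "order-created"   => JsonSerializer.Deserialize<OrderCreated>(
                             ev.Data.RootElement.GetProperty("payload")),
    "order-cancelled" => JsonSerializer.Deserialize<OrderCancelled>(
                             ev.Data.RootElement.GetProperty("payload")),
    _ => throw new InvalidOperationException($"Unknown event type: {ev.Type}")
};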

Conclusion

Working with JSON data inside a classical application using an ORM and a relational database has become surprisingly easy and efficient. The times of fragile full-text queries using LIKE or similar tricks to find your data are over!

Every Unit Test Is a Stage Play – Part V

In this series about describing unit tests with the metaphor of a stage play that tells short stories about your system, we already published four parts:

Today, we look at the critics.

An integral part of the theater experience is the appraisal of the critics. A good review of a stage play can multiply the viewer count manyfold, while a bad review can make avid visitors hesitate or even omit the visit.

In our world of source code and unit tests, we can define the whole team of developers as critics. If they aren’t fond of the tests, they will neglect or even abandon them. Tests need to prove their worth in order to survive.

Let us think a little deeper about this aspect: test code is evaluated more critically than any other source code! Normal production code can always claim to be wanted by the customer. No matter how bad the production code may look and feel, it cannot just be deleted. Somebody would notice and complain.

Test code is not wanted by the customer. You can delete a test and it would not be noticed until a regression bug raises the question why the failing functionality wasn’t secured by a test. So in order to survive, test code needs a stakeholder inside the development team. Nobody outside the team cares about the test.

There is another difference between production code and test code: Production code is inherently silent during development. In contrast to this, test code is programmed to drive the developer’s attention to it in case of a crisis. It is code that tries to steal your focus and cries wolf. It is the messenger that delivers the bad news.

Test code is the code you’ll likely read in a state of irritation or annoyance.

Think about a theater critic who visits and rates a stage play in a state of irritation and annoyance, who wanted to do something else instead and probably has a deadline to meet for that other thing. Their opinion is probably biased towards a scathing critique.

We talked about several things that test code can do to be inviting, concise, comprehensible and plausible. What it can’t do is be entertaining. Test code is inherently boring. Every test is a short story that seems trivial when seen in isolation. We can probably anticipate the critique for such a play: “it meant well, but was ultimately forgettable”.

What can we do to make test code more meaningful? To convey its impact and significance to the critics?

In the world of theater (and even more so: movies), one strategy is to add “big names” to the production: “From the director of Known Masterpiece” or “Part III of the Successful Series”.

Another strategy is to embellish oneself with other critiques (hopefully good ones): “Nominated for X awards” or “Praised by Grumpy Critic”.

Let’s translate these two strategies into the world of unit tests:

Strategy 1: Borrow a stakeholder by linking to the requirement

I stated above that test code has no direct stakeholder. That’s correct for the code itself, but not for its motivation to exist. We don’t write unit tests just to have them. We write them because we want to assert that some functionality is present or some bug is absent. In both cases, we probably have a ticket that describes the required change in depth. We can add the “big name” of the ticket to the test by adding its number or a full url as a comment to the test:

/**
 * #Requirement http://issuetracker/TICKET-3096
 */
@Test
public void understands_iso8601_timestamp() {
    final LocalDateTime actual = SomeController.dateTimeFrom(
        "2023-05-24T17:30:20"
    );
    assertThat(
        actual
    ).isEqualTo(
        "2023-05-24T17:30:20"
    );
}

The detail of interest is the comment above the test method. It explains the motivation behind authoring the test. The first word (#Requirement) indicates that this is a new feature that got commissioned by the customer. If it was a bugfix test instead, the first word would be #Bugfix. In both cases, we tell future developers that this test has a meaning in the context of the linked ticket. It isn’t some random test that annoys them, it is the guard for a specific use case of the system.

Strategy 2: Gather visible awards for previous achievements

Once you get used to the accompanying comment to a test method, you can see it as some kind of billboard that displays the merit of the test. Why not display the heroic deeds of the test, too? I blogged about this idea a decade ago, so this is just a quick recap:

/**
 * #Requirement http://issuetracker/TICKET-3096
 * @lifesaver by dsl
 * @regression by xyz
 */
@Test
public void understands_iso8601_timestamp() {
    /* omitted test code */
}

Every time a test does something positive for you, give it a medal! You can add it right below the ticket link and document for everybody to see that this test has earned its place in the code base. Of course, you can also document your frustrating encounters with a specific test in the same way. Over time, the bad tests will exhibit several negative awards, while your best tests will have several lifesaver medals (the highest distinction a test can achieve).

So, to wrap up this part of the metaphor: Pacify the inevitable critics of your test code by not only giving them pleasant code to look at but also context information about why this code exists and why they should listen to it if it happens to have grabbed their attention, even with bad news.

Epilogue

This is the fifth part of a series. All parts are linked below:

Why Java’s built-in hash functions are unsuitable for password hashing

Passwords are one of the most sensitive pieces of information handled by applications. Hashing them before storage ensures they remain protected even if the database is compromised. However, not all hashing algorithms are designed for password security. Java’s built-in hashing mechanisms, used e.g. by HashMap, are optimized for performance, not security.

In this post, we will explore the differences between general-purpose and cryptographic hash functions and explain why the latter should always be used for passwords.

Java’s built-in hashing algorithms

Java provides a hashCode() method for most objects, including strings, which is commonly used in data structures like HashMap and HashSet. For instance, the hashCode() implementation for String uses a simple algorithm:

public int hashCode() {
    int h = 0;
    for (int i = 0; i < value.length; i++) {
        h = 31 * h + value[i];
    }
    return h;
}

This method calculates a 32-bit integer hash by combining each character in the string with the multiplier 31. The goal is to produce hash values for efficient lookups.

This simplicity makes hashCode() extremely efficient for its primary use case—managing hash-based collections. Its deterministic nature ensures that identical inputs always produce the same hash, which is essential for consistent object comparisons. Additionally, it provides decent distribution across hash table buckets, minimizing performance bottlenecks caused by collisions.

However, the same features that make this function ideal for collections are also its greatest weaknesses when applied to password security. Because it is fast, an attacker can quickly compute the hash for any potential password and compare it to a leaked hash. Furthermore, its 32-bit output space is too small for secure applications and leads to frequent collisions. For example:

System.out.println("Aa".hashCode()); // 2112
System.out.println("BB".hashCode()); // 2112

The lack of randomness (such as salting) and of security-focused features makes hashCode() entirely unsuitable for protecting passwords. You can manually add a random value before passing the string into the hash algorithm, but the small output space and high speed still make it possible to generate a lookup table quickly. It was never designed to handle adversarial scenarios like brute-force attacks, where attackers attempt billions of guesses per second.

Cryptographic hash algorithms

Cryptographic hash functions serve a completely different purpose. They are designed to provide security in the face of adversarial attacks, ensuring that data integrity and confidentiality are maintained. Examples include general-purpose cryptographic hashes like SHA-256 and password-specific algorithms like bcrypt, PBKDF2, and Argon2.

They produce fixed-length outputs (e.g., 256 bits for SHA-256) and are engineered to be computationally infeasible to reverse. This makes them ideal for securing passwords and other sensitive data. In addition, some cryptographic password-hashing libraries, such as bcrypt, incorporate salting automatically—a technique where a random value is added to the password before hashing. This ensures that even identical passwords produce different hash values, thwarting attacks that rely on precomputed hashes (rainbow tables).

Another critical feature is key stretching, where the hashing process is deliberately slowed down by performing many iterations. For example, bcrypt and PBKDF2 allow developers to configure the number of iterations, making brute-force attacks significantly more expensive in terms of time and computational resources.
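
As a minimal sketch of what that looks like with only the JDK on board (PBKDF2 via javax.crypto; the iteration count here is just an illustrative choice and should be tuned to your own hardware and security requirements):

import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordHashing {
    public static void main(String[] args) throws Exception {
        char[] password = "correct horse battery staple".toCharArray();

        // A fresh random salt per password defeats precomputed rainbow tables.
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);

        // Many iterations of PBKDF2-HMAC-SHA256: the key-stretching knob that
        // makes each brute-force guess deliberately expensive.
        PBEKeySpec spec = new PBEKeySpec(password, salt, 210_000, 256);
        byte[] hash = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                .generateSecret(spec)
                .getEncoded();

        // Store salt, iteration count and hash together, e.g. as encoded text.
        System.out.println(Base64.getEncoder().encodeToString(salt) + ":"
                + Base64.getEncoder().encodeToString(hash));
    }
}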

Conclusion

Java’s built-in hash functions, such as hashCode(), are designed for speed, efficiency, and consistent behavior in hash-based collections. They are fast, deterministic, and effective at spreading values evenly across buckets.

On the other hand, cryptographic hash algorithms are purpose-built for security. They prioritize irreversibility, randomness, and computational cost, all of which are essential for protecting passwords against modern attack vectors.

Java’s hashCode() is an excellent tool for managing hash-based collections, but it was never intended for the high-stakes realm of password security.

A few more heuristics for rejecting Merges

For a few weeks now, I have been trying to find a few easy things to look for when facing a Merge Request (also called a “Pull Request”) that is too large to be quickly accepted.

When facing a larger Merge Request, how can one decide rather quickly whether it is worth going through all the changes in one session, or whether it is too dangerous and should be rejected?

These thoughts apply to a medium-sized repository – I am of the opinion that if you happen to work in a large project, or contribute to a public open-source repository, one should never even aim for larger merge requests, i.e. they should be rejected if there is more than one reason why any code changed in that MR.

Being too strict just for the sake of it can, in my eyes, be a costly mistake – you waste your time on unnecessary structure and, in earlier / more experimental development stages, you might not want to take the drive out of a project. Nevertheless, maintainers need to know what’s going on.

Last time, I kept two main thoughts open, and I want to discuss them here, especially since they have now had time to flourish for a while in the tasty marinade that is my brain.

Can you describe the changes in one sentence?

I want my code to change for a multitude of reasons, but I want to know which kind of “glasses” to read these changes with. For me, it is less of a problem to go through many changes if I can assign them all to the same “why”. I.e. introducing i18n might change many lines of code, but as these changes happen for the same reason, they can be understood easily.

But if, for some reason, people decide to change the formatting (replace tabs with spaces or such shenanigans), you had better make sure that this is the only reason any line changes. If there is anything else someone did “as it just appeared easy” – reject the whole MR. This is no place for the “boy scout rule”; it is just too dangerous.

For me, it is too little to always apply the same type of glasses to every Merge Request there is. One could say “I only look for technical correctness”, but usually I can very well allow myself some flexibility there. I need to know, however, that all changes happened for only a few given reasons, because only then can I be sure that the developer did not lose track of their goal somewhere along the way.

Does this Merge increase the trust in a collaboration?

From a bird’s-eye point of view, people working together should always pay attention to whether a given trajectory goes in the direction of increasing trust. Of course, if you fix a broken menu button in the user interface of a large project, the MR should do just that – but if you are in a smaller project with the intention of staying there, I suggest that every MR expresses exactly that: “I understand what is important at the current stage of this collaboration and do exactly that”.

Especially when working together for a longer time, it can be easy to let the branching discipline slip a little – things might have gone well for quite a while. But this is a fragile state, because if you then care too little about the boundaries of a specific MR, this can damage the trust all too easily.

In a customer project, this trust extends to the customer. It might make the difference between “if something breaks, they’ll write you a mail, you fix it, they are happy” and “they insist on a certain test coverage”.

Conclusion

So basically, reviewing the code of others boils down to the same thing as writing your own code, improving user experience, or managing anything: think not in terms of lists of small-scale checklists, but in terms of “cognitive load”. A good programmer should have a large set of possible glasses (mindsets) through which they see code, especially foreign code. One should always be honest about whether a given change is compatible with only a very small number of reasons. If a Merge Request allows itself to do too much, this is not the Boy Scout Rule – it is a recipe for undermining mutual trust. Do not overestimate your own brain capacity. Reject the thing, and reserve that capacity for something useful.