Breaking WebGL Performance

A couple of weeks ago, I ported my game You Are Circle to the browser using Emscripten. Using the Conan EMSDK toolchain, this was surprisingly easy to do. The largest engineering effort went into turning the “procedurally generate the level” background thread into a coroutine that I could execute on the main thread while showing the loading screen, since threads are not well supported in the browser (and require enabling a beta feature on itch.io). I already had a renderer abstraction targeting OpenGL 4.1, which is roughly at feature parity with OpenGL ES 3.0, which is how Emscripten exposes WebGL2. That just worked out of the box, and things were fine for a while. Until they weren’t.

60 FPS to 2 FPS

Last week I finally released a new feature: breakable rocks. These are attached to the level walls and can be destroyed for power-ups. I tested this on my machine and everything seemed to be working fine. But some people soon started complaining about unplayable performance in the web build, in the range of 1-2 FPS, down from a smooth 60 FPS on the same machine. So I took out my laptop and tested it there, and lo and behold, it was very slow indeed. On Chrome, even the background music was stuttering. I had done some other ‘optimization’ work in that patch, but after ruling that out as the culprit via bisection, I quickly narrowed it down to the rendering of the new breakable rocks.

The rocks are circled in red in this screenshot:

As you can see, everything is very low-poly. The rocks are rendered in two parts: a black background hiding the normal level background, and a white outline. If I removed the background rendering, everything was fine (except for the ‘transparent’ rocks).

Now it’s important to know that the rendering is anything but optimized at this point. I often use the most basic thing my infrastructure allows that I can get away with, until I see a problem. In this case, I have some thingies internally that let me draw in a pretty immediate-mode way: just upload the geometry to the GPU and render it. Every frame. At the moment, I do this with almost all the geometry, visible or not, every frame. That was fast enough, and it makes it trivial to change what’s being rendered when a rock is broken. The white outline is actually more geometry, generated by my line mesher, than the rock background. But that was not causing any catastrophic slow-downs, while the background was. So what was the difference? The line geometry was batched on the CPU, while I was issuing a separate draw call for each of those rock backgrounds. To give some numbers: there were about 100 of those rocks, each with up to 11 triangles.

Suspecting the draw call overhead, I tried batching, i.e. merging, all the rock geometry into a single mesh and rendering it with a single draw call. That seemed to work well enough. And that is the version currently released.

Deep Dive

But the problem kept nagging at me after I released the fix. Yes, draw calls can have a lot of overhead, especially in Emscripten. But going from 60 FPS to 2 FPS still seemed pretty steep, and I did not fully understand why it was so extremely bad. After trying Firefox’s Gecko Profiler, which is recommended in the Emscripten docs, I finally got an idea of what was causing the problem. The graphics thread was indeed very busy, and a lot of time was spent in MaxForRange<>. That profiler is actually pretty cool, by the way: you can jump directly into the Firefox source code from there to get an idea of what’s going on.

Geometry is often specified via one level of indirection: the actual per-point (a.k.a. vertex) array and a list of indices pointing into it, with each triplet of indices defining a triangle. This is a form of compression, and it can also help the GPU avoid duplicate work by caching. But it also means that the indices can be invalid, i.e. there can be out-of-bounds indices. Browsers cannot allow that for safety reasons, so they check the validity before actually issuing a rendering command to the GPU. MaxForRange<> is part of the machinery to do just that, via its caller GetIndexedFetchMaxVert. It determines the maximum index of a section of an index buffer. When issuing a draw call, that maximum index is checked against the size of the per-point data to avoid out-of-range accesses.

This employs a caching scheme: for a given range in the indices, the result is cached, so it doesn’t have to be evaluated again for repeated calls on the same buffer. Also, I suspect to make this cache ‘hit’ more often, the maximum index is first retrieved for the whole of the current index buffer, and only if that cannot guarantee valid access is the subrange even checked. See the calls to GetIndexedFetchMaxVert in WebGLContext::DrawElementsInstanced. When something in the index buffer is changed from the user side, this cache is completely evicted.

The way that I stream my geometry data in my renderer is by using “big” (= 4 MB) per-frame buffers for vertex and index data that I gradually fill, in some kind of emulated “immediate mode”. For the rocks specifically, this looks like this:

for (auto& each : rocks)
{
    auto vertex_source = per_frame_buffer.transfer_vertex_data(each.vertex_data);
    auto index_source = per_frame_buffer.transfer_index_data(each.index_data);
    device.draw(vertex_source, index_source, ...);
}

The combination of all that turned out to be deadly for performance, and again shows why caching is one of the two hard things in IT. The code essentially becomes:

for (auto& each : rocks)
{
    auto vertex_source = per_frame_buffer.transfer_vertex_data(each.vertex_data);
    invalidate_per_frame_index_buffer_cache();
    auto index_source = per_frame_buffer.transfer_index_data(each.index_data);
    fill_cache_again_for_the_whole_big_index_buffer();
    device.draw(vertex_source, index_source, ...);
}

So for my 100 or so small rocks, the whole loop went through about 400 MB of extra data per frame, or ~24 GB per second at the target 60 FPS. That’s quite something.

That also explains why merging the geometry helped: it drastically reduced the number of cache invalidations/refills. But now that the problem was understood, another option became apparent: reorder the streamed buffer updates and draw calls, so that all the updates happen before all the draw calls.
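Sketched in the same pseudocode as above (pending is just a hypothetical scratch list for the returned buffer handles), the reordered version would look roughly like this:

for (auto& each : rocks)
{
    // upload phase: the index buffer changes for the last time here
    pending.push_back({
        per_frame_buffer.transfer_vertex_data(each.vertex_data),
        per_frame_buffer.transfer_index_data(each.index_data)});
}
for (auto& [vertex_source, index_source] : pending)
{
    // draw phase: the validation cache gets filled once and is then only hit
    device.draw(vertex_source, index_source, ...);
}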

Open questions

I am still not sure what the optimal way to stream geometry in WebGL is, but I suspect reordering the updates/draws and keeping the index buffer as small as possible might prove useful. So if you have any proven ideas, I’d love to hear them.

I am also not entirely sure why I did not notice this catastrophic slow-down on my developer machine. I suspect it’s just that my CPU has big L2 and L3 caches that made the extra index scans very fast. I suspect I would still see the overhead in a profiler there, though.

Digitalization is hard (especially in Germany)

Digitalization in this context means transforming traditional paper-based processes into computer-based digital ones. For existing organisations and administrations – both private and public – such a transformation requires a lot of thought, work and time.

There are mostly functioning, albeit sometimes inefficient, processes in place providing services that do not allow interruptions or unavailability for extended periods of time. That means the transition has to be as smooth as possible, often requiring running multiple solutions in parallel or providing several ingestion methods and output formats.

Process evolution in general

Nevertheless, I see a general pattern when business processes are transformed from traditional to fully digital: from paper, to digital documents, to collaboration around those documents, and finally to custom-built software.

I have observed and taken part in such transformations both privately, as a client or customer, and professionally, implementing or supporting them.

Status quo

The current state in many organisations in Germany is “digital documents”, and that is where it often stops. The processes themselves remain largely unchanged, and opportunities for improvement are lost.

Unfortunately, this is the step where a lot of potential could be uncovered: just by using proper collaboration tools, one could assign tasks to specific people in a process, associate them with the digital documents, track the progress and inform watchers. In many cases this results in much tighter processes, shorter resolution times and hugely improved documentation and traceability.

Going even further

The next step is where service providers like us are often brought to the table to extend, improve or replace the existing solution with custom, purpose-built software to maximise the efficiency, usability and power of the digital world.

Using general tools for certain processes for a certain time often exposes their shortcomings and lets you distill a clearer picture of what is actually needed. That knowledge helps build better solutions.

Requirements for success

For this whole transformation to be successful, one has to be very careful with the transition. It is seldom as easy as shutting down the old way™ and firing up the new stuff.

Often we need to keep several ingestion points open – imagine snail mail, e-mail, texting, voice mail, web interface, app etc. as possible input media. At different points in the process, several people may want to use their own way of interacting with the process, the documents or the associated people. In the end, the output may still be a paper document or a digital document as the final artifact. But perhaps additional outputs like digital certificates, codes or tokens may benefit the whole experience and process.

So in my opinion the key, besides digitalisation and a good process analysis, is keeping the process flexible and approachable through different means.

Some examples we all know:

  • Paying at a store often offers cash, bank card, credit card and sometimes even instant payment systems like PayPal or Wero
  • Document management with tools like Paperless-ngx allows ingestion by scan, e-mail, direct upload etc. in different formats like PDF, JPG or PNG, and hybrid storage – digitally and optionally in a filing cabinet using file folders.
  • Sick notes may be sent in by phone, e-mail, web form or in-app, and be delivered by whatever means the recipient likes most.

The possibilities are endless and the potential improvement in efficiency, speed and comfort is huge. Just look around you and you will begin to see a lot of processes that could easily be improved, creating win-win situations for both service providers and their clients.

“Keep in mind” code

Reading source code, especially code from other people (and that includes your past self of some months ago, too), is hard and needs practice. Source code is one of the rare forms of literature that is easier to write than to read. Yet code is typically written only once, but read multiple times during its lifetime. Every time we need to make a change, we have to sift through the code to find the relevant block and then read it thoroughly.

This means that source code should be written with readability and understandability in mind. And in fact, I’ve never met a programmer who set out to write obscure code. We all want our code to be easy to read. And while I don’t have a silver-bullet answer for how to achieve this, I’ve seen some patterns that are detrimental to the goal. I call them “keep in mind” code lines, because that is what you need to do:

You need to make a mental or physical note of some additional requirement that the source code imposes on the reader, often without a discernible reason.

Let me make an obvious example:

while (true) {
    // some more code
}

This simple line of code requires the reader to make a note: There needs to be some construct that exits the forever loop in the “some more code” section or else the program wouldn’t work right anyway. We have two possibilities: We can interrupt our flow of reading and understanding, scan ahead looking for the exit structure and discard the mental note once we’ve found it. Or we can persist the note, continue with our reading and cross the note off once we read past the exit structure. Both reactions require additional effort from the reader.

If we choose the first possibility, we need to pause our mental model of the code that we’ve read and understood so far. This is equivalent to peeking some pages ahead in a riveting book, just to make sure the character doesn’t die in the current situation. It’s good for peak suspense, but we really don’t want that in our source code. Source code should be a rather boring type of literature. The stories we tell should fascinate on a higher level than “will this thread survive the method call”?

Recalling a paused mental model is always accompanied by some loss. We don’t notice it right away, but some aspect that we already knew goes missing and needs to be learnt again if relevant. In my opinion, that is a bad trade: My high-level model gets compromised because I need to follow a low-level distraction from one line of code.

If we choose the second possibility and make a written note on a piece of paper (or something equivalent), we might hold the mental model in place during the short interruption of writing the note. But we need to implement a recurring note checking mechanism into our reading process, because we shouldn’t forget about the notes.

There is a third possibility: ignoring the danger. That would mean reading the code like letting the TV run in the background. You don’t really pay attention and the story just flows by. I don’t think that’s a worthwhile way to engage with source code.

Let me try to define what a line of “keep in mind” code is: it is source code that cannot be fully understood without a forward reference further “down” the lines and that raises concerns or questions until that reference is resolved. It represents an open item on my “sorrow list”.

Another, less obvious example would be:

private final InputStream input;

Because InputStream is a resource (a Closeable) in Java, it needs to be used in accordance with its lifecycle. Storing it in a member variable means that the enclosing object “inherits” the lifecycle management. If the enclosing object exposes the resource to the outside, it gets messy. All these unfortunate scenarios appear on my checklist as soon as I read the line above.
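For illustration, here is a sketch (mine, not from an actual codebase) of how this particular note can be discharged at the source: consume the stream right where it is handed in, so only derived data is stored and no lifecycle leaks into the rest of the class.

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class Configuration {
    // only derived data is stored, not the Closeable resource itself
    private final Properties properties = new Properties();

    public Configuration(InputStream input) throws IOException {
        // try-with-resources closes the stream here; no lifecycle note for later readers
        try (InputStream in = input) {
            properties.load(in);
        }
    }

    public String get(String key) {
        return properties.getProperty(key);
    }
}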

What can we do to avoid “keep in mind” lines? We can try to structure our source code not for writing, but for reading. The dangling mental reference of “I need to exit that while-true loop” is present even as the code is written. Once we notice that we keep a mental short-term list of open code structure tasks while programming, we can optimize it. Every code structure that doesn’t lead to some mandatory complementary work further down is one less thing to keep in mind while reading.

How would the example above produce less mental load while reading? Two options come to mind:

do {
    // some more code
} while (true);

This is essentially the same code, but the reader has seen all lines that exit the loop before being made aware that otherwise, it loops forever. The solution to the problem is already present when the problem presents itself.

Another option makes the exit structure explicit from the start:

boolean exitWhileLoop = false;
while (!exitWhileLoop) {
    // some more code
}

The exit structures in the “some more code” section should now set the flag “exitWhileLoop” where possible instead of breaking out directly. If necessary, a hearty “continue” statement at the right place skips the rest of the loop body. This option leads to more code that is more verbose about the control flow. For the reader, that’s a good thing, because the intent isn’t hidden between the lines anymore. If you as the code author think that your code gets clunky because of it, contemplate whether the control-flow structure is a good fit for the story you want to tell. Maybe you can simplify it, or maybe you need an even more complex structure because your story requires it.
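To make that concrete, here is a small sketch (readNextLine and process are made-up helpers) of how the flag and the continue work together:

boolean exitWhileLoop = false;
while (!exitWhileLoop) {
    String line = readNextLine();
    if (line == null) {
        // announce the exit via the flag instead of breaking out directly
        exitWhileLoop = true;
        // skip the rest of the loop body for this final iteration
        continue;
    }
    process(line);
}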

In any case, try to avoid “keep in mind” lines. They burden your readers and make working with the code less pleasant. I have several more examples of such lines or structures, but wanted to keep this blog post short. Are you interested in more specific examples? Can you provide some examples from your own experience? Write a comment!

P.S.: I love the gibberish on the AI-generated checklist in the blog entry picture and wanted you to savor it, too.

Splitting a repository while preserving history

Monorepos or “collection repositories” tend to grow over time. At some point, a part of them deserves its own life: independent deployments, a dedicated team, or separate release cycles.

The tricky part is obvious: How do you split out a subproject without losing its Git history?

The answer is a powerful tool called git-filter-repo.

Step 1: Clone the Repository into a New Directory

Do not work in your existing checkout.
Instead, clone the repository into a fresh directory by running the following commands in Git Bash:

git clone ssh://git@github.com/project/Collection.git
cd Collection

We avoid working directly on origin and create a temporary branch:

git checkout -b split

This provides a safety net while rewriting history.

Step 2: Filter the Repository

Now comes the crucial step. Using git-filter-repo, we keep only the desired path and move it to the repository root.

python -m git_filter_repo \
  --path path/to/my/subproject/ \
  --path-rename path/to/my/subproject/: \
  --force

  • --path defines what should remain
  • --path-rename moves the directory to the repository root
  • --force is required because history is rewritten

After this step, the repository contains only the former subdirectory — with its full Git history intact.

Step 3: Push to the New Repository

Now point the repository to its new remote location:

git remote add origin ssh://git@github.com/project/NewProject.git

If an origin already exists, remove it first:

git remote remove origin

Rename the working branch to main:

git branch -m split main

Finally, push the rewritten history to the new repository:

git push -u origin main

That’s it — the new repository is ready, complete with a clean and meaningful history.

Conclusion

git-filter-repo makes it possible to split repositories precisely. Instead of copying files and losing context, you preserve history — which is invaluable for git blame, audits, and understanding how the code evolved.

When refactoring at repository level, history is not baggage. It’s documentation.

Happy splitting!

Nested queries like N+1 in practice: an 840-fold performance gain

Once in a while, I remember a public software engineering talk, held by Daniel and attended by me (among others), at a time before I accidentally started working for the Softwareschneiderei for several years, which turned out splendid – but I digress – and while the topic of that talk was (clearly) a culinary excursion about… how onions are superior to spaghetti… one side discussion spawned this quote (originally attributed to Michael Jackson, e.g. here):

You know the First Rule of Optimization?

… don’t.

You know the Second Rule?

… don’t do it YET.

I like that a lot, because while we’ve all been at places where we knew a certain algorithm to be theoretically sub-optimal, “optimal” is also measured by “will it make my customer happy in a given time” and “will our programming resources be spent on the most effective questions at hand”.

And that is especially tricky in software where a customer starts with a wish like “I want a certain view on my data that I never had before” (thinking of spreadsheets, anyone?), and because they do not know what they will learn from it, no specification can even be thought out. Call it “research” or “just horsing around”, but they will be thankful if you have robust software that does the job, and customers can be very forgiving about speed issues, especially when they accumulate slowly over time.

Now we all know what N+1 query problems are (see e.g. Frederik’s post from some weeks ago), and even without databases, we are generally wary of any nested loops. But given the circumstances, one might write specific queries where you do allow yourself some.

It is not only “I have no time”. Sometimes you can produce much more readable code by doing nested queries. It can make sense. I mean, LINQ and the like have a telling appearance, e.g. one can read

var plansForOffer = db.PlansFor(week).Where(p => p.Offers.Contains(offer.Id));
if (plansForOffer.Any()) { 
  lastDay = plansForOffer.Max(p => p.ProposedDay);
}

surely quite well; but here alone, the .Any() and the .Max() loop over similar data needlessly, the helper PlansFor(…) probably does something like it too, and then you run all of that in a loop over many “offers” and quickly, there goes your performance down the drain, only because your customers now have 40 instead of 20 entities.

To cut a long story short – with .NET and Entity Framework in place, and in the situation of now-the-customer-finds-it-odd-that-some-queries-take-seven-minutes-to-complete, I did some profiling. In software where there are few users and ideally one instance of the software on one dedicated machine, memory is not the issue. So I contrasted the readable, “agile” version with some up-front bulk queries and converged on the following pattern:

var allOffers = await db.Offers
    .Where(o => ... whatever interests you ... )
    .Include(o => ... whatever EF relations you need ...)
    .ToListAsync();
var allOfferIds = allOffers
    .Select(o => o.Id)
    .ToArray();
var allPlans = await db.PlanningUnits
    .Where(p => p.Offers.Any(offerId => allOfferIds.Contains(offerId)))
    .AsNoTracking()
    .ToListAsync();
var allPlanIds = allPlans
    .Select(p => p.Id)
    .ToArray();
var allAppointments = await db.Appointments
    .Where(a => allPlanIds.Contains(a.PlanId))
    .AsNoTracking()
    .ToListAsync();
var requiredSupervisors = allAppointments
    .Select(a => a.Supervisor)
    .Distinct()
    .ToArray();
var requiredRooms = await db.Rooms
    .Where(r => requiredSupervisors.Contains(r.SupervisorId))
    .AsNoTracking()
    .ToDictionaryAsync(r => r.Id);

// ...

return allOffers
    .Select(o => CollectEverythingFromMemory(o, week, allOffers, allPlans, allAppointments, ...))
    .ToDictionary(o => o.Id);
    

So the patterns are

  • Collect every data pool that you will possibly query, as completely as you need
  • Load it into memory with .ToList(), and where you can – e.g. in a server application – use await … .ToListAsync() in order to ease the load on the request thread pool
  • For ID lists and subsequent .Contains() checks, a .ToArray() call is enough, as arrays come with less overhead – although this is less important here
  • Use .ToDictionaryAsync() for constant-time lookups (more modern .NET/EF versions might offer fancier helpers, but this is the basic fallback that worked for me)
  • Also, note the .AsNoTracking() where you are only querying the database (i.e. not mutating any of this data), so EF can skip the change-tracking overhead

With that pattern, all inner queries transform into quite efficient in-memory filtering that looks like

var lastDay = allPlans
    .Where(p => p.Week == week
        && p.Offers.Contains(offer.Id)
    )
    .Max(p => p.ProposedDay);

and while this greatly blew up function signatures, and also required some duplication because these inner functions are not as modular and stand-alone anymore, our customers now enjoy a reduction in query time of really

  • Before: ~ 7 minutes
  • After: ~ half a second

i.e. an 840-fold increase in performance.

Conclusion: not regretting much

It is somewhat of a dilemma. The point of “optimization…? don’t!”, in its formulation as the First and Second Rule, holds true. I would not have written the first version of the algorithm in that consolidated .ToList(), .ToArray(), .ToDictionary() shape while still in a phase where the customer gets new ideas every few days. You will need code that is easier to change and easier to reason about.

By the way – cf. the source again – the Third Rule is “Profile Before Optimizing”. When dealing with performance, it’s always crucial to find the largest culprit first.

And then, know that there are general structures which make such queries efficient. I can apply these thoughts to other projects and probably will not have to dig as finely into the exact details of other algorithms; I just need to make the trade-off described above.

Creating functors with lambda factories in C++

Back before C++11, the recommended way to customize the behavior of an algorithm was to write a functor struct, i.e. a small struct overloading operator(). E.g. for sorting by a specific key:

struct key_less {
  bool operator()(T const& lhs, T const& rhs) const { return lhs.key < rhs.key; }
};

// ... later
std::sort(c.begin(), c.end(), key_less{});

Of course, the same logic could be implemented as a free function, but that was often advised against because it was harder for the compiler to inline. Either way, if you had to supply some context variables, you were stuck with the functor struct anyway, which made it even more verbose. Something like this was common:

struct indirect_less {
  indirect_less(std::vector<int> const& indirection)
  : indirection_(indirection) {}

  bool operator()(T const& lhs, T const& rhs) const
  { return indirection_[lhs.key] < indirection_[rhs.key]; }

private:
  std::vector<int> const& indirection_;
};

That all changed, of course, when lambdas were added to the language, and you could write the same thing as:

auto indirect_less = [&](T const& lhs, T const& rhs)
  { return indirection[lhs.key] < indirection[rhs.key]; };

At least as long as the indirection std::vector<> is in the local scope. But what if you want to reuse some more complicated functors in different contexts? In that case, I often found myself reverting back to the old struct pattern. Until recently, that is, when I discovered there’s a much nicer way, and I ask myself how I missed it for so long: functor factories. E.g.

auto make_indirect_less(std::vector<int> const& indirection) {
  return [&](T const& lhs, T const& rhs)
    { return indirection[lhs.key] < indirection[rhs.key]; };
}

Much better than the struct! This has been possible since C++14’s return type deduction, so for quite a long time now. Still, I do not think I have come across this pattern before. What about you?

Single-Use Webapps

One of our customers has the requirement to enter data into a large database while being out in the field, potentially without any internet connection. This is a diminishing problem with the availability of satellite-based internet access, but it can be solved in different ways, not just the obvious “make internet happen” way.

One way to solve the problem is to analyze the customer’s requirements and their degrees of freedom – the things they have some leeway over. The crucial functionality is the safe and correct digital entry of the data. A pen and paper or an Excel sheet would suffice if the mere typing of data were the main point. But the data needs to be linked to existing business entities and has to obey some business rules. Neither paper nor Excel would warn the user if the new data violates a business rule. The warning or error would be delayed until the data is copied over into the real system, and then it would be too late to correct it. Any correction attempt needs to happen on site, on time.

One area of leeway is the delay between the data recording and the transfer into the real system. Copying the data over might happen several days later, but because the data is exclusive to the geographical spot, there are no edit collisions to fear. So it’s not a race for the central database; it’s more of an “eventual consistency” situation.

If you take those two dimensions into account, you might invent “single-use webapps”. These are self-contained HTML files that present a data entry page that is dynamic enough to provide interconnected selection lists and real-time data checks. It feels like they gathered their lists and checks from the real system, and that is exactly what they did. They just did it when the HTML file was generated and not when it is used locally in the browser. The entry page is prepared with current data from the central database, written to the file and then forgotten by the system. It has no live connection and no ability to update its lists. It only exists for one specific data recording at one specific geographical place. It even has a “best before” date baked into the code so that it gives a warning if the preparation date and the usage date are too distant.

Like any good data entry form, the single-use webapp presents a “save data” button to the user. In a live situation, this button would transfer the data to the central database system, checking its integrity and correctness on the way. In our case, the checks on the data are done (using the information level at page creation time) and then, a transfer file is written to the local disk. The transfer file is essentially just the payload of the request that would happen in the live situation. It gets stored to be transferred later, when the connection to the central system is available again.

And what happens to the generated HTML files? The user just deletes them after usage. They only serve one purpose: To create one transfer file for one specific data entry task, giving the user the comfort and safety of the real system while entering the data.

What would your solution of the problem look like?

Disclaimer: While the idea was demonstrated as a proof of concept, it has not been put into practice by the customer yet. The appeal of “internet access anywhere on the planet” is undeniably bigger and has won the competition of solutions for now. We would have chosen the same. The single-use webapp provides comfort and ease of use, but ubiquitous connectivity to the central system tops all other solutions and doesn’t need an extra manual or unusual handling.

Calculation with infinite decimal expansion in Java

When dividing decimal numbers in Java, some results—like 1 divided by 3—have an infinite decimal expansion. In this blog post, I’ll show how such a calculation behaves using BigDecimal and BigFraction.

BigDecimal

Since this cannot be represented exactly in memory, performing such a division with BigDecimal without specifying a rounding mode leads to a “java.lang.ArithmeticException: Non-terminating decimal expansion; no exact representable decimal result”. Even when using MathContext.UNLIMITED, the same exception is thrown, because Java still cannot produce a finite result.

BigDecimal a = new BigDecimal("1");
BigDecimal b = new BigDecimal("3");
BigDecimal c = a.divide(b);

By providing a scale and a rounding mode – instead of MathContext.UNLIMITED – Java can approximate the result instead of failing. However, this also means the value is no longer mathematically exact. As shown in the second example, multiplying the rounded result back can introduce small inaccuracies due to the approximation.

BigDecimal a = new BigDecimal("1");
BigDecimal b = new BigDecimal("3");
BigDecimal c = a.divide(b, 100, RoundingMode.HALF_UP); // 0.3333333...
BigDecimal a2 = c.multiply(b);  // 0.9999999...

When working with BigDecimal, it’s important to think carefully about the scale you actually need. Every additional decimal place increases both computation time and memory usage, because BigDecimal stores each digit and carries out arithmetic with arbitrary precision.

To illustrate this, here’s a small timing test for calculating 1/3 with different scales:
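The measured numbers are not reproduced here, but a minimal harness along these lines (scales picked arbitrarily) lets you observe the effect yourself:

import java.math.BigDecimal;
import java.math.RoundingMode;

public class DivisionTiming {
    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1");
        BigDecimal three = new BigDecimal("3");
        for (int scale : new int[] {10, 1_000, 100_000, 1_000_000}) {
            long start = System.nanoTime();
            one.divide(three, scale, RoundingMode.HALF_UP);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            System.out.println("scale " + scale + ": " + elapsedMillis + " ms");
        }
    }
}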

Increasing the scale significantly impacts performance. Choosing an unnecessarily high scale can slow down calculations and consume more memory without providing meaningful benefits. Always select a scale that balances precision requirements with efficiency.

However, as we’ve seen, decimal types like BigDecimal can only approximate many numbers when their fractional part is infinite or very long. Even with rounding modes, repeated calculations can introduce small inaccuracies.

But how can you perform calculations exactly if decimal representations can’t be stored with infinite precision?

BigFraction

To achieve truly exact calculations without losing precision, you can use fractional representations instead of decimal numbers. The BigFraction class from Apache Commons Numbers stores values as a numerator and denominator, allowing it to represent numbers like 1/3 precisely, without rounding.

import org.apache.commons.numbers.fraction.BigFraction;

BigFraction a = BigFraction.ONE;
BigFraction b = BigFraction.of(3);
BigFraction c = a.divide(b);    // 1 / 3
BigFraction a2 = c.multiply(b); // 1

In this example, dividing 1 by 3 produces the exact fraction 1/3, and multiplying it by 3 returns exactly 1. Since no decimal expansion is involved, all operations remain mathematically accurate, making BigFraction a suitable choice when exact arithmetic is required.

BigFraction and Decimals

But what happens if you want to create a BigFraction from an existing decimal number?

BigFraction fromDecimal = BigFraction.from(2172.455928961748633879781420765027);
fromDecimal.bigDecimalValue(); // 2172.45592896174866837100125849246978759765625

At first glance, everything looks fine: you pass in a precise decimal value, BigFraction accepts it, and you get a fraction back. So far, so good. But if you look closely at the result, something unexpected happens—the number you get out is not the same as the one you put in. The difference is subtle, hiding far to the right of the decimal point—but it’s there.
And there’s a simple reason for it: the factory method takes a double.

A double cannot represent most decimal numbers exactly. The moment your decimal value is passed into BigFraction.from(double), it is already approximated by the binary floating-point format of double. BigFraction then captures that approximation perfectly, but the damage has already been done.

Even worse: BigFraction offers no alternative factory method that accepts a BigDecimal directly. So whenever you start from a decimal number instead of integer-based fractions, you inevitably lose precision before BigFraction even gets involved. What makes this especially frustrating is that BigFraction exists precisely to allow exact arithmetic.

Creating a BigFraction from a BigDecimal correctly

To preserve exactness when converting a BigDecimal to a BigFraction, you cannot rely on BigFraction.from(double). Instead, you can use the unscaled value and scale of the BigDecimal directly:

BigDecimal longNumber = new BigDecimal("2172.455928961748633879781420765027");
BigFraction fromLongNumber = BigFraction.of(
   longNumber.unscaledValue(),
   BigInteger.TEN.pow(longNumber.scale())
); // 2172455928961748633879781420765027 / 1000000000000000000000000000000

fromLongNumber.bigDecimalValue() // 2172.455928961748633879781420765027

This approach ensures the fraction exactly represents the BigDecimal, without any rounding or loss of precision.

BigDecimal longNumber = new BigDecimal("2196.329071038251366120218579234972");
BigFraction fromLongNumber = BigFraction.of(
   longNumber.unscaledValue(),
   BigInteger.TEN.pow(longNumber.scale())
); // 549082267759562841530054644808743 / 250000000000000000000000000000

fromLongNumber.bigDecimalValue() // 2196.329071038251366120218579234972

In this case, BigFraction automatically reduces the fraction to its lowest terms, storing it as compactly as possible. Even though the original numerator and denominator may be huge, BigFraction divides out common factors to minimize their size while preserving exactness.

BigFraction and Performance

Performing fractional or rational calculations in this exact manner can quickly consume enormous amounts of time and memory, especially when many operations generate very large numerators and denominators. Exact arithmetic should only be used when truly necessary, and computations should be minimized to avoid performance issues. For a deeper discussion, see The Great Rational Explosion.

Conclusion

When working with numbers in Java, both BigDecimal and BigFraction have their strengths and limitations. BigDecimal allows precise decimal arithmetic up to a chosen scale, but it cannot represent numbers with infinite decimal expansions exactly, and high scales increase memory and computation time. BigFraction, on the other hand, can represent rational numbers exactly as fractions, preserving mathematical precision—but only if constructed carefully, for example from integer numerators and denominators or from a BigDecimal using its unscaled value and scale.

In all cases, it is crucial to be aware of these limitations and potential pitfalls. Understanding how each type stores and calculates numbers helps you make informed decisions and avoid subtle errors in your calculations.

Use duplication to make your single source of truth

Having a single source of truth is one of the big tenets of programming. It is easy to see why. If you want to figure out something about your program, or change something, you just go to the corresponding source.

Usually, this is framed in terms of avoiding code duplication, but things can get a lot more complicated very fast when you think of knowledge duplication or fragmentation instead of just code. Quite unintuitively, duplication can actually help in this case.

Consider the case where you serialize an enum value, e.g. to a database or a file. Suddenly, you have two conceptual points that ‘know’ about the translation of your enum literals to a numeric or string value: the mapping in your code and the mapping implicitly stored in the serialized data. Neither of these two points can be changed independently: changing the serialized content means changing the source code, and vice versa.

You could still consider your initial enum-to-value mapping the single source of truth, but the problem is that you can easily miss disruptive changes. E.g. if you used the numeric value, just reordering the enum literals will break the serialization. If you used the text name of the enum, even a simple rename refactoring will break it.

So to deal with this, I often build my own single source of truth: a unit test that keeps track of such implicit value couplings. That way, the test can tell you when you are accidentally breaking things. Effectively, this means duplicating the knowledge of the mapping into a ‘safe’ space: one that must be deliberately changed and resists being broken accidentally. And that then becomes my new single source of truth for that mapping.
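As a sketch (the enum and its values are made up), such a test could simply pin both the names and the ordinals that have already been written out, so that a reorder or rename fails the build instead of silently corrupting the serialized data:

import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;

class StatusSerializationTest {

    enum Status { OPEN, IN_PROGRESS, CLOSED }

    @Test
    void serializedNamesStayStable() {
        // these strings mirror what is already stored in the database/file
        assertEquals("OPEN", Status.OPEN.name());
        assertEquals("IN_PROGRESS", Status.IN_PROGRESS.name());
        assertEquals("CLOSED", Status.CLOSED.name());
    }

    @Test
    void serializedOrdinalsStayStable() {
        // reordering the literals would change these and break numeric serialization
        assertEquals(0, Status.OPEN.ordinal());
        assertEquals(1, Status.IN_PROGRESS.ordinal());
        assertEquals(2, Status.CLOSED.ordinal());
    }
}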

Common SQL Performance Gotchas in Application Development

When building apps that use a SQL database, it’s easy to run into performance problems without noticing. Many of these issues come from the way queries are written and used in the code. Below are seven common SQL mistakes developers make, why they happen, and how you can avoid them.

Not Using Prepared Statements

One of the most common mistakes is building SQL queries by concatenating strings. This approach not only introduces the risk of SQL injection but also prevents the database from reusing execution plans. Prepared statements or parameterized queries let the database understand the structure of the query ahead of time, which improves performance and security. They also help avoid subtle bugs caused by incorrect string formatting or escaping.

// Vulnerable and inefficient
String userId = "42";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM users WHERE id = " + userId);

// Safe and performant
String sql = "SELECT * FROM users WHERE id = ?";
PreparedStatement ps = connection.prepareStatement(sql);
ps.setInt(1, 42);
ResultSet rs = ps.executeQuery();

The N+1 Query Problem

The N+1 problem happens when an application fetches a list of items and then runs a separate query for each item to retrieve related data. For example, fetching a list of users and then querying each user’s posts in a loop. This results in one query to fetch the users and N additional queries for their posts. The fix is to restructure the query using joins or batch-fetching strategies, so all the data can be retrieved in fewer queries.

We have written about it on our blog before: Understanding, identifying and fixing the N+1 query problem
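For illustration, here is a sketch of the batched variant (table and column names are assumed, and connection is an open JDBC connection as in the other examples): a single join query replaces the per-user lookups, and the rows are grouped in application code.

// one query instead of 1 + N
String sql = "SELECT u.id AS user_id, u.name, p.id AS post_id, p.title " +
             "FROM users u LEFT JOIN posts p ON p.user_id = u.id";
try (PreparedStatement ps = connection.prepareStatement(sql);
     ResultSet rs = ps.executeQuery()) {
    while (rs.next()) {
        long userId = rs.getLong("user_id");
        String postTitle = rs.getString("title"); // null for users without posts
        // group posts by userId here
    }
}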

Missing Indexes

When queries filter or join on columns that do not have indexes, the database may need to scan entire tables to find matching rows. This can be very slow, especially as data grows. Adding the right indexes can drastically improve performance. It’s important to monitor slow queries and check whether indexes exist on the columns used in WHERE clauses, JOINs, and ORDER BY clauses.

Here’s how to create an index on an “orders” table for its “customer_id” column:

CREATE INDEX idx_orders_customer_id ON orders(customer_id);

Once the index is added, the query can efficiently find matching rows without scanning the full table.

Retrieving Too Much Data

Using SELECT * to fetch all columns from a table is a common habit, but it often retrieves more data than the application needs. This can increase network load and memory usage. Similarly, not using pagination when retrieving large result sets can lead to long query times and a poor user experience. Always select only the necessary columns and use LIMIT or OFFSET clauses to manage result size.

For example:

String sql = "SELECT id, name, price FROM products LIMIT ? OFFSET ?";
PreparedStatement ps = connection.prepareStatement(sql);
ps.setInt(1, 50);
ps.setInt(2, 0);
ResultSet rs = ps.executeQuery();

Chatty Database Interactions

Some applications make many small queries in a single request cycle, creating high overhead from repeated database access. Each round-trip to the database introduces latency. Here’s an inefficient example:

for (int id : productIds) {
    PreparedStatement ps = connection.prepareStatement(
        "UPDATE products SET price = price * 1.1 WHERE id = ?"
    );
    ps.setInt(1, id);
    ps.executeUpdate();
}

Instead of issuing separate queries, it’s often better to combine them or use batch operations where possible. This reduces the number of database interactions and improves overall throughput:

PreparedStatement ps = connection.prepareStatement(
    "UPDATE products SET price = price * 1.1 WHERE id = ?"
);

for (int id : productIds) {
    ps.setInt(1, id);
    ps.addBatch();
}
ps.executeBatch();

Improper Connection Pooling

Establishing a new connection to the database for every query or request is slow and resource-intensive. Connection pooling allows applications to reuse database connections, avoiding the cost of repeatedly opening and closing them. Applications that do not use pooling efficiently may suffer from connection exhaustion or high latency under load. To avoid this, use a connection pool and configure it with appropriate limits for the workload.
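As a sketch, configuring a pool with HikariCP could look like this (URL, credentials and pool size are placeholders, not recommendations):

HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/appdb");
config.setUsername("app");
config.setPassword("secret");
config.setMaximumPoolSize(10); // bound the pool to fit the expected workload

HikariDataSource dataSource = new HikariDataSource(config);

// connections are borrowed from the pool and handed back on close()
try (Connection connection = dataSource.getConnection()) {
    // run queries as usual
}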

Unbounded Wildcard Searches

Using wildcard searches with patterns like '%term%' in a WHERE clause causes the database to scan the entire table, because indexes cannot be used effectively. These searches are expensive and scale poorly. To handle partial matches more efficiently, consider using the full-text search features provided by the database, which are designed for fast text searching. Here’s an example in PostgreSQL:

SELECT * FROM articles
WHERE to_tsvector('english', title) @@ to_tsquery('database');

One of our previous blog posts dives deeper into this topic: Full-text Search with PostgreSQL.

By being mindful of these common pitfalls, you can write SQL that scales well and performs reliably under load. Good database performance isn’t just about writing correct queries – it’s about writing efficient ones.

Have you faced any of these problems before? Every project is different, and we all learn a lot from the challenges we run into. Feel free to share your experiences or tips in the comments. Your story could help someone else improve their app’s performance too.