Back before C++11, the recommended way to customize the behavior of an algorithm was to write a functor struct, i.e. a small struct overloading operator(), for example to sort by a specific key:
Of course, the same logic could be implemented with a free function, but that was often advised against because it was harder for the compiler to inline. Either way, if you had to supply some context variables, you were stuck with the functor struct anyway, which made it even more verbose. Something like this was common:
That all changed, of course, when lambdas were added to the language, and you could write the same thing as:
auto indirect_less = [&](T const& lhs, T const& rhs)
{ return indirection[lhs.key] < indirection[rhs.key]; };
At least as long as the indirection vector is in the local scope. But what if you wanted to reuse some more complicated functors in different contexts? In that case, I often found myself reverting to the old struct pattern. That is, until I recently discovered a much nicer way, and I ask myself how I missed it for so long: functor factories. E.g.:
auto make_indirect_less(std::vector const& indirection) {
    return [&](T const& lhs, T const& rhs) { /* ... */ };
}
Much better than the struct! This has been possible since C++14’s return type deduction, so a pretty long time. Still, I do not think I have come across this pattern before. What about you?
One of our customers has the requirement to enter data into a large database while being out in the field, potentially without any internet connection. This is a diminishing problem with the availability of satellite-based internet access, but it can be solved in different ways, not just the obvious “make internet happen” way.
One way to solve the problem is to analyze the customer’s requirements and his degrees of freedom – the things he has some leeway over. The crucial functionality is the safe and correct digital entry of the data. It would suffice to use pen and paper or an Excel sheet if the mere typing of data was the main point. But the data needs to be linked to existing business entities and has some business rules that need to be obeyed. Neither paper nor Excel would warn the user if a business rule is violated by the new data. The warning or error would be delayed until the data needs to be copied over into the real system, and then it would be too late to correct it. Any correction attempt needs to happen on site, on time.
One such degree of freedom is the delay between the data recording and the transfer into the real system. Copying the data over might happen several days later, but because the data is exclusive to the geographical spot, there are no edit collisions to be feared. So it’s not a race for the central database, it’s more of an “eventual consistency” situation.
If you take those two dimensions into account, you might invent “single-use webapps”. These are self-contained HTML files that present a data entry page that is dynamic enough to provide interconnected selection lists and real-time data checks. It feels like they gathered their lists and checks from the real system, and that is exactly what they did. They just did it when the HTML file was generated and not when it is used locally in the browser. The entry page is prepared with current data from the central database, written to the file and then forgotten by the system. It has no live connection and no ability to update its lists. It only exists for one specific data recording at one specific geographical place. It even has a “best before” date baked into the code so that it gives a warning if the preparation date and the usage date are too distant.
Like any good data entry form, the single-use webapp presents a “save data” button to the user. In a live situation, this button would transfer the data to the central database system, checking its integrity and correctness on the way. In our case, the checks on the data are done (using the information level at page creation time) and then, a transfer file is written to the local disk. The transfer file is essentially just the payload of the request that would happen in the live situation. It gets stored to be transferred later, when the connection to the central system is available again.
And what happens to the generated HTML files? The user just deletes them after usage. They only serve one purpose: To create one transfer file for one specific data entry task, giving the user the comfort and safety of the real system while entering the data.
What would your solution to the problem look like?
Disclaimer: While the idea was demonstrated as a proof of concept, it was not put into practice by the customer yet. The appeal of “internet access anywhere on the planet” is undeniably bigger and has won the competition of solutions for now. We would have chosen the same. The single-use webapp provides comfort and ease of use, but ubiquitous connectivity to the central system tops all other solutions and doesn’t need an extra manual or unusual handling.
When dividing decimal numbers in Java, some values—like 1 divided by 3—result in an infinite decimal expansion. In this blog post, I’ll show how such a calculation behaves using BigDecimal and BigFraction.
BigDecimal
Since this cannot be represented exactly in memory, performing such a division with BigDecimal without specifying a rounding mode leads to a “java.lang.ArithmeticException: Non-terminating decimal expansion; no exact representable decimal result”. Even when using MathContext.UNLIMITED or an effectively unlimited scale, the same exception is thrown, because Java still cannot produce a finite result.
BigDecimal a = new BigDecimal("1");
BigDecimal b = new BigDecimal("3");
BigDecimal c = a.divide(b);
By providing a scale – not MathContext.UNLIMITED – and a rounding mode, Java can approximate the result instead of failing. However, this also means the value is no longer mathematically exact. As shown in the second example, multiplying the rounded result back can introduce small inaccuracies due to the approximation.
BigDecimal a = new BigDecimal("1");
BigDecimal b = new BigDecimal("3");
BigDecimal c = a.divide(b, 100, RoundingMode.HALF_UP); // 0.3333333...
BigDecimal a2 = c.multiply(b); // 0.9999999...
When working with BigDecimal, it’s important to think carefully about the scale you actually need. Every additional decimal place increases both computation time and memory usage, because BigDecimal stores each digit and carries out arithmetic with arbitrary precision.
To illustrate this, here’s a small timing test for calculating 1/3 with different scales:
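The exact numbers depend on your machine; the following is just a minimal sketch of how such a measurement might look (the scales 10, 1,000 and 100,000 are arbitrary example values):
BigDecimal a = new BigDecimal("1");
BigDecimal b = new BigDecimal("3");
for (int scale : new int[] { 10, 1_000, 100_000 }) {
    long start = System.nanoTime();
    a.divide(b, scale, RoundingMode.HALF_UP);
    long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
    System.out.println("scale " + scale + ": " + elapsedMillis + " ms");
}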
Increasing the scale significantly impacts performance. Choosing an unnecessarily high scale can slow down calculations and consume more memory without providing meaningful benefits. Always select a scale that balances precision requirements with efficiency.
However, as we’ve seen, decimal types like BigDecimal can only approximate many numbers when their fractional part is infinite or very long. Even with rounding modes, repeated calculations can introduce small inaccuracies.
But how can you perform calculations exactly if decimal representations can’t be stored with infinite precision?
BigFraction
To achieve truly exact calculations without losing precision, you can use fractional representations instead of decimal numbers. The BigFraction class from Apache Commons Numbers stores values as a numerator and denominator, allowing it to represent numbers like 1/3 precisely, without rounding.
import org.apache.commons.numbers.fraction.BigFraction;
BigFraction a = BigFraction.ONE;
BigFraction b = BigFraction.of(3);
BigFraction c = a.divide(b); // 1 / 3
BigFraction a2 = c.multiply(b); // 1
In this example, dividing 1 by 3 produces the exact fraction 1/3, and multiplying it by 3 returns exactly 1. Since no decimal expansion is involved, all operations remain mathematically accurate, making BigFraction a suitable choice when exact arithmetic is required.
BigFraction and Decimals
But what happens if you want to create a BigFraction from an existing decimal number?
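A sketch of how this might be attempted (the value 0.1 is just an assumed example):
BigFraction f = BigFraction.from(0.1);
System.out.println(f); // not 1/10, but something like 3602879701896397 / 36028797018963968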
At first glance, everything looks fine: you pass in a precise decimal value, BigFraction accepts it, and you get a fraction back. So far, so good. But if you look closely at the result, something unexpected happens—the number you get out is not the same as the one you put in. The difference is subtle, hiding far to the right of the decimal point—but it’s there. And there’s a simple reason for it: the factory method takes a double.
A double cannot represent most decimal numbers exactly. The moment your decimal value is passed into BigFraction.from(double), it is already approximated by the binary floating-point format of double. BigFraction then captures that approximation perfectly, but the damage has already been done.
Even worse: BigFraction offers no alternative factory method that accepts a BigDecimal directly. So whenever you start from a decimal number instead of integer-based fractions, you inevitably lose precision before BigFraction even gets involved. What makes this especially frustrating is that BigFraction exists precisely to allow exact arithmetic.
Creating a BigFraction from a BigDecimal correctly
To preserve exactness when converting a BigDecimal to a BigFraction, you cannot rely on BigFraction.from(double). Instead, you can use the unscaled value and scale of the BigDecimal directly:
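A minimal sketch of this conversion (the value 0.375 is just an example; for a negative scale you would multiply the unscaled value instead of building a power-of-ten denominator):
BigDecimal decimal = new BigDecimal("0.375");
// 0.375 has unscaled value 375 and scale 3, i.e. 375 / 10^3
BigFraction fraction = BigFraction.of(
    decimal.unscaledValue(),
    BigInteger.TEN.pow(decimal.scale()));
System.out.println(fraction); // reduced to 3 / 8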
In this case, BigFraction automatically reduces the fraction to its simplest form, storing it as compactly as possible. Even though the original numerator and denominator may be huge, BigFraction divides out common factors to minimize their size while preserving exactness.
BigFraction and Performance
Performing fractional or rational calculations in this exact manner can quickly consume enormous amounts of time and memory, especially when many operations generate very large numerators and denominators. Exact arithmetic should only be used when truly necessary, and computations should be minimized to avoid performance issues. For a deeper discussion, see The Great Rational Explosion.
Conclusion
When working with numbers in Java, both BigDecimal and BigFraction have their strengths and limitations. BigDecimal allows precise decimal arithmetic up to a chosen scale, but it cannot represent numbers with infinite decimal expansions exactly, and high scales increase memory and computation time. BigFraction, on the other hand, can represent rational numbers exactly as fractions, preserving mathematical precision—but only if constructed carefully, for example from integer numerators and denominators or from a BigDecimal using its unscaled value and scale.
In all cases, it is crucial to be aware of these limitations and potential pitfalls. Understanding how each type stores and calculates numbers helps you make informed decisions and avoid subtle errors in your calculations.
Having a single source of truth is one of the big tenets of programming. It is easy to see why. If you want to figure out something about your program, or change something, you just go to the corresponding source.
The most common application of this is avoiding code duplication, but things can get a lot more complicated very fast when you think of knowledge duplication or fragmentation instead of just code. Quite unintuitively, duplication can actually help in this case.
Consider the case where you serialize an enum value, e.g. to a database or a file. Suddenly, you have two conceptual points that ‘know’ about the translation of your enum literals to a numeric or string value: The mapping in your code and the mapping implicitly stored in the serialization. None of these two points can be changed independently. Changing the serialized content means changing the source code and vice-versa.
You could still consider your initial enum-to-value mapping the single source of truth, but the problem is that you can easily miss disruptive changes. E.g. if you used the numeric value, just reordering the enum literals will break the serialization. If you used the text name of the enum, even a simple rename refactoring will break it.
So to deal with this, I often build my own single source of truth: a unit test that keeps track of such implicit value couplings. That way, the test can tell you when you are accidentally breaking things. Effectively, this means duplicating the knowledge of the mapping to a ‘safe’ space: One that must be deliberately changed, and resists accidentally being broken. And then that becomes my new single source of truth for that mapping.
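As an illustration, here is a minimal sketch of what such a test could look like, assuming a hypothetical Color enum that is serialized by its ordinal and its name (names and values are made up):
enum Color { RED, GREEN, BLUE }

@Test
void serializedEnumValuesStayStable() {
    // This duplicates the mapping on purpose: if someone reorders
    // or renames the enum, this test fails instead of the
    // serialization silently breaking.
    assertEquals(0, Color.RED.ordinal());
    assertEquals(1, Color.GREEN.ordinal());
    assertEquals(2, Color.BLUE.ordinal());
    assertEquals("RED", Color.RED.name());
}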
When building apps that use a SQL database, it’s easy to run into performance problems without noticing. Many of these issues come from the way queries are written and used in the code. Below are seven common SQL mistakes developers make, why they happen, and how you can avoid them.
Not Using Prepared Statements
One of the most common mistakes is building SQL queries by concatenating strings. This approach not only introduces the risk of SQL injection but also prevents the database from reusing execution plans. Prepared statements or parameterized queries let the database understand the structure of the query ahead of time, which improves performance and security. They also help avoid subtle bugs caused by incorrect string formatting or escaping.
// Vulnerable and inefficient
String userId = "42";
Statement stmt = connection.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM users WHERE id = " + userId);
// Safe and performant
String sql = "SELECT * FROM users WHERE id = ?";
PreparedStatement ps = connection.prepareStatement(sql);
ps.setInt(1, 42);
ResultSet rs = ps.executeQuery();
The N+1 Query Problem
The N+1 problem happens when an application fetches a list of items and then runs a separate query for each item to retrieve related data. For example, fetching a list of users and then querying each user’s posts in a loop. This results in one query to fetch the users and N additional queries for their posts. The fix is to restructure the query using joins or batch-fetching strategies, so all the data can be retrieved in fewer queries.
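For illustration, a sketch of how this might look with a join, assuming hypothetical users and posts tables:
// one query instead of 1 + N
String sql = "SELECT u.id, u.name, p.title " +
             "FROM users u LEFT JOIN posts p ON p.user_id = u.id";
PreparedStatement ps = connection.prepareStatement(sql);
ResultSet rs = ps.executeQuery();
while (rs.next()) {
    // group the rows by user id on the application side
}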
Missing Indexes
When queries filter or join on columns that do not have indexes, the database may need to scan entire tables to find matching rows. This can be very slow, especially as data grows. Adding the right indexes can drastically improve performance. It’s important to monitor slow queries and check whether indexes exist on the columns used in WHERE clauses, JOINs, and ORDER BY clauses.
Here’s how to create an index on an “orders” table for its “customer_id” column:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Once the index is added, the query can efficiently find matching rows without scanning the full table.
Retrieving Too Much Data
Using SELECT * to fetch all columns from a table is a common habit, but it often retrieves more data than the application needs. This can increase network load and memory usage. Similarly, not using pagination when retrieving large result sets can lead to long query times and a poor user experience. Always select only the necessary columns and use LIMIT or OFFSET clauses to manage result size.
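For example, a sketch that selects only the needed columns and pages through the results (table and column names are assumptions; the exact pagination syntax depends on the database):
String sql = "SELECT id, name, email FROM users ORDER BY id LIMIT ? OFFSET ?";
PreparedStatement ps = connection.prepareStatement(sql);
ps.setInt(1, 50); // page size
ps.setInt(2, 0);  // offset of the first page
ResultSet rs = ps.executeQuery();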
Too Many Round Trips
Some applications make many small queries in a single request cycle, creating high overhead from repeated database access. Each round-trip to the database introduces latency. Here’s an inefficient example:
for (int id : productIds) {
    PreparedStatement ps = connection.prepareStatement(
        "UPDATE products SET price = price * 1.1 WHERE id = ?"
    );
    ps.setInt(1, id);
    ps.executeUpdate();
}
Instead of issuing separate queries, it’s often better to combine them or use batch operations where possible. This reduces the number of database interactions and improves overall throughput:
PreparedStatement ps = connection.prepareStatement(
    "UPDATE products SET price = price * 1.1 WHERE id = ?"
);
for (int id : productIds) {
    ps.setInt(1, id);
    ps.addBatch();
}
ps.executeBatch();
Improper Connection Pooling
Establishing a new connection to the database for every query or request is slow and resource-intensive. Connection pooling allows applications to reuse database connections, avoiding the cost of repeatedly opening and closing them. Applications that do not use pooling efficiently may suffer from connection exhaustion or high latency under load. To avoid this, use a connection pool and configure it with appropriate limits for the workload.
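As an example, a minimal sketch using HikariCP, a widely used connection pool (the URL, credentials and pool size are placeholder values):
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb");
config.setUsername("app");
config.setPassword("secret");
config.setMaximumPoolSize(10); // tune to the expected workload

HikariDataSource dataSource = new HikariDataSource(config);
try (Connection connection = dataSource.getConnection()) {
    // use the pooled connection as usual
}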
Unbounded Wildcard Searches
Using wildcard searches with patterns like '%term%' in a WHERE clause causes the database to scan the entire table, because indexes cannot be used effectively. These searches are expensive and scale poorly. To handle partial matches more efficiently, consider using full-text search features provided by the database, which are designed for fast text searching. Here’s an example in PostgreSQL:
SELECT * FROM articles WHERE to_tsvector('english', title) @@ to_tsquery('database');
By being mindful of these common pitfalls, you can write SQL that scales well and performs reliably under load. Good database performance isn’t just about writing correct queries – it’s about writing efficient ones.
Have you faced any of these problems before? Every project is different, and we all learn a lot from the challenges we run into. Feel free to share your experiences or tips in the comments. Your story could help someone else improve their app’s performance too.
A few years ago, domain specific languages (DSLs) were a really hot topic and “the future”. Now, after the dust has settled there are a bunch of successful and useful examples like LINQ and Gradle but no holy grail buzz anymore.
In some of our projects we implemented some small embedded DSLs (eDSLs) where we expected great benefit. Most of the time they are just working silently in the background of our solutions. But sometimes they really shine:
A recent story involving a DSL
A few days ago, I got a request from one of our customers regarding a complex “traffic light”-like system for scientific proposals in a proposal submission system. Each proposal goes through a bunch of phases where different parties have to participate. While that sounds relatively simple, our system has different proposal types, different roles, different states over time and so on.
All of the above leads to “Starbucks menu”-style complexity. And now, after several years in production, the customer comes up with some special cases that are counterintuitive.
Expecting the worst, I dived into the source code and was delighted to see a quite expressive DSL. Our project and thus also the DSL for the “status overview” feature leveraged the flexible syntax of Groovy (which is also used by Gradle). After a few minutes I could not only understand all the rules for the different status lights but also pinpoint the cases in question.
I then moved forward and used the rule definition in our DSL to talk to our customer directly! And after a few more minutes he – as the domain expert – also understood our rule definition and the cause for the irritating display of a certain combination of proposal type, role and phase.
The fix was more or less straightforward and I simply handed that one source file to our customer for future reference.
Here is an excerpt of our status phase rules written in our own DSL:
PhaseRules {
    submission {
        green {
            proposal.isValidated()
        }
        red {
            !proposal.isValidated()
        }
        white {
            false
        }
        user {
            yellow {
                proposal.isCallPhaseIndependent()
            }
            grey {
                false
            }
        }
        scientist {
            yellow {
                false
            }
            grey {
                proposalNeedsValidationByProposer(proposal)
            }
        }
        userOffice {
            yellow {
                false
            }
            grey {
                proposalNeedsValidationByProposer(proposal)
            }
        }
    }
    check {
        green {
            proposal.isValidated() && proposal.hasFeasibilityComment()
        }
        red {
        }
    }
    // more phases, roles, states and rules omitted...
}
In that moment I was very proud of my project team, their decision to implement a DSL for this feature and an appropriate implementation.
It saved me a lot of headaches and quite some time. Furthermore, it reinforced our customer’s trust in the application and improved the transparency and traceability of our solution.
This high-level abstraction let both me and the customer ignore all the implementation details and focus on the actual functionality and behaviour, correcting and improving the status display for our users.
Java Streams are like clean, connected pipes: data flows from one end to the other, getting filtered and transformed along the way. Everything works beautifully — as long as the pipe stays intact.
But what happens if you cut the pipe? Or if you throw rocks into it?
Both stop the flow, though in different ways. Let’s look at what that means for Java Streams.
Exceptions — Cutting the Pipe in Half
A stream is designed for pure functions. The same input gives the same output without side effects. Each element passes through a sequence of operations like map, filter, sorted. But when one of these operations throws an exception, that flow is destroyed. Exceptions are side effects.
Throwing an exception in a stream is like cutting the pipe right in the middle: some water (data) might have already passed through, but nothing else reaches the end. The pipeline is broken.
Example:
var result = items.stream()
    .map(i -> {
        if (i == 0) {
            throw new InvalidParameterException();
        }
        return 10 / i;
    })
    .toList();
If one element throws the exception, the entire stream stops. The remaining elements never get processed.
Uncertain Operations — Throwing Rocks into the Pipe
Now imagine you don’t cut the pipe — you just throw rocks into it.
Some rocks are small enough to pass. Some are too big and block the flow. Some hit the walls and break the pipe completely.
That’s what happens when you perform uncertain operations inside a stream that might fail in expected ways — for example, file reads, JSON parsing, or database lookups.
Most of the time it works, but when one file can’t be read, you suddenly have a broken flow. Your clean pipeline turns into a source of unpredictable errors.
The compiler does not allow checked exceptions like IOException inside stream operations, because the functional interfaces they accept do not declare any. Unchecked exceptions, such as RuntimeException, are not detected by the compiler. That’s why a common “solution” is to catch the checked exception and convert it into an unchecked one. However, this approach doesn’t actually solve the underlying problem; it just makes the compiler blind to it.
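A sketch of how this workaround typically looks (reading a list of files; paths is assumed to be a List&lt;Path&gt;):
var contents = paths.stream()
    .map(path -> {
        try {
            return Files.readString(path);
        } catch (IOException e) {
            // compiles now, but the whole stream still dies at runtime
            throw new UncheckedIOException(e);
        }
    })
    .toList();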
Uncertain operations are like rocks in the pipe — they don’t belong inside. You never know whether they’ll pass, get stuck, or destroy the stream.
How to Keep the Stream Flowing
There are some strategies to keep your stream unbroken and predictable.
Prevent problems before they happen
If the failure is functional or domain-specific, handle it before the risky operation enters the stream.
Example: division by zero — a purely data-related, predictable issue.
var result = items.stream()
    .filter(i -> i != 0)
    .map(i -> 10 / i)
    .toList();
Keep the flow pure by preparing valid data up front.
Represent expected failures as data
This also applies to functional or domain-specific failures. If a result should be provided for each element even when the operation cannot proceed, use Optional instead of throwing exceptions.
var result = items.stream()
    .collect(Collectors.toMap(
        i -> i,
        i -> {
            if (i == 0) {
                return Optional.empty();
            }
            return Optional.of(10 / i);
        }
    ));
Now failures are part of the data. The stream continues.
Keep Uncertain Operations Outside the Stream
This solution is for technical failures that cannot be prevented: perform the uncertain operation before starting the stream.
Fetch or prepare data in a separate step that can handle retries or logging. Once you have stable data, feed it into a clean, functional pipeline.
var responses = fetchAllSafely(ids); // handle exceptions here
responses.stream()
    .map(this::transform)
    .toList();
That way, your stream remains pure and deterministic — the way it was intended.
Conclusion
A busted pipe smells awful in the basement, and exceptions in Java Streams smell just as bad. So keep your pipes clean and your streams pure.
I recently noticed that a published “Single File” application consists of quite a handful of files. So do I, having studied physics and all that, have an outdated understanding of the concept of “single”? Or is there a very specific reason for each of the files that appear there, and does it just have to be that way? I really needed to figure that out for a small web server application (so, ASP.NET), and as we promised our customer a small, simple application, all the extra not-actually-single files were distracting spam at the least, and at worst they would suggest a level of complexity that wasn’t helpful, or a level of we-don’t-actually-care when someone opens that directory.
So what I found myself doing a lot was extending my .csproj file with the following entries, which I want to explain briefly:
The first block gets rid of any *.pdb file, which stands for “Program Debug Database” – these contain debugging symbols, line numbers, etc. that are helpful for diagnostics when the program crashes. Unless you have a very tech-savvy customer that wants to work hands-on when a rare crash occurs, end users do not need them.
The second block gets rid of the files aspnetcorev2_inprocess.dll and web.config. These are the Windows “Internet Information Services” module and configuration XML that can be useful when an application needs tight integration with the environmental OS features (authentication and the like), but not for a standalone web application that just does a simple thing.
And then the last two blocks can be understood nearly as simply as they are written – while a customer might find an appsettings.json useful, any appsettings.Development.json is not relevant in their production release, and for an ASP.NET application, any web UI content conventionally lives in a wwwroot/ folder. While the app needs these files, they don’t need to be visible to the customer – so the combination of <Content Remove…/> and <EmbeddedResource Include…/> does exactly that.
Maybe this can help you, too, in publishing cleaner packages to your customers, especially when “I just want a simple tool” should mean exactly that.
A couple of weeks ago, I asked my brother to test out my new game You Are Circle (please wishlist it and check out the demo, if that’s up your alley!) and among lots of other valuable feedback, he mentioned that the explosion sound effects had a weird click sound at the end that he could only hear with his headphones on. For those of you not familiar with audio signal processing, those click or pop sounds usually appear when the ‘curvy’ audio signal is abruptly cut off [1]. I did not notice it on my setup, but he has a lot of experience with audio mixing, so I trusted his hearing. Immediately, I looked at the source files in Audacity:
They looked fine, really. The sound slowly fades out, which is the exact thing you need to do to prevent clicks & pops. Suspecting the problem might be on the playback side of his particular setup, I asked him to record the sound on his computer the next time he tested and then kind of forgot about it for a bit.
Fast-forward a couple of days. Neither of us had followed up on the little clicky noise thing. While doing some video captures with OBS, I noticed that the sound was kind of terrible in some places, the explosions in particular. Maybe that was related?
While building a new version of my game, Compiling resources... showed up in my console and it suddenly dawned on me: What if my home-brew resource compiler somehow broke the audio files? I use it to encode all the .wav originals into Ogg Vorbis for deployment. Maybe a badly configured encoding setup caused the weird audio in OBS and for my brother? So I looked at the corresponding .ogg file, and to my surprise, it indeed had a small abrupt cut-off at the end. How could that happen? Only when I put both the original and the processed file next to each other did I see what was actually going on:
It’s only half the file! How did that happen? And what made this specific file so special for it to happen? This is one of many files that I also convert from stereo to mono in preprocessing. So I hypothesized that might be the problem. No way I missed all of those files being cut in half though, or did I? So I checked the other files that were converted from stereo to mono. Apparently, I did miss it. They were all cut in half. So I took a look at the code. It looked something like this:
while (keep_encoding)
{
    auto samples_in_block = std::min(BLOCK_SIZE, input.sample_count() - sample_offset);
    if (samples_in_block != 0)
    {
        auto samples_per_channel = samples_in_block / channel_count;
        auto channel_buffer = vorbis_analysis_buffer(&dsp_state, BLOCK_SIZE);
        auto input_samples = input.samples() + sample_offset;
        if (convert_to_mono)
        {
            for (int sample = 0; sample < samples_in_block; sample += 2)
            {
                int sample_in_channel = sample / channel_count;
                channel_buffer[0][sample_in_channel] = (input_samples[sample] + input_samples[sample + 1]) / (2.f * 32768.f);
            }
        }
        else
        {
            for (int sample = 0; sample < samples_in_block; ++sample)
            {
                int channel = sample % channel_count;
                int sample_in_channel = sample / channel_count;
                channel_buffer[channel][sample_in_channel] = input_samples[sample] / 32768.f;
            }
        }
        vorbis_analysis_wrote(&dsp_state, samples_per_channel);
        sample_offset += samples_in_block;
    }
    else
    {
        vorbis_analysis_wrote(&dsp_state, 0);
    }
    /* more stuff to encode the block using the ogg/vorbis API... */
}
Not my best work, as far as clarity and deep nesting goes. After staring at it for a while, I couldn’t really figure out what was wrong with it. So I built a small test program to debug into, and only then did I see what was wrong.
It was terminating the loop after half the file, which now seems pretty obvious given the outcome. But why? Turns out it wasn’t the convert_to_mono at all, but the whole loop. What’s really the problem here is mismatched and imprecise terminology.
What is a sample? The audio signal is usually sampled several thousand times per second (44.1 kHz, 48 kHz or 96 kHz are common) to record the audio waves. One data point is called a sample. But that is only enough of a definition if the sound has a single channel. All those files with convert_to_mono==true were stereo, and that’s exactly where the confusion is in this code. One part of the code thinks in single-channel samples, i.e. a single sampling time-point has two samples in a stereo file, while the other part thinks in multi-channel samples, i.e. a single sampling time-point has only one stereo sample that consists of multiple numbers. Specifically this line:
auto samples_in_block = std::min(BLOCK_SIZE, input.sample_count() - sample_offset);
samples_in_block and sample_offset use the former definition, while input.sample_count() uses the latter. The fix was simple: replace input.sample_count() with input.sample_count() * channel_count.
But that meant all my stereo sounds, even the longer music files, were missing the latter half. And this was not a new bug. The code was in there since the very beginning of the git history. I just didn’t hear its effects. For the sound files, many of them have a pretty long fade out in the second half, so I can kind of get why it was not obvious. But the music was pretty surprising. My game music loops, and apparently, it also loops if you cut it in half. I did not notice.
So what did I learn from this? Many of my assumptions while hunting down this bug were wrong:
My brother’s setup did not have anything to do with it.
Just because the original source file looked fine, I thought the file I was playing back was good as well.
The bad audio in OBS did not have anything to do with this, it was just recorded too loud.
The ogg/vorbis encoding was not badly configured.
The convert_to_mono switch or the special averaging code did not cause the problem.
I thought I would have noticed that almost all my sounds were broken for almost two years. But I did not.
What really caused the problem was an old programming nemesis, famously one of the two hard things in computer science: Naming things. There you have it. Domain language is hard.
[1] I think this is because this sudden signal drop equates to a ‘burst’ in the frequency domain, but that is just an educated guess. If you know, please do tell.
There is a long-standing tradition of filling unknown text fields with placeholder data. In graphic design, these texts are called “dummy text”. In the German language, the word is “Blindtext”, which translates directly as “blind text”. The word means that while some text is there, its meaning can’t be seen.
A popular dummy text is the Latin-sounding “Lorem ipsum dolor sit amet”, which isn’t actually valid Latin. It has no meaning other than being text and taking up space.
While developing software user interfaces, we often deal with smaller input areas like text fields (instead of text areas that could hold a sizeable portion of “lorem ipsum”) or combo boxes. If we don’t know the actual content yet, we tend to fill them with placeholder data that tries to reflect the software’s domain. And by doing that, we can make many mistakes that seem small because they can easily be fixed – just change the text – but might have negative effects that can just as easily be avoided. But you need to be aware of the subtle messages your placeholders send to the reader.
In this series, we will look at a specific domain example: digital invoices. The mistakes and solutions aren’t bound to any domain, though. And we will look at user interfaces and the corresponding source code, because you can fool yourself or your fellow developers with placeholder data just as easily as your customer.
We start with a relatively simple mistake: Letting your placeholder data appear to be real.
The digital (or electronic) invoice is a long-running effort to reduce actual paper document usage in the economy. With the introduction of the European norm EN 16931, there is a chance of a unified digital format used in one major economic region. Several national interpretations of the norm exist, but the essential parts are identical. You can view invoices following the format with a specialized viewer application like the Quba viewer. One section of the data is the information about the invoice originator, or the “seller” in domain terms:
You can see the defined fields of the norm (I omitted a few for simplicity – a mistake we will discuss later in detail) and a seemingly correct set of values. It appears to be the address of my company, the Softwareschneiderei GmbH.
If you take a quick look at the imprint of our home page, you can already spot some differences. The street is obviously wrong and the postal code is a near miss. But other data is seemingly correct: The company name is real, the country code is valid and my name has no spelling error.
And then, there are those placeholder texts that appear to be correct, but really aren’t. I don’t encourage you to dial the phone number, because it is a real number. But it won’t connect to a phone, because it is routed to our fax machine (we don’t actually have a “machine” for that, it’s a piece of software that will act like a fax). Even more tricky is the e-mail address. It could very well be routed, but actually isn’t.
Both placeholder texts serve the purpose of “showing it like it might be”, but appear to be so real and finalized that they lose the “placeholder” characteristics. If you show the seller data to me, I will immediately spot the wrong street and probably the wrong postal code, but accept the phone number as “real”. But it isn’t real, it is just very similar to the real one.
How can you avoid placeholders that look too real?
One possibility is to fake the data completely until given the real values:
These texts have the same “look and feel” and the same lengths as the near-miss entries, but are readily recognizable as made-up values.
There is only one problem: If you mix real and made-up values, you present your readers with a guessing game for each entry: real or placeholder? If it is no big deal to change the placeholders later on, resist the urge to be “as real as possible”. You can change things like the company name from “Softwareschneiderei GmbH” to “Your Company Name Here Inc.” or something similar and it won’t befuddle anybody, because the other texts are placeholders, too. You convey the information that this section is still “under construction”. There is no “80% done” for these things. The section is fully real or not. Introducing situations like “the company name and the place are already real, but the street, postal code and anything else isn’t” doesn’t clear anything up and only makes things more complicated.
But I want to give you another possibility to make the placeholders look less real:
Add a prefix or suffix that communicates that the entry is in a state of flux:
That way, you can communicate that you know, guess or propose a value for the field, but it still needs approval from the customer. Another benefit is that you can search for “TODO” and list all the decisions that are pending.
If, for some reason, it is not possible to include the prefix or suffix with a separator, try to make it as visible (and searchable) as possible:
These are the two ways I make my placeholder texts convey the information that they are, indeed, just placeholders and not the real thing yet.
Maybe there are other possibilities that you know of? Describe them in a comment below!
In the first part of this series, we looked at two mistakes:
Your placeholders look too real
You mix real data with placeholders
And we discussed three solutions:
Make your placeholders unmistakably fake
Give your placeholders a “TODO” prefix or suffix
Demote your real data to placeholders as long as there is still an open question
In the next part of this series, we will look at the code side of the problem and discover that we can make our lives easier there as well.