Calculation with infinite decimal expansion in Java

When dividing decimal numbers in Java, some values—like 1 divided by 3—result in an infinite decimal expansion. In this blog post, I’ll show how such a calculation behaves using BigDecimal and BigFraction.

BigDecimal

Since 1/3 cannot be represented exactly as a finite decimal, performing this division with BigDecimal without specifying a rounding mode leads to a “java.lang.ArithmeticException: Non-terminating decimal expansion; no exact representable decimal result”. Using MathContext.UNLIMITED does not help: the same exception is thrown, because an exact result would still need infinitely many digits.

BigDecimal a = new BigDecimal("1");
BigDecimal b = new BigDecimal("3");
BigDecimal c = a.divide(b); // throws ArithmeticException
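
The behavior described above can be checked with a small sketch. The class and method names are made up for illustration; only the BigDecimal and MathContext calls come from the JDK.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class ExactDivisionCheck {

    /** Returns true if an exact division (no rounding) fails with an ArithmeticException. */
    static boolean exactDivideFails(BigDecimal dividend, BigDecimal divisor) {
        try {
            dividend.divide(divisor);
            return false;
        } catch (ArithmeticException e) {
            // Also thrown for division by zero; here we only pass non-zero divisors.
            return true;
        }
    }

    /** Same check, but with MathContext.UNLIMITED, which also demands an exact result. */
    static boolean unlimitedDivideFails(BigDecimal dividend, BigDecimal divisor) {
        try {
            dividend.divide(divisor, MathContext.UNLIMITED);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }
}
```

A quotient such as 1/4 terminates (0.25), so it passes both checks, while 1/3 fails both.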

By providing a scale (rather than MathContext.UNLIMITED) and a rounding mode, Java can approximate the result instead of failing. However, this also means the value is no longer mathematically exact. As the following example shows, multiplying the rounded result by 3 again does not return exactly 1, because the approximation has already dropped digits.

BigDecimal a = new BigDecimal("1");
BigDecimal b = new BigDecimal("3");
BigDecimal c = a.divide(b, 100, RoundingMode.HALF_UP); // 0.3333333...
BigDecimal a2 = c.multiply(b);  // 0.9999999...
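
To make the size of that inaccuracy visible, here is a minimal sketch that computes the gap between 1 and the round-tripped value. The helper name errorAfterRoundTrip is an assumption for illustration.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class RoundTripError {

    /** Computes 1 - ((1 / 3) * 3), with the quotient rounded to the given scale. */
    static BigDecimal errorAfterRoundTrip(int scale) {
        BigDecimal one = BigDecimal.ONE;
        BigDecimal three = new BigDecimal("3");
        BigDecimal third = one.divide(three, scale, RoundingMode.HALF_UP); // 0.333...3
        BigDecimal back = third.multiply(three);                            // 0.999...9
        return one.subtract(back);                                          // what got lost
    }
}
```

For 1/3 the lost amount is exactly one unit in the last decimal place, so a higher scale shrinks the error but never removes it.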

When working with BigDecimal, it’s important to think carefully about the scale you actually need. Every additional decimal place increases both computation time and memory usage, because BigDecimal stores each digit and carries out arithmetic with arbitrary precision.

To illustrate this, here’s a small timing test for calculating 1/3 with different scales:

As you can see, increasing the scale significantly impacts performance. Choosing an unnecessarily high scale can slow down calculations and consume more memory without providing meaningful benefits. Always select a scale that balances precision requirements with efficiency.
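
A rough way to repeat such a measurement yourself could look like the sketch below. It is a naive single-shot timing with System.nanoTime, not a proper benchmark (no warm-up, no repetitions), and the chosen scales are arbitrary assumptions.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class ScaleTiming {

    /** Divides 1 by 3 at the given scale and returns the elapsed time in nanoseconds. */
    static long timeDivision(int scale) {
        BigDecimal one = BigDecimal.ONE;
        BigDecimal three = new BigDecimal("3");
        long start = System.nanoTime();
        BigDecimal result = one.divide(three, scale, RoundingMode.HALF_UP);
        long elapsed = System.nanoTime() - start;
        // Use the result so the division is not dead code.
        if (result.scale() != scale) {
            throw new IllegalStateException("unexpected scale: " + result.scale());
        }
        return elapsed;
    }

    public static void main(String[] args) {
        for (int scale : new int[] {10, 1_000, 100_000}) {
            System.out.println("scale " + scale + ": " + timeDivision(scale) + " ns");
        }
    }
}
```

The absolute numbers depend heavily on the machine and JVM; only the trend across scales is meaningful.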

However, as we’ve seen, decimal types like BigDecimal can only approximate many numbers when their fractional part is infinite or very long. Even with rounding modes, repeated calculations can introduce small inaccuracies.

But how can you perform calculations exactly if decimal representations can’t be stored with infinite precision?

BigFraction

To achieve truly exact calculations without losing precision, you can use fractional representations instead of decimal numbers. The BigFraction class from Apache Commons Numbers stores values as a numerator and denominator, allowing it to represent numbers like 1/3 precisely, without rounding.

import org.apache.commons.numbers.fraction.BigFraction;

BigFraction a = BigFraction.ONE;
BigFraction b = BigFraction.of(3);
BigFraction c = a.divide(b);    // 1 / 3
BigFraction a2 = c.multiply(b); // 1

In this example, dividing 1 by 3 produces the exact fraction 1/3, and multiplying it by 3 returns exactly 1. Since no decimal expansion is involved, all operations remain mathematically accurate, making BigFraction a suitable choice when exact arithmetic is required.
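
To illustrate why this works, here is a deliberately tiny rational type built only on BigInteger. It is a hypothetical sketch of the idea (numerator, denominator, reduce by the greatest common divisor), not the Apache Commons implementation.

```java
import java.math.BigInteger;

/** Hypothetical minimal fraction type, for illustration only. */
final class SimpleFraction {
    final BigInteger numerator;
    final BigInteger denominator;

    SimpleFraction(BigInteger numerator, BigInteger denominator) {
        if (denominator.signum() == 0) {
            throw new ArithmeticException("denominator is zero");
        }
        // Keep the sign in the numerator so that equal values look equal.
        if (denominator.signum() < 0) {
            numerator = numerator.negate();
            denominator = denominator.negate();
        }
        // Reduce to lowest terms.
        BigInteger gcd = numerator.gcd(denominator);
        this.numerator = numerator.divide(gcd);
        this.denominator = denominator.divide(gcd);
    }

    static SimpleFraction of(long numerator, long denominator) {
        return new SimpleFraction(BigInteger.valueOf(numerator), BigInteger.valueOf(denominator));
    }

    SimpleFraction multiply(SimpleFraction other) {
        return new SimpleFraction(
            numerator.multiply(other.numerator),
            denominator.multiply(other.denominator));
    }

    SimpleFraction divide(SimpleFraction other) {
        // Dividing by a/b is multiplying by b/a; a zero divisor fails in the constructor.
        return new SimpleFraction(
            numerator.multiply(other.denominator),
            denominator.multiply(other.numerator));
    }

    @Override
    public String toString() {
        return numerator + " / " + denominator;
    }
}
```

Every operation is plain integer arithmetic, which is why no rounding step is ever needed.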

BigFraction and Decimals

But what happens if you want to create a BigFraction from an existing decimal number?

BigFraction fromDecimal = BigFraction.from(2172.455928961748633879781420765027);
fromDecimal.bigDecimalValue(); // 2172.45592896174866837100125849246978759765625

At first glance, everything looks fine: you pass in a precise decimal value, BigFraction accepts it, and you get a fraction back. So far, so good. But if you look closely at the result, something unexpected happens—the number you get out is not the same as the one you put in. The difference is subtle, hiding far to the right of the decimal point—but it’s there.
And there’s a simple reason for it: BigFraction.from takes a double, and the numeric literal in the code is already a double.

A double cannot represent most decimal numbers exactly. The moment your decimal value is passed into BigFraction.from(double), it is already approximated by the binary floating-point format of double. BigFraction then captures that approximation perfectly, but the damage has already been done.
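
The approximation can be observed without BigFraction at all, using only the JDK. In this sketch (helper name assumed), new BigDecimal(double) exposes the exact binary value that the double really holds.

```java
import java.math.BigDecimal;

final class DoubleApproximation {

    /** Returns true if the double nearest to the decimal text has exactly the value of that text. */
    static boolean survivesDouble(String decimalText) {
        BigDecimal exact = new BigDecimal(decimalText);
        double asDouble = Double.parseDouble(decimalText);
        // new BigDecimal(double) yields the exact binary value of the double, digit for digit.
        return new BigDecimal(asDouble).compareTo(exact) == 0;
    }
}
```

Values such as 0.5 are sums of powers of two and survive the trip; 0.1 or the long number from the example do not.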

Even worse: BigFraction offers no alternative factory method that accepts a BigDecimal directly. So whenever you start from a decimal number instead of integer-based fractions, you inevitably lose precision before BigFraction even gets involved. What makes this especially frustrating is that BigFraction exists precisely to allow exact arithmetic.

Creating a BigFraction from a BigDecimal correctly

To preserve exactness when converting a BigDecimal to a BigFraction, you cannot rely on BigFraction.from(double). Instead, you can use the unscaled value and scale of the BigDecimal directly:

BigDecimal longNumber = new BigDecimal("2172.455928961748633879781420765027");
BigFraction fromLongNumber = BigFraction.of(
   longNumber.unscaledValue(),
   BigInteger.TEN.pow(longNumber.scale())
); // 2172455928961748633879781420765027 / 1000000000000000000000000000000

fromLongNumber.bigDecimalValue(); // 2172.455928961748633879781420765027

This approach ensures the fraction exactly represents the BigDecimal, without any rounding or loss of precision.

BigDecimal longNumber = new BigDecimal("2196.329071038251366120218579234972");
BigFraction fromLongNumber = BigFraction.of(
   longNumber.unscaledValue(),
   BigInteger.TEN.pow(longNumber.scale())
); // 549082267759562841530054644808743 / 250000000000000000000000000000

fromLongNumber.bigDecimalValue(); // 2196.329071038251366120218579234972

In this case, BigFraction automatically reduces the fraction to lowest terms: numerator and denominator share the factor 4, which is divided out. Even though the original numerator and denominator may be huge, BigFraction removes common factors to minimize their size while preserving exactness.
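
One edge case is worth a sketch: a BigDecimal can have a negative scale (for example new BigDecimal("1E+3")), and BigInteger.TEN.pow with a negative exponent throws an ArithmeticException. The helper below is a hypothetical JDK-only illustration that computes a reduced numerator and denominator for both cases; the two values could then be handed to BigFraction.of.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

final class DecimalToFractionParts {

    /** Returns {numerator, denominator} in lowest terms, exactly equal to the given value. */
    static BigInteger[] toReducedParts(BigDecimal value) {
        BigInteger numerator = value.unscaledValue();
        BigInteger denominator = BigInteger.ONE;
        int scale = value.scale();
        if (scale >= 0) {
            // value = unscaledValue / 10^scale
            denominator = BigInteger.TEN.pow(scale);
        } else {
            // value = unscaledValue * 10^(-scale), so the factor belongs to the numerator
            numerator = numerator.multiply(BigInteger.TEN.pow(-scale));
        }
        BigInteger gcd = numerator.gcd(denominator); // at least 1, since denominator >= 1
        return new BigInteger[] { numerator.divide(gcd), denominator.divide(gcd) };
    }
}
```

The explicit reduction only mirrors what the text above describes; when the parts go into BigFraction.of, the library reduces them anyway.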

BigFraction and Performance

Performing fractional or rational calculations in this exact manner can quickly consume enormous amounts of time and memory, especially when many operations generate very large numerators and denominators. Exact arithmetic should only be used when truly necessary, and computations should be minimized to avoid performance issues. For a deeper discussion, see The Great Rational Explosion.

Conclusion

When working with numbers in Java, both BigDecimal and BigFraction have their strengths and limitations. BigDecimal allows precise decimal arithmetic up to a chosen scale, but it cannot represent numbers with infinite decimal expansions exactly, and high scales increase memory and computation time. BigFraction, on the other hand, can represent rational numbers exactly as fractions, preserving mathematical precision—but only if constructed carefully, for example from integer numerators and denominators or from a BigDecimal using its unscaled value and scale.

In all cases, it is crucial to be aware of these limitations and potential pitfalls. Understanding how each type stores and calculates numbers helps you make informed decisions and avoid subtle errors in your calculations.

Don’t go bursting the pipe

Java Streams are like clean, connected pipes: data flows from one end to the other, getting filtered and transformed along the way. Everything works beautifully — as long as the pipe stays intact.

But what happens if you cut the pipe? Or if you throw rocks into it?

Both stop the flow, though in different ways. Let’s look at what that means for Java Streams.

Exceptions — Cutting the Pipe in Half

A stream is designed for pure functions. The same input gives the same output without side effects. Each element passes through a sequence of operations like map, filter, sorted. But when one of these operations throws an exception, that flow is destroyed. Exceptions are side effects.

Throwing an exception in a stream is like cutting the pipe right in the middle:
some water (data) might have already passed through, but nothing else reaches the end. The pipeline is broken.

Example:

var result = items.stream()
    .map(i -> {
        if(i==0) {
            throw new InvalidParameterException();
        }
        return 10 / i;
    })
    .toList();

If the exception is thrown, the entire stream stops. The remaining elements never get processed, and no result list is produced.
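
A small sketch makes the cut visible. It records which elements the map step saw before the pipeline failed; the recording is a side effect used purely for demonstration, and the class name is made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class BrokenPipeDemo {

    /** Returns the elements the map step saw before the pipeline failed or finished. */
    static List<Integer> elementsSeen(List<Integer> items) {
        List<Integer> seen = new ArrayList<>();
        try {
            items.stream()
                .map(i -> {
                    seen.add(i); // side effect, only to observe the flow
                    if (i == 0) {
                        throw new IllegalArgumentException("cannot divide by zero");
                    }
                    return 10 / i;
                })
                .collect(Collectors.toList());
        } catch (IllegalArgumentException e) {
            // The terminal operation aborted: there is no partial result list.
        }
        return seen;
    }
}
```

For the input 5, 2, 0, 1 the map step sees 5, 2 and 0; the 1 behind the failing element is never touched.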

Uncertain Operations — Throwing Rocks into the Pipe

Now imagine you don’t cut the pipe — you just throw rocks into it.

Some rocks are small enough to pass.
Some are too big and block the flow.
Some hit the walls and break the pipe completely.

That’s what happens when you perform uncertain operations inside a stream, operations that might fail in unexpected ways, such as file reads, JSON parsing, or database lookups.

Most of the time it works, but when one file can’t be read, you suddenly have a broken flow. Your clean pipeline turns into a source of unpredictable errors.

var lines = files.stream()
   .map(i -> {
        try {
            return readFirstLine(i); // throws IOException
        }
        catch (IOException e) {
            throw new RuntimeException(e);
        }
    })
    .toList();

The compiler does not allow checked exceptions like IOException in streams, because the functional interfaces that stream operations accept (such as Function) do not declare them. Unchecked exceptions, such as RuntimeException, are not checked by the compiler. That’s why this example shows a common “solution” of catching the checked exception and converting it into an unchecked exception. However, this approach doesn’t actually solve the underlying problem; it just makes the compiler blind to it.
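
The following sketch shows where the wrapped failure ends up. It uses UncheckedIOException from the JDK instead of a plain RuntimeException, and readFirstLine is a hypothetical stand-in that fails for one specific file name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.stream.Collectors;

final class WrappedFailureDemo {

    /** Hypothetical stand-in for a file read: fails for the name "missing.txt". */
    static String readFirstLine(String fileName) throws IOException {
        if (fileName.equals("missing.txt")) {
            throw new IOException("cannot read " + fileName);
        }
        return "first line of " + fileName;
    }

    /** Compiles fine, but a single unreadable file aborts the whole pipeline. */
    static List<String> readAll(List<String> fileNames) {
        return fileNames.stream()
            .map(name -> {
                try {
                    return readFirstLine(name);
                } catch (IOException e) {
                    // The compiler is satisfied; the caller now gets no hint about the failure mode.
                    throw new UncheckedIOException(e);
                }
            })
            .collect(Collectors.toList());
    }
}
```

Nothing in the signature of readAll warns the caller that an I/O failure can escape from it.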

Uncertain operations are like rocks in the pipe — they don’t belong inside.
You never know whether they’ll pass, get stuck, or destroy the stream.

How to Keep the Stream Flowing

There are some strategies to keep your stream unbroken and predictable.

Prevent problems before they happen

If the failure is functional or domain-specific, handle it before the risky operation enters the stream.

Example: division by zero — a purely data-related, predictable issue.

var result = items.stream()
    .filter(i -> i != 0)
    .map(i -> 10 / i) 
    .toList();

Keep the flow pure by preparing valid data up front.

Represent expected failures as data

This also applies to functional or domain-specific failures. If a result should be provided for each element even when the operation cannot proceed, use Optional instead of throwing exceptions.

var result = items.stream()
    .collect(Collectors.toMap(
        i -> i,
        i -> {
            if(i == 0) {
                return Optional.empty();
            }
            return Optional.of(10 / i);
        }
    ));

Now failures are part of the data. The stream continues.

Keep Uncertain Operations Outside the Stream

This solution is for technical failures that cannot be prevented: perform the uncertain operation before starting the stream.

Fetch or prepare data in a separate step that can handle retries or logging.
Once you have stable data, feed it into a clean, functional pipeline.

var responses = fetchAllSafely(ids); // handle exceptions here

responses.stream()
    .map(this::transform)
    .toList();

That way, your stream remains pure and deterministic — the way it was intended.
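
What fetchAllSafely might look like is sketched below. Everything here is an assumption for illustration: fetch is a hypothetical remote call that fails for negative ids, and failed ids are simply collected in a list where a real system might log or retry.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

final class SafeFetchDemo {

    /** Hypothetical remote call: fails for negative ids. */
    static String fetch(int id) throws IOException {
        if (id < 0) {
            throw new IOException("no response for id " + id);
        }
        return "response-" + id;
    }

    /** Fetches every id in a plain loop; failures are recorded instead of propagated. */
    static List<String> fetchAllSafely(List<Integer> ids, List<Integer> failedIds) {
        List<String> responses = new ArrayList<>();
        for (int id : ids) {
            try {
                responses.add(fetch(id));
            } catch (IOException e) {
                failedIds.add(id); // a real system might log or retry here
            }
        }
        return responses;
    }

    /** The stream itself only contains an operation that cannot fail. */
    static List<String> transformAll(List<String> responses) {
        return responses.stream()
            .map(String::toUpperCase)
            .collect(Collectors.toList());
    }
}
```

The loop owns all error handling, so the pipeline that follows works on data that is already known to be good.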

Conclusion

A busted pipe smells awful in the basement, and exceptions in Java Streams smell just as bad. So keep your pipes clean and your streams pure.

Oracle and the materialized view update

Materialized views are powerful. They give us precomputed, queryable snapshots of expensive joins and aggregations. But the moment you start layering other views on top of them, you enter tricky territory.

The Scenario

You define a materialized view to speed up a reporting query. Soon after, others discover it and start building new views on top of it. The structure spreads.

Now imagine: you need to extend the base materialized view. Maybe add a column, or adjust its definition. That’s when the trouble starts.

The Problem

Unlike regular views, materialized views don’t offer a convenient CREATE OR REPLACE. You can’t just adjust the definition in place. Oracle also doesn’t allow a simple ALTER to add a column or tweak the structure. Recreating the materialized view is often the only option.

Things get even more complicated when other views depend on your materialized view. In that case, Oracle won’t even let you drop it. Instead, you’re greeted with an error about dependent objects, leaving you stuck in a dependency lock-in.

The more dependencies there are, the more brittle the setup becomes. What started as a performance optimization can lock you into a rigid structure that resists change.

As a short example, let’s look at how other databases handle this scenario. In Postgres, you can drop a materialized view even if other views depend on it. The dependent views temporarily lose their base and will fail if queried, but you won’t get an error on the drop. Once you recreate the materialized view with the same name and structure, the dependent views automatically start working again.

What to Do?

That is the hard question. Sometimes you can try to hide materialized views behind stable views. Or you take the SQL of all dependent views, drop them, change the materialized view, and then recreate all dependent views— a process that can be a huge pain.

How do you manage changes to materialized views that already have dependent views stacked on top? Do you design around it, fight with rebuild scripts every time, or have another solution?

The Dimensions of Navigation in Eclipse

Following up on “The Dimensions of Navigation in Object-Oriented Code”, this post explores how Eclipse, one of the most mature IDEs for Java development, supports navigating across different dimensions of code: hierarchy, behavior, validation and utilities.

Let’s walk through these dimensions and see how Eclipse helps us travel through code with precision.

1. Hierarchy Navigation

Hierarchy navigation reveals the structure of code through inheritance, interfaces and abstract classes.

  • Open Type Hierarchy (F4):
    Select a class or interface, then press F4. This opens a dedicated view that shows both the supertype and subtype hierarchies.
  • Quick Type Hierarchy (Ctrl + T):
    When your cursor is on a type (such as a class or interface name), this shortcut brings up a popover showing where it fits in the hierarchy, without disrupting your current layout.
  • Open Implementation (Ctrl + T on method):
    Especially useful when dealing with interfaces or abstract methods, this shortcut lists all concrete implementations of the selected method.

2. Behavioral Navigation

Behavioral navigation tells you what methods call what, and how data flows through the application.

  • Open Declaration (F3 or Ctrl + Click):
    When your cursor is on a method call, pressing F3 or Ctrl-clicking the method jumps directly to its definition.
  • Call Hierarchy (Ctrl + Alt + H):
    This is a powerful tool that opens a tree view showing all callers and callees of a given method. You can expand both directions to get a full picture of where your method fits in the system’s behavior.
  • Search Usages in Workspace (Ctrl + Shift + G):
    Find where a method, field, or class is used across your entire workspace. This complements call hierarchy by offering a flat list of usages.

3. Validation Navigation

Validation navigation is the movement between your business logic and its corresponding tests. Eclipse doesn’t support this navigation out of the box. However, the MoreUnit plugin adds clickable icons next to classes and tests, allowing you to switch between them easily.

4. Utility Navigation

This is a collection of additional navigation features and productivity shortcuts.

  • Quick Outline (Ctrl + O):
    Pops up a quick structure view of the current class. Start typing a method name to jump straight to it.
  • Search in All Files (Ctrl + H):
    The search dialog allows you to search across projects, file types, or working sets.
  • Content Assist (Ctrl + Space):
    This is Eclipse’s autocomplete—offering method suggestions, parameter hints, and even auto-imports.
  • Generate Code (Alt + Shift + S):
    Use this to bring up the “Source” menu, which allows you to generate constructors, getters/setters, toString(), or even delegate methods.
  • Format Code (Ctrl + Shift + F):
    Helps you clean up messy files or align unfamiliar code to your formatting preferences.
  • Organize Imports (Ctrl + Shift + O):
    Automatically removes unused imports and adds any missing ones based on what’s used in the file.
  • Markers View (Window → Show View → Markers):
    Shows compiler warnings, TODOs, and FIXME comments—helps prioritize navigation through unfinished or problematic code.

Eclipse Navigation Cheat Sheet

Action | Shortcut / Location
Open Type Hierarchy | F4
Quick Type Hierarchy | Ctrl + T
Open Implementation | Ctrl + T (on method)
Open Declaration | F3 or Ctrl + Click
Call Hierarchy | Ctrl + Alt + H
Search Usages | Ctrl + Shift + G
MoreUnit Switch | MoreUnit Plugin
Quick Outline | Ctrl + O
Search in All Files | Ctrl + H
Content Assist | Ctrl + Space
Generate Code | Alt + Shift + S
Format Code | Ctrl + Shift + F
Organize Imports | Ctrl + Shift + O
Markers View | Window → Show View → Markers

The Dimensions of Navigation in Object-Oriented Code

One powerful aspect of modern software development is how we move through our code. In object-oriented programming (OOP), understanding relationships between classes, interfaces, methods, and tests is important. But it is not just about reading code; it is about navigating it effectively.

This article explores the key movement dimensions that help developers work efficiently within OOP codebases. These dimensions are not specific to any tool but reflect the conceptual paths developers regularly take to understand and evolve code.

1. Hierarchy Navigation: From Parent to Subtype and Back

In object-oriented systems, inheritance and interfaces create hierarchies. One essential navigation dimension allows us to move upward to a superclass or interface, and downward to a subclass or implementing class.

This dimension is valuable because:

  • Moving up lets us understand general contracts or abstract logic that governs behavior across many classes.
  • Moving down helps us see specific implementations and how abstract behavior is concretely realized.

This helps us maintain a clear overview of where we are within the hierarchy.

2. Behavioral Navigation: From Calls to Definitions and Back

Another important movement is between where methods are defined and where they are used. This is less about structure and more about behavior—how the system flows during execution.

Understanding this movement helps developers:

  • Trace logic through the system from the point of use to its implementation.
  • Identify which parts of the system rely on a particular method or class.
  • Assess how a change to a method might ripple through the codebase.

This navigation is useful when debugging, refactoring, or working in unfamiliar code.

3. Validation Navigation: Between Code and its Tests

Writing automated tests is a fundamental part of software development. Tests are more than just safety nets—they also serve as valuable guides for understanding and verifying how code is intended to behave. Navigating between a class and its corresponding test forms another important dimension.

This movement enables developers to:

  • Quickly validate behavior after making changes.
  • Understand how a class is intended to be used by seeing how it is tested.
  • Improve or add new tests based on recent changes.

Tight integration between code and test supports confident and iterative development, especially in test-driven workflows.

4. Utility Navigation: Supporting Movements that Boost Productivity

Beyond the main three dimensions, there are several supporting movements that contribute to developer efficiency:

  • Searching across the codebase to find any occurrence of a class, method, or term.
  • Generating boilerplate code, like constructors or property accessors, to reduce repetitive work.
  • Code formatting and cleanup, which helps maintain consistency and readability.
  • Autocompletion, which reduces cognitive load and accelerates writing.

These actions do not directly reflect code relationships but enhance how smoothly we can move within and around the code, keeping us focused on solving problems rather than managing structure.

Conclusion: Movement is Understanding

In object-oriented systems, navigating through your codebase along different dimensions provides essential insight for understanding, debugging, and improving your software.

Mastering these dimensions transforms your workflow from reactive to intuitive, allowing you to see code not just as static text, but as a living system you can navigate, shape, and grow.

In an upcoming post, I will take the movement dimensions discussed here and show how they are practically supported in IDEs like Eclipse and IntelliJ IDEA.

Tell different stories within the same universe

You might know this from fantasy book series: the author creates a unique world, a whole universe of their own and sets a story or series of books within it. Then, a few years later, a new series is released. It is set in the same universe, but at a different time, with different characters, and tells a completely new story. Still, it builds on the foundation of that original world. The author does not reinvent everything from scratch. They use the same map, the same creatures, the same customs and rules established in the earlier books.

Examples of this are the Harry Potter series and Fantastic Beasts, or The Lord of the Rings and The Hobbit.

But what does this have to do with software development?
In one of my projects, I faced a very similar use case. I had to implement several services, each covering a different use case, but all sharing the same set of peripherals, adapters, and domain types.

So I needed an architecture that did not just allow for interchangeable periphery, as is usually the focus, but also supported interchangeable use cases. In other words, I needed a setup that allowed for multiple “books” to be written within the same “universe.”

Architecture

Let’s start with a simple example: user management.
I originally implemented it following Clean Architecture principles, where the structure resembles an onion: dependencies flow inward, from the outer layers to the core domain logic. This makes the outer layers (the “peel”) easily replaceable or extendable.

Our initial use case is a service that creates a user. The use case defines an interface that the user controller implements, meaning the dependency flows from the outer layer (the controller) toward the core. So far, so good.

However, I wanted to evolve the architecture to support multiple use cases. For that, the direct dependency from the UserController to the CreateUser use case had to be removed.

My solution was to introduce a new domain module, a shared foundation that contains all interfaces, data types, and common logic used by both use cases and adapters. I called this module the UseCaseService.

The result is a new architecture diagram:

There is no longer a direct connection between a specific use case and an adapter. Instead, both depend on the shared UseCaseService module. With this setup, I can easily create new use cases that reuse the existing ecosystem without duplicating code or logic.
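
In code, the idea could be sketched roughly as follows. All names (User, UserStore, InMemoryUserStore, CreateUser, BirthdayGreetings) are hypothetical and only illustrate the dependency direction: the shared module holds types and interfaces, while adapters and use cases depend on it but never on each other.

```java
import java.time.MonthDay;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Shared module (the "universe"): domain types and interfaces only.
final class User {
    final String name;
    final MonthDay birthday;

    User(String name, MonthDay birthday) {
        this.name = name;
        this.birthday = birthday;
    }
}

interface UserStore {
    void save(User user);
    List<User> findAll();
}

// Adapter: implements a shared interface and knows no use case.
final class InMemoryUserStore implements UserStore {
    private final List<User> users = new ArrayList<>();

    @Override
    public void save(User user) {
        users.add(user);
    }

    @Override
    public List<User> findAll() {
        return new ArrayList<>(users); // defensive copy
    }
}

// Use case 1 (the first "book"): depends only on the shared module.
final class CreateUser {
    private final UserStore store;

    CreateUser(UserStore store) {
        this.store = store;
    }

    User create(String name, MonthDay birthday) {
        User user = new User(name, birthday);
        store.save(user);
        return user;
    }
}

// Use case 2 (the second "book"): reuses the same store without touching use case 1.
final class BirthdayGreetings {
    private final UserStore store;

    BirthdayGreetings(UserStore store) {
        this.store = store;
    }

    List<String> greetingsFor(MonthDay today) {
        return store.findAll().stream()
            .filter(user -> user.birthday.equals(today))
            .map(user -> "Happy birthday, " + user.name + "!")
            .collect(Collectors.toList());
    }
}
```

Adding a third use case would mean adding one more class next to these two; the store and the domain type stay untouched.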

For example, I could implement another service that retrieves all users whose birthday is today and sends them birthday greetings. (Whether this is GDPR-compliant is another discussion!) But thanks to this architecture, I now have the freedom to implement that use case cleanly and efficiently.

Conclusion

Architecture is a highly individual matter. There is no one-size-fits-all solution that solves every problem or suits every project. Models like Clean Architecture can be helpful guides, but ultimately, you need to define your own architectural requirements and find a solution that meets them. This was a short story of how one such solution came to life based on my own needs.

It is also a small reminder to keep the freedom to think outside the box. Do not be afraid to design an architecture that truly fits you and your project, even if it deviates from the standard models.

Nginx upload limit

Today, I encountered a surprising issue with my Docker-based web application. The application has an upload limit set, but before reaching it, an unexpected error appeared:

413 Request Entity Too Large

Despite the application’s upload limit being correctly configured, the error occurred much earlier—when the file was barely over 1MB. Where does this limitation come from, and how can it be changed?


Troubleshooting

The issue occurred before the request even reached the application layer, during a critical step in request processing. The root cause was Nginx, the web server and reverse proxy used in the Docker stack.

Nginx, commonly used in modern application stacks for load balancing, caching, and HTTPS handling, acts as the gateway to the application, managing all incoming requests. However, Nginx was rejecting uploads larger than 1MB. This was due to the client_max_body_size directive, which defaults to 1 MB when it is not set. As a result, Nginx blocked larger file uploads before they could reach the application.

Solution

To resolve this issue, the client_max_body_size directive in the Nginx configuration needed to be updated to allow larger file uploads.

Modify the nginx.conf file or the relevant server block configuration:

server {
    listen 80;
    server_name example.com;
    client_max_body_size 100M;  # Allow uploads up to 100MB
}

After making this change, reload Nginx to apply the new configuration:

nginx -s reload

If Nginx is running in a Docker container, you can restart the container instead:

docker restart <container_name>

With this update, the upload limit increased to 100MB, allowing the application to handle larger files without premature rejection. Once the configuration was applied, the error disappeared, and file uploads worked as expected, provided they remained within the newly defined limits.

Integrating API Key Authorization in Micronaut’s OpenAPI Documentation

In a Java Micronaut application, endpoints are often secured using @Secured(SecurityRule.IS_AUTHENTICATED), along with an authentication provider. In this case, authentication takes place using API keys, and the authentication provider validates them. If you also provide Swagger documentation for users to test API functionalities quickly, you need a way for users to specify an API key in Swagger that is automatically included in the request headers.

For a general guide on setting up a Micronaut application with OpenAPI Swagger and Swagger UI, refer to this article.

The following article focuses on how to integrate API key authentication into Swagger so that users can authenticate and test secured endpoints directly within the Swagger UI.

Accessing Swagger Without Authentication

To ensure that Swagger is always accessible without authentication, update the application.yml file with the following settings:

micronaut:  
  security:
    intercept-url-map:
      - pattern: /swagger/**
        access:
          - isAnonymous()
      - pattern: /swagger-ui/**
        access:
          - isAnonymous()
    enabled: true

These settings ensure that Swagger remains accessible without requiring authentication while keeping API security enabled.

Defining the Security Scheme

Micronaut supports various Swagger annotations to configure OpenAPI security. To enable API key authentication, use the @SecurityScheme annotation:

import io.swagger.v3.oas.annotations.security.SecurityScheme;
import io.swagger.v3.oas.annotations.enums.SecuritySchemeIn;
import io.swagger.v3.oas.annotations.enums.SecuritySchemeType;

@SecurityScheme(
    name = "MyApiKey",
    type = SecuritySchemeType.APIKEY,
    in = SecuritySchemeIn.HEADER,
    paramName = "Authorization",
    description = "API Key authentication"
)

This defines an API key security scheme with the following properties:

  • Name: MyApiKey
  • Type: APIKEY
  • Location: Header (Authorization field)
  • Description: Explains how the API key authentication works

Applying the Security Scheme to OpenAPI

Next, we configure Swagger to use this authentication scheme by adding it to @OpenAPIDefinition:

import io.swagger.v3.oas.annotations.OpenAPIDefinition;
import io.swagger.v3.oas.annotations.info.*;
import io.swagger.v3.oas.annotations.security.SecurityRequirement;

@OpenAPIDefinition(
    info = @Info(
        title = "API",
        version = "1.0.0",
        description = "This is a well-documented API"
    ),
    security = @SecurityRequirement(name = "MyApiKey")
)

This ensures that the Swagger UI recognizes and applies the defined authentication method.

Conclusion

With these settings, your Swagger UI will display an Authorize button above the list of endpoints.

Users can enter an API key, which will be automatically included in all API requests as a header.

This is just one way to implement authentication. The @SecurityScheme annotation also supports more advanced authentication flows like OAuth2, allowing seamless token-based authentication through a token provider.

By setting up API key authentication correctly, you enhance both the security and usability of your API documentation.

String Representation and Comparisons

Strings are a fundamental data type in programming, and their internal representation has a significant impact on performance, memory usage, and the behavior of comparisons. This article delves into the representation of strings in different programming languages and explains the mechanics of string comparison.

String Representation

In many programming languages, such as Java and Python, strings are immutable. To optimize performance in string handling, techniques like string pools are used. Let’s explore this concept further.

String Pool

A string pool is a memory management technique that reduces redundancy and saves memory by reusing immutable string instances. Java is a well-known language that employs a string pool for string literals.

In Java, string literals are automatically “interned” and stored in a string pool managed by the JVM. When a string literal is created, the JVM checks the pool for an existing equivalent string:

  • If found, the existing reference is reused.
  • If not, a new string is added to the pool.

This ensures that identical string literals share the same memory location, reducing memory usage and enhancing performance.

Python also supports string interning, but unlike Java, it does not intern every string literal. Interning is applied only to certain strings, such as identifiers, small immutable strings, or strings composed of ASCII letters and numbers.

String Comparisons

Let’s take a closer look at how string comparisons work in Java and other languages.

Comparisons in Java

In this example, we compare three strings with the content “hello”. While the first comparison returns true, the second does not. What’s happening here?

String s1 = "hello";
String s2 = "hello";
String s3 = new String("hello");

System.out.println(s1 == s2); // true
System.out.println(s1 == s3); // false

In Java, the == operator compares references, not content.

First Comparison (s1 == s2): Both s1 and s2 reference the same object in the string pool, so the comparison returns true.

Second Comparison (s1 == s3): s3 is created using new String(), which allocates a new object in heap memory. By default, this object is not added to the string pool, so the object reference is unequal and the comparison returns false.

You can explicitly add a string to the pool using the intern() method:

String s1 = "hello";
String s2 = new String("hello").intern();

System.out.println(s1 == s2); // true

To compare the content of strings in Java, use the equals() method:

String s1 = "hello";
String s2 = "hello";
String s3 = new String("hello");

System.out.println(s1.equals(s2)); // true
System.out.println(s1.equals(s3)); // true

Comparisons in Other Languages

Some languages, such as Python and JavaScript, use == to compare content, but this behavior may differ in other languages. Developers should always verify how string comparison operates in their specific programming language.

s1 = "hello"
s2 = "hello"
s3 = "".join(["h", "e", "l", "l", "o"])

print(s1 == s2)  # True
print(s1 == s3)  # True

print(s1 is s2)  # True
print(s1 is s3)  # False

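In Python, the is operator compares object identity. In the example, s1 is s3 returns False because the join() method builds a new string object at run time. s1 is s2 returns True in CPython because both literals refer to the same interned object, but this is an implementation detail and should not be relied on.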

Conclusion

Different approaches to string representation reflect trade-offs between simplicity, performance, and memory efficiency. Each programming language implements string comparison differently, requiring developers to understand the specific behavior before relying on it. For example, some languages differentiate between reference and content comparison, while others abstract these details for simplicity. Languages like Rust, which lack a default string pool, emphasize explicit memory management through ownership and borrowing mechanisms. Languages with string pools (e.g., Java) prioritize runtime optimizations. Being aware of these nuances is essential for writing efficient, bug-free code and making informed design choices.

Why Java’s built-in hash functions are unsuitable for password hashing

Passwords are one of the most sensitive pieces of information handled by applications. Hashing them before storage helps keep them protected, even if the database is compromised. However, not all hashing algorithms are designed for password security. Java’s built-in hashing mechanisms, used for example by HashMap, are optimized for performance, not security.

In this post, we will explore the differences between general-purpose and cryptographic hash functions and explain why the latter should always be used for passwords.

Java’s built-in hashing algorithms

Java provides a hashCode() method for every object, including strings, which is commonly used in data structures like HashMap and HashSet. For instance, the hashCode() implementation for String uses a simple algorithm:

public int hashCode() {
    int h = 0;
    for (int i = 0; i < value.length; i++) {
        h = 31 * h + value[i];
    }
    return h;
}

This method calculates a 32-bit integer hash by combining each character in the string with the multiplier 31. The goal is to produce hash values for efficient lookups.
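
The snippet above relies on the internal value field of String, so it cannot be run on its own. As a rough standalone sketch (the class name PolyHash and the method polyHash are illustrative, not part of the JDK), the same polynomial can be computed over the public API and compared with the built-in result:

```java
public class PolyHash {

    // Hypothetical re-implementation of the 31-based polynomial over the public String API.
    static int polyHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow wraps around, as in String.hashCode()
        }
        return h;
    }

    public static void main(String[] args) {
        String input = "hello";
        System.out.println(polyHash(input) == input.hashCode()); // true
    }
}
```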

This simplicity makes hashCode() extremely efficient for its primary use case—managing hash-based collections. Its deterministic nature ensures that identical inputs always produce the same hash, which is essential for consistent object comparisons. Additionally, it provides decent distribution across hash table buckets, minimizing performance bottlenecks caused by collisions.

However, the same features that make the function ideal for collections are also its greatest weaknesses when applied to password security. Because it is fast, an attacker can quickly compute the hash for any candidate password and compare it to a leaked hash. Furthermore, its 32-bit output space (about 4.3 billion possible values) is too small for secure applications, so distinct inputs collide easily. For example:

System.out.println("Aa".hashCode()); // 2112
System.out.println("BB".hashCode()); // 2112

The lack of randomness (such as salting) and of security-focused features makes hashCode() entirely unsuitable for protecting passwords. You can manually prepend a random value before passing the string into the hash algorithm, but the small output space and high speed still make it possible to generate a lookup table quickly. hashCode() was never designed to handle adversarial scenarios like brute-force attacks, where attackers attempt billions of guesses per second.
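
To show how cheaply collisions can be produced, here is a small sketch (class and method names are invented for this post) that builds on the “Aa”/“BB” pair. Because both two-character blocks hash to the same value, any sequence of these blocks yields the same hashCode():

```java
import java.util.ArrayList;
import java.util.List;

public class HashCodeCollisions {

    // Builds every string made of `blocks` two-character blocks, each either "Aa" or "BB".
    static List<String> collidingStrings(int blocks) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (int i = 0; i < blocks; i++) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                next.add(prefix + "Aa");
                next.add(prefix + "BB");
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints 8 different strings that all share one hash value.
        for (String s : collidingStrings(3)) {
            System.out.println(s + " -> " + s.hashCode());
        }
    }
}
```

With n blocks this produces 2^n distinct strings with an identical hash, without any searching at all.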

Cryptographic hash algorithms

Cryptographic hash functions serve a completely different purpose. They are designed to withstand adversarial attacks, so that an attacker cannot feasibly recover an input from its hash or craft two inputs with the same hash. Examples include general-purpose cryptographic hashes like SHA-256 and password-specific algorithms like bcrypt, PBKDF2, and Argon2. A fast hash such as SHA-256 is still a poor fit for passwords on its own, which is why the password-specific algorithms exist.

They produce fixed-length outputs (e.g., 256 bits for SHA-256) and are engineered to be computationally infeasible to reverse. These properties are the foundation for securing passwords and other sensitive data. In addition, some password-hashing algorithms, such as bcrypt, incorporate salting automatically. Salting is a technique where a random value is added to the password before hashing. This ensures that even identical passwords produce different hash values, thwarting attacks that rely on precomputed hashes (rainbow tables).
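
To illustrate the fixed-length property only (this is not a recommendation for storing passwords), the following sketch uses the JDK’s MessageDigest. The class name Sha256Length and the helper sha256 are assumptions made for this example:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha256Length {

    // Hypothetical helper: returns the raw SHA-256 digest of the UTF-8 bytes of the input.
    static byte[] sha256(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            return digest.digest(input.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support SHA-256.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha256("a").length * 8);                          // 256
        System.out.println(sha256("a much longer input string").length * 8); // 256
    }
}
```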

Another critical feature is key stretching, where the hashing process is deliberately slowed down by performing many iterations. For example, bcrypt exposes a cost factor and PBKDF2 an iteration count, so developers can tune how much work each hash requires and make brute-force attacks significantly more expensive in terms of time and computational resources.
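
Salting and key stretching can be combined using the PBKDF2 implementation that ships with the JDK. The following is a sketch under stated assumptions, not a production-ready implementation: the class name, the 16-byte salt, the 256-bit output, and the iteration count of 100,000 are illustrative choices that should be tuned to your own hardware and requirements.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class Pbkdf2Sketch {

    static final int SALT_BYTES = 16; // assumed salt length
    static final int KEY_BITS = 256;  // assumed output length

    // Generates a fresh random salt; it is stored next to the hash, not kept secret.
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a hash from password and salt; more iterations mean more work per guess.
    static byte[] hash(char[] password, byte[] salt, int iterations) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 not available", e);
        } finally {
            spec.clearPassword(); // wipe the spec's internal copy of the password
        }
    }

    // Recomputes the hash with the stored salt and compares in constant time.
    static boolean verify(char[] password, byte[] salt, int iterations, byte[] expected) {
        return MessageDigest.isEqual(hash(password, salt, iterations), expected);
    }

    public static void main(String[] args) {
        int iterations = 100_000; // illustrative value only
        byte[] saltA = newSalt();
        byte[] saltB = newSalt();
        byte[] hashA = hash("secret".toCharArray(), saltA, iterations);
        byte[] hashB = hash("secret".toCharArray(), saltB, iterations);

        System.out.println(MessageDigest.isEqual(hashA, hashB)); // same password, different salts
        System.out.println(verify("secret".toCharArray(), saltA, iterations, hashA));
        System.out.println(verify("guess".toCharArray(), saltA, iterations, hashA));
    }
}
```

The salt and iteration count have to be stored alongside the hash so that verify() can repeat the same computation later. Raising the iteration count increases the cost of every guess for an attacker, at the price of slower logins.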

Conclusion

Java’s built-in hash functions, such as hashCode(), are designed for speed, efficiency, and consistent behavior in hash-based collections. They are fast, deterministic, and effective at spreading values evenly across buckets.

On the other hand, cryptographic hash algorithms are purpose-built for security. They prioritize irreversibility, randomness, and computational cost, all of which are essential for protecting passwords against modern attack vectors.

Java’s hashCode() is an excellent tool for managing hash-based collections, but it was never intended for the high-stakes realm of password security.