The Dimensions of Navigation in Eclipse

Following up on “The Dimensions of Navigation in Object-Oriented Code” this post explores how Eclipse, one of the most mature IDEs for Java development, supports navigating across different dimensions of code: hierarchy, behavior, validation and utilities.

Let’s walk through these dimensions and see how Eclipse helps us travel through code with precision.

1. Hierarchy Navigation

Hierarchy navigation reveals the structure of code through inheritance, interfaces and abstract classes.

  • Open Type Hierarchy (F4):
    Select a class or interface, then press F4. This opens a dedicated view that shows both the supertype and subtype hierarchies.
  • Quick Type Hierarchy (Ctrl + T):
    When your cursor is on a type (like a class, interface name), this shortcut brings up a popover showing where it fits in the hierarchy—without disrupting your current layout.
  • Open Implementation (Ctrl + T on method):
    Especially useful when dealing with interfaces or abstract methods, this shortcut lists all concrete implementations of the selected method.

2. Behavioral Navigation

Behavioral navigation tells you what methods call what, and how data flows through the application.

  • Open Declaration (F3 or Ctrl + Click):
    When your cursor is on a method call, pressing F3 or pressing Ctrl and click on the method jumps directly to its definition.
  • Call Hierarchy (Ctrl + Alt + H):
    This is a powerful tool that opens a tree view showing all callers and callees of a given method. You can expand both directions to get a full picture of where your method fits in the system’s behavior.
  • Search Usages in Project (Ctrl + Shift + G):
    Find where a method, field, or class is used across your entire project. This complements call hierarchy by offering a flat list of usages.

3. Validation Navigation

Validation navigation is the movement between your business logic and its corresponding tests. Eclipse doesn’t support this navigation out of the box. However, the MoreUnit plugin adds clickable icons next to classes and tests, allowing you to switch between them easily.

4. Utility Navigation

This is a collection of additional navigation features and productivity shortcuts.

  • Quick Outline (Ctrl + O):
    Pops up a quick structure view of the current class. Start typing a method name to jump straight to it.
  • Search in All Files (Ctrl + H):
    The search dialog allows you to search across projects, file types, or working sets.
  • Content Assist (Ctrl + Space):
    This is Eclipse’s autocomplete—offering method suggestions, parameter hints, and even auto-imports.
  • Generate Code (Alt + Shift + S):
    Use this to bring up the “Source” menu, which allows you to generate constructors, getters/setters, toString(), or even delegate methods.
  • Format Code (Ctrl + Shift + F):
    Helps you clean up messy files or align unfamiliar code to your formatting preferences.
  • Organize Imports (Ctrl + Shift + O):
    Automatically removes unused imports and adds any missing ones based on what’s used in the file.
  • Markers View (Window Show View Markers):
    Shows compiler warnings, TODOs, and FIXME comments—helps prioritize navigation through unfinished or problematic code.

Eclipse Navigation Cheat Sheet

ActionShortcut / Location
Open Type HierarchyF4
Quick Type HierarchyCtrl + T
Open ImplementationCtrl + T (on method)
Open DeclarationF3 or Ctrl + Click
Call HierarchyCtrl + Alt + H
Search UsagesCtrl + Shift + G
MoreUnit SwitchMoreUnit Plugin
Quick OutlineCtrl + O
Search in All FilesCtrl + H
Content AssistCtrl + Space
Generate CodeAlt + Shift + S
Format CodeCtrl + Shift + F
Organize ImportsCtrl + Shift + O
Markers ViewWindow → Show View → Markers

Save yourself from releasing garbage with Git Hooks

Times do happen, where one would write code that should please, please not land in the production release, like for example:

  • Overwriting a certain URL with a local one
  • Having a dialog always-open in order to efficiently style it
  • or generally, mocking code of something that is not-important-right-now™

And then we all already know that such shortcuts tend to stay in the code longer than intended. One might tell oneself:

Oh, I will mark this as // TODO: Remove ASAP

This is the feature branch, so either me or the Code Review will catch it when merging on main.

And this is better than nothing – some Git clients might disallow you from even committing code with any //TODO, but I feel that this is direct violation of the idea of “Commit Early, Commit Often”. As in, now you’re bound to finish your feature before you commit. This is the opposite of helping. By now you figure:

Thanks for nothing, I will just rename this as // TO_REMOVE

Rest of above logic applies untampered with.

These keyword-in-comments are sometimes called Code Tags (e.g. here), and here we just worked around the point that //TODO is more commonly used than other tags, but of course, these still are comments of no specific meaning.

You might now be able to push again, but still – say, the Code Reviewer does the heinous mistake of trusting you too much – this might lead to code in the official release that behaves so silly that it just makes every customer question your mental sanity, or worse.

Git Hooks can help you with appearing more sane than you are.

These are bash scripts in your specific repository instance (i.e. your local clone, or the server copy, individually) to run at specific points in the Git workflow.

For our particular use case, two hooks are of interest:

  • A pre-receive hook run on the server-side repository instance that can prevent the main branch from receiving your dumb development code. This sounds more rigid, but you need access to the server hosting the (bare) repository.
  • A pre-push hook run on your local repository instance that can prevent you from pushing your toxic waste to the branch in question. Keep in mind that if you do your merging unto main via GitLab Merge Requests etc. that this hook will not run then – but implementing it is easier because you already have all the access.

The local pre-push hook is as simple as adding a file named pre-push in the .git/hooks subfolder. It needs to be executable – (which, under Windows, you can use e.g. Git Bash for.)

cd $repositoryPath/.git/hooks
touch pre-push
chmod +x pre-push

And it contains:

#!/bin/sh
 
while read localRef localHash remoteRef remoteHash; do
    if [[ "$remoteRef" == "refs/heads/main" ]]; then
        for commit in $(git rev-list $remoteHash..$localHash); do
            if git grep -n "// REMOVE_ME" $commit; then
                echo "REJECTED: Commit contains REMOVE_ME tag!"
                exit 1
            fi
        done
    fi
done

This already will then lead git push to fail with output like

2f5da72ae9fd85bb5d64c03171c9a8f248b4865f:src/DevelopmentStuff.js:65:        // REMOVE_ME: temporary override for database URL
REJECTED: Commit contains REMOVE_ME tag!
failed to push some refs to '...'

and if that REMOVE_ME is removed, the push goes through.

Some comments:

  • You can easily extend this to multiple branches with the regex condition:
    if [[ "$remoteRef" =~ /(main|release|whatever)$ ]];
  • The git grep -n flag is there for printing out the offending line number.
  • You can make this more convenient with git grep -En "//\s*REMOVE_ME", i.e. allowing an arbitrary number of whitespace between the // and the tag.
  • The surrounding loop structure:
    while read localRef localHash remoteRef remoteHash;
    is exactly matching the way git is processing this pre-push hook. Each hook has a while read <arg list> structure, but the specific arguments depend on the actual hook type.

Hope this can help you taking some extra care!

However, remember that for these local hooks, every developer has to setup them for themselves; they are not pushed to the server instance – but implementing a pre-receive hook there is the topic for a future blog post.

The Estranged Child

One code construct I encounter every now and then is what my colleague appropriately dubbed “the extranged child”, after I said that I do not have a proper name for it. It happens in OOP when a parent object creates child objects, and then later needs to interact with that child:

class Child { /* ... */ }

class Parent
{
  Child Create() { /* ... */ }
  void InteractWith(Child c) { /* ... */ }
}

This is all good, but as soon as inheritance enters the picture, it becomes more complicated:

abstract class BaseChild { /* ... */ }
abstract class BaseParent
{
  public abstract BaseChild Create();
  public abstract void InteractWith(BaseChild child);
}

class RealChild : BaseChild { }
class RealParent : BaseParent
{
  public override BaseChild Create()
  {
    return new RealChild( /* ... */ );
  }

  public override void InteractWith(BaseChild child)
  {
    // We really want RealChild here...
    var realChild = child as RealChild;
  }
}

The interaction often needs details that only the child type associated with that specific parent type has, so that involves a smelly downcast at that point.

Just looking at the type system, this now violates the Liskov-Substitution-Principle also known as the L from SOLID.

One possible solution is adding a precondition for the InteractWith function. Something along the lines of “May only be called with own children”. That works, but cannot be checked by a compiler.

Another solution is to move the InteractWith function into the child, because at the point when it is created, it can know its real parent. That may not be the natural place for the function to go. Also, it requires the child to keep a reference to its parent, effectively making it a compound. But this approach can usually be done, as long as the relation of ‘valid’ child/parent types is one to one.

If you have a parent object that can create different kinds of children that it later needs to interact with, that approach is usually doomed as well. E.g. let the parent be a graphics API like OpenGL or DirectX, and the children be the resources created, like textures or buffers. For drawing, both are required later. At that point, really only the precondition approach works.

On the implementation side, the “ugly” casts remain. Stand-ins for the children can be used and associated with the data via dictionaries, hash-tables or any other lookup. This approach is often coupled with using (possibly strongly typed) IDs as the stand-ins. However, that really only replaces the downcast with a lookup, and it will also fail if the precondition is not satisfied.

Have you encountered this pattern before? Do you have a different name for it? Any thoughts on designing clean APIs that have such a parent-child relationship in a hierarchy? Let me know!

PostgreSQL’s Foreign Data Wrappers

When people think of SQL and PostgreSQL, they usually picture databases full of rows and columns – clean, organized, and stored on a persistent volume like a disk. But what if your data isn’t in a database at all? What if it lives on the web, behind a REST API, in JSON format?

Usually, you’d write a script. You’d use Python or JavaScript, send a request to the API, parse the JSON, and maybe insert the results into your database for analysis. But PostgreSQL has a shortcut that many people don’t know about: Foreign Data Wrappers.

Just like FUSE (Filesystem in Userspace) lets you mount cloud drives or remote folders as if they were local, Foreign Data Wrappers lets PostgreSQL mount external resources like other database management systems, files, or web APIs and treat them like SQL tables.

Example

In this article we’ll use http_fdw as an example. With it, you can run a SQL query against a URL and read the result – no extra code, no data pipeline, just SQL. Here’s how you could set that up.

Let’s say you want to explore data from JSONPlaceholder, a free fake API for testing. It has a /posts endpoint that returns a list of blog posts in JSON format.

First, make sure the extension is installed (this may require building from source, depending on your system):

CREATE EXTENSION http_fdw;

Now create a foreign server that points to the API:

CREATE SERVER json_api_server
  FOREIGN DATA WRAPPER http_fdw
  OPTIONS (uri 'https://jsonplaceholder.typicode.com/posts', format 'json');

Then define a foreign table that maps to the shape of the JSON response:

CREATE FOREIGN TABLE api_posts (
  userId integer,
  id     integer,
  title  text,
  body   text
)
SERVER json_api_server
OPTIONS (rowpath '');

Since the API returns an array of posts, and each one is an object with userId, id, title, and body, this table matches that structure.

Now you can query it:

SELECT title FROM api_posts WHERE userId = 1;

Behind the scenes, PostgreSQL sends a GET request to the API, parses the JSON, and returns the result like any other table.

Things to Keep in Mind

Not everything is perfect. http_fdw is read-only, so you can’t use it to send data back to the API. It also relies on the API being available and responsive, and it doesn’t support authentication out of the box – you’ll need to handle that with custom headers if the API requires it. Complex or deeply nested JSON might also require some extra configuration.

But for many use cases, it’s an interesting option to work with external data. You don’t have to leave SQL. You don’t have to wire up a data pipeline. You just run a query.

Dockerized toolchain in CLion with Conan

In the olden times it was relatively hard to develop C++ projects cross-platform. You had to deal with cross-compiling, different compilers and versions of them, implementation-defined and unspecified behaviour, build system issues and lacking dependency management.

Recent compilers mitigated many problems and tools like CMAKE and Conan really helped with build issues and dependencies. Nevertheless, C++ – its compilers and their output – is platform-dependent and thus platform differences still exist and shine through. But with the advent of containerization and many development tools supporting “dev containers” or “dockerized toolchains” this also became much easier and feasible.

CLion’s dockerized toolchain

CLions dockerized toolchain is really easy to setup. After that you can build and run your application on the platform of your choice, e.g. a Debian container running your IDE on a Windows machine. This is all fine and easy for a simple CMake-based hello-world project.

In real projects there are some pitfalls and additional steps to do to make it work seamlessly with Conan as your dependency manager.

Expanding the simple case to dependencies with Conan

First of all your container needs to be able to run Conan. Here is a simple Dockerfile that helps getting started:

FROM debian:bookworm AS toolchain

RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y dist-upgrade
RUN DEBIAN_FRONTEND=noninteractive apt-get update && apt-get -y install \
  cmake \
  debhelper \
  ninja-build \
  pipx \
  git \
  lsb-release \
  python3 \
  python3-dev \
  pkg-config \
  gdb

RUN PIPX_BIN_DIR=/usr/local/bin pipx install conan

# This allows us to automatically use the conan context from our dockerized toolchain that we populated on the development
# machine
ENV CONAN_HOME=/tmp/my-project/.conan2

The important thing here is that you set CONAN_HOME to the directory CLion uses to mount your project. By default CLion will mount your project root directory to /tmp/<project_name> into the dev container.

If you have some dependencies you need to build/export yourself, you have to run the conan commands using your container image with a bind mount so that the conan artifacts reside on your host machine because CLion tends to use short-lived containers for the toolchain. So we create a build_dependencies.sh script

#!/usr/bin/env bash

# Clone and export conan-omniorb
rm -rf conan-omniorb
git clone -b conan2/4.2.3 https://github.com/softwareschneiderei/conan-omniorb.git
conan export ./conan-omniorb

# Clone and export conan-cpptango
rm -rf conan-cpptango
git clone -b conan2/9.3.6 https://github.com/softwareschneiderei/conan-cpptango.git
conan export ./conan-cpptango

and put it in the docker command:

CMD chmod +x build_environment.sh && ./build_environment.sh

Now we can run the container to setup our Conan context once using a bind-mount like -v C:\repositories\my-project\.conan2:/tmp/my-project/.conan2.

If you have done everything correctly, especially the correct locations for the Conan artifacts you can use your dockerized toolchain and develop transparently regardless of host and target platforms.

I hope this helps someone fighting multiplatform development using C++ and Conan with CLion.

Revisiting the bus factor concept

The concept of a “bus factor” is both grim and very useful to manage project risks. It originates from the area of project management and is sometimes called a “truck number” or (to give it a more positive spin) the “lottery factor”.

It tries to pinpoint the number of people in a project that can drop out suddenly and unplanned without the project success being jeopardized. The “bus” or “truck” is conceptually used as the tool to enforce the drop out. The big lottery win might produce the same outcome, but with less implacability.

The sole number of a bus factor is often helpful to make lurking project risks visible. Especially a bus factor of 1, the most nerve-wrecking number, should be avoided. It means that the project success is directly coupled to the health (or gambling luck) of one specific person.

But even a higher bus factor, lets say 3, is no complete relief. What if those three people hop into the same car to meet the customer in a project meeting and have an accident? The only way to mitigate those “cluster risks” is to plan separate routes and means of travel. Most people would regard those measures as “overly paranoid” and it robs the three people from communicating directly before and after the meeting.

You can explore the individual project risk with more sophisticated tools than just a number. Setting up and filling out a RACI matrix (or one of its many variants) is a good way to make things visible.

But in this blog post, I want to highlight another detail of the bus factor that I learned the hard way: The “bus factor risk” of different people can vary a lot. The “bus factor risk” is the individual probability that the bus factor occurs.

Let’s have an example with the lottery: Your project has two key players that keep the project afloat. One of them never fills out a lottery ticket, the other plays regularly. Their “lottery factor risk percentage” is not equal. Given the low probability to win the lottery, the percentage doesn’t change much, but it changes.

Now imagine one person that often pursues high risk spare time activities. I don’t want to single out one specific activity, but think about free-climbing maybe. The other person stays mostly at home and cooks delicious meals without using sharp knives or hot water. Ok, this comparison sounds a bit contrived, but you get the message:

Two projects with a bus factor of 2 each can vary a lot in the actual risk percentage, because all 4 people have their individual drop out percentage.

It doesn’t have to be spare time activities, by the way. Every person has an individual health risk that can only be improved to a certain degree. Every person simply has “luck” or “misfortune” and can’t do anything about it.

My message is simply that the bus factor number 2 might not be “half the risk” than 1. Or even that two bus factor numbers with the same value denote equal risk.

I don’t think that it is useful to try to quantify the individual “bus factor risk”of a person. Way too many factors come into play and most of them should not be the employer’s concern (like a medical history or spare time activities). What might be useful is to be aware that equal numbers don’t equate equal actual risk.

The Dimensions of Navigation in Object-Oriented Code

One powerful aspects of modern software development is how we move through our code. In object-oriented programming (OOP), understanding relationships between classes, interfaces, methods, and tests is important. But it is not just about reading code; it is about navigating it effectively.

This article explores the key movement dimensions that help developers work efficiently within OOP codebases. These dimensions are not specific to any tool but reflect the conceptual paths developers regularly take to understand and evolve code.

1. Hierarchy Navigation: From Parent to Subtype and Back

In object-oriented systems, inheritance and interfaces create hierarchies. One essential navigation dimension allows us to move upward to a superclass or interface, and downward to a subclass or implementing class.

This dimension is valuable because:

  • Moving up let us understand general contracts or abstract logic that governs behavior across many classes.
  • Moving down help us see specific implementations and how abstract behavior is concretely realized.

This help us maintain a clear overview of where we are within the hierarchy.

2. Behavioral Navigation: From Calls to Definitions and Back

Another important movement is between where methods are defined and where they are used. This is less about structure and more about behavior—how the system flows during execution.

Understanding this movement helps developers:

  • Trace logic through the system from the point of use to its implementation.
  • Identify which parts of the system rely on a particular method or class.
  • Assess how a change to a method might ripple through the codebase.

This navigation is useful when debugging, refactoring, or working in unfamiliar code.

3. Validation Navigation: Between Code and its Tests

Writing automated tests is a fundamental part of software development. Tests are more than just safety nets—they also serve as valuable guides for understanding and verifying how code is intended to behave. Navigating between a class and its corresponding test forms another important dimension.

This movement enables developers to:

  • Quickly validate behavior after making changes.
  • Understand how a class is intended to be used by seeing how it is tested.
  • Improve or add new tests based on recent changes.

Tight integration between code and test supports confident and iterative development, especially in test-driven workflows.

4. Utility Navigation: Supporting Movements that Boost Productivity

Beyond the main three dimensions, there are several supporting movements that contribute to developer efficiency:

  • Searching across the codebase to find any occurrence of a class, method, or term.
  • Generating boilerplate code, like constructors or property accessors, to reduce repetitive work.
  • Code formatting and cleanup, which helps maintain consistency and readability.
  • Autocompletion, which reduces cognitive load and accelerates writing.

These actions do not directly reflect code relationships but enhance how smoothly we can move within and around the code, keeping us focused on solving problems rather than managing structure.

Conclusion: Movement is Understanding

In object-oriented systems, navigating through your codebase along different dimensions provides essential insight for understanding, debugging, and improving your software.

Mastering these dimensions transforms your workflow from reactive to intuitive, allowing you to see code not just as static text, but as a living system you can navigate, shape, and grow.

In an upcoming post, I will take the movement dimensions discussed here and show how they are practically supported in IDEs like Eclipse and IntelliJ IDEA.

Customizing Vite plugins is as quick as Vite itself

Once in a blue moon or so, it might be worthwhile to look at the less frequently used features of the Vite bundler, just to know how your life could be made easier when writing web applications.

And there are real use cases to think about custom vite plugins, e.g.

  1. wanting to treat SVGs as <svg> code or React Component, not just as a <img> resource – so maye your customer can easily swap them out as separate files. That is a solved problem with existing plugins like vite-plugin-svgr or vite-svg-loader, but… once in a ultra-violet moon or so… even the existing plugins might not suffice.
  2. For teaching a few lectures about WebGL / GLSL shader programming, I wanted to readily show the results of changes in a fragment shader on the fly. That is the case in point:

I figured I could just use Vite’s Hot Module Replacement to reload the UI after changing the shader. There could have been alternatives like

  • Copying pieces of GLSL into my JS code as strings
    – which is cumbersome,
  • Using the async import() syntax of JS
    – which is then async, obviously,
  • Employing a working editor component on the UI like Ace / React-Ace
    – which is nice when it works, but is so far off my actual quest, and I guess commonplace IDEs are still more convenient for editing GLSL

I wanted the changes to be quick (like, pair-programming-quick), and Vite’s name exactly means that, and their HMR aptly so. It also gives you the option of raw string assets (import Stuff from "./stuff.txt?raw";) which is ok, but I wanted a bit of prettification to be done automatically. I found vite-plugin-glsl, but I needed it customized because I wanted to always combine multiple blank lines to a single one and this is how easy it was:

  • ./plugin/glslImport.js
    Note: this is executed by the vite dev server, not our JS app itself.
import glsl from "vite-plugin-glsl";

const originalPlugin = glsl({compress: false});

const glslImport = () => ({
    ...originalPlugin,
    enforce: "pre",
    name: "vite-plugin-glsl-custom",
    transform: async (src, id) => {
        const original = await originalPlugin.transform(src, id);
        if (!original) { // not a shader source
            return;
        }
        // custom transformation as described above:
        const code = original.code
            .replace(/\\r\\n/g, "\\n")
            .replace(/\\n(\\n|\\r|\s)*?(?=\s*\w)/g, "\\n$1");
        return {...original, code};
    },
});

export default glslImport;
  • and then the vite.config.js is simply
import { defineConfig } from 'vite';
import glslImport from './plugin/glslImport.js';

export default defineConfig({
    plugins: [
        glslImport()
    ],
    ...
})

I liked that. It kept me from transforming either the raw import or that of the original plugin in each of the 20 different files, and I could easily fiddle around in my filesystem while my students only saw the somewhat-cleaned-up shader code.

So if you would ever have some non-typical-JS files that need some transformation but are e.g. too many or too volatile to be cultivated in their respective source format, that is a nice tool to know. That is as easily-pluggable-into-other-projects as a plugin should be.

Calculating the Number of Segments for Accurate Circle Rendering

A common way to draw circles with any kind of vector graphics API is by approximating it with a regular polygon, e.g. as a regular polygon with 32 sides. The problem with this approach is that it might look good in one resolution, but crude in another, as the approximation becomes more visible. So how do you pick the right number of sides N for the job? For that, let’s look at the error that this approximation has.

A whole bunch of math

I define the ‘error’ of the approximation as the maximum difference between the ideal circle shape and the approximation. In other words, it’s the difference of the inner radius and the outer radius of the regular polygon. Conveniently, with a step angle \alpha=\frac{2\pi}{N} the inner radius is just the outer radius r multiplied by the cosine of half of that: r\times\cos\frac{\alpha}{2}. So the error is r-r\times\cos\frac{\pi}{N}. I find it convenient to use relative error \epsilon for the following, and set r=1:

\epsilon=1-cos\frac{\pi}{N}

The following plot shows that value for N going from 4 to 256:

Plot showing the relative error numbers of subdivision

As you can see, this looks hyperbolic and the error falls off rather fast with an increasing number of subdivisions. This function lets use figure out the error for a given number of subdivisions, but what we really want is he inverse of that: Which number of subdivisions do we need for the error to be less than a given value. For example, assuming a 1080p screen, and a half-pixel error on a full-size (r=540) circle, that means we should aim for a relative error of 0.1\%. So we can solve the error equation above for N. Since the number of subdivisions should be an integer, we round it up:

N=\lceil\frac{\pi}{\arccos{(1-\epsilon)}}\rceil

So for 0.1\% we need only 71 divisions. The following plot shows the number of subdivisions for error values from 0.01\% to 1\%:

Here are some specific values:

\epsilonN
0.01%223
0.1%71
0.2%50
0.4%36
0.6%29
0.8%25
1.0%23

Assuming a fixed half-pixel error, we can plug in \epsilon=\frac{0.5}{radius} to get:

N=\lceil\frac{\pi}{\arccos{(1-\frac{0.5}{radius})}}\rceil

The following graph shows that function for radii up to full-size QHD circles:

Give me code

Here’s the corresponding code in C++, if you just want to figure out the number of segments for a given radius:

std::size_t segments_for(float radius, float pixel_error = 0.5f)
{
  auto d = std::acos(1.f - pixel_error / radius);
  return static_cast<std::size_t>(std::ceil(std::numbers::pi / d));
}

What are GIN Indexes in PostgreSQL?

If you’ve worked with PostgreSQL and dealt with things like full-text search, arrays, or JSON data, you might have heard about GIN indexes. But what exactly are they, and why are they useful?

GIN stands for Generalized Inverted Index. Most indexes (like the default B-tree index) work best when there’s one clear value per row – like a number or a name. But sometimes, a single column can hold many values. Think of a column that stores a list of tags, words in a document, or key-value data in JSON. That’s where GIN comes in.

Let’s walk through a few examples to see how GIN indexes work and why they’re helpful.

Full-text search example

Suppose you have a table of articles:

CREATE TABLE articles (
  id    serial PRIMARY KEY,
  title text,
  body  text
);

You want to let users search the content of these articles. PostgreSQL has built-in support for full-text search, which works with a special data type called tsvector. To get started, you’d add a column to store this processed version of your article text:

ALTER TABLE articles ADD COLUMN tsv tsvector;
UPDATE articles SET tsv = to_tsvector('english', body);

Now, to speed up searches, you create a GIN index:

CREATE INDEX idx_articles_tsv ON articles USING GIN(tsv);

With that in place, you can search for articles quickly:

SELECT * FROM articles WHERE tsv @@ to_tsquery('tonic & water');

This finds all articles that contain both “tonic” and “water”, and thanks to the GIN index, it’s fast – even if you have thousands of articles.

Array example

GIN is also great for columns that store arrays. Let’s say you have a table of photos, and each photo can have several tags:

CREATE TABLE photos (
  id   serial PRIMARY KEY,
  tags text[]
);

You want to find all photos tagged with “capybara”. You can create a GIN index on the tags column:

CREATE INDEX idx_photos_tags ON photos USING GIN(tags);
SELECT * FROM photos WHERE tags @> ARRAY['capybara'];

(The @> operator means “contains” or “is a superset of”.)

The index lets PostgreSQL find matching rows quickly, without scanning the entire table.

JSONB example

PostgreSQL’s jsonb type lets you store flexible key-value data. Imagine a table of users with extra info stored in a jsonb column:

CREATE TABLE users (
  id   serial PRIMARY KEY,
  data jsonb
);

One row might store {"age": 42, "city": "Karlsruhe"}. To find all users from New York, you can use:

SELECT * FROM users WHERE data @> '{"city": "Karlsruhe"}';

And again, with a GIN index on the data column, this query becomes much faster:

CREATE INDEX idx_users_data ON users USING GIN(data);

Things to keep in mind

GIN indexes are very powerful, but they come with some tradeoffs. They’re slower to build and can make insert or update operations a bit heavier. So they’re best when you read (search) data often, but don’t write to the table constantly.

In short, GIN indexes are your friend when you’re dealing with columns that contain multiple values – like arrays, full-text data, or JSON. They let PostgreSQL break apart those values and build a fast lookup system. If your queries feel slow and you’re working with these kinds of columns, adding a GIN index might be exactly what you need.