Avoiding Code Style Discussions

Every developer has personal formatting preferences.
Brace placement, line wrapping, imports, tabs vs. spaces — everybody has an opinion, and most of them are reasonable.

The problem starts when all these styles meet in one repository.

The cost of “personal style”

A codebase written by ten developers can easily look like ten different applications stitched together. Suddenly, pull requests are full of formatting changes. Git diffs become noisy. Merge conflicts appear because one developer reformatted a file differently than another. Code reviews drift into discussions about whitespace instead of actual functionality.

Even worse: inconsistent code slows down reading.

Humans recognize patterns quickly. When code follows the same visual structure everywhere, the brain spends less effort parsing syntax and more effort understanding intent.

Consistent formatting reduces cognitive load.

A shared style is less about aesthetics and more about reducing friction. But how do we solve this problem?

Shared Project Style

In IDEs like IntelliJ, you can define a code style and automatically reformat code according to those rules. This helps you keep your own code consistent. However, if every developer uses a different style, it does not help the project as a whole.

You can configure the style under:

Settings -> Editor -> Code Style

and save it as a project-level configuration. IntelliJ will then create a codeStyles folder with XML files inside the .idea directory.

The solution for sharing one configuration across the whole project is to commit these files to Git. This way, every developer working on the project uses the same code style configuration.
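
In a typical setup, sharing the style is just a normal commit of those generated files (a sketch; the exact file names may vary between IntelliJ versions):

git add .idea/codeStyles/codeStyleConfig.xml .idea/codeStyles/Project.xml
git commit -m "Add shared project code style"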

The IDE can then help enforce the agreed style by reformatting code before commit or even automatically on save.


Consistency beats preference

The important thing is not finding the perfect style. The important thing is agreeing on one.

A consistent codebase is easier to read, easier to review, and easier to maintain. Pull requests become smaller and cleaner because they contain actual changes instead of formatting noise.

Good formatting should be boring and automatic. That leaves more time for discussions that actually matter.

Splitting a repository while preserving history

Monorepos or “collection repositories” tend to grow over time. At some point, a part of them deserves its own life: independent deployments, a dedicated team, or separate release cycles.

The tricky part is obvious: How do you split out a subproject without losing its Git history?

The answer is a powerful tool called git-filter-repo.

Step 1: Clone the Repository into a New Directory

Do not work in your existing checkout.
Instead, clone the repository into a fresh directory by running the following commands in Git Bash:

git clone ssh://git@github.com/project/Collection.git
cd Collection

We avoid working directly on origin and create a temporary branch:

git checkout -b split

This provides a safety net while rewriting history.

Step 2: Filter the Repository

Now comes the crucial step. Using git-filter-repo, we keep only the desired path and move it to the repository root.

python -m git_filter_repo \
  --path path/to/my/subproject/ \
  --path-rename path/to/my/subproject/: \
  --force

  • --path defines what should remain
  • --path-rename moves the directory to the repository root
  • --force is required because history is rewritten

After this step, the repository contains only the former subdirectory — with its full Git history intact.
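
A quick sanity check before pushing: the former subproject files should now sit at the repository root, and their history should still be visible.

ls
git log --oneline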

Step 3: Push to the New Repository

Now point the repository to its new remote location:

git remote add origin ssh://git@github.com/project/NewProject.git

If an origin already exists, remove it first:

git remote remove origin

Rename the working branch to main:

git branch -m split main

Finally, push the rewritten history to the new repository:

git push -u origin main

That’s it — the new repository is ready, complete with a clean and meaningful history.

Conclusion

git-filter-repo makes it possible to split repositories precisely. Instead of copying files and losing context, you preserve history — which is invaluable for git blame, audits, and understanding how the code evolved.

When refactoring at repository level, history is not baggage. It’s documentation.

Happy splitting!

Save yourself from releasing garbage with Git Hooks

It happens to all of us: we write code that should please, please not land in the production release, for example:

  • Overwriting a certain URL with a local one
  • Having a dialog always-open in order to efficiently style it
  • or generally, mocking out something that is not-important-right-now™

And we all know that such shortcuts tend to stay in the code longer than intended. One might tell oneself:

Oh, I will mark this as // TODO: Remove ASAP

This is the feature branch, so either me or the Code Review will catch it when merging on main.

And this is better than nothing – some Git clients might even disallow committing code containing any //TODO, but I feel that this is a direct violation of the idea of “Commit Early, Commit Often”. As in, now you are bound to finish your feature before you can commit. This is the opposite of helping. By now you figure:

Thanks for nothing, I will just rename this as // REMOVE_ME

The rest of the above logic still applies.

These keywords-in-comments are sometimes called Code Tags (e.g. here), and we just worked around the fact that //TODO is more commonly used than other tags, but of course these are still just comments with no specific meaning.

You might now be able to push again, but still – say, the Code Reviewer makes the heinous mistake of trusting you too much – this might lead to code in the official release that behaves so silly that it makes every customer question your sanity, or worse.

Git Hooks can help you with appearing more sane than you are.

These are shell scripts in your specific repository instance (i.e. your local clone or the server copy, individually) that run at specific points in the Git workflow.

For our particular use case, two hooks are of interest:

  • A pre-receive hook runs on the server-side repository instance and can prevent the main branch from receiving your dumb development code. This is the more rigid option, but you need access to the server hosting the (bare) repository.
  • A pre-push hook runs on your local repository instance and can prevent you from pushing your toxic waste to the branch in question. Keep in mind that if you do your merging onto main via GitLab Merge Requests etc., this hook will not run in that case – but implementing it is easier because you already have all the access you need.

The local pre-push hook is as simple as adding a file named pre-push in the .git/hooks subfolder. It needs to be executable (under Windows, you can use e.g. Git Bash for that):

cd $repositoryPath/.git/hooks
touch pre-push
chmod +x pre-push

And it contains:

#!/bin/bash

# Git feeds the hook one line per ref being pushed:
# <local ref> <local hash> <remote ref> <remote hash>
while read localRef localHash remoteRef remoteHash; do
    if [[ "$remoteRef" == "refs/heads/main" ]]; then
        # inspect every commit that would be new on the remote branch
        for commit in $(git rev-list "$remoteHash..$localHash"); do
            if git grep -n "// REMOVE_ME" "$commit"; then
                echo "REJECTED: Commit contains REMOVE_ME tag!"
                exit 1
            fi
        done
    fi
done

This will then cause git push to fail with output like

2f5da72ae9fd85bb5d64c03171c9a8f248b4865f:src/DevelopmentStuff.js:65:        // REMOVE_ME: temporary override for database URL
REJECTED: Commit contains REMOVE_ME tag!
failed to push some refs to '...'

and if that REMOVE_ME is removed, the push goes through.

Some comments:

  • You can easily extend this to multiple branches with a regex condition (combined in the sketch after this list):
    if [[ "$remoteRef" =~ /(main|release|whatever)$ ]];
  • The git grep -n flag is there to print the offending line number.
  • You can make this more convenient with git grep -En "//\s*REMOVE_ME", i.e. allowing an arbitrary amount of whitespace between the // and the tag.
  • The surrounding loop structure
    while read localRef localHash remoteRef remoteHash;
    is exactly how Git feeds its data to the pre-push hook. Other hook types receive their information differently (some via stdin lines like this, some via command-line arguments), so check the githooks documentation for the hook you are writing.
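
Putting these suggestions together, a variant guarding several branches and tolerating whitespace could look like this (a sketch under the same assumptions as above; the branch names are just examples):

#!/bin/bash

while read localRef localHash remoteRef remoteHash; do
    # guard main and all release branches
    if [[ "$remoteRef" =~ /(main|release)$ ]]; then
        for commit in $(git rev-list "$remoteHash..$localHash"); do
            # -E enables extended regexes, \s* tolerates whitespace after the slashes
            if git grep -En "//\s*REMOVE_ME" "$commit"; then
                echo "REJECTED: $commit contains a REMOVE_ME tag!"
                exit 1
            fi
        done
    fi
done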

Hope this helps you take some extra care!

However, remember that every developer has to set up these local hooks themselves; they are not pushed to the server instance – but implementing a pre-receive hook there is the topic for a future blog post.

Forking an Open Source Repository in Good Faith

One might love Open Source for different reasons: Maybe as a philosophical concept of transcendental sharing and human progress, maybe for reasons of transparency and security, maybe for the sole reason of getting stuff for free…

But, as a developer, Open Source is additionally appealing for the sake of actively participating, learning and sharing on a directly personal level.

Now I would guess that most repository forks are done for rather practical reasons (“I wanna have that!”): the fork gets some minor patches one happens to need right now – or for some super-specific use case – and then hangs around for some time until that use case vanishes or the changes are so vast that there will never be a merge (a situation commonly known as “der Zug ist abgefahren”, i.e. that train has left the station). But one might sometimes try to contribute one’s work for the good of more than oneself. That is what I hereby declare a “Fork in Good Faith.”

A fork can happen in good faith if some conditions are true, like:

  • I am sure that someone else can benefit from my work
  • My technical skills match the technical level of the repository in question
  • Said upstream repository is even open for contributions (i.e. not understaffed)
  • My broader vision does not diverge from the original maintainers’ vision

Maybe there are more of these, but the most essential point is a mindset:

  • I declare to myself that I want to stay compatible with the upstream as long as is possible from both sides.

To fork in Good Faith is, then, a great idea because it helps to advance many more causes at once than just getting stuff for free, i.e. on a developmental level:

  1. You learn from the existing code, i.e. the language, coding style, design patterns, specific solutions, algorithms, hidden gems, …
  2. You learn from the existing repository, i.e. how commits and branches are organized, how commit messages are used productively, how to manage patches or changes in general, …
  3. In reverse, the original maintainers might learn from you, or at least future contributors might
  4. You might get more people to actually see / try / use your awesome feature, thus getting more feedback or bug reports than brewing your own soup
  5. You might consider it as a workout of professional confidence, to advocate your use cases or implementation decisions against other developers, training to focus on rational principles and unlearning the reflexes of your ego.
  6. This can also serve as a workout in mental fluidity, by changing between different coding styles or conventions – if you are e.g. used to your super-perfect-one-and-only way of doing things, it might just positively blow your mind to see that other conventions can work too, if done properly.
  7. Having someone to actually review your changes in a public pull request (merge request) gives you feedback also on an organisational level, as in “was all of this part actually important for your feature?”, “can you put that into a future pull request?” or “why did you rewrite all comments for some Paleo-Siberian language??”

Not to forget, you might grow your personal or professional network to some degree, or at least get the occasional thank you from anyone (well…).

But the basic point of this post is this:

Maintaining a Fork in Good Faith is active, continuous work.

And there is no shame in abandoning that claim, but once you do, there might be no easy return.

Just think about the pure sadness of features that are replicated over and over again, or get lost over time;

And just think about how confusing or annoying that already could have been for yourself, e.g. with some multiply-forked npm package or maybe full-fledged end-user projects (… how many forks of e.g. WLED do even exist?).

This is just some reflection on how carefully such a decision should be made. Of course, I am writing this because I recently became aware of that point of bifurcation, i.e. not the point where a repository is forked, but the one where all of the advantages mentioned above are weighed against real downsides.

And these might be legitimate, and numerous, too. Just to name a few,

  1. Maybe the existing conventions are just not “done properly”, and following them for the sake of uniformity makes you unproductive over time?
  2. Maybe the original maintainers are just understaffed, non-responsive or do not adhere to a style of communication that works with you?
  3. Maybe most discussions are really just debates of varying opinion (publicly, over the internet – that usually works!) and not vehicles of transcending the personal boundaries of human knowledge after all?
  4. Maybe you are stuck with sub-par legacy code, unable to boy-scout away some technical debt because “that is not the point right now”, or maybe every other day some upstream commit flushes in more freshly baked legacy code?
  5. Maybe no one understands your use case, and contrary to the idea mentioned above, in order to get appropriate feedback about your feature and to prove its worth, you need to distribute it independently?
  6. Maybe at one point the maintainers of an upstream repository change, and from now on you have to name your variables in some Paleo-Siberian language?

I guess you get the point by now. There is much energy to be saved by never considering upstream compatibility in the first place, but there is also much potential to be wasted. I have no clear answer – yet – how to draw the line, but maybe you have some insight on that topic, too.

Are there any examples of forks that live on their own, still with the occasional cherry-pick, rebase, merge? Not one comes to my mind.

Beware of using Git LFS on Github

In my private game programming projects, I am often using data files alongside my code for all kinds of game assets like images and sounds. So I thought it might be a good idea to use the Git Large File Storage (=LFS) extension for that.

What is Git LFS?

Essentially, if you are not using it, a file will be in your local .git folder if it was part of your repository at any time in its history. E.g. if you accidentally added & committed an 800 MB video file and then deleted it again, it will still be in your local .git folder. This problem multiplies when using a CI with many branches: each branch build typically gets its own full clone, including all files ever used in your repository. This is not a problem with source code files, because they are not that big and they can be compressed really well with different versions of themselves, which is what git typically does.

With Git LFS, the big files are only stored as references in the .git folder. This means that you might need an additional request to your remote when checking them out again, but it will save you lots of space and traffic when cloning repositories.

In my previous projects on github, I just did not enable LFS for my assets. And that worked fine, as my assets are usually pretty small and I don’t change them often. But this time I wanted to try it.
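
Enabling it for the asset types is only a few commands (assuming the git-lfs client is installed; the file pattern is just an example):

git lfs install
git lfs track "*.png"      # writes a filter rule for these files into .gitattributes
git add .gitattributes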

Sorry, Github, what?

Imagine my surprise when I got an e-mail from Github last month warning me that my LFS traffic quota was almost reached and I would have to pay to extend it. What? I never had any traffic quota problems without LFS. Github doesn’t even seem to have one if I just keep my big files in ‘pure’ git. So that’s what I get for trying to save Github traffic.

Now the LFS quota is a meager 1 GB per month with Github Pro. That’s nothing. Luckily, my current project is not asset heavy: the full repo is very small at ~60 MB. But still the quota was reached with me as a single developer. How did that happen? I had just enabled CI for my project on my home server and was creating lots of branches my CI wanted to build. That’s only 12 branches cloned for the 80% warning to be reached.

Workarounds

Jenkins, which I am using as a CI tool, has the ability to use a ‘reference repository’ when cloning. This can be used to get the bulk of the data from a local remote, while getting the rest from Github. This is what I am now using to avoid excess LFS traffic. It is a bit of a pain to set up: you have to maintain this reference repository manually, Jenkins will not do it for you, and you have to do that on each agent. I only have one at this point, so that’s an okay trade-off. But next time, I sure won’t use Git LFS on Github if I can avoid it.
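
For reference, such a reference repository on the agent is essentially a local mirror you keep up to date yourself (a sketch; URL and path are placeholders, and depending on your git-lfs version you may need an extra step for the large objects):

# one-time setup on the build agent
git clone --mirror git@github.com:me/my-game.git /var/cache/git/my-game.git
cd /var/cache/git/my-game.git
git lfs fetch --all        # also mirror the LFS objects locally
# refresh it from time to time, e.g. via cron
git fetch --prune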

Git revert may not be what you want!

Git is a powerful version control system (VCS) for tracking changes in a source code base, as most developers will know by now… It is also known for its quirky command line interface and previously less-known concepts like the staging area, cherry-picking and rebasing.

This can make Git quite intimidating and there are many memes about how confusing and hard it is to work with.

Especially for people coming from Subversion (SVN) there are a few changes and pitfalls, because some commands share the same name but do something different, either in detail or altogether, e.g. commit, checkout and revert.

Commit in the SVN world performs synchronization with the remote repository, whereas a commit in Git is local and has to be synchronized with the remote repository using push. Checkout is similar and maybe even worse: in SVN it fetches the remote repository state and puts it in your working copy. In Git it is used to replace files in your working copy with contents from the local repository, and formerly (before git switch) also to switch and/or create local branches…

But now on to our main topic:

What does git revert actually do?

Revert is maybe the most misleading command existing in both SVN and Git.

In SVN it is quite simple: undo all local edits (see svnbook). It throws away your uncommitted changes and causes potential data loss.

Git’s revert works in a completely different way! It creates “inverse” commit(s) in your (local) repository. Your working copy has to be clean without changes to be able to revert and you will undo the specified commit and record this change in the repository and its history.

So both revert commands are fundamentally different. This can lead to unexpected behaviour, especially when reverting a merge commit:

We had the case where our developers had two feature branches, and one dev decided to temporarily merge the other feature branch to test some things out. Then he reverted the merge using git revert and opened a merge request. This leads to the situation that the other feature branch becomes unmergeable, because git records no changes. This is rightfully so, because it was already merged as part of the first one, but without the changes (because they were reverted!). So all the changes of the other feature branch were lost and not easily mergeable.

To fix a situation like this you can cherry-pick the reverted commit mentioned in the commit message.
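
In terms of commands, that recovery could look like this (a sketch; the hash is a placeholder taken from the revert commit's message):

# the revert commit's message names the commit it undid, e.g.
#   Revert "Merge branch 'feature-b'"
#   This reverts commit 1234abc.
git cherry-pick -m 1 1234abc   # re-apply the merge's changes relative to its first parent

Alternatively, reverting the revert commit itself brings the changes back as well.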

How to revert local changes in git

So revert is maybe not what we wanted to do in the first place to undo our temporary merge of the other feature branch. Instead we should use git reset. It has some variants, but the equivalent of SVN’s revert would be the lossy

git reset --hard HEAD

to clean our working copy. If we wanted to “revert” the merge commit in our example we would do

git reset --hard HEAD^

to undo the last commit.
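
For completeness, reset has softer variants that keep your changes around instead of discarding them:

git reset --soft HEAD^    # undo the last commit, keep its changes staged
git reset --mixed HEAD^   # undo the last commit, keep its changes unstaged (the default)
git reset --hard HEAD^    # undo the last commit and discard its changes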

I hope this clears up some confusion regarding the function of some Git commands vs. their SVN counterparts or “false friends”.

Line endings in the repository

Git normally leaves files and their line endings untouched. However, it is often desired to have uniform line endings in a project. Git provides support for this.

Config Variable

What some may already know is the configuration variable core.autocrlf. With this, a developer can locally specify that their newly created files are checked in to Git with LF. By setting the variable to "true", the files will be converted to CRLF locally by Git on Windows and converted back when saved to the repository. If the variable is set to "input", the files are used locally with the same line ending as in Git, without conversion.
The problem is that this normalization only affects new files, and each developer must set it locally. If you set core.autocrlf to false, files can still be checked in with non-normalized line endings.
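
For reference, the variable is set per developer, for example:

git config --global core.autocrlf true    # Windows: CRLF in the working tree, LF in the repository
git config --global core.autocrlf input   # convert to LF when committing, no conversion on checkout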

Gitattributes File

Another possibility is the .gitattributes file. The big advantage is that the file is checked in similarly to the .gitignore file and the settings therefore apply to all developers. To do this, the .gitattributes file is created in the repository and a path pattern and the text attribute are defined in it. The setting affects how the files are stored locally for the git switch, git checkout and git merge commands and how the files are stored in the repository for git add and git commit.

*.jpg          -text

The text attribute can be unset, as for the *.jpg files above; then neither check-in nor check-out will do any conversions.

*              text=auto

The attribute can also be set to auto. In this case, the line endings will be converted to LF at check-in if Git recognizes the file contents as text. However, if the file is already CRLF in the repository, no conversion takes place and the files remain CRLF. In the example above, the settings are set for all file types.

*.txt         text
*.vcproj      text eol=crlf
*.sh          text eol=lf

If the attribute is set, the line endings are stored in the repository as LF by default. But eol can also be used to force fixed line endings for specific file types.

*.ps1	      text working-tree-encoding=UTF-16

Furthermore, settings such as the encoding can be configured via the gitattributes file by using the working-tree-encoding attribute. Everything else can be read in the documentation of the gitattributes file.

We use this possibility more and more often in our projects. Sometimes only to force single file types like .sh files to LF, sometimes to normalize the whole project.
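
If you decide to normalize an existing project, newer Git versions (2.16+) can rewrite the already-tracked files according to the attributes:

git add --renormalize .
git commit -m "Normalize line endings"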

Reading a conanfile.txt from a conanfile.py

I am currently working on a project that embeds another library into its own source tree via git submodules. This is currently convenient because the library’s development is very much tied to the host project, and having them both in the same CMake project cuts down dramatically on iteration times. Yet, that library already has its own conan dependencies in a conanfile.txt. Because I did not want to duplicate the dependency information from the library, I decided to pull those into my host project’s requirements programmatically using a conanfile.py.

Luckily, you can use conan’s own tools for that:

import os
from pathlib import Path

from conans.client.loader import ConanFileTextLoader

def load_library_conan(recipe_folder):
    # "library_folder" is the submodule directory that contains the library's conanfile.txt
    text = Path(os.path.join(recipe_folder, "library_folder", "conanfile.txt")).read_text()
    return ConanFileTextLoader(text)

You can then use that in your stage methods, e.g.:

    def config_options(self):
        for line in load_library_conan(self.recipe_folder).options.splitlines():
            # lines look like "library:option=value"
            (key, value) = line.split("=", 1)
            (library, option) = key.split(":", 1)
            setattr(self.options[library], option, value)

    def requirements(self):
        for x in load_library_conan(self.recipe_folder).requirements:
            self.requires(x)

I realize this is a niche application, but it helped me very much. It would be cool if conan could delegate into subfolders natively, but I did not find a better way to do this.

git-submodules in Jenkins pipeline scripts

Nowadays, the source control system Git is a widespread tool and works nicely hand in hand with many IDEs and continuous integration (CI) solutions.

We use Jenkins as our CI server and have mostly migrated to the so-called pipeline scripts for job configuration. This has the benefit of storing your job configuration as code in your code repository and not in the CI server’s configuration. Thus it is easier to migrate the project to other Jenkins CI instances, and you get versioning of your config for free.

Configuration of a pipeline job

Such a pipeline job is easily configured in Jenkins by merely providing the repository and the location of the pipeline script, which is usually called Jenkinsfile. A simple Jenkinsfile may look like:

node ('build&&linux') {
    try {
        env.JAVA_HOME="${tool 'Managed Java 11'}"
        stage ('Prepare Workspace') {
            sh label: 'Clean build directory', script: 'rm -rf my_project/build'
            checkout scm // This fetches the code from our repository
        }
        stage ('Build project') {
            withGradle {
                sh 'cd my_project && ./gradlew --continue war check'
            }
            junit testResults: 'my_project/build/test-results/test/TEST-*.xml'
        }
        stage ('Collect artifacts') {
            archiveArtifacts(
                artifacts: 'my_project/build/libs/*.war'
            )
        }
    } catch (Exception e) {
        if (e in org.jenkinsci.plugins.workflow.steps.FlowInterruptedException) {
            currentBuild.result = 'ABORTED'
        } else {
            echo "Exception: ${e.class}, message: ${e.message}"
            currentBuild.result = 'FAILURE'
        }
    }
}

If you are running GitLab you get some nice features in combination with the Jenkins Gitlab plugin like automatic creation of builds for all your branches and merge requests if you configure the job as a multibranch pipeline.

Everything works quite well if your project resides in a single Git repository.

How to use it with git submodules

If your project uses git-submodules to include other git repositories that are not directly part of your project, the responsible line checkout scm in the Jenkinsfile does not clone or update the submodules. Unfortunately, the fix for this issue leads to a somewhat bloated checkout command, as you have to copy and mention the settings which are injected by default into the parameter object of the GitSCM class and its extensions…

The simple one-liner from above becomes something like this:

checkout scm: [
    $class: 'GitSCM',
    branches: scm.branches,
    extensions: [
        [$class: 'SubmoduleOption',
        disableSubmodules: false,
        parentCredentials: false,
        recursiveSubmodules: true,
        reference: 'https://github.com/softwareschneiderei/ADS.git',
        shallow: true,
        trackingSubmodules: false]
    ],
    submoduleCfg: [],
    userRemoteConfigs: scm.userRemoteConfigs
]

After these changes projects with submodules work as expected, too.

Simple build triggers with secured Jenkins CI

The Jenkins continuous integration (CI) server provides several ways to trigger builds remotely, for example from a git hook. Things are easy on an open Jenkins instance without security enabled. It gets a little more complicated if you would like to protect your Jenkins build environment.

Git plugin notify commit url

For git there is the “notifyCommitUrl” you can use in combination with the Poll SCM settings:

$JENKINS_URL/git/notifyCommit?url=http://$REPO/project/myproject.git

Note two things regarding this approach:

  1. The url of the source code repository given as a parameter must match the repository url of the jenkins job.
  2. You have to check the Poll SCM setting, but you do not need to provide a schedule.

A drawback of this approach is its restriction to git-hosted jobs.

Jenkins remote access api

Then there is the more general and more modern Jenkins remote access API, where you may trigger builds regardless of the source code management system you use:

curl -X POST $JENKINS_URL/job/$JOB_NAME/build?token=$TOKEN

It allows even triggering parameterized builds with HTTP POST requests like:

curl -X POST $JENKINS_URL/job/$JOB_NAME/build \
--user USER:TOKEN \
--data-urlencode json='{"parameter": [{"name":"id", "value":"123"}, {"name":"verbosity", "value":"high"}]}'

Both approaches work great as long as your Jenkins instance is not secured and everyone can do everything. Such a setting may be fine in your company’s intranet but becomes a no-go in more heterogeneous environments or with a public Jenkins server.

So the way to go is securing Jenkins with user accounts and restricted access. If you do not want to supply username/password as part of the url for doing HTTP BASIC auth, and do not want to create users just for your repository triggers, there is another easy option:

Using the Build Authorization Token Root Plugin!

Build authorization token root plugin

The plugin introduces a configuration setting in the Build Triggers section to define an authentication token.

It also exposes a URL you can access without being logged in, to trigger builds just by providing the token specified in the job:

$JENKINS_URL/buildByToken/build?job=$JOB_NAME&token=$TOKEN

Or for parameterized builds something like:

$JENKINS_URL/buildByToken/buildWithParameters?job=$JOB_NAME&token=$TOKEN&Type=Release
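
A git post-receive hook on the repository server can then trigger the build with a simple request (a sketch; the job name and token are placeholders):

#!/bin/sh
# hooks/post-receive in the server-side repository
curl -s "$JENKINS_URL/buildByToken/build?job=myproject&token=SECRET_TOKEN"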

Conclusion

The token root plugin does not need HTTP POST requests but also works fine using HTTP GET. It requires neither a user account nor the awkward Poll SCM setting. In my opinion it is the simplest and most pragmatic choice for triggering builds on a secured Jenkins instance.