Forking an Open Source Repository in Good Faith

One might love Open Source for different reasons: maybe as a philosophical concept of transcendental sharing and human progress, maybe for reasons of transparency and security, maybe for the sole reason of getting stuff for free…

But to a developer, Open Source has an additional appeal: actively participating, learning and sharing on a directly personal level.

Now I would guess that most repository forks are done for rather practical reasons (“I wanna have that!”): the fork gets some minor patches one happens to need right now – or serves some super-specific use case – and then hangs around for some time until that use case vanishes or the changes have grown so vast that there will never be a merge (a situation commonly known as “der Zug ist abgefahren”, German for “the train has left the station”). But sometimes one might instead try to contribute one’s work for the good of more than oneself. That is what I hereby declare a “Fork in Good Faith.”

A fork can happen in good faith if certain conditions hold, such as:

  • I am sure that someone else can benefit from my work
  • My technical skills match the technical level of the repository in question
  • Said upstream repository is actually open for contributions (i.e. not understaffed)
  • My broader vision does not diverge from the original maintainers’ vision

Maybe there are more of these, but the most essential point is a mindset:

  • I declare to myself that I want to stay compatible with the upstream for as long as it is possible from both sides (as sketched below).
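
A minimal sketch of what that mindset means in day-to-day practice, written with Python’s GitPython library (pip install GitPython); the remote URL and branch name are placeholders, not a prescription:

```python
from git import Repo

repo = Repo(".")  # your local fork

# One-time setup: track the original repository as a second remote.
if "upstream" not in [remote.name for remote in repo.remotes]:
    repo.create_remote("upstream", "https://example.com/original/project.git")

# The recurring part: fetch the upstream history and replay your own
# patches on top of it, so the fork never drifts too far away to merge back.
repo.remotes.upstream.fetch()
repo.git.rebase("upstream/main")
```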

Forking in Good Faith is, then, a great idea because it helps to advance many more causes at once than just the stuff for free, especially on a developmental level:

  1. You learn from the existing code, i.e. the language, coding style, design patterns, specific solutions, algorithms, hidden gems, …
  2. You learn from the existing repository, i.e. how commits and branches are organized, how commit messages are used productively, how to manage patches or changes in general, …
  3. Conversely, the original maintainers might learn from you, or at least future contributors might
  4. You might get more people to actually see / try / use your awesome feature, thus getting more feedback or bug reports than by brewing your own soup
  5. You might consider it a workout in professional confidence: advocating your use cases or implementation decisions to other developers trains you to focus on rational principles and to unlearn the reflexes of your ego.
  6. This can also serve as a workout in mental fluidity, by switching between different coding styles or conventions – if you are e.g. used to your super-perfect-one-and-only way of doing things, it might just positively blow your mind to see that other conventions can work too, if done properly.
  7. Having someone actually review your changes in a public pull request (merge request) also gives you feedback on an organisational level, as in “was all of this part actually important for your feature?”, “can you put that into a future pull request?” or “why did you rewrite all comments in some Paleo-Siberian language??”

Not to forget, you might grow your personal or professional network to some degree, or at least get the occasional thank you from someone (well…).

But the basic point of this post is this:

Maintaining a Fork in Good Faith is active, continuous work.

And there is no shame in abandoning that claim, but once you do, there might be no easy way back.

Just think about the pure sadness of features that are replicated over and over again, or get lost over time;

And just think about how confusing or annoying that may already have been for you, e.g. with some multiply-forked npm package or maybe a full-fledged end-user project (… how many forks of e.g. WLED actually exist?).

This is just some reflection on how carefully such a decision should be made. Of course, I am writing this because I recently became aware of that point of bifurcation, i.e. not the point where a repository is forked, but the one where all of the advantages mentioned above are weighed against real downsides.

And these might be legitimate, and numerous, too. Just to name a few:

  1. Maybe the existing conventions are just not “done properly”, and following them for the sake of uniformity makes you unproductive over time?
  2. Maybe the original maintainers are just understaffed, unresponsive, or do not adhere to a style of communication that works for you?
  3. Maybe most discussions are really just debates of varying opinion (publicly, over the internet – that usually works!) and not vehicles of transcending the personal boundaries of human knowledge after all?
  4. Maybe you are stuck with sub-par legacy code, unable to boy-scout away some technical debt because “that is not the point right now”, or maybe every other day some upstream commit flushes in more freshly baked legacy code?
  5. Maybe no one understands your use case, and contrary to the idea mentioned above, you need to distribute your work independently in order to get appropriate feedback about your features and to prove their worth?
  6. Maybe at one point the maintainers of an upstream repository change, and from now on you have to name your variables in some Paleo-Siberian language?

I guess you get the point by now. There is much energy to be saved by never considering upstream compatibility in the first place, but there is also much potential to be wasted. I have no clear answer – yet – on how to draw the line, but maybe you have some insight on that topic, too.

Are there any examples of forks that live on their own, yet still with the occasional cherry-pick, rebase or merge? Not one comes to mind.

The whole company under version control

One of our secrets is that we’ve put the whole company under version control. You can see every change to our business data and undo every mistake.

A minor fact about the Softwareschneiderei that always evokes surprised reactions is that everything we do is under version control. This should be no surprise for our software development work, as version control has been a best practice there for about twenty years now. If you aren’t a software developer or are unfamiliar with the concept of version control for whatever reason, here’s a short explanation of its main features:

Summary of version control

Version control systems are used to track the change history of a file or a group of files in a way that makes it possible to restore previous versions if needed. Each noteworthy change is stored as a commit, a new savepoint that can be restored later. Each commit can be provided with a change note, a short comment that describes the changes made. This results in a timeline of noteworthy changes for each file. All committed changes are immutable, so you get revision safety for your data at almost no cost.
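
To make the timeline idea concrete, here is a tiny sketch in Python using the GitPython library (pip install GitPython); the repository path, file name and commit id are invented for illustration:

```python
from git import Repo

repo = Repo("/path/to/repository")

# Walk the change history of one file, newest commit first.
for commit in repo.iter_commits(paths="important-document.ods"):
    print(commit.hexsha[:8], commit.committed_datetime, commit.author.name)
    print("   ", commit.message.strip())

# Restore the file exactly as it was in some earlier commit.
repo.git.checkout("abc1234", "--", "important-document.ods")
```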

Usual work style for developers

In software development, each source code file has to be “in a repository”, the repository being the central database for the version control system. The repository is accessible over the network and holds the commits for the project. One of the first lessons a developer has to learn is that source code that isn’t committed to a version control system just doesn’t exist. You have to commit early and you have to commit often. In modern development, commit cycles of a few minutes are usual and necessary. Each development step results in a commit.

What we’ve done is to adopt this work style for our whole company. Every document that we process is stored under version control. If we write you a quote or an invoice, it is stored in our company data repository. If we send you a letter, it was first committed to the repository. Every business analysis spreadsheet, all lists and inventories, everything is stored in a repository.

Examples of usage scenarios

Let me show you two examples:

We have a digital list of all the invoices we sent. It’s nothing but a spreadsheet with the most important data for each invoice. Every time we write an invoice, it is another digital document with all the necessary text and an additional line in the list of invoices. Both changes, the new invoice document and the extended list, are included in one commit with a comment that references the invoice number and the project number. These changes are now included in the ever-growing timeline of our company data.
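
For the technically curious, such a two-file commit might look like this with the GitPython library; the file names, invoice number and project number are made up:

```python
from git import Repo

repo = Repo("/path/to/company-data")

# The new invoice document and the extended list go in together, so the
# repository never contains one without the other.
repo.index.add(["invoices/invoice-2017-0815.odt", "invoices/invoice-list.ods"])
repo.index.commit("Invoice 2017-0815 for project P-42: new document, list extended")
```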

We also have a liquidity analysis spreadsheet that needs to be updated often. Every time somebody makes a change to the spreadsheet, it’s a new commit with a comment describing what was updated. If the update was wrong for whatever reason, we can always backtrack to the spreadsheet content right before that faulty commit and try again. We don’t just have the spreadsheet, but also the whole history of how it was filled out, by whom and when.
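
A sketch of that backtracking, again with GitPython and invented names; the commit id deadbee stands for whatever faulty commit you identified in the history:

```python
from git import Repo

repo = Repo("/path/to/company-data")

# Inspect the most recent changes to the spreadsheet to find the faulty one.
for commit in repo.iter_commits(paths="liquidity-analysis.ods", max_count=5):
    print(commit.hexsha[:8], commit.author.name, commit.message.strip())

# Restore the spreadsheet as it was right before the faulty commit and
# record the correction as a new commit of its own.
repo.git.checkout("deadbee~1", "--", "liquidity-analysis.ods")
repo.index.add(["liquidity-analysis.ods"])
repo.index.commit("Revert liquidity analysis to the state before the faulty update")
```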

Advantages of version controlled files

Before we switched to a version controlled work style, we had network shares as the place to store all company data. This is probably the de facto standard for how important files are handled in many organizations. Adding version control has some advantages:

  • While working with network shares, everybody works on the same file. Most programs show a warning that another user has write access to a file and then open it in read-only mode. But not every program does that, and that’s where edit collisions occur without anybody noticing. With version control, you work on a local copy of the file. You can always change the file, but you will get a “merge conflict” when another user has altered the file in the repository after your last synchronization. These merge conflicts are usually minor inconveniences with source code, but a major pain with binary file formats like spreadsheets. So you’ll know about edit collisions and you’ll try to avoid them. How do you avoid them? By planning and communicating your work better. Version control emphasizes the collaborative work setting we all live in.
  • Version controlled data is always traceable. You can pinpoint exactly who did what at which time and why (as stated in the commit comment). There is no doubt about any number in a spreadsheet or any file in your repository. This might sound like a surveillance nightmare, but it’s more of a protection against mishaps and honest errors.
  • Version control lets you review your edits. Every time you commit your work, you’ll see a list of files that you’ve changed. If there is a file that you didn’t know you’d changed, the version control just saved your ass. You can undo the erroneous change with a simple click. If you’d worked with network shares, this change would have gone unnoticed. With version control, you have to double-check your work.
  • There are no accidental deletions with version control. Because you have every file stored in the repository, you can always undo every delete operation. With network shares, every file lives in constant fear of the delete key. With version control, you catch your mishap in the commit step and just restore the file (see the sketch after this list).
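
To make the last two points concrete, here is what that review-and-restore safety net might look like with GitPython; the repository path and file name are placeholders:

```python
from git import Repo

repo = Repo("/path/to/company-data")

# Review what actually changed before committing; this is the step where
# an unnoticed edit or an accidental deletion gets caught.
print(repo.git.status("--short"))

# A file was deleted by accident: bring it back from the last commit.
repo.git.checkout("HEAD", "--", "inventories/hardware-list.ods")
```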

Summary of the adoption

When we switched to version control for all our company data, we just committed our network shares to the repository and started. The work style is a bit inconvenient at first, because it means additional work and needs frequent breaks for the commits, but everybody got used to it very fast. Soon, the advantages began to outweigh the inconvenience, and now working with our company data is free of fear because we have the safety net of version control.

You want to know more about version control? Feel free to ask!