Leibniz would have known how to override equals

Equality is a subtle and thorny business, in programming as well as in pure mathematics, physics and philosophy. Probably every software developer has been annoyed at some point by the unexpected behaviour of some ‘equals’ method or the corresponding operators and assertions. There are lots of questions that depend on context, and answering them for some particular context might cost some pain and time – here is a list of examples:

  • What about objects that come with database ids? Do the ids have to match for two objects to be equal?
  • Are dates with time zones equal if they represent the same instant but have a different time zone?
  • What about numbers represented by functions that compute digits up to a given precision?

Leibniz’ Law

This post is about applying an idea of Leibniz that I like to the problem of finding good answers to the questions above. It is called “Leibniz’ law” and can be phrased as a definition or characterization of equality:

Two objects are equal, if and only if, they agree in all properties.

If you are not familiar with the phrase “if and only if”: it comes from mathematics and is a shorthand for saying that two things are true:

  • If two objects are equal, then they agree in all properties.
  • If two objects agree in all properties, then they are equal.

Leibniz’ law is sometimes stated using mathematical symbols like “\forall”, but that would be beside the point of this post – what those properties are will not be defined in a formal mathematical way. If I am in doubt about equality while programming, I am concerned about properties relevant to the problem I want to solve. For example, in almost all circumstances I can imagine, a relevant property of a list would be its length, but not the place in the computer’s memory where it is stored.

But what are relevant properties in general? For me, such a property is the result of running some piece of meaningful code. And what meaningful code is depends on your judgement of how the object in question should be used. So in total, this boils down to the following:

Two instances of a type are equal, if and only if, they yield the same results in any meaningful piece of code.

Has this gotten us anywhere? My answer is yes, since the question about equality has been reduced to a question about the use cases of a type, which might be a starting point for defining a new type anyway.
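
To make this concrete, here is a minimal Java sketch of an equals method written in that spirit. The Appointment class and its properties are invented for illustration, and the sketch assumes that, for our use cases, the database id is not a relevant property and that two dates count as equal when they denote the same instant, regardless of time zone:

import java.time.ZonedDateTime;
import java.util.Objects;

// Hypothetical example: meaningful code only cares about the title of an
// appointment and the instant it starts, so equals() compares exactly those
// properties and ignores the database id and the time zone representation.
public final class Appointment {
    private final Long databaseId;   // not a relevant property in our use cases
    private final String title;
    private final ZonedDateTime start;

    public Appointment(Long databaseId, String title, ZonedDateTime start) {
        this.databaseId = databaseId;
        this.title = title;
        this.start = start;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Appointment)) return false;
        Appointment that = (Appointment) other;
        return title.equals(that.title)
            && start.toInstant().equals(that.start.toInstant());
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, start.toInstant());
    }
}

If, in your context, the database id is something meaningful code relies on, the same reasoning simply leads you to include it – the point is that the decision follows from the use cases, not from habit.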

Turtles all the way down

Please take a moment to note what a sneaky beast equality can be: above, I explained equality by using equality – right where I said “same results”. It is really hard to make statements about anything at all without using some notion of equality in some way. Even in programming, where you can freely define when two objects are equal, you can very well forget that you are using a system – namely your programming language – which usually already comes with an intricate notion of equality defined on the very syntax you are using to define your notion of equality…

On a more practical note, this means that messed-up notions of equality usually propagate when you define new kinds of objects from known ones.
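
A small Java sketch of this propagation (the Point class is made up): a type that relies on the default, identity-based equals() silently infects every collection built from it.

import java.util.List;

// Hypothetical class that does not override equals(): two points with the same
// coordinates are only "equal" if they are literally the same object in memory.
class Point {
    final int x;
    final int y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class PropagationDemo {
    public static void main(String[] args) {
        List<Point> a = List.of(new Point(1, 2));
        List<Point> b = List.of(new Point(1, 2));

        // List.equals() and List.contains() delegate to Point.equals(),
        // so the questionable notion of equality propagates to the lists:
        System.out.println(a.equals(b));                  // false
        System.out.println(a.contains(new Point(1, 2)));  // false
    }
}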

Relation to Liskov’s Principle

With our above definition, we are very close to an informal interpretation of Liskov’s Substitution Principle, which we can rephrase as:

In all meaningful code for a type, an instance of a subtype has to behave the same way.

For comparison, here is the message of this post stated in the same style:

In all meaningful code for a type, two equal instances of the type should behave the same way.

Mastering programming like a martial artist

Gichin Funakoshi, who is sometimes called the “father of modern karate”, issued a list of twenty guiding principles for his students, called the “Shōtōkan nijū kun”. While these principles as a whole are directed at karate practitioners, many of them are very useful for other disciplines as well.
In my understanding, lifelong learning is a fundamental pillar of both karate and programming, and many of those principles focus on “learning” as a more fundamental action. I’d like to focus on a particular one now:

Formal stances are for beginners; later, one stands naturally.

While, at first, this seems to focus on stances, the more important concept is progression, and how it relates to formalities.

Shuhari

This principle is a variation of the concept of Shuhari, the three stages of mastery in martial arts. I think they map rather beautifully to mastery in programming, too.

The first stage, shu, is about learning traditions and movements, and how to apply them strictly and faithfully. For programming, this means writing your first programs by strict rules: coding conventions, programming patterns and all the processes needed to release your programs to the world. There is no room for innovation; this stage is about absorbing the knowledge and practices that already exist. While learning these rules may seem like a burden, the restrictions are also a gift: because it is always clear what is right and what is wrong, decisions are easy.

The second stage, ha, is about breaking away from these rules and traditions. Coding conventions, programming patterns etc. are still followed, but more and more exceptions are allowed, and even encouraged, when they serve a greater purpose. A hack might no longer seem so bad when you consider how much time it saved. Technical debt is no longer just avoided, but managed. Decisions are a little harder here, but there are always the established conventions to fall back on.

The final stage, ri, is about transcendence. Rules lose their inherent meaning; only their purpose remains. Coding conventions, best practices and patterns can still be observed, but they are seen for what they are: merely tools to achieve a goal. As such, all conventions are subject to scrutiny here. They can be ignored, changed or even abandoned completely if necessary. This is the stage for true innovation, but also for mastery. To make decisions on this level, a lot of practice and knowledge and a bit of wisdom are certainly required.

How to use this for teaching

When I am teaching programming, I try to find out what stage my student is in, and adapt my style appropriately (although I am not always successful in this).

Are they beginners? Then it is better to teach rigid concepts. Do not leave room for options, do not try to explore alternatives or trade-offs. Instead, take away some of the complexity and propose concrete solutions. Set up rigid guidelines: how to code, how to use the IDE, how to use tools, how to communicate. Explain exactly how they are to fulfill all their tasks. Taking decisions away will make things a lot easier for them.

Students in the second or even the final stage are much more receptive to freedom. While students in the second stage will still need guidance in the form of rules and conventions, those in the final stage will naturally adapt or reject them. It is much more useful to talk about goals and purpose with advanced students.

Building the right software

When we talk about software development, a lot of the discussion revolves around programming languages, frameworks and the latest in technology.

While all of the above, and also the knowledge and skill of the developers, certainly matter a great deal for the success of a software project, the interaction between the individuals involved is highly undervalued in my opinion. Some weeks ago I watched a great talk connecting airplane crashes and interaction in a software development team. The golden quote for me was certainly this one:

Building software takes technical skill, but building the right software takes human interaction, and lots of it.

Nickolas Means (“How to crash an airplane”, The Lead Developer UK 2016)

I could not word it better, and it matches my personal experience. Many, if not most, of the problems in software projects are about human communication, values, feelings and opinions, and not about technology.

In his talk, Nickolas Means focuses on internal team communication, and I completely agree with him. My focus as a team lead has shifted a lot from technical matters to fostering diversity, opinions and communication within the team. I am less strict in enforcing certain rules and styles in a project. I think this leads to more freedom and better opportunities for experimentation and exploration of ways to approach a problem.

Extend it to your customer

As we work on projects in different domains with a variety of customers, we work really hard to understand our customers. Building up open, trustworthy and stable communication is key to forming a fruitful and productive collaborative partnership in a (software) project. It will help you to produce great software that actually meets the customer’s needs instead of just great software. It may also help you in situations where you mess up or technical problems plague the project.

The aspect of human interaction in software projects rightfully has its place in the agile software development manifesto:

Through this work we have come to value:

Individuals and interactions over processes and tools

The Authors of the Agile Manifesto

Almost 20 years later this is still undervalued, and many software developers are still focused way too much on the technical side. We are striving to steadily improve our skills on the human interaction side and think it proves fruitful every time we succeed.

I hope that more and more software developers will grasp the value of this shifted viewpoint and that it will increase the quality and value of the software solutions provided to all users.

Maybe it will make working in this field friendlier for people who are not so tech-savvy and allow for more of the much-needed diversity in tech.

State Management for emotionally overwhelmed React rookies

The topic of React state management is nowhere near new. Just to be clear what we’re talking about here: consider a multitude of components which, in nice React fashion, are finely interleaved into each other, each one with a Single Responsibility, each one only holding as much information as it needs. Depending on the complexity of your application, there can now be a lot of complex dependencies, where one small component somewhere might cause another small component totally-elsewhere to update (re-render), without these having to really know much about each other, because we strive for Low Coupling. In front-end development, this matters not only in terms of “cleaner code”, but also because of the performance problem of having to re-render stuff that is not actually changing.

So, just a few months back, a new competitor appeared in the arena of React state management: Recoil, open-sourced by Facebook. The older top dog in this field is the widely used Redux, alongside smaller contenders like MobX that also aim to offer an alternative way of managing state in smaller applications. And when React 16.3 introduced the new Context API, React itself already advanced quite a step towards an official solution to these questions.

There’s probably not a single web developer on earth who wouldn’t agree that in our field, one of the most fun… fundamental challenges is the effort of staying afloat on top of the turbulent JavaScript-osphere. If you are the type of person who doesn’t want to jump on every bandwagon, but still doesn’t want to miss out on all the amazing opportunities that all this progress could give you, you had better start a bunch of side projects (call them “recreational” if you like) and give yourself a chance to dive into particular technologies with a confined scope, as research time.

This is what I have done now, trying to focus completely on the issues that an ambitious developer can experience when faced with all these choices. This is what I want to outline for you, because, as usual: if you have lots of time to study a single technology, you can succeed in spite of many limitations, you also get used to the kinds of things you have to do that you originally didn’t want to do, and so on.

So, about Redux: nobody really seems to say a lot of bad things about it, and there are even some Mariuses who seem to be absolutely in love with the official Redux documentation, which is really more of a guide to time-tested best practices, giving you the opportunity to do things right and have a scalable state container that supports you even if your application grows to large dimensions. Then there’s stuff like the time-travelling state debugger and the flexible middleware integration, which I haven’t even touched yet. When your project has a number of unrelated data structures, there’s the Ducks pattern that advises you to organize your required Reducers, Actions and Action Creators in coherently arranged files. This, however, turned complicated in my one project, in which the types of data objects aren’t that unrelated: I had to remove all the combineReducers() logic and ended up with one large global state object. I now have one source file that just consists of everything related to Redux, and for my purpose this seems fine, but I still have to write a rather cumbersome connect(mapStateToProps, mapDispatchToProps) structure in every component in which I want to access the state. I would prefer to have smaller state containers, but maybe it’s the structure of my project that makes these complicated.

It really is that way: due to the ever-changing recommendations that come with the evolution of React, the question of how to do things best (read: best for your specific purpose) always stays fresh. Since React 16.8 and the arrival of Hooks, there is a progression towards less boilerplate code, favoring functional components with a leaner appearance. In this spirit, I strived for something less Redux-y. For example, if I want some text in my state to be set, I would have to do something like:

// ./ducks/TextDucks.js
// avoid having to rely on a magical string, therefore save the string to a constant
const SET_TEXT = 'SET_TEXT'; 

// action creator
export const setTextCreator = (text) => ({type: SET_TEXT, payload: {text}});

const initialState = {text: ''}; // hypothetical initial state shape

const Reducer = (state = initialState, {type, payload}) => {
  //... other state stuff
  if (type === SET_TEXT) {
    return {...state, text: payload.text};
  }
  return state; // keep the current state for all other action types
};

================
// Component file
import {setTextCreator} from './ducks/TextDucks.js';
const mapDispatchToProps = (dispatch) => ({
  setText: text => dispatch(setTextCreator(text))
});
const Component = ({setText, ...}) => {
  // here can I actually use setText()
};
export default connect(..., mapDispatchToProps)(Component);

This is more organized than passing some setText('') function along through my whole component tree, but it still feels like a bit of overhead.

Then there was MobX, which seemed quite lightweight and clearly laid out, with a coherent use of the Observable pattern, implemented with its own JavaScript decorators that follow a simple structure. Still, the look and feel of this code would differ quite a lot from my usual coding style, which kept me from actually using it. This decision was then reinforced by certain statements online which, some years ago, predicted that the advancement of React’s own Context API would make any third-party library redundant. Now, to be fair, React’s current Context API, with useReducer() and useContext(), already makes it possible to imitate a Redux-like structure, but consider it as a way of thinking: if you write your code in the same style as you would following Redux’ recommendations, why not use Redux directly? Clearly, the point of avoiding Redux should be to think differently.

The Context API actually supplies the underlying structure on which Redux’ own <Provider> builds, so it is not a big surprise that you can accomplish similar things with it. Using the Context API, you wrap your whole application like this:

// myContext.js
import React from "react";
const TextContext = React.createContext();
export default TextContext;

// App.js
import React from 'react';
import TextContext from './myContext';
const App = () => {
  // hold the text in React state and share both the value and its setter via the context
  const [text, setText] = React.useState("initial text");
  return (
    <TextContext.Provider value={[text, setText]}>
      {/* actual app components here */}
    </TextContext.Provider>
  );
};

// some subComponent.js
import React from 'react';
import TextContext from './myContext';
const SubComponent = (props) => {
  const [text, setText] = React.useContext(TextContext);
  // now use setText() as you would with a local React useState dispatch function..
}

Personally, I found that arrangement a bit clearer than the Redux structure, at least if you’re not used to Redux’ way of thinking anyway. However, if your state grows and is more than just a text, you would either keep the state information in one large object again, or wrap your <App/> in a ton of different Contexts – which I personally disdained already when I had just three different global state variables to manage.

Having all these possibilities at hand, one might wonder why Facebook felt the need to implement a new way of state management with Recoil. Recoil is still in its experimental phase; however, it didn’t take long to find one aspect very promising: the coding style of managing state in Recoil feels a lot like writing React code itself, which makes it very smooth to work with, as you don’t have to treat global state much differently from local state. Our example would look like this:

// textState.js
import * as Recoil from 'recoil';
export const text = Recoil.atom({key: 'someUniqueKey', default: 'initial text'});

// App.js
import {RecoilRoot} from 'recoil';
const App = () => <RecoilRoot>{/* here the actual app components */}</RecoilRoot>

// some Component.js
import * as Recoil from 'recoil';
import * as TextState from './textState';
const Component = () => {
  const [text, setText] = Recoil.useRecoilState(TextState.text);
  // from then on, you can use setText() like you would a React useState dispatch function
};

Even simpler: with Recoil you also have the useRecoilValue() hook to just read the text value, and the useSetRecoilState() hook to just dispatch a new text. This avoids the complication of having to treat whatever-in-your-state-is-global differently from what is local. Your <App/> component doesn’t grow to ugly levels of indentation, and you can neatly organize everything state-related in a separate file.

As someone who considers himself quite eager to learn new technologies, but also wants to quickly see some results without having to learn a lot of new fundamentals first, I have to admit I had the most fun trying out Recoil in my projects. However, I totally believe that the demise of Redux is not closing in any time soon, due to its focus on sustainability. For the future, I aim to see my one Recoil project grow, and I will keep you updated on how well that goes…

Be precise, round twice

Recently, after implementing a new feature in a software system that outputs lots of floating point numbers, I realized that the last digits were off by one for about one in a hundred numbers. As you might suspect at this point, the culprit was floating point arithmetic. This post is about a solution that turned out to be surprisingly easy.

The code I was working on loads a couple of thousand numbers from a database, stores all of them as doubles, does some calculations with them and outputs the results rounded half-up to two decimal places. The new feature I had to implement involved adding constants to those numbers. For one value, 0.315, the constant in one of my test cases was 0.80. The original output was “0.32” and I expected to see “1.12” as the new rounded result, but what I saw instead was “1.11”.

What happened?

After the fact, nothing too surprising – I had just hit decimals which do not have a finite representation as a binary floating point number. Let me explain, in case you are not familiar with this phenomenon: 1/3 happens to be a fraction which does not have a finite representation as a decimal:

1/3=0.333333333333…

Whether a fraction has a finite representation or not depends not only on the fraction, but also on the base of your number system. And so it happens that some innocent-looking decimal like 0.8 = 4/5 has the following representation in base 2:

4/5=0.1100110011001100… (base 2)

So if you represent 4/5 as a double, it will turn out to be slightly off – in this case actually a tiny bit more than 4/5. In my example, both numbers, 0.315 and 0.8, do not have a finite binary representation, and with these representation (and addition) errors, their sum turns out to be slightly less than 1.115, which yields “1.11” after rounding. On a very rough count, this problem appeared for about one in a hundred numbers in the output.
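
You can watch this happen with a minimal Java sketch (the class and variable names are mine); the BigDecimal(double) constructor exposes the exact value that is actually stored in the double:

import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundingDemo {
    public static void main(String[] args) {
        double sum = 0.8d + 0.315d;

        // the exact binary value of the double sum, which is slightly below 1.115
        System.out.println(new BigDecimal(sum));

        // rounding that exact value only once to two decimal places yields "1.11"
        System.out.println(new BigDecimal(sum).setScale(2, RoundingMode.HALF_UP));
    }
}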

What now?

The customer decided that the problem should be fixed if it appears too often and does not take too much time to fix. When I started to think about some automated way to count the mistakes, I began to realize that I actually had all the information I needed to compute the correct output – I just had to round twice: once, say, at the fourth decimal place, and a second time at the required second decimal place:

(new BigDecimal(0.8d+0.315d))
    .setScale(4, RoundingMode.HALF_UP)
    .setScale(2, RoundingMode.HALF_UP)

This produces the desired result “1.12”.

If doubles are used, the errors explained above can only make a difference of about 10^{-15}, so as long as we just add a double to a number with a short decimal representation, while staying in the same order of magnitude, we can reproduce the precise numbers from the doubles by setting the scale (which amounts to rounding) of our double as a BigDecimal.

But of course, this can go wrong if we use numbers that do not have a short, neat decimal representation the way 0.315 does. In my case, I was lucky. First, I knew that all the input numbers have a precision of three decimal places. There are some calculations to be done with those numbers, but all numbers are roughly in the same order of magnitude, and there is only comparing, sorting and filtering; the only honest calculation is taking arithmetic means. And the latter only meant I had to increase the scale from 4 to 8 to never see any error again.

So, this solution might look a bit sketchy, but in the end it solves the problem within the limited time budget, since the only change happens in the output function. And it can also be a valid first step of a migration to numbers with managed precision.

PSA: logrotate does not support time-travel

When diagnostics explode

A great many things can break in a software system. However, diagnostics breaking the rest of the software is especially ironic. These tools are supposed to help you find bugs and other problems after the fact, not become one.
The system in question was a small data recorder running on a BeagleBone Black (BBB), continuously recording measurements from specialized hardware.
These measurements are stored in an SQLite database and can be retrieved (and purged) via a very simple HTTP interface.
For context: the BeagleBone Black is a small GNU/Linux ARM device, not unlike a Raspberry Pi.

During development, we noticed that log files would quickly grow to hundreds of megabytes, which could potentially become a problem if the data in the SQLite database is not retrieved, and subsequently purged, for a while. So as a precaution, we set the file-size limit to 5 MB in /etc/logrotate.conf. We figured that should solve it, and during testing the logs never got very big again.

Fun in production

Imagine my surprise when I saw a 1.4 GB /var/log folder that prevented any successful writes and subsequently corrupted the SQLite database. SQLite does not deal well with full disks, so this was a huge problem.

Two files especially, daemon.log and syslog, were huge, at ~950 MB and ~450 MB respectively. They were clearly bigger than 5 MB. logrotate was configured to run daily and weekly, respectively. We were kind of spamming the log files, and estimated at most 50 MB of growth in either file per day, which should have limited the files to 50 MB and 350 MB. But obviously, it didn’t.

Time travel

The production environment has several special properties:

  1. The BBB is not connected to the internet.
  2. There are semi-frequent power losses.
  3. The BBB does not have a battery, so power-cycling it means its internal date is reset.

What all this amounts to is: the system doesn’t know the current time and can’t get it via NTP. And whenever the system starts again, it resumes from a fixed date the disk image was flashed with.

logrotate, on the other hand, doesn’t like that one bit. It gets confused by files written in the future, and even worse, it remembers in its state file when it last ran – and it doesn’t run again if that date is in the future. So if the BBB runs nicely from January 1st to July 1st and then power-cycles, you’ll have to wait half a year for your daily logrotate run. And every time it does run successfully, it records an even later timestamp, so the problem only gets worse.

So, in general, it’s not a good idea to run a full GNU/Linux system without a working clock!

Docker runtime breaking your container

Docker (or container technology in general) is a great tool to clearly separate the concerns of developers and operations. We use it to simplify various tasks like building projects, packaging them for different platforms and deployment of our software onto the target machines like staging and production servers. All the specifics of the projects are contained and version controlled using the Dockerfiles and compose files.

Our operations team only needs to provide some infrastructure able to build container images and run them. This works great most of the time and removes a lot of the friction between developers and operations, where in the past snowflake servers needed to be set up and maintained. Developers often had to ask for specific setups and environments because each project had its own needs. That is all gone with this great container technology. Brave new world. Except when it suddenly does not work anymore.

Help, my deployment container stopped working!

As mentioned above, we use Docker to deploy our software to the target machines. These machines are often part of a corporate network protected by firewalls and only accessible using a VPN. I have already talked about how to use OpenVPN in a Docker container for deployment. So the other day I was making a release of one of my long-running projects and pressed the deploy button for that project on our Jenkins continuous integration server.

But instead of letting me lean back, relax and watch the magic work, the deployment failed and the red light lit up! A look into the job output showed that the connection to the target machine was refused. A quick check from the developer machine showed no problem on the receiving side. VPN, target machine and everything else was up and running as usual.

After a quick manual deployment, performed with care and my administrator hat on, I went on an investigation journey…

What was going on?

The deployment job had not changed for several months, the container image had not changed and the rest of the infrastructure was working as expected. After more digging, debugging and narrowing down the problem, I found out that openvpn did not work in the container anymore because of a strange permission-denied error:

Tue May 19 15:24:14 2020 /sbin/ip addr add dev tap0 1xx.xxx.xxx.xxx/22 broadcast 1xx.xxx.xxx.xxx
Tue May 19 15:24:14 2020 /sbin/ip -6 addr add 2axx:1xxx:4:5xxx:9xx:5xxx:5xxx:4xxx/64 dev tap0
RTNETLINK answers: Permission denied
Tue May 19 15:24:14 2020 Linux ip -6 addr add failed: external program exited with error status: 2
Tue May 19 15:24:14 2020 Exiting due to fatal error

This hot trace made it easy to google and revealed the following issue on GitHub: https://github.com/dperson/openvpn-client/issues/75. The cause of all the trouble was a changed behaviour of the Docker runtime. Our automatic updates had run over the weekend and installed a new package version of the container runtime (see this excerpt from the apt history log):

containerd.io:amd64 (1.2.13-1, 1.2.13-2)

This subtle change broke my container! After some sacrifices to the whale gods, I went on to implement the fix. Fortunately, there is an easy way to get it working like before: you just have to pass the following command-line switch to docker run and everything works as expected:

--sysctl net.ipv6.conf.all.disable_ipv6=0

As nice as containers are for abstracting away hardware, operating systems and other environment details, sometimes the container runtime shines through. It is just a shame that such things happen on minor releases or package release upgrades…

The ALARA principle in software engineering

The ALARA principle originates in radiation protection and stands for “As Low As Reasonably Achievable”. It means that you have to weigh the purpose of an action involving radiation against its disadvantages, like radiation damage or long-term risks. The word “reasonably” means that while some disadvantages are not avoidable, a practicable amount of protection should be in place to lower them. The principle calls for a balancing act: not without safety measures, but don’t overextend your means by trying to achieve a safety level that isn’t helpful anymore.
To put the ALARA principle into practice with an example: you don’t need an X-ray every time you go to the dentist, but given enough time since the last one and reasonable doubt about a tooth, the X-ray examination will benefit your dental (and overall) health more than if you refuse it. It isn’t healthy by itself, but the information gained from it will be used to improve your health.

I learnt about the ALARA principle when my father (a nuclear physicist at heart) explained it to me in the context of the current corona pandemic: use protection like face masks and distance, but don’t stress yourself too much over that one time when you grabbed a pen in the post office. While preparation and watchfulness are helpful, fear is detrimental to your mental health. And even the most resilient mind has bounded resources that are better spent on constructive things instead of fear.

A fun fact from radiation protection is that at least three of the four main rules of protection can be applied to corona, too:

  • Distance yourself from the radiation source
  • Use appropriate protection gear
  • Avoid incorporation (keep the thing outside your body)
  • Limit your exposure time (this doesn’t fit as nicely, because the virus is probably not cumulative)

But how can we apply the ALARA principle to software engineering? I was instantly reminded of the “Thorough” rule of unit testing. In the book “Pragmatic Unit Testing” by Andy Hunt and Dave Thomas, the two original Pragmatic Programmers, good unit tests have to follow the A-TRIP rules. The T stands for “Thorough”, which is often misinterpreted as “test everything at least twice”. In reality, the rule states that:

  • all mission critical functionality needs to be tested
  • for every occurring bug, there needs to be an additional test that ensures that the bug cannot happen again

The first thing that meets the eye is that the rule doesn’t define a bug as a failure of your testing effort. It takes a bug that probably happened in production and caused some damage as a motivation to strengthen your test coverage in that particular area. The second part of the rule calls for directed, well-aimed testing effort. It is easy to follow because it has a clear trigger: A bug happened, now you have to write a test.
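
As a hedged illustration of the second part of the rule, a bug-driven test could look like the following JUnit 5 sketch, reusing the rounding bug from the “Be precise, round twice” post above (the class and method names are made up):

import static org.junit.jupiter.api.Assertions.assertEquals;

import java.math.BigDecimal;
import java.math.RoundingMode;
import org.junit.jupiter.api.Test;

class OutputRoundingRegressionTest {

    // Regression test for a bug where 0.8 + 0.315 was printed as "1.11" instead of "1.12".
    // As long as this test stays green, that particular bug cannot sneak back in.
    @Test
    void sumIsRoundedHalfUpToTwoDecimalPlaces() {
        BigDecimal rounded = new BigDecimal(0.8d + 0.315d)
                .setScale(4, RoundingMode.HALF_UP)
                .setScale(2, RoundingMode.HALF_UP);
        assertEquals("1.12", rounded.toString());
    }
}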

The first part of the rule is more complicated: what is mission-critical functionality? And what does “tested thoroughly” mean in this context? Here, the ALARA principle can help us. The bug rate in the important parts of your code should be as low as reasonably achievable. “Reasonably achievable” is defined by the resources at your disposal (like time to market), your expertise in testing and the potential damage that could happen if something in your code goes wrong.

If the potential damage is high or even life-threatening, your reasonable effort should be much higher than if the most critical thing that happens is a 15 minute downtime while you restart the server. There are use cases where even 15 minutes mean subsequent damage, but most software is written for a more relaxed context.

I’ve always found the “Thorough” rule of good unit tests pleasant and comforting: If you made reasonable effort to test your most important code and write a test for every bug you or your users encounter, you can say that your bug rate is ALARA – “As Low As Reasonably Achievable”. And that is good enough for most cases.

What was your first thought when you heard about the ALARA principle? Tell us in the comment section!

Math development practices

As a mathematician who recently switched to almost full-time software development, I often compare the two fields. During the last years of my mathematics career I was in the rather unique position of doing both at once – developing software of some sort and doing research in pure mathematics. This is due to a quite new mathematical discipline called Homotopy Type Theory, which uses a different foundation than the mathematics you might have learned at a university. While it has been possible for quite some time to check formalized mathematics using computers, the usual way to do this entails a crazy amount of work if you want to use it for recent mathematics. By some lucky coincidences this was different for my area of work, and I was able to write down my math research notes in the functional language Agda and have them checked for correctness.

As a disclaimer, I should mention that what I mean by “math” in this post is very far from applied mathematics, and very little of the kind of math I talk about is implemented in computer algebra systems. So this post is about looking at pure, abstract math as if it were a software project. Of course, this comparison is a bit off from the start, since there is no compiler for the math written in articles, but it is a common belief that it should always be possible to translate correct math to a common foundation like Zermelo–Fraenkel set theory, and that is at least something we can type-check with software (e.g. Isabelle).

Refactoring is not well supported in math

In mathematics, you want to refactor what you write from time to time pretty much the same way and for the same reasons as you would while developing software. The problem is, you do not have tools which tell you immediately if your change introduces bugs, like automated tests and compilers checking your types.

Most of the time, this does not cause problems. Judging from my experience with refactoring software, when a refactoring breaks something that is detected by a test or the compiler, fixing it is usually just a matter of adjusting some details. And, in fact, I would conjecture that almost all math articles contain exactly those kinds of errors – which is no problem at all, since the mathematicians reading those articles can fix them or won’t even notice.

As with refactoring in software development, what does matter are the rare cases where it is crucial that some easy-to-overlook details match exactly. And this is a real issue in math – sometimes a statement gets reformulated while proving it, and the changes are so subtle that you do not even realize you have to check whether what you prove still matches your original problem. The lack of tools that help you catch those bugs is something that really hurts math – but the math has to be formalized to have tools like that, and so far that is not feasible for most math.

Retrospectively, being able to refactor my math research was the biggest advantage of having fully formal research notes in Agda. There is no powerful IDE like those used in mainstream software development, just a good Emacs mode. But being able to make a change and check afterwards if things still compile was already enough to enable me to do things I would not have done in pen-and-paper math.

Not being able to refactor might also be the root cause of other problems in math. One which would be really horrible for a software project is that sometimes important articles do not contain working versions of the theorems used in some field of study, and you essentially need to find some expert in the field to tell you things like that. In a software project, that would mean you have to find someone who allegedly made the code base run at some point in the past by applying lots of patches which are not in the repository and which they hopefully are still able to find.

The point I wanted to make so far is: in some respects, this comparison looks pretty bad for math, and it becomes surprising that it works in spite of these deficiencies. So the remainder of this post is about the things on the upside that make math check out almost all the time.

Math spent person-centuries on designing its datatypes

This might be exaggerated, but it is probably not that far off. When I started studying math, one of my lecturers said that “inventing good definitions is no less important than proving new results”. Today, I could not agree more: immense work went into the definitions in pure math, and they allowed me to solve problems I would be too dumb to even think about otherwise. One analogue in programming is finding the ‘correct’ datatypes, which, if achieved, can make your algorithms a lot easier. Another analogue is using good libraries.

Math certainly reaps a great benefit from its well-thought-through definitions, but I must also admit that the comparison is pretty unfair, since pure mathematicians usually take the freedom to choose nice things to reason about. But this is a point to consider when analysing why math still works, even though some of its practices would doom a software project.

I chose to speak about ‘datatypes’ instead of, say, ‘interfaces’, since I think that mathematics does not make that much use of polymorphism the way I learned it in school around 2000. Instead, I think, in this respect mathematical practice is more in line with a data-oriented approach (as we saw last week here on this blog): in math, if you want your X to behave like a Y, you usually give a map that turns your X into a Y, and then you use Y.

All code is reviewed

Obviously, as everywhere in science, there is a peer-review process if you want to publish an article. But there are actually more instances of things that can be called a review of your math research. Possibly surprising to outsiders, mathematicians talk a lot about their ideas with each other, and these kinds of talks can be even closer to code reviews than the actual peer reviews. This might also be comparable to pair programming. Also, these review processes are used to determine success in math. Or, more to the point, your math only counts if you managed to communicate your ideas successfully and convince your audience that they work.

Having the same processes in software development would mean that you have to explain your code to your customer, who would be a software developer like yourself, and who would pay you for every convincing implementation idea. While there is a lot of nonsense in that thought, please note that in a world like that, you could not get paid for a working 300-line block of code that nobody understands. On the other hand, you could get paid for understanding the problem your software is supposed to solve even if your code fails to compile. And in total, the interesting thing here for me is that this shift in incentives, and the emphasis on practices that force you to understand your code by communicating it to others, can save a very large project under some quite bad circumstances.

Data-Oriented Design: Using data as interfaces

A Code Centric World

In mainstream OOP, polymorphism is achieved by virtual functions. To reuse some code, you simply need one implementation of a specific “virtual” interface. Bigger programs are composed of some functions calling other functions calling yet other functions. Virtual functions introduce flexibility here that allows parts of the call tree to be replaced, so that calling functions can be reused by running on different, but homogeneous, callees. This is a very “code centric” view of a program. The data is merely used as context for functions calling each other.

Duality

Let us, for the moment, assume that all the functions and objects that such a program runs on are pure. They never have any side effects, and communicate solely via parameters and return values. Now that’s not traditional OOP, and more of a functional-programming way of doing things, but it is surely possible to structure (at least large parts of) traditional OOP programs that way. This premise helps in understanding how data-oriented design is in fact dual to the traditional “code centric” view of a program: instead of looking at the functions calling each other, we can also look at how the data is being transformed by each step in the program, because that is exactly what goes into, and comes out of, each function. IS-A becomes “produces/consumes compatible data”.

Cooking without functions

I am using C# in the example because LINQ, or any nice map/reduce implementation, makes this really straightforward. But the principle applies to many languages. I have been using the technique in C++, C#, Java and even dBase.
Let’s say we have a recipe of sorts that has a few ingredients encoded in a simple class:

class Ingredient
{
  public string Name { get; set; }
  public decimal Amount { get; set; }
}

We store them in a simple List and have a nice function that can compute the percentage of each ingredient:

public static IReadOnlyList<(string, decimal)>
    Percentages(IEnumerable<Ingredient> ingredients)
{
  var sum = ingredients.Sum(x => x.Amount);
  return ingredients
    .Select(x => (x.Name, x.Amount / sum))
    .ToList();
}

Now things change, and just to make it difficult, we need a new ingredient type that is just a little more complicated:

class IngredientInfo
{
  public string Name { get; set; }
  /* other useful stuff */
}

class ComplicatedIngredient
{
  public IngredientInfo Info { get; set; }
  public decimal Amount { get; set; }
}

And we definitely want to use the old, simple one, as well. But we need our percentage function to work for recipes that have both Ingredients and also ComplicatedIngredients. Now the go-to OOP approach would be to introduce a common interface that is implemented by both classes, like this:

interface IIngredient
{
  string GetName();
  decimal GetAmount();
}

That is trivial to implement for both classes, but it adds quite a bunch of boilerplate, just about doubling the size of our program. Then we just replace IEnumerable<Ingredient> with IEnumerable<IIngredient> in the Percentages function. That last bit clearly violates the Open/Closed principle, but only because we did not use the interface right away (who thought YAGNI was a good idea?). Also, the new interface is quite the opposite of the Tell, don’t ask principle, but there’s no easy way around that, because the Percentages function only has meaning on a collection of ingredients.

Cooking with data

But what if we just use data as the interface? In this case, it so happens that we can easily turn a ComplicatedIngredient into an Ingredient for our purposes. In C#’s LINQ, a simple Select() will do nicely:

var simplified = complicated
  .Select(x => new Ingredient
   { 
     Name = x.Info.Name,
     Amount = x.Amount
   });

Now that can easily be passed into the Percentages function, without even touching it. Great!

In this case, one object could neatly be converted into the other, which is often not the case in practice. However, there’s often a “common denominator class” that can be found pretty much the same way you would extract a common interface: just look at the info you would retrieve from that imaginary interface. In this case, that turned out to be the same as the original Ingredient class.

Further thoughts

To apply this, you sometimes have to restructure your programs a little bit, which often means going wide instead of deep. For example, you might have to convert your data to a homogeneous form in a preprocessing step, instead of accessing different objects homogeneously directly in your algorithms, or use postprocessing afterwards.
In languages like C++, this can even net you a huge performance win, which is often cited as the greatest thing about data-oriented design. But, first and foremost, I find that this leads to programs that are easier to understand for both machine and people. I have found myself using this data-centric form of code reuse a lot more lately.

Are you using something like this as well or are you still firmly on the override train, and why? Tell me in the comments!