Reading a conanfile.txt from a conanfile.py

I am currently working on a project that embeds another library into its own source tree via git submodules. This is convenient because the library’s development is closely tied to the host project, and having both in the same CMake project cuts down dramatically on iteration times. However, that library already has its own conan dependencies in a conanfile.txt. Because I did not want to duplicate the dependency information from the library, I decided to pull it into my host project’s requirements programmatically using a conanfile.py.

Luckily, you can use conan’s own tools for that:

import os
from pathlib import Path

from conans.client.loader import ConanFileTextLoader

def load_library_conan(recipe_folder):
    # Read the embedded library's conanfile.txt and parse it with conan's own loader
    text = Path(os.path.join(recipe_folder, "library_folder", "conanfile.txt")).read_text()
    return ConanFileTextLoader(text)

You can then use that in your stage methods, e.g.:

    def config_options(self):
        for line in load_library_conan(self.recipe_folder).options.splitlines():
            # each line looks like "library:option=value"
            (key, value) = line.split("=", 1)
            (library, option) = key.split(":", 1)
            setattr(self.options[library], option, value)

    def requirements(self):
        for x in load_library_conan(self.recipe_folder).requirements:
            self.requires(x)
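
For context, here is a minimal sketch of how these hooks could sit in the host recipe (the class name and metadata are invented):

from conans import ConanFile

class HostProjectConan(ConanFile):
    name = "host-project"
    version = "0.1"
    settings = "os", "compiler", "build_type", "arch"

    def config_options(self):
        # as shown above
        ...

    def requirements(self):
        # as shown above
        ...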

I realize this is a niche application, but it helped me very much. It would be cool if conan could delegate into subfolders natively, but I did not find a better way to do this.

Full-text Search with PostgreSQL

If you want to add simple text search functionality to an application backed by an SQL database, one of the first things that may come to mind is the SQL LIKE operator. The LIKE operator and its case-insensitive sibling ILIKE find substrings in text data via wildcards such as %, which matches any sequence of zero or more characters:

SELECT * FROM book WHERE title ILIKE '%dog%'

However, this approach satisfies only very basic requirements for text search, because it only matches exact substrings. That’s why application developers often use an external search engine like Elasticsearch based on the Apache Lucene library.

With a PostgreSQL database there is another option: it comes with a built-in full-text search. A full-text search analyzes text according to the language of the text, parses it into tokens and converts them into so-called lexemes. These are strings, just like tokens, but they have been normalized so that different forms of the same word, for example “pony” and “ponies”, are made alike. Additionally, stop words are eliminated, which are words that are so common that they are useless for searching, like “a” or “the”. For this purpose the search engine uses a dictionary of the target language.

In PostgreSQL, there are two main functions to perform full-text search: they are to_tsvector and to_tsquery. The ts part in the function names stands for “text search”. The to_tsvector function breaks up the input string and creates a vector of lexemes out of it, which are then used to perform full-text search using the to_tsquery function. The two functions can be combined with the @@ (match) operator, which applies a search query to a search vector:

SELECT title
  FROM book
  WHERE to_tsvector(title) @@ to_tsquery('(cat | dog) & pony')

The query syntax of to_tsquery supports boolean operators like | (or), & (and), ! (not) and grouping using parentheses, but also other operators like <-> (“followed by”) and * (prefix matching).
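
For illustration, the additional operators might be used like this (same hypothetical book table as above):

-- "pony" directly followed by "farm"
SELECT title FROM book WHERE to_tsvector(title) @@ to_tsquery('pony <-> farm');

-- prefix matching: matches all lexemes starting with "pon", e.g. 'poni'
SELECT title FROM book WHERE to_tsvector(title) @@ to_tsquery('pon:*');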

You can specify the target language as a parameter of to_tsvector:

# SELECT to_tsvector('english', 'Thousands of ponies were grazing on the prairie.');

'graze':5 'poni':3 'prairi':8 'thousand':1

Here’s another example in German:

# SELECT to_tsvector('german', 'Wer einen Fehler begeht, und ihn nicht korrigiert, begeht einen zweiten (Konfuzius)');

'begeht':4,9 'fehl':3 'konfuzius':12 'korrigiert':8 'wer':1 'zweit':11

PostgreSQL supports dictionaries for more than 80 languages out of the box.

The examples in this article are just a small glimpse of what is possible with regards to full-text search in PostgreSQL. If you want to learn more you should consult the documentation. The key takeaway is that there is another option between simple LIKE clauses and an external search engine.

Effective computer names with DNS aliases

If you have a computer in a network, it has a lot of different names and addresses. Most of them are chosen by the manufacturer, like the MAC address of the network device. Some are chosen by you, like the IP address in the local network. And some need to be chosen by you, like the computer’s name in your local DNS (domain name system).

A typical indicator of an under-managed network is the lack of sufficiently obvious computer names in it. You want to connect to the printer? 192.168.0.77 it is. You need to access the network drive? It is reachable under nas-producer-123.local. You can be sure that either of these names will change as soon as anything in the network gets modified.

Not every computer in a network needs a never-changing, obvious name. If you connect a notebook for some hours, it can be addressable only by 192.168.0.151 and nobody cares. But there will be computers and similar network devices like printers that stay longer and provide services to others. These are the machines that require a proper name, and probably not only one.

Our approach is a layered one, with four layers:

  • MAC-address, chosen by the manufacturer
  • IP address, chosen by our DHCP
  • Device name, chosen by our DNS
  • Device aliases, chosen by our DNS

Of course, our DHCP and our DNS are told by our administrator what addresses and names to give out. Our IP addresses are partitioned into sections, but that is not relevant to the users.

The device name is a mapping of a name to an IP address. It is chosen by the administrator in the case of a server/service machine. It will tell you about the primary service, like “printer0”, “printer1” or “nas0”. It is not a creative name and should not be remembered or used directly. If the machine has a direct user, like a workstation or a notebook, the user gets to choose the name. The only guideline is to keep it short; the rest is personal preference. This name should only be remembered by the user.

On top of the device name, each machine gets one or several additional DNS names, in the form of DNS aliases (CNAME records). These are the names we work with directly and should be remembered. Let’s see some examples:

I want to print on the laser printer: “laserprinter.local” is the correct address. It is an alias to printer0.local which is a mapping to 192.168.0.77 which resolves to a specific MAC address. If the laser printer gets replaced, every entry in this chain will probably change, except for one: the alias will point to the new printer and I don’t have to care much about it (maybe I need to update my driver).

I want to access the network drive: “nas.local” is one possibility. “networkdrive.local” is another one. Both point to “nas0” today and maybe “nas1” tomorrow. I don’t need to care which computer provides the service, because the service alias always points to the correct machine.

I want to connect to my colleague’s workstation: Because we have different naming preferences, I cannot remember that computer’s name. But I also don’t have to, because the computer has an alias: If my colleague’s name is “Joe”, the computer’s alias is “joe.local”, which resolves to his “totallywhackname.local”, which points to the IP address, etc. There is probably no more obvious DNS name than “joe.local”.

Another thing that we do is give a service its purpose as a name. This blog is run by wordpress, so we would have “wordpress.local”, but also “blog.local” which is the correct address to use if you want to access the blog. Should we eventually migrate our blog to another service, the “blog.local” address would point to it, while the “wordpress.local” address would still point to the old blog. The purpose doesn’t change, while the product that provides it might some day.
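
As a sketch in BIND zone file notation (all names and addresses invented), the layers from these examples could look like this:

; device names: name -> IP address
printer0          IN A      192.168.0.77
nas0              IN A      192.168.0.78
totallywhackname  IN A      192.168.0.151

; aliases: service/purpose/person -> device name
laserprinter      IN CNAME  printer0
nas               IN CNAME  nas0
networkdrive      IN CNAME  nas0
joe               IN CNAME  totallywhackname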

Of course, maintaining such a rich ecosystem of names and aliases is a lot of work. We don’t type our zone files directly; we use generators that supply us with the required level of comfort and clarity. This is done by one of our internal tools (if you remember the Sunzu blog post, you now know 2 out of our 53 tools). In short, we maintain a table in our wiki, listing all IP addresses and their DNS aliases and linking to the computer’s detail wiki page. From there, the tool scrapes the computer’s name and MAC address and generates configuration files for both the DHCP and DNS services. We can define our whole network in the wiki and have the tool generate the actual settings for us.

That way, the extra effort for the DNS aliases is negligible, while the positive effects are noticeable. Most network modifications can be done without much reconfiguration of dependent services or machines. And it all starts with alias names for your computers.

Applying the KonMari method to your IT supplies room

Our company is rather small, with fewer than ten people working in one big room on two floors (yes, the room is divided vertically, not horizontally). There are a few additional rooms, like a bathroom or a kitchen, but everything else has to find a place in our working space.

There are two exceptions to this rule:

  • A small room holds all cleaning utilities
  • A bigger room holds all things IT, like our servers and our IT supplies

None of these rooms “spark joy”, as Marie Kondo would describe them. You open the door, search around while ignoring the mess, grab the thing you came for and close the door again. When it is time to put the thing back, you more or less place it where you’ve found it. The state of these rooms is slow deterioration, because it can only get worse, but not better.

The situation became unfortunate for the IT room, because it contained far more things than storage space. Cables piled up on shelves, hard disks lingered on tables at specific locations that probably indicated something. A huge collection of CDs and DVDs waited in boxes for a second installation – most of our computers don’t even have a drive for them anymore. Every drawer had some kind of main theme (manuals, adapters, cables), but contained a lot of surprises, too. The time it took to find something only went up, and most of the time it was cheaper to just buy the device (again) than to search for it. And if you don’t use it anymore? Put it in the IT room.

A few years back, the KonMari method of cleaning up and organizing things was promoted by Marie Kondo. It is intended for your wardrobe and kitchen, but the guidelines can also be applied to your toolshed – and your IT room:

  • Not keeping a thing is the default
  • Concentrate on only keeping useful things (things that you use regularly or that make you happy)
  • If you keep a thing, it needs a dedicated place
  • Dedicate places by “category” and don’t deviate from your categorization
  • Provide a container for each category
  • Try to stack things upright, side by side, not in vertical piles

The last guideline was really eye-opening for me: Every time I dedicated a box for things, like software CDs, the stacks grew upwards. This means that “lower layers” aren’t in direct access anymore and tend to be forgotten. If you dig to the ground of the box, you find copies of obscure software like “Windows 2000” or “Nero burning rom” that you’ve not thought about in ten years or even longer.

At the bottom of our cables box, we found a dozen cables for the parallel port, an interface that was forgotten the minute USB came around in 1996. The company was founded in 2000 and we never owned a device that used this port. We also found disks for the zip 100 drive, which might have used it – we don’t remember.

These things spark nostalgia (something other than joy), but serve no practical purpose anymore. And even if somebody came around with a zip disk, we wouldn’t remember that we have the cables at the bottom of our box.

If you try to stack your things upright, everything is visible and in fast access. There is no bottom layer anymore. Applied to CDs, this means that every CD case’s spine is readable. Every CD that you want to keep needs to be in a labeled case. The infamous mainboard driver CD in a paper box, with drivers from 2002 for a mainboard you scrapped in 2009, has no place in this collection.

The fitting categorization of things is the most important part of the process, in my opinion. Let me explain it by a paradigm shift that made all the difference for me:

In the early days, our categories were like manual, CD, cable, screw, etc. Every time a new computer was bought, the accompanying utilities box (often the mainboard carton) got looted for these categories – manuals to the manuals, CDs to the CDs. It was easy to find the place where the CDs were stored, but hard to find the right CD.

Now, we provide a small carton for each computer and put everything related to it in this carton. It is labeled with the computer’s number and stored like a book on the shelf. If you search anything for this computer – a CD, a screw, whatever – it is in this carton. If we get rid of the computer, the carton follows suit.

We now categorize by device and not by item type. This means that the collection of 10,000 screws that were collected over the years can be discarded. They simply aren’t needed anymore. They never sparked joy.

Another topic is the cables. While most cables can be associated with a computer or a specific device, there are lots of cables that are “unbound”. Instead of lumping them all together (and forming the aforementioned layers of parallel, serial and USB1 cables), we sort them by main connector and dedicate a box to each connection type. If you search for a DisplayPort cable, you grab the DisplayPort box. If you require a VGA cable – well, we threw that specific box out last year. Look in the “exotic” box.

Each box is visible and clearly labeled. Inside each box are only things that you would expect. This means that there is a lot of boxed air. But it also means that you have to think about what to store and what not – simply because the number of boxes is limited.

And this is where “sparking joy” comes into play. The IT room is not an archive for all things digital. It is also not a graveyard for discarded electronics. If you can’t see yourself using the part in the future and having joy using it, don’t keep it.

We have a box labeled “random loot” that defies this filter. It contains things that we can’t categorize, don’t have an immediate use case for, but hesitate to throw away. Every household has a similar thing with “that drawer”. Our plan is to add a year label to the box and just throw it away unopened if it is older than X years.

We need to evolve the categories of the room to keep it useful. An example are USB cables that are all stored in one cable box. With USB-C on the rise, the need to separate into different USB “layers” became apparent. We will soon have at least two USB cable boxes. And perhaps, one day in the future, we might throw the non-USB-C box away.

The IT room was transformed from a frustrating mess into a living and evolving storage space that solves your concern in an efficient way. The typical use cases of the room are addressed right away, with a structure that is maintainable without too much effort.

The inspiration and guidelines of Marie Kondo and the thoughts about proper categorization helped us to have an IT room that actually sparks joy.

Forced Acronyms are not that S.M.A.R.T.

A while back, I noticed that quite a lot of people are following the trend of unifying a bunch of talking points into a more or less memorizable acronym. Sometimes, this is a great mnemonic device that makes the essence of a thing clear in seconds – but for some reason, few stories get acknowledged in which such attempts actually fail.

However, one of the most prominent acronyms in project management is the idea of S.M.A.R.T. goals. That easily dissolves into S for Specific, M for Measurable, and… hm… T is… something about Time, and then there are A and R, and they very clearly… well, well. Let’s consult Wikipedia… span up a multidimensional vector space out of {Achievable, Attainable, Assignable, Agreed, Action-oriented, Ambitious, Aligned with corporate goals, Realistic, Resourced, Reasonable, Results-based}.

Now this is the point where it’s hard to follow. These are somehow too many possibilities, with no clear assignment. There are probably lots of people out there with their very specific memorization and their very specific interpretation of these letters; and it might very well be true that this forced acronym holds some value. In their specific case.

But why shouldn’t we be honest about it? If you have such a situation, you are not communicating clearly anymore. You have gone beyond that point. There is not a clear, concise meaning anymore.

These are the points where it would be honest to leave your brilliant acronym behind. If you ever sit in a seminar where someone wants to teach you some “easily memorizable acronym” with lots of degrees of freedom, open to interpretation and obviously changing over time, just – complain. Of course, everyone is entitled to use their own memory hook (“Eselsbrücke”) in order to remember whatever his or her goal is. That is not my point.

My issue is with “official” acronyms that are not clear and constant. We as software developers have a responsibility to treat such inconsistencies as very dangerous and more harmful than helpful. With this post, I want to bring the idea out there that one should complain about a bad acronym more often, rather than just think “weeeeell, but I really like how it sounds and I don’t care that it’s somewhat tainted.”

Or am I completely bullheaded in that regard? What is your opinion?

PS: If you are German and remember the beginning of 2021, a similar laziness happened there when our government tried to make their Covid rules clear and well-known. Note that this remark has nothing to do with politics. Anyway: they invented the acronym “AHA” (which, in German, is also the sound of having a light bulb appear over your head). Not that bad of an idea. However, one of the “A”s originally meant “you just need a non-medical mask (Alltagsmaske) everywhere” – until some day, it was changed to “you need a medical face mask in everyday life (im Alltag)”. They just thought it clever to keep the acronym, but change one letter to mean its near opposite.

This is dangerous. Grossly negligent. Just for the sake of liking your old acronym too much, you needlessly fail to communicate clearly. Which is, for a government as much as for a software developer, usually your job.

Naming things 😉

Metal in C++ with SDL2

Metal, Cupertino’s own graphics API, is sort of a middle ground in complexity between OpenGL and Vulkan. I’ve wanted to try it for a while, but the somewhat tight integration into Apple’s ecosystem (Objective-C/Swift and Xcode) has so far prevented that. My graphics projects usually use C++ and CMake, so I wanted a solution that works with that. Apple released metal-cpp last year, and newer SDL2 versions (since 2.0.14) can create a window that supports drawing to it with Metal. Here’s how to weld that together (with minimal Objective-C).

metal-cpp

I get the metal-cpp code from the linked website (the download is at step 1). I add a library in CMake that builds a single source file which compiles the metal-cpp implementation with the NS_/CA_/MTL_PRIVATE_IMPLEMENTATION macros defined, as described on the page (see step 3). That target also exports the includes to be used later.
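
That single source file is essentially just the three macro definitions followed by the metal-cpp headers; a sketch (the file name is my choice):

// metal_cpp_impl.cpp - compiles the metal-cpp implementation in one translation unit
#define NS_PRIVATE_IMPLEMENTATION
#define CA_PRIVATE_IMPLEMENTATION
#define MTL_PRIVATE_IMPLEMENTATION

#include <Foundation/Foundation.hpp>
#include <QuartzCore/QuartzCore.hpp>
#include <Metal/Metal.hpp>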

SDL window and view

Next, I use conan to install SDL2. After SDL_Init, I call SDL_CreateWindow to create my window. I do not specify SDL_WINDOW_OPENGL (or SDL_WINDOW_VULKAN) in SDL_CreateWindow‘s flags, or the next step will fail. After that, I use SDL_Metal_CreateView from SDL_metal.h to create a metal view. This is where things get a little bit icky. I create a metal device using MTL::CreateSystemDefaultDevice(), but I still need to assign it to the view I just created. I’m doing that in Objective-C++. In a new .mm file, I add a small function to do that:

// Hand the metal-cpp device to the CAMetalLayer behind the SDL metal view
void assign_device(void* layer, MTL::Device* device)
{
  CAMetalLayer* metalLayer = (CAMetalLayer*) layer;
  metalLayer.device = (__bridge id<MTLDevice>)(device);
}

I use a small .h file to expose this function to my C++ code like any other free function. There’s another helper I create in the .mm file:

// Fetch the next drawable from the layer and bridge it to the metal-cpp type
CA::MetalDrawable* next_drawable(void* layer)
{
  CAMetalLayer* metalLayer = (CAMetalLayer*) layer;
  id<CAMetalDrawable> metalDrawable = [metalLayer nextDrawable];
  CA::MetalDrawable* pMetalCppDrawable = (__bridge CA::MetalDrawable*) metalDrawable;
  return pMetalCppDrawable;
}
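
For completeness, the small header exposing both helpers to C++ might look like this (the file name and forward declarations are my choice):

// metal_bridge.h - declarations for the Objective-C++ helpers above
#pragma once

namespace MTL { class Device; }
namespace CA { class MetalDrawable; }

void assign_device(void* layer, MTL::Device* device);
CA::MetalDrawable* next_drawable(void* layer);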

At the beginning of each frame, I use that together with SDL_Metal_GetLayer to get a texture to render to:

auto surface = next_drawable(SDL_Metal_GetLayer(view));

Next I create a render pass descriptor that starts by clearing that drawable with our fancy red:

MTL::ClearColor clear_color(152.0/255.0, 23.0/255.0, 42.0/255.0, 1.0);
auto pass_descriptor = MTL::RenderPassDescriptor::alloc()->init();
auto attachment = pass_descriptor->colorAttachments()->object(0);
attachment->setClearColor(clear_color);
attachment->setLoadAction(MTL::LoadActionClear);
attachment->setTexture(surface->texture());

And fire that off to the GPU using a command buffer and render encoder:

// `queue` is a MTL::CommandQueue, created once at startup, e.g. via
// auto queue = device->newCommandQueue();
auto buffer = queue->commandBuffer();
auto encoder = buffer->renderCommandEncoder(pass_descriptor);
encoder->endEncoding();
buffer->presentDrawable(surface);
buffer->commit();

There you have it, a minimal running Metal application. Still a long way from the traditional “Hello Triangle”, but most Metal examples that show how to do that can easily be translated to the C++ API. Note that you probably have to take some extra steps to compile Metal shaders (written in MSL). You can either load them from source or precompile them using the command line tools.
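
The precompile route via Apple’s command line tools might look roughly like this (file names assumed):

xcrun -sdk macosx metal -c shaders.metal -o shaders.air
xcrun -sdk macosx metallib shaders.air -o shaders.metallib

The resulting .metallib can then be loaded through the device at runtime.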

Commenting SQL database objects

Did you know that you can annotate database objects like tables, views and columns with comments in many SQL database systems? By that I don’t mean comments in SQL scripts, indicated by double dashes (--), but comments attached to the objects themselves, stored in the database. These can be helpful to the database admin by providing a description of what is stored in these objects.

For PostgreSQL and Oracle databases the syntax is as follows:

COMMENT ON TABLE [schema_name.]table_name IS '...';
COMMENT ON COLUMN [schema_name.]table_name.column_name IS '...';

For example:

COMMENT ON COLUMN books.author IS 'The main author''s last name';
COMMENT ON TABLE books IS 'Contains only the best books';

These comments can be viewed in database tools like SQL Developer:

Comments on columns
Comments on tables

You can also view the comments in psql:

db=# \d+ books
 Column |  Type   |         Description
--------+---------+------------------------------
 id     | integer |
 author | text    | The main author's last name
 title  | text    |

And for a table:

db=# \dt+ books
                    List of relations
 Schema | Name  | Type  |     |         Description
--------+-------+-------+ ... +------------------------------
 public | books | table |     | Contains only the best books
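
Besides psql’s meta commands, PostgreSQL can also return the comments programmatically via the built-in functions obj_description and col_description:

SELECT obj_description('books'::regclass);     -- table comment
SELECT col_description('books'::regclass, 2);  -- comment on the 2nd column (author)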

In Oracle you can query the comments from the data dictionary views ALL_TAB_COMMENTS and ALL_COL_COMMENTS:

> SELECT * FROM all_col_comments WHERE table_name='BOOKS';

OWNER    TABLE_NAME  COLUMN_NAME  COMMENTS
--------------------------------------------------------------
LIBRARY  BOOKS       ID           (null)
LIBRARY  BOOKS       AUTHOR       The main author's last name
LIBRARY  BOOKS       TITLE        (null)

> SELECT * FROM all_tab_comments WHERE table_name='BOOKS';

OWNER    TABLE_NAME  TABLE_TYPE  COMMENTS
--------------------------------------------------------------
LIBRARY  BOOKS       TABLE       Contains only the best books

In Oracle, comments are limited to tables, views, materialized views, columns, operators and indextypes, but in PostgreSQL you can attach comments to nearly everything. Another good use case for this is documentation comments on database functions:

COMMENT ON FUNCTION my_function IS $$
This function does something important.

Parameters:
...
Example usage:
...
$$;

Note: the $$ delimits multi-line strings (so-called dollar-quoted string constants).

Serving static resources in Javalin running as servlets

Javalin is a nice JVM-based microframework targeted at web APIs, supporting Java and Kotlin as implementation languages. Usually, it uses Jetty and runs standalone on the server or in a container.

However, those who want or need to deploy it to a servlet container/application server like Tomcat or WildFly can do so by changing only a few lines of code and annotating at least one servlet class with @WebServlet. Most of your application will continue to run unchanged.

But why do I say only “most of your application”?

Unfortunately, Javalin-jetty and Javalin-standalone do not provide complete feature parity. One important example is serving static resources, especially if you do not want to provide only an API backend service but also serve resources like a single-page application (SPA) or an OpenAPI-generated web interface.

Serving static resources in Javalin-jetty

Serving static files is straightforward and super simple if you are using Javalin-jetty. Just configure the Javalin app using config.addStaticFiles() to specify some paths and file locations, and you are done.
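
A minimal sketch of that setup (Javalin 4 style; the path and port are assumptions):

import io.javalin.Javalin
import io.javalin.http.staticfiles.Location

fun main() {
    Javalin.create { config ->
        // serve everything under src/main/resources/public from the classpath
        config.addStaticFiles("/public", Location.CLASSPATH)
    }.start(7070)
}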

The OpenAPI plugin for Javalin uses the above mechanism to serve its web interface, too.

Serving static resources in Javalin-standalone

Javalin-standalone, which is used for deployment to application servers, does not support serving static files, as this is a Jetty feature and standalone is built to run without Jetty. So the short answer is: you cannot!

The longer answer is that you can implement a workaround by writing a servlet based on Javalin-standalone that serves files from the classpath for certain URL paths yourself. See below a sample implementation in Kotlin using Javalin-standalone to accomplish the task:

package com.schneide.demo

import io.javalin.Javalin
import io.javalin.http.Context
import io.javalin.http.HttpCode
import java.net.URLConnection
import javax.servlet.annotation.WebServlet
import javax.servlet.http.HttpServlet
import javax.servlet.http.HttpServletRequest
import javax.servlet.http.HttpServletResponse

private const val DEFAULT_CONTENT_TYPE = "text/plain"

@WebServlet(urlPatterns = ["/*"], name = "Static resources endpoints")
class StaticResourcesEndpoints : HttpServlet() {
    private val wellknownTextContentTypes = mapOf(
        "js" to "text/javascript",
        "css" to "text/css"
    )

    private val servlet = Javalin.createStandalone()
        .get("/") { context ->
            serveResource(context, "/public", "index.html")
        }
        .get("/*") { context ->
            serveResource(context, "/public")
        }
        .javalinServlet()!!

    private fun serveResource(context: Context, prefix: String, fileName: String = "") {
        val filePath = context.path().replace(context.contextPath(), prefix) + fileName
        val resource = javaClass.getResourceAsStream(filePath)
        if (resource == null) {
            context.status(HttpCode.NOT_FOUND).result(filePath)
            return
        }
        var mimeType = URLConnection.guessContentTypeFromName(filePath)
        if (mimeType == null) {
            mimeType = guessContentTypeForWellKnownTextFiles(filePath)
        }
        context.contentType(mimeType)
        context.result(resource)
    }

    private fun guessContentTypeForWellKnownTextFiles(filePath: String): String {
        if (filePath.indexOf(".") == -1) {
            return DEFAULT_CONTENT_TYPE
        }
        val extension = filePath.substring(filePath.lastIndexOf('.') + 1)
        return wellknownTextContentTypes.getOrDefault(extension, DEFAULT_CONTENT_TYPE)
    }

    override fun service(req: HttpServletRequest?, resp: HttpServletResponse?) {
        servlet.service(req, resp)
    }
}

The code performs 3 major tasks:

  1. Register a Javalin-standalone app as a WebServlet for certain URLs
  2. Load static files bundled in the WAR-file from defined locations
  3. Guess the content type of the files as well as possible for the response

Feel free to use and modify the code in your project if you find it useful. I will try to get this workaround into Javalin-standalone if I find the time, to improve feature parity between Javalin-jetty and Javalin-standalone. Until then, I hope you find the code useful.

Basic business service: Sunzu, the list generator

This might be the start of a new blog post series about building blocks for an effective business IT landscape.

We are a small company that strives for a high level of automation and traceability, the latter often implemented in the form of documentation. This has the amusing effect that we often automate the creation of documentation or at least the creation of reports. For a company of less than ten people working mostly in software development, we have lots of little services and software tools that perform tasks for us. In fact, we work with 53 different internal projects (this is what the blog post series could cover).

Helpful spirits

Some of them are rather voluminous or at least too big to replace easily. Others are just a few lines of script code that perform one particular task and could be completely rewritten in less than an hour.

They all share one goal: To make common or tedious tasks that we have to do regularly easier, faster, less error-prone or just more enjoyable. And we discover new possibilities for additional services everywhere, once we’ve learnt how to reflect on our work in this regard.

Let me take you through the motions of discovering and developing such a “basic business service” with a recent example.

A fateful friday

The work that led to the discovery started abruptly on Friday, 10th December 2021, when a zero-day vulnerability with the number CVE-2021-44228 was publicly disclosed. It had a severity rating of 10 (on a scale from 0 to, well, 10) and was promptly nicknamed “Log4Shell”. From one minute to the next, we had to scan all of our customer projects, our internal projects and the products that we use, evaluate the risk and decide on actions that could mean disabling a system in live usage until the problem was properly understood and fixed.

Because we don’t only perform work but also document it (remember the traceability!), we created a spreadsheet with all of our projects and a criteria matrix to decide which projects needed our attention the most and what actions to take. An example of this process would look like this:

  • Project A: Is the project at least in parts programmed in java? No -> No attention required
  • Project B: Is the project at least in parts programmed in java? Yes -> Is log4j used in this project? Yes -> Is the log4j version affected by the vulnerability? No -> No immediate attention required

Our information situation changed from hour to hour as the whole world did two things in parallel: the white hats gathered information about possible breaches and unaffected versions while the black hats tried to find and exploit vulnerable systems. This happened so fast that we found ourselves lagging behind because we couldn’t effectively triage all of our projects.

One bottleneck was the creation of the spreadsheet. Even just the process of compiling a list of all projects and ruling out the ones that are obviously not affected by the problem was time-consuming and not easily distributable.

Post mortem

After the dust settled, we had switched off one project (which turned out to be not vulnerable on closer inspection) and confirmed that all other projects (and products) weren’t affected. We fended off one of the scariest vulnerabilities in recent times with barely a scratch. We could celebrate our success!

But as happy as we were, the post mortem of our approach revealed a weak point in our ability to quickly create spreadsheets about typical business/domain entities for our company, like project repositories. If we could automate this job, we would have had a complete list of all projects in a few seconds and could have worked from there.

This was the birth hour of our list generator tool (we called it “sunzu” because – well, that would require the explanation of a German wordplay). It is a simple tool: You press a button, the tool generates a new page with a giant table in the wiki and forwards you to it. Now you can work with that table, remove columns you don’t need, add additional ones that are helpful for your mission and fill out the cells that are empty. But the first step, a complete list of all entities with hyperlinks to their details, is a no-effort task from now on.

No-effort chores

If Log4Shell happened today, we would still have to scan all projects and decide for each one. We would still have to document our evaluation results and our decisions. But we would start with a list of all projects, a column that lists their programming languages and other data. We would be certain that the list is complete. We would be certain that the information is up to date and accurate. We would start with the actual work and not with the preparation for it. The precious minutes at the beginning of a time-critical task would be available and not bound to infrastructure setup.

Since its first spreadsheet of all projects, the list generator tool has accumulated additional entity types that can be listed in our company. For some, it was easy to collect the data. Others require more effort. There are some that don’t justify the investment (yet). But it had another effect: It is a central place for “list desires”. Any time we create a list manually now, we pose the important question: Can this list be generated automatically?

Basic business building blocks

In conclusion, our “sunzu” list generator is a basic business service that might be valuable for every organization. Its only purpose is to create elaborate spreadsheets about the most important business entities and present them in an editable manner. Whether the spreadsheet is created as an Excel file, an editable website like tabble or, as in our case, a wiki page is secondary.

The crucial effect is that you can think “hmm, I need a list of these things that are important to me right now” and just press a button to get it.

Sunzu is a web service written in Python, with a total of less than 400 lines of code. It could probably be rewritten from scratch in one focused workday. If you work in an organization that relies on lists or spreadsheets (and which organization doesn’t?), think about which data sources you tap into to collect the lists. If a human can do it, you can probably teach it to a computer.

What are entities/things in your domain or organization that you would like to have a complete list/spreadsheet generated automatically? Tell us in the comments!

Always apply the Principle Of Least Astonishment to yourself, too

Great principles have the property that while they can be stated in a concise form, they have far-reaching consequences one can fully appreciate after many years of encountering them.

One of these things is what is known as the Principle of Least Astonishment / Principle of Least Surprise (see here or here). As stated there, in the context of user interface design, its upshot is “Never surprise the user!”. Within that context, it is easy to understand for everyone who has ever used any piece of software and noticed that not once was he glad that the piece didn’t work as suggested. Or did you ever feel that way?

Surprise is a tool for willful suspension, for entertainment, a tool of unnecessary complication; exactly what you do not want in the things that are supposed to make your job easy.

Now we can all agree about that, and go home. Right? But of course, there’s a large difference between grasping a concept in its most superficial manifestation, and its evasive, underlying sense.

Consider any software project that cannot be simplified to a mere single-purpose-module with a clear progression, i.e. what would rather be a script. Consider any software that is not just a script. You might have a backend component with loads of requirements, you have some database, some caching functionality, then you want a new frontend in some fancy fresh web technology, and there’s going to be some conflict of interests in your developer team.

There will be some rather smart ways of accomplishing something and there will be rather not-so-smart ways. How do you know which will be which? So there, follow your principle: Never surprise anyone. Not only your end user. Do not surprise any other team member with something “clever”. In most situations,

  1. it’s probably not clever at all
  2. the team member being fooled by you is yourself

Collaboration is a good tool to let that conflict naturally arise. I mean the good kind of conflict, not the mistrust, denial of competency, “Ctrl+A and Delete everything you ever wrote!”-kind of conflict. Just the one where someone would tell you “hm. that behaviour is… astonishing.”

But you don’t have a team member in every small project you do. So just remember to admit the factor of surprise in everything you leave behind. Do not think “as of right now, I understand this thing, ergo this is not of any surprise to anyone, ever”. Think, “when I leave this code for two months and return, will there be anything… of surprise?”

This principle has many manifestations. As one of Jakob Nielsen’s usability heuristics, it’s called “Recognition rather than Recall”. In a more universal way of improving human performance and clarity, it’s called “Reduce Cognitive Load”. It has a wide range of applicability from user interfaces to state management, database structures, or general software architecture. I like the focus of “Surprise”, because it should be rather easy for you to admit feeling surprised, even by your own doing.