Experimenting with CMake’s unity builds

CMake has an option, CMAKE_UNITY_BUILD, to automatically turn your builds into unity builds, which essentially means combining multiple source files into one. This is supposed to make your builds more efficient. You can simply enable it during the configuration step of your CMake build, so it is really easy to test. It might just work without any problems. Here are some examples with actual numbers of what that does to build times.
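For example, assuming a standard CMake project layout and a build directory name of your choice, enabling it can be as simple as:

```shell
# Configure with unity builds enabled, then build as usual
cmake -S . -B build -DCMAKE_UNITY_BUILD=ON
cmake --build build
```

No source changes are required for this first experiment; the option only affects how CMake groups the compilation units.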

Project A

Let us first start with a relatively small project. It is a real project we have been developing, that reads sensor data, transports it over the network and displays it using SDL and Dear ImGui. I’m compiling it with Visual Studio (v17.13.6) in CMake folder mode, using Build Insights to track the actual time used. For each configuration, I’m doing a clean rebuild 3 times. The steps are the number of build statements that ninja runs.

Unity Build  #Steps  Time 1  Time 2  Time 3
OFF              40   13.3s   13.4s   13.6s
ON               28   10.9s   10.7s    9.7s

That’s a nice, but not massive, speedup of 124.3% for the median times.

Project A*

Project A has a relatively high number of non-compile steps: 1 step is code generation, 6 steps are static library linking, and 7 steps are executable linking. That’s a total of 14 non-compile steps, which are not directly affected by switching to unity builds. 5 of the executables in Project A are non-essential, basically little test programs. So in an effort to decrease the relative number of non-compile steps, I disabled those for the next test. Each of them also came with an additional source file, so the total number of steps decreased by 10. This really only decreased the relative share of non-compile steps from 35% to 30%, but the numbers changed quite a bit:

Unity Build  #Steps  Time 1  Time 2  Time 3
OFF              30    9.9s   10.0s    9.7s
ON               18    9.0s    8.8s    9.1s

Now the speedup for the median times was only 110%.

Project B

Project B is another real project, but much bigger than Project A, and much slower to compile. It’s a hardware orchestration system with a web interface. As the project size increases, the chance for something breaking when enabling unity builds also increases. In no particular order:

  • Include guards really have to be there, even if that particular header was not previously included multiple times
  • Object files will get a lot bigger, requiring /bigobj to be enabled
  • Globally scoped symbols will name-clash across files. This especially affects static globals and things in unnamed namespaces, which basically stop doing their job. More subtly, names pulled into the global namespace will also clash, such as classes with the same name made visible via using namespace.

In general, that last point will require the most work to resolve. If all else fails, you can disable unity builds for a target via set_target_properties(the_target PROPERTIES UNITY_BUILD OFF) or even just skip specific files for unity build inclusion via SKIP_UNITY_BUILD_INCLUSION. In Project B, I only had to do this for files generated by CMakeRC. Here are the results:

Unity Build  #Steps  Time 1  Time 2  Time 3
OFF             416  279.4s  279.3s  284.0s
ON              118   73.2s   76.6s   74.5s

That’s a massive speedup of 375%, just for enabling a build-time switch.
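For reference, the two escape hatches mentioned above look like this in CMake code (the source file name is just a placeholder):

```cmake
# Opt a whole target out of unity builds ...
set_target_properties(the_target PROPERTIES UNITY_BUILD OFF)

# ... or keep them, but exclude individual (e.g. generated) source files:
set_source_files_properties(generated/resources.cpp PROPERTIES
  SKIP_UNITY_BUILD_INCLUSION ON)
```

This lets you adopt unity builds incrementally instead of fixing every name clash at once.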

When to use this

Once your project has a certain size, I’d say definitely use this on your CI pipeline, especially if you’re not doing incremental builds. It’s not just time, but also energy saved. And faster feedback cycles are always great. Enabling it on developer machines is another matter: it can be quite confusing when the files you’re editing do not correspond to what the build system is building. Also, developers usually do more incremental builds where the advantages are not as high. I’ve also used hybrid approaches where I enable unity builds only for code that doesn’t change that often, and I’m quite satisfied with that. Definitely add an option to turn that off for debugging though. Have you had similar experiences with unity builds? Do tell!

Deferred Constraints in Oracle DB

Foreign key constraints are like rules in your Oracle database that make sure data is linked properly between tables. For example, you can’t add an order for a customer who doesn’t exist – that’s the kind of thing a foreign key will stop. They help enforce data integrity by ensuring that relationships between tables remain consistent. But hidden in the toolbox of Oracle Database is a lesser-known trick: deferred foreign key constraints.

What Are Deferred Constraints?

By default, when you insert or update data that violates a foreign key constraint, Oracle will throw an error immediately. That’s immediate constraint checking.

But with deferred constraints, Oracle lets you temporarily violate a constraint during a transaction – as long as the constraint is satisfied by the time the transaction is committed.

Here’s how you make a foreign key deferrable:

ALTER TABLE orders
  ADD CONSTRAINT fk_orders_customer
  FOREIGN KEY (customer_id)
  REFERENCES customers(customer_id)
  DEFERRABLE INITIALLY DEFERRED;

That last part – DEFERRABLE INITIALLY DEFERRED – is the secret sauce. Now, the constraint check for fk_orders_customer is deferred until the COMMIT.

Use Cases

Let’s look at a few situations where this is really helpful.

One use case is circular references between tables. Say you have two tables: one for employees, one for departments. Each employee belongs to a department. But each department also has a manager – who is an employee. You end up in a “chicken and egg” situation. Which do you insert first? With deferred constraints, it doesn’t matter – you can insert them in any order, and Oracle will only check everything after you’re done.
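As a sketch with hypothetical table and column names (both foreign keys declared DEFERRABLE INITIALLY DEFERRED):

```sql
-- departments.manager_id references employees(employee_id),
-- employees.department_id references departments(department_id)
INSERT INTO departments (department_id, name, manager_id)
  VALUES (10, 'Engineering', 42);   -- employee 42 does not exist yet
INSERT INTO employees (employee_id, name, department_id)
  VALUES (42, 'Grace', 10);         -- now the reference is satisfied
COMMIT;                             -- both constraints are checked only here
```

With immediate checking, the first INSERT would fail no matter which order you choose.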

Another use case is the bulk import of data. If you’re importing a bunch of data (like copying from another system), it can be really hard to insert things in the perfect order to keep all the foreign key rules happy. Deferred constraints let you just insert everything, then validate it all at the end with one COMMIT.

Deferred constraints also help when dealing with temporary incomplete data: Let’s say your application creates a draft invoice before all the customer info is ready. Normally, this would break a foreign key rule. But if the constraint is deferred, Oracle gives you time to finish adding all the pieces before checking.

Caution

Using deferred constraints recklessly can lead to runtime surprises. Imagine writing a huge batch job that appears to work fine… until it crashes at COMMIT with a constraint violation error – rolling back the entire transaction. So only defer constraints when you really need to.

One last tip

If you want to check if a constraint is deferrable in your database you can use the following SQL query:

SELECT constraint_name, deferrable, deferred
  FROM user_constraints
 WHERE table_name='ORDERS';

AI is super-good at guessing, but also totally clueless

AI is still somewhere in its hype phase, maybe towards the end of it. Most of us have used generative AI or use it more or less regularly.

I am experimenting with it every now and then and sometimes even use the output professionally. On the other hand I am not hyped at all. I have mostly mixed feelings (and the bad feelings are not because I fear losing my job…). Let me share my thoughts:

The situation a few years/months ago

Generative AI – or, more specifically, ChatGPT – impressed many people but failed at really simple tasks like

  • Simple trick questions like “Tom’s father has three children. The first one is called Mark and the second Andrea. What is the name of the third child?”
  • Counting certain letters in a word

Another problem was massive hallucination, as shown in Kevlin Henney’s KotlinConf talk and in the completely made-up summaries of a children’s book (in German) that I got myself.

Our current state of generative AI

Granted, nowadays many of these problems are mitigated and the AI became more useful. On the other hand new problems are found quite frequently and then worked around by the engineers. Here are some examples:

  • I asked ChatGPT to list all German cities above 200k citizens. It put out nice tables scraped from Wikipedia and other sources, but with a catch: they were all incomplete and the count was clearly wrong. Even after multiple iterations I did not get a correct result. A quick look at the Wikipedia page ChatGPT used as a source showed the complete picture.
  • If you ask about socially explosive topics like Islamic terrorism, crime or controversial people like Elon Musk and Donald Trump, you may get varying, questionable responses.

More often than not, I feel disappointed when using generative AI. I have zero trust in the results and end up checking and judging the output myself. For questions about code and APIs, I usually have enough knowledge to judge the output or at least take it as a base for further development.

In the really good online course “IntelliJ Wizardry with AI Assistant Live” by Heinz Kabutz, we also explored the possibilities, limits and integration of the JetBrains AI Assistant. While it may be useful in some situations and has great integration into the IDE, we were not impressed by its power.

Generating test cases or refactoring existing code varies from good to harmful. Sometimes it can find bugs for you, sometimes it cannot…

Another personal critical view on AI

After all the ups and downs, the development and progress with AI, and many thoughts and reflections about it, I have discovered something that really bothers me about AI:

AI takes away most of the transparency, determinism and control that we as software developers are used to and sometimes worked hard for.

As developers we strive for understanding what is really happening. Non-determinism is one of our enemies. Obfuscated code, unclear definitions, undefined behaviour – these and many other things make me/us feel uncomfortable.

And somehow, for me AI feels the same:

I change a prompt slightly, and sometimes the result does not change at all, while at other times it changes almost completely. Sometimes the results are very helpful, on other occasions they are total crap. In the past maybe they were useless; now the same prompts put out useful information or code.

This is where my bad feelings about AI come from. Someone at some company trains the AI, engineers rules, defines the training data, etc. Everything has an enormous impact on the behaviour of the AI and its results. And everything stays outside of your influence and control. One day it works as intended, on another it does not anymore.

Conclusion

I do not know enough about generative AI and the mathematics, science and engineering behind it to accurately judge or predict the possibilities and boundaries for the years to come.

Maybe we will find ways to regain the transparency, to “debug” our models and prompts, to be able to reason about the output and to make generative AI reliable and predictable.

Maybe generative AI will collapse under the piles of crap it uses for training because we do not have powerful enough means of telling it how to separate trustful/truthful information from the rest.

Maybe we will use it as an assistant in many areas like coding or evaluating X-ray images to sort out the inconspicuous ones.

What I really doubt at this point is that AI will replace professionals, regardless of the field. It may make them more productive or enable us to build bigger/better/smarter systems.

Right now, generative AI sometimes proves useful but often is absolutely clueless.

Java enum inheritance preferences are weird

Java enums have been weird since their introduction in Java 5 in 2004. They are implemented by having the compiler generate several methods based on the declaration of fields/constants in the enum class. For example, the static Enum::valueOf(String) method is only present after compilation.

But with the introduction of default methods in Java 8 (released in 2014), things got a little bit weirder if you combine interfaces, default methods and enums.

Let’s look at an example:

public interface Person {

  String name();
}

Nothing exciting to see here, just a Person type that can be asked about its name. Let’s add a default implementation that makes clearly no sense at all:

public interface Person {

  default String name() {
    return UUID.randomUUID().toString();
  }
}

If you implement this interface in a class and don’t override the name() method, you are the weird one:

public class ExternalEmployee implements Person {

  public ExternalEmployee() {
    super();
  }
}

We can make your weirdness visible by creating an ExternalEmployee and calling its name() method:

public class Main {

  public static void main(String[] args) {
    ExternalEmployee external = new ExternalEmployee();
    System.out.println(external.name());
  }
}

This main method prints the “name” of your external employee on the console:

1460edf7-04c7-4f59-84dc-7f9b29371419

Are you sure that you hired a human and not some robot?

But what if we are a small startup company with just a few regular employees that can be expressed by a Java enum?

public enum Staff implements Person {

  michael,
  bob,
  chris,
  ;
}

You can probably predict what this little main method prints on the console:

public class Main {

  public static void main(String[] args) {
    System.out.println(
      Staff.michael.name()
    );
  }
}

But, to our surprise, the name() method got overridden without us doing or declaring anything of the sort:

michael

We ended up with the “default” generated name() method from the Java enum type. In this case, the code generated by the compiler takes precedence over the default implementation in the interface, which isn’t what we would expect at first glance.

To our grief, we can’t change this behaviour back to a state that we want by overriding the name() method once more in our Staff enum (maybe we want our employees to be named by long numbers!), because the generated name() method is declared final. From the source code of java.lang.Enum:

/**
 * @return the name of this enum constant
 */
public final String name() {
  return name;
}

The only way out of this situation is to avoid the names of methods that are generated in an enum type. For the more obscure ordinal(), this might be feasible, but name() is prone to name conflicts (heh!).

While I can change my example to getName() or something, other situations are more delicate, like this Kotlin issue documents: https://youtrack.jetbrains.com/issue/KT-14115/Enum-cant-implement-an-interface-with-method-name
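A minimal sketch of that rename workaround – displayName is just an arbitrary choice for the substitute method name:

```java
import java.util.UUID;

public class Demo {

  interface Person {
    // Picking a name that the compiler does not generate avoids the clash
    default String displayName() {
      return UUID.randomUUID().toString();
    }
  }

  enum Staff implements Person {
    michael,
    bob,
  }

  public static void main(String[] args) {
    // name() is still the compiler-generated, final method:
    System.out.println(Staff.michael.name());        // prints "michael"
    // displayName() falls through to the interface default:
    System.out.println(Staff.michael.displayName()); // prints a random UUID
  }
}
```

Both behaviours now coexist, because the generated method and the default method no longer compete for the same signature.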

And I’m really a fan of Java’s enum functionality, it has the power to be really useful in a lot of circumstances. But with great weirdness comes great confusion sometimes.

Nginx upload limit

Today, I encountered a surprising issue with my Docker-based web application. The application has an upload limit set, but before reaching it, an unexpected error appeared:

413 Request Entity Too Large

Despite the application’s upload limit being correctly configured, the error occurred much earlier—when the file was barely over 1MB. Where does this limitation come from, and how can it be changed?


Troubleshooting

The issue occurred before the request even reached the application layer, during a critical step in request processing. The root cause was Nginx, the web server and reverse proxy used in the Docker stack.

Nginx, commonly used in modern application stacks for load balancing, caching, and HTTPS handling, acts as the gateway to the application, managing all incoming requests. However, Nginx was rejecting uploads larger than 1MB. This was due to the client_max_body_size directive, which defaults to 1MB when not set explicitly. As a result, Nginx blocked larger file uploads before they could reach the application.

Solution

To resolve this issue, the client_max_body_size directive in the Nginx configuration needed to be updated to allow larger file uploads.

Modify the nginx.conf file or the relevant server block configuration:

server {
    listen 80;
    server_name example.com;
    client_max_body_size 100M;  # Allow uploads up to 100MB
}

After making this change, reload Nginx to apply the new configuration:

nginx -s reload

If Nginx is running in a Docker container, you can restart the container instead:

docker restart <container_name>

With this update, the upload limit increased to 100MB, allowing the application to handle larger files without premature rejection. Once the configuration was applied, the error disappeared, and file uploads worked as expected, provided they remained within the newly defined limits.

Useful browser features for the common Web Dev

Every once in a while, I stumble upon some minor thing ingrained in modern browsers that makes some specific problem so much easier. And while the usual Developer Tools offer tons of capabilities, these are so widely spread and grouped that you easily get used to just ignoring most of them.

So here is a bunch of things that came to my mind and that seem not to be super-common knowledge. Maybe you can benefit from some of them.

Disclaimer: The browsers I know of have their Dev Tools available via pressing F12 and a somewhat similar layout, even though particular words will differ. My naming here relates to the current Chrome, so your experience might differ.

Disabling the Browser cache

Your browser caches many resources because the standard user usually prefers speed, is used to the browser endlessly hogging memory anyway, and most resources do not change that often.

During development, this might lead to confusion because of course, you do change your sources often (in fact, that is your job), and the browser needs to know that fact.

For that, let it be known:

The "Network Tab" has a Checkbox "Disable Cache".
And the Dev Tools have to be open for it to work.

This is usually so much the default setting on my working machines that I need to remind myself of it when I troubleshoot something on someone else’s machine. Spread the word.

Also, browsers have a hard-reload feature, like in Chrome, to clean the cache before the reload without having to do so for the whole browser.

Hard-Reload: Ctrl + Shift + R

I’ve read that this is also Chrome’s behaviour when pressing F5 while the Dev Tools are open, but anyway. Take extra care when your customer does not know of that feature, because sometimes they might be frustrated about a failed update of your app that is actually just a cached version.

Inspect Network Requests

Depending on the complexity of your web app, or what external dependencies you are importing, it might be that the particular order and content of network requests is not trivial anymore.

Now the “Network” tab (you know it already) is interesting on its own, but remember that it only displays the requests since opening the Dev Tools, i.e. some Page Reload (hard or soft) might be required.

And – this appears to change somewhat between browser updates, don’t ask me why that is necessary – this is good to know:

  • The first column of that list shows only the last part of each request URL, but if you click on it, a very helpful window appears with details of the Request Headers, Payload and Response
  • Make sure that in the filters above, “Fetch/XHR” is active
  • Right-clicking a request gives you options to Copy, e.g. Copy as fetch, so you can repeat the exact request from JavaScript code for debugging etc.

Inspecting volatile elements

The “Elements” tab (called “Inspector” in other browsers) is quite straightforward to find mistakes in styling or the rendered HTML itself – you see the whole current HTML structure, click your element in there and to the right, from bottom to top you see the CSS rules applied.

But sometimes, it can be hard to inspect elements that change on mouse interaction, and it is good to know that (again, this holds for Chrome), first,

There is a shortcut like Ctrl + Shift + C to get into the Inspector without extra mouse movements

Now think of a popup that vanishes when hovering away. You might do it by using Ctrl + Shift + C to open the Inspector, then using the Keyboard to navigate within (especially with the Tab and Cursor keys), but here’s a small trick I thought of that helped me a lot:

Add (temporarily) an on-click-handler to your popup that calls window.alert(...);

With that, you can open your popup, press Ctrl + Shift + C, then click the popup, and the alert() will now block any events and you can use your mouse to inspect all you want.

In easier cases you could just disable the code that makes the popup go away, but in the case I had, this wasn’t an option either.

Now that I think of it, I could also have used debugger; instead of the alert(), but the point is that you have to block javascript execution only after interacting with the popup element.

The performance API

I have no idea why I discovered this only recently, but if precision in timing is important – e.g. when measuring the impact of a possible change on performance – one does not have to resort to Date.now() with its millisecond resolution, but

there is performance.now() to give you microsecond precision for time measurements.

It can afford that by not using the epoch “zero” of Jan 1st, 1970 as its reference, but the moment your page started instead.
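A quick sketch of such a measurement – the loop is just an arbitrary stand-in for the workload you want to time:

```javascript
// Time a chunk of work with sub-millisecond resolution.
const t0 = performance.now();
let sum = 0;
for (let i = 0; i < 1_000_000; i++) {
  sum += i;
}
const t1 = performance.now();
console.log(`summing took ${(t1 - t0).toFixed(3)} ms`);
```

The same global `performance` object is also available in Node.js, so the pattern works outside the browser too.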

The Performance API has a lot more stuff – which I didn’t need yet.

A FPS Monitor, and a whole world of hidden features

If you do graphics programming or live data visualization of some measurement, it might be of interest to see the actual frame rate. There’s a whole hidden menu, at least in Chrome, and you can access it by focussing the Dev Tools, then

  • press Ctrl + Shift + P
  • Enter “FPS” and select
  • Now you have a nice overlay over your page.

Even if you are not in the target group of that specific feature, it might be interesting to know that there is a whole menu of many particular (sometimes experimental) features built into all these browsers. They differ from browser to browser and version to version, and of course plugins can do a lot more. But it might just be worth considering that your workflow can benefit from some of that stuff.

fmt::format vs. std::format

The excellent {fmt} largely served as the blueprint for the C++20 standard formatting library. That alone speaks for its quality. But I was curious: should you now just use std::format for everything, or is fmt::format still a good option? In this particular instance, I wanted to know which one is faster, so I wrote a small benchmark. Of course, the outcome very much depends on the standard library you are using. In my case, I’m using Visual Studio 17.13.0 and its standard library, and {fmt} version 11.1.3.

I started with a benchmark helper function:


#include <chrono>
#include <concepts>
#include <format>
#include <iostream>
#include <string_view>

using std::chrono::duration_cast;
using std::chrono::nanoseconds;
using std::chrono::steady_clock;

template <std::invocable<> F>
steady_clock::duration benchmark(std::string_view label, F f)
{
  auto start = steady_clock::now();
  f();  // run the workload under measurement
  auto end = steady_clock::now();
  auto time = end - start;
  auto us = duration_cast<nanoseconds>(time).count() / 1000.0;
  std::cout << std::format("{0} took {1:.3f}us", label, us) << std::endl;
  return time;
}

Then I called it with a lambda like this, with NUMBER_OF_ITERATIONS set to 500000:

int integer = 567800;
float real = 1234.0089f;
for (std::size_t i = 0; i < NUMBER_OF_ITERATIONS; ++i)
  auto _ = fmt::format("an int: {}, and a float: {}", integer, real);

… and the same thing with std::format.

Interestingly, fmt::format only needed about 75%-80% of the time of std::format in a release build, while the situation reversed for a debug build to about 106%-108%.

It seems hard to construct a benchmark with low overhead from other things while still preventing the compiler from optimizing everything away. My code assumes the compiler keeps the formatting work even though the result is thrown away. So take all my results with a grain of salt!

Exploring Dynamic SQL in Oracle: Counting Specific Values Across Multiple Tables

Imagine you have a large database where multiple tables contain a column named BOOK_ID. Perhaps you’re tasked with finding how many times a particular book (say, with an ID of 12345) appears across all these tables. How would you approach this?

In this post, I’ll explain a small piece of Oracle PL/SQL code that uses dynamic SQL to search for a specific value in any table that has a column with a specific name.

Finding the relevant tables

First, we need to determine which tables in the schema have a column with the desired name. To do this, we must look at a metadata table or view that contains information about all the tables in the schema and their columns. Most database systems offer such metadata tables, although their names and their structures vary greatly between different systems. In Oracle, the metadata table relevant to our task is called DBA_TAB_COLUMNS. Therefore, to find the names of all tables that contain a column BOOK_ID, you can use the following query:

SELECT table_name 
  FROM dba_tab_columns 
  WHERE column_name = 'BOOK_ID';

The output might be:

TABLE_NAME
-----------------
BOOKS
LIBRARY_BOOKS
ARCHIVED_BOOKS
TEMP_BOOKS
OLD_BOOKS

Looping through the tables

Now we want to loop through these tables in order to execute an SQL query for each of them. In Oracle we use a PL/SQL FOR loop to do this:

BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    -- do something for each record
  END LOOP;
END;
/

Dynamic SQL Construction

We can use the loop variable rec to dynamically create an SQL statement as a string by using the string concatenation operator || and assign it to a variable, in this case v_sql:

v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

The :val part is a placeholder that will become relevant later.

Of course, the variable needs to be declared for the PL/SQL code block first, so we add a DECLARE section:

DECLARE
  v_sql VARCHAR2(4000);
BEGIN
  -- ...
END;

How do we execute the SQL statement that we stored in the variable? By using the EXECUTE IMMEDIATE statement and two other variables; let’s call them v_result and v_value:

EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;

This will execute the SQL in v_sql, replace the :val placeholder by the value in v_value, and store the result in v_result. The latter will capture the result of our dynamic query, which is the count of occurrences.

Of course, we have to declare these two variables as well. We’ll set v_value to the book ID we are looking for. The whole code so far is:

DECLARE
  v_sql VARCHAR2(4000);
  v_value NUMBER := 12345;  -- Value to search for
  v_result VARCHAR2(4000);
BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

    EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
  END LOOP;
END;
/

Printing the results

If we execute the code above, we might be a little disappointed because it is accepted and executed without any errors, but nothing is printed out. How can we see the results? For that, we need to include DBMS_OUTPUT.PUT_LINE calls:

DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);

But how do we handle the case where nothing was found, or where the SQL query itself fails? We’ll wrap it in an EXCEPTION handling block (note that a COUNT query always returns exactly one row, so the NO_DATA_FOUND handler is purely defensive here):

BEGIN
  EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
  DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    NULL;  -- No matching rows found in this table
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('Error in ' || rec.table_name || ': ' || SQLERRM);
END;

There’s still one thing to do. We first have to enable server output to see any of the printed lines:

SET SERVEROUTPUT ON;

This command ensures that any output generated by DBMS_OUTPUT.PUT_LINE will be displayed in your SQL*Plus or SQL Developer session.

Let’s put it all together. Here’s the full code:

SET SERVEROUTPUT ON;

DECLARE
  v_sql VARCHAR2(4000);
  v_value NUMBER := 12345;  -- Value to search for
  v_result VARCHAR2(4000);
BEGIN
  FOR rec IN (SELECT table_name 
              FROM dba_tab_columns 
              WHERE column_name = 'BOOK_ID')
  LOOP
    v_sql := 'SELECT COUNT(BOOK_ID) FROM ' || rec.table_name || ' WHERE BOOK_ID = :val';

    BEGIN
      EXECUTE IMMEDIATE v_sql INTO v_result USING v_value;
      DBMS_OUTPUT.PUT_LINE('Found in ' || rec.table_name || ': ' || v_result);
    EXCEPTION
      WHEN NO_DATA_FOUND THEN
        NULL; -- No rows found in this table
      WHEN OTHERS THEN
        DBMS_OUTPUT.PUT_LINE('Error in ' || rec.table_name || ': ' || SQLERRM);
    END;
  END LOOP;
END;
/

Here’s what the output might look like when the block is executed:

Found in BOOKS: 5
Found in LIBRARY_BOOKS: 2
Found in ARCHIVED_BOOKS: 0

Alternatively, if one of the tables throws an error (for instance, due to a permissions issue or if the table doesn’t exist in the current schema), you might see an output like this:

Found in BOOKS: 5
Error in TEMP_BOOKS: ORA-00942: table or view does not exist
Found in LIBRARY_BOOKS: 2

Conclusion

Dynamic SQL is particularly useful when the structure of your query is not known until runtime. In this case, since the table names come from a data dictionary view (dba_tab_columns), the query must be constructed dynamically.

Instead of writing a separate query for each table, the above code automatically finds and processes every table with a BOOK_ID column. It works on any table with the right column, making it useful for large databases.

Building and running SQL statements on the fly allows you to handle tasks that are not possible with static SQL alone.

Local Javascript module development


Building an application using libraries – called packages or modules in JavaScript – has been common practice for decades. We often use third-party libraries in our projects so we do not have to implement everything ourselves.

In this post I want to describe the less common situation where we are using a library we developed on our own and/or are actively maintaining. While working on the consuming application, we sometimes need to change the library, too. This can lead to a cumbersome process:

  1. Implementing a feature or fix in the application leads to changes in our library package.
  2. Make a release of the library and publish it.
  3. Make our application reference the new version of our library.
  4. Test everything and find out, that more changes are needed.
  5. Goto 1.

This round-trip cycle takes time, creates probably useless releases of our library, and makes our development iterations visible to the public.

A faster and lighter alternative

Many may point to npm link or yarn link but there are numerous problems associated with these solutions, so I tried the tool yalc.

After installing the tool (globally) you can make changes to the library and publish them locally using yalc publish.

In the dependent project you add the local dependency using yalc add <dependency_name>. Now we can quickly iterate without creating public releases of our library and test everything locally until we are truly ready.
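Put together, a typical round trip might look like this (package and folder names are placeholders):

```shell
npm install -g yalc       # one-time global install
cd my-library
yalc publish              # copy the current build into the local yalc store

cd ../my-app
yalc add my-library       # depend on the locally published version

# after further changes to the library:
cd ../my-library
yalc push                 # republish and update all local consumers at once
```

yalc push is the convenient shortcut that combines publishing and propagating the update in one step.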

This approach worked nicely for me. yalc has a lot more features and there are nice guides and of course its documentation.

Conclusion

Developing several JavaScript modules locally in parallel is relatively easy given the right tooling.

Do you have similar experiences? Do you use other tools you would recommend?

You are misled about the Big-O notation

One statement I hear people say and repeat a lot, especially in the data-oriented design bubble, is that Big-O notation cannot accurately predict the real-life performance of contemporary computer programs, especially in the presence of multi-tier memory hierarchies like the L1/L2/L3 caches in front of RAM. This is, at best, misleading and gives this fantastic tool a bad reputation.

At its core, Big-O is just a way to categorize functions by how they scale. There’s nothing in the formal definition about performance at all. Of course, it is often used to categorize the performance of algorithms and their implementations. But to use it for that, you need two other things: a machine model and a metric for it.

Traditionally, when performance categorization using Big-O is taught, the machine model is either the Turing machine or the slightly closer-to-reality RAM machine. The metric is the number of operations. Which operation is counted has a huge impact. For example, insertion sort can easily be implemented in O(n*log(n)) when counting the number of comparisons (by using binary search to find the insertion point), but is in O(n²) when counting the number of memory moves/swaps.

Neither the model nor the metric is intrinsic to Big-O. To use it in the context of memory hierarchies, you just need to start counting what matters to you, e.g. memory accesses, cache misses or branch mispredictions. This is not new either; I learned about cache-aware and cache-oblivious machine models for this in university over 15 years ago.
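The insertion sort example can be made concrete by counting both metrics in one implementation; a sketch:

```python
def insertion_sort_counts(values):
    """Insertion sort with a binary-searched insertion point.

    Returns (sorted_list, comparisons, moves). Binary search needs only
    O(log i) comparisons per inserted element, but inserting still shifts
    up to i elements, so comparisons are O(n*log(n)) while moves are O(n^2).
    """
    result = []
    comparisons = 0
    moves = 0
    for v in values:
        lo, hi = 0, len(result)
        while lo < hi:  # classic binary search for the insertion point
            mid = (lo + hi) // 2
            comparisons += 1
            if result[mid] <= v:
                lo = mid + 1
            else:
                hi = mid
        moves += len(result) - lo  # elements shifted right by the insert
        result.insert(lo, v)
    return result, comparisons, moves

# Worst case for moves: reversed input of n = 100 elements.
sorted_list, comparisons, moves = insertion_sort_counts(list(range(99, -1, -1)))
print(comparisons, moves)
```

The same algorithm lands in a different complexity class depending on which operation you decide to count.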

TL;DR: Big-O is not obsolete, you just have to use it to count the appropriate performance-critical element in your algorithm.