Modern developer Issue 4: My SQL toolbox

Thinking in SQL

SQL is such a basic and useful language but the underlying thinking is non-intuitive when you come from imperative languages like Java, Ruby and similar.
SQL is centered around sets and operations on them. The straight forward solution might not be the best one.

Limit

Let’s say we need the maximum value in a certain set. Easy:

select max(value) from table

But what if we need the row with the maximum value? Just adding the other columns won’t work since aggregations only work with other aggregations and group bys. Joining with the same table may be straight forward but better is to not do any joins:

select * from (select * from table order by value desc) where rownum<=1

Group by and having

Even duplicate values can be found without joining:

select value from table group by value having count(*) > 1

Grouping is a powerful operation in SQL land:

select max(value), TO_CHAR(time, 'YYYY-MM') from table group by TO_CHAR(time, 'YYYY-MM')

Finding us the maximum value in each month.

Mapping with outer joins

SQL is also good for calculations. Say we have one table with values and one with a mapping like a precalculated log table. Joining both gets the log of each of your values:

select t.value, log.y from table t left outer join log_table log on t.value=log.x

Simple calculations

We can even use a linear interpolation between two values. Say we have only the function values stored for integers but we values between them and these values between them can be interpolated linearly.

select t.value, (t.value-floor(t.value))*f.y + (ceil(t.value)-t.value)*g.y from table t left outer join function_table f on floor(t.value)=f.x left outer join function_table g on ceil(t.value)=g.x

When you need to calculate for large sets of values and insert them into another table it might be better to calculate in SQL and insert in one step without all the conversion and wrapping stuff present in programming languages.

Conditions

Another often overlooked feature is to use a condition:

select case when MOD(t.value, 2) = 0 then 'divisible by 2' else 'not divisible by 2' end from table t

These handful operations are my basic toolbox when working with SQL, almost all queries I need can be formulated with them.

Dates and timestamps

One last reminder: when you work with time always specify the wanted time zone in your query.

It’s only Cores and Caches but I like it

The programming game changed dramatically in the past ten years. We are playing CPU cores and caches now, but without proper visibility.

Most of our software development economy is based on a simple promise: The computing power (or “performance”) of a common computer will double every two years. This promise accompanied us for 40 years now, a time during which our computers got monitors, acquired harddisks and provided RAM beyond the 640 kB that was enough for nobody. In the more recent years, we don’t operate systems with one CPU, but four, eight or even twelve of them. So it came as a great irritation when ten years ago, Herb Sutter predicted that “The free lunch is over” and even Gordon Moore, the originator of Moore’s Law that forms the basis of our simple promise said that it will only hold true for ten to fiveteen more years. Or, in other words, until today.

Irritation

That’s a bit unsettling, to say the least, and should be motivation enough to have a good look at everything we are doing. Intel, the biggest manufacturer of CPUs for computers, has indicated earlier this year that Moore’s Law cannot be fulfilled any longer. So, the free lunch is really over. And it turns out to have some hidden costs. One cost is a certain complacency, the conviction that things will continue to be as they were and that coding styles chiseled over years and decades hold an inherent value of experience.

Complacency

Don’t get me wrong – there is great value in experience, but not all knowledge of the past is helpful for the future. Sometimes, fundamental things change. Just as the tables will eventually turn for every optimization trick, we need to reevaluate some axioms of our stance towards performance. Let me reiterate some common knowledge:

There are two types of performance inherently baked into your source code: Theoretical and practical performance.

Performance

The first type is theoretical performance, measured in O(n), O(n²) or even O(n!) and mostly influenced by the complexity class of the algorithm you are using. It will translate into runtime behaviour (like in the case of O(n!) your software is already dead, you just don’t know yet), but isn’t concerned with the details of your implementation. Not using an unnecessary high complexity class for a given problem will continue to be a valueable skill that every developer should master.

On the other hand, practical performance is measured in milliseconds (or nanoseconds if you are into micro-benchmarks and can pull off to measure them correctly) and can heavily depend on just a few lines in your source code. Practical performance is the observable runtime behaviour of your software on a given hardware. There are two subtypes of practical performance:

Throughput (How many operations are computed by the system in a given unit of time?)
Latency (How long does it take one operation to be computed by the system?)

If you run a service, throughput is your main metric for performance. If you use a service, latency is your main concern. Let me explain this by the metapher of a breakfast egg. If you want to eat your breakfast at a hotel buffet and the eggs are empty, your main concern is how fast you will get your freshly boiled egg (latency). But if you run the hotel kitchen, you probably want to cook a lot of eggs at once (throughput), even if that means that one particular egg might boil slightly longer as if you’d boiled each of them individually.

Latency

Those two subtypes are not entirely independent from each other. But the main concern for most performance based work done by developers is latency. It is relatively easy to measure and to reason about. If you work with latency-based performance issues, you should know about the latency numbers every programmer should know, either in visual form or translated to a more human time scale. Lets iterate some of the numbers and their scaled counterpart here:

1 CPU cycle (0.3 ns): 1 second
Level 1 cache access (0.9 ns): 3 seconds
Branch mispredict (2.5 ns): 8 seconds
Level 2 cache access (2.8 ns): 9 seconds
Level 3 cache access (12.9 ns): 43 seconds
Main memory access (120 ns): 6 minutes
Solid-state disk I/O (50-150 μs): 2-6 days
Rotational disk I/O (1-10 ms): 1-12 months

We can discuss any number in detail, but the overall message stands out nonetheless: CPUs are lightning fast and caches are the only system components that can somewhat keep up. As soon as your program hits the RAM, your peak performance is lost. This brings us to the main concept of latency optimization:

Your program’s latency is ultimately decided by your ability to decrease cache misses.

You can save CPU cycles by performing clever hacks, but if you are able to always read your data (and code) from the cache, you’ll be 360 times faster than if your program constantly has to read from RAM. Your source code doesn’t have to change at all for this to happen. A good compiler and/or optimizing runtime can work wonders if you adhere to your programming language’s memory model. In reality, you probably have to rearrange your instructions and align your data structures. That’s the performance optimization of today, not the old cycle stinting. The big challenge is that none of these aspects are visible on the source code level of your program. We have to develop our programs kind of blindfolded currently.

Concurrency

One way how we’ve held up Moore’s Law in the last ten years was the introduction of multiprocessor computing into normal computers. If you cram two CPUs onto the die, the number of transistors on it has doubled. A single-threaded program doesn’t run any faster, though. So we need to look at concurrent programming to unlock the full power of our systems. Basically, there are two types of concurrent programming, deliberate and mechanical.

Deliberate concurrent programming means that you as the developer actively introduce threads, fibers or similar concepts into your source code to control parallel computation.
Mechanical concurrent programming means that your source code can be parallelized by the compiler and/or runtime and/or the hardware (e.g. hyper-threading) without changing the correctness of your program.

In both types of concurrent programming, you need to be aware about the constraints and limitations of correct concurrency. It doesn’t matter if your program is blazingly fast and utilizes all cores if the result is wrong or only occasionally correct. Once again, the memory model of your programming language is a useful set of rules and abstractions to guide you. Most higher-level concurrency models like actors narrow your possibilities even further, with functional programming being one of the strictest (and most powerful ones).

In the field of software development, we are theoretically well-prepared to take on the task of pervasive concurrent programming. But we need to forget about the good old times of single-core confirmability and embrace the chaotic world of raw computing power, the world of cores and caches.

Cores ‘n’ Caches

This is our live now: We rely on Cores ‘n’ Caches to feed us performance, but the lunch isn’t free anymore. We have to adapt and adjust, to rethink our core axioms and let go of those parts of our experience that are now hindrances. Just like Rock ‘n’ Roll changed the rules in the music business, our new motto changes ours.

Let’s rock.

Why I’m not using C++ unnamed namespaces anymore

Well okay, actually I’m still using them, but I thought the absolute would make for a better headline. But I do not use them nearly as much as I used to. Almost exactly a year ago, I even described them as an integral part of my unit design. Nowadays, most units I write do not have an unnamed namespace at all.

What’s so great about unnamed namespaces?

Back when I still used them, my code would usually evolve gradually through a few different “stages of visibility”. The first of these stages was the unnamed-namespace. Later stages would either be a free-function or a private/public member-function.

Lets say I identify a bit of code that I could reuse. I refactor it into a separate function. Since that bit of code is only used in that compile unit, it makes sense to put this function into an unnamed namespace that is only visible in the implementation of that unit.

Okay great, now we have reusability within this one compile unit, and we didn’t even have to recompile any of the units clients. Also, we can just “Hack away” on this code. It’s very local and exists solely to provide for our implementation needs. We can cobble it together without worrying that anyone else might ever have to use it.

This all feels pretty great at first. You are writing smaller functions and classes after all.

Whole class hierarchies are defined this way. Invisible to all but yourself. Protected and sheltered from the ugly world of external clients.

What’s so bad about unnamed namespaces?

However, there are two sides to this coin. Over time, one of two things usually happens:

1. The code is never needed again outside of the unit. Forgotten by all but the compiler, it exists happily in its seclusion.
2. The code is needed elsewhere.

Guess which one happens more often. The code is needed elsewhere. After all, that is usually the reason we refactored it into a function in the first place. Its reusability. When this is the case, one of these scenarios usually happes:

1. People forgot about it, and solve the problem again.
2. People never learned about it, and solve the problem again.
3. People know about it, and copy-and-paste the code to solve their problem.
4. People know about it and make the function more widely available to call it directly.

Except for the last, that’s a pretty grim outlook. The first two cases are usually the result of the bad discoverability. If you haven’t worked with that code extensively, it is pretty certain that you do not even know that is exists.

The third is often a consequence of the fact that this function was not initially written for reuse. This can mean that it cannot be called from the outside because it cannot be accessed. But often, there’s some small dependency to the exact place where it’s defined. People came to this function because they want to solve another problem, not to figure out how to make this function visible to them. Call it lazyness or pragmatism, but they now have a case for just copying it. It happens and shouldn’t be incentivised.

A Bug? In my code?

Now imagine you don’t care much about such noble long term code quality concerns as code duplication. After all, deduplication just increases coupling, right?

But you do care about satisfied customers, possibly because your job depends on it. One of your customers provides you with a crash dump and the stacktrace clearly points to your hidden and protected function. Since you’re a good developer, you decide to reproduce the crash in a unit test.

Only that does not work. The function is not accessible to your test. You first need to refactor the code to actually make it testable. That’s a terrible situation to be in.

What to do instead.

There’s really only two choices. Either make it a public function of your unit immediatly, or move it to another unit.

For functional units, its usually not a problem to just make them public. At least as long as the function does not access any global data.

For class units, there is a decision to make, but it is simple. Will using preserve all class invariants? If so, you can move it or make it a public function. But if not, you absolutely should move it to another unit. Often, this actually helps with deciding for what to create a new class!

Note that private and protected functions suffer many of the same drawbacks as functions in unnamed-namespaces. Sometimes, either of these options is a valid shortcut. But if you can, please, avoid them.

Arbitrary 2D curves with Highcharts

Highcharts is a versatile JavaScript charting library for the web. The library supports all kinds of charts: scatter plots, line charts, area chart, bar charts, pie charts and more.

A data series of type line or spline connects the points in the order of the x-axis when rendered. It is possible to invert the axes by setting the property inverted: true in the chart options.

var chart = $('#chart').highcharts({
  chart: {
    inverted: true
  },
  series: [{
    name: 'Example',
    type: 'line',
    data: [
      {x: 10, y: 50},
      {x: 20, y: 56.5},
      {x: 30, y: 46.5},
      // ...
    ]
  }]
});

Connecting points in an arbitrary order

Connecting the points in an arbitrary order, however, is not supported by default. I couldn’t find a Highcharts plugin which supports this either, so I implemented a solution by modifying the getGraphPath function of the series object:

var series = $('#chart').highcharts().series[0];
var oldGetGraphPath = series.getGraphPath;
Object.getPrototypeOf(series).getGraphPath = function(points, nullsAsZeroes, connectCliffs) {
  points = points || this.points;
  points.sort(function(a, b) {
    return a.sortIndex - b.sortIndex;
  });
  return oldGetGraphPath.call(this, points, nullsAsZeroes, connectCliffs);
};

The new function sorts the points by a new property called sortIndex before the line path of the chart gets rendered. This new property must be assigned to each point object of the series data:

series.setData([
  {x: 10, y: 50, sortIndex: 1},
  {x: 20, y: 56.5, sortIndex: 2},
  {x: 30, y: 46.5, sortIndex: 3},
  // ...
], true);

Now we can render charts with points connected in an arbitrary order like this:

A line chart with points connected in arbitrary order

Modern developer #3: Framework independent JavaScript architecture

Usually small JavaScript projects start with simple wiring of callbacks onto DOM elements. This works fine when it the project is in its initial state. But in a short time it gets out of hand. Now we have spaghetti wiring and callback hell. Often at this point we try to get help by looking at adopting a framework, hoping to that its coded best practices draw us out of the mud. But now our project is tied to the new framework.
In search of another, framework independent way I stumbled upon scalable architecture by Nicholas Zakas.
It starts by defining modules as independent units. This means:

separate JavaScript and DOM elements from the rest of the application
Modules must not reference other modules
Modules may not register callbacks or even reference DOM elements outside their DOM tree
To communicate with the outside world, modules can only call the sandbox

The sandbox is a central hub. We use a pub/sub system:

sandbox.publish({type: 'event_type', data: {}});

sandbox.subscribe('event_type', this.callback.bind(this));

Besides being an event bus, the sandbox is responsible for access control and provides the modules with a consistent interface.
Modules are started and stopped (in case of misbehaving) in the application core. You could also use the core as an anti corruption layer for third party libraries.
This architecture gives a frame for implementation. But implementing it raises other questions:

how do the modules update their state?
where do we call the backend?

Handling state

A global model would reside or be referenced by the application core. In addition every module has its own model. Updates are always done in application event handlers, not directly in the DOM event handlers.
Let me illustrate. Say we have a module with keeps track of selected entries:

function Module(sandbox) {
  this.sandbox = sandbox;
  this.selectedEntries = [];
}

Traditionally our DOM event handler would update our model:

button.on('click', function(e) {
  this.selectedEntries.push(entry);
});

A better way would be to publish an application event, subscribe the module to this event and handle it in the application event handler:

this.sandbox.subscribe('entry_selected', this.entrySelected.bind(this));

Module.prototype.entrySelected = function(event) {
  this.selectedEntries.push(event.entry);
};

button.on('click', function(e) {
  this.sandbox.publish({type: 'entry_selected', entry: entry});
});

Other modules can now listen on selecting entries. The module itself does not need to know who selected the entry. All the internal communication of selection is visible. This makes it possible to use event sourcing.

Calling the backend

No module should directly call the backend. For this a special module called extension is used. Extensions encapsulate cross cutting concerns and shield communication with other systems.

Summary

This architecture keeps UI parts together with their corresponding code, flattens callbacks and centralizes the communication with the help of application events and encapsulates outside communication. On top of that it is simple and small.

Remote development with PyCharm

PyCharm is a fantastic tool for python development. One cool feature that I quite like is its support for remote development. We have quite a few projects that need to interact with special hardware, and that hardware is often not attached to the computer we’re developing on.
In order to test your programs, you still need to run it on that computer though, and doing this without tool support can be especially painful. You need to use a tool like scp or rsync to transmit your code to the target machine and then execute it using ssh. This all results in painfully long and error prone iterations.
Fortunately, PyCharm has tool support in its professional edition. After some setup, it allows you do develop just as you would on a local machine. Here’s a small guide on how to set it up with an ubuntu vagrant virtual machine, connecting over ssh. It work just as nicely on remote computers.

1. Create a new deployment configuration

In the Tools->Deployment->Configurations click the small + in the top left corner. Pick a name and choose the SFTP type.

In the “Connection” Tab of the newly created configuration, make sure to uncheck “Visible only for this project”. Then, setup your host and login information. The root path is usually a central location you have access to, like your home folder. You can use the “Autodetect” button to set this up.

For my VM, the settings look like this.

On the “Mappings” Tab, set the deployment path for your project. This would be the specific folder of your project within the root you set on the previous page. Clicking the associated “…” button here helps, and even lets you create the target folder on the remote machine if it does not exist yet.

2. Activate the upload

Now check “Tools->Deployment->Automatic Upload”. This will do an upload when you change a file, so you still need to do the initial upload manually via “Tools->Deployment->Upload to “.

3. Create a project interpreter

Now the files are synced up, but the runtime environment is not on the remote machine. Go to the “Project Interpreter” page in File->Settings and click the little gear in the top-right corner. Select “Add Remote”.

It should have the Deployment configuration you just created already selected. Once you click ok, you’re good to go! You can run and debug your code just like on a local machine.

Have fun developing python applications remotely.

The migration path for Java applets

Java applets have been a deprecated technology for a while. It has become increasingly difficult to run Java applets in modern browsers. Browser vendors are phasing out support for browser plugins based on the Netscape Plugin API (NPAPI) such as the Java plugin, which is required to run applets in the browser. Microsoft Edge and Google Chrome already have stopped supporting NPAPI plugins, Mozilla Firefox will stop supporting it at the end of 2016. This effectively means that Java applets will be dead by the end of the year.

However, it does not necessarily mean that Java applet based projects and code bases, which can’t be easily rewritten to use other technologies such as HTML5, have to become useless. Oracle recommends the migration of Java applets to Java Web Start applications. This is a viable option if the application is not required to be run embedded within a web page.

To convert an applet to a standalone Web Start application you put the applet’s content into a frame and call the code of the former applet’s init() and start() from the main() method of the main class.

Java Web Start applications are launched via the Java Network Launching Protocol (JNLP). This involves the deployment of an XML file on the web server with a “.jnlp” file extension and content type “application/x-java-jnlp-file”. This file can be linked on a web page and looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<jnlp spec="1.0+" codebase="http://example.org/demo/" href="demoproject.jnlp">
 <information>
  <title>Webstart Demo Project</title>
  <vendor>Softwareschneiderei GmbH</vendor>
  <homepage>http://www.softwareschneiderei.de</homepage>
  <offline-allowed/>
 </information>
 <resources>
  <j2se version="1.7+" href="http://java.sun.com/products/autodl/j2se"/>
  <jar href="demo-project.jar" main="true"/>
  <jar href="additional.jar"/>
 </resources>
 <application-desc name="My Project" main-class="com.schneide.demo.WebStartMain">
 </application-desc>
 <security>
  <all-permissions/>
 </security>
 <update check="always"/>
</jnlp>

The JNLP file describes among other things the required dependencies such as the JAR files in the resources block, the main class and the security permissions.

The JAR files must be placed relative to the URL of the “codebase” attribute. If the application requires all permissions like in the example above, all JAR files listed in the resources block must be signed with a certificate.

deployJava.js

The JNLP file can be linked directly in a web page and the Web Start application will be launched if Java is installed on the client operating system. Additionally, Oracle provides a JavaScript API called deployJava.js, which can be used to detect whether the required version of Java is installed on the client system and redirect the user to Oracle’s Java download page if not. It can even create an all-in-one launch button with a custom icon:

<script src="https://www.java.com/js/deployJava.js"></script>
<script>
  deployJava.launchButtonPNG = 'http://example.org/demo/launch.png';
  deployJava.createWebStartLaunchButton('http://example.org/demo/demoproject.jnlp', '1.7.0');
</script>

Conclusion

The Java applet technology has now reached its “end of life”, but valuable applet-based projects can be saved with relatively little effort by converting them to Web Start applications.

C++ Coroutines on Windows with the Fiber API

Last week, I had the chance to try out coroutines as a way to cooperatively interleave long tasks with event-processing. Unlike threads, where you can have interaction between between threads at any time, coroutines need to yield control explicitly, whic arguably makes synchronisation a little simpler. Especially in (legacy) systems that are not designed for concurrency. Of course, since coroutines do not run at the same time, you do not get the perks from concurrency either.

If you don’t know coroutines, think of them as functions that can be paused and resumed.

Unlike many other languages, C++ does not have built-in support for coroutines just yet. There are, however, several alternatives. On Windows, you can use the Fiber API to implement coroutines easily.

Here’s some example code of how that works:

auto coroutine=make_shared<FiberCoroutine>();
coroutine->setup([](Coroutine::Yield yield)
{
  for (int i=0; i<3; ++i)
  {
    cout << "Coroutine " 
         << i << std::endl;
    yield();
  }
});

int stepCount = 0;
while (coroutine->step())
{
  cout << "Main " 
       << stepCount++ << std::endl;
}

Somewhat surprisingly, at least if you have never seen coroutines, this will output the two outputs alternatingly:

Coroutine 0
Main 0
Coroutine 1
Main 1
Coroutine 2
Main 2

Interface

Since fibers are not the only way to implement coroutines and since we want to keep our client code nicely insulated from the windows API, there’s a pure-virtual base class as an interface:

class Coroutine
{
public:
  using Yield = std::function<void()>;
  using Run = std::function<void(Yield)>;

  virtual ~Coroutine() = default;
  virtual void setup(Run f) = 0;
  virtual bool step() = 0;
};

Typically, creation of a Coroutine type allocates all the resources it needs, while setup “primes” it with an inner “Run” function that can use an implementation-specific “Yield” function to pass control back to the caller, which is whoever calls step.

Implementation

The implementation using fibers is fairly straight-forward:

class FiberCoroutine
  : public Coroutine
{
public:
  FiberCoroutine()
  : mCurrent(nullptr), mRunning(false)
  {
  }

  ~FiberCoroutine()
  {
    if (mCurrent)
      DeleteFiber(mCurrent);
  }

  void setup(Run f) override
  {
    if (!mMain)
    {
      mMain = ConvertThreadToFiber(NULL);
    }
    mRunning = true;
    mFunction = std::move(f);

    if (!mCurrent)
    {
      mCurrent = CreateFiber(0,
        &FiberCoroutine::proc, this);
    }
  }

  bool step() override
  {
    SwitchToFiber(mCurrent);
    return mRunning;
  }

  void yield()
  {
    SwitchToFiber(mMain);
  }

private:
  void run()
  {
    while (true)
    {
      mFunction([this]
                { yield(); });
      mRunning = false;
      yield();
    }
  }

  static VOID WINAPI proc(LPVOID data)
  {
    reinterpret_cast<FiberCoroutine*>(data)->run();
  }

  static LPVOID mMain;
  LPVOID mCurrent;
  bool mRunning;
  Run mFunction;
};

LPVOID FiberCoroutine::mMain = nullptr;

The idea here is that the caller and the callee are both fibers: lightweight threads without concurrency. Running the coroutine switches to the callee’s, the run function’s, fiber. Yielding switches back to the caller. Note that it is currently assumed that all callers are from the same thread, since each thread that participates in the switching needs to be converted to a fiber initially, even the caller. The current version only keeps the fiber for the initial thread in a single static variable. However, it should be possible to support this by replacing the single static fiber pointer with a map that maps each thread to its associated fiber.

Note that you cannot return from the fiberproc – that will just terminate the whole thread! Instead, just yield back to the caller and either re-use or destroy the fiber.

Assessment

Fiber-based coroutines are a nice and efficient way to model non-linear control-flow explicitly, but they do not come without downsides. For example, while this example worked flawlessly when compiled with visual studio, Cygwin just terminates without even an error. If you’re used to working with the visual studio debugger, it may surprise you that the caller gets hidden completely while you’re in the run function. The run functions stack completely replaces the callers stack until you call yield(). This means that you cannot find out who called step(). On the other hand, if you’re actually doing a lot of processing in the run function, this is quite nice for profiling, as the “processing” call tree seemingly has its own root in the call-tree.

I just wish the visual studio debugger had a way to view the states of the different fibers like it has for threads.

Alternatives

On Linux, you can use the ucontext.

Visual Studio 2015 also has another, newer, implementation.

Coroutines can be implemented using threads and condition-variables.

There’s also Boost.Coroutine, if you need an independent implementation of the concept. From what I gather, they only use Fibers optionally, and otherwise do the required “trickery” themselves. Maybe this even keeps the caller-stack visible – it is certainly worth exploring.

Using passwords with Jenkins CI server

For many of our projects the Jenkins continuous integration (CI) server is one important cornerstone. The well known “works on my machine” means nothing in our company. Only code in repositories and built, tested and packaged by our CI servers counts. In addition to building, testing, analyzing and packaging our projects we use CI jobs for deployment and supervision, too. In such jobs you often need some sort of credentials like username/password or public/private keys.

If you are using username/password they do not only appear in the job configuration but also in the console build logs. In most cases this is undesirable but luckily there is an easy way around it: using the Environment Injector Plugin.

In the plugin you can “inject passwords to the build as environment variables” for use in your commands and scripts.

The nice thing about this is that the passwords are not only masked in the job configuration (like above) but also in the console logs of the builds!

Another alternative doing mostly the same is the Credentials Binding Plugin.

There is a lot more to explore when it comes to authentication and credential management in Jenkins as you can define credentials at the global level, use public/private key pairs and ssh agents, connect to a LDAP database and much more. Just do not sit back and provide security related stuff plaintext in job configurations or your deployments scripts!

Generating a spherified cube in C++

In my last post, I showed how to generate an icosphere, a subdivided icosahedron, without any fancy data-structures like the half-edge data-structure. Someone in the reddit discussion on my post mentioned that a spherified cube is also nice, especially since it naturally lends itself to a relatively nice UV-map.

The old algorithm

The exact same algorithm from my last post can easily be adapted to generate a spherified cube, just by starting on different data.

After 3 steps of subdivision with the old algorithm, that cube will be transformed into this:

Slightly adapted

If you look closely, you will see that the triangles in this mesh are a bit uneven. The vertical lines in the yellow-side seem to curve around a bit. This is because unlike in the icosahedron, the triangles in the initial box mesh are far from equilateral. The four-way split does not work very well with this.

One way to improve the situation is to use an adaptive two-way split instead:

Instead of splitting all three edges, we’ll only split one. The adaptive part here is that the edge we’ll split is always the longest that appears in the triangle, therefore avoiding very long edges.

Here’s the code for that. The only tricky part is the modulo-counting to get the indices right. The vertex_for_edge function does the same thing as last time: providing a vertex for subdivision while keeping the mesh connected in its index structure.

TriangleList
subdivide_2(ColorVertexList& vertices,
  TriangleList triangles)
{
  Lookup lookup;
  TriangleList result;

  for (auto&& each:triangles)
  {
    auto edge=longest_edge(vertices, each);
    Index mid=vertex_for_edge(lookup, vertices,
      each.vertex[edge], each.vertex[(edge+1)%3]);

    result.push_back({each.vertex[edge],
      mid, each.vertex[(edge+2)%3]});

    result.push_back({each.vertex[(edge+2)%3],
      mid, each.vertex[(edge+1)%3]});
  }

  return result;
}

Now the result looks a lot more even:

Note that this algorithm only doubles the triangle count per iteration, so you might want to execute it twice as often as the four-way split.

Alternatives

Instead of using this generic of triangle-based subdivision, it is also possible to generate the six sides as subdivided patches, as suggested in this article. This approach works naturally if you want to have seams between your six sides. However, that approach is more specialized towards this special geometry and will require extra “stitching” if you don’t want seams.

Code

The code for both the icosphere and the spherified cube is now on github: github.com/softwareschneiderei/meshing-samples.

	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…
	mariuselvert on Creating functors with lambda…
	Nested queries like… on Common SQL Performance Gotchas…
	Nested queries like… on Make your users happy by not c…