frz – Page 7 – Schneide Blog

Java’s OptionalInt et al. versus Optional<T>

In Java 8 the Optional type was introduced to avoid the (ab)use of nullable types and null to indicate the absence of a value. It allows the programmer to clearly indicate whether the potential absence of a value is intentional or accidental.

Such option types, sometimes also called Maybe types, have been established in other programming languages, mostly in statically typed functional programming languages like ML and derivatives, but are also emerging in more mainstream languages like Swift.

Java’s Optional type is, to put it mildly, not the most sophisticated implementation of this concept, mostly due to limitations of Java’s existing type system. The Optional type is nullable itself, it’s not a sum type, so it has to rely on runtime exceptions to signal invalid access of a non-existent value, but it’s still useful. Static analysers, usually built into IDEs, can do what the compiler doesn’t and warn if the value is accessed without checking for its presence first.

The Optional type suffers from another limitation of Java’s type system: the fact that primitive types like int, long, double etc. and reference types, derived from Object, aren’t unified in a single type hierarchy. Related to that, primitive types can’t be used as generic type parameters in Java. The language works around this with additional boxed types like Integer, Long and Double for each primitive type.

When the stream API and the Optional type were introduced in Java 8, those primitive types were once again treated with special types: there’s not just Stream<T>, but also IntStream, LongStream, DoubleStream, there’s not just Optional<T>, but also OptionalInt, OptionalLong, OptionalDouble, the same for consumers, suppliers, predicates and functions.

This was done to avoid boxing and unboxing, but also makes it unpleasant to use. What’s worse is that the Optional variants for the primitive types don’t offer the same functionality as Optional<T>: they are lacking the filter, map and flatMap methods as well as the ofNullable factory method. All in all they are less useful than the real Optional, and there’s no convenient way to convert back and forth between, for example, an OptionalInt and Optional<Integer>.

The above mentioned annoyances are the reason why we prefer the generic variant over the special ones for the primitive types by default. Hopefully a future Java release will mitigate this dichotomy between those types, at least by adding the missing methods, but we are not aware of any plans for this yet.

Cache configuration with WildFly, Infinispan, CDI and JCache

This post is about a specific problem I encountered using the WildFly application server in combination with the Infinispan cache module, CDI and the JCache API. If you don’t use this combination of technologies this post is probably not relevant or interesting to you, but I hope it will help someone who encounters the same problem.

The problem

After upgrading an application from WildFly 10 to WildFly 13 it became apparent that the settings for the Infinispan caches from the WildFly configuration file are no longer applied to the caches used by the application.

The cache settings in the WildFly configuration specify a cache container, several local caches and the object memory sizes and expiration lifespans of these caches:

<subsystem xmlns="urn:jboss:domain:infinispan:6.0">
  <cache-container name="myapp" default-cache="default" module="org.wildfly.clustering.web.infinispan" statistics-enabled="true">
    <local-cache name="default" statistics-enabled="true">
      <object-memory size="10000"/>
      <expiration lifespan="86400000"/>
    </local-cache>
    <local-cache name="foo" statistics-enabled="true">
      <object-memory size="10000"/>
      <expiration lifespan="600000"/>
    </local-cache>
  </cache-container>
</subsystem>

The cache manager is injected via CDI resource injection in a Config class as the default cache manager:

class Config {
    @Produces
    @Resource(lookup = "java:jboss/infinispan/container/myapp")
    private EmbeddedCacheManager defaultCacheManager;
}

The caches are used via the @CacheResult annotation from the JCache API (JSR-107):

class FooService {
    @CacheResult(cacheName = "foo")
    public List<Foo> getFoo(String query) {
        // ...
    }
}

With this setup the application worked, the service results were cached, but the cache settings from the configuration file were not applied, as could be seen by inspecting the MBeans of the caches via JConsole. Instead the caches used a default configuration with an expiration lifespan of -1 (never), even though they were assigned to the cache container “myapp” as configured.

The solution

One particular answer to a similar problem description on StackOverflow was helpful in finding the solution. Each cache must be injected once via CDI resource lookup as well:

import org.infinispan.Cache;

class Config {
    @Resource(lookup = "java:jboss/infinispan/cache/myapp/foo")
    private Cache<String, Object> fooCache;

    // ...
}

The format of the JNDI path is:

"java:jboss/infinispan/cache/${cacheContainerName}/${cacheName}"

The property itself will be unused, but the @CacheResult annotation will now use the cache with the correct configuration.

Automated vulnerability checking of software dependencies

The OWASP organization is focused on improving the security of software systems and regularly publishes lists with security risks, such as the OWASP Top 10 Most Critical Web Application Security Risks or the Mobile Top 10 Security Risks. Among these are common attack vectors like command injections, buffer overruns, stack buffer overflow attacks and SQL injections.

When developing software you have to be aware of these in order to avoid and prevent them. If your project depends on third-party software components, such as open source libraries, you have to assess those dependencies for security risks as well. It is not enough to do this just once. You have to check them regularly and watch for any known, publicly disclosed, vulnerabilities in these dependencies.

Publicly known information-security vulnerabilities are tracked according to the Common Vulnerabilities and Exposures (CVE) standard. Each vulnerability is assigned an ID, for example CVE-2009-2704, and published in the National Vulnerability Database (NVD) by the U.S. government. Here’s an example for such an entry.

Automated Dependency Checking

There are tools and services to automatically check the dependencies of your project against these publicly known vulnerabilities, for example the OWASP Dependency Check or the Sonatype OSS Index. In order to use them your project has to use a dependency manager, for example Maven in the Java world or NuGet in the .NET ecosystem.

Here’s how to integrate the OWASP Dependency Check into your Maven based project, by adding the following plugin to the pom.xml file:

<plugin>
  <groupId>org.owasp</groupId>
  <artifactId>dependency-check-maven</artifactId>
  <version>5.0.0-M1</version>
  <executions>
    <execution>
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>

When you run the Maven goal dependency-check:check you might see an output like this:

One or more dependencies were identified with known vulnerabilities in Project XYZ:

jboss-j2eemgmt-api_1.1_spec-1.0.1.Final.jar (pkg:maven/org.jboss.spec.javax.management.j2ee/jboss-j2eemgmt-api_1.1_spec@1.0.1.Final, cpe:2.3:a:sun:j2ee:1.0.1:*:*:*:*:*:*:*) : CVE-2009-2704, CVE-2009-2705
...

The output tells you which version of a dependency is affected and the CVE ID. Now you can use this ID to look it up in the NVD database and inform yourself about the potential dangers of the vulnerability and take action, like updating the dependency if there is a newer version, which addresses the vulnerability.

Organize complex SQL queries with Common Table Expressions

Complex SQL database queries often contain subqueries.

SELECT * FROM ... 
   WHERE name IN (SELECT name 
         FROM ... 
         WHERE ...)

These can quickly become unreadable, especially if multiple subqueries are involved. A nice way to organise such queries with multiple subqueries is called Common Table Expressions (CTE), or colloquially: “WITH queries”. They are supported by most contemporary SQL databases.

When using Common Table Expressions you list all the subqueries after a WITH keyword in front of the actual query and assign a name to each subquery:

WITH
  <subquery1_name> AS (SELECT ...),
  <subquery2_name> AS (SELECT ...),
  [ ... ]
SELECT ...
FROM ...
[ ... ]

You can now refer to these subqueries in the main query by their names. They are conceptually just like temporary views. You can also refer to a subquery in the main query multiple times, whereas if the subquery was inlined you would have to repeat it. A subquery defined as a Common Table Expression can also refer to the preceding subqueries in the subquery list.

Recursion

A Common Table Expression can even refer to itself, which is a recursive definition. In some database systems you have to add the RECURSIVE keyword after WITH, in others you can leave it out. Here’s a recursive CTE that calculates factorial numbers:

WITH RECURSIVE factorials (n, fact) AS 
  (SELECT 0, 1
   UNION ALL
   SELECT n+1, (n+1)*fact FROM factorials
          WHERE n < 6)
SELECT * FROM factorials;

You could also use such a recursive query to descend into a tree hierarchy.

While the use cases for recursive queries are less frequent, I find the general concept of Common Table Expressions very useful to make complex queries more readable.

Determining the sizes of Oracle database tables and indexes

For one of our projects we store large amounts of timeseries data in an Oracle database. Sometimes we want to get an overview of how big the tables and related indexes are. Some database client tools like Toad for Oracle can show this information directly in their user interface, but if you use other tools like the SQuirreL SQL Client or JetBrains DataGrip you have to gather this information yourself via SQL queries.

DBA_SEGMENTS and DBA_INDEXES

For Oracle databases this meta information is available via the DBA_SEGMENTS and DBA_INDEXES tables. To query the sizes of several tables in MB use the following query:

SELECT segment_name, segment_type, bytes/1024/1024 MB
  FROM dba_segments
  WHERE segment_type = 'TABLE'
    AND segment_name IN ('TABLE_NAME_1', 'TABLE_NAME_2');

This SQL query returns the sizes of TABLE_NAME_1 and TABLE_NAME_2.

If you want to see the sizes of all the indexes that are associated with a table or a set of tables you can use the following query:

SELECT idx.table_name, idx.index_name, SUM(bytes)/1024/1024 MB
  FROM dba_segments seg,
       dba_indexes idx
  WHERE idx.table_owner = 'SCHEMA_NAME'
    AND idx.table_name IN ('TABLE_NAME_1', 'TABLE_NAME_2')
    AND idx.owner       = seg.owner
    AND idx.index_name  = seg.segment_name
  GROUP BY idx.index_name, idx.table_name;

Of course, you have to replace SCHEMA_NAME, and TABLE_NAME_x with the names in your database.

Unfortunately, access to this kind of meta information is different for each database system, and the queries above only work for Oracle databases.

Consistency over magic, please

The Groovy programming language is a JVM based scripting language. It is used by the Grails web framework and the Gradle build automation system.

Groovy has a language feature called Named argument constructors. This means that given a class with properties, for example

class Example {
  String text
}

you can initialize the properties directly when calling the constructor:

def example = new Example(text: ' This is an example. ')
assert example.text == ' This is an example. '

This is basically a shortcut for initializing the properties via explicit assignment:

def example = new Example()
example.text = ' This is an example. '
assert example.text == ' This is an example. '

So far so good.

Enter Grails

We use the aforementioned Grails framework for some of our web application projects. It is advertised on its website as featuring “convention-over-configuration” and “sensible defaults”. Grails uses the Groovy programming language, and a simple domain class looks just like a plain old Groovy class, except that it lives under the grails-app/domain directory (this is one of the convention-over-configuration aspects):

class Example {
  String text
}

As expected, you can initialize the property via regular assignment:

def example = new Example()
example.text = ' This is an example. '
assert example.text == ' This is an example. '

So one might expect that you can initialize it via a named argument constructor call as well:

def example = new Example(text: ' This is an example. ')
assert example.text == ' This is an example. '

And indeed, you can. But what’s this? Our assertion fails:

assert example.text == ' This is an example. '
               |    |
               |    false
               This is an example.

It is not directly obvious from the assertion failure output, but the property value is indeed no longer equal to the expected text: the leading and trailing spaces got trimmed!

I was surprised, but after some research in Grails documentation it turned out that it’s not a bug, but a feature. In the section on Data Binding, you can find the following sentence:

The mass property binding mechanism will by default automatically trim all Strings at binding time. To disable this behavior set the grails.databinding.trimStrings property to false in grails-app/conf/application.groovy.

Groovy’s named argument constructor feature is used as a data binding mechanism by Grails to bind web request parameters to a domain object. For this the default behavior was modified, so that strings are automatically trimmed. I can only guess that this is considered to be an instance of the “sensible defaults” mentioned on the Grails homepage.

To me personally this kind of surprising behavior is not a sensible default, and I think it goes against the Principle of least astonishement. I prefer consistency over “magic”.

Client-side web development: Drink the Kool-Aid or be cautious?

Client side web development is a fast-changing world. JavaScript libraries and frameworks come and go monthly. A couple of years ago jQuery was a huge thing, then AngularJS, and nowadays people use React or Vue.js with a state container like Redux. And so do we for new projects. Unfortunately, these modern client-side frameworks are based on the npm ecosystem, which is notoriously known for its dependency bloat. Even if you only have a couple of direct dependencies the package manager lock file will list hundreds of indirect dependencies. Experience has shown that lots of dependencies will result in a maintenance burden as time passes, especially when you have to do major version updates. Also, as mentioned above, frameworks come and then go out of fashion, and the maintainers of a framework move on to their next big thing pet project, leaving you and your project sitting on a barely or no longer maintained base, and frameworks can’t be easily replaced, because they tend to permeate every aspect of your application.

With this frustrating experience in mind we recently did an experiment for a new medium sized web project. We avoided frameworks and the npm ecosystem and only used JavaScript libraries with no or very few indirect dependencies, which really were necessary. Browsers have become better at being compatible to web standards, at least regarding the basics. Libraries like jQuery and poly-fills that paper over the incompatibilities can mostly be avoided — an interesting resource is the website You Might Not Need jQuery.

We still organised our views as components, and they are communicating via a very simple event dispatcher. Some things had to be done by foot, but not too much. It works, although the result is not as pure as it would have been with declarative views as facilitated by React and a functional state container like Redux. We’re still fans of the React+Redux approach and we’re using it happily (at least for now) for other projects, but we’re also skeptical regarding the long term costs, especially from relying on the npm ecosystem. Which approach will result in less maintenance burden? We don’t know yet. Time will tell.

Selecting all columns of a database table with an SQL GROUP BY expression

Suppose we have an SQL database table named “temperatures” with the following contents:

LOCATION  TIME        CELSIUS
inside    2018-08-01  24
inside    2018-08-02  28
inside    2018-08-03  21
inside    2018-08-04  28
outside   2018-08-01  29
outside   2018-08-02  31
outside   2018-08-03  25
outside   2018-08-04  30

We want to find the highest temperature for each location. We use the MAX aggregate function and a GROUP BY expression:

SELECT location, MAX(celsius) celsius
FROM temperatures
GROUP BY location;

As expected, the result is:

LOCATION  CELSIUS
outside   31
inside    28

Now suppose we also want to know when each of these extreme temperatures occured. Naively, we try the following query:

SELECT location, time, MAX(celsius) celsius
FROM temperatures
GROUP BY location;

The response is an error: “not a GROUP BY expression”. In a GROUP BY expression all selected columns must be either part of the GROUP BY clause or an aggregate.

To achieve what we want we can use a JOIN:

SELECT
  t.location, t.time, t.celsius
FROM
  temperatures t
JOIN (SELECT location, MAX(celsius) celsius
      FROM temperatures
      GROUP BY location) tmax
ON
  t.location=tmax.location AND t.celsius=tmax.celsius;

This query results in multiple rows per location if the maximum temperature was recorded at different times:

LOCATION  TIME        CELSIUS
outside   2018-08-02  31
inside    2018-08-04  28
inside    2018-08-02  28

If we are only interested in the first occurrence of the maximum temperature per location, we can use the following query:

SELECT
  location,
  MIN(time) KEEP (DENSE_RANK LAST ORDER BY celsius) time,
  MAX(celsius) celsius
FROM
  temperatures
GROUP BY
  location;

LOCATION  TIME        CELSIUS
inside    2018-08-02  28
outside   2018-08-02  31

Here we don’t need a JOIN anymore, because select clause for the time column is an aggregate as well.

Entity Framework migrations with multiple database contexts

The .NET Entity Framework provides functionality for automatic database migrations. Every time your application code requires a change of the database schema you should create a migration, so that the existing database schema is updated when a new version of the application is deployed. Examples for such changes are new entity classes or the addition and removal of properties of existing entity classes. The Entity Framework functionality for database migrations is called Code First Migrations.

Code First Migrations are managed via the so-called Package Manager Console. That’s how it’s called in Visual Studio, because its usually used for package management, but it’s basically a general Power Shell command line interface. After you have created the database context class and the entity model classes for your application, you create an initial migration (usually called InitialCreate), which captures the original state of the database schema for your application:

Add-Migration InitialCreate

This will create a new migration class called InitialCreate in the Migrations folder. The filename is prefixed with a timestamp: 201810702207458_InitialCreate.cs. Each migration class has an Up() method, which applies the migration and a Down() method, which rolls the migration back.

Each subsequent migration only describes the difference to its predecessor migration. For example, you add a new string property Email to your User entity class and add a new migration:

Add-Migration AddUserEmail

The tool will scan your entity classes, compare the current state to the state of the previous migration, calculate the difference and create a new migration class, which adds a new column to the database schema.

When the migrations are run on the target system they are tracked in a special database table called __MigrationHistory.

Multiple database contexts

The above usage of Code First Migrations is well documented. Here I want to describe a feature, that is documented in less detail, because it’s less commonly used: migrations with multiple database contexts.

Let’s assume you have two database contexts: CoreDataContext and MeasurementDataContext. In this case you have to create two migration configuration classes, which inherit DbMigrationsConfiguration. You want to create two subdirectories under the Migrations directory, one for each database context:

In each subdirectory you create a migrations Configuration class, one for each database context:

namespace Migrations.CoreData
{
  internal sealed class Configuration : DbMigrationsConfiguration<CoreDataContext>
  {
    public Configuration()
    {
      MigrationsDirectory = "Migrations\CoreData";
      AutomaticMigrationsEnabled = true;
    }
  }
}

and

namespace Migrations.MeasurementData
{
  internal sealed class Configuration : DbMigrationsConfiguration<MeasruementDataContext>
  {
    public Configuration()
    {
      MigrationsDirectory = "Migrations\MeasurementData";
      AutomaticMigrationsEnabled = true;
    }
  }
}

For each configuration you have to set the MigrationsDirectory property accordingly. The AutomaticMigrationsEnabled property is optional. If it’s set the migrations will be applied automatically at the start of the application.

Now, if you run a migration command like Add-Migration, you have to add the -ConfigurationTypeName option, which specifies the Configuration class for desired the database context:

Add-Migration InitialCreate -ConfigurationTypeName Migrations.CoreData.Configuration


Add-Migration InitialCreate -ConfigurationTypeName Migrations.MeasurementData.Configuration
Add-Migration AddUserEmail -ConfigurationTypeName Migrations.CoreData.Configuration

Add-Migration AddMeasurementTimestamp -ConfigurationTypeName Migrations.MeasurementData.Configuration

The migration classes will now be created in the correct subdirectories.

Handling database warnings with JDBC

Database administrators have the possibility to set lifetimes for user passwords. This can be considered a security feature, so that passwords get updated regularly. But if one of your software services logs into the database with such an account, you want to know when the password expires in good time before this happens, so that you can update the password. Otherwise your service will stop working unexpectedly.

Of course, you can mark the date in your calendar in order to be reminded beforehand, and you probably should. But there is an additional measure you can take. The database administrator can not only set the lifetime of a password, but also a “grace period”. For example:

ALTER PROFILE app_user LIMIT PASSWORD_LIFE_TIME 180 PASSWORD_GRACE_TIME 14;

This SQL command sets the password life time to 180 days (roughly six months) and the grace period to 14 days (two weeks). If you log into the database with this user you will see a warning two weeks before the password will expire. For Oracle databases the warning looks like this:

ORA-28002: the password will expire within 14 days

But your service logs in automatically, without any user interaction. Is it possible to programmatically detect a warning like this? Yes, it is. For example, with JDBC the following code detects warnings after a connection was established:

// Error codes for ORA-nnnnn warnings
static final int passwordWillExpireSoon = 28002;
static final int accountWillExpireSoon = 28011;

void handleWarnings(Connection connection) throws SQLException {
    SQLWarning warning = connection.getWarnings();
    while (null != warning) {
        String message = warning.getMessage();
        log.warn(message);

        int code = warning.getErrorCode();
        if (code == passwordWillExpireSoon) {
            System.out.println("ORA-28002 warning detected");
            // handle appropriately
        }
        if (code == accountWillExpireSoon) {
            System.out.println("ORA-28011 warning detected");
            // handle appropriately
        }
        warning = warning.getNextWarning();
    }
}

Instead of just logging the warnings, you can use this code to send an email to your address, so that you will get notified about a soon-to-be-expired password in advance. The error code depends on your database system.

With this in place you should not be unpleasantly surprised by an expired password. Of course, this only works if the administrator sets a grace period, so you should agree on this approach with your administrator.

	Miq on AI Code Won’t Be for Humans Mu…
	AI Code Won’t Be for… on Impressions of Our Current AI…
	Impressions of Our C… on AI Code Won’t Be for Humans Mu…
	Great software engin… on Digitalization is hard (especi…
	Impressions of Our C… on Computing gets fuzzy again (AI…