Miq – Page 13 – Schneide Blog

Lazy Initialization/evaluation can help you performance and memory-consumption-wise

One way to improve performance when working with many objects or large data structures is lazy initialization or evaluation. The concept is to defer expensive operations to the latest moment possible – ideally never. I want to show some examples how to use lazy techniques in java and give you pointers to other languages where it is even easier and part of the core language.

One use case was a large JTable showing hundreds of domain objects consisting of meta-data and measured values. Initially our domain objects held both types of data in memory even though only part of the meta data was displayed in the table. Building the table took several seconds and we were limited to showing a few hundred entries at once. After some analysis we changed our implementation to roughly look like this:

public class DomainObject {
  private final DataParser parser;
  private final Map<String, String> header = new HashMap<>();
  private final List<Data> data = new ArrayList<>();

  public DomainObject(DataParser aParser) {
    parser = aParser;
  }

  public String getHeaderField(String name) {
    // Here we lazily parse and fill the header map
    if (header.isEmpty()) {
      header.addAll(parser.header());
    }
    return header.get(name);
  }
  public Iterable<Data> getMeasurementValues() {
    // again lazy-load and parse the data
    if (data.isEmpty()) {
      data.addAll(parser.measurements());
    }
    return data;
  }
}

This improved both the time to display the entries and the number of entries we could handle significantly. All the data loading and parsing was only done when someone wanted the details of a measurement and double-clicked an entry.

A situation where you get lazy evaluation in Java out-of-the-box is in conditional statements:

// lazy and fast because the expensive operation will only execute when needed
if (aCondition() && expensiveOperation()) { ... }

// slow order (still lazy evaluated!)
if (expensiveOperation() && aCondition()) { ... }

Persistence frameworks like Hibernate often times default to lazy-loading because database access and data transmission is quite costly in general.

Most functional languages are built around lazy evaluation of and their concept of functions as first class citizens and isolated/minimized side-effects support lazyness very well. Scala as a hybrid OO/functional language introduces the lazy keyword to simplify typical java-style lazy initialization code like above to something like this:

public class DomainObject(parser: DataParser) {
  // evaluated on first access
  private lazy val header = { parser.header() }

  def getHeaderField(name : String) : String = {
    header.get(name).getOrElse("")
  }

  // evaluated on first access
  lazy val measurementValues : Iterable[Data] = {
    parser.measurements()
  }
}

Conclusion
Lazy evaluation is nothing new or revolutionary but a very useful tool when dealing with large datasets or slow resources. There may be many situations where you can use it to improve performance or user experience using it.
The downsides are a bit of implementation cost if language support is poor (like in Java) and some cases where the application can feel more responsive with precomputed/prefetched values when the user wants to see the details.

Automated Testing of E-Mail Functionality

Nowadays most (web) applications have to send e-mails for notifications, news, passwords, reminders and so on. You have to ensure that the correct e-mail is really sent to the receiver and it is a good idea to do this using an automated test. Since you usually do not want to depend on the whole network infrastructure and externally installed mail servers which could cause breaking tests without code change once in a while some kind of mock mail server would be nice.

We tried two solutions in our grails web application:

Dumbster

A really simple fake smtp server written in Java. Using it is really simple: You just add the jar to your project and start the smtp server on some port:

SmtpServer mailServer = SmtpServerFactory.startServer(2525);

Your application can now simply send e-mails using the smtp-server running locally on the specified port. In your tests you can check the mail server for the expected e-mails. After each test we stopped dumpster resetting it to a clean state.

mailServer.stop()

The problem we had with this solution was, that stopping the fake smtp was not enough between tests. There were zombie dumpster threads left which caused an excessive load. Our acceptance test run slowed down increasingly with every new test. Eventually, execution time of acceptance tests doubled to more than two hours!

Greenmail (plugin for grails)

Due to the problems with dumbster we tried greenmail with its grails plugin. The greenmail smtp is available as a spring bean in grails and is mostly transparent as a plugin. We only had to change our WithEmailServer rule which provides a clean and enabled state and disables greenmail after the test using the rule.

    void before() {
        SpringUtil.getBean('grailsApplication').config.grails.mail.disabled = false
        SpringUtil.getBean('greenMail').deleteAllMessages()
    }

    void after() {
        SpringUtil.getBean('grailsApplication').config.grails.mail.disabled = true
    }

If your tests are executing in a different JVM than your app is running, you may need to use the Remote Control-plugin to access greenMail or grailsApplication. The grails greenmail plugin provides a controller too, which lets you inspect sent messages over the web interface.

Testing on .NET: Choosing NUnit over MSTest

We sometimes do smaller .NET projects for our clients even though we are mostly a Java/JVM shop. Our key infrastructure stays the same for all projects – regardless of the platform. That means the .NET projects get integrated into our existing continuous integration (CI) infrastructure based on Jenkins. This works suprisingly well even though you need a windows slave and the MSBuild plugin.

One point you should think about is which testing framework to use. MSTest is part of Visual Studio and provides nice integration into the IDE. Using it in conjunction with Jenkins is possible since there is a MSTest plugin for our favorite CI server. One downside is that you need either Visual Studio itself or the Windows SDK (500MB download, 300MB install) installed on the build server in addition to .NET. Another one is that it does not work with the “Express” editions of Visual Studio. Usually that is not a problem for companies but it raises the entry barrier for open source or other non-profit projects by requiring relatively expensive Visual Studio licences.

In our scenarios NUnit proved much lighter and friendlier in installation and usage. You can easily bundle it with your sources to improve self-containment of the project and lessen the burden on the system and tools. If you plug the NUnit tool into the external tools-section of Visual Studio (which also works with Express) the integration is acceptable, too.

Conclusion

If you are not completely on the full Microsoft stack for you project infrastructure using Visual Studio, TeamCity, Sourcesafe et al. it is worth considering choosing NUnit over MSTest because of its leaner size and looser coupling to the Mircosoft stack.

Checking preconditions in advance vs. on demand vs. exceptions

Usually, it is good practice to check certain preconditions before applying operations to input data. This is often referred to as defensive programming. Many people are used to lines like:

public void preformOn(String foo) {
  if (!myMap.containsKey(foo)) {
    // handle it correctly
    return;
  }
  // do something with the entry
  myMap.get(foo).performOperation();
}

While there is nothing wrong with such kind of “in advance checking” it may have performance implications – especially when IO is involved.

We had a problem some time ago when working with some thousand wrappers for File objects. The wrappers checked if the given File object actually is a file using the innocent isFile()-method in the constructor which caused hard disk access each time. So building our collection of wrapped files took quite some time (dozens of seconds) and our client complained (rightfully so!) about the performance. Once the collection was built the operations were fast because no checking was needed anymore.

Our first optimization step was deferring the check to the point where the file was actually used. This sped up the creation of the wrappers so it was barely noticeable but processing a bunch of elements took longer because of additional disk accesses. Even though this approach may work for a plethora of situations for our typical use cases the effect of this optimization was not enough.

So we looked at our problem from another perspective: The vast majority of file handles were actually existing and readable files and directories and foreign/unknown files were the exception. Because of this fact we chose to simply leave out any kind of checks and handle the exceptions! Exception handling is often referred to as slow but if exceptions are rare it can make a difference in some orders of magnitude. Our speed up using this approach was enourmous and the client was happy about sub-second responsiveness for his typical operations. In addition we think that the code now expresses more cleary that irregular files really are the exception and not the rule for this particular code.

Conclusion

There are different approaches to handling of parameters and input data. Depending on the cost of the check and the frequency of special input different strategies may prove beneficial both in expressing your intent and the perceived performance of your application.

Triggering jenkins from git with common post-receive hook

The standard way of triggering Jenkins jobs from a git repository was issuing a get request on the “build now” URL of the job in the post-receive hook, e.g.

curl http://my_ci_server:8080/job/my_job/build?delay=0sec

The biggest problem of this approach is that you have to hardcode the job name into the url. This prevents sharing the hook between repositories and requires you to put an adjusted post-receive hook script into each new repository. Also, additional work has to be done to trigger jobs only for certain branches and the like.
Fortunately, Jenkins offers a new way of triggering jobs from a git repository for quite a while now. Essentially you have to notify jenkins of the commit in your repository and configure the job for polling.

To trigger jobs for the repository git@my_repository_server:my_project.git you can use the following script:

GIT_REPO_URL=git@my_repository_server:`pwd | sed 's:.*\/::'`
curl http://my_ci_server:8080/git/notifyCommit?url=$GIT_REPO_URL

Notice the absence of any repository or job specific stuff in the post-receive hook. Such a hook can be placed in a central location and be shared between repositories using symbolic links.

A tale of anti-virus software killing local connectivity

We are developing and running a distributed system which is deployed on-site at our client. Everything was running smoothly for years only some minor hick-ups related to network infrastructure problems occurred over time. Then one day our client told us the scheduled database backups were not working anymore. We immediately checked the database, all installed firewall programs and the like on that Windows 7 server machine. The Postgresql database was running and our local and remote application components were able to connect. Strangely though, neither pgAdmin nor psql or even telnet were able to make a connection locally to the database!! Adding more oddity we did not change or update any part of the system at the time things stopped working. Remote access to the database was working though leaving us even more confused. To sum up the situation:

Some applications can connect to the database locally, others cannot
Remote access to the database works without problems for all applications, even those that cannot connect locally on the server
We did not change any of these applications, neither client side nor server side
All firewalls were disabled and the problem persisted over reboots

The explanation

So we talked to our client again and depicted our complete analysis pinning the date of the breakage to a moment when we evidently did not change anything. Suddenly it struck him like lightning when he remembered that there was an automatic update of an anti-virus program. He removed the software from the machine and everything worked again as expected. Even reinstalling the anti-virus program did not break the system again. It was only this misbehaving automatic update somewhere in time that killed some part of our system in a most odd way…

Testing C programs using GLib

Writing programs in good old C can be quite refreshing if you use some modern utility library like GLib. It offers a comprehensive set of tools you expect from a modern programming environment like collections, logging, plugin support, thread abstractions, string and date utilities, different parsers, i18n and a lot more. One essential part, especially for agile teams, is onboard too: the unit test framework gtest.

Because of the statically compiled nature of C testing involves a bit more work than in Java or modern scripting environments. Usually you have to perform these steps:

Write a main program for running the tests. Here you initialize the framework, register the test functions and execute the tests. You may want to build different test programs for larger projects.
Add the test executable to your build system, so that you can compile, link and run it automatically.
Execute the gtester test runner to generate the test results and eventually a XML-file to you in your continuous integration (CI) infrastructure. You may need to convert the XML ouput if you are using Jenkins for example.

A basic test looks quite simple, see the code below:

#include <glib.h>
#include "computations.h"

void computationTest(void)
{
    g_assert_cmpint(1234, ==, compute(1, 1));
}

int main(int argc, char** argv)
{
    g_test_init(&argc, &argv, NULL);
    g_test_add_func("/package_name/unit", computationTest);
    return g_test_run();
}

To run the test and produce the xml-output you simply execute the test runner gtester like so:

gtester build_dir/computation_tests --keep-going -o=testresults.xml

GTester unfortunately produces a result file which is incompatible with Jenkins’ test result reporting. Fortunately R. Tyler Croy has put together an XSL script that you can use to convert the results using

xsltproc -o junit-testresults.xml tools/gtester.xsl testresults.xml

That way you get relatively easy to use unit tests working on your code and nice some CI integration for your modern C language projects.

Update:

Recent gtester run the test binary multiple times if there are failing tests. To get a report of all (passing and failing) tests you may want to use my modified gtester.xsl script.

Better diagnostics in TDD

Automated tests gain more and more popularity in our field and our company. Avoiding being slowed down by tests becomes crucial. Steve Freeman has a nice talk on infoq.com with many advices for maintaining the benefits of automated testing without producing too much drag. One seldomly discussed topic is test diagnostics and immediately caught our attention. In short your aim is to produce as meaningful messages as possible for failing tests. This leads to the extended TDD cycle depicted below.

There are several techniques to improve the diagnostics of failing tests. Here is a short list of the most important ones:

Using assertion messages to make clear what exactly failed
Using “named objects” where you essentially just override the toString()-method of some type in your tests to provide meaning for the checked value
```
Date startDate = new Date(1000L) {
    @Override
    public String toString() {
        return "startDate";
    }
};
```

Using “tracer objects” by giving names to mocks/collaborators in the test, e.g. in Mockito-syntax:

EventManager em1 = mock(EventManager.class, "Gavin");
EventManager em2 = mock(EventManager.class, "Frank");
// do something with them

Conclusion

By applying the extended TDD-cycle you can drastically reduce guessing of what went wrong and find regressions much faster without using debug messages or the debugger itself.

Your own CI-based RPM build farm, part 3

In my previous post we learned how to build RPM packages of your software for multiple versions of your target distribution(s). Now I want to present a way of automating the build process and building packages on/for all target platforms. You should have a look at the openSUSE build service to see if it already fits your needs. Then you can stop reading here :-).

We needed better control over the platforms and the process, so we setup a build farm based on the Jenkins continuous integration (CI) server ourselves. The big picture consists of the following components:

build slaves allowing a jenkins user to do unattended builds of the packages
Jenkins continuous integration server using matrix builds with build slaves for each target platform
build script orchestrating the build of all our self-maintained packages
jenkins job to deploy the packages to our RPM repository

Preparing the build slaves

Standard installations of openSUSE need some minor tweaks so they can be used as Jenkins build slaves doing unattended RPM package builds. Here are the changes we needed to make it work properly:

Add a user account for the builds, e.g. useradd -m -d /home/jenkins jenkins and setup a password with passwd jenkins.
Change sshd configuration to allow password authentication and restart sshd.
We will link the SOURCES and SPECS directories of /usr/src/packages to the working copy of our repository, so we need to delete the existing directories: rm -r /usr/src/packages/SPECS /usr/src/packages/SOURCES /usr/src/packages/RPMS /usr/src/packages/SRPMS.
Allow non-priviledged users to work with /usr/src/packages with chmod -R o+rwx /usr/src/packages.
Copy the ssh public key for our git repository to the build account in ~/.ssh/id_rsa
Test ssh access on the slave as our build user with ssh -v git@repository. With this step we confirm the host authenticity one time so that future public key ssh interactions work unattended!
Configure git identity on the slave with git config --global user.name "jenkins@build###-$$"; git config --global user.email "jenkins@buildfarm.myorg.net".
Add privileges for the build user needed for our build process in /etc/sudoers: jenkins ALL = (root) NOPASSWD:/usr/bin/zypper,/bin/rpm

Configuring the build slaves

Linux build slaves over ssh are quite easily configured using Jenkins’ web interface. We add labels denoting the distribution release and architecture to be easily able to setup our matrix builds. Then we setup our matrix build as a new job with the usual parameters for source code management (in our case git) etc.

Our configuration matrix has the two axes Architecture and OpenSuseRelease and uses the labels of the build slaves. Our only build step here is calling the script orchestrating the build of our rpm packages.

Putting together the build script

Our build script essentially sets up a clean environment, builds package after package installing build prerequisites if needed. We use small utility functions (functions.sh) for building a package, installing packages from repository, installing freshly built packages and removing installed RPM. The script contains roughly the following phases:

Figure out some quirks about the environment, e.g. openSUSE release number or architecture to build.
Clean the environment by removing previously installed self-built packages.
Setting up the build environment, e.g. linking folder from /usr/src/packages to our working copy or installing compilers, headers and the like.
Building the packages and installing them locally if they are a dependency of packages yet to be built.

Here is a shortened example of our build script:

#!/bin/bash

RPM_BUILD_ROOT=/usr/src/packages
if [ "i686" = `uname -m` ]
then
  ARCH=i586
else
  ARCH=`uname -m`
fi
SUSE_RELEASE=`cat /etc/SuSE-release | sed '/^[openSUSE|CODENAME]/d' | sed 's/VERSION =//g' | tr -d '[:blank:]' | sed 's/\.//g'`

source functions.sh

# setup build environment
ensureDirectoryLinks
# force a repository refresh without checking the signature
sudo zypper -n --no-gpg-checks refresh -f OUR_REPO
# remove previously built and installed packages
removeRPM libomniORB4.1
removeRPM omniNotify2
# install needed tools
installFromRepo c++-compiler
if [ $SUSE_RELEASE -lt 121 ]
then
  installFromRepo java-1_6_0-sun-devel
else
  installFromRepo jdk
fi
installFromRepo log4j
buildRPM omniORB
installRPM $ARCH/libomniORB4.1
installRPM $ARCH/omniORB-devel
installRPM $ARCH/omniORB-servers
buildAndInstallRPM omniNotify2 $ARCH

Deploying our packages via Jenkins

We setup a second Jenkins job to deploy successfully built RPM packages to our internal repository. We use the Copy Artifacts plugin to fetch the rpms from our build job and put them into a directory like all_rpms. Then we add a build step to execute a script like this:

for i in suse-12.1 suse-11.4 suse-11.3
do
  rm -rf $i
  mkdir -p $i
  versionlabel=`echo $i | sed 's/[-\.]//g'`
  cp -r "all_rpms/Architecture=32bit,OpenSuseRelease=$versionlabel/RPMS" $i
  cp -r "all_rpms/Architecture=64bit,OpenSuseRelease=$versionlabel/RPMS" $i
  cp -r "all_rpms/Architecture=64bit,OpenSuseRelease=$versionlabel/SRPMS" $i
  rsync -e "ssh" -avz $i/* root@rpmrepository.intranet:/srv/www/htdocs/OUR_REPO/$i/
  ssh root@rpmrepository.intranet "createrepo /srv/www/htdocs/OUR_REPO/$i/RPMS"

Summary

With a setup like this we can perform an automatic build of all our RPM packages on several targetplatform everytime we update one of the packages. After a successful build we can deploy our new packages to our RPM repository making them available for our whole organisation. There is an initial amount of work to be done but the rewards are easy, unattended package updates with deployment just one button click away.

Packaging RPMs for a variety of target platforms, part 2

In part 1 of our series covering the RPM package management system we learned the basics and built a template SPEC file for packaging software. Now I want to give you some deeper advice on building packages for different openSUSE releases, architectures and build systems. This includes hints for projects using cmake, qmake, python, automake/autoconf, both platform dependent and independent.

Use existing makros and definitions

RPM provides a rich set of macros for generic access to directory paths and programs providing better portability over different operating system releases. Some popular examples are /usr/lib vs. /usr/lib64 and python2.6 vs. python2.7. Here is an exerpt of macros we use frequently:

%_lib and %_libdir for selection of the right directory for architecture dependent files; usually [/usr/]lib or [/usr/]lib64.
%py_sitedir for the destination of python libraries and %py_requires for build and runtime dependencies of python projects.
%setup, %patch[#], %configure, %{__python} etc. for preparation of the build and execution of helper programs.
%{buildroot} for the destination directory of the build artifacts during the build

Use conditionals to enable building on different distros and releases

Sometimes you have to use %if conditional clauses to change the behaviour depending on

operating system version

%if %suse_version < 1210
  Requires: libmysqlclient16
%else
  Requires: libmysqlclient18
%endif

operating system vendor

%if "%{_vendor}" == "suse"
BuildRequires: klogd rsyslog
%endif

because package names differ or different dependencies are needed.

Try to be as lenient as possible in your requirement specifications enabling the build on more different target platforms, e.g. use BuildRequires: c++_compiler instead of BuildRequires: g++-4.5. Depend on virtual packages if possible and specify the versions with < or > instead of = whenever reasonable.

Always use a version number when specifying a virtual package

RPM does a good job in checking dependencies of both, the requirements you specify and the implicit dependencies your package is linked against. But if you specify a virtual package be sure to also provide a version number if you want version checking for the virtual package. Leaving it out will never let you force a newer version of the virtual package if one of your packages requires it.

Build tool specific advices

qmake: We needed to specify the INSTALL_ROOT issuing make, e.g.:
```
qmake
make INSTALL_ROOT=%{buildroot}/usr
```
autotools: If the project has a sane build system nothing is easier to package with RPM:
```
%build
%configure
make

%install
%makeinstall
```
cmake: You may need to specify some directory paths with -D. Most of the time we used something like:
```
%build
cmake -DCMAKE_INSTALL_PREFIX=%{_prefix} -Dlib_dir=%_lib -G "Unix Makefiles" .
make
```

Working with patches

When packaging projects you do not fully control, it may be neccessary to patch the project source to be able to build the package for your target systems. We always keep the original source archive around and use diff to generate the patches. The typical workflow to generate a patch is the following:

extract source archive to source-x.y.z
copy extracted source archive to a second directory: cp -r source-x.y.z source-x.y.z-patched
make changes in source-x.y.z-patched
generate patch with: cd source-x.y.z; diff -Naur . ../source-x.y.z-patched > ../my_patch.patch

It is often a good idea to keep separate patches for different changes to the project source. We usually generate separate patches if we need to change the build system, some architecture or compiler specific patches to the source, control-scripts and so on.

Applying the patch is specified in the patch metadata fields and the prep-section of the SPEC file:

Patch0: my_patch.patch
Patch1: %{name}-%{version}-build.patch

...

%prep
%setup -q # unpack as usual
%patch0 -p0
%patch1 -p0

Conclusion
RPM packaging provides many useful tools and abstractions to build and package projects for a wide variety of RPM-based operation systems and releases. Knowing the macros and conditional clauses helps in keeping your packages portable.

In the next and last part of this series we will automate building the packages for different target platforms and deploying them to a repository server.

	Anonymous on Cache configuration with WildF…
	Miq on Nested queries like N+1 in pra…
	mariuselvert on Creating functors with lambda…
	Nested queries like… on Common SQL Performance Gotchas…
	Nested queries like… on Make your users happy by not c…