Analyzing Java Heap problems Part 2: Using Eclipse MAT

In part one we saw how to obtain the data to analyze, the heap dumps. Now we are looking into a nice plugin for the Eclipse IDE for analyzing the dumps.  Compared to the basic tools described in the previous article Memory Analyzer Tool (MAT) offers better usability, performance and some high level analysis and report tools.Eclipse MAT Overview After you open a hprof heap dump with MAT it will generate index files for faster access to all the data you are interested in and show an overview with nice charts.  From here you have access to other views and features:

  • The histogram is somewhat similar to what jHat offers.mat-pathtogcroot It allows you to browse, sort and filter the object instances in memory and shows you instance count and the shallow heap (memory used only by this object instance) and retained heap (memory used by this object instance including referenced objects). From the context menu you can choose “Merge Shortest Paths to GC roots” to see the reference chain of an object all the way up to the classloader. Here we can see that the JDateChooser registers itself at the MenuSelectionManager as a listener which can cause serious memory leaks as described in another post about Java memory handling.
  • The dominator tree allows you to quickly identify the biggest objects and what they reference. Again, using the context menu on an item in the list offers many options to dive deeper into the analysis.
  • The object inspector gives you detailed information about the selected objects like shallow and retained size, its fields and the class loader by whom it was loaded.
  • The leak suspects report tries to give you some high level hints about possible causes of memory problems of your application.
  • MAT Component ReportThe component report provides some very interesting statistics about Strings and collection usage which might be worth looking at if you are not hunting down leaks but trying to reduce overall memory usage. You can even get performance hints when many overfull HashMaps are detected or there are many empty collections which could be better lazily created.

I personally am using the histogram and the dominator tree the most because I am a technical guy and like to hunt down the problems in the code. Nevertheless the reports may show use other valuable aspects which you did not think of before. The MAT team are expanding the tools nicely on that side so that the benefit of these reports is ever increasing.

It is very likely that when you analyze large heap dumps you may need to increase the Java heap size for Eclipse by using the -vmargs -Xmx<memory size> parameter. That way you are able to analyze big heaps > 500M relatively fast and comfortable. For some live demo take a look at a webinar by some of SAPs Eclipse MAT committers.

Merry christmas – and check your candles before you light them

xmas-treeThe Softwareschneiderei (Schneide) team wishes you all a merry christmas and a happy new year.

We might pause this blog for a few weeks as everybody is on holiday.

This year, we sent out christmas cards that needed to be assembled. The card with a tea light made up a little latern. To have all batteries included, we bought some tea lights but couldn’t resist to buy some LED tea lights, too. The we put the parts together in giant envelopes and sent them out.

We got really nice feedback, but several reports suggested we might additionally package a warning note next time we send out LED tea lights that look too similar to traditional ones. The LED tea lights refuse to catch fire when burned, that’s for certain now.

So a word of warning in advance: Double-check your christmas tree candles before you kindle them. They might operate with batteries instead of fire.

Batteries not included

When I bought a label printing device lately, it came bundled with a label tape roll. That suggested instant usage – no need to think of additional parts upfront. Only tousb-battery-little find out that, you’ve guessed it already, batteries weren’t included. A bunch of standardized parts missing (Murphy’s law applied) and the whole ready-to-use package was rendered useless. The time and effort it took me to get the batteries was the same as to get the label tape I really wanted to use instead of the bundled one.

This is a common pattern not only with device manufacturers, but with software developers, too.

Instant feature – just add effort

Frequently, a software comes “nearly” ready-to-use. All you have to do to make it run is

  • upgrade to the latest graphics drivers
  • install some database system (we won’t tell you how as it’s not our business)
  • create some file or directory manually
  • login with administrator rights once (or worse: always) to gather write access to the registry or configuration file
  • review and change the complete configuration prior to first usage

The last point is a personal pet peeve of mine.

It all boils down to the question if a software or a feature is really ready-to-use. Most of the work you have to do manually is tedious or highly error-prone. Why not add support for this apparently crucial steps to the software in the first place?

It works instantly – with my setup

A common mistake made by developers is to forget about the history of a feature emerging in the development labs. The history includes all the little requirements (a writeable folder here, an existing database table there) that will naturally be present on the developer’s machine when she finishes work, because fulfilling them was part of the development process.

If the same developer was forced to recreate the feature on a fresh machine, she would notice all these steps with ease and probably automate or support (e.g. documentate) them, least to save herself the work of wading through it a third time.

But given that most developers regard a feature “finished”, “done” or “resolved” when the code was accepted by the repository (and hopefully the continuous integration system), the aching of the users wont reach them.

This is a case of lacking feedback.

Feel the pain – publicly

To close this open feedback loop, we established a habit of “adopting” features and bringing them to the user in person to overcome the problem of “nearly done”. If you can’t make your own feature run on the client’s machine within a few seconds, is it really that usable and “ready”? The unavoidable presence of the whole process – from the first feature request to the installed and proven-to-work software acts as a deterrent to fall for the “works on my machine” style of programming. It creates a strong relationship between the user, a feature and the developer as a side-effect.

We’ve seen quite a few junior developers experiencing a light bulb moment (and heavy sweating) in front of the customer. This is the hot-wired feedback loop working. In most cases, the situation (a feature requiring non-trivial effort to be run) will not repeat ever.

Batteries are part of the product

If your product (e.g. software) isn’t usable because some standard part (e.g. a folder) is missing, make sure you add these parts to the delivery package. It is a very pleasant experience for the user to just unwrap a software and use it right away. It shows that you’ve been cared for.

Observer/Listener structures in C++ with boost’s smart pointers

Whenever you are developing sufficiently large complex programs in languages like C++ or Java you have to deal with memory issues. This holds true especially when your program is supposed to run 24/7 or close to that. Because these kinds of issues can be hard to get right Java has this nice little helper, the garbage collector. But as Java solves all memory problems, or maybe not? points out, you can still easily shoot yourself in foot or even blow your whole leg away.  One of the problems stated there is that memory leaks can easily occur due to incorrect listener relations. Whenever a listener is not removed properly, which is either a large object itself or has references to such objects,  it’s only a matter of time until your program dies with “OutOfMemoryError” as its last words.  One of the proposed solutions is to use Java weak pointers for listener management.  Let’s see how this translates to C++.

Observer/listener management in C++ is often done using pointers to listener objects. Pointers are pretty weak by default. They can be :

  • null
  • pointing to a valid object
  • pointing to an invalid memory address

In listener relationships especially the latter can be a problem. For example, simple listener management could look like this:

   class SimpleListenerManagement
   {
   public:
      void addListener(MyListener* listener);
      void removeListener(MyListener* listener);
      void notifyListeners();
   private:
      std::list<MyListener*> listeners_;
   };

   void SimpleListenerManagement::notifyListeners()
   {
      // call notify on all listeners
      for (std::list<MyListener*>::iterator iter = listeners_.begin();
          iter != listeners_.end();
          ++iter)
      {
         (*iter)->notify(); // may be a bad idea!
      }
   }

In notifyListeners(), the pointer is used trusting that it still points to a valid object. But if it doesn’t, for instance because the object was deleted but the client forgot to removed it from the listener management, well, too bad.

Obviously, the situation would be much better if we didn’t use raw pointers but some kind of wrapper objects instead.  A first improvement would be to use boost::shared_ptr in the listener management:

   typedef boost::shared_ptr<MyListener> MyListenerPtr;

   class SimpleListenerManagement
   {
   public:
      void addListener(MyListenerPtr listener);
      void removeListener(MyListenerPtr listener);
      void notifyListeners();
   private:
      std::list<MyListenerPtr> listeners_;
   };

Provided that the given MyListenerPtr instance was created correctly by the client we can be sure now that all listeners exist when we call notify() on them.  Seems much better now. But wait! Using boost::shared_ptr, we now hold  strong references in our listeners list and are therefore kind of in the same situation as described in the post mentioned above. If the client forgets to remove its MyListenerPtr instance it never gets deleted and may be in a invalid state next time notify() is called.

A solution that works well in most cases is to use boost::weak_ptr to hold the listeners. If you see boost::shared_ptr on a level with normal Java references, boost::weak_ptrs are roughly the same as Java’ s weak references. Our listener management class would then look like this:

   typedef boost::shared_ptr<MyListener> MyListenerPtr;
   typedef boost::weak_ptr<MyListener> MyListenerWeakPtr;

   class SimpleListenerManagement
   {
   public:
      void addListener(MyListenerPtr listener);
      void removeListener(MyListenerPtr listener);
      void notifyListeners();
   private:
      std::list<MyListenerWeakPtr> listeners_; // using weak_ptr
   };

Note that addListener and removeListener still use MyListenerPtr as parameter. This ensures that the client provides valid listener objects.  The interesting stuff happens in notifyListeners():

   void SimpleListenerManagement::notifyListeners()
   {
      std::list<MyListenerWeakPtr>::iterator iter = listeners_.begin();
      while(iter != listeners_.end())
      {
         if ((*iter).expired())
         {
            iter = listeners_.erase(iter);
         }
         else
         {
            MyListenerPtr listener = (*iter).lock(); // create a shared_ptr from the weak_ptr
            listener->notify();
            ++iter;
         }
      }
   }

Each weak_ptr can now be checked if its object still exists before using it. If the weak_ptr is expired, it can simply be removed from the listeners list. With this implementation the removeListener method becomes optional and can as well be omitted. The client only has to make sure that the shared_ptr holding the listener gets deleted somehow.

We’ve won a prize!

When we switched our continuous integration platform from CruiseControl to Hudson, it was still a younhudsonbutler-149_50pxg project with many white areas on the project roadmap. But it already was powerful enough to handle our settings and delivered real value, so we eagerly wanted to contribute back.

The development process with “release early, release often” policy and community focussed drive fit right into our mindset, so we (in fact, mostly me) spent a few nights figuring out what and how to contribute. The effort materialized in a few private tweaks and a new plugin: the Crap4J hudson plugin.

With the knowledge of Hudson’s internals, we were able to help out various customers to set up their own sophisticated installations (which nearly led to the development of a Perforce plugin when Mike Wille finished his one right on time). This led to various bug reports and feature requests that we filed to Hudson’s issue database.

Soon afterwards, Sun Microsystems announced the GlassFish Awards Program (GAP) as part of the Community Innovation Awards Program. Hudson was part of the participating projects, so I gave it a try and submitted some feature requests and the plugin.

lauriersAnd we won a prize! It’s not the big sort of prize (look at position 50 in the list), but a reward for our filed issues and a honorable mention of the plugin (which truly stands no chance compared to the awesome work of Dr. Hafner, who contributed a complete “get-them-all” collection of useful metrics reporting plugins). At least, we are the only winner from Karlsruhe.

Lately, we blogged about awarding your customers. Well, that’s just what Sun did here. Thanks for that!

JTable index madness

A coworker of mine recently stumbled upon a strange looking JTable:
A broken down JTable

This reminded me of an effect I have seen several times. Digging through the source code of the JTable we found an unusual handling of TableEvents:

    public void tableChanged(TableModelEvent e) {
        if (e == null || e.getFirstRow() == TableModelEvent.HEADER_ROW) {
            // The whole thing changed
            clearSelectionAndLeadAnchor();

            rowModel = null;

            if (getAutoCreateColumnsFromModel()) {
		// This will effect invalidation of the JTable and JTableHeader.
                createDefaultColumnsFromModel();
		return;
	    }

	    resizeAndRepaint();
            return;
        }
...

The hidden problem here is that the value of TableModelEvent.HEADER_ROW is -1. So sending a TableEvent to the table with a obviously wrong index causes the table to reset discarding all renderers, column sizes, etc. And this is regardless of the type of the event (INSERT, UPDATE and DELETE). Yes, it is a bug in our implementation of the table model but instead of throwing an exception like IndexOutOfBounds it causes another event which resets the table. Not an easy bug to hunt down…

Spelling the feedback: The LED bar

Our fully automated project ecosystem provides us with feedback of very different type and granularity. We felt it was impossible to render every single notable event into its own extreme feedback device (XFD). Instead, we implemented an universal feedback source: the LED bar.

ledbar-alone

You know the LED bar already from a shop window of your town. It tells you about the latest special bargain, the opening hours of the shop or just something you didn’t want to know. But you’ve read it, because it is flashing and moving. You just can’t pass that shop window without noticing the text on the LED bar.

Our LED bar sells details to us. The most important issues are already handled by the ONOZ Lamp and the Audio feedback, as both are very intrusive. The LED bar is responsible to spell the news, rather than to tell it.

A very comforting news might be “All projects sane”, which happen to be our regular state. You might be told that you rendered “project X BROKEN”, but you already know this, as the ONOZ Lamp lit up and you were the one to check in directly before. It’s better to be informed that “project X sane” was the build’s outcome. After a while, the text returns to the regular state or blanks out.

Setting up the LED bar

We aren’t the only ones out there with a LED bar on the wall. Dirk Ziegelmeier for example installed his at the same time, but blogged much earlier about it. He even gives you detailed information about the communication protocol used by the device and a C# implementation for it. The lack of protocol documentation was a bugger for us, too. We reverse engineered it independently and confirm his information. We wrote a complete Java API for the device (in our case a LSB-100R), which we might open source on request. Just drop us a note if you are interested.

Basically, we wrote an IRC bot that understands commands given to it and transforms it into API calls. The API then deals with the low-level transformation and the device handshake. This way, software modules that want to display text on the LED bar from anywhere on the internal net only need to talk on IRC.

The idea of connecting an IRC channel and the led bar isn’t unique to us, either. The F-Secure Linux Team blogged about their setup, which is disturbingly equal to ours. Kudos to you guys for being cool, too.

Effects of the LED bar

The LED bar is the perfect place to indicate project news. Its non-intrusive if you hold back those “funny” displaying effects but versatile enough to provide more than simple binary (on/off) information. Its the central place to look up to if you want to know what’s the news.

We even found out that our company logo (created by Hannafaktur) is scalable down to 7×7 pixels, which exactly fits the LED bar in height:

logo_on_led

Try this with your company’s logo!


Read more about our Extreme Feedback Devices:

Award your Customer

Recently, we successfully finished a web app project that had many specialties we never had before. Major issues were very tight budget and time constraints (about 3 months) including an absolutely unpostponable deadline. However, the bigger concern for us was the diversity of our customer. Although we had one or two main reference persons, for the project to be successful we depended on the collaboration of a total of 8 departments.

As a first step to meet those challenges we decided on one-week iteration cycles – the shortest ever for us. At the kick-off meeting, where delegates of all departments were assembled, we presented our strategy and tried to make clear that communication and collaboration would be essential for the project to succeed. We also invited everyone to come to iteration meetings even when the agenda is not exactly about their specific requirements. After the meeting we hoped for the best.

With (almost) all departments it went like this: We did one requirements gathering appointment with one or two delegates and they either showed up once or twice on following meetings or they approved our implementation based on emailed screen shots. With most departments, email response time was good, with some, well, let’s just say holiday season didn’t really help. But altogether it was sufficient to keep the project well on track.

But wait! Did I say all departments? Not exactly! One single department actually managed it to sent at least one delegate to every single iteration meeting. And they not only enjoyed coffee and cookies but contributed a great deal every time. This was very helpful for us especially because after every iteration, we were a little bit more confident that we were still on the right track. Towards the end of the project, when success was foreseeable, we had the idea that their outstanding performance had to be rewarded somehow. So at the last iteration meeting, again with people from every department, we presented them with the Continuous Collaboration Award. ccaward They were very delighted and for the others it was a good laugh. And with the help of a little champagne and some snacks it became a very nice last iteration meeting.

As many of you know, good understanding between customer and developer can never be taken for granted. This is why agile methods always put great emphasis on extensive customer communication. A-Story-of-Project-Failure-Mitch-Lacey shows that even agile-by-the-book projects can fail basically due to lack of understanding on customer side. So do it like us and, if they deserve it, show your appreciation to your customer once in a while in a more creative way. And if you use a cup, make sure that there is also champagne around to fill it.

Analyzing Java Heap problems Part 1: Basic actions and tools

You think that your shiny Java app has some memory issues but how do you find out if that is true and what is taking up all that memory? Knowing the potential problems is fine. Nevertheless you still have to find out your actual problems. There are several instruments available to help you analyse your Java application regarding its memory usage. I will tell you about increasing your maximum heap (most of you surely know  about that), looking at the memory of a running app, making heap dumps (on demand or on OutOfMemoryException) and analyzing the dumps.

Increasing maximum heap

The Java VM has a setting that defines the maximum amount of heap memory available to your application. It defaults to 64MB which is enough for many programs. If you have a larger application you should try to start it with that value increased by passing the -Xmx<size>m parameter to the VM at startup. <size> is the value in MBytes so just fiddle around with that. If your app is leaking memory that won’t help you for long so you have to find out *if* it leaks.

Looking at memory usage of a running application

You can use jconsole for a quick look at your applications resource usage. jconsole is part of the Sun JDK since Java 6. You can connect the jconsole to any running java applications on your computer or even reachable over network and offering the Java Management Extensions (JMX) over TCP. Non-leaking programs should have a memory graph like this:

You can see, that the memory fluctuates over time because of the garbage collection cycles. But overall it does not grow. Next we will look at an application that leaks memory:

Above we see that the garbage collector (GC) tries its best but the used memory is growing over time. If we see such behaviour we probably need a heap dump to analyze the issue further.

Making a heapdump

Basically you have two nice ways to get a heap dump of your application which you can look into at a later time:

  1. Use jmap (which is also part of the Sun JDK 6) to dump the heap of a running application to a file using a command line like jmap -dump:format=b,file=myheap.hprof <pid>
  2. Tell the VM to make a heap dump when an OutOfMemoryException occurs by adding -XX:+HeapDumpOnOutOfMemoryError to the VM parameters at startup. With another switch you can specify the path for the dumps: -XX:HeapDumpPath=jmxdata .

After you have obtained a dump of your application you certainly want to have a look at it and find the issues. You can start with Sun’s jhat which is also part of current JDKs. After supplying jhat the hprof-file you can point your browser to the integrated webserver of jhat and browse the heap looking for the objects that take up your memory.

That way you can get an idea of what objects lived in memory when the heap dump was made and how they were referenced.

Conclusion

We have seen many ways to perform memory diagnostics using only free tools which are part of the JDK from Sun. They are all nice but have their limitations. Especially jhat has problems with usability and performance when you examine larger heap dumps with it.

Next time I will show you how to use the Eclipse plugin MAT for analysis of heap dumps obtained in one of the above ways. So stay tuned!

Extreme Feedback Device (XFD): The ONOZ! Lamp

When two good ideas meet, there’s a chance for an even better idea to be born. This happened to us some time ago, when the ONOZ! Lamp came up.

(This is a free translation and revision of an earlier article written in german)

The first good idea

On April 1st 2004, Alberto Savioa published a blog entry about an idea of two lava lamps (green and red) displaying the current build state of a project. I was somewhat distracted that day, marrying my wife, so the idea came to us two years later. Mike Clark wrote his wonderful book “Pragmatic Project Automation” and included not only the idea of the lava lamps, but also detailed construction guidance.

The second good idea

One day, an email contained a little animated gif with two panic guys running around.

Investigation suggests Jonn Wood as the author. We thought the guys act exactly like us after a broken build (one of the worst things that may happen here), so the gif and the word “ONOZ” were integrated into our company culture.

The birth of another idea

After we read about the lava lamps, we wanted to own them, too. But only after inspiration from the animated gif, we were sure about our specific realisation. We merged the two states into one lamp (on/off instead of green/red) and did without the lava. A normal desk light would do the job now.
We even have a good justification for the omission of the green (lava) lamp:

  • it saves energy
  • no timer switch is needed for the nights/weekends
  • our team includes colorblinds

The last reason is a good one when you look at this simulation of colorblindness:
These are pictures of the original green and red lava lamps:

This is how it looks to a colorblind employee. These images were generated by Vischeck, a website trying to inform about colorblindness practically.

Not much of a difference. If you swap them around secretly, ten percent (the percentage of colorblinds in the male population) of your team will panic without reason.

The ONOZ! Lamp

With little investment, we build a system supervising the build state of all our projects. Every build process sends its result to a server that checks for failures. If a build failed, the lamp gets switched on over traditional X10 signals. We can’t overlook the sudden burst of luminance, we panic a bit and try to fix the build. The lamp turns off when all projects are back to normal.

The ONOZ! Lamp is just a lamp standing around, until something ugly happens. Then it turns into a glowing infernal of failure. We nearly failed to give it a correctly spelled name, too. We named it “ONOEZ! Lamp” first, which seems to be the only invalid spelling of this exclamation.

The effects

The ONOZ! Lamp works great. Its mere presence has a comforting effect, as long as it is off. Which is the case most of the time. When it fires, the effect is like an alarm stopping all work. And the operator giving the alarm is always alert and incorruptible: our continuous integration server.


Read more about our Extreme Feedback Devices: