JVM crash because of fragmented codecache

Some time ago we encountered a (HotSpot) JVM crash in our JBoss 7 application. It happened only after the application had been running for a while. A JVM crash is always serious, but because it took so long to occur, fixing it was not considered very urgent.

JVM crash investigation

Still, we did some investigation, and it seemed that the space reserved for the Code Cache was too small (an out-of-memory error). The Code Cache is the memory area that the JVM uses for compiling and storing native code. This seemed to be a known issue in the 1.6 HotSpot JVM (at least up to update 32) and should not occur in the 1.5 or 1.7 HotSpot JVMs (we didn't try this, though). We found the following bug report, which includes a workaround.

http://bugs.sun.com/view_bug.do?bug_id=7009641

We tuned the code cache and PermGen memory settings of our JBoss JVM with the following JVM arguments (the workaround mentioned in the bug report).

-XX:CodeCacheMinimumFreeSpace=… -XX:ReservedCodeCacheSize=…

However, applying these JVM arguments did not solve the problem. In fact, the JVM crashed again within its first hour of operation.

After some more investigation of the crash dump and monitoring of the memory usage, it became clear that the problem was not really an out-of-memory error but rather a memory allocation problem. The code cache was so fragmented that only small blocks of free memory were available. The space that the JVM wanted to allocate was larger than the biggest free block. Monitoring tools like VisualVM do not show this, so it looked as if there was enough Code Cache left.
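To illustrate why aggregate numbers can be misleading here, the following is a minimal sketch of reading code cache usage through the standard java.lang.management API. It is not the monitoring we actually used. The class name CodeCacheUsage and its methods are hypothetical, and the pool names it matches ("Code Cache" on older HotSpot JVMs, "CodeHeap ..." on newer ones) are an assumption about HotSpot naming that may not hold on other JVMs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints the totals that the JVM reports for its code cache.
public class CodeCacheUsage {

    /** Returns the non-heap memory pools whose names look like a HotSpot code cache. */
    public static List<MemoryPoolMXBean> codeCachePools() {
        List<MemoryPoolMXBean> pools = new ArrayList<MemoryPoolMXBean>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: HotSpot calls the pool "Code Cache" (older versions)
            // or splits it into several "CodeHeap '...'" pools (newer versions).
            boolean looksLikeCodeCache =
                name.contains("Code Cache") || name.contains("CodeHeap");
            if (pool.getType() == MemoryType.NON_HEAP && looksLikeCodeCache) {
                pools.add(pool);
            }
        }
        return pools;
    }

    /** Formats the used, committed and maximum size of one pool. */
    public static String describe(MemoryPoolMXBean pool) {
        MemoryUsage usage = pool.getUsage();
        if (usage == null) {
            // getUsage() returns null when the pool is no longer valid.
            return pool.getName() + ": no usage data";
        }
        long max = usage.getMax(); // -1 when the maximum is undefined
        String maxText = (max < 0) ? "undefined" : max + " bytes";
        return pool.getName()
            + ": used=" + usage.getUsed() + " bytes"
            + ", committed=" + usage.getCommitted() + " bytes"
            + ", max=" + maxText;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codeCachePools()) {
            System.out.println(describe(pool));
        }
    }
}
```

These figures are totals only. They say how much of the code cache is in use, not how large the biggest contiguous free block is, so a heavily fragmented code cache can still look healthy in this kind of output.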

Database abstraction layer

Sometimes you have a gut feeling that a certain implementation will cause problems in the future. Even before the memory problem first occurred, I had noticed that our database service bean was not integrated into the JEE environment very well. We use Hibernate and JPA as our database abstraction. On each call to the database, an EntityManagerFactory object, an EntityManager object and a transaction were created, as shown in the following snippet.

// Application-managed JPA: everything is created and closed again on every call.
EntityManagerFactory emf =
    Persistence.createEntityManagerFactory("someunit");
EntityManager entityManager = emf.createEntityManager();
entityManager.getTransaction().begin();
...
entityManager.getTransaction().commit();
entityManager.close();
emf.close();

The transaction completely bypassed the JEE facilities (i.e. JTA).

In general, a JEE container expects to be in control of almost everything. I therefore had a hunch that this might cause problems, although I was not sure what kind of problems to expect.

I refactored the DB service so that it used JTA-managed transactions. The refactored code is similar to the following snippet.

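import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class SomeDAO {
    // Injected and managed by the container; transactions are handled through JTA.
    @PersistenceContext(unitName="someunit")
    EntityManager entityManager;

    // Public so that the container exposes it as a business method.
    public void store(SomeObject someObject) {
        ...
    }
}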

However, I was not allowed to integrate this into the release we were preparing: we were close to the release date, and integrating it at that moment was considered high risk.

Solution

You might wonder what this has to do with the crash. The Code Cache fragmentation was directly caused by this implementation circumventing the JEE container. When this became clear, my change (called the Mark-solution) was integrated into the release, and we never saw the problem again.
Moral of the story: if a solution feels fishy, it is worthwhile to address it, because it may indicate a real problem.

admin
