Some time ago we encountered a (HotSpot) JVM crash in our JBoss 7 application. It happened only after the application had been running for a while. A JVM crash is always serious, but because it took so long to occur, fixing it did not get a very high priority.
JVM crash investigation
Still, we did some investigation, and it seemed that the space reserved for the Code Cache was too small (an out-of-memory error). The Code Cache is a non-heap memory area that the JVM uses for compiling and storing native code. It appeared to be a known issue in the 1.6 HotSpot JVM (at least up to update 32) that should not occur in the 1.5 or 1.7 HotSpot JVMs (we did not try this, though). We found the following bug report, which includes a workaround:
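To see how much Code Cache a running JVM is actually using, the standard java.lang.management API can list the memory pools that back it. The following is a minimal sketch, not the tooling we used at the time. It assumes the HotSpot pool names "Code Cache" (JDK 6 to 8) or "CodeCache"/"CodeHeap '...'" (JDK 9 and later); other JVMs may name their pools differently. Note that these pools only report totals, so they cannot reveal how fragmented the space is.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class CodeCacheUsage {

    /** Returns the non-heap memory pools that back HotSpot's JIT code cache. */
    static List<MemoryPoolMXBean> codeCachePools() {
        List<MemoryPoolMXBean> result = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            // Assumption: HotSpot pool names; "Code Cache" up to JDK 8,
            // "CodeCache" or "CodeHeap '...'" (segmented code cache) from JDK 9.
            if (name.startsWith("Code Cache") || name.startsWith("CodeCache")
                    || name.startsWith("CodeHeap")) {
                result.add(pool);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codeCachePools()) {
            MemoryUsage usage = pool.getUsage();
            // getMax() returns -1 when the maximum is undefined
            System.out.printf("%s: used=%d bytes, committed=%d bytes, max=%d bytes%n",
                    pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax());
        }
    }
}
```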
http://bugs.sun.com/view_bug.do?bug_id=7009641
We tuned the code cache and permgen memory settings of our JBoss JVM with the following JVM arguments (the workaround mentioned in the bug report):
-XX:CodeCacheMinimumFreeSpace=… -XX:ReservedCodeCacheSize=…
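When changing -XX flags in an application server's start scripts, it is easy to end up with a setting that never reaches the JVM. One way to check the effective values from inside the running JVM is the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean. This is a sketch assuming a HotSpot-based JVM on Java 7 or later; CodeCacheMinimumFreeSpace may not exist in newer JDK versions, in which case getVMOption throws IllegalArgumentException and the sketch reports the option as unsupported.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

public class CodeCacheFlags {

    /** Returns the effective -XX option, or null if this JVM does not know the option. */
    static VMOption lookup(String optionName) {
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            return hotspot.getVMOption(optionName);
        } catch (IllegalArgumentException e) {
            // Thrown when no VM option with this name exists in this JVM version
            return null;
        }
    }

    public static void main(String[] args) {
        for (String name : new String[] {"ReservedCodeCacheSize", "CodeCacheMinimumFreeSpace"}) {
            VMOption option = lookup(name);
            if (option == null) {
                System.out.println(name + ": not supported by this JVM");
            } else {
                // Origin tells whether the value is the default or was set on the command line
                System.out.println(name + " = " + option.getValue()
                        + " (origin: " + option.getOrigin() + ")");
            }
        }
    }
}
```

If the origin is reported as DEFAULT, the argument was not picked up by the JVM.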
However, applying these JVM arguments did not solve the problem. In fact, the JVM crashed again within the first hour of operation.
After more investigation of the crash dump and monitoring of the memory usage, it became clear that the problem was not really running out of memory, but rather a memory allocation problem. The Code Cache was so fragmented that only small blocks of memory were available: the space the JVM wanted to allocate was larger than the biggest free block. Monitoring tools like VisualVM do not show this, so it looked as if there was enough Code Cache available.
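Why plenty of free memory can still lead to a failed allocation is easiest to see in a toy model. The sketch below is a simplified first-fit allocator over a list of free blocks; it is an illustration of fragmentation in general and makes no claim about how HotSpot's code cache allocator actually works. The block sizes are arbitrary example values.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of a fragmented memory region (NOT HotSpot's real allocator):
 * free space is a list of contiguous blocks, and an allocation succeeds
 * only if one single block is large enough (first fit).
 */
public class FragmentationDemo {

    private final List<Integer> freeBlocks = new ArrayList<>();

    FragmentationDemo(int... blockSizes) {
        for (int size : blockSizes) {
            freeBlocks.add(size);
        }
    }

    /** Sum of all free blocks: what a monitoring tool reports as "free". */
    int totalFree() {
        int total = 0;
        for (int size : freeBlocks) total += size;
        return total;
    }

    /** Size of the biggest contiguous free block. */
    int largestBlock() {
        int largest = 0;
        for (int size : freeBlocks) largest = Math.max(largest, size);
        return largest;
    }

    /** First fit: carve the request out of the first block that is big enough. */
    boolean allocate(int request) {
        for (int i = 0; i < freeBlocks.size(); i++) {
            int size = freeBlocks.get(i);
            if (size >= request) {
                freeBlocks.set(i, size - request);
                return true;
            }
        }
        return false; // no single block fits, regardless of the total free space
    }

    public static void main(String[] args) {
        // Ten free blocks of 8 KB: 80 KB free in total, but at most 8 KB contiguous
        FragmentationDemo cache = new FragmentationDemo(8, 8, 8, 8, 8, 8, 8, 8, 8, 8);
        System.out.println("total free = " + cache.totalFree() + " KB");      // 80 KB
        System.out.println("largest block = " + cache.largestBlock() + " KB"); // 8 KB
        System.out.println("allocate 32 KB -> " + cache.allocate(32));       // false
    }
}
```

A tool that only shows the 80 KB total would suggest there is plenty of room, while the 32 KB request still fails.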
Database abstraction layer
Sometimes you have a gut feeling that a certain implementation will cause problems in the future. Even before the memory problem first occurred, I had noticed that our database service bean was not well integrated into the JEE environment. We use Hibernate and JPA as the database abstraction layer. On each call to the database, a new EntityManagerFactory, a new EntityManager and a new transaction were created, as shown in the following snippet:
// Executed on every database call: a new factory, entity manager and transaction each time
EntityManagerFactory emf = Persistence.createEntityManagerFactory("someunit");
EntityManager entityManager = emf.createEntityManager();
entityManager.getTransaction().begin();
...
entityManager.getTransaction().commit();
entityManager.close();
emf.close();
These application-managed (resource-local) transactions completely bypassed the transaction facilities of the JEE container (i.e. JTA).
I refactored the DB service so that it used container-managed JTA transactions. The refactored code looks similar to the following snippet:
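import javax.ejb.Stateless;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Stateless
public class SomeDAO {

    // Container-managed EntityManager, injected by the JEE container
    @PersistenceContext(unitName = "someunit")
    private EntityManager entityManager;

    // By default (REQUIRED), the container runs this business method in a JTA transaction
    public void store(SomeObject someObject) {
        ...
    }
}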
However, I was not allowed to include this change in the release we were working on: we were close to the release date, and integrating it at that point was considered too risky.
Solution
You might wonder what this has to do with the crash. It turned out that the Code Cache fragmentation was directly caused by this implementation circumventing the JEE container. Once this became clear, my change (called the Mark-solution) was integrated into the release, and we never saw the problem again.
Moral of the story: if a solution feels fishy, it is worth addressing, because it may point to a real problem.