Java EE Support Patterns

2.09.2012

Java Heap Space: What is it?

This article will provide you with a high level overview of the Java Heap Space and will help improve your knowledge in this area.

Additional complementary articles are provided at the end of this post. 

Background

When learning Java for the first time, most of the focus usually goes to the Java language itself, object-oriented programming principles, design patterns, compilation and so on. Much less attention goes to the Java VM: Java Heap memory management, garbage collection and performance tuning are often considered “advanced” topics.

A beginner Java or Java EE programmer eventually creates a first program or Web application. Java Heap memory problems such as OutOfMemoryError often follow, and they can be quite challenging for Java beginners, or even intermediate developers, to troubleshoot.

Sounds familiar?

Java Heap Space – Overview & life cycle

Proper knowledge of the Java VM Heap Space is critical, including for Java beginners, so my recommendation is to learn these principles at the same time you learn the Java language itself.

Your Java VM is basically the foundation of your Java program: it provides dynamic memory management, garbage collection, Threads, IO, native operations and more.

The Java Heap Space is the memory “container” of your running Java program. It provides your program with the memory spaces it needs (Java Heap, Native Heap) and is managed by the JVM itself.

Your Java program life cycle typically looks like this:

-        Java program coding (via Eclipse IDE etc.) e.g. HelloWorld.java
-        Java program compilation (Java compiler or third party build tools such as Apache Ant, Apache Maven etc.) e.g. HelloWorld.class
-        Java program start-up and runtime execution e.g. via your HelloWorld.main() method

The Java Heap Space matters mostly for the third step: runtime execution. For the HotSpot VM, the Java Heap Space is split into 3 silos:

-        Java Heap for short & long lived objects (YoungGen & OldGen spaces)
-        PermGen space
-        Native Heap
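
The Java Heap and PermGen silos can be observed from inside a running program. The sketch below is only an illustration: the class name MemoryPoolLister is hypothetical, and it relies on the standard java.lang.management API. Pool names vary by JVM vendor, version and garbage collector; on HotSpot from Java 8 onwards, PermGen no longer exists and a Metaspace pool shows up in its place. The Native Heap is not reported through this API.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the memory pools that the running JVM exposes.
public class MemoryPoolLister {

    /** Returns one line per memory pool: type (HEAP / NON_HEAP), name and bytes in use. */
    public static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String kind = (pool.getType() == MemoryType.HEAP) ? "HEAP" : "NON_HEAP";
            MemoryUsage usage = pool.getUsage(); // may be null if the pool is no longer valid
            long usedBytes = (usage == null) ? -1 : usage.getUsed();
            lines.add(kind + " " + pool.getName() + " used=" + usedBytes + " bytes");
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describePools()) {
            System.out.println(line);
        }
    }
}
```

Running it with different JVM versions or GC options is a quick way to see how the YoungGen and OldGen spaces are named on your own platform.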

Now let’s dissect your HelloWorld.class program so you can better understand.

-        At start-up, your JVM will load and cache some of your static program and JDK libraries to the Native Heap, including native libraries, Mapped Files such as your program Jar file(s), Threads such as the main start-up Thread of your program etc.
-        Your JVM will then store the “static” data of your HelloWorld.class Java program to the PermGen space (Class metadata, descriptors etc.)
-        Once your program is started, the JVM will then manage and dynamically allocate the memory of your Java program on the Java Heap (YoungGen & OldGen). This is why it is so important to understand how much memory your Java program needs, so that you can properly fine-tune the capacity of your Java Heap, controlled via the -Xms & -Xmx JVM parameters. Profiling and Heap Dump analysis allow you to determine your Java program memory footprint
-        Finally, the JVM also has to dynamically release the memory from the Java Heap Space that your program no longer needs; this is called the garbage collection process. This process can be easily monitored via the JVM verbose GC or a monitoring tool of your choice such as Java VisualVM.
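
The last two steps can also be watched from within the program itself. The following is a minimal sketch, assuming nothing beyond the standard JDK: the class name HeapSnapshot is hypothetical, Runtime.maxMemory() only roughly corresponds to the -Xmx setting, and System.gc() is merely a hint that the JVM is free to ignore.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the Java Heap capacity, usage and GC activity of the running JVM.
public class HeapSnapshot {

    private static final long MB = 1024L * 1024L;

    /** Heap currently in use, in bytes (committed heap minus the free space inside it). */
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Number of collections reported by all garbage collectors since JVM start-up. */
    public static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report a count
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.println("Max heap (approx. -Xmx): " + rt.maxMemory() / MB + " MB");
        System.out.println("Committed heap:          " + rt.totalMemory() / MB + " MB");
        System.out.println("Used heap:               " + usedHeapBytes() / MB + " MB");

        // Create short-lived objects so the garbage collector has something to reclaim.
        for (int i = 0; i < 1000; i++) {
            byte[] garbage = new byte[1024 * 1024];
            garbage[0] = 1;
        }
        System.gc(); // a hint only

        System.out.println("Used heap after GC hint: " + usedHeapBytes() / MB + " MB");
        System.out.println("Collections so far:      " + totalGcCount());
    }
}
```

Starting the program with different -Xms and -Xmx values and comparing the printed figures is a simple way to see the effect of these parameters.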

Sounds complex? The good news is that JVM maturity has improved significantly over the last 10 years, and the JVM now provides out-of-the-box tools that allow you to understand, monitor and fine-tune the Java Heap allocation of your Java program.

Related posts & case studies

I suggest that you review the articles below for more detail on this topic. You will also find from this Blog several case studies on OutOfMemoryError related problems and resolution strategies.

For any question or additional help please simply post a comment or question below this article.

2.08.2012

Too many open files – Case Study

This case study describes the complete root cause analysis and resolution of a File Descriptor (Too many open files) related problem that we faced following a migration from Oracle ALSB 2.6 running on Solaris OS to Oracle OSB 11g running on AIX.

This article will also provide you with proper AIX OS commands you can use to troubleshoot and validate the File Descriptor configuration of your Java VM process.

Environment specifications

-        Java EE server: Oracle Service Bus 11g
-        Middleware OS: IBM AIX 6.1
-        Java VM: IBM JRE 1.6.0 SR9 – 64 bit
-        Platform type: Service Bus – Middle Tier

Problem overview

-        Problem type: java.net.SocketException: Too many open files error was observed under heavy load, causing our Oracle OSB managed servers to suddenly hang

The problem was observed only under high load and required our support team to take corrective action, e.g. shut down and restart the affected Weblogic OSB managed servers.

Gathering and validation of facts

As usual, a Java EE problem investigation requires gathering technical and non-technical facts so we can derive other facts and/or conclude on the root cause. Before applying a corrective measure, the facts below were verified in order to conclude on the root cause:

·        What is the client impact? HIGH; Full JVM hang
·        Recent change of the affected platform? Yes, recent migration from ALSB 2.6 (Solaris OS) to Oracle OSB 11g (AIX OS)
·        Any recent traffic increase to the affected platform? No
·        What is the health of the Weblogic server? Affected managed servers were no longer responsive along with closure of the Weblogic HTTP (Server Socket) port
·        Did a restart of the Weblogic Integration server resolve the problem? Yes but temporarily only

-        Conclusion #1: The problem appears to be load related

Weblogic server log files review

A quick review of the affected managed servers log did reveal the error below:

java.net.SocketException: Too many open files

This error indicates that our Java VM process was running out of File Descriptors. This is a severe condition that affects the whole Java VM process and causes Weblogic to close its internal Server Socket port (HTTP/HTTPS port), preventing any further inbound & outbound communication with the affected managed server(s).

File Descriptor – Why so important for an Oracle OSB environment?

The File Descriptor capacity is quite important for your Java VM process. The key concept you must understand is that File Descriptors are not only required for pure File Handles but also for inbound and outbound Socket communication. Each new Java Socket created to (inbound) or from (outbound) your Java VM by the Weblogic kernel Socket Muxer requires a File Descriptor allocation at the OS level.

An Oracle OSB environment can require a significant number of Sockets, depending on how much inbound load it receives and how many outbound connections (Java Sockets) it has to create in order to send and receive data from external / downstream systems (System End Points).

For that reason, you must ensure that you allocate enough File Descriptors / Sockets to your Java VM process to support your daily load, including problematic scenarios such as a sudden slowdown of external systems, which typically increases the demand on the File Descriptor allocation.
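
The relationship between Sockets and File Descriptors can be checked with a few lines of Java. This is a rough sketch under stated assumptions: the class name SocketDescriptorDemo is hypothetical, and the descriptor count comes from com.sun.management.UnixOperatingSystemMXBean, which is available on HotSpot-based JVMs running on Unix-like systems. Other JVMs, such as the IBM JRE used in this case study, may not expose this bean; in that case the sketch simply reports -1.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: shows that each open Socket consumes a File Descriptor.
public class SocketDescriptorDemo {

    /** Open File Descriptor count of this JVM, or -1 when the platform bean is not available. */
    public static long openDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    /** Opens socketCount listening Sockets and returns how many File Descriptors that added. */
    public static long descriptorsUsedBySockets(int socketCount) {
        List<ServerSocket> sockets = new ArrayList<>();
        long before = openDescriptors();
        try {
            for (int i = 0; i < socketCount; i++) {
                sockets.add(new ServerSocket(0)); // port 0 lets the OS pick any free port
            }
            return openDescriptors() - before;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (ServerSocket socket : sockets) {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // best-effort cleanup
                }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("Open descriptors at start: " + openDescriptors());
        System.out.println("Added by 10 Sockets:       " + descriptorsUsedBySockets(10));
    }
}
```

The same reasoning applies to outbound connections: every Socket kept open towards a slow downstream system holds on to its File Descriptor until it is closed.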

Runtime File Descriptor capacity check for Java VM and AIX OS

Following the discovery of this error, our technical team performed a quick review of the runtime File Descriptor capacity and utilization of our OSB Java VM processes. This can be done easily via the AIX procfiles <Java PID> | grep rlimit and lsof -p <Java PID> | wc -l commands, as per the example below:

## Java VM process File Descriptor total capacity

>> procfiles 5425732 | grep rlimit
  Current rlimit: 2000 file descriptors

## Java VM process File Descriptor current utilization

>> lsof -p <Java PID> | wc -l
  1920

As you can see, the current capacity was found at 2000; which is quite low for a medium size Oracle OSB environment. The average utilization under heavy load was also found to be quite close to the upper limit of 2000.

The next step was to verify the default AIX OS File Descriptor limit via the ulimit -S -n command:

>> ulimit -S -n
  2000

-        Conclusion #2: The current File Descriptor limit for both the OS and the OSB Java VM is quite low, set at 2000. The File Descriptor utilization was also found to be quite close to that upper limit, which explains why so many JVM failures were observed at peak load

Weblogic File Descriptor configuration review

The File Descriptor limit can typically be overridden when you start your Weblogic Java VM. This configuration is managed by the WLS core layer, and the script can be found at the following location:

<WL_HOME>/wlserver_10.3/common/bin/commEnv.sh

..................................................
resetFd() {
  if [ ! -n "`uname -s |grep -i cygwin || uname -s |grep -i windows_nt || \
       uname -s |grep -i HP-UX`" ]
  then
    ofiles=`ulimit -S -n`
    maxfiles=`ulimit -H -n`
    if [ "$?" = "0" -a  `expr ${maxfiles} : '[0-9][0-9]*$'` -eq 0 -a `expr ${ofiles} : '[0-9][0-9]*$'` -eq 0 ]; then
      ulimit -n 4096
    else
      if [ "$?" = "0" -a `uname -s` = "SunOS" -a `expr ${maxfiles} : '[0-9][0-9]*$'` -eq 0 ]; then
        if [ ${ofiles} -lt 65536 ]; then
          ulimit -H -n 65536
        else
          ulimit -H -n 4096
        fi
      fi
    fi
  fi
.................................................

Root cause: File Descriptor override only working for Solaris OS!

As you can see in the script above, the override of the File Descriptor limit via ulimit is only applied for Solaris OS (SunOS). This explains why our current OSB Java VM running on AIX OS ended up with the default value of 2000, whereas our older ALSB 2.6 environment running on Solaris OS had a File Descriptor limit of 65536.


Solution: script tweaking for AIX OS

The problem was resolved by modifying the Weblogic commEnv script so that the File Descriptor limit is raised to 65536 (from 2000) on AIX OS as well.


** Please note that the activation of any change to the Weblogic File Descriptor configuration requires a restart of both the Node Manager (if used) along with the managed servers. **

A runtime validation was also performed following the activation of the new configuration which did confirm the new active File Descriptor limit:

>> procfiles 6416839 | grep rlimit
  Current rlimit: 65536 file descriptors

No failure has been observed since then.

Conclusion and recommendations

-        When upgrading your Weblogic Java EE container to a new version, please ensure that you verify your current File Descriptor limit as per the above case study
-        From a capacity planning perspective, please ensure that you monitor your File Descriptor utilization on a regular basis in order to identify any potential capacity problem, Socket leak etc.
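
As an illustration of the monitoring recommendation, the arithmetic behind a utilization check is very simple. The sketch below uses the figures from this case study; the class name DescriptorUtilization and the 80% warning threshold are assumptions made for the example, not values taken from Weblogic or AIX.

```java
// Hypothetical helper: computes File Descriptor utilization against the configured limit.
public class DescriptorUtilization {

    /** Percentage of the File Descriptor limit that is currently in use. */
    public static double utilizationPercent(long open, long limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        return 100.0 * open / limit;
    }

    /** True when utilization has reached the warning threshold (expressed in percent). */
    public static boolean needsAttention(long open, long limit, double thresholdPercent) {
        return utilizationPercent(open, limit) >= thresholdPercent;
    }

    public static void main(String[] args) {
        double threshold = 80.0; // assumed warning level, adjust to your own environment

        // Case study figures: 1920 descriptors in use, limit of 2000 before the fix.
        System.out.printf("Limit 2000:  %.1f%% used, warning=%b%n",
                utilizationPercent(1920, 2000), needsAttention(1920, 2000, threshold));

        // Same utilization measured against the new limit of 65536.
        System.out.printf("Limit 65536: %.1f%% used, warning=%b%n",
                utilizationPercent(1920, 65536), needsAttention(1920, 65536, threshold));
    }
}
```

In practice the two inputs would come from the commands shown earlier (procfiles for the limit, lsof for the utilization) or from your monitoring tool.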

Please don’t hesitate to post any comment or question on this subject if you need any additional help.

2.03.2012

PRSTAT Linux – How to pinpoint high CPU Java VM Threads

This article will provide you with an equivalent, for a JVM running on Linux OS, of the powerful Solaris OS prstat command, allowing you to quickly pinpoint the high CPU Java VM Thread contributors.

One key concept to understand for a Java VM running on the Linux OS is that Java threads are implemented as native Threads, each of which is scheduled by the Linux kernel as a light-weight process with its own ID.

Ok thanks for the info but why is this related to prstat?

Well this key concept means that you don’t need a prstat command for Linux. Since each Java VM Thread is implemented as a native Thread, each Thread CPU % can simply be extracted out-of-the-box using the top command.

You still need to generate Thread Dump data of your JVM process in order to correlate with the Linux top command output.

Thanks for this explanation. Now please show me how to do this

Please simply follow the instructions below:

1)     Execute the top command (press SHIFT-H to toggle the Threads view) or use the -H option (to show all Threads) and find the PID associated with your affected / high CPU WLS process(es) (remember, many entries may show up since each Java Thread is listed with its own ID)
2)     Immediately after, generate a few Thread Dump snapshots using kill -3 <PID> of the parent WLS process. A Thread Dump provides you with the complete list, with associated Stack Trace, of each Java Thread within your JVM process
3)     Now, convert the PID(s) extracted from the top command output to HEX format
4)     The next step is to search the Thread Dump data for a match on nid=<HEX PID>
5)     The final step is to analyze the Stack Trace of the affected Thread(s) so you can determine where in the code the problem is (application code, middleware itself, JDK etc.)
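
Steps 3 and 4 are easy to automate. The sketch below is one possible way to do it, not part of any JDK or Weblogic tooling: the class name ThreadDumpNidFinder is hypothetical. It converts the decimal ID reported by top into the nid=0x... token and returns the matching Thread header lines. The match is case-insensitive because the casing of the HEX digits differs between Thread Dump formats.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: correlates a native Thread ID from top with Thread Dump entries.
public class ThreadDumpNidFinder {

    /** Converts a decimal native Thread ID to the nid token (lowercase HEX). */
    public static String toNid(int nativeThreadId) {
        return "nid=0x" + Integer.toHexString(nativeThreadId);
    }

    /** Returns the Thread Dump lines that carry exactly the given native Thread ID. */
    public static List<String> findThreadHeaders(String threadDump, int nativeThreadId) {
        // The negative lookahead rejects longer IDs that merely start with the same digits.
        Pattern nidPattern = Pattern.compile(
                "\\b" + toNid(nativeThreadId) + "(?![0-9a-f])", Pattern.CASE_INSENSITIVE);
        List<String> matches = new ArrayList<>();
        for (String line : threadDump.split("\\r?\\n")) {
            if (nidPattern.matcher(line).find()) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        String dump =
                "\"ExecuteThread: '0' for queue: 'default'\" daemon prio=1 tid=0x83da550 "
                + "nid=0x565F waiting on monitor [0x56138000..0x56138870]\n"
                + "  at java.util.zip.ZipFile.getEntry(Native Method)\n";
        System.out.println("Searching for " + toNid(22111));
        for (String header : findThreadHeaders(dump, 22111)) {
            System.out.println(header);
        }
    }
}
```

Once the header line is found, the Stack Trace lines that follow it in the Thread Dump are the ones to analyze in step 5.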

Example: top command output captured for a Weblogic Server Java Thread running at 40% CPU utilization, along with Thread Dump data generated via kill -3 <PID>

## top output sample 
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND
...........
22111 userWLS 9 0 86616 84M 26780 S 0.0 40.1 0:00 java

  • Decimal to HEX conversion of Java Thread (native Thread) 22111 >> 0x565F
  • Now using the HEX value, we can search within the Thread Dump for the following keyword: nid=0x565F

## Thread Dump output sample Thread as per the above search criteria nid=0x565F
"ExecuteThread: '0' for queue: 'default'" daemon prio=1 tid=0x83da550 nid=0x565F waiting on monitor [0x56138000..0x56138870]
  at java.util.zip.ZipFile.getEntry(Native Method)
  at java.util.zip.ZipFile.getEntry(ZipFile.java:172)
  at java.util.jar.JarFile.getEntry(JarFile.java:269)
  at java.util.jar.JarFile.getJarEntry(JarFile.java:252)
  at sun.misc.URLClassPath$JarLoader.getResource(URLClassPath.java:989)
  at sun.misc.URLClassPath$JarLoader.findResource(URLClassPath.java:967)
  at sun.misc.URLClassPath.findResource(URLClassPath.java:262)
  at java.net.URLClassLoader$4.run(URLClassLoader.java:763)
  at java.security.AccessController.doPrivileged(AccessController.java:224)
  at java.net.URLClassLoader.findResource(URLClassLoader.java:760)
  at java.lang.ClassLoader.getResource(ClassLoader.java:444)
  at java.lang.ClassLoader.getResourceAsStream(ClassLoader.java:504)
  ............................................

In the above example, the problem was related to excessive class loading / IO.

As you can see, this approach allowed us to quickly pinpoint the high CPU Thread contributor, but you will need to spend additional time analyzing the root cause, which is now your job.

Need any additional help?

I hope this short tutorial has helped you understand how you can pinpoint high CPU Thread contributors for your JVM running on the Linux OS.

For any question or additional help please simply post a comment or question below this article.

1.22.2012

java.lang.NullPointerException: How to resolve

Exception in thread "main" java.lang.NullPointerException is one of the most common Java errors that you will face when developing Java or Java EE applications. This Java Exception has been around since the early JDK days e.g. JDK 1.0.

Most of you have probably seen and resolved this problem multiple times, so this article is mainly dedicated to individuals who are new to Java or interested in revisiting this Java Exception.

java.lang.NullPointerException: Overview

NullPointerException is a runtime Exception thrown by the JVM when your application code, other referenced API(s) or middleware (Weblogic, WAS, JBoss...) encounters the following conditions:

-        Attempting to invoke an instance method of a null object
-        Attempting to access or modify a particular field of a null object
-        Attempting to obtain the length of a null array reference
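
Each of these conditions can be reproduced in isolation. The short program below is an illustrative sketch (the class name NullPointerConditions and its Holder class are hypothetical) that triggers all three and counts how many of them resulted in a NullPointerException.

```java
// Hypothetical demo: reproduces the three NullPointerException conditions listed above.
public class NullPointerConditions {

    /** Minimal class with one field, used for the field access condition. */
    static class Holder {
        String value;
    }

    /** Runs the action and reports whether it threw a NullPointerException. */
    public static boolean throwsNpe(Runnable action) {
        try {
            action.run();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    /** Triggers each condition once and returns how many threw a NullPointerException. */
    public static int countNpeConditions() {
        final String text = null;
        final Holder holder = null;
        final int[] numbers = null;

        int count = 0;
        // 1) Invoking an instance method on a null reference
        if (throwsNpe(() -> text.length())) {
            count++;
        }
        // 2) Modifying a field through a null reference
        if (throwsNpe(() -> { holder.value = "x"; })) {
            count++;
        }
        // 3) Reading the length of a null array reference
        if (throwsNpe(() -> { int n = numbers.length; })) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("Conditions that threw NullPointerException: " + countNpeConditions());
    }
}
```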

java.lang.NullPointerException: Sample Java program

** A YouTube tutorial video is now available.

It is always best to learn with examples and sample Java programs. The program below is a very simple Java program generating a java.lang.NullPointerException. Please simply copy/paste and run the program with the IDE of your choice (Eclipse IDE was used for this example).

package org.ph.java.courses;

/**
 * NullPointerExceptionSampleProgram
 * @author Pierre-Hugues Charbonneau
 *
 */
public class NullPointerExceptionSampleProgram {
      
       private String field1 = null;
       private String field2 = null;   
      
       public String getField1() {
             return field1;
       }

       public void setField1(String field1) {
             this.field1 = field1;
       }

       public String getField2() {
             return field2;
       }

       public void setField2(String field2) {
             this.field2 = field2;
       }

       /**
        * @param args
        */
       public static void main(String[] args) {
            
             try {
                    // Create a fresh object instance
                    NullPointerExceptionSampleProgram objectInstance =
                                 new NullPointerExceptionSampleProgram();                   
                    // Initialize field1...
                    objectInstance.setField1("field1Value");
                   
                    // reset our object instance to null
                    objectInstance = null;
                   
                    // Now initialize field2...BOOM! >> NullPointerException!
                    objectInstance.setField2("field1Value2");
            
             } catch (Throwable any) {
                    System.out.println("Java ERROR: "+any);
                    any.printStackTrace();
             }
            
       }

}

If you run the program as is, you will see the output as per below:

java.lang.NullPointerException
       at org.ph.java.courses.NullPointerExceptionSampleProgram.main(NullPointerExceptionSampleProgram.java:47)

Java ERROR: java.lang.NullPointerException

As you can see in our example, the NullPointerException is thrown when attempting to execute the setField2() method against our objectInstance which is now null.

The JVM will typically show you the line of code (line 47 in this example) where the NullPointerException is triggered. This is critical data since it allows you to trace back the problem in your application Java code at this particular line of code of the affected Java class.

java.lang.NullPointerException: Resolution strategies

Now that you understand this problem, it is time to resolve it. The complexity of the resolution will depend on the context of your problem, since a NullPointerException can be a problem by itself or simply a symptom of another problem (no data returned from a Web Service call etc.). Regardless of the context & root cause, you must shield your Java code and add proper error handling and null check validations when applicable:

-        Review the java.lang.NullPointerException Stack Trace and determine where the Exception is triggered (your application code, third-party API, middleware software such as Weblogic etc.) and extract the line #
-        If the problem is in your application code then a code walkthrough will be required. If the problem is found in a third-party API and / or middleware, my recommendation is to first review your referencing code and determine if it could indirectly be the source of the problem e.g. passing a null value to a third-party API method etc.
-        If the problem is found within your application code, then attempt to determine which Object instance is null and causing the problem. You will need to modify your code to add proper null check validations, along with proper logging so you can also understand the source of the null value

Now back at our example, a simple validation and logging can be added as per below:

** Please note that logging should be done via standard logging framework such as Log4J **

// Now initialize field2, but only if objectInstance is not null
if (objectInstance != null) {
       objectInstance.setField2("field1Value2");
} else {
       System.out.println("objectInstance is null, do not attempt to initialize field2");
}

Conclusion and best practices

Best practices include:

-        Add proper null check validations before attempting to use an object Instance method e.g. if (objectInstance != null) { objectInstance.method(); }
-        When a null object is found, please add proper logging so you can pinpoint the root cause / source of the null value
-        Avoid too many object instance method calls on a single line as it will increase diagnostic complexity in the event of a NullPointerException e.g. avoid calls like this below unless properly checked for null prior to the call:
objectInstance.method(objectInstance2.getData(), objectInstance3.getData(),objectInstance4.getData());
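
One way to apply these practices together is sketched below. Everything in it is illustrative: the NullSafeCalls and DataSource names are hypothetical, and java.util.logging is used only to keep the example free of external dependencies (a framework such as Log4J would play the same role). Each getData() call sits on its own line and is guarded by a null check that logs which instance was null.

```java
import java.util.logging.Logger;

// Hypothetical example: one call per line, each guarded by a null check with logging.
public class NullSafeCalls {

    private static final Logger LOG = Logger.getLogger(NullSafeCalls.class.getName());

    /** Hypothetical data holder standing in for objectInstance2, objectInstance3 etc. */
    public static class DataSource {
        private final String data;

        public DataSource(String data) {
            this.data = data;
        }

        public String getData() {
            return data;
        }
    }

    /** Returns the data of the source, or null (after logging its name) when the source is null. */
    public static String dataOrNull(String name, DataSource source) {
        if (source == null) {
            LOG.warning(name + " is null, skipping getData()");
            return null;
        }
        return source.getData();
    }

    /** Combines the data of two sources; returns null when either value is missing. */
    public static String combine(DataSource first, DataSource second) {
        // One call per line: a Stack Trace line number now points at a single object.
        String firstData = dataOrNull("first", first);
        String secondData = dataOrNull("second", second);
        if (firstData == null || secondData == null) {
            return null;
        }
        return firstData + "," + secondData;
    }

    public static void main(String[] args) {
        System.out.println(combine(new DataSource("a"), new DataSource("b")));
        System.out.println(combine(null, new DataSource("b")));
    }
}
```

Returning null from combine() is a design choice made for brevity; depending on your application, throwing a descriptive exception or returning a default value may be more appropriate.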

I hope this article has helped you to understand what is null in Java.
Please feel free to add any comment or question if you are still struggling with a java.lang.NullPointerException problem.