Wednesday, October 04, 2006

Thread Dump with OracleAS on Windows

A colleague, Troy Hedges, asked me today about getting a stack trace of OC4J on Windows for an issue he was having. My first question was is it OracleAS managed with OPMN or stand-alone?
'
Normally when running stand-alone OC4J on Windows-- because you normally just run it with:

set JAVA_HOME=
set PATH=/bin;%PATH%;
java -jar oc4j.jar

-- the quick and dirty way to get a thread dump is to hit CTRL-BREAK and voila, thread dump. This is a nice feature of Java for those intent in seeing what really is hung up in their application.

However, when you go to the normal Oracle Application Server deployment environment on Windows which uses the Oracle Process Manager OPMN to manage OracleAS, getting a thread dump is a little trickier because OPMN is a second process which is spawning the JDKs which run each OC4J instance.

And because there is no kill -3 on Windows likeUnix/Linux you are left trying to figure out how to get a thread dump for a process (OPMN) containing the process you want the thread dump from (an OC4J instance) . While OPMN gives you all sorts of great stuff - death detection, process management, clustering etc, this layer adds one more step to debugging that needs to be understood.

Troy was about 98% there when he asked me - just about when he discovered the sendsignal.exe utility popular for many Windows users - and with it was pretty much done. I add here a bit of color commentary so others can reproduce what he did.

Looking under the covers of OPMN -- which we have done in several other posts here -- you can see that the configuration for the JDK that runs each OC4J instance managed by OPPMNis done inside of the configuration file <ORACLE_HOME>\opmn\conf\opmn.xml.

For this example here is the relevant bits from my OPMN.xml:

<ias-component id="cluster_group" status="enabled"> <process-type id="j2ee1" module-id="OC4J" status="enabled">
<module-data>
<category id="start-parameters">
<data id="java-options" value="-server -Djava.security.policy=$ORACLE_HOME/j2ee/j2ee1/config/java2.policy -Djava.awt.headless=true -Dhttp.webdir.enable=false"/>

</category>

<category id="stop-parameters">

<data id="java-options" value="-Djava.security.policy=$ORACLE_HOME/j2ee/j2ee1/config/java2.policy -Djava.awt.headless=true -Dhttp.webdir.enable=false"/>

</category>
</module-data>
<start timeout="600" retry="2"/>

<stop timeout="120"/>

<restart timeout="720" retry="2"/>

<port id="default-web-site" range="12501-12600" protocol="ajp"/>

<port id="rmi" range="12401-12500"/>

<port id="rmis" range="12701-12800"/>

<port id="jms" range="12601-12700"/>

<process-set id="cluster_group" numprocs="1"/>

</process-type>


and when you run this you will see in your Windows task manager a process for the OC4J instance -- the javaw.exe highlighted below -- and typically two opmn.exe processes per Application Server instance:


If you end the process from the task manager you will get the pleasure of seeing OPMN in action - it will promptly re-start that OC4J instance. And if you look at the logs located at:

<ORACLE_HOME>\opmn\logs\ias_component_id_name~OC4J instance name~process_set_id_name~JVM_number>.log

in my case the ias-component id name was cluster_group, the OC4J instance name was j2ee1 and the process set id name was cluster group (from the opmn.xml above) and I had only one JVM running, that translates into:

D:\soasuite\opmn\logs\cluster_group~j2ee1~cluster_group~1.log

And wouldn't you know it - no stack trace. What's up? Well, it wasn't quite the right signal to OC4J for it to do the stack trace - ending a process in the task manager simply isn't a kill -3.

Let's make that stack trace happen! First, make sure that in your start-options for the OC4J instance you do not have -Xrs set as a parameter to the JVM:

<category id="start-parameters">
<data id="java-options" value="-Xrs ...

Just remove it and re-start the application server.

Next download a quick and dirty utility that makes the equivalent of kill -3 on Windows possible - check out:

http://www.latenighthacking.com/projects/2003/sendSignal/

and install the sendSignal.exe

Then, run it using the PID you see in the task manager above - in my case 10452:


And now you will be rewarded by going back to your OPMN log file (
D:\soasuite\opmn\logs\cluster_group~j2ee1~cluster_group~1.log) with a nice stack trace:


--------
06/10/04 11:54:33 Start process
--------
06/10/04 11:55:25 Oracle Containers for J2EE 10g (10.1.3.1.0) initialized

--------
06/10/04 12:21:02 Start process
--------
06/10/04 12:21:16 Oracle Containers for J2EE 10g (10.1.3.1.0) initialized
Full thread dump Java HotSpot(TM) Server VM (1.5.0_06-b05 mixed mode):

"HTTPThreadGroup-4" prio=6 tid=0x00862df0 nid=0x272c in Object.wait() [0x0bc1f000..0x0bc1fc98]
at java.lang.Object.wait(Native Method)
- waiting on <0x02af4e60> (a EDU.oswego.cs.dl.util.concurrent.LinkedNode)
at EDU.oswego.cs.dl.util.concurrent.SynchronousChannel.poll(SynchronousChannel.java:376)
- locked <0x02af4e60> (a EDU.oswego.cs.dl.util.concurrent.LinkedNode)
at EDU.oswego.cs.dl.util.concurrent.PooledExecutor.getTask(PooledExecutor.java:787)
at ....

the rest deleted for brevity. OPMN ever vigilant will start up another OC4J but you will have what you want, which is a stack trace. And there you go, stack traces on managed Oracle Application Server instances on Windows.

1 comment:

Anonymous said...

Thank you so much for this article! With your article, I was able to perform a successful thread dump.

By the way, in my environment, I did not see the javaw.exe. To workaround this issue, I executed:

opmnctl status -l

From this list, I used the PID number associated to the OC4J. Neat!