Sunday, September 24, 2006

Reliable Singleton OC4J Instances on OracleAS

I was working on a customer question this week about how to ensure a highly available singleton OC4J instance in OracleAS. After mucking about and looking around, I remembered a feature designed to provide exactly this, called Service Failover. It is documented here:

http://download-west.oracle.com/docs/cd/B25221_04/core.1013/b15976/common.htm#sthref631

but I think a picture better illustrates what I was trying to do, and it makes the implementation a lot easier to follow. Figure 1 below shows the idea:


Figure 1: Active Singleton OC4J Instance

Here you can see the orange OC4J instance named j2ee_1, in an OracleAS instance called soa_j2ee, acting as the singleton, active node. It is part of a group of OC4Js called singleton_group, which also contains a stopped OC4J instance called j2ee_1 in an OracleAS instance called soasuite (for argument's sake, on another hardware node, assuming that the redundancy we are seeking is meant to deal with hardware failure).

When j2ee_1 in OracleAS instance soa_j2ee fails, the action I want is illustrated in Figure 2.


Figure 2: Singleton Instance Failed Over to New Singleton

When my singleton OC4J went down, the application server noticed it and immediately started up the backup OC4J in another part of the cluster, the OracleAS instance soasuite.

Ideally, depending on my redundancy requirements, I could repeat this scenario across many different nodes. The question is how? It turns out to be a very simple feature to implement.

The trick lies with the process manager (OPMN, Oracle Process Manager and Notification Server) that watches over an OracleAS cluster. The lines in its XML configuration file (opmn.xml) that start an OC4J instance typically look like this, with most of the extra detail deleted for simplicity:

<ias-component id="singleton_group" status="enabled">
  <process-type id="j2ee_1" status="enabled" >
    <module-data>
      <category id="start-parameters">
        <data id="java-options" value="-server -Djava.security.policy=$ORACLE_HOME/j2ee/j2ee1/config/java2.policy -Djava.awt.headless=true -Dhttp.webdir.enable=false"/>
        ...
    <process-set id="singleton_group" numprocs="1"/>
  </process-type>
</ias-component>


The crux of it is that OPMN provides the parameters for the JVM(s) that run the application server, port ranges (because there will be many OC4Js running in a cluster), and miscellaneous other settings.

To turn on a failover policy that says I want one and only one of these OC4J instances running in the cluster, I simply need to add two attributes to the process-type element:

  • service-failover="1" - to indicate I want only one of these OC4Js running in my cluster
  • service-weight="100" - an arbitrary relative weighting that tells OPMN which of the configured failover instances to prefer starting when another one fails. A larger number means OPMN will prefer starting that instance over one configured with a smaller number

For a single OC4J instance configured for service failover, the configuration looks like this:

<ias-component id="singleton_group" status="enabled">
  <process-type id="j2ee_1" status="enabled"
                service-failover="1" service-weight="200">
    <module-data>
      <category id="start-parameters">
        <data id="java-options" value="-server -Djava.security.policy=$ORACLE_HOME/j2ee/j2ee1/config/java2.policy -Djava.awt.headless=true -Dhttp.webdir.enable=false"/>
        ...
    <process-set id="singleton_group"/>
  </process-type>
</ias-component>

Also note that I removed numprocs="1" from the process-set element, because service failover does not support numprocs (i.e. multiple JVMs per OC4J instance).

The catch is that the OC4J instance name has to be the same on every OracleAS instance (in my case j2ee_1), and the group in which the OC4J instance resides also has to be identical (in my case singleton_group).
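To make the naming requirement concrete, here is a sketch of what the matching entry in the soasuite instance's opmn.xml might look like. It reuses the same ias-component id (singleton_group) and process-type id (j2ee_1). The service-weight of 100 is an assumption for illustration: any value smaller than the 200 used on soa_j2ee would express the same preference for soa_j2ee as the active node. The start parameters are elided exactly as in the fragments above.

```xml
<!-- opmn.xml on the soasuite OracleAS instance (sketch) -->
<!-- Group id and OC4J instance id must match the soa_j2ee entry exactly -->
<ias-component id="singleton_group" status="enabled">
  <!-- service-weight="100" is a hypothetical value, lower than soa_j2ee's 200,
       so OPMN should prefer soa_j2ee and keep this copy as the backup -->
  <process-type id="j2ee_1" status="enabled"
                service-failover="1" service-weight="100">
    ...
    <!-- no numprocs attribute: service failover does not support it -->
    <process-set id="singleton_group"/>
  </process-type>
</ias-component>
```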

If you were to look at my other OracleAS instance, you would see an identically configured OC4J instance with the same group and the same OC4J instance name. Setting up the topology of groups and OC4J instances is trivial in Application Server Control, where these are simple operations, as shown below:





What does it look like operationally? Well, now you know why I was playing with iHat earlier on in the week ...



What I did to test it was the following:

  1. Started the application server with the configuration outlined above. As in the picture above, I could see the single j2ee_1 happily running on the soa_j2ee OracleAS instance and correctly stopped on the soasuite OracleAS instance.
  2. Then I went to the directory $ORACLE_SOA_J2EE_HOME\j2ee\j2ee_1\config on the file system and renamed server.xml to dead_server.xml.
  3. Then I ran the application server command:

    opmnctl restartproc ias-component=singleton_group

    to bounce the server on soa_j2ee
  4. Of course, when OPMN brought down my j2ee_1 instance on soa_j2ee and then tried to restart it, the restart failed, because server.xml is the basic configuration file for the OC4J instance.
  5. Almost immediately after that failure, I saw the backup instance start up in iHat.

In real life, you could configure as many of these backup instances as you want in order to have the right amount of redundancy for your situation ... there is no limit. I was using this to solve a singleton problem, but you can also use it to create doubletons or tripletons by simply setting the service-failover value to the number of unique instances you want running in your topology.
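As a sketch of the doubleton case, assume a hypothetical third OracleAS instance alongside soa_j2ee and soasuite, each carrying the same singleton_group/j2ee_1 entry. Every copy would set service-failover="2"; only service-weight would differ, and the weights below (300, 200, 100) are made up for illustration. Going by the weighting rule described earlier, OPMN should prefer running the two highest-weighted copies and keep the third in reserve.

```xml
<!-- Same entry in opmn.xml on each of three OracleAS instances (sketch).  -->
<!-- Only service-weight differs between copies; values are hypothetical: -->
<!--   first instance: 300, second instance: 200, third instance: 100     -->
<ias-component id="singleton_group" status="enabled">
  <!-- service-failover="2": keep two j2ee_1 instances running cluster-wide -->
  <process-type id="j2ee_1" status="enabled"
                service-failover="2" service-weight="300">
    ...
    <process-set id="singleton_group"/>
  </process-type>
</ias-component>
```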

2 comments:

Unknown said...

Great article, Mike. Can this OC4J instance actually include an old Forms application, or just pure J2EE apps?

Thank you and best regards

Mike Lehmann said...

There is really nothing it cannot contain, so yes, a Forms app would be fine. One concern I have had feedback on is that some people have noticed false positives, where OPMN thinks an instance is unavailable when it simply can't reach it, so it starts a second instance even though it shouldn't. If communication is ever re-established, it will rebalance itself to a single instance.