rootupgrade.sh failed on 11g R1 to R2 Upgrade
I am playing with 11g ASM on Xen (one node on the host (dom0) and another node in a VM (dom1)) and was trying to do a rolling Clusterware + ASM upgrade from 11.1.0.6 to 11.2.0. After installing 11.2.0 Grid Infrastructure (Oracle seems to change its name for every release??), I was instructed to run the rootupgrade.sh script on both nodes. The script ran successfully on one node, and both nodeapps and ASM are now started from the 11.2.0 Clusterware home. However, on the second node it hung for around two minutes after reporting "Start of 'ora.evmd' on 'db11g-dom1' succeeded" and then timed out with a failure.
Here is the console output from running rootupgrade.sh:
{font:Courier}
...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root.sh script.
Now product-specific root actions will be performed.
2009-12-23 01:50:59: Parsing the host name
2009-12-23 01:50:59: Checking for super user privileges
2009-12-23 01:50:59: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0/grid/crs/install/crsconfig_params
-bash: /bin/env: No such file or directory
Cluster Synchronization Services appears healthy
Event Manager appears healthy
Cluster Ready Services appears healthy
Shutting down Oracle Cluster Ready Services (CRS):
Dec 23 01:51:21.661 | INF | daemon shutting down
Stopping resources.
This could take several minutes.
Successfully stopped Oracle Clusterware resources
Stopping Cluster Synchronization Services.
Shutting down the Cluster Synchronization Services daemon.
Shutdown request successfully issued.
Shutdown has begun. The daemons should exit soon.
LOCAL ADD MODE
Creating OCR keys for user 'root', privgrp 'root'..
Operation successful.
Adding daemon to inittab
INIT: Sending processes the TERM signal
CRS-4123: Oracle High Availability Services has been started.
CRS-2672: Attempting to start 'ora.mdnsd' on 'db11g-dom1'
CRS-2676: Start of 'ora.mdnsd' on 'db11g-dom1' succeeded
CRS-2672: Attempting to start 'ora.gipcd' on 'db11g-dom1'
CRS-2676: Start of 'ora.gipcd' on 'db11g-dom1' succeeded
CRS-2672: Attempting to start 'ora.gpnpd' on 'db11g-dom1'
CRS-2676: Start of 'ora.gpnpd' on 'db11g-dom1' succeeded
CRS-2672: Attempting to start 'ora.cssdmonitor' on 'db11g-dom1'
CRS-2676: Start of 'ora.cssdmonitor' on 'db11g-dom1' succeeded
CRS-2672: Attempting to start 'ora.cssd' on 'db11g-dom1'
CRS-2672: Attempting to start 'ora.diskmon' on 'db11g-dom1'
CRS-2676: Start of 'ora.diskmon' on 'db11g-dom1' succeeded
CRS-2676: Start of 'ora.cssd' on 'db11g-dom1' succeeded
CRS-2672: Attempting to start 'ora.ctssd' on 'db11g-dom1'
CRS-2676: Start of 'ora.ctssd' on 'db11g-dom1' succeeded
CRS-2672: Attempting to start 'ora.crsd' on 'db11g-dom1'
CRS-2676: Start of 'ora.crsd' on 'db11g-dom1' succeeded
CRS-2672: Attempting to start 'ora.evmd' on 'db11g-dom1'
CRS-2676: Start of 'ora.evmd' on 'db11g-dom1' succeeded
<Around 2 min>
Timed out waiting for the CRS stack to start.
{font}
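(Is something like this the right way to check what actually came up after the timeout? I am assuming the 11.2 crsctl syntax applies here, run from the new grid home.)
{font:Courier}
# overall stack health
/u01/app/11.2.0/grid/bin/crsctl check crs
# state of the lower-stack (init) resources such as ora.cssd and ora.crsd
/u01/app/11.2.0/grid/bin/crsctl stat res -t -init
{font}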
I attempted to read the logs to find the cause, without much success because they seemed to point fingers at each other. I believe (but cannot confirm) that the root cause shows up in crsd.log:
{font:Courier}
...
2009-12-23 01:53:54.209: [ CRSMAIN][646223616] Checking the OCR device
2009-12-23 01:53:54.342: [ CRSMAIN][646223616] Connecting to the CSS Daemon
2009-12-23 01:53:54.348: [ CRSMAIN][646223616] Initializing OCR
2009-12-23 01:53:54.349: [ OCRAPI][646223616]clsu_get_private_ip_addr: Calling clsu_get_private_ip_addresses to get first private ip
2009-12-23 01:53:54.349: [ OCRAPI][646223616]Check namebufs
2009-12-23 01:53:54.349: [ OCRAPI][646223616]Finished checking namebufs
2009-12-23 01:53:54.349: [ GIPC][646223616] gipcCheckInitialization: possible incompatible non-threaded init from [clsinet.c : 3229], original from [clsss.c : 5011]
2009-12-23 01:53:54.351: [ GPnP][646223616]clsgpnp_Init: [at clsgpnp0.c:404] gpnp tracelevel 3, component tracelevel 0
2009-12-23 01:53:54.351: [ GPnP][646223616]clsgpnp_Init: [at clsgpnp0.c:534] '/u01/app/11.2.0/grid' in effect as GPnP home base.
2009-12-23 01:53:54.355: [ GIPC][646223616] gipcCheckInitialization: possible incompatible non-threaded init from [clsgpnp0.c : 680], original from [clsss.c : 5011]
2009-12-23 01:53:54.355: [ GPnP][646223616]clsgpnp_InitCKProviders: [at clsgpnp0.c:3866] Init gpnp local security key providers (2) fatal if both fail
2009-12-23 01:53:54.355: [ GPnP][646223616]clsgpnp_InitCKProviders: [at clsgpnp0.c:3869] Init gpnp local security key proveders 1 of 2: file wallet (LSKP-FSW)
2009-12-23 01:53:54.356: [ GPnP][646223616]clsgpnpkwf_initwfloc: [at clsgpnpkwf.c:398] Using FS Wallet Location : /u01/app/11.2.0/grid/gpnp/penguin-dom1/wallets/peer/
2009-12-23 01:53:54.356: [ GPnP][646223616]clsgpnp_InitCKProviders: [at clsgpnp0.c:3891] Init gpnp local security key provider 1 of 2: file wallet (LSKP-FSW) OK
2009-12-23 01:53:54.356: [ GPnP][646223616]clsgpnp_InitCKProviders: [at clsgpnp0.c:3897] Init gpnp local security key proveders 2 of 2: OLR wallet (LSKP-CLSW-OLR)
[ CLWAL][646223616]clsw_Initialize: OLR initlevel [30000]
2009-12-23 01:53:54.359: [ GPnP][646223616]clsgpnp_InitCKProviders: [at clsgpnp0.c:3919] Init gpnp local security key provider 2 of 2: OLR wallet (LSKP-CLSW-OLR) OK
2009-12-23 01:53:54.359: [ GPnP][646223616]clsgpnp_getCK: [at clsgpnp0.c:1950] <Get gpnp security keys (wallet) for id:1,typ;7. (2 providers - fatal if all fail)
2009-12-23 01:53:54.359: [ GPnP][646223616]clsgpnpkwf_getWalletPath: [at clsgpnpkwf.c:498] req_id=1 ck_prov_id=1 wallet path: /u01/app/11.2.0/grid/gpnp/penguin-dom1/wallets/peer/
2009-12-23 01:53:54.366: [ GPnP][646223616]clsgpnpwu_walletfopen: [at clsgpnpwu.c:494] Opened SSO wallet: '/u01/app/11.2.0/grid/gpnp/penguin-dom1/wallets/peer/cwallet.sso'
2009-12-23 01:53:54.366: [ GPnP][646223616]clsgpnp_getCK: [at clsgpnp0.c:1965] Result: (0) CLSGPNP_OK. Get gpnp wallet - provider 1 of 2 (LSKP-FSW(1))
2009-12-23 01:53:54.366: [ GPnP][646223616]clsgpnp_getCK: [at clsgpnp0.c:1982] Got gpnp security keys (wallet).>
2009-12-23 01:53:54.368: [ GPnP][646223616]clsgpnp_getCK: [at clsgpnp0.c:1950] <Get gpnp security keys (wallet) for id:1,typ;4. (2 providers - fatal if all fail)
2009-12-23 01:53:54.368: [ GPnP][646223616]clsgpnpkwf_getWalletPath: [at clsgpnpkwf.c:498] req_id=1 ck_prov_id=1 wallet path: /u01/app/11.2.0/grid/gpnp/penguin-dom1/wallets/peer/
2009-12-23 01:53:54.374: [ GPnP][646223616]clsgpnpwu_walletfopen: [at clsgpnpwu.c:494] Opened SSO wallet: '/u01/app/11.2.0/grid/gpnp/penguin-dom1/wallets/peer/cwallet.sso'
2009-12-23 01:53:54.374: [ GPnP][646223616]clsgpnp_getCK: [at clsgpnp0.c:1965] Result: (0) CLSGPNP_OK. Get gpnp wallet - provider 1 of 2 (LSKP-FSW(1))
2009-12-23 01:53:54.374: [ GPnP][646223616]clsgpnp_getCK: [at clsgpnp0.c:1982] Got gpnp security keys (wallet).>
2009-12-23 01:53:54.374: [ GPnP][646223616]clsgpnp_Init: [at clsgpnp0.c:837] GPnP client pid=10996, tl=3, f=0
2009-12-23 01:53:54.384: [ OCRAPI][646223616]clsu_get_private_ip_addresses: no ip addresses found.
2009-12-23 01:53:54.384: [GIPCXCPT][646223616] gipcShutdownF: skipping shutdown, count 2, from [ clsinet.c : 1732], ret gipcretSuccess (0)
2009-12-23 01:53:54.389: [GIPCXCPT][646223616] gipcShutdownF: skipping shutdown, count 1, from [ clsgpnp0.c : 1021], ret gipcretSuccess (0)
[ OCRAPI][646223616]a_init_clsss: failed to call clsu_get_private_ip_addr (7)
2009-12-23 01:53:54.390: [ OCRAPI][646223616]a_init:13!: Clusterware init unsuccessful : [44]
2009-12-23 01:53:54.391: [ CRSOCR][646223616] OCR context init failure. Error: PROC-44: Error in network address and interface operations Network address and interface operations error [7]
2009-12-23 01:53:54.391: [ CRSD][646223616][PANIC] CRSD exiting: Could not init OCR, code: 44
2009-12-23 01:53:54.391: [ CRSD][646223616] Done.
{font}
(The text rendered with strikethrough above was actually enclosed in square brackets in the raw log... how can I avoid that effect?)
Any suggestions? And can I view or update the OCR to correct the private IP?
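If fixing the interconnect registration is the answer, is it something like this with oifcfg from the new grid home? (eth1 and 192.168.2.0 below are only placeholders for my private NIC and subnet, and I am not sure it can even run while crsd keeps exiting.)
{font:Courier}
# interfaces the node can actually see
/u01/app/11.2.0/grid/bin/oifcfg iflist
# interfaces currently registered for the cluster
/u01/app/11.2.0/grid/bin/oifcfg getif
# (re)register the private interconnect -- placeholder interface/subnet
/u01/app/11.2.0/grid/bin/oifcfg setif -global eth1/192.168.2.0:cluster_interconnect
{font}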
Note: after this failure, I can deconfigure the 11g R2 setup and rerun root.sh from the 11.1.0.6 Clusterware home to successfully start the Clusterware services and ASM from 11.1.0.6. However, the 11.2.0 rootupgrade.sh then fails the same way again.
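(By "deconfigure" I mean the 11.2 deconfiguration script, run as root on the failing node; as far as I understand the 11.2 tooling, that is:)
{font:Courier}
perl /u01/app/11.2.0/grid/crs/install/rootcrs.pl -deconfig -force
{font}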
OS: openSUSE 11.1, Xen dom0 and dom1
Oracle: 11.1.0.6 upgrading to 11.2.0