OSB 10.2.0.2 Implementation on AIX 5.2 with HACMP - SSL Trust Issues??
Hello All
I think I'm on a bit of a long shot with this one unfortunately, but I am trying to implement an OSB solution on a production HACMP cluster. The configuration would look as follows:
OSB Admin & Media Host : Windows 2003 x86 (Host: FPTXOSB01)
OSB Clients : Server 'pserver1' is node 1 in an HACMP cluster, public IP address 192.168.14.6
: Server 'pserver2' is node 2 in the same HACMP cluster, public IP address 192.168.14.10
: Server 'ptest1' is a stand-alone AIX 5.2 host
OSB Version : 10.2.0.2.0
I have implemented the solution on the stand-alone host 'ptest1' without any problems, and a full database RMAN backup on this test server worked at the first attempt. The problem I am running into is with adding the HACMP clients to the OSB admin domain.
HACMP is configured (rightly or wrongly, I do not know as yet) with boot, public and cluster service addresses. For example, server 'pserver1' returns 'pserver1' if you enter the 'hostname' command at the AIX command prompt. Entering the 'uname -a' command also returns 'pserver1' as the machine host name. However, in the folder '/usr/local/oracle/backup/bin' there is a link to a binary called 'hostinfo', which is called by the installob routine during the installation phase. When I run this command manually, it returns the HACMP host boot address 'pserver1_boot'. The /etc/hosts file looks like this on one of the nodes:
# Internet Address Hostname # Comments
# 192.9.200.1 net0sample # ethernet name/address
# 128.100.0.1 token0sample # token ring name/address
# 10.2.0.2 x25sample # x.25 name/address
127.0.0.1 loopback localhost
10.10.10.86 pserver1_boot1 pserver1
10.10.10.87 pserver2_boot1 pserver2
10.11.10.86 pserver1_boot2
10.11.10.87 pserver2_boot2
10.12.10.86 pserver1_hb
10.12.10.87 pserver2_hb
192.168.14.5 pserver_svc
192.168.14.6 pserver1_pers
192.168.14.10 pserver2_pers
As you can see, the main host name is tagged on the same line as the boot1 IP addresses. Unfortunately, the 10.10.10.xx range is private and dedicated to the HACMP cluster configuration. So the situation is that all of the clients on the network access the cluster via the 'pserver_svc' virtual IP, which is fine. The Oracle databases listen on that VIP too, no problems. For telnet/SSH access to the hosts, we log on via the '?_pers' addresses (persistent addresses), no problem. However, the two hosts themselves resolve their own host names to the '?_boot1' addresses. So, on 'pserver1', if I ping 'pserver1' it resolves to the 10.10.10.86 IP. That is fine locally, but the OSB admin server is going to come in on the 192.168.14 public network.
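To make the lookup behaviour concrete, here is a minimal sketch in Python that walks the active entries of the hosts file above and returns the first line that mentions a given name. It assumes the resolver consults /etc/hosts first and takes the first match, which is a simplification (the real lookup order on AIX depends on the name-resolution configuration). The function names are made up for illustration and are not part of OSB or HACMP.

```python
# Sketch only: first-match lookup against /etc/hosts-style text.
# Assumption: the hosts file is consulted first and the first matching
# line wins. Helper names here are hypothetical.

HOSTS_TEXT = """\
127.0.0.1 loopback localhost
10.10.10.86 pserver1_boot1 pserver1
10.10.10.87 pserver2_boot1 pserver2
10.11.10.86 pserver1_boot2
10.11.10.87 pserver2_boot2
10.12.10.86 pserver1_hb
10.12.10.87 pserver2_hb
192.168.14.5 pserver_svc
192.168.14.6 pserver1_pers
192.168.14.10 pserver2_pers
"""


def parse_hosts(text):
    """Return a list of (address, [names]) tuples, skipping comments and blanks."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        entries.append((fields[0], fields[1:]))
    return entries


def resolve(name, entries):
    """Return the address of the first line that lists `name`, or None."""
    for address, names in entries:
        if name in names:
            return address
    return None


def canonical_name(address, entries):
    """Return the first name on the first line for `address`, or None."""
    for addr, names in entries:
        if addr == address and names:
            return names[0]
    return None


entries = parse_hosts(HOSTS_TEXT)
for name in ("pserver1", "pserver1_pers", "pserver_svc"):
    addr = resolve(name, entries)
    print(f"{name} -> {addr} (first name on that line: {canonical_name(addr, entries)})")
```

Under these assumptions 'pserver1' maps to 10.10.10.86, whose first listed name is 'pserver1_boot1', while only 'pserver1_pers' maps to the routable 192.168.14.6 address.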
When adding the host using either the 'mkhost' command or the web tool, the host creation just sits there and eventually times out. If I change the '/etc/hosts' file so that 'pserver1' sits on a line of its own, configured with the correct persistent address of 192.168.14.6, and then try adding the host in OSB, the host adds okay. However, if I then try to ping the host using OSB, it returns the following:
ob> pingh pserver1
Error: can't connect to NDMP server on pserver1 (address 192.168.14.6) - timeout waiting for connection status message
pserver1 (address 192.168.14.6): Oracle Secure Backup services are available
Additionally, we have to switch the '/etc/hosts' configuration back because the HACMP cluster services expect that configuration and it will fail over if it performs a cluster host state check.
With this in mind, we've cabled up another unused NIC port on each of the two hosts and put these NICs on the network as 192.168.13.110 and 192.168.13.111. I have retried adding the hosts with the machine's actual host name, with the boot address (pserver1_boot1) and also with a new alias for the new NICs of 'pserver1_en1'. In most of these cases, adding the host actually comes back with a success status. However, the OSB ping consistently fails.
I believe that the mismatch in host names on each of the cluster hosts is causing the OSB trust relationships to break down, as the certificates will be created with the non-routable host/IP combination. The following is an extract of the 'observiced.log' from 'pserver2' following the host addition specifying the '192.168.13.xxx' network:
2009/01/07.14:33:53 listening for requests on --
2009/01/07.14:33:53 en0 (10.10.10.87) port 400
2009/01/07.14:33:53 en2 (10.11.10.87) port 400
2009/01/07.14:33:53 en1 (192.168.13.111) port 400
2009/01/07.14:34:01 listening for NDMP connections on --
2009/01/07.14:34:01 en0 (10.10.10.87) port 10000
2009/01/07.14:34:01 en2 (10.11.10.87) port 10000
2009/01/07.14:34:01 en1 (192.168.13.111) port 10000
2009/01/07.14:38:54 failure to negotiate SSL connection with component obtool on fd 6 - SSL fatal alert during negotation (FSP Oracle network security functions)
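To separate plain network reachability from the SSL failure above, a check along the following lines could be run from a machine on the admin server's network. This is a minimal sketch, assuming Python is available there; the function names are hypothetical, and a successful TCP connect only shows that the port is reachable, not that the OSB SSL negotiation will succeed.

```python
import socket


def port_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable networks.
        return False


def check_hosts(hosts, ports=(400, 10000), timeout=5.0):
    """Return {(host, port): reachable} for every host/port combination."""
    return {
        (host, port): port_open(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

For example, check_hosts(["192.168.14.6", "192.168.13.111"]) would report, per address, whether ports 400 and 10000 accept a connection at all.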
I am clearly looking for help from anyone else who has had the unfortunate experience of implementing OSB in an HACMP environment. People who work with HACMP tell me that this configuration is perfectly normal. To me, it's odd that a machine called one thing should return another value when it looks itself up, and one that isn't routable at that.
I would welcome any suggestions that might help: additional tracing, manually creating SSL certificates to work around the host name, disabling SSL, or anything else that might allow two-way communication on ports 400 and 10000 using the OSB tools.
Any help here would be much appreciated.
Regards
Simon