When I start my OVM farm, the OVS servers boot, and as soon as they connect to the OVM Manager they shut themselves down. At first I thought it was an OCFS2 fencing issue, but the servers are shutting down cleanly, not being fenced and rebooted. If I shut down the OVM Manager, I can start the cluster fine and even get my repos mounted, start VMs, etc. As soon as I start the Oracle VM Manager, though, the systems shut down.
Has anyone seen this before? Everything was working fine until today, when I shut everything down through the manager. It looks like the ovm-manager is sending a sys_shutdown to each server for some reason. The strange thing is that it tries to execute a discover right afterwards. It is almost as if the manager still has the shutdown command queued up in the database from when I shut down the cluster earlier.
[root@zuul ~]# cat /etc/ovs-release
Oracle VM server release 3.2.3
[root@zuul ~]# uname -a
Linux zuul 2.6.39-300.29.1.el5uek #1 SMP Thu Feb 14 03:32:54 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@gozer ~]# uname -a
Linux gozer 2.6.39-300.29.1.el5uek #1 SMP Thu Feb 14 03:32:54 PST 2013 x86_64 x86_64 x86_64 GNU/Linux
[root@gozer ~]# rpm -q ovs-agent
ovs-agent-3.2.1-183.4
[root@gozer ~]# rpm -q xen
xen-4.1.3-25.el5.6.13
Here is the output from /var/log/ovs-agent.log on one of the OVS servers when I start the manager.
[2013-06-17 14:02:20 9748] ERROR (notification:44) Unable to send notification: (2, 'No such file or directory')
[2013-06-17 14:02:30 11436] DEBUG (common:43) dispatch function sys_shutdown to server https://oracle:******@10.80.0.82:8899/api/3
[2013-06-17 14:02:38 11439] ERROR (ha:34) Failed to get VM list on 10.80.0.82: (111, 'Connection refused')
[2013-06-17 14:02:39 9740] DEBUG (notificationserver:237) Trying to connect to manager.
[2013-06-17 14:02:39 9740] DEBUG (notificationserver:239) Connected to manager.
[2013-06-17 14:02:40 9740] INFO (notificationserver:267) Service started.
[2013-06-17 14:02:42 11471] DEBUG (common:43) dispatch function sys_shutdown to server https://oracle:******@10.80.0.81:8899/api/3
[2013-06-17 14:02:42 11472] DEBUG (service:76) call start: sys_shutdown
[2013-06-17 14:02:42 11472] DEBUG (service:76) call complete: sys_shutdown
[2013-06-17 14:02:42 11475] DEBUG (service:76) call start: get_api_version
[2013-06-17 14:02:42 11475] DEBUG (service:76) call complete: get_api_version
[2013-06-17 14:02:42 11476] DEBUG (service:76) call start: discover_server
[2013-06-17 14:02:42 11476] DEBUG (service:76) call complete: discover_server
[2013-06-17 14:02:43 11496] DEBUG (service:76) call start: discover_hardware
[2013-06-17 14:02:44 11496] DEBUG (service:76) call complete: discover_hardware
[2013-06-17 14:02:44 11607] DEBUG (service:76) call start: discover_network
[2013-06-17 14:02:44 11607] DEBUG (service:76) call complete: discover_network
[2013-06-17 14:02:46 11612] DEBUG (service:76) call start: discover_storage_plugins
[2013-06-17 14:02:46 11612] DEBUG (service:76) call complete: discover_storage_plugins
[2013-06-17 14:02:46 11615] DEBUG (service:74) call start: discover_physical_luns('',)
[2013-06-17 14:02:47 11474] DEBUG (service:76) call complete: sys_shutdown
[2013-06-17 14:02:47 11615] DEBUG (service:76) call complete: discover_physical_luns
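To make the order of agent calls easier to see, the log can be reduced to just the "call start" entries. This is a minimal sketch in Python, assuming the line layout seen in the excerpt above (`[timestamp pid] LEVEL (module:line) message`). The regex and the helper names `parse_agent_log` and `calls_started` are my own for illustration and are not part of ovs-agent.

```python
import re

# Assumed layout, inferred from the excerpt above:
# [YYYY-MM-DD HH:MM:SS pid] LEVEL (module:lineno) message
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<pid>\d+)\] "
    r"(?P<level>\w+) \((?P<module>\w+):(?P<lineno>\d+)\) (?P<msg>.*)$"
)

def parse_agent_log(lines):
    """Yield one dict per line that matches the assumed layout."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            yield m.groupdict()

def calls_started(records):
    """Return (timestamp, pid, call name) for every 'call start:' entry."""
    prefix = "call start: "
    out = []
    for r in records:
        if r["msg"].startswith(prefix):
            # Drop any argument list, e.g. discover_physical_luns('',)
            name = r["msg"][len(prefix):].split("(")[0]
            out.append((r["ts"], r["pid"], name))
    return out

# A few lines copied from the excerpt above.
SAMPLE = """\
[2013-06-17 14:02:42 11472] DEBUG (service:76) call start: sys_shutdown
[2013-06-17 14:02:42 11472] DEBUG (service:76) call complete: sys_shutdown
[2013-06-17 14:02:42 11475] DEBUG (service:76) call start: get_api_version
[2013-06-17 14:02:42 11476] DEBUG (service:76) call start: discover_server
[2013-06-17 14:02:46 11615] DEBUG (service:74) call start: discover_physical_luns('',)
""".splitlines()

if __name__ == "__main__":
    for ts, pid, name in calls_started(parse_agent_log(SAMPLE)):
        print(ts, pid, name)
```

On a server, the same functions could be fed `open("/var/log/ovs-agent.log")` instead of `SAMPLE` to see whether sys_shutdown always arrives before the discover calls.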
Here is the AdminServer log on the manager. As you can see, it sends the shutdown action to the ovs-agent, which kills the box.
####<Jun 17, 2013 2:02:41 PM EDT> <Info> <com.oracle.ovm.mgr.discover.DiscoverUtilities> <ovmmorrow> <AdminServer> <Scheduled Tasks-11> <<anonymous>> <> <0000JxJJ7fQFs1G5uzT4iX1Hjovl000002> <1371492161536> <BEA-000000> <Server: gozer, is RUNNING>
####<Jun 17, 2013 2:02:41 PM EDT> <Info> <com.oracle.ovm.mgr.task.AutoDiscoverTask> <ovmmorrow> <AdminServer> <Scheduled Tasks-11> <<anonymous>> <> <0000JxJJ7fQFs1G5uzT4iX1Hjovl000002> <1371492161592> <BEA-000000> <Re-discovering server: zuul, refresh type: ALL>
####<Jun 17, 2013 2:02:41 PM EDT> <Info> <com.oracle.ovm.mgr.task.AutoDiscoverTask> <ovmmorrow> <AdminServer> <Scheduled Tasks-11> <<anonymous>> <> <0000JxJJ7fQFs1G5uzT4iX1Hjovl000002> <1371492161593> <BEA-000000> <Discover server: zuul, refreshType: ALL>
####<Jun 17, 2013 2:02:42 PM EDT> <Info> <com.oracle.ovm.mgr.task.OvmTask> <ovmmorrow> <AdminServer> <Scheduled Tasks-11> <<anonymous>> <> <0000JxJJ7fQFs1G5uzT4iX1Hjovl000002> <1371492162026> <BEA-000000> <Created task child job: 1371492161594/AutoDiscoverTask: Discover server: zuul, refreshType: ALL/AutoDiscoverTask_1371492161593<22843>/t=1371492161594>
####<Jun 17, 2013 2:02:42 PM EDT> <Info> <com.oracle.ovm.mgr.api.physical.Server> <ovmmorrow> <AdminServer> <Odof Tcp Client Thread: /127.0.0.1:54321/94> <<anonymous>> <> <0000JxJJ7fQFs1G5uzT4iX1Hjovl000002> <1371492162303> <BEA-000000> <Stopping server: zuul>
####<Jun 17, 2013 2:02:42 PM EDT> <Info> <com.oracle.ovm.mgr.action.ServerAction> <ovmmorrow> <AdminServer> <Odof Tcp Client Thread: /127.0.0.1:54321/94> <<anonymous>> <> <0000JxJJ7fQFs1G5uzT4iX1Hjovl000002> <1371492162304> <BEA-000000> <Stopping: zuul, via OVM agent>
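One detail in the excerpt above is that the "Stopping server" entries are logged on a different thread (`Odof Tcp Client Thread: /127.0.0.1:54321/94`) than the auto-discover entries (`Scheduled Tasks-11`). That may mean the stop was requested separately from the discover task, though I can't confirm it from the log alone. The sketch below splits each AdminServer line into its angle-bracket fields so the thread can be compared per message. The field names in `FIELDS` and the helper names are my own labels, assumed from the layout of the lines shown here.

```python
# Labels for the 12 angle-bracket fields, in the order they appear in the
# excerpt above. These names are assumptions chosen for readability.
FIELDS = ("time", "level", "logger", "machine", "server", "thread",
          "user", "txn", "ecid", "millis", "msgid", "message")

def parse_adminserver_line(line):
    """Split one '####<...> <...>' line into a dict, or return None."""
    line = line.strip()
    if not (line.startswith("####<") and line.endswith(">")):
        return None
    body = line[len("####<"):-1]
    # Limit the split so '<' or '>' inside the message text is preserved.
    parts = body.split("> <", len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        return None
    return dict(zip(FIELDS, parts))

def stop_events(records):
    """Return (time, thread, message) for messages that begin with 'Stopping'."""
    return [(r["time"], r["thread"], r["message"])
            for r in records if r["message"].startswith("Stopping")]

# Two lines copied from the excerpt above.
SAMPLE = """\
####<Jun 17, 2013 2:02:41 PM EDT> <Info> <com.oracle.ovm.mgr.task.AutoDiscoverTask> <ovmmorrow> <AdminServer> <Scheduled Tasks-11> <<anonymous>> <> <0000JxJJ7fQFs1G5uzT4iX1Hjovl000002> <1371492161593> <BEA-000000> <Discover server: zuul, refreshType: ALL>
####<Jun 17, 2013 2:02:42 PM EDT> <Info> <com.oracle.ovm.mgr.api.physical.Server> <ovmmorrow> <AdminServer> <Odof Tcp Client Thread: /127.0.0.1:54321/94> <<anonymous>> <> <0000JxJJ7fQFs1G5uzT4iX1Hjovl000002> <1371492162303> <BEA-000000> <Stopping server: zuul>
""".splitlines()

if __name__ == "__main__":
    records = [r for r in map(parse_adminserver_line, SAMPLE) if r]
    for when, thread, message in stop_events(records):
        print(when, "|", thread, "|", message)
```

Running this over the full AdminServer log would list every stop request together with the thread that logged it, which might help narrow down where the shutdown originates.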