OVS servers rebooting due to fencing

SCI_Support, Jan 16 2014 (edited Jan 22 2014)

Hi,

We run OVS versions 3.1.1 and 3.1.2 on Dell servers, with EMC Clariion and HP 3PAR storage arrays providing the shared storage. We frequently see OVS servers reboot with messages like the one below. The logs on the SAN side and on the storage arrays show nothing unusual that would make the heartbeat disk unavailable for 60 seconds or so. Interestingly, not every OVS server in a cluster reboots; some do and some do not. It happens randomly, regardless of whether the storage is 3PAR or EMC Clariion. The OVM server logs always report that all paths have failed for all shared storage LUNs, including the heartbeat disks.

Jan 3 00:03:26 ovslx505 o2hbmonitor: Last ping 46838 msecs ago on /dev/dm-0, 0004FB0000050000E018378E1B9D075D
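For anyone who wants to see how close each server gets to the threshold before it fences, here is a minimal Python sketch that pulls the ping age, device and ID out of a line in the format shown above. The regular expression is inferred from this single sample line only, and the names `PING_RE`, `parse_last_ping` and the `region` key (the hex ID at the end of the line) are hypothetical, so treat it as a starting point rather than a complete parser.

```python
import re

# Pattern inferred from the single o2hbmonitor sample line in this post;
# other message variants may exist and would not match.
PING_RE = re.compile(
    r"o2hbmonitor: Last ping (?P<msecs>\d+) msecs ago on "
    r"(?P<device>[^,\s]+), (?P<region>[0-9A-Fa-f]+)"
)


def parse_last_ping(line):
    """Return ping age and device details, or None if the line does not match."""
    match = PING_RE.search(line)
    if match is None:
        return None
    return {
        "msecs": int(match.group("msecs")),
        "device": match.group("device"),
        # "region" is an assumed label for the hex ID at the end of the line.
        "region": match.group("region"),
    }


sample = (
    "Jan 3 00:03:26 ovslx505 o2hbmonitor: Last ping 46838 msecs ago "
    "on /dev/dm-0, 0004FB0000050000E018378E1B9D075D"
)
info = parse_last_ping(sample)
print(info["device"], info["msecs"] / 1000.0, "seconds since last ping")
```

Run over the full messages file for each server, this shows how long the heartbeat writes had stalled on each node at the time of the warning.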

Currently, the heartbeat dead threshold is set to 60 seconds (31). We have opened many cases with Oracle, but they have gone nowhere. The most recent recommendation is to raise the heartbeat dead threshold to 120 seconds (61).
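The numbers in parentheses are the raw threshold values. As a rough sketch, assuming the usual OCFS2 O2CB relationship of one heartbeat iteration every 2 seconds (timeout in seconds = (threshold - 1) * 2), the two settings above work out as follows. The function names are hypothetical and only there for illustration.

```python
HEARTBEAT_INTERVAL_SECONDS = 2  # assumed O2CB disk heartbeat interval


def threshold_to_seconds(threshold):
    """Convert a raw heartbeat dead threshold value to a timeout in seconds."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_SECONDS


def seconds_to_threshold(seconds):
    """Convert a desired timeout in seconds to the raw threshold value."""
    return seconds // HEARTBEAT_INTERVAL_SECONDS + 1


for value in (31, 61):
    print(value, "->", threshold_to_seconds(value), "seconds")
```

Under that assumption, 31 corresponds to 60 seconds and 61 to 120 seconds, which matches the figures quoted above.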

What is strange is that all paths would go down at once through both HBAs. That does not make sense, because the two HBAs connect through separate fabrics and nothing on the SAN switches (Brocade) points to a problem. For some reason, some OVS servers detect the heartbeat disk as unavailable over all 4 paths and fence themselves, which reboots them. Again, only a few OVS servers in a cluster reboot, not all of them, and it happens randomly.

Has anyone seen this kind of issue? We are trying to determine what could be causing these reboots. On the storage side, things look OK.

Thanks,

Sameer
