Cluster check failed. Fix errors before retrying

WGannon, Jul 27 2016 (edited Jul 27 2016)

I am running Oracle VM Server 3.2.9 on a two-server cluster attached to an EMC CLARiiON CX4-480 storage array.

When initially configured, both VM Servers could see and access the storage. However, whenever one of the machines is rebooted, it does not automatically reconnect to the SAN.

In /var/log/messages I see that the rebooted server does not reconnect to the cluster, and in ovs-agent.log (below) that the mounts fail and o2cb goes offline.

I can manually bring o2cb online (service o2cb online)

and then manually mount the file system that failed to mount (mount /dev/mapper/3600601601f702000145529ce90efe511 /poolfsmnt/0004fb0000050000b2a38653f0aa572c),

but I would rather find out what is causing this problem and correct it.

Any help would be greatly appreciated.

Excerpt from /var/log/messages:

Jul 26 09:58:30 ovsvr7 kernel: o2hb: Heartbeat mode set to global

Jul 26 09:58:42 ovsvr7 kernel: o2hb: Heartbeat started on region 0004FB0000050000D6265D875E0E615C (dm-0)

Jul 26 09:58:42 ovsvr7 o2hbmonitor: Starting

Jul 26 09:58:44 ovsvr7 kernel: o2hb: Region 0004FB0000050000D6265D875E0E615C (dm-0) is now a quorum device

Jul 26 09:58:48 ovsvr7 kernel:  rport-11:0-6: blocked FC remote port time out: removing rport

Jul 26 09:58:50 ovsvr7 ntpdate[7030]: no server suitable for synchronization found

Jul 26 09:58:50 ovsvr7 ntpd[7040]: ntpd 4.2.2p1@1.1570-o Tue Jan  6 22:43:37 UTC 2015 (1)

Jul 26 09:58:50 ovsvr7 ntpd[7041]: precision = 1.000 usec

Jul 26 09:58:50 ovsvr7 ntpd[7041]: Listening on interface wildcard, 0.0.0.0#123 Disabled

Jul 26 09:58:50 ovsvr7 ntpd[7041]: Listening on interface lo, 127.0.0.1#123 Enabled

Jul 26 09:58:50 ovsvr7 ntpd[7041]: Listening on interface 0a646f00, 10.100.111.7#123 Enabled

Jul 26 09:58:50 ovsvr7 ntpd[7041]: kernel time sync status 0040

Jul 26 09:58:50 ovsvr7 ntpd[7041]: getaddrinfo: "::1" invalid host address, ignored

Jul 26 09:58:50 ovsvr7 ntpd[7041]: frequency initialized 4.051 PPM from /var/lib/ntp/drift

Jul 26 09:58:50 ovsvr7 kernel: Event-channel device installed.

Jul 26 09:58:50 ovsvr7 xenstored: Checking store ...

Jul 26 09:58:50 ovsvr7 xenstored: Checking store complete.

Jul 26 09:58:50 ovsvr7 kernel:  rport-12:0-6: blocked FC remote port time out: removing rport

Jul 26 09:58:53 ovsvr7 o2cb.init: online 0c682ee6d166e6ed

Jul 26 09:58:53 ovsvr7 kernel: OCFS2 1.8.0

Jul 26 09:58:53 ovsvr7 kernel: o2cb: This node is not connected to nodes: 1.

Jul 26 09:58:53 ovsvr7 kernel: o2cb: Cluster check failed. Fix errors before retrying.

Jul 26 09:58:53 ovsvr7 kernel: (mount.ocfs2,7597,0):ocfs2_dlm_init:3004 ERROR: status = -22

Jul 26 09:58:53 ovsvr7 kernel: (mount.ocfs2,7597,0):ocfs2_mount_volume:1890 ERROR: status = -22

Jul 26 09:58:53 ovsvr7 kernel: ocfs2: Unmounting device (252,0) on (node 0)

Jul 26 09:58:53 ovsvr7 kernel: (mount.ocfs2,7597,0):ocfs2_fill_super:1238 ERROR: status = -22

Jul 26 09:58:53 ovsvr7 o2cb.init: offline 0c682ee6d166e6ed 0

Jul 26 09:58:54 ovsvr7 kernel: o2hb: Heartbeat stopped on region 0004FB0000050000D6265D875E0E615C (dm-0)

Jul 26 09:58:56 ovsvr7 smartd[7803]: smartd 5.42 2011-10-20 r3458 [x86_64-linux-2.6.39-400.215.9.el5uek] (local build)

Jul 26 09:58:56 ovsvr7 smartd[7803]: Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Jul 26 09:58:56 ovsvr7 smartd[7803]: Opened configuration file /etc/smartd.conf

Jul 26 09:58:56 ovsvr7 smartd[7803]: Configuration file /etc/smartd.conf parsed but has no entries (like /dev/hda)

Jul 26 09:58:56 ovsvr7 smartd[7803]: Monitoring 0 ATA and 0 SCSI devices

Jul 26 09:58:56 ovsvr7 smartd[7810]: smartd has fork()ed into background mode. New PID=7810.

Jul 26 10:04:14 ovsvr7 ntpd[7041]: synchronized to 10.100.113.23, stratum 4

Jul 26 10:09:41 ovsvr7 ntpd[7041]: synchronized to 10.100.112.110, stratum 3
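
To make the ordering in the excerpt above easier to follow, here is a minimal Python sketch that keeps only the o2cb, o2hb, OCFS2 and FC rport lines and translates the kernel's numeric status into its errno name. The keyword list and the function names are illustrative assumptions, not part of any Oracle tool, and the sample lines are copied from the excerpt.

```python
import errno
import os
import re

# Substrings picked for illustration only; extend or trim as needed.
KEYWORDS = ("o2cb", "o2hb", "ocfs2", "rport")

STATUS_RE = re.compile(r"status = -(\d+)")


def cluster_events(lines):
    """Yield (time, message) for syslog lines that mention a cluster keyword."""
    for line in lines:
        parts = line.split(None, 4)  # month, day, time, host, rest of line
        if len(parts) < 5:
            continue
        message = parts[4]
        if any(k in message.lower() for k in KEYWORDS):
            yield parts[2], message


def explain_status(message):
    """Return e.g. 'EINVAL (Invalid argument)' for 'status = -22', else None."""
    m = STATUS_RE.search(message)
    if not m:
        return None
    code = int(m.group(1))
    return "%s (%s)" % (errno.errorcode.get(code, "unknown"), os.strerror(code))


# A few lines copied from the /var/log/messages excerpt in this post.
SAMPLE = [
    "Jul 26 09:58:48 ovsvr7 kernel:  rport-11:0-6: blocked FC remote port time out: removing rport",
    "Jul 26 09:58:50 ovsvr7 ntpd[7041]: precision = 1.000 usec",
    "Jul 26 09:58:53 ovsvr7 kernel: o2cb: This node is not connected to nodes: 1.",
    "Jul 26 09:58:53 ovsvr7 kernel: o2cb: Cluster check failed. Fix errors before retrying.",
    "Jul 26 09:58:53 ovsvr7 kernel: (mount.ocfs2,7597,0):ocfs2_dlm_init:3004 ERROR: status = -22",
]

for when, msg in cluster_events(SAMPLE):
    note = explain_status(msg)
    print(when, msg, "<- " + note if note else "")
```

Status -22 is EINVAL, which matches the "Invalid argument" that mount.ocfs2 reports in ovs-agent.log. The filtered view also shows the two FC rport timeouts (09:58:48 and 09:58:50) being logged shortly before o2cb comes online at 09:58:53; whether the two are related is not something the log alone establishes.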

Excerpt from /var/log/ovs-agent.log:

[2016-07-26 09:58:53 7299] DEBUG (ocfs2:163) cluster debug: {'/sys/kernel/debug/o2dlm': [], '/sys/kernel/debug/o2net': ['connected_nodes', 'stats', 'sock_containers', 'send_tracking'], '/sys/kernel/debug/o2hb': ['0004FB0000050000D6265D875E0E615C', 'failed_regions', 'quorum_regions', 'live_regions', 'livenodes'], 'service o2cb status': 'Driver for "configfs": Loaded\nFilesystem "configfs": Mounted\nStack glue driver: Loaded\nStack plugin "o2cb": Loaded\nDriver for "ocfs2_dlmfs": Loaded\nFilesystem "ocfs2_dlmfs": Mounted\nChecking O2CB cluster "0c682ee6d166e6ed": Online\n  Heartbeat dead threshold: 61\n  Network idle timeout: 60000\n  Network keepalive delay: 2000\n  Network reconnect delay: 2000\n  Heartbeat mode: Global\nChecking O2CB heartbeat: Active\n  0004FB0000050000D6265D875E0E615C /dev/dm-0\nNodes in O2CB cluster: 0 1 \n'}

[2016-07-26 09:58:53 7299] DEBUG (ocfs2:271) Trying to mount /dev/mapper/3600601601f702000145529ce90efe511 to /poolfsmnt/0004fb0000050000d6265d875e0e615c

[2016-07-26 09:58:53 7299] DEBUG (ocfs2:451) init_pool_filesystem: mount physical pool FS failed - error Command: ['mount', '/dev/mapper/3600601601f702000145529ce90efe511', '/poolfsmnt/0004fb0000050000d6265d875e0e615c'] failed (1): stderr: mount.ocfs2: Invalid argument while mounting /dev/mapper/3600601601f702000145529ce90efe511 on /poolfsmnt/0004fb0000050000d6265d875e0e615c. Check 'dmesg' for more information on this error.^M

stdout:

[2016-07-26 09:58:53 7299] DEBUG (ocfs2:163) cluster debug: {'/sys/kernel/debug/o2dlm': [], '/sys/kernel/debug/o2net': ['connected_nodes', 'stats', 'sock_containers', 'send_tracking'], '/sys/kernel/debug/o2hb': ['0004FB0000050000D6265D875E0E615C', 'failed_regions', 'quorum_regions', 'live_regions', 'livenodes'], '/sys/kernel/debug/ocfs2': [], 'service o2cb status': 'Driver for "configfs": Loaded\nFilesystem "configfs": Mounted\nStack glue driver: Loaded\nStack plugin "o2cb": Loaded\nDriver for "ocfs2_dlmfs": Loaded\nFilesystem "ocfs2_dlmfs": Mounted\nChecking O2CB cluster "0c682ee6d166e6ed": Online\n  Heartbeat dead threshold: 61\n  Network idle timeout: 60000\n  Network keepalive delay: 2000\n  Network reconnect delay: 2000\n  Heartbeat mode: Global\nChecking O2CB heartbeat: Active\n  0004FB0000050000D6265D875E0E615C /dev/dm-0\nNodes in O2CB cluster: 0 1 \n'}

[2016-07-26 09:58:55 7299] DEBUG (ocfs2:163) cluster debug: {'/sys/kernel/debug/o2dlm': [], '/sys/kernel/debug/o2net': ['connected_nodes', 'stats', 'sock_containers', 'send_tracking'], '/sys/kernel/debug/o2hb': ['failed_regions', 'quorum_regions', 'live_regions', 'livenodes'], 'service o2cb status': 'Driver for "configfs": Loaded\nFilesystem "configfs": Mounted\nStack glue driver: Loaded\nStack plugin "o2cb": Loaded\nDriver for "ocfs2_dlmfs": Loaded\nFilesystem "ocfs2_dlmfs": Mounted\nChecking O2CB cluster "0c682ee6d166e6ed": Offline\n'}

[2016-07-26 09:58:55 7299] WARNING (startup:27) Error init pool filesystem: Command: ['mount', '/dev/mapper/3600601601f702000145529ce90efe511', '/poolfsmnt/0004fb0000050000d6265d875e0e615c'] failed (1): stderr: mount.ocfs2: Invalid argument while mounting /dev/mapper/3600601601f702000145529ce90efe511 on /poolfsmnt/0004fb0000050000d6265d875e0e615c. Check 'dmesg' for more information on this error.^M

stdout:

[2016-07-26 09:58:55 7760] INFO (notificationserver:213) NOTIFICATION SERVER STARTED

[2016-07-26 09:58:55 7760] DEBUG (notificationserver:237) Trying to connect to manager.

[2016-07-26 09:58:55 7760] ERROR (notificationserver:244) Error initializing notification server: 'Invalid URL Request (send) https://10.100.113.77:7002/ovm/core/OVMManagerCoreServlet&c=1&s=-1&lb=p&t=2&p=4c4c454400324d108042b1c04f474232%2C65c75cca86d849c7847ccab3a58b14b1'

[2016-07-26 09:58:56 7764] INFO (remaster:151) REMASTER SERVER STARTED

[2016-07-26 09:58:56 7766] INFO (monitor:23) MONITOR SERVER STARTED

[2016-07-26 09:58:56 7768] INFO (ha:89) HA SERVER STARTED

[2016-07-26 09:58:56 7770] INFO (stats:26) STAT SERVER STARTED

[2016-07-26 09:58:56 7772] INFO (xmlrpc:308) Oracle VM Agent XMLRPC Server started.

[2016-07-26 09:58:56 7772] INFO (xmlrpc:317) Oracle VM Server version: {'release': '3.2.9', 'date': '201504081746', 'build': '751'}, hostname: ovsvr7.sjrwmd.com, ip: 10.100.111.7

[2016-07-26 09:59:01 7766] DEBUG (monitor:36) Cluster state changed from [Unknown] to [Offline]

[2016-07-26 09:59:01 7766] ERROR (notification:44) Unable to send notification: (2, 'No such file or directory')

[2016-07-26 09:59:01 7766] DEBUG (monitor:40) Error sending notification: (2, 'No such file or directory')

[2016-07-26 09:59:16 7770] ERROR (notification:44) Unable to send notification: (2, 'No such file or directory')

[2016-07-26 09:59:17 7888] DEBUG (service:76) call start: get_api_version

[2016-07-26 09:59:17 7888] DEBUG (service:76) call complete: get_api_version

[2016-07-26 09:59:17 7889] DEBUG (service:76) call start: discover_server

[2016-07-26 09:59:18 7889] DEBUG (service:76) call complete: discover_server

[2016-07-26 09:59:18 7903] DEBUG (service:76) call start: discover_hardware

[2016-07-26 09:59:18 7903] DEBUG (service:76) call complete: discover_hardware

[2016-07-26 09:59:18 7923] DEBUG (service:76) call start: discover_network

[2016-07-26 09:59:18 7923] DEBUG (service:76) call complete: discover_network

[2016-07-26 09:59:19 7924] DEBUG (service:76) call start: discover_storage_plugins

[2016-07-26 09:59:19 7924] DEBUG (service:76) call complete: discover_storage_plugins

[2016-07-26 09:59:19 7927] DEBUG (service:74) call start: discover_physical_luns('',)

[2016-07-26 09:59:19 7927] DEBUG (service:76) call complete: discover_physical_luns

[2016-07-26 09:59:19 7948] DEBUG (service:74) call start: discover_physical_luns('3600601601f702000145529ce90efe511 3600601601f702000145529ce90efe511 3600601601f702000e8a24c7b822ee611 2000b080034002532 3600601601f702000e8a24c7b822ee611 3600601601f702000145529ce90efe511 2000b080034002532 3600601601f702000145529ce90efe511 3600601601f702000e8a24c7b822ee611 2000b080034002532 3600601601f702000e8a24c7b822ee611 2000b080034002532',)

[2016-07-26 09:59:20 7948] DEBUG (service:76) call complete: discover_physical_luns

[2016-07-26 09:59:20 7969] DEBUG (service:76) call start: discover_repository_db

[2016-07-26 09:59:20 7969] DEBUG (service:76) call complete: discover_repository_db

[2016-07-26 09:59:20 7970] DEBUG (service:74) call start: storage_plugin_listMountPoints('oracle.ocfs2.OCFS2.OCFS2Plugin', {'status': '', 'admin_user': '', 'admin_host': '', 'uuid': '0004fb000009000018f08c0bbd085dec', 'total_sz': 0, 'admin_passwd': '******', 'free_sz': 0, 'name': '0004fb000009000018f08c0bbd085dec', 'access_host': '', 'storage_type': 'FileSys', 'alloc_sz': 0, 'access_grps': [], 'used_sz': 0, 'storage_desc': ''})

[2016-07-26 09:59:20 7970] INFO (storageplugin:109) storage_plugin_listMountPoints(oracle.ocfs2.OCFS2.OCFS2Plugin)

[2016-07-26 09:59:20 7970] DEBUG (service:76) call complete: storage_plugin_listMountPoints

[2016-07-26 09:59:20 7974] DEBUG (service:76) call start: get_yum_config

[2016-07-26 09:59:20 7974] DEBUG (service:76) call complete: get_yum_config

[2016-07-26 09:59:20 7975] DEBUG (service:76) call start: discover_cluster

[2016-07-26 09:59:20 7975] DEBUG (service:76) call complete: discover_cluster

[2016-07-26 09:59:25 7760] DEBUG (notificationserver:237) Trying to connect to manager.

[2016-07-26 09:59:25 7760] DEBUG (notificationserver:239) Connected to manager.

[2016-07-26 09:59:25 7760] INFO (notificationserver:267) Service started.

[2016-07-26 17:54:08 7760] ERROR (notificationserver:124) Error sending stats notification: 'NoneType' object has no attribute 'strip'
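
The "cluster debug" entries in the log above are long, so here is a minimal Python sketch that extracts just the O2CB cluster state from each of them. It assumes the text after "cluster debug:" is a Python dict literal, as it appears to be in this log; the regular expressions and the function name are illustrative, and the sample lines are abbreviated copies of the entries above.

```python
import ast
import re

# Matches entries such as:
# [2016-07-26 09:58:53 7299] DEBUG (ocfs2:163) cluster debug: {...}
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+\] "
    r"DEBUG \(ocfs2:\d+\) cluster debug: (?P<body>\{.*\})\s*$"
)
CLUSTER_RE = re.compile(r'Checking O2CB cluster "([^"]+)": (\w+)')


def o2cb_states(lines):
    """Yield (timestamp, cluster_id, state) from 'cluster debug' entries."""
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # literal_eval only accepts literals, so it is safe on log text.
        debug = ast.literal_eval(m.group("body"))
        c = CLUSTER_RE.search(debug.get("service o2cb status", ""))
        if c:
            yield m.group("ts"), c.group(1), c.group(2)


# Abbreviated copies of two entries from this post (most keys removed).
SAMPLE = [
    r"""[2016-07-26 09:58:53 7299] DEBUG (ocfs2:163) cluster debug: {'/sys/kernel/debug/o2dlm': [], 'service o2cb status': 'Stack plugin "o2cb": Loaded\nChecking O2CB cluster "0c682ee6d166e6ed": Online\n  Heartbeat mode: Global\n'}""",
    r"""[2016-07-26 09:58:55 7299] DEBUG (ocfs2:163) cluster debug: {'/sys/kernel/debug/o2dlm': [], 'service o2cb status': 'Stack plugin "o2cb": Loaded\nChecking O2CB cluster "0c682ee6d166e6ed": Offline\n'}""",
]

for ts, cluster, state in o2cb_states(SAMPLE):
    print(ts, cluster, state)
```

Run against the full log, this should show the cluster reported as Online at 09:58:53, both before and after the failed mount attempt, and Offline at 09:58:55.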

Post Details
Locked on Aug 24 2016
Added on Jul 27 2016