SUN Cluster 3.2, Solaris 10, Corrupted IPMP group on one node.
Jul 20 2010 (edited Jul 21 2010)

Hello folks,
I recently made a network change on nodename2 to add some resilience to IPMP (adding a second interface but still using a single IP address).
After a reboot, I cannot keep this host from rebooting. During the roughly one minute it stays up, scstat gives the following output, which seems to point at a problem with the IPMP configuration. I have since rolled back my IPMP change, but scstat still fails to register the IPMP group.
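For reference, the change I attempted was along these lines, a link-based IPMP setup with the second interface as a standby and no additional IP address. Treat the interface names, hostname, and group name here as illustrative of what I had, not an exact copy:

```shell
# /etc/hostname.bge0 -- active interface carrying the node's data address
# (hostname and group name are illustrative)
nodename2 netmask + broadcast + group sc_ipmp0 up

# /etc/hostname.bge1 -- second interface added to the same group as a
# standby; it carries no data address of its own
group sc_ipmp0 standby up
```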
nodename2|/#scstat
------------------------------------------------------------------
-- Cluster Nodes --

                   Node name      Status
                   ---------      ------
  Cluster node:    nodename1      Online
  Cluster node:    nodename2      Online

------------------------------------------------------------------
-- Cluster Transport Paths --

                   Endpoint          Endpoint          Status
                   --------          --------          ------
  Transport path:  nodename1:bge3    nodename2:bge3    Path online

------------------------------------------------------------------
-- Quorum Summary from latest node reconfiguration --

  Quorum votes possible:  3
  Quorum votes needed:    2
  Quorum votes present:   3

-- Quorum Votes by Node (current status) --

                   Node Name      Present  Possible  Status
                   ---------      -------  --------  ------
  Node votes:      nodename1      1        1         Online
  Node votes:      nodename2      1        1         Online

-- Quorum Votes by Device (current status) --

                   Device Name           Present  Possible  Status
                   -----------           -------  --------  ------
  Device votes:    /dev/did/rdsk/d3s2    0        1         Offline

------------------------------------------------------------------
-- Device Group Servers --

                         Device Group    Primary      Secondary
                         ------------    -------      ---------
  Device group servers:  jms-ds          nodename1    nodename2

-- Device Group Status --

                         Device Group    Status
                         ------------    ------
  Device group status:   jms-ds          Online

-- Multi-owner Device Groups --

                         Device Group    Online Status
                         ------------    -------------

------------------------------------------------------------------
------------------------------------------------------------------
-- IPMP Groups --

               Node Name    Group    Status    Adapter    Status
               ---------    -----    ------    -------    ------
scstat: unexpected error.
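To narrow this down, scstat can also be asked for just the IPMP status; if that subset alone fails on nodename2, it would confirm the error is in the public network monitoring data rather than somewhere else in the cluster framework:

```shell
# Query only the IPMP group status; run on the affected node
nodename2|/# scstat -i
```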
I did manage to run scstat on nodename1 while nodename2 was still up between reboots; here is that result (note that it shows no IPMP groups for nodename2):
nodename1|/#scstat
------------------------------------------------------------------
-- Cluster Nodes --

                   Node name      Status
                   ---------      ------
  Cluster node:    nodename1      Online
  Cluster node:    nodename2      Online

------------------------------------------------------------------
-- Cluster Transport Paths --

                   Endpoint          Endpoint          Status
                   --------          --------          ------
  Transport path:  nodename1:bge3    nodename2:bge3    faulted

------------------------------------------------------------------
-- Quorum Summary from latest node reconfiguration --

  Quorum votes possible:  3
  Quorum votes needed:    2
  Quorum votes present:   3

-- Quorum Votes by Node (current status) --

                   Node Name      Present  Possible  Status
                   ---------      -------  --------  ------
  Node votes:      nodename1      1        1         Online
  Node votes:      nodename2      1        1         Online

-- Quorum Votes by Device (current status) --

                   Device Name           Present  Possible  Status
                   -----------           -------  --------  ------
  Device votes:    /dev/did/rdsk/d3s2    1        1         Online

------------------------------------------------------------------
-- Device Group Servers --

                         Device Group    Primary      Secondary
                         ------------    -------      ---------
  Device group servers:  jms-ds          nodename1    -

-- Device Group Status --

                         Device Group    Status
                         ------------    ------
  Device group status:   jms-ds          Degraded

-- Multi-owner Device Groups --

                         Device Group    Online Status
                         ------------    -------------

------------------------------------------------------------------
------------------------------------------------------------------
-- IPMP Groups --

               Node Name    Group       Status    Adapter    Status
               ---------    -----       ------    -------    ------
  IPMP Group:  nodename1    sc_ipmp1    Online    bge2       Online
  IPMP Group:  nodename1    sc_ipmp0    Online    bge0       Online

-- IPMP Groups in Zones --

               Zone Name    Group       Status    Adapter    Status
               ---------    -----       ------    -------    ------

------------------------------------------------------------------
I believe that I should be able to delete the IPMP group for the second node from the cluster and re-add it, but I'm not sure how to go about doing this. I welcome your comments or thoughts on what I can try before rebuilding this node from scratch.
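In case it helps anyone suggest a fix: my understanding is that the cluster derives its IPMP group list from the running Solaris network configuration, so rebuilding the group at the OS level might be enough. Something roughly like the following is what I had in mind (group and interface names illustrative, and I would appreciate confirmation before trying this on a cluster node):

```shell
# Drop both interfaces out of the suspect group, then re-add them.
# "ifconfig <if> group <name>" is the standard Solaris 10 IPMP syntax;
# an empty group name removes the interface from its group.
ifconfig bge0 group ""
ifconfig bge1 group ""
ifconfig bge0 group sc_ipmp0
ifconfig bge1 group sc_ipmp0

# Then make the change persistent in /etc/hostname.bge0 and
# /etc/hostname.bge1, and check whether scstat -i sees the group again.
```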
-AG