Hi Everyone,
Following on from my earlier threads at
http://forum.java.sun.com/thread.jspa?threadID=5258276&tstart=0
and
http://forum.java.sun.com/thread.jspa?threadID=5260436&tstart=0
I have been able to do the following:
1) Installed SC 3.2 on the three nodes.
2) Installed SC 3.2 on the quorum server (which is not to be part of the cluster).
3) Configured the quorum server.
4) Configured the cluster on the first node.
5) Added the second node to the cluster.
6) Added a HAS-zfs shared disk to both nodes.
7) Verified that the shared disk resource fails over and fails back between the two nodes.
I have yet to bring the third node into the cluster; that is for later and should be trivial.
Now, the problem part:
For some reason I cannot get the quorum server device into an Online state.
The quorum server is connected to these nodes through the public network.
clq show gives the following output on node 1:
db03sge0# clq show

=== Cluster Nodes ===

Node Name:                 db03sge0
  Node ID:                 1
  Quorum Vote Count:       1
  Reservation Key:         0x47AEA43100000001

Node Name:                 db02sge0
  Node ID:                 2
  Quorum Vote Count:       1
  Reservation Key:         0x47AEA43100000002

=== Quorum Devices ===

Quorum Device Name:        cl01qs1
  Enabled:                 yes
  Votes:                   0
  Global Name:             cl01qs1
  Type:                    quorum_server
  Hosts (enabled):         db03sge0
  Quorum Server Host:      192.168.11.150
  Port:                    9000

db03sge0#
and the following is the output from clq status:
db03sge0# clq status

=== Cluster Quorum ===

--- Quorum Votes Summary ---

Needed   Present   Possible
------   -------   --------
2        2         2

--- Quorum Votes by Node ---

Node Name    Present   Possible   Status
---------    -------   --------   ------
db03sge0     1         1          Online
db02sge0     1         1          Online

--- Quorum Votes by Device ---

Device Name  Present   Possible   Status
-----------  -------   --------   ------
cl01qs1      0         0          Offline

db03sge0#
On the quorum server, the /etc/scqsd/scqsd.conf file has the following entry:
/usr/cluster/lib/sc/scqsd -i cl01qs1 -p 9000 -d /var/scqsd
And the clqs command on the quorum server shows:
storage-skge0# clqs show
=== Quorum Server on port 9000 ===
--- Cluster cl01 (id 0x479C227E) Reservation ---
--- Cluster cl01 (id 0x479C227E) Registrations ---
--- Cluster cl01 (id 0x47A2B1F7) Reservation ---
--- Cluster cl01 (id 0x47A2B1F7) Registrations ---
--- Cluster cl01 (id 0x47AEA431) Reservation ---
--- Cluster cl01 (id 0x47AEA431) Registrations ---
=== Quorum Server on port 9002 ===
Quorum server on port "9002" is not configured in any cluster.
storage-skge0#
Trying to bring the quorum server online with the following command makes no difference; the device stays Offline:
db03sge0# clq enable cl01qs1
db03sge0# clq show

=== Cluster Nodes ===

Node Name:                 db03sge0
  Node ID:                 1
  Quorum Vote Count:       1
  Reservation Key:         0x47AEA43100000001

Node Name:                 db02sge0
  Node ID:                 2
  Quorum Vote Count:       1
  Reservation Key:         0x47AEA43100000002

=== Quorum Devices ===

Quorum Device Name:        cl01qs1
  Enabled:                 yes
  Votes:                   0
  Global Name:             cl01qs1
  Type:                    quorum_server
  Hosts (enabled):         db03sge0
  Quorum Server Host:      192.168.11.150
  Port:                    9000

db03sge0# clq status

=== Cluster Quorum ===

--- Quorum Votes Summary ---

Needed   Present   Possible
------   -------   --------
2        2         2

--- Quorum Votes by Node ---

Node Name    Present   Possible   Status
---------    -------   --------   ------
db03sge0     1         1          Online
db02sge0     1         1          Online

--- Quorum Votes by Device ---

Device Name  Present   Possible   Status
-----------  -------   --------   ------
cl01qs1      0         0          Offline

db03sge0#
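For anyone who wants to re-check this state from a script rather than by eye, here is a minimal sketch. It assumes a POSIX-style shell and awk, and it assumes the clq status layout shown above (device rows listed after the "Quorum Votes by Device" header, with the status in the last column). The function name offline_devices is made up for this example and is not part of Sun Cluster.

```shell
#!/bin/sh
# offline_devices: hypothetical helper, not a Sun Cluster command.
# Reads `clq status` text on stdin and prints the name of every quorum
# device whose last column is "Offline". Node rows are ignored because
# they appear before the "Quorum Votes by Device" header.
offline_devices() {
    awk '/Quorum Votes by Device/ { in_dev = 1; next }
         in_dev && $NF == "Offline" { print $1 }'
}
```

On a cluster node it could be used as "clq status | offline_devices", which with the output above would be expected to print cl01qs1.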
I don't see any messages in /var/adm/messages.
I am at a total loss as to what my next step should be.
One more thing, and I don't know whether it is relevant: according to the documentation, the following network configuration should be done so that the cluster can use the external server as a quorum server. I have not done any of these steps in my setup, except for adding the quorum server's IP address to the hosts file.
##################################################
Make sure that all Sun Cluster nodes are online and can communicate with the Sun Cluster Quorum Server.
1. Ensure that network switches that are directly connected to cluster nodes meet one of the following criteria:
   * The switch supports Rapid Spanning Tree Protocol (RSTP).
   * Fast port mode is enabled on the switch.
   One of these features is required to ensure immediate communication between cluster nodes and the quorum server. If this communication is significantly delayed by the switch, the cluster interprets this prevention of communication as loss of the quorum device.
2. If the public network uses variable-length subnetting, also called Classless Inter-Domain Routing (CIDR), modify the following files on each node.
   If you use classful subnets, as defined in RFC 791, you do not need to perform these steps.
   1. Add to the /etc/inet/netmasks file an entry for each public subnet that the cluster uses.
      The following is an example entry which contains a public-network IP address and netmask:
      10.11.30.0 255.255.255.0
   2. Append netmask + broadcast + to the hostname entry in each /etc/hostname.adapter file.
      nodename netmask + broadcast +
3. On each node in the cluster, add the quorum server host name to the /etc/inet/hosts file or the /etc/inet/ipnodes file.
   Add a host name-to-address mapping to the file, such as the following.
   ipaddress qshost1
   ipaddress: The IP address of the computer where the quorum server is running.
   qshost1: The host name of the computer where the quorum server is running.
4. If you use a naming service, add the quorum server host's name-to-address mapping to the name-service database.
######################################################
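To make the file-related steps above easier to verify on each node, here is a rough sketch of a check, assuming a POSIX-compatible shell. The helper name has_entry is hypothetical (it is not a Sun Cluster or Solaris tool); it only tests whether a file has a line whose first field matches a given address, which is how both the hosts file and the netmasks file are keyed.

```shell
#!/bin/sh
# has_entry FILE KEY
# Hypothetical helper, not a Sun Cluster tool. Returns 0 if FILE contains a
# line whose first whitespace-separated field equals KEY, 1 if it does not,
# and 2 if FILE cannot be read. Comment lines do not match an address key
# because their first field starts with '#'.
has_entry() {
    [ -r "$1" ] || return 2
    # The "|| [ -n ... ]" part also handles a last line with no trailing newline.
    while read -r first rest || [ -n "$first" ]; do
        [ "$first" = "$2" ] && return 0
    done < "$1"
    return 1
}
```

For example, "has_entry /etc/inet/hosts 192.168.11.150" would confirm the quorum server entry from step 3, and "has_entry /etc/inet/netmasks 10.11.30.0" would look for the example subnet from the documentation (the real subnet would have to be substituted).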
So, any help would be much appreciated.
Thanks in advance,
tualha