Rejoining a node to a Red Hat Cluster and performing the failover
8:11 PM
Issue: The cluster nodes were out of sync after the cluster services (rgmanager, fenced, cman and ccsd) were restarted.
The node db02p was refused when it tried to join the cluster: the cluster was no longer in sync and would not let the host join.
Root Cause: From the log, it appears that node "db02p" had a different cluster ID stored in its configuration, so it could not join the cluster: when "db01p" came up, it formed a NEW cluster, which changed its cluster ID.
Initially, when the 'cman' service would not stop, the command 'cman_tool leave force' was run on "db01p". That started a 'house of cards' that left "db01p" seeing itself as a cluster with only 1 vote and 1 expected vote, so it no longer shared the cluster information with the other host or let "db02p" become part of the cluster.
Here is the relevant snippet from /var/log/messages:
cluster.conf (cluster name = mysql_cluster, version = 8) found.
kernel: CMAN: Waiting to join or form a Linux-cluster
ccsd[9736]: Initial status:: Inquorate
ccsd[9736]: Cluster is not quorate. Refusing connection.
kernel: CMAN: forming a new cluster
kernel: CMAN: quorum regained, resuming activity
sjccorvdb02p ccsd[8133]: Cluster is not quorate. Refusing connection.
sjccorvdb02p ccsd[8133]: Error while processing connect: Connection refused
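To pull the same CMAN and ccsd lines out of a busy /var/log/messages, a small filter like the one below can help. This is a sketch: the function name cluster_log is hypothetical, and the patterns only match the message formats that appear in the snippet above (kernel "CMAN:" lines, "ccsd[PID]:" lines and the cluster.conf line).

```shell
#!/bin/sh
# cluster_log [FILE]: print only CMAN kernel messages, ccsd daemon messages
# and cluster.conf lines from a syslog file (default: /var/log/messages).
# Hypothetical helper; the patterns match the message formats in the snippet above.
cluster_log() {
    grep -E 'CMAN:|ccsd\[[0-9]+]:|cluster\.conf' "${1:-/var/log/messages}"
}
```

For example, running cluster_log /var/log/messages | tail -n 50 on each node shows the most recent join and quorum messages side by side.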
The cluster status on sjccorvdb01p shows cluster ID 5556 with 1 node and 1 expected vote:
cat /proc/cluster/status
Protocol version: 5.0.1
Config version: 8
Cluster name: mysql_cluster
Cluster ID: 5556
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 4
Node name: sjccorvdb01p
Node ID: 1
Node addresses: 10.128.222.20
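Since the root cause was the two nodes disagreeing on the cluster ID, it can help to compare that field directly. The sketch below reads one "Field: value" line from a saved copy of /proc/cluster/status, in the format shown above. The helper names status_field and same_cluster_id, and the idea of copying the other node's status file over (for example with ssh), are assumptions for illustration.

```shell
#!/bin/sh
# status_field FIELD [FILE]: print the value of one "Field: value" line from a
# CMAN status dump (default: /proc/cluster/status). Hypothetical helper.
status_field() {
    awk -v key="$1" '
        {
            # Split on the first ": " only, so values containing colons survive.
            i = index($0, ": ")
            if (i > 0 && substr($0, 1, i - 1) == key) {
                print substr($0, i + 2)
                exit
            }
        }' "${2:-/proc/cluster/status}"
}

# same_cluster_id FILE_A FILE_B: succeed only if both status dumps report the
# same, non-empty Cluster ID. Hypothetical helper.
same_cluster_id() {
    a=$(status_field "Cluster ID" "$1")
    b=$(status_field "Cluster ID" "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, after ssh sjccorvdb02p cat /proc/cluster/status > /tmp/db02p.status, running same_cluster_id /proc/cluster/status /tmp/db02p.status on sjccorvdb01p returns non-zero when the two nodes report different cluster IDs.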
Solution: The cluster needs to be back at 2 nodes and 2 total votes. Here are the steps to rejoin db02p to the cluster:
- Start the cman service on sjccorvdb02p:
    - service cman start
- On host sjccorvdb02p, add sjccorvdb02p back to the cluster:
    - cman_tool join -w
- Check the cman status and node status on both nodes:
    - Check the cman status (verify the cluster ID): cman_tool status
    - Check the node list: cman_tool nodes (example output below)
1 M 980 2012-12-11 09:38:25 sjccorvdb01p
2 M 1028 2012-12-11 14:53:30 sjccorvdb02p
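The cman_tool nodes check can be scripted by counting rows whose status column (the second column) is M, meaning member. A minimal sketch, assuming the column layout in the example output above; the helper name members_ok and the default count of 2 (for this two-node cluster) are assumptions.

```shell
#!/bin/sh
# members_ok [WANT]: read "cman_tool nodes" output on stdin and succeed only if
# exactly WANT nodes (default 2) show status "M" (member) in column 2.
# Hypothetical helper; a header line is ignored because its second column is not "M".
members_ok() {
    awk -v want="${1:-2}" '
        $2 == "M" { n++ }
        END {
            if (n + 0 == want + 0) exit 0
            exit 1
        }'
}
```

For example, cman_tool nodes | members_ok 2 && echo "both nodes joined" can be run on each node after the join.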
Check the cluster status (cat /proc/cluster/status) on both sjccorvdb01p and sjccorvdb02p; Nodes and Total_votes should both be 2. Output from sjccorvdb01p:
Protocol version: 5.0.1
Config version: 8
Cluster name: mysql_cluster
Cluster ID: 5556
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 2
Expected_votes: 1
Total_votes: 2
Quorum: 1
Active subsystems: 4
Node name: sjccorvdb01p
Node ID: 1
Node addresses: 10.128.222.20
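The "Nodes and votes should be 2" check can also be scripted against /proc/cluster/status. A sketch, assuming the field names shown in the output above; the function name quorum_ok and its default expected count of 2 are hypothetical. It checks Nodes and Total_votes rather than Expected_votes, which stays at 1 in the output above.

```shell
#!/bin/sh
# quorum_ok [WANT] [FILE]: succeed only if the CMAN status file reports
# "Nodes: WANT", "Total_votes: WANT" and "Membership state: Cluster-Member".
# Defaults: WANT=2, FILE=/proc/cluster/status. Hypothetical helper.
quorum_ok() {
    awk -v want="${1:-2}" '
        /^Nodes: /            { nodes = $2 }
        /^Total_votes: /      { votes = $2 }
        /^Membership state: / { state = $3 }
        END {
            if (nodes + 0 == want + 0 && votes + 0 == want + 0 && state == "Cluster-Member")
                exit 0
            exit 1
        }' "${2:-/proc/cluster/status}"
}
```

Running quorum_ok on each node (for example, quorum_ok 2 || echo "node count or votes not at 2") gives a quick pass/fail before moving on to the failover test.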
- Fail over the services to sjccorvdb02p to test (replace service with the rgmanager service name):
    - clusvcadm -r service -m sjccorvdb02p
- Fail over the services back to sjccorvdb01p to test:
    - clusvcadm -r service -m sjccorvdb01p
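Both relocation tests can be wrapped in one small script that moves the service to each node in turn and shows the result with clustat. This is a sketch: relocate_test and run are hypothetical helpers, mysql_svc below is a placeholder for the real rgmanager service name, and DRY_RUN=1 is a variable introduced here (not a clusvcadm option) that only prints the commands instead of running them.

```shell
#!/bin/sh
# run CMD...: execute CMD, or only echo it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# relocate_test SERVICE NODE...: relocate SERVICE to each NODE in turn with
# "clusvcadm -r SERVICE -m NODE", then show the cluster state with clustat.
# Stops at the first failed relocation. Hypothetical helper.
relocate_test() {
    svc=$1
    shift
    for node in "$@"; do
        run clusvcadm -r "$svc" -m "$node" || return 1
        run clustat
    done
}
```

For example, DRY_RUN=1 relocate_test mysql_svc sjccorvdb02p sjccorvdb01p prints the four commands for review; dropping DRY_RUN=1 runs them.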