Rejoining a node to a Red Hat Cluster and performing a failover

8:11 PM

Issue: The cluster nodes went out of sync after the cluster services (rgmanager, fenced, cman and ccsd) were restarted.
The node db02p was refused when it tried to join the cluster; the cluster was no longer in sync and would not let the host join.
Root Cause: From the log, it appears that the node "db02p" had a different cluster ID stored in its config, so it could not join the cluster. When "db01p" came up it formed a NEW cluster, changing its cluster ID.
At the start, when the 'cman' service would not stop, the command 'cman_tool leave force' was run on "db01p". That set off a 'house of cards' collapse that left "db01p" seeing itself as a cluster with only 1 vote and 1 expected vote, when it should have shared the cluster info between hosts and made "db02p" part of the cluster.
Here is the relevant snippet from /var/log/messages:
cluster.conf (cluster name = mysql_cluster, version = 8) found.
kernel: CMAN: Waiting to join or form a Linux-cluster
ccsd[9736]: Initial status:: Inquorate
ccsd[9736]: Cluster is not quorate. Refusing connection.
kernel: CMAN: forming a new cluster
kernel: CMAN: quorum regained, resuming activity
sjccorvdb02p ccsd[8133]: Cluster is not quorate. Refusing connection.
sjccorvdb02p ccsd[8133]: Error while processing connect: Connection refused
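
When the log is long, it can help to pull out only the lines that tell this story. The following is a minimal sketch, not a complete log parser: the event names and the `classify_log_lines` helper are made up for illustration, and the patterns cover only the message wordings seen in the snippet above.

```python
import re

# Message fragments copied from the log snippet above.
# The event names on the left are illustrative labels, not cman terminology.
PATTERNS = {
    "inquorate": re.compile(r"Cluster is not quorate"),
    "new_cluster": re.compile(r"CMAN: forming a new cluster"),
    "quorum_regained": re.compile(r"CMAN: quorum regained"),
    "refused": re.compile(r"Connection refused"),
}


def classify_log_lines(lines):
    """Return (event_name, line) pairs for lines that match a known pattern."""
    events = []
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((name, line.strip()))
                break  # first matching pattern wins
    return events


sample = [
    "kernel: CMAN: Waiting to join or form a Linux-cluster",
    "ccsd[9736]: Cluster is not quorate. Refusing connection.",
    "kernel: CMAN: forming a new cluster",
    "kernel: CMAN: quorum regained, resuming activity",
    "sjccorvdb02p ccsd[8133]: Error while processing connect: Connection refused",
]
events = classify_log_lines(sample)
for name, line in events:
    print(f"{name:16} {line}")
```

A "new_cluster" event on one node followed by "refused" events on the other is the pattern described in the root cause above.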

The cluster status on db01p shows a cluster ID of 5556, with 1 node and 1 expected vote:
cat /proc/cluster/status
Protocol version: 5.0.1
Config version: 8
Cluster name: mysql_cluster
Cluster ID: 5556
Cluster Member: Yes
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Active subsystems: 4
Node name: sjccorvdb01p
Node ID: 1
Node addresses: 10.128.222.20
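
To check this state from a script instead of by eye, the "Key: value" output can be parsed into a dictionary. This is a rough sketch under the assumption that the output always has the shape shown above; `parse_cluster_status` and `formed_alone` are hypothetical helper names, and the sample text is a subset of the output above.

```python
def parse_cluster_status(text):
    """Parse 'Key: value' lines into a dict, skipping lines without a colon."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def formed_alone(status, configured_nodes=2):
    """True if the node reports fewer members than the cluster should have.

    configured_nodes=2 is an assumption that matches this two-node cluster.
    """
    return int(status["Nodes"]) < configured_nodes


# On a real host the text would come from /proc/cluster/status;
# here it is a subset of the output shown above.
sample = """\
Cluster name: mysql_cluster
Cluster ID: 5556
Membership state: Cluster-Member
Nodes: 1
Expected_votes: 1
Total_votes: 1
Quorum: 1
Node name: sjccorvdb01p
"""
status = parse_cluster_status(sample)
print(status["Node name"], "cluster ID", status["Cluster ID"],
      "formed alone:", formed_alone(status))
```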
Solution: In this case the cluster needs to be at 2 nodes and 2 total votes. Here are the steps to rejoin db02p to the cluster.
  • Start the cman process on sjccorvdb02p.
    • Start cman:  service cman start
  • On host sjccorvdb02p, add sjccorvdb02p back to the cluster.
    • cman_tool join -w
  • Check the cman status and node status on both nodes.
    • Check cman status:  cman_tool status  (check the cluster ID)
    • Check cluster nodes:  cman_tool nodes  (example below)
Node  Sts  Inc   Joined               Name
   1   M    980  2012-12-11 09:38:25  sjccorvdb01p
   2   M   1028  2012-12-11 14:53:30  sjccorvdb02p
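
The node listing can be checked the same way. The sketch below assumes the column layout of the example above (ID, status, incarnation, join date, join time, name); other cman versions may print different columns, and `parse_nodes` and `non_members` are hypothetical helpers.

```python
def parse_nodes(text):
    """Parse node rows in the layout shown above into a list of dicts."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # Expect exactly 6 whitespace-separated fields starting with a numeric ID;
        # this skips the header row and anything unrelated.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        node_id, state, inc, date, time, name = fields
        nodes.append({
            "id": int(node_id),
            "state": state,
            "inc": int(inc),
            "joined": f"{date} {time}",
            "name": name,
        })
    return nodes


def non_members(nodes):
    """Names of nodes whose status column is anything other than 'M'."""
    return [n["name"] for n in nodes if n["state"] != "M"]


sample = """\
Node Sts   Inc   Joined                                  Name
   1       M   980   2012-12-11 09:38:25          sjccorvdb01p
   2       M  1028   2012-12-11 14:53:30         sjccorvdb02p
"""
nodes = parse_nodes(sample)
print([n["name"] for n in nodes], "non-members:", non_members(nodes))
```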
Check the cluster status on both sjccorvdb01p and sjccorvdb02p; Nodes and Total_votes should both be 2.
           Protocol version: 5.0.1
           Config version: 8
           Cluster name: mysql_cluster
           Cluster ID: 5556
           Cluster Member: Yes
           Membership state: Cluster-Member
           Nodes: 2
           Expected_votes: 1
           Total_votes: 2
           Quorum: 1
           Active subsystems: 4
           Node name: sjccorvdb01p (test on both nodes)
           Node ID: 1
           Node addresses: 10.128.222.20
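
The comparison between the two nodes can also be written down as a small check. This is only a sketch: `check_rejoin` is a hypothetical helper, it assumes one vote per node, and the values for sjccorvdb02p are illustrative because only the sjccorvdb01p output is shown above.

```python
def check_rejoin(status_a, status_b, expected_nodes=2):
    """Compare status dicts from both nodes; return a list of problems found.

    Assumes one vote per node, so Total_votes should equal the node count.
    """
    problems = []
    if status_a["Cluster ID"] != status_b["Cluster ID"]:
        problems.append(
            f"cluster IDs differ: {status_a['Cluster ID']} vs {status_b['Cluster ID']}"
        )
    for status in (status_a, status_b):
        name = status["Node name"]
        if int(status["Nodes"]) != expected_nodes:
            problems.append(f"{name}: Nodes is {status['Nodes']}")
        if int(status["Total_votes"]) != expected_nodes:
            problems.append(f"{name}: Total_votes is {status['Total_votes']}")
    return problems


# Values for sjccorvdb01p are taken from the output above.
db01 = {"Node name": "sjccorvdb01p", "Cluster ID": "5556",
        "Nodes": "2", "Total_votes": "2"}
# Illustrative values for the second node (not shown in the output above).
db02 = {"Node name": "sjccorvdb02p", "Cluster ID": "5556",
        "Nodes": "2", "Total_votes": "2"}

problems = check_rejoin(db01, db02)
print("OK" if not problems else "\n".join(problems))
```

An empty list means both nodes agree on the cluster ID and both see 2 nodes and 2 votes.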
  • Fail over the services to sjccorvdb02p to test.
    • clusvcadm -r service -m sjccorvdb02p
  • Fail the services back to sjccorvdb01p to test.
    • clusvcadm -r service -m sjccorvdb01p
NOTE: A restart of both hosts may be required to ensure proper cluster sync.
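
Since the two fail-over tests differ only in the target node, it is easy to mistype one of them. A small sketch that builds both commands from one list of nodes, using clusvcadm's -r (relocate) and -m (member) options; `relocate_command` is a hypothetical helper and "service" is a placeholder for the real service name. The code only prints the commands and does not run them.

```python
import shlex


def relocate_command(service, target_node):
    """Build the argument list for relocating a service to a target member."""
    return ["clusvcadm", "-r", service, "-m", target_node]


NODES = ["sjccorvdb01p", "sjccorvdb02p"]
SERVICE = "service"  # placeholder; replace with the actual service name

# Test order: move to the second node first, then back to the first.
commands = [relocate_command(SERVICE, target) for target in reversed(NODES)]
for argv in commands:
    print(" ".join(shlex.quote(arg) for arg in argv))
```

The argument lists could be handed to `subprocess.run` on a cluster node, but that step is left out here on purpose.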
