Step 2: Configuring Pacemaker and Corosync
So if you haven’t already, make sure pacemaker, corosync, resource-agents and openais are installed.
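A quick sketch of the install on a CentOS-style node (yum assumed; exact package names can vary by distribution and repository):

# Install the cluster stack and resource agents
yum install pacemaker corosync resource-agents openais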
Then we need to apply a small patch to the CTDB resource-agent script in /usr/lib/ocf/resource.d/heartbeat. The stock CTDB resource-agent script doesn’t manage NFS; this patch adds that capability, which I feel simplifies management tremendously and should help speed up recovery, since CTDB includes scripts to ensure that tickles are sent to clients when services are migrated or a failover occurs.
In addition, I commented out two functions that modify the smb.conf file when CTDB is started or stopped. Left in place, they would cause csync2 to mark the smb.conf file as dirty even though no real changes had occurred.
--- /usr/lib/ocf/resource.d/heartbeat/CTDB	2012-08-08 12:06:01.806465356 -0400
+++ /usr/lib/ocf/resource.d/heartbeat/CTDB	2012-08-09 21:35:47.368810599 -0400
@@ -81,6 +81,8 @@
 : ${OCF_RESKEY_ctdb_service_nmb:=""}
 : ${OCF_RESKEY_ctdb_service_winbind:=""}
 : ${OCF_RESKEY_ctdb_samba_skip_share_check:=yes}
+: ${OCF_RESKEY_ctdb_manages_nfs:=no}
+: ${OCF_RESKEY_ctdb_nfs_skip_share_check:=yes}
 : ${OCF_RESKEY_ctdb_monitor_free_memory:=100}
 : ${OCF_RESKEY_ctdb_start_as_disabled:=yes}
 
@@ -186,6 +188,24 @@
 <content type="boolean" default="yes" />
 </parameter>
 
+<parameter name="ctdb_manages_nfs" unique="0" required="0">
+<longdesc lang="en">
+Should CTDB manage starting/stopping the NFS service for you?
+</longdesc>
+<shortdesc lang="en">Should CTDB manage NFS?</shortdesc>
+<content type="boolean" default="no" />
+</parameter>
+
+<parameter name="ctdb_nfs_skip_share_check" unique="0" required="0">
+<longdesc lang="en">
+If there are very many shares it may not be feasible to check that all
+of them are available during each monitoring interval. In that case
+this check can be disabled.
+</longdesc>
+<shortdesc lang="en">Skip share check during monitor?</shortdesc>
+<content type="boolean" default="yes" />
+</parameter>
+
 <parameter name="ctdb_monitor_free_memory" unique="0" required="0">
 <longdesc lang="en">
 If the amount of free memory drops below this value the node will
@@ -371,6 +391,11 @@
 	else
 		chmod a-x $event_dir/50.samba
 	fi
+	if ocf_is_true "$OCF_RESKEY_ctdb_manages_nfs"; then
+		chmod u+x $event_dir/60.nfs
+	else
+		chmod a-x $event_dir/60.nfs
+	fi
 }
 
 # This function has no effect (currently no way to set CTDB_SET_*)
@@ -454,6 +479,8 @@
 CTDB_SAMBA_SKIP_SHARE_CHECK=$(ocf_is_true "$OCF_RESKEY_ctdb_samba_skip_share_check" && echo 'yes' || echo 'no')
 CTDB_MANAGES_SAMBA=$(ocf_is_true "$OCF_RESKEY_ctdb_manages_samba" && echo 'yes' || echo 'no')
 CTDB_MANAGES_WINBIND=$(ocf_is_true "$OCF_RESKEY_ctdb_manages_winbind" && echo 'yes' || echo 'no')
+CTDB_MANAGES_NFS=$(ocf_is_true "$OCF_RESKEY_ctdb_manages_nfs" && echo 'yes' || echo 'no')
+CTDB_NFS_SKIP_SHARE_CHECK=$(ocf_is_true "$OCF_RESKEY_ctdb_nfs_skip_share_check" && echo 'yes' || echo 'no')
 EOF
 	append_ctdb_sysconfig CTDB_SERVICE_SMB $OCF_RESKEY_ctdb_service_smb
 	append_ctdb_sysconfig CTDB_SERVICE_NMB $OCF_RESKEY_ctdb_service_nmb
@@ -490,11 +517,12 @@
 	done
 
 	# Add necessary configuration to smb.conf
-	init_smb_conf
-	if [ $? -ne 0 ]; then
-		ocf_log err "Failed to update $OCF_RESKEY_smb_conf."
-		return $OCF_ERR_GENERIC
-	fi
+	# This section is commented out because it messes with csync2 synchronizing smb.conf
+	#init_smb_conf
+	#if [ $? -ne 0 ]; then
+	#	ocf_log err "Failed to update $OCF_RESKEY_smb_conf."
+	#	return $OCF_ERR_GENERIC
+	#fi
 
 	# Generate new CTDB sysconfig
 	generate_ctdb_sysconfig
@@ -525,7 +553,8 @@
 		-d $OCF_RESKEY_ctdb_debuglevel
 	if [ $? -ne 0 ]; then
 		# cleanup smb.conf
-		cleanup_smb_conf
+		# This command is commented out because it messes up file synchronization with csync2
+		#cleanup_smb_conf
 
 		ocf_log err "Failed to execute $OCF_RESKEY_ctdbd_binary."
 		return $OCF_ERR_GENERIC
@@ -583,7 +612,8 @@
 	done
 
 	# Cleanup smb.conf
-	cleanup_smb_conf
+	# This command is disabled because it messes up file synchronization with csync2
+	#cleanup_smb_conf
 	# It was a clean shutdown, return success
 	[ $rv -eq $OCF_SUCCESS ]
 	return $OCF_SUCCESS
Apply that patch, then run corosync-keygen. This will generate the authentication key required to run your cluster securely; it may take some time to gather enough entropy. Once it’s done, copy the resulting file, /etc/corosync/authkey, to your cluster nodes.
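Roughly, the sequence looks like this; the patch file name (~/ctdb.patch) and the second node’s hostname (node2) are just placeholders for your own values:

# Apply the patch to the CTDB resource agent
cd /usr/lib/ocf/resource.d/heartbeat
patch < ~/ctdb.patch

# Generate the cluster authentication key (this can take a while to gather entropy)
corosync-keygen

# Copy the key to the other cluster node(s)
scp /etc/corosync/authkey root@node2:/etc/corosync/authkey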
You’ll also need to modify /etc/corosync/corosync.conf for your environment. Here is my file, for example:
totem {
    version: 2

    # How long before declaring a token lost (ms)
    token: 5000

    # How many token retransmits before forming a new configuration
    token_retransmits_before_loss_const: 20

    # How long to wait for join messages in the membership protocol (ms)
    join: 1000

    # How long to wait for consensus to be achieved before starting a new round of membership configuration (ms)
    consensus: 7500

    # Turn off the virtual synchrony filter
    vsftype: none

    # Number of messages that may be sent by one processor on receipt of the token
    max_messages: 20

    # Limit generated nodeids to 31-bits (positive signed integers)
    clear_node_high_bit: yes

    # Disable encryption
    #secauth: off

    # How many threads to use for encryption/decryption
    threads: 10

    # Optionally assign a fixed node id (integer)
    nodeid: 0001

    # This specifies the mode of redundant ring, which may be none, active, or passive.
    rrp_mode: none

    interface {
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 172.24.100.11
        mcastaddr: 226.94.1.1
        mcastport: 5410
    }
}

amf {
    mode: disabled
}

service {
    # Load the Pacemaker Cluster Resource Manager
    ver: 1
    name: pacemaker
}

aisexec {
    user: root
    group: root
}

logging {
    fileline: off
    to_stderr: yes
    to_logfile: no
    to_syslog: yes
    syslog_facility: local0
    debug: off
    timestamp: on
    logger_subsys {
        subsys: AMF
        debug: off
        tags: enter|leave|trace1|trace2|trace3|trace4|trace6
    }
}
Corosync uses multicast to communicate, so at the very least you’ll want to ensure that you select a unique multicast port per cluster.
Modify the configuration file on all of your hosts, ensuring that you set the optional nodeid value to something unique per node and that bindnetaddr is set to the IP address that resolves back to the hostname of the node.
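For example, on a hypothetical second node at 172.24.100.12, only the nodeid and bindnetaddr values would change:

    # Unique per node
    nodeid: 0002

    interface {
        ringnumber: 0
        # This node's own address (172.24.100.12 is just an example)
        bindnetaddr: 172.24.100.12
        mcastaddr: 226.94.1.1
        mcastport: 5410
    }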
Next, we need to enable an openais plugin for Corosync. It’s a requirement if you’re going to use OCFS2, as we are. Create the file /etc/corosync/service.d/ckpt with the following contents:
service {
    name: openais_ckpt
    ver: 0
}
Lastly, ensure that the runlevel scripts for Pacemaker and Corosync are disabled. It’s good practice to bring up the cluster stack manually in case a node gets fenced by the cluster stack or rebooted by the watchdog service.
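On a CentOS/RHEL-style system that would look something like the following (assuming the init scripts are named corosync and pacemaker):

# Keep the cluster stack from starting automatically at boot
chkconfig corosync off
chkconfig pacemaker off

# Bring the stack up by hand when the node is ready to rejoin the cluster
service corosync start
service pacemaker start

Now let’s start configuring resources.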
Excellent.
But I would like to see a Samba + CTDB-only setup from you.
Possible? 🙂
I could, but Samba already has a pretty good explanation of how to do it at ctdb.samba.org. Not to mention, there are many reasons why you would not want to run CTDB, Samba and a cluster filesystem without a full-blown cluster stack.
Hi,
When I try to apply the CTDB patch I get the following:
[root@cluster1 heartbeat]# cat ~/ctdb.patch | patch
patching file CTDB
Hunk #1 succeeded at 78 with fuzz 2 (offset -3 lines).
patch: **** malformed patch at line 34: @@ -371,6 +391,11 @@
Any suggestions?
I am using the latest resource agents from Git, as I am using GlusterFS instead of fighting with DRBD / OCFS2.
I am also running directly on Oracle Linux rather than CentOS with the kernel patched in.
Your guide has worked for the majority of it so far, with a few bouts of teeth-gnashing between parts 🙂
Cheers,
Kane.
Hey thanks for the comment and sorry for any troubles. I tried to test as much as possible lol.
Perhaps it’s the formatting of the patch? Try this db link. Let me know if it works/doesn’t work for you.
If you have time to elaborate, I’d love to hear about any other frustrations or problems you experienced.
Thanks
That worked, thanks.
Most of my problems were getting ocfs2_controld.pcmk to come up; it would install each time but Pacemaker could never start it. dlm_controld.pcmk was running but there was no /dlm for OCFS2 to attach onto.
Otherwise it was silly things, like the DRBD tools (8.13) and kernel module (8.11) being different in Oracle Linux, so when you yum update you then have to downgrade the tools or exclude them from the update.
I have to document the build I am doing for work, so I will drop you a copy of it. GlusterFS, once running, seems to have a lot less to go wrong, but of course only time and testing will tell.
Cheers
Kane.