Finding the HACMP configuration instance

It’s possible for the HACMP configuration on two different nodes to get out of sync, or you may want to push a config from one node to another. We once had an admin make changes on a down node and then try to sync the cluster; to clean it up, we had to figure out which node had the latest config. To see which configuration instance number each of your HACMP nodes is using, you can run:

lssrc -l -s topsvcs | grep Instance

or:

odmget HACMPtopsvcs
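To compare nodes quickly, you can run the same check against every node from one place. A minimal sketch, assuming ssh access between nodes (the node names nodeA and nodeB are placeholders, substitute your own):

```shell
#!/usr/bin/ksh
# Print the topsvcs configuration instance number reported by each node.
# Node names below are examples only -- replace with your cluster nodes.
for node in nodeA nodeB; do
    printf "%s: " "$node"
    ssh "$node" "lssrc -l -s topsvcs | grep Instance"
done
```

If the instance numbers differ, the node with the highest number has the most recently synchronized configuration.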

secldapclntd Failure w/ HACMP

After upgrading to AIX 5.3 TL9 SP4, we found that secldapclntd will go into a death loop during a HACMP failover. It consumes more and more CPU until the system has no capacity left, which stalls the HACMP failover. Killing secldapclntd lets HACMP continue.

We didn’t see this behavior w/ AIX 5.3 TL8 SP3. IBM has identified a couple of issues that are probably combining to cause our problem, but they won’t be fixed in TL9… ever. IBM’s work-around is to set up pre- and post-event scripts to stop secldapclntd before the IP takeover (and release) and restart it afterward. In testing this works pretty well, and it only takes a few seconds to stop and start secldapclntd.

Here’s the workaround by IBM:

  1. Create a script “/usr/local/cluster/start-ldap.sh” and copy it to every node in the cluster
  2. #!/usr/bin/ksh
    echo "START LDAP CLIENT Daemon"
    /usr/sbin/start-secldapclntd
    exit 0

  3. Create a script “/usr/local/cluster/stop-ldap.sh” and copy it to every node in the cluster
  4. #!/usr/bin/ksh
    echo "STOP LDAP CLIENT Daemon"
    /usr/sbin/stop-secldapclntd
    exit 0

  5. Create a pre-event
  6. smitty hacmp
    Extended Configuration
    Extended Event Configuration
    Configure Pre/Post-Event Commands
    Add a Custom Cluster Event

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
    [Entry Fields]

    * Cluster Event Name [] << I use the name "ldapClientStart"
    * Cluster Event Description [] << Start ldap client daemon
    * Cluster Event Script Filename [] << /usr/local/cluster/start-ldap.sh

  7. Create a post-event
  8. smitty hacmp
    Extended Configuration
    Extended Event Configuration
    Configure Pre/Post-Event Commands
    Add a Custom Cluster Event

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
    [Entry Fields]

    * Cluster Event Name [] << I use the name "ldapClientStop"
    * Cluster Event Description [] << Stop ldap client daemon
    * Cluster Event Script Filename [] << /usr/local/cluster/stop-ldap.sh

  9. Update the acquire_service_addr event
  10. smitty hacmp
    Extended Configuration
    Extended Event Configuration
    Change/Show Pre-Defined HACMP Events
    (select -> acquire_service_addr)
    Change/Show Cluster Events

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
    [Entry Fields]

    Event Name acquire_service_addr
    Pre-event Command [ldapClientStop] <<< the defined name "ldapClientStop"
    Post-event Command [ldapClientStart] <<< the defined name "ldapClientStart"

  11. Update the release_service_addr event
  12. smitty hacmp
    Extended Configuration
    Extended Event Configuration
    Change/Show Pre-Defined HACMP Events
    (select -> release_service_addr)
    Change/Show Cluster Events

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
    [Entry Fields]

    Event Name release_service_addr
    Pre-event Command [ldapClientStop] <<< the defined name "ldapClientStop"
    Post-event Command [ldapClientStart] <<< the defined name "ldapClientStart"

  13. Synchronize the cluster configuration, and you’re done
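Before synchronizing, it’s worth exercising the scripts by hand on each node so a typo doesn’t surface mid-failover. A minimal smoke-test sketch, using the script paths from the steps above:

```shell
#!/usr/bin/ksh
# Confirm the event scripts exist, are executable, and run cleanly
# on this node before syncing the cluster configuration.
for s in /usr/local/cluster/start-ldap.sh /usr/local/cluster/stop-ldap.sh; do
    if [ ! -x "$s" ]; then
        echo "missing or not executable: $s"
        exit 1
    fi
done
/usr/local/cluster/stop-ldap.sh  && echo "stop OK"
/usr/local/cluster/start-ldap.sh && echo "start OK"
```

Run it on every node in the cluster, since a pre/post-event script that exists on only one node will fail the event on the others.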

Testing Disk Heartbeats

To test your disk heartbeats, you can look at the output of “cllsif” or “lssrc -ls topsvcs”, or you can actively test them. IBM provides a command to do this. First, find the devices associated with the disk-heartbeat VG; I’ll assume hdisk4 on nodeA and hdisk5 on nodeB.
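The command IBM ships for this is dhb_read, part of RSCT; the path and flags below are from memory of that tooling, so verify them on your system. Start the receiver on one node first, then the transmitter on the other:

```shell
# On nodeA, put its disk-heartbeat device into receive mode first:
/usr/sbin/rsct/bin/dhb_read -p hdisk4 -r

# Then on nodeB, transmit across the shared disk:
/usr/sbin/rsct/bin/dhb_read -p hdisk5 -t
```

If the heartbeat path is healthy, both sides should report that the link is operating normally; repeat the test with the roles reversed to confirm traffic flows both ways.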

Enable cluster encryption

For more security you can make your cluster use encryption for inter-node communication, with no downtime. Otherwise, operations are allowed or rejected based only on IP address, hostname, and the cluster rhosts file, and C-SPOC operations are not encrypted (one of the important ones being password changes). Possibly an even better option would be an IPsec VPN tunnel between the nodes, but I haven’t tested that.

Lazy Update – HACMP

On a multi-node HACMP cluster without enhanced concurrent VGs, any time you add an LV to a volume group, you have to make sure the other nodes see the LV. This will also fix other VG out-of-sync issues. You can either take everything down and do an importvg on all the nodes, or you can do a “Lazy Update”: