Finding the HACMP configuration instance

It’s possible for the HACMP configuration on two different nodes to get out of sync, or you may want to push a config from one node to another. We once had an admin make changes on a down node and then try to sync the cluster; to clean it up, we had to figure out which node had the latest config. To see which configuration instance number each of your HACMP nodes is using, you can run:

lssrc -l -s topsvcs | grep Instance

or:

odmget HACMPtopsvcs
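To compare nodes quickly, you can run the same check against every node from one place. A minimal sketch, assuming ssh access between nodes (the node names nodeA and nodeB are placeholders, substitute your own):

```shell
#!/usr/bin/ksh
# Print the topsvcs configuration instance number reported by each node.
# Node names below are examples only -- replace with your cluster nodes.
for node in nodeA nodeB; do
    printf "%s: " "$node"
    ssh "$node" "lssrc -l -s topsvcs | grep Instance"
done
```

If the instance numbers differ, the node with the highest number has the most recently synchronized configuration.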

secldapclntd Failure w/ HACMP

After upgrading to AIX 5.3 TL9 SP4, we found that secldapclntd will go into a death loop during a HACMP failover. It consumes more and more CPU until the system has no capacity left, which stalls the HACMP failover. Killing secldapclntd lets HACMP continue.

We didn’t see this behavior w/ AIX 5.3 TL8 SP3. IBM has identified a couple of issues that are probably combining to cause our problem, but they won’t be fixed in TL9… ever. IBM’s work-around is to set up pre- and post-event scripts to stop secldapclntd before the IP takeover (and release) and restart it afterward. In testing this works pretty well, and it only takes a few seconds to stop and start secldapclntd.

Here’s the workaround by IBM:

  1. Create a script “/usr/local/cluster/start-ldap.sh” and copy it to every node in the cluster
  2. #!/usr/bin/ksh
    echo "START LDAP CLIENT Daemon"
    /usr/sbin/start-secldapclntd
    exit 0

  3. Create a script “/usr/local/cluster/stop-ldap.sh” and copy it to every node in the cluster
  4. #!/usr/bin/ksh
    echo "STOP LDAP CLIENT Daemon"
    /usr/sbin/stop-secldapclntd
    exit 0

  5. Create a pre-event
  6. smitty hacmp
    Extended Configuration
    Extended Event Configuration
    Configure Pre/Post-Event Commands
    Add a Custom Cluster Event

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
    [Entry Fields]

    * Cluster Event Name [] << I use the name "ldapClientStart"
    * Cluster Event Description [] << Start ldap client daemon
    * Cluster Event Script Filename [] << /usr/local/cluster/start-ldap.sh

  7. Create a post-event
  8. smitty hacmp
    Extended Configuration
    Extended Event Configuration
    Configure Pre/Post-Event Commands
    Add a Custom Cluster Event

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
    [Entry Fields]

    * Cluster Event Name [] << I use the name "ldapClientStop"
    * Cluster Event Description [] << Stop ldap client daemon
    * Cluster Event Script Filename [] << /usr/local/cluster/stop-ldap.sh

  9. Update the acquire_service_addr event
  10. smitty hacmp
    Extended Configuration
    Extended Event Configuration
    Change/Show Pre-Defined HACMP Events
    (select -> acquire_service_addr)
    Change/Show Cluster Events

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
    [Entry Fields]

    Event Name acquire_service_addr
    Pre-event Command [ldapClientStop] <<< the defined name "ldapClientStop"
    Post-event Command [ldapClientStart] <<< the defined name "ldapClientStart"

  11. Update the release_service_addr event
  12. smitty hacmp
    Extended Configuration
    Extended Event Configuration
    Change/Show Pre-Defined HACMP Events
    (select -> release_service_addr)
    Change/Show Cluster Events

    Type or select values in entry fields.
    Press Enter AFTER making all desired changes.
    [Entry Fields]

    Event Name release_service_addr
    Pre-event Command [ldapClientStop] <<< the defined name "ldapClientStop"
    Post-event Command [ldapClientStart] <<< the defined name "ldapClientStart"

  13. Synchronize the cluster configuration, and you’re done
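Before synchronizing, it’s worth exercising the scripts by hand on each node so a typo doesn’t surface mid-failover. A minimal smoke-test sketch, using the script paths from the steps above:

```shell
#!/usr/bin/ksh
# Confirm the event scripts exist, are executable, and run cleanly
# on this node before syncing the cluster configuration.
for s in /usr/local/cluster/start-ldap.sh /usr/local/cluster/stop-ldap.sh; do
    if [ ! -x "$s" ]; then
        echo "missing or not executable: $s"
        exit 1
    fi
done
/usr/local/cluster/stop-ldap.sh  && echo "stop OK"
/usr/local/cluster/start-ldap.sh && echo "start OK"
```

Run it on every node in the cluster, since a pre/post-event script that exists on only one node will fail the event on the others.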

Testing Disk Heartbeats

To test your disk heartbeats, you can look at the output of “cllsif” or “lssrc -ls topsvcs”, or you can actively test them. IBM provides a command to do this. First, find the devices associated with the disk-heartbeat VG; I’ll assume hdisk4 on nodeA and hdisk5 on nodeB.
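The command IBM ships for this is dhb_read, part of RSCT; the path and flags below are from memory of that tooling, so verify them on your system. Start the receiver on one node first, then the transmitter on the other:

```shell
# On nodeA, put its disk-heartbeat device into receive mode first:
/usr/sbin/rsct/bin/dhb_read -p hdisk4 -r

# Then on nodeB, transmit across the shared disk:
/usr/sbin/rsct/bin/dhb_read -p hdisk5 -t
```

If the heartbeat path is healthy, both sides should report that the link is operating normally; repeat the test with the roles reversed to confirm traffic flows both ways.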

Enable cluster encryption

For more security you can make your cluster use encryption for inter-node communication, with no downtime. Otherwise, operations are allowed or rejected based only on IP address, hostname, and the cluster rhosts file, and C-SPOC operations are not encrypted (one of the important ones being password changes). Possibly an even better option would be an IPsec VPN tunnel between the nodes, but I haven’t tested that.

Lazy Update – HACMP

On a multi-node HACMP cluster without enhanced concurrent VGs, any time you add an LV to a volume group, you have to make sure the other nodes see the LV. This will also fix other VG out-of-sync issues. You can either take everything down and do an importvg on all the nodes, or you can do a “Lazy Update”: