Securing OpenSSH

I was recently researching the latest guidance on securing OpenSSH and came across a web page on a popular site claiming that the easiest way to protect OpenSSH is to define a login banner. While a login banner is useful, especially in an enterprise setting, it’s useless for securing SSH. So, here is my recipe for securing OpenSSH. While testing these changes, ALWAYS keep a connection open. It’s very easy to break something, and if you don’t already have an open connection, you will have successfully locked yourself out.

  • Change the SSH port. I’m using 8022 in this example; you can use any port you like. This may not be practical in every setting, and it’s of marginal value because SSH will report what it is when you connect. But unless someone is doing an exhaustive search of the open ports on your server, they probably won’t find your open SSH port. Assuming you have SELinux enabled, which I recommend, you must first allow SSH to use the target port. Start by reviewing the current SELinux context for the port:
    # semanage port -l | grep 8022
    oa_system_port_t tcp 8022
    

    As you can see, this port is already in use by oa-system. We aren’t using oa-system and nothing else is using port 8022, so we can modify the context to allow SSH to use it:

    semanage port -m -t ssh_port_t -p tcp 8022
    

    Now when we look, we see that both contexts are applied:

    semanage port -l | grep 8022
    oa_system_port_t tcp 8022
    ssh_port_t tcp 8022, 22
    

    Now, you just need to tell SSH to use the above port by changing the Port line in /etc/ssh/sshd_config:

    Port 8022
    
  • Disable IPv6 if you’re not using it. Some people think this is stupid because they’re not using IPv6 yet. My take is that if you’re not using IPv6, you’re probably not watching its security. For instance, if you set IPv4 firewall rules but ignore IPv6 and leave IPv6 addresses enabled on your server, an attacker can probably still connect to it. If you’re not using it, just turn it off to lower the attack surface of your servers. In /etc/ssh/sshd_config, change the AddressFamily line:
    AddressFamily inet
  • Set the address SSH listens on. By default, OpenSSH listens on every IP address. You may not want this if you have multiple IP addresses defined on your server; again, this reduces the attack surface. To do this, update the ListenAddress line with your IP address. You can specify multiple ListenAddress lines:
    ListenAddress 192.168.1.1
  • Set the SSH protocol to version 2 only. On any modern version of OpenSSH, this is the default, but I specify this anyway. Version 1 has been deprecated for years. Uncomment the Protocol line in /etc/ssh/sshd_config:
    Protocol 2
    
  • Disable weak keys. You should disable any version 1 or DSA keys, which leaves RSA, ECDSA, and ED25519 keys enabled. To do this, review any HostKey lines in /etc/ssh/sshd_config and comment out ssh_host_key or ssh_host_dsa_key:
    #HostKey /etc/ssh/ssh_host_key
    #HostKey /etc/ssh/ssh_host_dsa_key

    While you’re in the file, review the remaining HostKey directives. You can verify the number of bits in each key and its type by running this command against them:

    ssh-keygen -lf FILENAME
  • Disable weak ciphers. We’ll want to remove older, weaker ciphers, but you’ll want to test this before rolling it out widely; I’ve found some very old SSH and SCP/SFTP clients don’t support some of the newer ciphers. Update or add these lines to /etc/ssh/sshd_config:
    Ciphers aes256-ctr,aes192-ctr,aes128-ctr,arcfour256
    MACs hmac-sha2-256,hmac-sha2-256-etm@openssh.com,hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com

    Note: these are valid for RHEL/CentOS 7; refer to the sshd_config man page for the list of ciphers valid for your specific version.

  • Validate logging. Check the SyslogFacility and LogLevel lines in /etc/ssh/sshd_config and verify that those messages are actually being captured in syslog.
  • Disable root logins. Don’t allow users to login as root directly. Users should ideally login with their own IDs and then run whatever they need to run as root with sudo. To disable root logins uncomment or update the PermitRootLogin line:
    PermitRootLogin no
  • Set the maximum login attempts. By default, users get six attempts to get their password right; your organization may set this lower. Once a user has used half their attempts, the remaining failed attempts are logged. To change this, update the MaxAuthTries setting, for example (three attempts here; pick a value that matches your policy):
    MaxAuthTries 3
  • Set strict file permission checking; this is the default. Leaving StrictModes set to “yes” in /etc/ssh/sshd_config tells OpenSSH to only use files that have restrictive permissions. This ensures that other users can’t modify or access SSH key files.
  • Enable public key authentication; this is the default. Leaving PubkeyAuthentication set to yes allows us to use public/private keys to authenticate. It’s recommended to use key authentication, with a password on the key. This is analogous to two-factor authentication: you must have the private key and know its password.
  • Disable host-based authentication. Host-based authentication works like the old rhosts or hosts.equiv method: if the login would be permitted by $HOME/.rhosts, $HOME/.shosts, /etc/hosts.equiv, or /etc/shosts.equiv, and the server can verify the client’s host key (see /etc/ssh/ssh_known_hosts), then the login is permitted without a password. This is only slightly more secure than rsh because it prevents IP spoofing and DNS spoofing, but it is still terribly insecure and is specifically disallowed by most organizations. To disable this feature, set these options in /etc/ssh/sshd_config:
    RhostsRSAAuthentication no
    HostbasedAuthentication no
    IgnoreRhosts yes

    If you really need password-less authentication, most security policies will allow you to create a role or service account and set up user private key authentication.

  • Disallow empty passwords. If a user ID has a blank password, we don’t want that user ID to be able to login. Set PermitEmptyPasswords in /etc/ssh/sshd_config:
    PermitEmptyPasswords no
  • Disallow password authentication. This may not be possible in all environments. If possible, we disallow password logins, forcing users to use encryption keys. To do this, disable these options in /etc/ssh/sshd_config:
    PasswordAuthentication no
    ChallengeResponseAuthentication no
    
  • Set pre-login banner. Yes, I set this option. It’s informational to the person connecting and required by a lot of corporate standards. The pre-login banner usually contains a warning that you’re connecting to a private system, that the access is logged, and a warning not to login if that’s a problem. Update /etc/issue.net with the text you want to display. By default on a lot of systems it shows the kernel version, giving an attacker a little more information. Once you’ve updated your /etc/issue.net file, update the Banner line in /etc/ssh/sshd_config:
    Banner /etc/issue.net

    You may also want to update /etc/motd. This file is displayed AFTER the user logs in. I usually put in a banner that includes the hostname and whether the system is test or production. There are a lot of times that something as simple as seeing the hostname after login keeps people from doing something they shouldn’t. I include the hostname in the prompt as well. And I can’t tell you the number of times someone has said “oh, I thought this was the TEST system” right after doing something stupid to their production server. This heads off a lot of those issues.

Once you have everything the way you want it, restart sshd and test:

service sshd restart
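Before that restart, it’s worth validating the file: `sshd -t` checks the syntax, and a quick grep can confirm the hardened directives made it in. Here’s a minimal sketch; the sample config and the directive list are illustrative, so point CONFIG at /etc/ssh/sshd_config on a real host.

```shell
# Validate syntax first on a real host (prints nothing when the file is clean):
#   sshd -t
# Then confirm the hardened directives are present. The sample config below
# stands in for /etc/ssh/sshd_config so the check can be dry-run anywhere.
CONFIG=$(mktemp)
cat > "$CONFIG" <<'EOF'
Port 8022
AddressFamily inet
PermitRootLogin no
PermitEmptyPasswords no
PasswordAuthentication no
EOF

MISSING=0
for want in "Port 8022" "PermitRootLogin no" "PermitEmptyPasswords no"; do
    grep -q "^$want" "$CONFIG" || { echo "MISSING: $want"; MISSING=1; }
done
[ "$MISSING" -eq 0 ] && echo "all hardened directives present"
rm -f "$CONFIG"
```

And remember the rule above about keeping a session open: connect from a second terminal with `ssh -p 8022 user@host` and confirm it works before closing anything.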

Is it Power7 or is it Power7+?


UPDATED

Last year I budgeted for 3 P740C models to replace 3 P6 550 models that were getting long in the tooth. Because of the long lead time in our budget process and the continued downward pressure from IBM on their pricing, I was able to purchase 4 P7+ 740D models. That is a big win for us.

After implementing new 7042-CR7 model HMCs (which I recommend everyone upgrade to) and powering on our first box, I noticed that the latest HMC code reports the server as a Power7 and not a Power7+. The Power7+ chip has been out for nearly a year, and the HMC has been through several updates since then, so why does it not show Power7+ the way it did for Power6+? Here’s what the screen looks like:

HMC CPU Mode

So, what does the LPAR say when it’s powered on?  Everywhere I look, it’s Power7.  Here’s what the system thinks the CPU is:

nim # lsattr -El proc0
frequency   4228000000     Processor Speed       False
smt_enabled true           Processor SMT enabled False
smt_threads 4              Processor SMT threads False
state       enable         Processor state       False
type        PowerPC_POWER7 Processor type        False

And prtconf:

nim # prtconf 
System Model: IBM,8205-E6D
Machine Serial Number: 
Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 7
Processor Version: PV_7_Compat

I do have a Power7 server running in Power6+ compatibility mode, here’s the output of prtconf on that server:

# prtconf
System Model: IBM,8202-E4B
Machine Serial Number: 10418BP
Processor Type: PowerPC_POWER7
Processor Implementation Mode: POWER 6
Processor Version: PV_6_Compat

So, maybe the OS commands aren’t aware of the CPU compatibility mode.  This is the latest firmware and the latest AIX 7.1 level.  I’m also running the latest HMC code, and I’ve confirmed the same behavior in the latest VIOS level (2.2.2.2).

Of course, the question was asked: did we really get what we paid for? So, I called my IBM Business Partner and asked their technical sales team to dig into this. The box does have Power7+ processors, so it wasn’t mis-ordered and it WAS built correctly in the factory. They reached out to some other customers running a new P7+ 770, and they’ve confirmed the same behavior there, so I assume this is the same across the product line.

Then I had a bit of luck. As part of this upgrade, I’m testing Active Memory Expansion (AME) on our non-production servers. The amepat tool shows the correct processor mode:

nim # amepat

Command Invoked                : amepat

Date/Time of invocation        : Fri Sep 27 11:53:38 EDT 2013
Total Monitored time           : NA
Total Samples Collected        : NA

System Configuration:
---------------------
Partition Name                 : nim
Processor Implementation Mode  : POWER7+ Mode
Number Of Logical CPUs         : 4
Processor Entitled Capacity    : 0.10
Processor Max. Capacity        : 1.00
True Memory                    : 4.00 GB
SMT Threads                    : 4
Shared Processor Mode          : Enabled-Uncapped
Active Memory Sharing          : Disabled
Active Memory Expansion        : Enabled
Target Expanded Memory Size    : 8.00 GB
Target Memory Expansion factor : 2.00

There we see the expected Power7+ mode.  This command works and reports the processor correctly on systems without AME enabled, so it can be used on any LPAR to show the correct processor type for Power7+ systems.  Here is the output on our Power7 LPAR running in Power6+ mode:

# amepat
Command Invoked : amepat
Date/Time of invocation : Wed Oct 2 12:41:43 EDT 2013
Total Monitored time : NA
Total Samples Collected : NA
System Configuration:
---------------------
Partition Name : tsm1
Processor Implementation Mode : POWER6

So, amepat doesn’t report Power6+ for Power7 systems running in Power6+ mode.
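Since amepat is the one command that does report Power7+ correctly, a one-liner can pull out just that field. A sketch of the extraction (the sample output is embedded below so it can be dry-run anywhere; on a real LPAR you’d pipe amepat straight into the awk):

```shell
# Extract the implementation mode field; on a real LPAR this would be:
#   amepat | awk -F' *: *' '/Processor Implementation Mode/ {print $2}'
AMEPAT_OUT='Partition Name                 : nim
Processor Implementation Mode  : POWER7+ Mode'
MODE=$(printf '%s\n' "$AMEPAT_OUT" | awk -F' *: *' '/Processor Implementation Mode/ {print $2}')
echo "$MODE"
```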

Our IBM client team is looking into this issue, and I expect the relevant commands will be enhanced in a future service pack and HMC level. But in the meantime, we can prove that what we ordered is what was delivered.

UPDATE :

IBM’s answer:

Historically IBM has not included the “+” on any of our products (ie Power 5+, Power6 or Power7+).  You can open a PMR and request a Design Change Request (DCR) to have the “+” added for Power7 servers.

That is an interesting answer to me.  We never purchased any Power6+ servers, so I can’t comment on what the OS commands, lsattr and the like, may or may not report. But, the HMC most definitely did report a separate compatibility mode for Power6+. My only thought is that the Power7+ CPU didn’t introduce a new operational mode, which is a little surprising to me because of some of the work done in this chip.

Sending AIX Syslog Data to Splunk

I recently put up a test Splunk server to act as a central syslog repository, addressing one of the findings in our security audits. There are some “open” projects to do this, but Splunk has a lot of features and is “pretty” compared to some of the open alternatives. Getting data from our Linux hosts was a snap, but data from our AIX hosts came with a few minor annoyances. Fortunately, we were able to overcome them.

The syslogd shipped with AIX only supports UDP. rsyslog supports TCP, but hasn’t been ported to AIX. Another option is syslog-ng, for which there are open source and commercial versions compiled for AIX; but after installing all the dependent RPMs for the open source version, it would only segfault with no indication of the problem. So, to support syslog via UDP, you have to enable a UDP source on the Splunk server. That’s easily accomplished by going to Manager -> Data Inputs -> UDP -> New, entering 514 for the port, setting the sourcetype to “From list” with a source type of “syslog”. Check “More settings”, select DNS for “Set host”, and click Save.
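The same UDP input can also be defined in a configuration file instead of the UI. A sketch, assuming a standalone Splunk install with default paths; the stanza goes in $SPLUNK_HOME/etc/system/local/inputs.conf, and `connection_host = dns` corresponds to the “Set host: DNS” choice above:

```ini
[udp://514]
sourcetype = syslog
connection_host = dns
```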

Once that is done, add a line to /etc/syslog.conf on the source node to send the data you want Splunk to record to the Splunk server. If your splunk server is named “splunk” it would look something like this:

*.info        @splunk

One of the problems with AIX’s implementation of syslog is its format. Here’s what Splunk records:

3/26/13 12:32:07.000 PM	Mar 26 12:32:07 HOSTNAME Mar 26 12:32:07 Message forwarded from HOSTNAME: sshd[21168310]: Accepted publickey for root from xxx.xxx.xxx.xxx port 39508 ssh2 host=HOSTNAME
sourcetype=syslog   source=udp:514   process=HOSTNAME

The AIX implementation of syslog by default adds “Message forwarded from HOSTNAME:” when forwarding. That’s a little annoying to look at, but worse, Splunk uses the hostname of the source as the process name, so you lose the ability to search on the process field. You can turn this off on the source with:

stopsrc -s syslogd
chssys -s syslogd -a "-n"
startsrc -s syslogd
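For events that were already indexed with the preamble, the same text can be stripped after the fact. A small sketch of the pattern involved (the sample line is hard-coded here for illustration):

```shell
# Strip the "Message forwarded from HOST:" preamble that AIX syslogd
# inserts when forwarding (the -n flag above stops it at the source).
LINE='Mar 26 12:32:07 HOSTNAME Message forwarded from HOSTNAME: sshd[21168310]: Accepted publickey for root'
CLEANED=$(printf '%s\n' "$LINE" | sed 's/Message forwarded from [^:]*: //')
echo "$CLEANED"
```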

OK, I’m Converted. I Like AIXPert

As usual, I’m late to the party. I was at the Power Systems Technical University in San Antonio several years ago (an awesome venue), and there was a session on the new AIXPert feature of AIX 6.1 (later back-ported to 5.3). At the time I thought it was clunky and wasn’t too excited about it.

It’s really just a bunch of pre-packaged shell scripts that are defined in an XML file you really need to manage manually (the horror). You run the master aixpert command and specify what XML file you want. You can go with AIX Defaults, Low, Medium, High, or SOX (if you’re into that kind of thing). When you run aixpert, it applies whatever settings are associated with the level you selected. Here is where I usually yawn. It’s just not that exciting. Everyone pretty much does that with a script or text file or some corporate documentation (probably written by someone without a clue, E&Y I’m looking at you). Yes, there are a couple of ways to get to a GUI, but it’s really more manageable for me from the command line.

The bit that gets tedious is that no single definition is going to fit every shop. So you have to export the standard definitions to a custom XML file, open it up, and hack it by hand. Systems like XML; I don’t really care to read it all day long. But it’s not that difficult. Here’s an example:

<AIXPertEntry name="cust_udp_recvspace" function="udp_recvspace">
    <AIXPertRuleType type="LLS"/>
    <AIXPertDescription>Network option udp_recvspace: Set network option udp_recvspace's value to 655360</AIXPertDescription>
    <AIXPertPrereqList>bos.rte.security,bos.rte.shell,bos.rte.ILS,bos.net.tcp.client,bos.rte.commands,bos.rte.date</AIXPertPrereqList>
    <AIXPertCommand>/etc/security/aixpert/bin/ntwkopts</AIXPertCommand>
    <AIXPertArgs>udp_recvspace=655360 s cust_udp_recvspace</AIXPertArgs>
    <AIXPertGroup>Tune network options</AIXPertGroup>
 </AIXPertEntry>

Nothing too fancy there. It’s mostly fluff; the PrereqList, Command, and Args options are the important ones, and the rest is more for the user than anything else. When you apply that XML, the system runs the shell script, which sets the appropriate network option. It’s all fairly simple.

The cool part is that most of the things set with aixpert can also be checked with aixpert. When you apply an XML file, aixpert saves the rules that applied to /etc/security/aixpert/core/appliedaixpert.xml. If I modify that setting and run “aixpert -c”, aixpert parses the appliedaixpert.xml file and checks things out. This is what I get:

# aixpert -c
do_action(): rule(cust_udp_recvspace_CA6BE6C2) : failed.
Processedrules=66       Passedrules=65  Failedrules=1   Level=AllRules
        Input file=/etc/security/aixpert/core/appliedaixpert.xml

 

To set the world right again, you just re-apply your XML file. I found a minor issue here: I’ve had to remove the /etc/security/aixpert/core/appliedaixpert.xml file before applying a new one, or you can get the same rule in there repeatedly. Why IBM doesn’t offer a command-line switch to do that, I don’t know.

Another cool thing: you can undo the changes applied by the built-in aixpert rules. When aixpert applies a setting, it writes an undo rule to /etc/security/aixpert/core/undo.xml. Then, running “aixpert -u” will undo what you’ve already done. I would probably purge that once in a while too so that you can recover to a known good state.

So, just wrap a dirt-simple cron script around it to notify when something goes wrong… Something like this:

#!/usr/bin/perl

$EMAIL_ADDRESS = "user\@domain.net";
@TEMP = `hostname`;
$HOSTNAME = $TEMP[0];
chomp($HOSTNAME);
$MAIL_FROM_ADDR = "aixpert\@$HOSTNAME";  # escape the @ so Perl doesn't interpolate an array

@OUTPUT = `/usr/sbin/aixpert -c 2>&1`;
@REPORT = grep(/^Processedrules.*/, @OUTPUT);

$REPORT[0] =~ s/\s+/=/g;
($null, $PROCESSED ,$null, $PASSED, $null, $FAILED, $null, $LEVEL) = split(/=/, $REPORT[0]);

if ( $FAILED > 0 ) {
        open (MAIL, "| /usr/sbin/sendmail -t ");
        select (MAIL);

        print "Mime-Version: 1.0\n";
        print "Content-type: text/html; charset=\"iso-8859-1\"\n";
        print "From: $MAIL_FROM_ADDR\n";
        print "To: $EMAIL_ADDRESS <$EMAIL_ADDRESS>\n";
        print "Subject: Aixpert $HOSTNAME: $FAILED test FAILED\n";
        print "<html><head></head><body>\n";
        print "<pre>\n";

        print "@OUTPUT";

        print "</pre>\n";
        print "</body></html>\n";
        select (STDOUT);
        close (MAIL);

        exit 1;
}
exit 0;

That should keep the auditors happy. There are enough basic security settings in the default XML files that with a little tweaking you can hit all or very nearly all of your security audit queries.
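To schedule it, a crontab entry like this works (the script path is hypothetical; adjust it to wherever you drop the script):

```
0 6 * * * /usr/local/bin/aixpert_check.pl
```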

That’s all well and good, as far as that goes, but what really made me like aixpert is that it gives you a very simple framework to apply your own settings, and make sure those settings are correct. If you distribute an XML file and a few scripts around your enterprise, you can ensure that those settings are standardized across hosts too.

Here’s a simple script to make sure the attributes of your vscsi devices are correct:

#!/usr/bin/perl

@ADAPTERS = `lsdev -c adapter -Sa | grep -E "^vscsi" | awk '{ print \$1 }'`;
$REPORT = $ENV{'AIXPERT_CHECK_REPORT'};

%ATTRIBUTES = ("vscsi_err_recov", "fast_fail",
                "vscsi_path_to", 30 );

if ( $REPORT == 1 ) {
        for (@ADAPTERS) {
                chomp($_);
                $ADAPT = $_;
                @TEMP = `lsattr -El $ADAPT | awk '{ print \$1, \$2}'`;
                for (@TEMP) {
                        ($ATTR, $VALUE) = split(/\s+/, $_);

                        # string compare (ne), not numeric (!=), since values like "fast_fail" are strings
                        if ( exists $ATTRIBUTES{$ATTR} && $VALUE ne $ATTRIBUTES{$ATTR} ) {
                                print "$ADAPT attribute $ATTR is $VALUE, should be $ATTRIBUTES{$ATTR}\n";
                                $FAIL++;
                        }
                }
        }
        if ( $FAIL ) {
                exit 1;
        }
}else {
        for (@ADAPTERS) {
                chomp($_);
                $ADAPT = $_;
                @TEMP = `lsattr -El $ADAPT | awk '{ print \$1, \$2}'`;
                for (@TEMP) {
                        ($ATTR, $VALUE) = split(/\s+/, $_);

                        if ( exists $ATTRIBUTES{$ATTR} && $VALUE ne $ATTRIBUTES{$ATTR} ) {
                                system("/usr/sbin/chdev -l $ADAPT -a $ATTR=$ATTRIBUTES{$ATTR} > /dev/null 2>&1");
                        }
                }
        }
}

Using a script like this as a template, you can check and correct the value of any number of system attributes. Adding it to aixpert is a breeze: just add a stanza to your XML file:

<AIXPertEntry name="cust_vscsi_config" function="vscsi_config">
    <AIXPertRuleType type="MLS"/>
    <AIXPertDescription>Resets attributes of the vscsi devices</AIXPertDescription>
    <AIXPertPrereqList></AIXPertPrereqList>
    <AIXPertCommand>/etc/security/aixpert/custom/vscsi_config.pl</AIXPertCommand>
    <AIXPertArgs>GENERIC</AIXPertArgs>
    <AIXPertGroup>Custom Rules</AIXPertGroup>
</AIXPertEntry>

Once you decide to get into it, aixpert is a pretty nice little tool. There’s a great little movie created by Nigel Griffiths at the DeveloperWorks website to get you started too!

But, I still don’t care for System Director. 🙂

CODBL0004W in IBM License Metric Tool

After installing the IBM License Metric Tool, you might see:
CODBL0004W
Essential periodic calculations did not occur when expected. The last day processed is Apr 25, 2011 while it should be Apr 29, 2011.

By default the tool processes the data collected 2 days prior, so you’ll see the specified dates are a few days old. IBM wants you to collect a bunch of data and open a ticket, but you may be able to correct this yourself. In “CODIF8140E Essential periodic calculations did not occur when expected”, IBM tells you that it’s probable that the TLMSRV user doesn’t have the correct privileges on the database, and to turn on debugging and send the logs to IBM. At the bottom of the page, it tells you what is actually needed:
Direct CREATETAB authority = YES 
Direct BINDADD authority = YES 
Direct CONNECT authority = YES

You can save yourself a lot of time and check it yourself. 'su' to your db2 user (probably db2inst1), run db2, and connect to the database as the tlmsrv user:
db2 => connect to TLMA user TLMSRV using XXXXXXX

   Database Connection Information

 Database server        = DB2/AIX64 9.7.0
 SQL authorization ID   = TLMSRV
 Local database alias   = TLMA
Then run 'get authorizations':
db2 => get authorizations

 Administrative Authorizations for Current User

 ...
 Direct CREATETAB authority                 = NO
 Direct BINDADD authority                   = YES
 Direct CONNECT authority                   = YES
...
 Indirect CREATETAB authority               = YES
 Indirect BINDADD authority                 = YES
 Indirect CONNECT authority                 = YES
...

If any of those direct and indirect privileges say NO, you can grant the privilege to the user. If the privileges are OK, you can skip this step. First, re-connect to the database to get the necessary privileges:
db2 => connect to TLMA

   Database Connection Information

 Database server        = DB2/AIX64 9.7.0
 SQL authorization ID   = DB2INST1
 Local database alias   = TLMA

Then grant the appropriate privilege:

db2 => grant CREATETAB on database to TLMSRV
DB20000I  The SQL command completed successfully.
db2 => grant BINDADD on database to TLMSRV
DB20000I  The SQL command completed successfully.
db2 => grant CONNECT on database to TLMSRV
DB20000I  The SQL command completed successfully.
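Since the db2 command also accepts statements as arguments, the three grants can be scripted instead of typed interactively. A sketch that just prints the commands (run them as the instance owner after connecting to TLMA):

```shell
# Print the three grant commands; on the DB2 server, drop the echo to run them.
for priv in CREATETAB BINDADD CONNECT; do
    echo "db2 \"grant $priv on database to TLMSRV\""
done
```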

My Privileges are OK, what do I do now?


In IZ98530: NO INFO: CPU CORE ON PARTITION/LOGICAL CPU CORE ON PARTITION, there is a note that this can also be caused by known issues in populating the database during installation. This SHOULD be fixed in version 7.2.2, but you can take some simple steps to correct the issue yourself.

First, login to the database as above and reset the Tier Table Version field:

db2 => connect to TLMA

   Database Connection Information

 Database server        = DB2/AIX64 9.7.0
 SQL authorization ID   = DB2INST1
 Local database alias   = TLMA

db2 => update adm.CONTROL set value = '2010-10-01' where name = 'TIER_TABLE_VERSION'
DB20000I  The SQL command completed successfully.

Now go to the ILMT web page and import a new Tiers XML file. Navigate to License Metric Tool -> Administration -> Import Systems Tier Table -> Import. I manually click on the link provided there and download the file, then import that file via the provided form. If you have Internet access setup, you can have the tool download it for you also.

Now stop the ILMT server and go back to DB2. We need to clear out a table; the command on the website isn't quite right. Here's what I used:

db2 => connect to tlma

   Database Connection Information

 Database server        = DB2/AIX64 9.7.0
 SQL authorization ID   = DB2INST1
 Local database alias   = TLMA

db2 => select * from adm.PRD_AGGR_TIME
...
  3 record(s) selected.

db2 => delete from adm.PRD_AGGR_TIME
DB20000I  The SQL command completed successfully.
db2 => select * from adm.PRD_AGGR_TIME
...
  0 record(s) selected.

While we're in there, we need to reset the LAST_AGGREGATE_STEP field:

db2 => update adm.CONTROL set value = '0' where name = 'LAST_AGGREGATE_STEP'
DB20000I  The SQL command completed successfully.

Now restart the ILMT processes, wait 24 hours, and see if the problem goes away. If not, you're back to turning debugging up and calling IBM. But hopefully it won't come to that.

Smit 1800-109 Error With Printers

I’ve recently found some of our systems have corrupt smit screens when looking at printer queue characteristics. When looking at any options under “smit chpq” for some of the printers, we got:

 1800-109 There are currently no additional
SMIT screen entries available for this
item.  This item may require installation of
additional software before it can be accessed.

The message clearly points to missing filesets. But printers.rte, bos.rte.printers, and the printer device filesets (like printers.hplj-4si.rte) were all installed and up to date. The problem is that the ODM stanzas for the printers aren’t correct. The queue subsystem looks at files under /var/spool/lpd/pio/@local to do the printing, but smit looks in the ODM.

So, there’s a quick fix. Find the files for the offending printer:

ls /var/spool/lpd/pio/@local/custom | grep queuename
queuename:hp@queuename
queuenameps:hp@queuename

Then just run the piodigest command to read in the colon file and recreate the ODM stanzas:

/usr/lib/lpd/pio/etc/piodigest /var/spool/lpd/pio/@local/custom/queuename:hp@queuename

After that, the smit screens were available again.
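If several queues are affected, the rebuild can be looped over every colon file in the custom directory. A sketch (the piodigest path is from the text; the command variable defaults to echo here so the loop can be dry-run on a non-AIX box):

```shell
# Rebuild ODM entries for every custom colon file. On AIX, set:
#   PIODIGEST=/usr/lib/lpd/pio/etc/piodigest
PIODIGEST=${PIODIGEST:-echo}
COUNT=0
for f in /var/spool/lpd/pio/@local/custom/*; do
    [ -e "$f" ] || continue      # the glob may match nothing
    "$PIODIGEST" "$f"
    COUNT=$((COUNT + 1))
done
echo "processed $COUNT colon file(s)"
```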

LPAR Memory Overhead

Here’s a simple thing that I ran across. I have a vendor that recommended I set the maximum memory in my LPARs to the system maximum; that way you never have to reboot to increase the maximum memory in an LPAR. I found out later that setting your LPARs’ maximum memory to the system maximum makes the hypervisor allocate more memory for overhead.

This is a very old configuration issue, but I just ran across the actual numbers. When an LPAR is activated, the hypervisor allocates 1/64th of the LPAR’s maximum memory for page frame tables. This is a memory structure that the hypervisor uses to track the memory pages used by the LPAR. So, let’s say you have a 128GB managed system with LPARs that only really need 16GB of RAM, but each LPAR’s maximum memory is set to 128GB. By the time you’ve activated your 7th LPAR, you’re using 2GB per LPAR, or 14GB of RAM, just for the hypervisor’s page frame tables.
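The arithmetic is easy to sketch out; the 1/64 ratio is from the text above, and the numbers here match the example:

```shell
# Page-frame-table overhead is roughly (LPAR maximum memory) / 64,
# allocated per activated LPAR.
MAX_MEM_GB=128      # LPAR maximum memory setting
LPARS=7             # number of activated LPARs
PER_LPAR_GB=$((MAX_MEM_GB / 64))
TOTAL_GB=$((PER_LPAR_GB * LPARS))
echo "${PER_LPAR_GB} GB per LPAR, ${TOTAL_GB} GB total overhead"
```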

Disk Cloning With Splitvg

In a recent post, Low-impact database clone with splitvg, Anthony English used the splitvg command to clone a database. I hadn’t thought of the splitvg command since playing with it when it was first announced in the Differences Guide for AIX 5.2 (?). As luck would have it, I was building a new LPAR that is a copy of an already existing LPAR. I don’t strictly NEED the files in the filesystems copied to the new LPAR, but I do need the filesystems, and getting the files might save the application analysts some time. So, I decided to break out the old splitvg command.

Luckily, I had a spare LUN assigned to this LPAR that was available. The first step was to simply extend the VG to the new disk, and run mirrorvg. After everything was synced up, the split was painless and only took a few seconds:

splitvg -y copyvg -i -c 2 cernervg

After that, the new VG shows up:

# lspv
...
hdisk10         00043a1267585862                    cernervg        active
hdisk11         00043a1267650a54                    copyvg          active
...

And, you can look at the LVs with lsvg:

# lsvg -l copyvg
copyvg:
LV NAME             TYPE       LPs     PPs     PVs  LV STATE      MOUNT POINT
fscernerlv          jfs2       8       8       1    closed/syncd  /fs/fs/cerner
fscernerwhlv        jfs2       1       1       1    closed/syncd  /fs/fs/cerner/w_standard
...

For some odd reason, the filesystems were created with the prefix "/fs/fs". They should have been created with just the "/fs" prefix, but I'll fix that later anyway.

Then I did a varyoffvg and exportvg on the source LPAR, presented the LUN to the target LPAR, and ran cfgmgr on the target. After that, the disk showed up, with the same PVID as before:

 # lspv
hdisk0          000439c25388aca6                    rootvg          active
hdisk1          00043a1267650a54                    None

A quick importvg, and we're in business:

# importvg -y cernervg hdisk1

But, the filesystems still have the "/fs/fs" prefix. So, a quick and dirty script cleans that up:

for fs in `lsvg -l cernervg | grep fs | awk '{ print $7 }' | cut -d'/' -f 4-`
do
chfs -m /$fs /fs/fs/$fs
done

And, the LVs still have the "fs" prefix, I could leave them, but my OCD won't let me:

for fs in `lsvg -l cernervg | grep fs | awk '{ print $1 }' | cut -d's' -f 2-`
do
chlv -n $fs fs$fs
done
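The two loops above are easier to trust after a dry run that just prints the old and new names. A sketch; the sample mount points mirror the lsvg output earlier:

```shell
# Dry run of the mount-point rename: strip the doubled "/fs/fs" prefix
# by cutting from the fourth "/"-delimited field onward.
for mp in /fs/fs/cerner /fs/fs/cerner/w_standard; do
    new=/$(printf '%s\n' "$mp" | cut -d'/' -f 4-)
    echo "$mp -> $new"
done
```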

Then I used "mount -a" to mount all the filesystems. They had to replay the JFS2 logs, but since they didn't have much in the way of writes going on when I ran the splitvg, they were fine.

Overall, it wasn't a bad way to go. The mirrorvg took a while to complete, and fixing the names for the LVs and filesystems took a little work, but not bad. It's better than creating all the filesystems by hand.

If I really wasn't concerned about the data, I could have used the savevg and restvg commands to recreate the filesystems onto a blank LUN faster and with less effort.

Limit Sendmail Message Size

I recently had an AIX box send a 1.5 GB email to our MS Exchange email system, which brought Exchange to a screeching halt. Our Exchange admin was understandably unimpressed. So after a few seconds of research, I found sendmail has a setting to limit the maximum message size. Put this in your sendmail.cf file and restart sendmail:

O MaxMessageSize=50000000

That's in bytes, so that should be 50MB.
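For reference, the value is in decimal bytes, so the conversion is simply:

```shell
# MaxMessageSize is in bytes; 50,000,000 bytes is 50 MB (decimal),
# or about 47.7 MiB in binary units.
BYTES=50000000
MB=$((BYTES / 1000000))
echo "${MB} MB"
```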