In a recent post, Low-impact database clone with splitvg, Anthony English used the splitvg command to clone a database. I hadn’t thought about splitvg since playing with it when it was first announced in the Differences Guide for AIX 5.2 (?). As luck would have it, I was building a new LPAR that is a copy of an existing LPAR. I don’t strictly NEED the files in the filesystems copied to the new LPAR, but I do need the filesystems, and getting the files might save the application analysts some time. So I decided to break out the old splitvg command.
After upgrading to AIX 6.1, or a fresh install, you need to add a few filesets:
idsldap.cltbase61.rte idsldap.cltbase61.adt idsldap.cltbase61.rte idsldap.clt32bit61.rte idsldap.clt32bit61.rte
I recently had an AIX box send a 1.5 GB email to our MS Exchange email system, which brought Exchange to a screeching halt. Our Exchange admin was understandably unimpressed. So after a few seconds of research, I found that sendmail has a setting to limit the maximum message size. Put this in your sendmail.cf file and restart sendmail:
That's in bytes, so that should be 50MB.
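As a rough check on the arithmetic, here is a minimal shell sketch that converts a limit in megabytes to bytes and prints a sendmail.cf line for it. It assumes the setting meant here is sendmail's MaxMessageSize option and that 50MB means 50 x 1024 x 1024 bytes. The exact value from the original post isn't shown, so treat the number as illustrative.

```shell
# Assumption: the limit is in binary megabytes (1 MB = 1024 * 1024 bytes).
limit_mb=50
limit_bytes=$((limit_mb * 1024 * 1024))

# Print the option line in the "O Name=value" form used by sendmail.cf.
echo "O MaxMessageSize=${limit_bytes}"
```

If you prefer decimal megabytes, swap the multiplier for 1000 * 1000; either way the value in sendmail.cf is a plain byte count.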
I set up VIO servers so rarely that it’s really easy to miss a step. These are some of the tuning commands I use when setting up VIO. Set these on the VIO server.
Set the FC adapter error recovery policy to fast_fail:
chdev -dev fscsi0 -attr fc_err_recov=fast_fail -perm
Enable dynamic tracking on the FC adapter:
chdev -dev fscsi0 -attr dyntrk=yes -perm
Set the FC adapter link type to point-to-point:
chdev -dev fcs0 -attr init_link=pt2pt -perm
I haven’t found a need to tweak the num_cmd_elems, lg_term_dma, or max_xfer_size with 4Gb Fibre Channel.
For each disk to be used as a VTD, disable SCSI reservations:
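The command for this step isn't shown above. As a hedged sketch, assuming the disks expose the reserve_policy attribute (typical for MPIO disks; other multipath drivers may use a different attribute name), the following prints one chdev command per disk without running anything. The function name and the hdisk names are placeholders, not part of the original notes.

```shell
# Hypothetical helper: print (not run) a chdev command for each disk named.
# Assumption: the disk driver exposes reserve_policy; confirm with
# "lsdev -dev <disk> -attr" on the VIO server before applying.
print_no_reserve_cmds() {
  for disk in "$@"; do
    echo "chdev -dev ${disk} -attr reserve_policy=no_reserve"
  done
}

# Placeholder disk names; substitute the disks backing your VTDs.
print_no_reserve_cmds hdisk2 hdisk3
```

Printing the commands first makes it easy to review the disk list before pasting the lines into the VIO server's restricted shell.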
The memory collection script for Munin lists used, free, pinned, and swap in a stacked graph. The problem is that the “used” graph is total used memory, which includes pinned memory, computational RAM, and filesystem cache. So the pinned RAM is double-counted. And, to me, it’s very important to know how much RAM is used by filesystem cache. With the default script, a 64GB system with 16GB of pinned memory, 16GB of computational memory, and 32GB of filesystem cache looked like it had 80GB of RAM and was suffering from memory exhaustion. In reality, it’s a 64GB system with only 50% of the memory used by the OS and user programs.
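The double-counting is easy to see with the numbers from that example. This is only a sketch of the arithmetic using shell variables I made up for illustration; it is not the Munin plugin itself.

```shell
# Figures from the example above, in GB.
total=64; pinned=16; computational=16; fs_cache=32

# "used" as reported by the default script already includes pinned memory.
used=$((pinned + computational + fs_cache))

# Stacking used and pinned on one graph counts the pinned memory twice.
stacked=$((used + pinned))

# Share of RAM held by the OS and user programs (everything except cache).
non_cache_pct=$(( (pinned + computational) * 100 / total ))

echo "stacked=${stacked}GB non_cache=${non_cache_pct}%"
```

Graphing pinned, computational, and filesystem cache as separate series that sum to the used total avoids the inflated stack.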
I’ve had a “capacity planning” system in place for a few years that we wrote in-house. We’ve been running nmon every 5 minutes, pulling all kinds of statistics out, and populating RRD files on a central server. It allows us to see trends and patterns that provide data for planning purchases. It’s not really a tactical monitoring system, more a strategic planning system. But there are a couple of problems.
First, it spawns an nmon process every 5 minutes. nmon has an annoying habit of touching the volume groups on startup. If there is a VG modification running, nmon can’t get the info it wants. So if you run a long process, say a reorgvg, the nmon processes just stack up.
DRAFT – I’m still working on it
You may have noticed that in the “vmstat -s” output there are several counters related to I/O.
...
344270755 start I/Os
16755196 iodones
...
Here’s what the man page has to say about these counters:
I don’t care about I/O Wait time in AIX, at least not a lot. But, I can’t seem to get through to my ex-technical manager or coworkers that I/O Wait largely doesn’t matter.
The thinking goes that I/O wait time reported by nmon or topas is time that the CPU couldn’t do anything else because there are I/Os that aren’t getting satisfied. But the systems in question don’t have any real I/O load. They’re middleware servers that take requests from the clients, do some processing on them, query the database for the data, and then the process happens in reverse. There’s no real disk I/O going on at all. In fact, the disk I/O only spikes to about 20MB/s (on an enterprise-class SAN) when a handful of reports are written to disk at the top of the hour. There’s not a lot of network I/O going on either, maybe a couple of MB/s on a 1Gb network.
After upgrading to AIX 6.1, you may notice that some terminal behaviors have changed. SMIT loses its pretty borders when commands execute. When you leave commands like more or vi, the screen clears. And some function keys may not work.
The problem is that IBM decided to change their TERMINFO file for xterm. If you want the old xterm file, just copy /usr/share/lib/terminfo/x/xterm from any AIX 5.3 system. You can also try setting your TERM type to xterm-old or xterm-r5, but copying the old file works better for me.
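If you want to try the TERM approach before copying any files, a minimal sketch looks like this. It only changes the current session, and it assumes an xterm-old entry exists in your terminfo database.

```shell
# Try the older terminal definition for this session only.
# Assumption: an xterm-old entry is present; use xterm-r5 as the alternative.
TERM=xterm-old
export TERM

echo "TERM is now ${TERM}"
```

If that behaves well, the same two lines can go in your profile; otherwise fall back to copying the old xterm file, keeping a backup of the 6.1 version first.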