Why I Don’t Care About AIX I/O Wait, And You Probably Shouldn’t Either

I don’t care about I/O wait time in AIX, at least not much. But I can’t seem to convince my former technical manager or my coworkers that I/O wait largely doesn’t matter.

The thinking goes that the I/O wait time reported by NMON or Topas is time when the CPU couldn’t do anything else because I/Os weren’t being satisfied. But the systems in question don’t have any real I/O load. They’re middleware servers: they take requests from clients, do some processing on them, query the database for the data, and then the process happens in reverse. There’s essentially no disk I/O going on. Disk throughput only spikes to about 20 MB/s (on an enterprise-class SAN) when a handful of reports are written to disk at the top of the hour. There isn’t much network I/O either, maybe a couple of MB/s on a 1 Gb/s network.

So, to dispel the confusion about a disk I/O bottleneck, we ran some disk I/O benchmarks against the SAN. The results came back at several hundred MB/s across the board, with a respectable number of IOPS, far more than the system generates during normal operations. And even those benchmark numbers were probably limited by the HBAs and the system bus rather than by the SAN itself.
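To put those numbers side by side, here is a small Python sketch that expresses observed load as a percentage of available capacity. The 20 MB/s disk peak and the "couple of MB/s" of network traffic come from the description above; the 300 MB/s SAN figure is a hypothetical stand-in for "several hundred MB/s", and 125 MB/s is simply 1 Gb/s divided by 8, ignoring protocol overhead.

```python
def utilization_pct(observed: float, capacity: float) -> float:
    """Return observed load as a percentage of capacity (same units for both)."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return 100.0 * observed / capacity


# Disk: 20 MB/s peak at the top of the hour vs. a hypothetical 300 MB/s
# benchmark result (the post only says "several hundred MB/s").
disk_pct = utilization_pct(20.0, 300.0)

# Network: about 2 MB/s on a 1 Gb/s link; 1000 Mb/s / 8 = 125 MB/s raw.
net_pct = utilization_pct(2.0, 1000.0 / 8)

print(f"disk: {disk_pct:.1f}% of benchmarked throughput")
print(f"network: {net_pct:.1f}% of link speed")
```

Even with generous assumptions, both links sit in the single-digit percentages, which is the point: there is no throughput bottleneck to explain.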

So, what’s going on? Well, the I/O wait reported by the system is badly named. It isn’t time that the system spends waiting at all. As a rough over-simplification, I/O wait is time during which the CPU had nothing to do (it was idle) and there was at least one I/O outstanding. You can generate high I/O wait on a system that has almost nothing going on but a trickle of I/O activity. Conversely, if you want to reduce I/O wait on that same system, pile on more CPU workload: the I/O load stays the same, but the CPU spends less time idle, so less of that time gets counted as I/O wait. So, really, you can almost ignore I/O wait time.
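To make that accounting concrete, here is a toy Python model of the over-simplified rule above. It is not how the AIX kernel actually implements wait accounting; it just classifies each clock tick as busy, I/O wait (idle with an I/O outstanding), or idle, then runs the same trickle of I/O under a light and a heavier CPU load. The tick patterns are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TickCounts:
    busy: int = 0  # ticks spent running user or system work
    wait: int = 0  # ticks idle while an I/O was outstanding ("wa")
    idle: int = 0  # ticks idle with no I/O outstanding ("id")

    def percentages(self) -> dict[str, float]:
        total = self.busy + self.wait + self.idle
        return {
            "busy": 100.0 * self.busy / total,
            "wa": 100.0 * self.wait / total,
            "id": 100.0 * self.idle / total,
        }


def account(cpu_busy: list[bool], io_outstanding: list[bool]) -> TickCounts:
    """Classify each tick: busy wins; otherwise idle-with-I/O counts as wait."""
    counts = TickCounts()
    for busy, io in zip(cpu_busy, io_outstanding):
        if busy:
            counts.busy += 1
        elif io:
            counts.wait += 1
        else:
            counts.idle += 1
    return counts


TICKS = 100
# The same trickle of I/O in both runs: outstanding for 3 of every 10 ticks.
io = [t % 10 < 3 for t in range(TICKS)]

# Quiet system: the CPU is busy 1 tick in 10.
quiet = account([t % 10 == 9 for t in range(TICKS)], io)
# Loaded system: extra CPU work keeps it busy 7 ticks in 10.
loaded = account([2 <= t % 10 < 9 for t in range(TICKS)], io)

for name, counts in (("quiet", quiet), ("loaded", loaded)):
    print(name, counts.percentages())
```

With identical I/O, the quiet run reports 30% wa and the loaded run 20% wa. The I/O didn’t get any faster; the CPU simply had less idle time for it to be counted against.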

So, how do I know whether my system is bogged down by I/O? I like vmstat and NMON. Look at the disk adapter I/O in NMON. Given what you know about your underlying disk architecture, is that a lot of I/O, both in MB/s and in IOPS? That tells you how busy your HBAs are. To see whether threads are actually being held up waiting on I/O, look at vmstat. I run it with 2 or 3 second intervals; check the man page for more info.
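"Is it a lot of I/O?" comes down to two rates: throughput and IOPS. NMON and iostat report these directly; the sketch below only shows the arithmetic behind them, using two snapshots of cumulative counters. The `AdapterSample` type and its counter values are hypothetical and do not reflect the output format of any AIX tool. MB here means 10^6 bytes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AdapterSample:
    """Hypothetical cumulative counters for one disk adapter."""
    timestamp_s: float    # seconds since some fixed origin
    bytes_total: int      # bytes read + written since boot
    transfers_total: int  # I/O operations completed since boot


def io_rates(earlier: AdapterSample, later: AdapterSample) -> tuple[float, float]:
    """Return (MB/s, IOPS) between two samples of the same adapter."""
    elapsed = later.timestamp_s - earlier.timestamp_s
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    mb_per_s = (later.bytes_total - earlier.bytes_total) / elapsed / 1_000_000
    iops = (later.transfers_total - earlier.transfers_total) / elapsed
    return mb_per_s, iops


# Hypothetical 2-second interval in which 40 MB and 1,000 transfers completed.
a = AdapterSample(timestamp_s=0.0, bytes_total=5_000_000_000, transfers_total=900_000)
b = AdapterSample(timestamp_s=2.0, bytes_total=5_040_000_000, transfers_total=901_000)
mb_s, iops = io_rates(a, b)
print(f"{mb_s:.1f} MB/s, {iops:.0f} IOPS")  # 20.0 MB/s, 500 IOPS
```

Whether 20 MB/s and 500 IOPS is "a lot" depends entirely on what your HBAs and SAN can sustain, which is why the benchmark numbers matter.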

 # vmstat -Iwt 2

System configuration: lcpu=16 mem=95488MB

   kthr            memory                         page                       faults           cpu       time
----------- --------------------- ------------------------------------ ------------------ ----------- --------
  r   b   p        avm        fre    fi    fo    pi    po    fr     sr    in     sy    cs us sy id wa hr mi se
  3   0   2   16182808     152244     0    70     0     0     0      0  4324  45931 20125 26  4 60 10 12:18:43
  3   0   1   16182831     152220     0    16     0     0     0      0  3509  37090 17622 22  3 67  8 12:18:45
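If you want to watch these columns from a script rather than by eye, a minimal Python parser for the data rows is sketched below. It assumes the exact column layout of the `vmstat -Iwt` sample shown here (19 whitespace-separated fields ending in an HH:MM:SS timestamp); other flag combinations or AIX levels may print different columns, so check the header on your own system first.

```python
from __future__ import annotations

# Column order for `vmstat -Iwt` as in the sample above. "sy" appears twice
# (system calls under "faults", system CPU % under "cpu"), so the first one
# is renamed here to keep the keys unique.
COLUMNS = ["r", "b", "p", "avm", "fre", "fi", "fo", "pi", "po", "fr", "sr",
           "in", "sy_calls", "cs", "us", "sy", "id", "wa", "time"]


def parse_vmstat_row(line: str) -> dict[str, int | str]:
    """Split one data row into named fields; every field but time is an int."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row: dict[str, int | str] = {}
    for name, value in zip(COLUMNS, fields):
        row[name] = value if name == "time" else int(value)
    return row


SAMPLE = """\
  3   0   2   16182808     152244     0    70     0     0     0      0  4324  45931 20125 26  4 60 10 12:18:43
  3   0   1   16182831     152220     0    16     0     0     0      0  3509  37090 17622 22  3 67  8 12:18:45
"""

rows = [parse_vmstat_row(line) for line in SAMPLE.splitlines() if line.strip()]
for row in rows:
    print(row["time"], "r =", row["r"], "b =", row["b"], "p =", row["p"], "wa =", row["wa"])
```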

As you can see, I have a 16 lcpu machine (POWER6 with SMT on, so it's really 8 CPUs) with a run queue of 3. The "r" column is the run queue: the number of runnable threads, that is, how many threads are eligible for a CPU timeslice. On that machine, if the run queue is 8 or less, you're golden. If it's 16, eh, it's okay. Above that, threads are waiting for a CPU to finish what it's doing. The "b" column is threads blocked from running because of I/O. If you're doing file system (FS) I/O, that's the column to watch, and you want it as low as possible. The system in this sample is a DB server doing raw I/O; the "p" column is just like the "b" column, but for raw I/O. For an enterprise DB server, I don't get too excited over a few in the "p" column.
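The run-queue rule of thumb above can be written down as a tiny Python function. The thresholds are the author's heuristic (physical CPUs, then logical CPUs), not limits defined by AIX, and `runq_verdict` is a hypothetical helper name.

```python
def runq_verdict(runq: int, physical_cpus: int, logical_cpus: int) -> str:
    """Rule of thumb: up to the physical CPU count is fine, up to the
    logical (SMT) CPU count is okay, beyond that threads queue for a CPU."""
    if runq <= physical_cpus:
        return "golden"
    if runq <= logical_cpus:
        return "okay"
    return "threads waiting for CPU"


# The sample machine: POWER6 with SMT on, 8 physical CPUs, 16 logical CPUs.
for r in (3, 12, 20):
    print(r, runq_verdict(r, physical_cpus=8, logical_cpus=16))
```

For the run queue of 3 in the sample, this reports "golden".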

How do you reduce the "b" or "p" column? For FS I/O, there are some VMM tuning options to try, but which ones help depends on your I/O characteristics. For both FS and raw I/O: spread the I/O across more drive heads (maybe you have a couple of hot LUNs, or can't push the IOPS you need), add more disk cache (or DB SGA), and possibly install more HBAs if you're really moving a lot of data.
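One quick way to check for the "couple of hot LUNs" case is to compare each LUN's IOPS with the average across all of them. The Python sketch below does that; the hdisk names, the IOPS figures, and the 2x threshold are all hypothetical, and in practice you would take per-disk numbers from NMON or iostat.

```python
from __future__ import annotations

from statistics import mean


def hot_luns(iops_by_lun: dict[str, float], factor: float = 2.0) -> list[str]:
    """Return LUNs whose IOPS exceed `factor` times the mean, busiest first."""
    if not iops_by_lun:
        return []
    avg = mean(iops_by_lun.values())
    hot = [lun for lun, iops in iops_by_lun.items() if iops > factor * avg]
    return sorted(hot, key=iops_by_lun.__getitem__, reverse=True)


# Hypothetical per-LUN IOPS for six hdisks.
sample = {"hdisk2": 150, "hdisk3": 140, "hdisk4": 1800,
          "hdisk5": 160, "hdisk6": 1300, "hdisk7": 150}
print(hot_luns(sample))
```

Here the mean is about 617 IOPS, so hdisk4 and hdisk6 stand out as candidates for spreading their load across more spindles.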
