Static DHCP IPs with KVM Virtualization

When building a virtualization lab system, I’ve found that I want static IPs assigned to my guests. You could just assign static IPs in the guest OS, but then you have to document which IPs are in use by which hosts. It would be easier to just assign static IP entries in the DHCP server, but there doesn’t seem to be a straightforward way to get this done.

What I’ve found works is to destroy the network, edit its XML definition directly, and then restart it.

[root@m77413 libvirt]# virsh -c qemu:///system net-destroy default
Network default destroyed

[root@m77413 libvirt]# virsh -c qemu:///system net-edit default
Network default XML configuration edited.

[root@m77413 libvirt]# virsh -c qemu:///system net-start default
Network default started
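
If you want to double-check that the edit stuck, net-dumpxml prints the active network definition:

[root@m77413 libvirt]# virsh -c qemu:///system net-dumpxml default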

The XML entries should look something like this:

  <ip address='192.168.122.1' netmask='255.255.255.0'>
    <dhcp>
      <range start='192.168.122.2' end='192.168.122.254' />
      <host mac='52:54:00:10:6e:17' name='cent-install.test' ip='192.168.122.2' />
      <host mac='52:54:00:ab:10:2a' name='cent-netserver.test' ip='192.168.122.3' />
      <host mac='52:54:00:df:47:95' name='install.test' ip='192.168.122.10' />
    </dhcp>
  </ip>
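
The MAC addresses come from the guest definitions themselves. A quick way to pull one out is to grep the domain XML (the domain name below is just an example; use whatever virsh list shows for your guest):

[root@m77413 libvirt]# virsh -c qemu:///system dumpxml cent-install | grep 'mac address'
      <mac address='52:54:00:10:6e:17'/>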

VMWare Datastore Sizing and Locking

I had a recent discussion with a teammate about VMWare datastores. We are using thin provisioning on an ESXi 4.1 installation backed by IBM XIV storage.

In our previous installation we ran ESX 3.X backed by DS4000 disk. What we found out is that VMs grow like weeds, and our datastores quickly filled up. The admin at the time just resized the datastores and we went on our way. A technical VMWare rep afterward mentioned that while it is supported, adding extents to VMFS datastores isn’t necessarily best practice.

When we laid down our new datastores, I wanted to avoid adding extents, so I made the LUNs 1 TB. That’s as big as I dared go while still avoiding extents, but it’s probably too big for our little installation.

I noticed that our datastores were getting to about 90% utilized, so I added a new LUN and datastore. When I mentioned in our team meeting that I had added a datastore, we had a somewhat heated discussion. My teammate really wanted to resize the LUN and add extents to the existing datastore. I pointed out that I didn’t think that was best practice, and that 3 or 4 datastores isn’t really a lot to manage.

So, why not just use one datastore per storage array? The big argument seems to be that people add a second LUN, then extend the datastore onto the new LUN. The downside of this is that if one LUN goes offline, all the associated data is unavailable. VMWare will try to keep all the data for each VM on one extent, but it’s not always successful. So, if one LUN goes offline, the best case is that only some of your VMs are affected. Less ideally, VMs lose part of their data: more VMs are impacted, or they keep running in a state where some of their storage isn’t available. Or, if the failed LUN is the first LUN (the master extent), the whole datastore goes offline. At least the architecture allows a datastore to survive losing an extent under ideal circumstances.

What’s less apparent is the performance hit of multiple VMs and ESX hosts accessing one large LUN. With a lot of VMs generating I/Os you can exceed the disk queue for the datastore, which defaults to 32 outstanding operations per LUN. Adding more LUNs to the datastore DOES increase the number of queue slots for the whole datastore, and that would be a good thing, assuming the data were equally distributed across all the LUNs, which is not going to be the case.
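
To put rough numbers on it (these are made-up figures, just to illustrate the math):

  Queue depth per LUN (default):    32 outstanding commands
  Busy VMs on the datastore:        16
  Outstanding I/Os per VM at peak:   4
  Demand:                           16 x 4 = 64 commands
  One LUN:   32 in flight, 32 waiting in the queue
  Two LUNs:  64 slots total, but only if the VMDKs actually land evenly on both extents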

And, similar to inode locking in a filesystem, shared storage has to contend with volume locking. Multiple hosts can read from the same LUN with no problem, but when a write occurs the volume is locked by one host until the write is committed. Any other host trying to write gets a signal that there is a lock and has to wait for it to be released. On modern disk arrays with write caching this should be very fast, but it’s not ideal.

So, to avoid lock contention you could try to keep all the VMs that share a datastore running on a single ESX host, but that’s not really practical long-term as VMs get migrated between hosts. Or, you can minimize the number of VMs using each datastore. In addition to keeping the VMs-per-datastore count low, a strategy to consider is mixing heavy-I/O VMs with VMs that have low I/O requirements, which helps manage the queue depth for each LUN.

How many VMs is too many per datastore? It depends on your environment; I’ve seen recommendations ranging from 12 to 30. If you have a lot of static web servers that don’t do any real writes, you can get away with a lot of them. If you have Oracle or MS SQL servers that do a lot of I/O, including writes, keep the numbers low. You can log into the ESX host, run esxtop, and hit “u” for the disk device view. There are lots of interesting fields in there: CMDS/s, READS/s, WRITES/s, and so on. Check the QUED field to see the current number of queue slots in use.
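
Here’s roughly what that looks like; the key binding and field names are from the vSphere 4-era esxtop, and the batch-mode flags are from memory, so double-check with the built-in help if your version differs:

esxtop                                  (interactive; press 'u' for the disk device/LUN view)
                                        (watch CMDS/s, READS/s, WRITES/s, DQLEN, ACTV, and QUED)
esxtop -b -d 10 -n 6 > /tmp/esxtop.csv  (batch mode: 6 samples, 10 seconds apart, for offline review)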

A good rundown on this is in Mastering VMware vSphere 4. The book’s recommendations: use single-extent VMFS datastores (one LUN per datastore), don’t add extents just because you don’t want to manage another datastore, but go ahead and span a VMFS datastore across LUNs if you need really high I/O or really big datastores.

I have another take on it: always use one LUN per datastore. The argument that datastores backed by multiple LUNs give better performance is a little flawed, because VMWare tries to allocate all the storage associated with one VM on one extent. If a VM needs high I/O, give it a disk from each of several datastores and separate the data logically inside the VM. You get to leverage more disk queue slots by bringing more LUNs into each VM, each datastore is a single LUN that is easy to manage and maintain, and LUN locking is less of an issue with smaller datastores. And, while you do end up with more datastores, it’s not that big of a deal to manage.

The downside, and there usually is one, is that you’re back to relying on more pieces that could go wrong. If you spread a VM’s data across multiple datastores and one of those datastores goes offline, that VM is impacted. It’s really about the same exposure you have with multiple LUNs per datastore: if the LUN your data is on goes down, your data is unavailable. So plan your DR and availability schemes accordingly.

TSM Full-VM Image Backup With VStore API

At the request of one of our VMWare admins, I’ve started testing VMWare image backups. In the past, we’ve installed the regular backup/archive client on each VMWare guest OS. This has let us standardize our install regardless of whether a server is virtual or physical, but it doesn’t allow you to take a full snapshot of a VM; instead you have to rely on your bare-metal recovery procedures just as if it were a physical server.

Unfortunately, an application admin recently messed up a server almost beyond repair. If the VMWare admins had been made aware of the work, they could have taken a snapshot just before it started, and recovery would have been quick and simple.

Starting with version 6.2.2 of the Windows client, TSM supports full image backups using the VStorage API. I’ve heard people complain that there isn’t a lot of documentation on this, so I thought I would write up my testing.

VMWare image backups have been in TSM for a while, but previous versions relied on VMWare Consolidated Backup (VCB), which VMWare is withdrawing support for. The VCB functionality is still in the client, so you can upgrade to the latest version even if you are already using image backups with VCB.

Like some other similar products, you will need a Windows server to do the backups. Unlike the file-level backups done through the VStorage API, this is not a proxy backup; the files are stored in TSM under the node that runs the backup. I used the VCenter server because it was handy, though I don’t see why you couldn’t use another Windows box. My VCenter server isn’t super powerful, but it has plenty of power to spare beyond running VCenter and should be adequate for backups of our VMWare cluster. Here are the specs:
Intel Xeon E5530 2.4 GHz
4 GB RAM
Windows Server 2008 R2 Standard
2 x 140GB SCSI drives (mirrored)
2 x 1Gb NICs (no EtherChannel)

Setup could not be much simpler. In the client options, there is a “VM Backup” tab with several options. First, you select whether you want a VMWare File Level backup (a proxy backup for Windows guests), a VMWare Full VM backup (using either VCB or VStorage), or a Hyper-V Full VM backup (for Microsoft Hyper-V users).

Next, you can specify a “Domain Full VM” or a “Domain File Level VM” backup. Domain Full VM backups are image backups of the whole VM; Domain File Level backups seem to be proxy-node backups of Windows boxes. I don’t know why there are two places to specify this in the GUI, as they seem to be mutually exclusive. In the same section, you can list VMs to back up, use pre-defined groups like “ALL-VM” or “ALL-WINDOWS”, or specify folders of VMs to back up if you’ve broken your VMs down into folders.

Next, you specify your VMWare VCenter or ESX server login information. You’ll probably want to use a role account for this; in testing I just used my regular VCenter login.

And, finally, you can tell the client to use a specific management class for your image backups. I made a new management class called VM_IMAGES, and expiration and retention seem to work as expected. You must also specify whether you’re using the VStorage API or VCB.
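
For reference, the GUI is just setting client options; the equivalent dsm.opt entries look roughly like this. The option names are from my reading of the 6.2.x client documentation, and the server name, user, and VM list are placeholders, so verify them against the client help before trusting them:

* Full VM image backups over the VStorage API
VMBACKUPTYPE    FULLVM
VMFULLTYPE      VSTOR
VMMC            VM_IMAGES
VMCHOST         vcenter.example.com
VMCUSER         tsm-backup-role
VMCPW           ********
DOMAIN.VMFULL   "DCFM"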

Here’s a screenshot:

I turned off client deduplication and compression for my testing; I’ll test their effects later. I found that the test VM images actually grew in size when compressed, and enabling compression and deduplication cut the transfer speed in half. There is a note in the documentation that says compression can only be used with the VStore Full VM backup if it’s being saved to a deduplication-enabled storage pool.
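
(For reference, those two features are controlled by the client options below; I had both set to NO for this round of testing.)

* client-side data reduction, both off for these tests
COMPRESSION      NO
DEDUPLICATION    NO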

There are a couple of other things to note. The VCB backup feature uses a staging area on a datastore to copy the files to before backing them up; the new VStorage API backup doesn’t. This may make it somewhat faster, and in my testing the disk usage did not increase during backups. And, even cooler, the VStorage API uses the changed block tracking feature of ESX 4 and later to back up only the occupied space. So, if you have a VM with a 100GB volume and only 4GB of data, it’s not going to back up the whole 100GB.
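
Changed block tracking is a per-VM setting; once it has been enabled, it shows up in the VM’s .vmx file as advanced configuration parameters along these lines (the VM needs to be at virtual hardware version 7 or later, and the scsi0:0 entry is per virtual disk):

ctkEnabled = "TRUE"
scsi0:0.ctkEnabled = "TRUE"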

Let’s do a backup! In the GUI, you can select Actions -> Backup VM, which opens a GUI selection box. It’s a lot like the file backup GUI; just select the VM:

When you hit the backup button, the same status window used for file backups opens and everything progresses as usual. You can also use the command line:

tsm> backup vm DCFM -vmfulltype=vstor -vmmc=VM_IMAGES
Full BACKUP VM of virtual machines 'DCFM'.

Backup VM command started.  Total number of virtual machines to process: 1

Backup of Virtual Machine 'DCFM' started

Starting Full VM backup of Virtual Machine 'DCFM'

Backing up Full VM configuration information for 'DCFM'
          4,086 VM Configuration [Sent]
Processing snapshot disk [OpsXIV2 Datastore 1] DCFM/DCFM.vmdk (Hard Disk 1), Capacity: 17,179,869,184, Bytes to Send: 15,998,124,032  (nbdssl)
Volume --> 15,998,124,032 Hard Disk 1
Backup processing of 'DCFM' finished without failure.

Total number of objects inspected:        1
Total number of objects backed up:        1
...
Total number of bytes inspected:     14.89 GB
Total number of bytes transferred:   14.89 GB
Data transfer time:                1,075.59 sec
Network data transfer rate:        14,525.08 KB/sec
Aggregate data transfer rate:      14,487.67 KB/sec
Objects compressed by:                    0%
Total data reduction ratio:            0.00%
Subfile objects reduced by:               0%
Elapsed processing time:           00:17:58

Successful Full VM backup of Virtual Machine 'DCFM'

Unmount virtual machine disk on backup proxy for VM 'DCFM (nbdssl)'
Deleted directory C:\Users\pvaughan\AppData\Local\Temp\2\vmware-pvaughan\421a7a97-64a4-79d1-b854-c9b30ea6dca7-vm-65\san
Deleted directory C:\Users\pvaughan\AppData\Local\Temp\2\vmware-pvaughan\421a7a97-64a4-79d1-b854-c9b30ea6dca7-vm-65\nbdssl

Backup VM command complete
Total number of virtual machines backed up successfully: 1
  virtual machine DCFM backed up to nodename MMC-OPSVC1
Total number of virtual machines failed: 0
Total number of virtual machines processed: 1

When the backup starts, the TSM client creates a new snapshot of the VM, and when the backup finishes (or is interrupted) the snapshot is dropped. The disk images from that snapshot are what you end up with. The upshot of this is that the disks are probably in pretty good shape (fsck ran clean for me) and you can get a good backup of a running VM. Here’s a screenshot taken during the backup:

Backups are cool and all, but what about restores? The restore process is pretty simple too. Just select Actions -> Restore VM from the GUI. A familiar window pops open, and you select the VM backup to restore:

When you select Restore, another window opens. This allows you to restore over the original VM image (boring) or specify a new VM. This MAY be useful for cloning a VM between VM clusters if you can’t use the VM Converter tool. If you want to make a new VM, just specify the new name, the Datacenter, ESX host, and Datastore:

The client will reach out, create a new VM exactly like the original, and then restore the files into it. I successfully cloned my test VM while it was running, on the first try, and the restored VM booted without incident.
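
For the command-line inclined, the restore looks roughly like this. The -vmname option for restoring to a new VM name is the one I’m aware of in the 6.2.x client (DCFM-CLONE is just an example name), and the datacenter, host, and datastore placement options vary by client level, so treat the second line as a sketch and check “dsmc help restore vm” for the exact syntax:

tsm> restore vm DCFM                          (restore over the original VM)
tsm> restore vm DCFM -vmname=DCFM-CLONE       (restore to a new VM; placement options per your client level)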