Securing OpenSSH

I was recently researching the latest guidance on securing OpenSSH and came across a page on a popular site claiming that the easiest way to protect OpenSSH is to define a login banner. While a login banner is useful, especially in an enterprise setting, it does nothing by itself to secure SSH. So, here is my recipe for securing OpenSSH. While testing these changes, ALWAYS keep a connection open. It’s very easy to break something, and if you don’t already have an open session you will have successfully locked yourself out.

  • Change the SSH port. I’m using 8022 in this example; you can use any port you like. This may not be practical in every setting, and it’s of marginal value because SSH will identify itself when you connect. But unless someone does an exhaustive scan of the open ports on your server, they probably won’t find your SSH port. Assuming you have SELinux enabled, which I recommend, you must first allow SSH to use the target port number. First, review the current SELinux context for the port:
    # semanage port -l | grep 8022
    oa_system_port_t tcp 8022
    

    You can see, this port is already in use by oa-system. We aren’t using oa-system and nothing else is using port 8022, so we can modify the context to allow ssh to use it:

    # semanage port -m -t ssh_port_t -p tcp 8022
    

    Now when we look, we see that both contexts are applied:

    # semanage port -l | grep 8022
    oa_system_port_t tcp 8022
    ssh_port_t tcp 8022, 22
    

    Now, you just need to tell SSH to use the above port by changing the Port line in /etc/ssh/sshd_config:

    Port 8022
    
  • Disable IPv6 if you’re not using it. Some people think this is pointless precisely because they aren’t using IPv6 yet. My take is that if you’re not using IPv6, you’re probably not watching its security either. For instance, if you set up IPv4 firewall rules but ignore IPv6 and leave IPv6 addresses enabled on your server, an attacker can probably still connect to your server over IPv6. If you’re not using it, just turn it off and lower the attack surface of your servers. In /etc/ssh/sshd_config change the AddressFamily line:
    AddressFamily inet
  • Set the address SSH listens on. By default OpenSSH listens on every IP address. You may not want this if you have multiple IP addresses defined on your server; again, it reduces the attack surface. Update the ListenAddress line with your IP address; you can specify multiple ListenAddress lines:
    ListenAddress 192.168.1.1
  • Set the SSH protocol to version 2 only. On any modern version of OpenSSH, this is the default, but I specify this anyway. Version 1 has been deprecated for years. Uncomment the Protocol line in /etc/ssh/sshd_config:
    Protocol 2
    
  • Disable weak keys. You should disable any protocol version 1 or DSA host keys. This leaves RSA, ECDSA, and ED25519 keys enabled. To do this, review the HostKey lines in /etc/ssh/sshd_config and comment out ssh_host_key and ssh_host_dsa_key:
    #HostKey /etc/ssh/ssh_host_key
    #HostKey /etc/ssh/ssh_host_dsa_key

    While you’re in there, review the remaining HostKey directives. You can verify the number of bits in each key and the key type by running this command against them:

    ssh-keygen -lf FILENAME
  • Disable weak ciphers. We’ll want to remove older weak ciphers. You’ll want to test this before rolling it out widely. I’ve found some very old SSH and SCP/SFTP clients don’t support some of the newer ciphers. Update or add these lines to /etc/ssh/sshd_config:
    Ciphers aes256-ctr,aes192-ctr,aes128-ctr,arcfour256
    MACs hmac-sha2-256,hmac-sha2-256-etm@openssh.com,hmac-md5-etm@openssh.com,hmac-sha1-etm@openssh.com

    Note: these are valid for RHEL/CentOS 7; refer to the sshd_config man page for the list of ciphers valid for your specific version.

  • Validate logging. Check the SyslogFacility and LogLevel lines in /etc/ssh/sshd_config and verify that your syslog configuration actually captures that facility at that level (the example excerpt after this list shows these settings spelled out).
  • Disable root logins. Don’t allow users to login as root directly. Users should ideally login with their own IDs and then run whatever they need to run as root with sudo. To disable root logins uncomment or update the PermitRootLogin line:
    PermitRootLogin no
  • Set the maximum login attempts. By default users get 6 attempts to get their password right; your organization may require something lower. Once a user has failed half of the allowed attempts, the additional failures are logged. To change this, update the MaxAuthTries setting.
  • Set strict file permission checking; this is the default. Leaving StrictModes set to “yes” in /etc/ssh/sshd_config tells OpenSSH to reject logins when a user’s SSH files or home directory have overly permissive modes. This ensures that other users can’t modify or read a user’s SSH key files.
  • Enable public key authentication; this is the default. Leaving PubkeyAuthentication set to yes allows us to authenticate with public/private key pairs. It’s recommended to use key authentication with a passphrase on the key. This is analogous to two-factor authentication: you must have the private key and know the passphrase.
  • Disable host-based authentication. Host-based authentication works like the old rhosts or hosts.equiv method. It means that if the login would be permitted by $HOME/.rhosts, $HOME/.shosts, /etc/hosts.equiv, or /etc/shosts.equiv, and the server can verify the client’s host key (see /etc/ssh/ssh_known_hosts), then the login is permitted without a password. This is only slightly more secure than rsh, because it prevents IP spoofing and DNS spoofing, but it is still terribly insecure and is specifically disallowed by most organizations. To turn this feature off, set these options in /etc/ssh/sshd_config:
    RhostsRSAAuthentication no
    HostbasedAuthentication no
    IgnoreRhosts yes

    If you really need passwordless authentication, most security policies will allow you to create a role or service account and set up user private key authentication instead.

  • Disallow empty passwords. If a user ID has a blank password, we don’t want that user ID to be able to log in. Set PermitEmptyPasswords in /etc/ssh/sshd_config:
    PermitEmptyPasswords no
  • Disallow password authentication. This may not be possible in all environments. If possible, we disallow password logins entirely, forcing users to authenticate with keys. To do this, disable these options in /etc/ssh/sshd_config:
    PasswordAuthentication no
    ChallengeResponseAuthentication no
    
  • Set a pre-login banner. Yes, I set this option. It’s informational to the person connecting and required by a lot of corporate standards. The pre-login banner usually contains a warning that you’re connecting to a private system, that access is logged, and a warning not to log in if any of that is a problem. Update /etc/issue.net with the text you want to display; by default on a lot of systems it shows the kernel version, giving an attacker a little more information. Once you’ve updated your /etc/issue.net file, update the Banner line in /etc/ssh/sshd_config:
    Banner /etc/issue.net

    You may also want to update /etc/motd. This file is displayed AFTER the user logs in. I usually put in a short banner that includes the hostname and whether the system is test or production. Something as simple as seeing the hostname right after login often keeps people from doing something they shouldn’t, and I usually include the hostname in the shell prompt as well. I can’t tell you the number of times someone has said “oh, I thought this was the TEST system” right after doing something stupid to their production server. This heads off a lot of those issues.
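
For reference, here is a sketch of how the relevant lines of /etc/ssh/sshd_config might look with everything above applied, including the settings that didn’t get their own snippet (SyslogFacility, LogLevel, MaxAuthTries, StrictModes, and PubkeyAuthentication). The port, listen address, and MaxAuthTries value are examples to adjust for your environment; SyslogFacility AUTHPRIV and LogLevel INFO are just the RHEL defaults written out explicitly:

Port 8022
AddressFamily inet
ListenAddress 192.168.1.1
Protocol 2
SyslogFacility AUTHPRIV
LogLevel INFO
PermitRootLogin no
MaxAuthTries 3
StrictModes yes
PubkeyAuthentication yes
HostbasedAuthentication no
IgnoreRhosts yes
PermitEmptyPasswords no
PasswordAuthentication no
ChallengeResponseAuthentication no
Banner /etc/issue.net

If you’re moving users to key authentication at the same time, they can generate a key pair with a passphrase on their client with something like this (ed25519 needs OpenSSH 6.5 or newer; use -t rsa -b 4096 for older clients):

ssh-keygen -t ed25519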

Once you have everything the way you want it, restart sshd and test:

service sshd restart
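
Before you restart, you can have sshd check the new configuration for syntax errors and dump the effective settings to confirm what it will actually use (including the Ciphers and MACs lines). Then, after the restart, test the new port from a second terminal while keeping your existing session open; the user and hostname below are placeholders:

sshd -t
sshd -T | grep -i -e port -e ciphers -e macs
ssh -p 8022 youruser@yourserver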

VMWare Datastore Sizing and Locking

I had a recent discussion with a teammate about VMWare datastores. We are using thin provisioning on an ESXi 4.1 installation backed by IBM XIV storage.

In our previous installation we ran ESX 3.X backed by DS4000 disk. What we found out is that VMs grow like weeds and our datastores quickly filled up. This admin just resized the datastores and we went on our way. A technical VMWare rep afterward mentioned that while it is supported, adding extents to VMFS datastores isn’t necessarily best practice.

When we laid down our new datastores, I wanted to avoid adding extents, so I made the LUNs 1 TB. That’s as big as I dared go while avoiding extents in datastores, but it is probably too big for our little installation.

I noticed that our datastores were getting to about 90% utilized, so I added a new LUN and datastore. When I mentioned in our team meeting that I had added a datastore we had a somewhat heated discussion. My teammate really wanted to resize the LUN and add extents to the datastore. I pointed out that I didn’t think that was the best practice and 3 or 4 datastores isn’t really a lot to manage.

So, why not just use one datastore per storage array? The usual argument is that people add a second LUN and then extend the existing datastore onto the new LUN. The downside of this is that if one LUN goes offline, all the data associated with it is unavailable. VMWare will try to keep all the data for each VM on one extent, but it’s not always successful. So, if one LUN goes offline, the best case is that only some of your VMs are affected. Less ideally, they lose part of their data, and more VMs are impacted or are running in a state where some of their storage isn’t available. Or, if the failed LUN is the first LUN (the Master Extent), the whole datastore goes offline. At least the architecture allows a datastore to survive losing an extent under ideal circumstances.

What’s less apparent is the performance hit of multiple VMs and ESX hosts accessing one large LUN. With a lot of VMs generating I/O you can exceed the disk queue for the LUN backing the datastore, which defaults to 32 outstanding operations per LUN. Adding more LUNs to the datastore DOES increase the number of queue slots for the whole datastore, and that would be a good thing, assuming the data were equally distributed across all the LUNs, which is not going to be the case.
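
If you want to see where that per-LUN limit sits on your hosts, ESX 4.x exposes it as an advanced setting. This is only a sketch: the second command raises the limit to 64 purely as an example, and the HBA driver’s own LUN queue depth (set separately with esxcfg-module) has to agree with it, so check VMware’s guidance before changing it on a production host:

esxcfg-advcfg -g /Disk/SchedNumReqOutstanding
esxcfg-advcfg -s 64 /Disk/SchedNumReqOutstanding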

And, similar to inode locking in a filesystem, shared storage has to contend with volume locking. Multiple hosts can read and write already-allocated blocks on the same LUN with no problem. But when the VMFS metadata has to change, for example when a VM powers on, a snapshot is taken, or a thin-provisioned disk grows, the host making the change takes a SCSI reservation on the LUN until the update is committed. Any other host trying to update the volume at that moment sees the lock and has to wait for it to be released. On modern disk arrays, with write caching, this should be very fast; but it’s not ideal.

So, to limit lock contention you can try to keep all the VMs that use a given datastore on one host, so only that host is taking locks on it. But that’s not really practical long-term as VMs get migrated between hosts. Or, you can minimize the number of VMs that are using each datastore. In addition to keeping the number of VMs per datastore low, a strategy to consider is to mix heavy-I/O VMs with VMs that have low I/O requirements, which will help manage the queue depth for each LUN.

How many VMs is too many per datastore? It depends on your environment. I’ve seen recommendations ranging from 12 to 30. If you have a lot of static web servers that don’t do any real writes, you can get away with more. If you have Oracle or MS SQL servers that do a lot of I/O, including writes, keep the numbers low. You can log into the ESX host, run esxtop, and press “u” for the disk device view. There are lots of interesting fields in here: CMDS/s, READS/s, WRITES/s, and so on. Check the QUED field to see the current number of queue slots in use.
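
If you would rather collect those numbers over time than watch the interactive screen, esxtop also has a batch mode that writes every counter, including the device queue statistics, to a CSV you can graph later. The delay and iteration counts here are just examples, roughly ten minutes of samples at five-second intervals:

esxtop -b -d 5 -n 120 > /tmp/esxtop-capture.csv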

A good rundown on this is Mastering VMware vSphere 4. Recommendations from the book: use single-extent VMFS datastores, one per LUN; don’t add extents just because you don’t want to manage another datastore; but go ahead and span a VMFS datastore if you need really high I/O or really big datastores.

I have another take on it: always use one LUN per datastore. The argument that datastores backed by multiple LUNs give better performance is a little flawed, because VMWare tries to allocate all the storage associated with one VM on one extent. If a VM needs high I/O, give it a virtual disk from each of several datastores, then separate the data logically on the VM. You get to leverage more disk queue slots by bringing more LUNs into the VM, each datastore is a single LUN which is easy to manage and maintain, and LUN locking is less of an issue with smaller datastores. And, while you do end up with more datastores, it’s not that big of a deal to manage.

The down-side, and there usually is one, is that you’re back to relying on more pieces that could go wrong. If you spread the data across multiple datastores, and a datastore goes offline, that VM is impacted. It’s really about the same exposure you have with using multiple LUNs per datastore. If the LUN your data is on goes down, your data is unavailable. So plan your DR and availability schemes accordingly.