You might have heard of the massive EC2 virtual machine reboot on the Amazon Web Services cloud in September. Right away there was speculation that it was due to a security bug in Xen, the hypervisor platform used by EC2. On October 1st, the embargo was lifted and the suspicions were confirmed. The bug, XSA-108, can allow a guest VM to read data belonging to other guests on the same hypervisor. Other cloud service providers using Xen were affected as well, including Rackspace and SoftLayer.
Jeff Barr from AWS mentions in an update to the announcement: “we completed a reboot of less than 10% of the EC2 fleet…”, which implies that fewer than 10% of the hosts (or possibly guests) were vulnerable. Having analyzed this snippet and other details provided in the announcements, I reached some intriguing conclusions.
Reading the XSA-108 advisory, one learns that it affects all Xen versions from 4.1 onward. It also mentions that “Running only PV guests will avoid this vulnerability”. PV is short for paravirtualized: a PV guest is a VM with a modified kernel, which lets it run on a host without special hardware support while providing better I/O performance. The alternative to PV is HVM, a hardware-assisted VM. Xen can use both modes of operation. PV, however, is only available for certain Linux kernels, while Windows, for example, can only run in HVM mode.
If more than 90% of the hosts are not vulnerable, then those hosts must either be running a Xen version older than 4.1 or be guaranteed to run only PV guests. Let’s consider the first explanation. Xen 4.1 was released in March 2011, whereas EC2 has been around since 2008. It might be that Amazon believes in “sticking to what works”, something I can often identify with, and as a result keeps running a hypervisor version that is more than three years old.
According to the second explanation, a large number of hosts are dedicated to running only PV guests. We need to be mindful of two factors: a) AWS is switching to HVM and leaving PV behind, and b) Windows guests cannot run as PV. Keeping most of your hosts PV-only would therefore mean designating certain hosts for Windows and others for PV-capable OSes. It would also mean that there is a relatively small install base of Windows VMs on EC2 (not surprising) and that the new HVM-enabled instances are not yet widespread.
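Both explanations boil down to one rule taken from the advisory: a host is exposed only if it runs Xen 4.1 or later and hosts at least one HVM guest. Here is a minimal Python sketch of that rule. The `Host` class, the `is_exposed` function and the sample fleet are all made up for illustration; they say nothing about how AWS actually tracks its hosts or what its fleet looks like.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical record of one hypervisor host."""
    xen_version: tuple                                # (major, minor), e.g. (4, 1)
    guest_modes: list = field(default_factory=list)   # "PV" or "HVM" per guest


def is_exposed(host: Host) -> bool:
    """Apply the two conditions from the XSA-108 advisory.

    A host is exposed only if BOTH hold:
      1. it runs Xen 4.1 or later, and
      2. at least one of its guests is an HVM guest
         (a host running only PV guests avoids the bug).
    """
    runs_affected_version = host.xen_version >= (4, 1)
    has_hvm_guest = any(mode == "HVM" for mode in host.guest_modes)
    return runs_affected_version and has_hvm_guest


# Invented sample fleet, NOT real AWS data.
fleet = [
    Host((4, 2), ["PV", "PV"]),    # new Xen, PV-only  -> not exposed
    Host((4, 2), ["PV", "HVM"]),   # new Xen, mixed    -> exposed
    Host((3, 4), ["HVM"]),         # old Xen, HVM      -> not exposed
    Host((4, 1), ["HVM", "HVM"]),  # new Xen, HVM-only -> exposed
]

exposed = [host for host in fleet if is_exposed(host)]
share = len(exposed) / len(fleet)
print(f"{len(exposed)} of {len(fleet)} hosts would need a reboot ({share:.0%})")
```

For the share of exposed hosts to stay under 10%, at least one of the two conditions has to fail on more than 90% of the fleet, which is exactly the choice between the two explanations above.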
Personally, I would like AWS and other cloud providers to upgrade to newer hypervisors and offer better virtualization features. For example, if EC2 supported live migration, they could simply roll out the upgrade without having to announce any of this.