Mark Russinovich (Microsoft Technical Fellow, a lead on the Azure platform and a renowned Windows expert) took pains at PDC ’10, in his “Inside Windows Azure” session, to lay out a detailed overview of the Azure platform and what actually happens when users interact with it.
The Azure cloud (or clouds) is built on Microsoft’s definition of commodity infrastructure: “Microsoft Blades,” that is, bespoke OEM blade servers from several manufacturers (probably Dell or HP, just saying), packed into dense racks. Microsoft containerizes its data centers now, and pictures abound; this is only interesting to data center nerds anyway.
For systems management nerds, here’s a 2006 presentation from Microsoft on the rudiments of shared I/O and blade design.
Azure considers each rack a ‘node’ of compute power and puts a switch on top of it. Each node (the servers plus the top-of-rack switch) is considered a ‘fault domain’ (see glossary, below), i.e., a possible point of failure. Aggregators and load balancers manage groups of nodes, and all feed back to the Fabric Controller (FC), the operational heart of Azure.
The FC gets its marching orders from the “Red Dog Front End” (RDFE). The name is left over from Dave Cutler’s original Red Dog project, which became Azure. The RDFE acts as a kind of router for requests and traffic to and from the load balancers and the Fabric Controller.
Russinovich said that the development team passed an establishment called the “Pink Poodle” while driving one day. Red Dog was deemed more suitable, and Russinovich claims not to know what sort of establishment the Pink Poodle is.
How Azure works
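Azure works like this:
Red Dog Front End (RDFE)
|___Fabric Controller (FC)
    |___Aggregators and Load Balancers
        |___Nodes (racks of blades, each with a top-of-rack switch)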
The Fabric Controller
The Fabric Controller does all the heavy lifting for Azure. It provisions, stores, delivers, monitors and commands the virtual machines (VMs) that make up Azure. It is a “distributed stateful application distributed across data center nodes and fault domains.”
In English, this means there are a number of Fabric Controller instances running in various racks. One is elected to act as the primary controller. If it fails, another picks up the slack. If the entire FC fails, all of the operations it started, including the nodes, keep running, albeit without much governance until it comes back online. If you start a service on Azure, the FC can fall over entirely and your service is not shut down.
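To make the failover behavior concrete, here is a minimal Python sketch. It assumes a deliberately simple election rule (the lowest-numbered live replica becomes primary); Russinovich did not describe the actual election protocol, and the class and method names are hypothetical. The sketch shows the two properties described above: losing the primary promotes another replica, and losing every replica blocks new deployments while leaving already-running services alone.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    replica_id: int
    alive: bool = True


@dataclass
class FabricControllerCluster:
    replicas: list[Replica]
    running_services: set[str] = field(default_factory=set)

    def primary(self) -> Replica | None:
        # Simplified election rule (an assumption): lowest-numbered live replica leads.
        live = [r for r in self.replicas if r.alive]
        return min(live, key=lambda r: r.replica_id) if live else None

    def fail(self, replica_id: int) -> None:
        for r in self.replicas:
            if r.replica_id == replica_id:
                r.alive = False

    def deploy(self, service: str) -> bool:
        # Starting new work requires a live primary.
        if self.primary() is None:
            return False
        self.running_services.add(service)
        return True


fc = FabricControllerCluster([Replica(1), Replica(2), Replica(3)])
fc.deploy("web-role")
fc.fail(1)
print(fc.primary().replica_id)   # 2: another replica picks up the slack
for rid in (2, 3):
    fc.fail(rid)
print(fc.deploy("worker-role"))  # False: no controller, no new deployments
print(fc.running_services)       # {'web-role'}: the existing service keeps running
```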
The Fabric Controller automates pretty much everything, including new hardware installs. New blades are configured to PXE boot, and the FC includes a PXE boot server. Each new blade boots a ‘maintenance image,’ which downloads a host operating system (OS) that includes all the parts necessary to make it an Azure host machine. Sysprep is run, the system is rebooted as a unique machine and the FC sucks it into the fold.
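The provisioning sequence above is essentially a fixed pipeline. The Python sketch below models it as a hypothetical state machine. The stage names paraphrase the article, and the `azhost-` hostname scheme is invented purely for illustration; real sysprep identity generation is not modeled.

```python
from enum import IntEnum


class Stage(IntEnum):
    NEW = 0
    PXE_BOOTED = 1         # blade boots from the FC's PXE boot server
    MAINTENANCE_IMAGE = 2  # maintenance image is running
    HOST_OS_READY = 3      # host OS with the Azure host components downloaded
    SYSPREPPED = 4         # sysprep run, reboot pending
    IN_FABRIC = 5          # rebooted as a unique machine and managed by the FC


class Blade:
    def __init__(self, serial: str):
        self.serial = serial
        self.stage = Stage.NEW
        self.hostname = None

    def advance(self) -> Stage:
        # Stages happen strictly in order; none can be skipped.
        if self.stage is Stage.IN_FABRIC:
            raise RuntimeError(f"{self.serial} is already in the fabric")
        self.stage = Stage(self.stage + 1)
        if self.stage is Stage.IN_FABRIC:
            # Hypothetical naming scheme standing in for sysprep's unique identity.
            self.hostname = f"azhost-{self.serial}"
        return self.stage


def provision(blade: Blade, inventory: dict) -> str:
    """Drive a new blade through every stage, then record it in the FC inventory."""
    while blade.stage is not Stage.IN_FABRIC:
        blade.advance()
    inventory[blade.hostname] = blade
    return blade.hostname


inventory = {}
print(provision(Blade("sn042"), inventory))  # azhost-sn042
```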
The Fabric Controller is a modified Windows Server 2008 OS, as are the host OS and the standard pre-configured Web and Worker Role instances.
What happens when you ask for a Role
The FC has two primary objectives: to satisfy user requests and policies and to optimize and simplify deployment. It does all of this automatically, “learning as it goes” about the state of the data center, Russinovich said.
Log into Azure and ask for a new “Web Role” instance, and what happens? The portal takes your request to the RDFE. The RDFE asks the Fabric Controller for the same, based on the parameters you set and your location, proximity, etc. The Fabric Controller scans the available nodes and looks for (in the standard case) two nodes that do not share a Fault Domain, so that the pair is fault-tolerant.
This could be two racks right next to each other. Russinovich said that the FC considers network proximity and available connectivity as factors in optimizing performance. Azure is unlikely to pick nodes in two different facilities unless necessary or specified.
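Placement boils down to a hard constraint (no two instances in one fault domain) plus soft preferences (proximity, spare capacity). Here is a hedged Python sketch of that idea. The `Node` fields, the scoring key and the brute-force search over combinations are all assumptions for illustration, not the Fabric Controller's actual algorithm.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Node:
    name: str
    fault_domain: str
    facility: str
    free_slots: int


def place(nodes, count=2):
    """Pick `count` nodes in distinct fault domains, preferring one facility."""
    candidates = [n for n in nodes if n.free_slots > 0]
    best_key, best = None, None
    for combo in combinations(candidates, count):
        # Hard constraint: no two instances may share a fault domain.
        if len({n.fault_domain for n in combo}) < count:
            continue
        # Soft preferences: fewest facilities (proximity), then most spare capacity.
        key = (len({n.facility for n in combo}), -sum(n.free_slots for n in combo))
        if best_key is None or key < best_key:
            best_key, best = key, list(combo)
    return best  # None if the constraint cannot be met


nodes = [
    Node("rack-a1", "fd1", "dc-east", 4),
    Node("rack-a2", "fd1", "dc-east", 8),
    Node("rack-b1", "fd2", "dc-east", 2),
    Node("rack-c1", "fd3", "dc-west", 16),
]
print([n.name for n in place(nodes)])  # ['rack-a2', 'rack-b1']
```

Note that `rack-c1` has the most spare capacity but loses because it sits in a different facility. Brute-force search is fine for a handful of nodes; a scheduler working across thousands of racks would need something smarter.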
The Fabric Controller, having found its juicy young nodes bursting with unused capacity, then puts the role-defining files on the host. The host OS creates the requested virtual machines and three Virtual Hard Disks (VHDs) (count ’em, three!): a stock ‘differencing’ VHD (D:\) for the OS image, a ‘resource’ VHD (C:\) for user temporary files and a Role VHD (next available drive letter) for role-specific files. The host agent starts the VM and away we go.
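The OS disk is described as a “differencing” VHD. In general, a differencing disk is a child disk that records only the blocks changed relative to a read-only parent image, which lets many VMs share one stock OS image. The Python sketch below illustrates that copy-on-write behavior with in-memory dictionaries. It does not model the VHD file format, and the class name is hypothetical.

```python
class DifferencingDisk:
    """Copy-on-write overlay: writes go to the child, reads fall back to the parent."""

    def __init__(self, parent: dict):
        self._parent = parent  # shared stock image, never modified here
        self._child = {}       # block number -> bytes changed by this VM only

    def read(self, block: int) -> bytes:
        if block in self._child:
            return self._child[block]
        # Blocks absent from both layers read as zeros in this toy model.
        return self._parent.get(block, b"\x00")

    def write(self, block: int, data: bytes) -> None:
        self._child[block] = data

    def changed_blocks(self) -> int:
        return len(self._child)


stock_os = {0: b"boot", 1: b"kernel"}
vm1, vm2 = DifferencingDisk(stock_os), DifferencingDisk(stock_os)
vm1.write(1, b"patched")
print(vm1.read(1), vm2.read(1))  # b'patched' b'kernel'
```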
The load balancers, interestingly, do nothing until the instance receives its first external HTTP communication (GET); only then is the instance routed to an external endpoint and live to the network.
The Platform as a Service part
Why so complicated? Well, it’s a) Windows and b) the point is to automate maintenance and stuff. The regular updates that Windows Azure systems undergo (the same as the rest of the Windows world, within the specifications of what is running) typically happen about once a month and require restarting the VMs.
Now for the fun part: Azure requires two instances running to enjoy its 99.9% uptime service-level agreement (SLA), and that’s one reason why. Microsoft essentially enforces a high-availability, uninterrupted fault tolerance fire drill every time the instances are updated. Minor updates and changes to configuration do not require restarts, but what Russinovich called ‘VIP swaps’ do.
Obviously, this needs to be done in such a way that the user doesn’t skip a beat. A complicated hopscotch takes place as updates are installed to the resource VHD. One instance is shut down and its resource VHD updated, then the other one. The differencing VHD makes sure new data that comes into the Azure service is retained and synced as each VM reboots.
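A small, hypothetical rolling-update sketch in Python shows why two instances matter during updates. Instances are updated one at a time, and the update refuses to proceed if taking an instance down would leave fewer than `min_serving` instances in rotation. The data model and the `min_serving` parameter are assumptions, not Azure's actual update logic.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    version: str
    serving: bool = True


def rolling_update(instances, new_version, min_serving=1):
    """Update one instance at a time without dropping below min_serving."""
    log = []
    for inst in instances:
        others = sum(1 for i in instances if i is not inst and i.serving)
        if others < min_serving:
            raise RuntimeError(f"updating {inst.name} would take the service offline")
        inst.serving = False        # out of rotation while it restarts
        inst.version = new_version  # apply the update
        inst.serving = True         # back in rotation before the next one goes down
        log.append((inst.name, others))
    return log


pair = [Instance("IN_0", "v1"), Instance("IN_1", "v1")]
print(rolling_update(pair, "v2"))  # [('IN_0', 1), ('IN_1', 1)]
```

With a single instance, the function raises instead of restarting it. This mirrors why a lone instance does not qualify for the SLA.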
Virtualization and security
What is it running on, we asked? Head scratching ensued for many moons as Microsoft pushed Hyper-V to customers but claimed Azure was not compatible or interoperable with Hyper-V.
It is, in fact, a fork of Hyper-V. Russinovich said it was basically tailored from the ground up for the hardware layout that Microsoft uses, same as the Azure OSes.
Russinovich said that the virtual machine is the security boundary for Azure. At the hypervisor level, the host agents on each physical machine are trusted, as are the Fabric Controller OSes. The guest agent (the part the user controls) is not trusted. The VMs communicate only through the load balancers and the public (user’s endpoint) IP and back down again.
Some clever security person may now appear and make fun of this scheme, but that’s not my job.
The Fabric Controller handles network security, and Hyper-V uses model-specific registers (MSRs) to verify basic machine integrity. That’s not incredibly rich detail, but it’s more than you knew five minutes ago, and I guarantee it’s more than you know about how Amazon secures Xen. Here’s a little more on Hyper-V security.
New additions to Azure, like full admin rights on VMs (aka elevated privileges), justify this approach, Russinovich said. “We know for a fact we have to rely on this [model] for security,” he said.
Everyone feel safe and cozy?
New user-built VM Roles are implemented a little differently
Azure now offers users the ability to craft their own Windows images and run them on Microsoft’s cloud. These VM Roles are built by you (sysprep recommended) and uploaded to your blob storage. When you create a service around your custom VMs and start the instances, the Fabric Controller takes pains to redundantly ensure redundancy: it makes a shadow copy of your file, caches that shadow copy (in the VHD cacher, of course) and then creates the three VHDs described above for each VM needed. From there, you’re on your own: Azure does not patch VM Roles for you, and Microsoft doesn’t present doing your own patching as an asset.
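Here is a rough Python sketch of the VM Role deployment flow described above, under loud assumptions. `VhdCache` is a hypothetical stand-in for the “VHD cacher,” images are keyed by a SHA-256 content hash so an image uploaded once is cached once, and the three per-VM VHDs are represented by placeholder names rather than real disks.

```python
import hashlib


class VhdCache:
    """Hypothetical stand-in for the FC's VHD cacher, keyed by content hash."""

    def __init__(self):
        self._images = {}
        self.misses = 0

    def get_or_add(self, image: bytes) -> str:
        key = hashlib.sha256(image).hexdigest()
        if key not in self._images:
            self.misses += 1
            self._images[key] = image  # the cached "shadow copy"
        return key


def deploy_vm_role(blob: bytes, instance_count: int, cache: VhdCache) -> list:
    base = cache.get_or_add(blob)  # shadow-copy and cache the uploaded image once
    vms = []
    for i in range(instance_count):
        # Each VM gets its own three VHDs; only the OS layer references the cached image.
        vms.append({
            "vm": f"vmrole_{i}",
            "vhds": {
                "os": f"diff-of-{base[:12]}",
                "resource": f"resource-{i}",
                "role": f"role-{i}",
            },
        })
    return vms


cache = VhdCache()
vms = deploy_vm_role(b"my-sysprepped-image", 2, cache)
deploy_vm_role(b"my-sysprepped-image", 3, cache)
print(len(vms), cache.misses)  # 2 1
```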
A healthy host is a happy host
Azure uses heartbeats to measure instance health: each instance simply pings the Fabric Controller every few seconds, and that’s that. Here again, fault tolerance is in play. You have two instances running (if you’re doing it right; Azure will let you run one, but then you don’t get the SLA). If one fails, its heartbeat times out, the differencing VHD on the other VM starts ticking over, and Azure restarts the faulty VM or recreates the configuration somewhere else. Then changes are synced and you’re back in business.
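Heartbeat monitoring reduces to tracking when each instance was last heard from and flagging any instance that has been silent longer than a timeout. The Python sketch below does exactly that. The 15-second timeout, the explicit `now` timestamps (used instead of a real clock so the behavior is deterministic) and the `restart` callback are all illustrative assumptions.

```python
class HeartbeatMonitor:
    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen = {}  # instance name -> time of last heartbeat

    def beat(self, instance: str, now: float) -> None:
        self.last_seen[instance] = now

    def timed_out(self, now: float) -> list:
        return sorted(i for i, t in self.last_seen.items() if now - t > self.timeout_s)


def recover(monitor: HeartbeatMonitor, now: float, restart) -> list:
    """Restart every silent instance and reset its heartbeat clock."""
    failed = monitor.timed_out(now)
    for instance in failed:
        restart(instance)  # e.g. reboot in place or recreate on another node
        monitor.beat(instance, now)
    return failed


monitor = HeartbeatMonitor(timeout_s=15)
for t in (0, 5, 10, 15, 20):
    monitor.beat("IN_0", t)  # IN_0 keeps pinging every 5 s
monitor.beat("IN_1", 0)      # IN_1 goes silent after t=0
restarted = []
print(recover(monitor, 20, restarted.append))  # ['IN_1']
```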
Do not end these processes
Now that we have the ability to RDP into our Azure Roles and monkey around, Russinovich helpfully explains that the processes Azure runs within the VM include WaAppHost.exe (Worker Role), WaWebHost.exe (Web Role), clouddrivesvc.exe (all Roles) and a handful of others, such as a special w3wp.exe for IIS configuration. All of these were previously off-limits to users but can now be accessed via the new admin privileges.
Many of the features set out here are in development and beta but are promised to the end user soon. Russinovich noted that the operations outlined here still could change significantly. At any rate, his PDC session provided a fascinating look into how a cloud can operate, and it’s approximately eleventy bajillion percent more than I (or anyone else, for that matter) know about how Amazon Web Services or Google App Engine works.
Glossary
Azure: Microsoft’s cloud infrastructure platform
Fabric Controller: A set of modified virtual Windows Server 2008 images running across Azure that control provisioning and management
Fault Domain: A set of resources within an Azure data center that is treated as a single, discrete unit of failure with no internal fault tolerance, like a single rack of servers. A Service by default splits virtual instances across at least two Fault Domains.
Role: Microsoft’s name for a specific configuration of Azure virtual machine. The terminology is from Hyper-V.
Service: Azure lets users run Services, which then run virtual machine instances in a few pre-configured types, like Web or Worker Roles. A Service is a batch of instances that are all governed by the Service parameters and policy.
Web Role: An instance pre-configured to run Microsoft’s Web server technology Internet Information Services (IIS)
Worker Role: An instance configured not to run IIS but instead to run applications developed and/or uploaded to the VM by the end user
VM Role: User-created, unsupported Windows Server 2008 virtual machine images that are uploaded by the user and controlled through the user portal. Unlike Web and Worker Roles, these are not updated and maintained automatically by Azure.