What is live migration?
Live migration refers to the process of moving a virtual machine (VM) running on one physical host to another host without disrupting normal operations or causing noticeable downtime or other adverse effects for the end user.
Live migration is considered a major advance in virtualization. Because an entire VM can be moved along with its running operating system (OS), live migration enables fault management, load balancing and low-level system maintenance with little disruption.
Understanding live migration
Live migration is usually performed when a physical host machine (computer or server) needs maintenance or an update, or when a VM must be moved to a different host. The process transfers the VM's memory, network connectivity and storage. Most of the migration takes place while the OS continues to run.
Live migration enables a clean separation between hardware and software, and a separation of concerns between the users and the operators of a data center or cluster. For these reasons, live OS migration is particularly useful for cluster administrators.
With live migration, admins can consolidate clustered hardware into a single coherent management domain. If they need to remove a particular physical machine from service for maintenance, they may migrate OS instances (including applications) to one or more alternative machines, freeing the original machine. Similarly, when hosts are congested, they may rearrange OS instances across machines in a cluster to relieve the load. In either situation, the combination of virtualization and migration eases systems management for the cluster admin.
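Freeing a machine for maintenance can be thought of as a placement problem: every VM on the host being retired needs a new home with enough spare capacity. The sketch below is a minimal illustration under our own assumptions. The `Host` model, its single `capacity` number and the greedy "largest VM first, onto the emptiest host" rule are hypothetical choices, not a method described in this article; real cluster schedulers weigh CPU, memory, network and affinity rules together.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical host model: a name, one capacity figure and the VMs it runs."""
    name: str
    capacity: int                                      # illustrative unit, e.g. GiB of RAM
    vms: dict[str, int] = field(default_factory=dict)  # VM name -> size in the same unit

    def load(self) -> int:
        return sum(self.vms.values())

    def free(self) -> int:
        return self.capacity - self.load()


def evacuate(source: Host, others: list[Host]) -> list[tuple[str, str]]:
    """Plan live migrations that empty `source` so it can be taken out of service.

    Greedy heuristic (an assumption): place the largest VMs first, each on the
    host with the most free capacity that can still fit it. Moves are applied
    as they are planned, so a failure part-way leaves earlier moves in place;
    a production planner would validate the whole plan before migrating.
    """
    plan = []
    for vm, size in sorted(source.vms.items(), key=lambda kv: kv[1], reverse=True):
        candidates = [h for h in others if h.free() >= size]
        if not candidates:
            raise RuntimeError(f"no host can accept {vm}")
        target = max(candidates, key=Host.free)
        target.vms[vm] = size   # record the (simulated) live migration
        del source.vms[vm]
        plan.append((vm, target.name))
    return plan
```

For example, evacuating a host running a 16-unit and an 8-unit VM onto one host with 24 units free and one that is empty places both VMs on the empty host. The same placement logic applies when VMs are rearranged to relieve a congested host instead of emptying one.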
The live migration process starts by transferring the data in the VM's memory to the target physical machine. Once all the data is transferred, an "operational resource state" consisting of CPU, memory and storage is created on the target machine.
After this, the original VM, along with its installed applications, is suspended, its remaining state is copied, and it is started on the destination. The entire process causes only minimal downtime. Although downtime cannot be avoided completely, it can be reduced further with pre-paging and by modeling the probability distribution of memory page accesses.
Live migration benefits
Live migration offers several benefits for administrators of data centers and clusters.
Migrating an entire OS and all its applications as one unit can eliminate many of the difficulties involved in process-level migration approaches. In particular, it avoids residual dependencies, which require the original host machine to remain available and network-accessible to service memory accesses or system calls on behalf of migrated processes.
Migrating at the level of the entire VM also means that its in-memory state can be transferred consistently and efficiently. This applies to both internal kernel state and application-level state.
Live migration supports more efficient load balancing, so systems and CPU resources can be shared for optimum use. It also allows applications to continue running while the administrator manages maintenance activities, such as security updates, in the background.
Users can control the software and services they run within their VM without giving the operator any OS-level access. Moreover, workloads can stay active when hardware such as the CPU, network interface card or memory starts to fail, because the affected VMs can be migrated to a healthy host. If the system crashes completely or the live migration fails, the VM crashes, a host error is logged and the machine is restarted automatically.
Finally, live migration can minimize system downtime through pre-paging, an approach in which the system predicts in advance which memory pages will be needed and proactively loads them into main memory without halting the VM being migrated.
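How the prediction in pre-paging is made varies between implementations. As a minimal sketch, assume a simple spatial-locality heuristic: pages adjacent to the page the VM just touched are the most likely to be needed next, so they are pushed first. The function name, the `window` parameter and the heuristic itself are our illustrative assumptions, not a specific hypervisor's algorithm.

```python
def prepage_order(faulted_page: int, total_pages: int, sent: set[int], window: int = 4) -> list[int]:
    """Return pages to push proactively, nearest to the just-accessed page first.

    Assumption: spatial locality, i.e. pages close to the one just accessed
    are likely to be accessed soon. Out-of-range pages and pages that have
    already been transferred are skipped.
    """
    order = []
    for distance in range(1, window + 1):
        # Alternate above and below the faulted page at increasing distance.
        for page in (faulted_page + distance, faulted_page - distance):
            if 0 <= page < total_pages and page not in sent:
                order.append(page)
    return order
```

For instance, after an access to page 10 with page 11 already transferred and a window of 2, the sketch pushes pages 9, 12 and 8 in that order. A better predictor (such as one built from observed access frequencies) would only change how the order is computed, not where it fits in the migration.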
Live migration process
Live migration happens in a step-by-step manner:
0: Pre-migration or preparation
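The VM is running on the source host (host A). A target host (host B) may be preselected for migration so that the resources required to receive the VM are guaranteed in advance, while the VM remains active on host A. The hypervisor also duplicates the memory pages from the source host to the destination host.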
1: Reservation
A request for migration is passed from host A to host B. When it receives the request, host B reserves a VM container of the required size. If these resources cannot be secured, the VM continues to run on host A unaffected.
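The reservation step is essentially an admission check on host B followed by a decision on host A. The sketch below models it under simple assumptions: the `TargetHost` class, its two resource counters and the returned status strings are hypothetical and stand in for whatever resource accounting a real hypervisor performs.

```python
from dataclasses import dataclass


@dataclass
class TargetHost:
    """Hypothetical model of the spare resources on host B."""
    free_vcpus: int
    free_memory_mb: int

    def reserve(self, vcpus: int, memory_mb: int) -> bool:
        """Reserve a VM container of the requested size; refuse if it does not fit."""
        if vcpus > self.free_vcpus or memory_mb > self.free_memory_mb:
            return False
        self.free_vcpus -= vcpus
        self.free_memory_mb -= memory_mb
        return True


def request_migration(target: TargetHost, vcpus: int, memory_mb: int) -> str:
    """Host A's side of the reservation step."""
    if target.reserve(vcpus, memory_mb):
        return "reserved on B: start pre-copy"
    # Nothing has changed on host A, so the VM keeps running there unaffected.
    return "reservation failed: VM stays on A"
```

The key design point mirrored here is that a refused reservation has no side effects: host B's counters are untouched and host A simply carries on.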
2: Iterative (repetitive) pre-copy
Pre-copy migration combines an iterative push phase with a final stop-and-copy phase. In the first iteration, all memory pages are transferred from host A to host B. In each subsequent iteration, only the pages that were modified (dirtied) during the previous transfer round are copied again.
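The shrinking rounds of the push phase can be simulated in a few lines. This is a sketch with a deliberately crude workload model that we assume for illustration: while a round of n pages is in flight, the guest dirties `int(n * dirty_ratio)` distinct, randomly chosen pages. The parameters `dirty_ratio`, `stop_threshold` and `max_rounds` are hypothetical tuning knobs, not settings of any particular hypervisor.

```python
import random


def pre_copy(num_pages: int, dirty_ratio: float, stop_threshold: int = 64,
             max_rounds: int = 30, seed: int = 0) -> tuple[list[int], int]:
    """Simulate the iterative push phase of pre-copy migration.

    Round 1 sends every page; each later round resends only the pages that
    were dirtied while the previous round was being transferred. Iteration
    stops once the dirty set is small enough or the round limit is reached.
    Returns the number of pages sent per round and the number left over for
    the stop-and-copy phase.
    """
    rng = random.Random(seed)
    to_send = set(range(num_pages))  # initially every page must be sent
    rounds = []
    while len(to_send) > stop_threshold and len(rounds) < max_rounds:
        rounds.append(len(to_send))  # push this round's pages to host B
        # Assumed model: the guest keeps running and dirties pages in
        # proportion to how long (how many pages) this round took.
        n_dirty = min(num_pages, int(len(to_send) * dirty_ratio))
        to_send = set(rng.sample(range(num_pages), n_dirty))
    return rounds, len(to_send)
```

With 10,000 pages and a dirty ratio of 0.2, the rounds send 10,000, 2,000, 400 and 80 pages, leaving 16 for stop-and-copy. With a dirty ratio of 1.0 the dirty set never shrinks, which is why a round limit is needed: iterating forever would never converge, so the remaining pages are sent while the VM is paused.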
3: Stop-and-copy
The running OS instance is suspended on host A, and its network traffic is redirected to host B. The CPU state and any remaining inconsistent (dirty) memory pages are then transferred to host B. At the end of this stage, both hosts hold a consistent suspended copy of the VM. The copy on host A is still considered primary, so the VM can be resumed on host A if the migration fails.
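Because the VM is paused during stop-and-copy, the amount of state left to send largely determines the downtime. The arithmetic below gives a rough lower bound: bytes remaining divided by link bandwidth. The 4 KiB page size and the 1 Gbit/s link (125,000,000 bytes per second) are assumed defaults, and the estimate ignores device state, protocol overhead and the time to redirect network traffic.

```python
def stop_and_copy_downtime_ms(dirty_pages: int, page_size_bytes: int = 4096,
                              cpu_state_bytes: int = 0,
                              bandwidth_bytes_per_s: float = 125_000_000) -> float:
    """Rough lower bound on the pause: time to copy what is left while the VM is suspended."""
    bytes_to_copy = dirty_pages * page_size_bytes + cpu_state_bytes
    return bytes_to_copy / bandwidth_bytes_per_s * 1000  # seconds -> milliseconds
```

Under these assumptions, 16 leftover pages take about 0.52 ms to copy, while 10,000 leftover pages take about 328 ms. This gap is the reason the iterative pre-copy rounds try to shrink the dirty set before the VM is suspended.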
4: Commitment
Host B informs host A that it has received a consistent OS image. Host A acknowledges this message, which commits the migration transaction. Host B now becomes the primary host, and host A can discard the original VM.
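The commitment step behaves like a small transaction: ownership of the VM moves only at one well-defined point. The sketch below models it as a state machine. The class and method names are hypothetical, and real implementations exchange these messages over the network with timeouts and retries that are omitted here.

```python
class MigrationCommit:
    """Hypothetical sketch of the commitment step as a tiny state machine.

    Host A stays primary until it acknowledges host B's report that a
    consistent image has arrived; that acknowledgement is the commit point.
    """

    def __init__(self) -> None:
        self.primary = "A"            # A owns the VM through stop-and-copy
        self.image_on_b = False       # set when B reports a consistent image
        self.a_copy_discarded = False

    def b_reports_image(self) -> None:
        self.image_on_b = True

    def a_acknowledges(self) -> None:
        if not self.image_on_b:
            raise RuntimeError("cannot commit before B confirms a consistent image")
        self.primary = "B"            # commit: ownership moves to host B
        self.a_copy_discarded = True  # A may now discard the original VM

    def recover_on_failure(self) -> str:
        """After a failure, the VM is resumed on whichever host is primary."""
        return self.primary
```

Before the acknowledgement, a failure resumes the VM on host A; after it, host B is primary. At no point does neither host hold a usable copy.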
5: Activation
The migrated VM is activated on host B, now the primary host. Post-migration code reattaches device drivers on the new machine and advertises the moved IP addresses. Normal operations resume on host B.
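On a local Ethernet segment, one common way to advertise a moved IP address is an unsolicited ("gratuitous") ARP reply, which prompts peers and switches to update their tables. The sketch below only builds the 42-byte frame with the standard library, assuming the VM stays on the same Ethernet segment; actually sending it needs a raw socket and elevated privileges, which is out of scope. Choices such as the target hardware address vary between implementations, and the function name is our own.

```python
import ipaddress
import struct


def gratuitous_arp_reply(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying an unsolicited ARP reply mapping ip -> mac."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("expected a 6-byte MAC address")
    ip_bytes = ipaddress.IPv4Address(ip).packed
    broadcast = b"\xff" * 6
    # Ethernet header: destination (broadcast), source, EtherType 0x0806 (ARP).
    ethernet = broadcast + mac_bytes + struct.pack("!H", 0x0806)
    arp = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                     # hardware type: Ethernet
        0x0800,                # protocol type: IPv4
        6, 4,                  # hardware / protocol address lengths
        2,                     # opcode 2 = reply
        mac_bytes, ip_bytes,   # sender: the VM's MAC and its moved IP
        mac_bytes, ip_bytes,   # target: same pair here, which makes it "gratuitous"
    )
    return ethernet + arp
```

The frame is 14 bytes of Ethernet header plus a 28-byte ARP payload. Because the reply is broadcast, every host on the segment sees the new IP-to-MAC mapping without having asked for it.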
Live migration requirements
Live migration is designed so that a consistent VM image always exists on at least one host. However, its success hinges on two key requirements:
- Original host remains stable: Throughout the migration process, the original host must remain stable, without any interruption, until the commitment stage.
- Suspending and resuming VM: The VM must be able to be suspended and resumed on the physical host without risk of failure.