February 22-24, 2013
Hilton Los Angeles International Airport
Checkpoint/restore is a feature that freezes a set of running processes and saves their complete state to disk. This state can later be restored, so the processes resume exactly as they were running before. This opens up a whole set of possibilities, such as live migration, fast startup of a huge application, or a kernel update without service interruption. While such functionality exists in, for example, the OpenVZ kernel, many attempts to merge it upstream (i.e. into the vanilla Linux kernel) have failed miserably, mostly because of code complexity.
We found a way to overcome this by implementing most of the required pieces in userspace, using the existing kernel APIs where possible and extending them where necessary. This is what the Checkpoint and Restore in Userspace (CRIU) project is about.
The talk outlines the current state of the project, including:
The talk is of interest to system and distro developers, advanced users, and anyone interested in containers, virtualization, HA, and HPC.