February 22-24, 2013
Hilton Los Angeles International Airport
LXC (Linux Containers) is a lightweight virtualization system. It lets you create multiple environments (or containers) within a single Linux machine, each of them invisible and impervious to the others.
LXC is similar to OpenVZ, VServer, FreeBSD jails, or even Solaris Zones: they can all be seen as improvements over a basic “chroot”, isolating processes within a common host kernel, with very low overhead. This contrasts with KVM, Xen, or VirtualBox, which all emulate a complete machine running its own kernel, at the cost of higher overhead. Of course, virtual machines have their advantages as well: for instance, they make it possible to run a Windows VM inside a Linux machine (or vice versa), while LXC only deals with Linux processes.
LXC relies on two significant feature sets of the kernel: kernel namespaces and control groups. Kernel namespaces provide basic isolation, making sure that a container cannot “see” or affect other containers. We will explain the different kinds of namespaces, to give a solid understanding of the low-level mechanisms at the heart of LXC. Control groups, on the other hand, are used to allocate resources (memory, CPU, I/O, and so on) among containers. We will detail the most useful knobs that can be tuned to adjust the resources allocated to containers.
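To make the namespace mechanism more concrete, here is a minimal Python sketch, not taken from LXC itself. It assumes a Linux kernel that exposes each process's namespaces as symbolic links under /proc/<pid>/ns, with link targets such as pid:[4026531836]. The helper names parse_ns_link, namespaces_of, and shared_namespaces are made up for this illustration. Two processes sit in the same namespace of a given kind exactly when the identifiers match.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "pid:[4026531836]".
_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (kind, identifier)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group("kind"), int(match.group("inode"))


def namespaces_of(pid="self"):
    """Return {entry name: identifier} for the namespaces of a process.

    Linux only; inspecting another user's process usually needs root.
    """
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for name in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        result[name] = inode
    return result


def shared_namespaces(ns_a, ns_b):
    """List the namespace entries on which two processes coincide."""
    return sorted(k for k in ns_a.keys() & ns_b.keys() if ns_a[k] == ns_b[k])
```

Calling shared_namespaces(namespaces_of(1), namespaces_of(container_pid)) on a host, where container_pid is the PID of some process inside a container, would be expected to return few or no entries for a fully isolated container, and all of them for an ordinary host process.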
We will then explain how to leverage union filesystems to provision new containers. While not strictly necessary for LXC, union filesystems make it possible to provision new environments very quickly (in less than a second) and very cheaply (using much less than a megabyte). We will give some technical details to explain the pros and cons of union filesystems, and how to work around potential issues.
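The reason provisioning can be this cheap is easier to see with a toy model. The Python sketch below is an in-memory simulation, not the code of any real union filesystem: the class UnionMount and its methods are hypothetical names, and files are modelled as byte strings in dictionaries. A shared read-only base layer sits under a private writable layer, and a new container starts with an empty private layer.

```python
class UnionMount:
    """Toy two-layer union: shared read-only base, private writable layer."""

    _WHITEOUT = object()  # marks a path as deleted in the private layer

    def __init__(self, base):
        self.base = base  # shared between containers, never modified
        self.upper = {}   # holds only this container's changes

    def read(self, path):
        # The private layer wins; fall back to the shared base otherwise.
        if path in self.upper:
            data = self.upper[path]
            if data is self._WHITEOUT:
                raise FileNotFoundError(path)
            return data
        if path in self.base:
            return self.base[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # Writes never touch the base, so other containers are unaffected.
        self.upper[path] = data

    def delete(self, path):
        self.read(path)  # raises FileNotFoundError if the path is not visible
        self.upper[path] = self._WHITEOUT

    def private_bytes(self):
        """Storage consumed by this container alone."""
        return sum(len(d) for d in self.upper.values()
                   if d is not self._WHITEOUT)


base = {"/etc/hostname": b"base\n", "/bin/sh": b"\x7fELF"}
web1 = UnionMount(base)   # "provisioning" is just an empty private layer
web2 = UnionMount(base)
web1.write("/etc/hostname", b"web1\n")
web1.delete("/bin/sh")
```

After these calls, web2 still reads the original files and uses no private storage, while web1 stores only its five-byte hostname. One cost the model hints at: since changes live in the private layer, a real union filesystem typically has to copy a whole file up from the base before its first modification, which can be slow for large files.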
Finally, we will give technical details about “lxc-attach”, a tool that makes container administration much easier, but requires specific kernel patches. We will discuss this issue and how to work around it, with and without kernel patching.
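Attaching a process to a running container comes down to joining that container's namespaces, which Linux exposes through the setns() system call. The sketch below is a rough Python illustration of that step, not the lxc-attach implementation: ns_paths and enter_namespaces are hypothetical helpers, the list of namespace kinds is an assumption, and the call only succeeds as root on a kernel whose setns() supports every kind listed.

```python
import ctypes
import os

# Namespace kinds to join (an assumption of this sketch).
DEFAULT_KINDS = ("ipc", "uts", "net", "pid", "mnt")


def ns_paths(pid, kinds=DEFAULT_KINDS):
    """Build the /proc paths that act as handles on a process's namespaces."""
    return [f"/proc/{pid}/ns/{kind}" for kind in kinds]


def enter_namespaces(pid, kinds=DEFAULT_KINDS):
    """Join the namespaces of `pid`. Linux only, requires root."""
    libc = ctypes.CDLL(None, use_errno=True)
    # Open every handle first: once the mount namespace has been joined,
    # the /proc seen by this process may no longer be the host's.
    fds = [os.open(path, os.O_RDONLY) for path in ns_paths(pid, kinds)]
    try:
        for fd in fds:
            if libc.setns(fd, 0) != 0:  # 0 means "accept any namespace type"
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
    finally:
        for fd in fds:
            os.close(fd)
```

Joining a PID namespace only takes effect for child processes, so a real tool would fork after these calls and run the requested command in the child. If the kernel lacks setns() support for one of the kinds, the corresponding call fails, which is where kernel support becomes the limiting factor.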
These techniques have been used in production at dotCloud since 2010, on a cluster of a few hundred machines, some of them hosting thousands of containers.