Virtual filesystems: why we need them and how they work
Virtual filesystems (VFS) are the magic abstraction that makes the "everything is a file" philosophy of Linux possible. The trick works because any object can be treated as a file as long as it comes with open(), close(), read() and write() functions that the kernel deems acceptable. More remarkably, some VFS also support further operations such as "seek".
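As a rough illustration of that uniform interface, the sketch below runs the same open/read/seek/close sequence against any path. The helper name read_then_rewind and the example paths are illustrative assumptions, not part of the talk's demo code; Python's os module is used here only because its functions are thin wrappers around the corresponding system calls.

```python
import os


def read_then_rewind(path, nbytes=64):
    """Open a path, read, seek back to the start, read again, then close."""
    fd = os.open(path, os.O_RDONLY)       # open(2)
    try:
        first = os.read(fd, nbytes)       # read(2)
        os.lseek(fd, 0, os.SEEK_SET)      # lseek(2): the "seek" operation
        second = os.read(fd, nbytes)      # read(2) again from offset 0
    finally:
        os.close(fd)                      # close(2)
    return first, second


if __name__ == "__main__":
    # Assumption: /proc/version is generated by the kernel on demand, while
    # /etc/hostname (if present) lives on an ordinary disk-backed filesystem.
    for path in ("/proc/version", "/etc/hostname"):
        try:
            print(path, read_then_rewind(path))
        except OSError as err:
            print(path, "not readable here:", err)
```

The caller cannot tell from this code whether the bytes came from a disk block or from a kernel function that formatted them on the fly, which is the point of the abstraction.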
Linux has a large variety of VFS, notably /sys, /proc and /dev, which are found on nearly every system. Why are there so many kinds, and what differentiates them? We'll peek at the internals of some VFS by looking at how they implement these file operations, in order to better understand how they work.
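One quick way to get a feel for that variety is to ask the kernel which filesystem types it currently knows about. The sketch below parses text in the format of /proc/filesystems and keeps the entries flagged "nodev", meaning they are not backed by a block device. The function name nodev_filesystems is a hypothetical helper, and "nodev" is only a rough proxy for "virtual", since network and FUSE filesystems carry the same flag.

```python
import os


def nodev_filesystems(text):
    """Return filesystem types flagged 'nodev' in /proc/filesystems-style text."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        # Lines look like "nodev<TAB>sysfs" or "<TAB>ext4"; the flag is optional.
        if len(fields) == 2 and fields[0] == "nodev":
            names.append(fields[1])
    return names


if __name__ == "__main__":
    # Only meaningful on a Linux system with /proc mounted.
    if os.path.exists("/proc/filesystems"):
        with open("/proc/filesystems") as f:
            print(nodev_filesystems(f.read()))
```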
VFS have another important role in embedded Linux systems, namely making possible the read-only root filesystems that enable consumers to blithely yank the power plug on their devices. The same tricks enable the wonder of the "live CD" or USB stick that can boot Linux even though a running system expects, for example, a writable /var. Through some quick and simple demos (whose code will be posted), we'll look at how these filesystems are implemented and how they can be used. While on this topic, is it a good idea to mount /tmp as a VFS, and how can users accomplish this on their own systems?
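Before changing anything, it helps to know what currently backs /tmp. The following sketch picks the mount that covers a given path out of text in the format of /proc/mounts. The name fstype_for is a hypothetical helper, and the parsing assumes mount points without escaped characters (the kernel writes a space as \040, which this sketch does not decode).

```python
import os
import posixpath


def fstype_for(path, mounts_text):
    """Return (mountpoint, fstype) for the mount covering `path`, or None."""
    path = posixpath.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mountpoint, fstype = fields[1], fields[2]
        prefix = mountpoint.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # Prefer the longest mount point; on a tie, a later line shadows
            # an earlier one because it was mounted on top of it.
            if best is None or len(mountpoint) >= len(best[0]):
                best = (mountpoint, fstype)
    return best


if __name__ == "__main__":
    if os.path.exists("/proc/mounts"):
        with open("/proc/mounts") as f:
            print(fstype_for("/tmp", f.read()))
```

If the result reports tmpfs, /tmp already lives in memory. If not, one common approach is an /etc/fstab entry along the lines of "tmpfs /tmp tmpfs defaults,size=512M 0 0", where the size is an arbitrary illustrative value and should be chosen with the machine's RAM and swap in mind.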
The final VFS-related sleight-of-hand worth mentioning is mount namespaces, which are one of the enabling technologies for the Linux container revolution. The old chroot technology allowed the filesystem root to be remapped to a new path, but the resulting environment typically still shared /proc and /sys with the host, making security a joke. The newer mount namespaces allow a much higher degree of isolation, yet permit features like controllably propagating some filesystem events from the host to the container.
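Fittingly, the kernel exposes namespace membership through a VFS as well: each process has a /proc/<pid>/ns/mnt link whose target names its mount namespace. The sketch below reads and parses that link. The helper names parse_ns_link and mount_namespace are illustrative assumptions, and reading another process's link generally requires sufficient privileges.

```python
import os
import re


def parse_ns_link(target):
    """Split a link target such as 'mnt:[4026531840]' into (type, inode)."""
    match = re.fullmatch(r"(\w+):\[(\d+)\]", target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def mount_namespace(pid="self"):
    """Return the (type, inode) pair identifying a process's mount namespace."""
    return parse_ns_link(os.readlink(f"/proc/{pid}/ns/mnt"))


if __name__ == "__main__":
    try:
        print(mount_namespace())
    except OSError as err:
        # Not Linux, or /proc is not mounted in this environment.
        print("mount namespace not available:", err)
```

Two processes that report the same inode number share a mount namespace and therefore see the same set of mounts; a process inside a container would normally report a different number from one on the host.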