As a follow-up to my post on SaaS isolation patterns, I'm looking at different application-level isolation patterns – containers. There's a whole spectrum of choices, each with different strengths and weaknesses.
Virtualize the Hardware – Virtual Machines. The first and oldest class of containers is the virtual machine. A piece of software called a hypervisor emulates physical hardware – everything from CPUs to floppy drives.
There are two main classes of hypervisors: type 1 (bare-metal) hypervisors, which run directly on the host machine's hardware, and type 2 (hosted) hypervisors, which run as a privileged process on the host's operating system. For example, Microsoft's Hyper-V runs directly on the hardware, while VirtualBox runs on top of a host operating system.
Minimize the operating system – Unikernels. A unikernel is a specially built kernel image in which the application and the kernel share a single address space. Imagine building a specialized Linux distribution for each program that contains only what that program needs to run.
Optimize and minimize the Virtual Machine – Firecracker is the virtualization technology that powers AWS's Lambda Function-as-a-Service platform. Firecracker is a virtual machine monitor that runs in userspace on top of Linux KVM and spins up fast, tiny virtual machines (think thousands per host).
Intercept Kernel Calls – gVisor virtualizes system calls instead of spinning up a virtual machine. Applications call system calls, which are intercepted by gVisor and then possibly routed to the host kernel. You can think of gVisor as a userspace operating system – that comes with all the difficulties of building a networking stack in userspace.
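To make the "intercepted and then possibly routed to the host" idea concrete, here is a toy model in Python. It is only a sketch of the decision logic and not gVisor's actual design or API: the ToySentry class, its syscall method, and the two lookup tables are hypothetical names made up for illustration.

```python
import time


class ToySentry:
    """Toy userspace kernel: emulate, forward, or deny each call by name."""

    def __init__(self):
        # The sandboxed application sees itself as PID 1 (an assumption of this toy).
        self._fake_pid = 1
        # Calls answered entirely in userspace; they never reach the host.
        self._emulated = {"getpid": lambda: self._fake_pid}
        # Small allowlist of calls that are passed through to the real host.
        self._forwarded = {"time": time.time}

    def syscall(self, name, *args):
        if name in self._emulated:
            return self._emulated[name](*args)
        if name in self._forwarded:
            return self._forwarded[name](*args)
        # Everything else is rejected instead of reaching the host kernel.
        raise PermissionError(f"syscall {name!r} is not permitted in the sandbox")


if __name__ == "__main__":
    sentry = ToySentry()
    print(sentry.syscall("getpid"))  # answered by the sandbox itself
    print(sentry.syscall("time"))    # forwarded to the host
```

The point of the sketch is the three-way split: the smaller the forwarded table, the smaller the host kernel surface the application can reach, and the more has to be reimplemented in userspace.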
Isolate the processes – Docker. Docker containers use a combination of cgroups and namespaces to do OS-level isolation. Namespaces give each container its own view of process IDs, networking, and file systems, while cgroups limit the CPU, memory, and I/O it can use. Compared to virtual machines, containers are usually more lightweight because they share the host's kernel instead of booting their own.
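One way to see namespaces for yourself on Linux is to look at /proc/<pid>/ns, where the kernel exposes one symlink per namespace a process belongs to. The sketch below assumes a Linux host with /proc mounted and returns an empty dict anywhere else; the function name current_namespaces is my own.

```python
import os


def current_namespaces(pid="self"):
    """Return {namespace_name: identifier} for a process, or {} if unavailable."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        entries = sorted(os.listdir(ns_dir))
    except OSError:
        return result  # not Linux, /proc not mounted, or no such process
    for name in entries:
        try:
            # Each entry is a symlink whose target looks like "pid:[4026531836]".
            result[name] = os.readlink(os.path.join(ns_dir, name))
        except OSError:
            continue  # skip entries we are not allowed to read
    return result


if __name__ == "__main__":
    for name, ident in current_namespaces().items():
        print(f"{name:20} {ident}")
```

Two processes that share a namespace report the same identifier for it. Running this on the host and again inside a container should show different identifiers for entries such as pid, net, and mnt.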
Runtime containers – Java Virtual Machine. Java runs its programs in an application-level virtual machine that executes bytecode, as opposed to the hypervisor-type virtual machines earlier in this post, which emulate hardware.
Chromium Sandbox – Chrome ships with its own container mechanism that keeps users safe from malicious sites. At a high level, a privileged broker process communicates over IPC with a less privileged target executing in a sandbox. Since it has to be cross-platform, the exact security boundaries differ a bit between Windows, macOS, and Linux. Unlike the Java Virtual Machine, code isn't executed in a virtual machine, so you get native speeds for C/C++ programs. Link to the design.
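The broker/target split can be sketched with two processes talking over pipes. This is a toy illustration of the request pattern only, not Chromium's implementation: the child process here is not actually sandboxed, and POLICY, TARGET_SOURCE, run_broker, and the JSON message shape are all made up for the example.

```python
import json
import subprocess
import sys

# Code run by the "target". It never touches a resource directly; it asks
# the broker over its stdout pipe and waits for a verdict on stdin.
TARGET_SOURCE = """
import json, sys
for resource in sys.argv[1:]:
    print(json.dumps({"op": "read", "resource": resource}), flush=True)
    sys.stdin.readline()  # block until the broker replies
"""

# Hypothetical policy: the only resource the target may read.
POLICY = {"config"}


def run_broker(requests):
    """Spawn a target, answer each of its requests, and return the decisions."""
    target = subprocess.Popen(
        [sys.executable, "-c", TARGET_SOURCE, *requests],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    decisions = []
    for line in target.stdout:  # one JSON request per line
        request = json.loads(line)
        allowed = request["op"] == "read" and request["resource"] in POLICY
        decisions.append((request["resource"], allowed))
        target.stdin.write(json.dumps({"allowed": allowed}) + "\n")
        target.stdin.flush()
    target.stdin.close()
    target.stdout.close()
    target.wait()
    return decisions


if __name__ == "__main__":
    print(run_broker(["config", "passwords"]))
```

In a real sandbox the operating system strips the target's privileges, so asking the broker is the only way it can reach anything sensitive. Here the policy check just shows where that decision would live.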
WebAssembly Sandbox – WebAssembly (WASM) binaries execute in a sandboxed environment separated from the host runtime. This includes memory safety and conditional access to system calls.
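Both properties can be modeled in a few lines. The sketch below is a toy and not a real WebAssembly runtime: ToyInstance, Trap, and the method names are hypothetical. It also simplifies one detail, since a real runtime rejects a missing import when the module is instantiated instead of when it is called.

```python
class Trap(Exception):
    """Raised when the toy module does something the sandbox forbids."""


class ToyInstance:
    PAGE_SIZE = 65536  # WebAssembly linear memory is sized in 64 KiB pages

    def __init__(self, pages, imports):
        # The module can only address this buffer, never host memory.
        self.memory = bytearray(pages * self.PAGE_SIZE)
        # The only host functions the module can reach are the ones passed in.
        self.imports = dict(imports)

    def _check(self, addr):
        if not 0 <= addr < len(self.memory):
            raise Trap(f"out-of-bounds memory access at {addr}")

    def store8(self, addr, value):
        self._check(addr)
        self.memory[addr] = value & 0xFF  # keep the low byte only

    def load8(self, addr):
        self._check(addr)
        return self.memory[addr]

    def call_import(self, name, *args):
        if name not in self.imports:
            raise Trap(f"host function {name!r} was not imported")
        return self.imports[name](*args)


if __name__ == "__main__":
    instance = ToyInstance(pages=1, imports={"double": lambda x: x * 2})
    instance.store8(0, 7)
    print(instance.load8(0))
    print(instance.call_import("double", 21))
```

Every load and store is bounds-checked against the instance's own memory, and the host decides up front which functions, if any, the module may call.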
Of course, other containers deserve a mention: OpenVZ, rkt, LXC, and more. Maybe a follow-up post one day – a discussion of the different (and moving) security boundaries that each method provides.