You might be familiar with containers at a high level: they are used like lightweight virtual machines to isolate workloads. But what's actually going on under the hood? Here is a brief overview.
Containers are actually a few pieces of technology bundled together:
- Partitioning resources with Linux namespaces. Namespaces take global resources such as process IDs (pid), mount points (mnt), and the network stack (net), and abstract them so that a set of processes gets its own view and sees only a certain set of resources. There are currently 8 namespace types (mnt, pid, net, ipc, UTS, user, cgroup, and time).
- Limiting resource usage with Linux control groups (cgroups). This is how containers isolate, prioritize, and account for resource usage. Cgroups also allow groups of processes to be paused, checkpointed, and restarted.
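Both pieces are visible from userspace through `/proc`, so you can poke at them without any container runtime. Here is a minimal Python sketch, assuming a Linux host with `/proc` mounted. The helper names (`parse_ns_link`, `list_namespaces`, `parse_cgroup_line`, `list_cgroups`) are made up for this post; only the `/proc/<pid>/ns` and `/proc/<pid>/cgroup` formats come from the kernel.

```python
import os
import re

# Link targets under /proc/<pid>/ns look like "net:[4026531992]":
# the namespace type, then the inode number that identifies the namespace.
_NS_LINK = re.compile(r"^([a-z_]+):\[(\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target into (type, inode)."""
    match = _NS_LINK.match(target)
    if match is None:
        raise ValueError(f"unexpected namespace link: {target!r}")
    return match.group(1), int(match.group(2))


def list_namespaces(pid="self", proc_root="/proc"):
    """Map each entry in /proc/<pid>/ns to its (type, inode) pair."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    return {
        name: parse_ns_link(os.readlink(os.path.join(ns_dir, name)))
        for name in sorted(os.listdir(ns_dir))
    }


def parse_cgroup_line(line):
    """Parse one 'hierarchy-ID:controller-list:cgroup-path' line."""
    # The path may itself contain ':', so split at most twice.
    hierarchy_id, controllers, path = line.rstrip("\n").split(":", 2)
    # On cgroup v2 the controller list is empty ("0::/some/path").
    return int(hierarchy_id), [c for c in controllers.split(",") if c], path


def list_cgroups(pid="self", proc_root="/proc"):
    """Return the parsed cgroup memberships of a process."""
    with open(os.path.join(proc_root, str(pid), "cgroup")) as f:
        return [parse_cgroup_line(line) for line in f if line.strip()]
```

Calling `list_namespaces()` and `list_cgroups()` inside and outside a container should show different namespace inodes and a different cgroup path, which is most of what "being in a container" means to the kernel.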
Together, namespaces and cgroups provide most of the functionality of containers. An honorable mention goes to union mount filesystems (such as OverlayFS), which power the composable layering of Docker images. Another is seccomp, which provides some of the hardening for containers by optionally restricting the set of syscalls a process can make.
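To make the layering idea concrete, here is a toy model in Python. It is not how OverlayFS is implemented; it only mimics the lookup rule of a union mount, where the top-most layer containing a path wins and a "whiteout" marker in an upper layer hides a file from the layers below. The names `resolve` and `WHITEOUT` and the sample layer contents are invented for illustration.

```python
# Sentinel standing in for a whiteout entry: "this path was deleted here".
WHITEOUT = object()


def resolve(path, layers):
    """Look up a path in a stack of layers, ordered top-most first.

    Each layer is a dict mapping path -> file content. Returns the content
    from the first layer that mentions the path, or None if the path is
    missing or has been whited out by an upper layer.
    """
    for layer in layers:
        if path in layer:
            content = layer[path]
            return None if content is WHITEOUT else content
    return None


# Hypothetical image: a base layer, an app layer, and a writable top layer.
base = {"/etc/os-release": "debian", "/bin/sh": "dash"}
app = {"/app/main.py": "print('hi')", "/bin/sh": WHITEOUT}
writable = {"/etc/os-release": "patched"}
layers = [writable, app, base]
```

With this stack, `resolve("/etc/os-release", layers)` comes from the writable layer, `/app/main.py` falls through to the app layer, and `/bin/sh` is hidden by the whiteout even though the base layer still contains it. The lower layers are never modified, which is why many images can share them.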
In Kubernetes, the concept of a Pod determines which namespaces are shared: containers in the same Pod share the same network and IPC namespaces (but, by default, not the PID namespace), among other things.
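One way to check this kind of sharing yourself: two processes are in the same namespace when their `/proc/<pid>/ns/<name>` entries refer to the same device and inode. The sketch below assumes a Linux host and enough privileges to read the other process's `ns` directory; `namespace_ids` and `shared_namespaces` are hypothetical helper names, not part of any Kubernetes tooling.

```python
import os


def namespace_ids(pid, proc_root="/proc"):
    """Map each namespace entry of a process to its (device, inode) identity."""
    ns_dir = os.path.join(proc_root, str(pid), "ns")
    ids = {}
    for name in os.listdir(ns_dir):
        # os.stat follows the symlink, so this identifies the namespace itself.
        st = os.stat(os.path.join(ns_dir, name))
        ids[name] = (st.st_dev, st.st_ino)
    return ids


def shared_namespaces(ids_a, ids_b):
    """Return the sorted names of namespaces that both processes are in."""
    return sorted(
        name for name in ids_a.keys() & ids_b.keys() if ids_a[name] == ids_b[name]
    )
```

For two processes from different containers of the same Pod, `shared_namespaces(namespace_ids(pid_a), namespace_ids(pid_b))` would be expected to include `net` and `ipc`, and to leave out `pid` unless the Pod opts into sharing it.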