Container Vs Process

Virtual machines have a full OS with its own memory management, device drivers, daemons, etc. Containers share the host’s OS and are therefore lighter weight.

The more interesting comparison is between containers and processes: Containers really are processes with their full environment. A computer science textbook will define a process as having its own address space, program, CPU state, and process table entry. But in today’s software environment this is no longer the full story. The program text is actually memory mapped from the filesystem into the process address space and often consists of dozens of shared libraries in addition to the program itself, thus all these files are really part of the process.

In addition, the running process usually requires a number of files from the environment (typically in /etc or /var/lib on Linux) and this is not just for config files or to store program data. For example, any program making SSL connections needs the root CA certs, most programs need locale info, etc. All these shared libraries and files make the process super-dependent on its filesystem environment. What containers do is encapsulate the traditional process plus its filesystem environment. The upshot of all this is that it’s much more appropriate to think of a container as being a better abstraction for a process than as a slimmed-down VM.

The right way to think about Docker is thus to view each container as an encapsulation of one program with all its dependencies. The container can be dropped into (almost) any host and it has everything it needs to operate. This way of using containers leads to small and modular software stacks and follows the Docker principle of one concern per container. A blog post by Jerome Petazzoni has a good discussion on all this.

An aspect of Docker that leads many first-time users astray is that most containers are built by installing software into a base OS container, such as Ubuntu, CentOS, etc. It’s thus easy to think that the container is actually going to “run” this base OS. But that’s not true; rather, the base OS install only serves the purpose of starting out with a standard filesystem content as the environment on which the program being run in the container can rely upon. One could install just the files actually required by the program into the container, but targeting such minimalist container content is often difficult, cumbersome and inflexible. Besides, the way Docker uses overlay filesystems minimizes the cost of the extra unused files.

Starting out with a full OS base install also often leads users to want to run a suite of system daemons in a container, such as initd, syslogd, crond, sshd, etc. But it’s really better to think of a container as running just one process (or process tree) and having all its filesystem dependencies encapsulated.

All this being said, it is possible to treat containers as slimmed-down VMs in some use cases, but I won’t be focusing on those here.

Docker Vs VMs