Images and Containers

Images

A Docker image is a bundle of everything you need to run some software. You can think of it like a big zip archive that contains every file used by the software. It’d be like if you zipped up Node’s node_modules folder, or Python’s site-packages folder, alongside the actual source code for the software.

Images are more general than that, though, since they include not just the application and library code, but an entire root directory structure for an operating system you’re running the software on. The contents of a Docker image based on Ubuntu are kind of like what you’d see if you just zipped up the / folder of someone’s Ubuntu computer, minus some low-level components (like kernel modules) outside of Docker’s purview.

Images aren’t literally zip files, and I’ll discuss how they’re made in more detail later, but the takeaway for now is that they contain a static copy of all the files you need to run some software, including dependencies.
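You don’t have to take my word for it. As a quick sketch (assuming Docker is installed and you already have the ubuntu:22.04 image locally, since docker save won’t pull it for you), you can write an image out to disk and poke at it:

# Write the image out as a plain tar archive
docker save -o ubuntu.tar ubuntu:22.04

# List its contents: the image's layers (each itself an archive of files),
# plus some metadata, rather than one flat directory tree
tar -tf ubuntu.tar | head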

Intercepting

If not by virtualizing in the traditional sense, how does Docker help us in our pursuit of predictable software runs?

Well, we’ll start by asking: why does it matter that a Docker image contains an entire operating system’s worth of dependencies? Think about the libjpeg example I mentioned earlier, how the Pillow library can only save JPEGs if the operating system has libjpeg installed. By bundling an actual copy of libjpeg in a Docker image, an image creator can provide the right version of the library, one that is known to work with the software.

But if I just run some Python code on my computer that needs to call into libjpeg, it’s going to look for the OS library at /usr/lib/x86_64-linux-gnu/libjpeg.so, which is where libjpeg is installed on my computer, not the version of the library inside the image. For that Docker image – the one with the developer’s preferred version of libjpeg installed – to be of any use to me, I need to make sure that, when I run the Python code, the image’s copy of the library is what gets used.

This is what the Docker Engine software (the docker command in the terminal) does. When I run that Python script using a Docker image provided by the developer, although that Python binary is truly running on my host machine, Docker will intercept the running program’s attempt to read libjpeg.so, loading the copy that came from the Docker image, not the one installed on the host. This happens up and down the stack, for every interaction the running program has with the filesystem.
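To make that concrete, here’s a sketch of the difference (the image name developers-image is made up, and the library path is the one from my machine, so adjust both for your setup):

# On the host: whatever libjpeg happens to be installed locally, if any
ls -l /usr/lib/x86_64-linux-gnu/libjpeg.so*

# Inside a container based on the developer's (hypothetical) image, the same path
# resolves to the copy of libjpeg that was bundled into the image
docker run --rm developers-image ls -l /usr/lib/x86_64-linux-gnu/libjpeg.so*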

In fact, this mechanism even lets you use Linux-based operating system distributions other than the one installed on your host machine. When you run some software (like python3) from the terminal, your shell (like Bash) is locating the binary you asked for (maybe it’s at /usr/bin/python3) and running its code. But it’s not just that program’s interactions with the filesystem that Docker is intercepting. Even Bash’s attempt to look for python3 at /usr/bin/python3 in the first place is being intercepted by Docker. If another Linux distribution lays out its files differently, Docker will handle the redirection wherever it needs to happen.
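You can see this for yourself (here I’m assuming an Ubuntu host and using the official alpine image as the “other” distribution):

# On the host: reports your host distribution, e.g. Ubuntu
cat /etc/os-release

# Inside a container: reports Alpine Linux, a distribution laid out quite differently
docker run --rm alpine:3.19 cat /etc/os-release

Both commands are ultimately served by your host’s kernel; what differs is the filesystem each program sees.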

In this way, Docker is able to take control of the full circumstances surrounding your program's execution.

Containers

When I use docker run with the name of an image and a command:

docker run ubuntu:22.04 ls
Note: When you’re using this kind of software that gives you limited visibility into how its abstract components are structured, it’s always good to learn the tools that do give you that visibility (like how we use git status and git log to inspect a git repository, since we can’t really look in the .git folder). For Docker, you’ll often use the docker ps command, which will show you the active containers on your system. Adding the -a flag will show you containers that don’t currently have any applications running but that are still sitting around, idle.
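In practice that looks like:

docker ps        # containers with a program currently running
docker ps -a     # all containers, including idle ones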

  • Docker will create a container, which you can imagine sort of as an empty workspace folder somewhere in my computer’s storage.
    • This won’t actually be a single folder you can go and inspect. It exists in the aether, kind of like data stored in a Postgres database or repository metadata stored in a .git folder. Obviously, data in a container must be stored somewhere on the computer, but don’t think too hard about where. The Docker software manages this for you.
  • Docker will “unzip” the contents of the image file (in this case, ubuntu:22.04) into that container.
    • If you don’t have the specified image on your computer, Docker will try to find that image in the cloud, searching on Docker Hub (a cloud service provided by the Docker corporation) by default.
    • Although this works like an “unzip” for practical purposes, in reality Docker performs some optimizations here; a new container won’t actually make copies of files from the image until it needs to.
  • Docker will run the command you specified (in this case, ls) “inside” that container.
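Putting those steps together, a first run on a machine that has never seen the image might go like this (a sketch; names and IDs will vary):

# Docker doesn't find ubuntu:22.04 locally, so it pulls it from Docker Hub,
# creates a container, "unzips" the image into it, and runs ls inside it
docker run ubuntu:22.04 ls

# The container sticks around afterward, idle, which you can confirm with
docker ps -a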

Running a command “inside” a container works using the interception behavior I described above. The arguments to docker run (like ls) tell Docker what program to start as the container’s process. Docker looks up that program (which will happen inside the container filesystem!) and attempts to run it. (If the image defines an entrypoint, your arguments get appended to it instead, but the lookup still happens inside the container.)
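If you’re curious what an image would run when you don’t pass any arguments, docker inspect can tell you (a sketch; for ubuntu:22.04 I’d expect no entrypoint and a default command of /bin/bash):

# Print the image's entrypoint and default command
docker image inspect --format '{{.Config.Entrypoint}} {{.Config.Cmd}}' ubuntu:22.04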

It’s not just filesystem read calls that get intercepted. Writes, too, will affect the virtual container filesystem. For example, running touch /file.txt inside the container will create file.txt in the root of the virtual filesystem, not at the root of your host machine.
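A quick experiment shows this:

# Create a file at the root of the container's virtual filesystem
docker run ubuntu:22.04 touch /file.txt

# The host's root directory is untouched; this reports "No such file or directory"
ls /file.txt

(And since each docker run creates a brand new container, a second run wouldn’t see file.txt either.)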

Docker performs other interceptions, too. For example, server software that binds to a TCP port will, when run inside a container, actually bind to a virtual port whose traffic is routed by Docker. But the most important feature, and the big thing to drive your intuition about containerization, is that Docker ensures that software running in a container will only ever see the virtual filesystem that’s being managed by Docker.
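For example (using the official nginx image purely as a stand-in for “some server software”):

# nginx binds to port 80 inside the container; -p asks Docker to route
# traffic arriving at port 8080 on the host into that container port
docker run --rm -p 8080:80 nginx

# From another terminal on the host, this request gets routed into the container
curl http://localhost:8080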