Bind mounts

So that works, but it’s a bit of a pain. It’d be better if the container and host could share a filesystem directory. That way, the container would be able to operate directly on files on the host machine, rather than us needing to copy things in and out.

This would let us keep our data on the host machine, while our code (and dependencies) live inside the container.

By making sure that container filesystems contain only code (and not data), we’d also be able to delete containers once they’ve stopped running, which is important to make sure we don’t accrue a bunch of dead container clutter over the course of a long time using Dockerized software. In fact, Docker typically assumes that stopped containers are safe to clean up and rebuild from their image files, so if you get in the habit of storing data inside container filesystems instead of somewhere more persistent, you may be in for a rude awakening when a stray command casually deletes a stopped container (don’t ask me how I know).

If we treat containers as disposable, we can rebuild our image (say, to change our program’s code) and swap out the old container. This is important especially for long-running applications (like a web server that stores and loads data on disk), since the new container may still want to access data created by an older version of the container. It’s important to rebuild images each time we make a change (instead of just trying to incrementally make updates in a long-lived container), since this ensures the Dockerfile stays in sync as accurate documentation of how to construct a container from scratch.
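Docker has built-in support for this disposable style of working: the `--rm` flag deletes a container as soon as it exits, and `docker container prune` cleans up any stopped containers that have accumulated (both are real flags/commands; the image name is the one we build later in this section):

```shell
# Delete the container automatically once the command finishes
docker run --rm docker-convert-grayscale matilda.jpg

# Or clean up all stopped containers in one go (asks for confirmation)
docker container prune
```
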

To create a “portal” between the host machine and a container, we can use a bind mount:

note

Docker’s documentation recommends alternative syntax for these mounts, but I haven’t really found it to be better. I mostly use the syntax described here, though most of my mounting happens using Docker Compose, which we’ll discuss later.
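For reference, that alternative is the `--mount` flag, which is more verbose but spells everything out. The equivalent of the `-v` command below would look something like:

```shell
docker run --mount type=bind,source="$(pwd)"/data,target=/app-data [image] [args]
```

One real difference: unlike `-v`, `--mount` errors out if the host directory doesn’t already exist, rather than creating it for you.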

$ docker run -v $(pwd)/data:/app-data [image] [args]

When this command is run, Docker will set up a bidirectional mount between our host and the underlying container. In particular, this command uses a [host]:[container] path format, binding the host’s data folder in the working directory to the container’s /app-data folder (creating a folder called data on the host if it doesn’t exist already). Since the mount needs an absolute path, we used $(pwd)/data on the host, which will expand to the absolute path of the data folder in the working directory.

Since our grayscale conversion script outputs straight to the root folder of the container, let’s make a small change and rebuild the image:

convert.py
import sys
from PIL import Image

# Open the image passed as the first argument, convert it to
# grayscale ('L' = single-channel luminance), and write the
# result into the bind-mounted /app-data folder.
image = Image.open(sys.argv[1])
gray_image = image.convert('L')
gray_image.save('/app-data/gray.jpg')
Working directory
Dockerfile
convert.py
data/
    matilda.jpg
    gray.jpg

Now, we can run our script, first placing matilda.jpg in the host’s ./data folder:

$ mkdir data
$ mv matilda.jpg data/
$ docker run -v $(pwd)/data:/app-data docker-convert-grayscale /app-data/matilda.jpg
$ ls ./data
gray.jpg matilda.jpg

When the Python script attempts to read the file /app-data/matilda.jpg, our host’s file ./data/matilda.jpg gets read directly. When Python writes back to /app-data, it’s actually just writing to that same ./data folder on our host. Anything written outside /app-data in the container (like, say, to a log file in /var/log) is still not synchronized, living only in the ephemeral filesystem of the container. The next time we use docker run with this image and bind mount, we’ll get a fresh container, but it will still have access to everything in our host’s ./data folder.

Bind mounts can be handy for lots of purposes, not just running utility scripts. You might use them to keep your database storage in a fixed spot on your host machine, or to communicate between containers (although best practices vary). You can also configure bind mounts to be read-only to the container, in case you only want to send data into the container and don’t need it to spit anything out.
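Making a bind mount read-only just takes an `:ro` suffix on the `-v` spec:

```shell
docker run -v "$(pwd)/data:/app-data:ro" docker-convert-grayscale /app-data/matilda.jpg
```

(Our grayscale script would actually fail with this particular mount, since it writes its output to /app-data — but `:ro` is handy for configuration files and other input-only data.)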

When using bind mounts for utility scripts like this, there are a few things to keep in mind. First, your container is often running things as the root user by default; although in a traditional deployment we tend to avoid this, Docker’s containerization provides many of the same protections that we’d normally get by forcing software to run as a particular user. The downside is that files will get written to the bind mount as the root user, which means you may need to change their file ownership before you start using them. It’s possible to ask docker run to run the underlying command as a particular user, or simply to switch users inside the container, but this is another step you’ll have to take.
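The docker run option in question is `--user`. To have files land in the mount owned by your own user instead of root, you can pass your host UID and GID:

```shell
docker run --user "$(id -u):$(id -g)" \
  -v "$(pwd)/data:/app-data" \
  docker-convert-grayscale /app-data/matilda.jpg
```

One caveat: that UID/GID pair usually won’t correspond to a named user inside the container, which is fine for simple scripts but can trip up software that tries to look up its own user.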

The other thing that can be annoying about this kind of use is that paths passed as command arguments need to be written from the container’s point of view. In our docker run command above, we had to refer to /app-data/matilda.jpg, a path that only exists inside the container. You may be able to improve the experience by modifying the script and the mount to use the current working directory, but the mismatch between host and container filesystems is always lurking in the background.
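One common version of that trick is to mount the current directory itself and use docker run’s `-w` flag to set the container’s working directory to match, so relative paths mean the same thing on both sides:

```shell
docker run -v "$(pwd):/work" -w /work docker-convert-grayscale matilda.jpg
```

(This assumes the script is changed to write its output relative to the working directory rather than the hard-coded /app-data path.)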

Since the mount information is provided at the time of creating the container rather than in the Dockerfile, every user can choose their own mount configuration. But every user has to choose their mount configuration. With multiple volume binds and maybe some other configuration options (like environment variables and bound ports), a docker run command can get pretty beefy.

I’ve taken to putting them in a run.sh file that takes just a few arguments and constructs the docker run command with all the binds set up correctly. Otherwise, you end up relying on README documentation just to see which docker run options you’re expected to use. That isn’t quite as painful as using the README to figure out which system packages to install, but it still leaves a bad taste in my mouth: part of Docker’s appeal is that users shouldn’t have to think too hard about how to run your script. Later, when we talk about Docker Compose, we’ll see another way to simplify running your containers.
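As a sketch, a hypothetical run.sh for our grayscale converter might bundle up the bind mount like this (the script name and echo message are my own additions, not from the original image):

```shell
#!/bin/sh
# run.sh -- hypothetical wrapper for the grayscale converter.
# Usage: ./run.sh <image-file>
set -eu

input="${1:?usage: ./run.sh <image-file>}"

# Make sure the shared folder exists and contains the input file
mkdir -p data
cp "$input" data/

# All the docker run plumbing lives here, not in the README
docker run --rm \
  -v "$(pwd)/data:/app-data" \
  docker-convert-grayscale "/app-data/$(basename "$input")"

echo "Output written to ./data/gray.jpg"
```
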