Moving data around

Containers help us isolate software from the host machine, but that’s not always what we want. Consider this program, which takes in a JPG and outputs a copy of that file converted to grayscale:

convert.py
import sys
from PIL import Image

image = Image.open(sys.argv[1])
gray_image = image.convert('L')
gray_image.save('gray.jpg')

To run this without Docker, we'd just pass our input photo to the script, as long as its one dependency (Pillow) is installed.

$ python3 convert.py input.jpg

We can write a Dockerfile as before, which will be quick to build, since Docker cached some of the earlier steps for us during our last project:

Dockerfile
FROM ubuntu:22.04

RUN apt update
RUN apt install -y python3 python3-pip

RUN pip3 install Pillow

COPY convert.py /convert.py

ENTRYPOINT ["python3", "/convert.py"]

Working directory
Dockerfile
convert.py
matilda.jpg

Okay, let’s give it a shot with this photo of a cat (by Eduardo Gorghetto on Unsplash):

[Image: matilda.jpg (cute cat in color)]

$ docker build --tag docker-convert-grayscale .
$ docker run docker-convert-grayscale matilda.jpg
Traceback (most recent call last):
  File "/convert.py", line 4, in <module>
    image = Image.open(sys.argv[1])
  File "/usr/local/lib/python3.10/dist-packages/PIL/Image.py", line 3243, in open
    fp = builtins.open(filename, "rb")
FileNotFoundError: [Errno 2] No such file or directory: 'matilda.jpg'

Right. Although matilda.jpg is in my working directory on the host machine, it’s never getting copied into the container’s virtual filesystem.
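To make that failure easier to read, a minimal sketch of a diagnostic helper follows. The function name `explain_missing_file` is hypothetical and not part of convert.py; it only reports where a relative path was resolved and which files are visible in that directory. Run inside the container, the listing would show the container's files rather than the host's.

```python
import os


def explain_missing_file(path):
    """Describe where `path` was looked up and what is visible there.

    Hypothetical debugging helper, not part of the original convert.py.
    """
    # Relative paths resolve against the process's current directory,
    # which inside a container belongs to the container's filesystem.
    resolved = os.path.abspath(path)
    folder = os.path.dirname(resolved)
    visible = sorted(os.listdir(folder)) if os.path.isdir(folder) else []
    return f"{path!r} resolved to {resolved!r}; files visible there: {visible}"


# Example (assumed usage): call this before Image.open when the file is absent.
# if not os.path.exists(sys.argv[1]):
#     print(explain_missing_file(sys.argv[1]))
```

The helper only reads the filesystem, so it is safe to call from either the host or a container to compare what each one can see.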

We could copy the photo into the image from the Dockerfile:

Dockerfile
FROM ubuntu:22.04

RUN apt update
RUN apt install -y python3 python3-pip

RUN pip3 install Pillow

COPY convert.py /convert.py

ENTRYPOINT ["python3", "/convert.py"]

COPY matilda.jpg /matilda.jpg

But now our Docker image can only be used for this one photo. If someone else were to download the Docker image from Docker Hub, they wouldn’t be able to use our program on their own photos; they’d have to rebuild the Docker image using a modified Dockerfile every time, which (as we discussed earlier) isn’t great for reproducibility. Ideally, the docker run command would work just like running python3 convert.py straight on our host system (but without the pain of installing dependencies manually).

We can get a little closer by using our first Docker image (without pre-loading a photo), then manually copying the photo into a container before running the script. First we’d create a new container from the image, then copy our photo in, then run the script, then copy the created photo out. Let’s try it.

Unfortunately, our docker run technique to create a new container is awkward here, since it will also immediately run the command we specified in the entrypoint. Since the photo doesn’t exist in the container yet, this will give us an error before we’ve even gotten a chance to add the photo.

Instead, we can create a new container from a Docker image without starting the container:

$ docker create docker-convert-grayscale /matilda.jpg
26fdc21bc3971a132c7b0678ae283e4ad4a1f2672c60b8eef0b0a92efb4a1ef3

You’ll notice that we had to provide a /matilda.jpg argument to the docker create command. This is the argument that will be passed to the entrypoint when the container does start, which will happen later.

The output from docker create is the new container's ID. We can also find it in the output of docker ps -a. Now we can copy our file into the container, which hasn't started yet (any unique prefix of the container ID works; the first 12 characters are what docker ps displays):

$ docker cp ./matilda.jpg 26fdc21bc397:/

This works similarly to how we’d copy files to a remote machine using a command like scp or rsync. That [container]:[path] syntax is how we specify where the file should go.

Now we can run the script in the same container. Instead of using docker run, which will create a new container, we use docker start to run the entrypoint command in the existing container.

$ docker start 26fdc21bc397

If we wait a moment and use docker ps -a again, we’ll see that the container has stopped already (its status is “Exited”). That’s because the entrypoint command (python3 /convert.py /matilda.jpg) finished running.

We can grab the final image out of the container now:

$ docker cp 26fdc21bc397:/gray.jpg .

Working directory
Dockerfile
convert.py
matilda.jpg
gray.jpg

And now, in our host machine’s working directory, we have gray.jpg:

[Image: gray.jpg (still a cute cat, but now in grayscale)]
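
The four manual steps (create, copy in, start, copy out) could be scripted. The sketch below is one possible wrapper, not part of the original project. It makes several assumptions: the function names and the container name `grayscale-job` are hypothetical, the photo is a plain filename in the current directory, and it uses `docker create --name` so that later steps can refer to the container by name instead of reading the generated ID. It also passes `--attach` to `docker start`, so the script waits for the conversion to finish before copying the result out.

```python
import subprocess


def build_commands(image, photo, container="grayscale-job"):
    """Return the four docker commands from the walkthrough, in order.

    `container` is a hypothetical name given to `docker create --name`;
    `photo` is assumed to be a plain filename in the current directory.
    """
    return [
        # 1. Create the container without starting it.
        ["docker", "create", "--name", container, image, f"/{photo}"],
        # 2. Copy the photo into the container's root directory.
        ["docker", "cp", f"./{photo}", f"{container}:/"],
        # 3. Run the entrypoint and wait for it to exit.
        ["docker", "start", "--attach", container],
        # 4. Copy the converted photo back to the host.
        ["docker", "cp", f"{container}:/gray.jpg", "."],
    ]


def run_workflow(image, photo, container="grayscale-job"):
    """Run each step, stopping at the first failure (requires Docker)."""
    for command in build_commands(image, photo, container):
        subprocess.run(command, check=True)


# Example (not executed here):
# run_workflow("docker-convert-grayscale", "matilda.jpg")
```

Keeping the command construction separate from execution makes it possible to inspect or print the commands without Docker installed. A second run with the same container name would fail at the create step until the old container is removed.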