
Building images

If changes made to a container don’t stick around, we can’t just pop into bash and start npm installing things to set up our application. How do we distribute packaged software to other people with Docker?

It’s not typically instances of containers that we hand to other people – instead, it’s pre-built images that have everything you need to run some software. Users will download our images and use them to create containers that run the software.

In the earlier examples, the images we used were just base operating system images, without any additional software installed. In the real world, if we’re building and deploying software with Docker, we don’t want our users to have to load Bash in a container and replicate a bunch of dependency build steps. It’d be better to just hand users an image that already has everything installed, so that they can create a container with the image contents and get started quickly.

Images aren’t only useful for sending software to other people. If you’re building something you want to deploy as server software, having a consistent way to recreate an exact copy of your environment enables you to port your software between physical machines (in case your datacenter catches fire) or quickly add your software to new machines (in case you need to spin up more capacity for your web app) without running into trouble.

Making a fully-loaded image is pretty easy, fortunately. Images come in layers, so we can start with an existing image (like ubuntu:22.04) and layer additional changes on top. Let’s do this for a toy example to get a feel for it.

A simple example

We’ll Dockerize a simple Python application with a couple of dependencies. Here’s the application: it takes in a URL, grabs the page’s title, and prints it to the console.

title.py
import sys
import requests
from bs4 import BeautifulSoup

# fetch the page at the URL given as the first command-line argument
response = requests.get(sys.argv[1])

# parse the HTML and print the contents of the <title> tag
soup = BeautifulSoup(response.content, 'html.parser')
print(soup.title.get_text().strip())

This application requires that we install the requests and beautifulsoup4 dependencies, and of course it’ll also require that we have Python installed.
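
If you’d like to sanity-check the script outside Docker first (assuming you already have Python and pip installed on your machine), you can install the dependencies and run it directly:

$ pip3 install requests beautifulsoup4
$ python3 title.py 'http://example.com'
Example Domain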

I’ve saved this as title.py in an empty folder. Let’s also create a file called Dockerfile, which is a script in a Docker-specific format that defines how an image is built. Writing this out in a script, rather than just doing ad hoc installations from a terminal inside a container, is how we document our dependency installation process.

note

You don’t need to have a starting image; if you start your Dockerfile with FROM scratch, you’ll start with a totally empty filesystem. You probably won’t ever want to do this, because your code will need to run on some operating system. FROM scratch is used by the people who build those OS base images.
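
For illustration, a scratch-based Dockerfile usually just copies a statically-linked binary into the empty filesystem. The binary name here is hypothetical, and ENTRYPOINT is a directive we’ll meet later in this section:

Dockerfile
FROM scratch
COPY my-static-binary /my-static-binary
ENTRYPOINT ["/my-static-binary"]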

We’ll start with a FROM command, which tells Docker which image we’re using as a base.

Dockerfile
FROM ubuntu:22.04

I often use Ubuntu base images, if only because my daily-use computers run Ubuntu, so they generally work how I expect (with respect to, say, package installation). Alpine Linux is also popular as a base, since it tries to be super lightweight, installing by default only the bare minimum that you need. This FROM directive, like the docker run command mentioned earlier, will look for images on Docker Hub if it can’t find an image with the appropriate name locally.
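
If you’d rather start from Alpine, only the base image and the package-manager commands change. A sketch (using the RUN directive introduced just below, with package names from a recent Alpine release, which uses apk instead of apt):

Dockerfile
FROM alpine:3.18
RUN apk add --no-cache python3 py3-pip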

During the build step, Docker will spin up a temporary build container whose initial contents match the contents of the base image. From there, we can run commands within the new container. For example, we need to install Python using Ubuntu’s package manager:

Dockerfile
FROM ubuntu:22.04

RUN apt update
RUN apt install -y python3 python3-pip

The RUN directive tells Docker to just run these commands straight in the build container. Let’s go ahead and try building the image now:

$ docker build --tag docker-example .

We give the docker build command a “tag” name for the image, in this case docker-example, and the folder (., the current working directory) where Docker can find the Dockerfile and associated build files.

This will take a moment to build, since it’s actually running apt update and apt install on a fresh install of Ubuntu. If you haven’t downloaded the ubuntu:22.04 base image yet, that’ll also take a moment.
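
To confirm the image landed locally, docker image ls will list it (the ID, timestamp, and size shown here are placeholders; yours will differ):

$ docker image ls docker-example
REPOSITORY       TAG       IMAGE ID       CREATED         SIZE
docker-example   latest    0123456789ab   5 seconds ago   475MB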

Once the build process finishes, we’ll have a new image called docker-example available to us. If we run Bash in it like we did before, we’ll be able to use Python inside the image:

$ docker run -it docker-example bash
root@c2f320bbc31b:/# python3
Python 3.10.12 (main, Jun 11 2023, 05:26:28) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

This wouldn’t have worked in the raw ubuntu:22.04 image, which doesn’t come with Python installed. Already, we’ve created a useful image! There’s more work to do, though. Let’s also install our script’s dependencies:

Dockerfile
FROM ubuntu:22.04

RUN apt update
RUN apt install -y python3 python3-pip

RUN pip3 install requests beautifulsoup4

Now we can run the same docker build command:

$ docker build --tag docker-example .
[+] Building 3.4s (8/8) FINISHED
=> [internal] load .dockerignore
=> => transferring context: 2B
=> [internal] load build definition from Dockerfile
=> => transferring dockerfile: 160B
=> [internal] load metadata for docker.io/library/ubuntu:22.04
=> [1/4] FROM docker.io/library/ubuntu:22.04
=> CACHED [2/4] RUN apt update
=> CACHED [3/4] RUN apt install -y python3 python3-pip
=> [4/4] RUN pip3 install requests beautifulsoup4
=> exporting to image
=> => exporting layers
=> => writing image sha256:1e23a961b23661bf0a137831cd89f4889df4f4691d67b7ac6086a6977e4ea03e
=> => naming to docker.io/library/docker-example:latest

This time, even though we still have those apt steps in our Dockerfile, the build only took a few seconds. You’ll see that the build command outputs a line for each step of the Dockerfile ([1/4], [2/4], etc.). After running each directive, the build process will capture the new contents of the build container and save them to a new image. These intermediate steps from the previous build have been cached, meaning Docker can reuse partial results from previous builds.

Since the only new command in this build was the final one, that’s all Docker needed to run to create the new, updated docker-example image. Docker only reuses a cached layer when it can tell the step hasn’t changed: a RUN directive’s command text must be identical, a COPY’s source files must have the same contents, and every step before it must also have come from the cache. In the rare cases where you really do want Docker to build everything from scratch, you can pass --no-cache to the build command, but most of the time the cache will save you time as you build up your Dockerfile layer by layer.
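
For completeness, a full, uncached rebuild looks like this (expect it to take as long as the first build did):

$ docker build --no-cache --tag docker-example .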

Now, let’s actually include our title.py script in the final Docker image.

Dockerfile
FROM ubuntu:22.04

RUN apt update
RUN apt install -y python3 python3-pip

RUN pip3 install requests beautifulsoup4

COPY title.py /title.py

There’s no command we can RUN inside the isolated build container to grab our code from the host machine, so here we use Docker’s COPY directive to copy a file from the host machine (in the same directory as the Dockerfile, in this case) to a particular path in the container. Here, I’ve just chosen to copy straight to the root of the container. Throwing stuff in the root isn’t a good practice on a typical Linux install, but this container will only ever be used to run our script, so it’s not a big deal. Later, when we have more than just the one file, we’ll have our Dockerfile put our code in a subdirectory of the root.
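
As a sketch of what that later, multi-file layout might look like (an excerpt, not a full Dockerfile; WORKDIR is a directive that sets the working directory for subsequent build steps and for containers made from the image):

Dockerfile
WORKDIR /app
COPY . .

With that in place, our code would live under /app instead of cluttering the root.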

We can run the build command again, and in this case it should be pretty quick, since it’s just copying a small file into the container after using pre-cached versions of the layers that result from the previous Dockerfile lines.

We could have done this COPY earlier, by the way, right after the FROM line, if we wanted to; the result would be the same. But if we made any changes to title.py down the road, the build step would enter unseen territory right after the COPY command. Since Docker doesn’t know whether the later commands (like apt and pip3) depend on the contents of title.py, it will have to re-run them every time title.py changes. By putting the COPY command at the end, we can leverage the build cache more effectively, since the build step can see that nothing has changed until the very end of the Dockerfile.
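
For contrast, here’s the early-COPY ordering described above. It produces the same image, but any edit to title.py would force the apt and pip3 steps to re-run on the next build:

Dockerfile
FROM ubuntu:22.04

COPY title.py /title.py

RUN apt update
RUN apt install -y python3 python3-pip

RUN pip3 install requests beautifulsoup4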

Now, the Docker image is ready to be used. Let’s try a docker run again, first just with an interactive Bash shell so we can have a look around:

$ docker run -it docker-example bash
root@a06733e9dffe:/# ls
bin boot dev etc home lib lib32 lib64 libx32 media mnt opt proc root run sbin srv sys title.py tmp usr var
root@a06733e9dffe:/# python3 title.py 'http://example.com'
Example Domain
root@a06733e9dffe:/#

It works! title.py is in the root directory of the container’s filesystem, and we can use Python to run the program.

We can run the command in one go, too, if we’d like:

$ docker run docker-example python3 /title.py 'http://bid.berkeley.edu'
BiD | Home

One thing we can put in our Dockerfile to make life a little easier is an ENTRYPOINT directive, which will tell Docker which command to run in a container made from this image:

Dockerfile
FROM ubuntu:22.04

RUN apt update
RUN apt install -y python3 python3-pip

RUN pip3 install requests beautifulsoup4

COPY title.py /title.py

ENTRYPOINT ["python3", "/title.py"]

After rebuilding, using docker run will run python3 /title.py inside the container, passing in any additional arguments from our docker run command straight into the entrypoint command.

$ docker run docker-example 'http://bid.berkeley.edu'
BiD | Home
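
If you ever need an interactive shell in an image that declares an entrypoint, the --entrypoint flag on docker run overrides it:

$ docker run -it --entrypoint bash docker-example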

Uploading the image

When we’re finished building our image, we can upload it to Docker Hub with docker push if we want to share it with others. This can be handy, since now people won’t even have to pull our source code if they’d like to run our software; they can use the docker run command as above, and Docker will automatically download and run the image. If we used a base image that the user has downloaded before, Docker will even avoid re-downloading that particular layer, fetching instead just the (smaller) patches to the image from each subsequent layer.
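
Two prerequisites: the image’s tag needs to start with your Docker Hub username, and you’ll need to authenticate once with docker login before Docker Hub will accept a push:

$ docker login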

$ docker build --tag timothyaveni/docker-example . # here, timothyaveni is my Docker Hub username
$ docker push timothyaveni/docker-example

# on another computer:
$ docker run timothyaveni/docker-example 'http://bid.berkeley.edu'
Unable to find image 'timothyaveni/docker-example:latest' locally
latest: Pulling from timothyaveni/docker-example
43f89b94cd7d: Pull complete
8750b1a9733b: Pull complete
366d12aa2e9d: Pull complete
4eed9fd58dc6: Pull complete
7e4d40691576: Pull complete
Digest: sha256:a3a75c50946ecf8c168a3333ef0ee2abbf74a243d8620e1d46b6332ef9577d03
Status: Downloaded newer image for timothyaveni/docker-example:latest
BiD | Home

If you have certain groups of build steps that you’re using a lot, you can even use your own local or uploaded images in the FROM directive of other images.
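
For example, a second project could build on the image we just pushed (other-tool.py here is a hypothetical script, just to sketch the idea):

Dockerfile
FROM timothyaveni/docker-example
COPY other-tool.py /other-tool.py
ENTRYPOINT ["python3", "/other-tool.py"]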

Consistency

Okay, so what did we get out of using Docker here? Now, we can be confident that someone else who uses our pre-built image will get the Python script to run. No risk of a Python version mismatch, or of the installed BeautifulSoup being incompatible with the version we used in development. Even in situations where switching versions isn’t too much of a pain, by building an image, we’ve made it so our end user doesn’t have to think about it (or even about what language we built the software with). The container will have the right version installed, and it will coexist in isolation from all the other containers on the system.

One thing Docker did not do for us is lock our dependencies for the next time we build the image. If I run the docker build command five years from now, I’ll get a different version of Python and a different version of the Python libraries, and it’s possible my little script won’t work anymore. That’s why it’s important to make the distinction between the built Docker image and the Dockerfile that specifies how the image should be built. Although using a Dockerfile makes image builds a lot more reproducible (by defining a base image and specifying every transformation necessary to build up to the final image), the build process is only as deterministic as the Dockerfile commands are. We’d probably still want to lock down our dependency versions for future builds, e.g. with a requirements.txt file. Our base image may also change slightly; ubuntu:22.04 will receive occasional updates, but ubuntu:jammy-20231004 refers to a particular build of Ubuntu.
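
A sketch of what that pinning might look like; these version numbers are illustrative, not necessarily the ones this article was built against:

requirements.txt
requests==2.31.0
beautifulsoup4==4.12.2

The COPY/RUN pair below would then replace the plain pip3 install line in our Dockerfile:

Dockerfile
COPY requirements.txt /requirements.txt
RUN pip3 install -r /requirements.txt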

At least the final built image, once we’ve verified it works, is locked in place (although this means it won’t even receive security updates). If your software project is active, it’s good to rebuild your image against newer dependencies every once in a while, making sure everything still works; even if you don’t, though, other users or your deployment server can always use your older images.

It is possible for someone else to rebuild your image using your Dockerfile, and in fact it’ll probably work most of the time if you’re not doing anything too weird. Codifying your dependencies in a Dockerfile, even an imperfect one, is better than the free-for-all of hoping you keep a README up to date with your system configuration. Just keep in mind that anyone building an image needs to have the same files on their computer that you had when you wrote your Dockerfile so that those files can be copied into the image (like title.py was in our example).