Broken by default: why you should avoid most Dockerfile examples

When it’s time to package up your Python application into a Docker image, the natural thing to do is search the web for some examples. And a quick search will provide you with plenty of simple, easy examples.

Unfortunately, these simple, easy examples are often broken in a variety of ways, some obvious, some less so. To demonstrate just some of the ways they’re broken, I’m going to:

  1. Start with an example Dockerfile that comes up fairly high on some Google searches.
  2. Show how it’s broken.
  3. Give some suggestions on how to make it less broken.

Broken by default

Consider the following Dockerfile, which I found by searching for Python Dockerization examples. I’ve made some minor changes to disguise its origin, but otherwise it is the same:

# DO NOT USE THIS DOCKERFILE AS AN EXAMPLE, IT IS BROKEN
FROM python:3

COPY yourscript.py /

RUN pip install flask

CMD [ "python", "./yourscript.py" ]

Some of the problems with this Dockerfile

How many different problems can you spot in this image?

Problem #1: Non-reproducible builds re Python version

The first thing to notice is that this Dockerfile is based off of the python:3 image. At the time of writing this will install Python 3.7, but at some point it will switch to installing Python 3.8.

At that point rebuilding the image will switch to a different version of Python, which might break the software: a minor change in your code can lead to a deploy that breaks production.

Solution: Use python:3.7.3-stretch as the base image, to pin the version and OS. Or, python:3.7-stretch if you’re feeling less worried about point releases. See my article on choosing a base image for Python for more details on image variants.
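
To keep a floating tag like python:3 from creeping back in, a small check could run in CI. The following is only a sketch: the function name unpinned_base_images and the rule that a pinned tag must start with at least MAJOR.MINOR are my own assumptions, not a Docker convention, and it ignores cases such as FROM scratch or multi-stage stage names.

```python
import re

# Assumed rule for this sketch: a tag counts as pinned if it starts with at
# least MAJOR.MINOR, e.g. "3.7-stretch" or "3.7.3-stretch". Tags such as "3"
# or "latest", or no tag at all, count as unpinned.
PINNED_TAG = re.compile(r"^\d+\.\d+")


def unpinned_base_images(dockerfile_text):
    """Return image references from FROM lines that look unpinned."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].upper() != "FROM":
            continue
        # Skip flags such as --platform=... that may precede the image.
        args = [part for part in parts[1:] if not part.startswith("--")]
        if not args:
            continue
        image = args[0]
        if "@" in image:
            # Pinned by digest, which is stricter than any tag.
            continue
        # Look for the tag only after the last "/", so that a registry port
        # (as in "localhost:5000/python:3") is not mistaken for a tag.
        _, _, tag = image.rsplit("/", 1)[-1].partition(":")
        if not PINNED_TAG.match(tag):
            unpinned.append(image)
    return unpinned
```

Called on the broken Dockerfile above, this would report python:3; called on a Dockerfile that starts from python:3.7.3-stretch, it would report nothing.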

Problem #2: Non-reproducible builds re dependencies

Similarly, flask is installed with no versioning, so each time the image is rebuilt a new version of flask (or of one of its dependencies, or of one of its dependencies’ dependencies) might be installed. If the new versions are compatible, great, but there’s no guarantee that is the case.

Solution: Create requirements.txt with transitively-pinned versions of all dependencies, e.g. by using pip-tools, poetry, or Pipenv.
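
As a rough illustration of what “pinned” means here, this sketch flags lines in a requirements.txt that lack an exact == version. The function name unpinned_requirements is hypothetical, the parsing is deliberately simplistic (it does not handle URLs containing #, for example), and it only checks the lines that are present. Making sure every transitive dependency is actually listed is the job of tools like pip-tools.

```python
def unpinned_requirements(requirements_text):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw_line in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt" or "--hash=...".
        if not line or line.startswith("-"):
            continue
        # Ignore any environment marker after ';' when looking for the pin.
        requirement = line.split(";", 1)[0]
        if "==" not in requirement:
            unpinned.append(line)
    return unpinned


# Illustrative input only; the version numbers are arbitrary examples.
example = """\
flask==1.0.2
click>=7.0  # a range, not an exact pin
jinja2
"""
print(unpinned_requirements(example))
```

Here the flask line passes, while the click and jinja2 lines are reported because neither names one exact version.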

Problem #3: Changes to source code invalidate the build cache

If you want fast builds, you want to rely on Docker’s layer caching. But because yourscript.py is copied in before pip install runs, every change to that file invalidates all the later layers, so the dependencies are reinstalled on every code change.

Solution: Copy in files only when they’re first needed.
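
To see why ordering matters, here is a toy model of layer caching in Python. It is not how Docker implements its cache; it only assumes the two properties that matter for this argument: a COPY step’s cache key depends on the contents of the copied file, and every step’s key depends on the step before it. The names layer_keys and rebuilt_steps, and the file contents, are made up for illustration.

```python
import hashlib


def layer_keys(steps, files):
    """Return one cache key per step in a toy model of layer caching.

    Each key depends on the previous key and the instruction text; for COPY
    steps it also depends on the contents of the copied file.
    """
    keys = []
    parent = ""
    for instruction, copied_file in steps:
        content = files[copied_file] if copied_file else ""
        digest = hashlib.sha256(
            "\n".join([parent, instruction, content]).encode("utf-8")
        ).hexdigest()
        keys.append(digest)
        parent = digest
    return keys


def rebuilt_steps(steps, old_files, new_files):
    """List the instructions whose cache key changes between two builds."""
    old_keys = layer_keys(steps, old_files)
    new_keys = layer_keys(steps, new_files)
    return [
        instruction
        for (instruction, _), old, new in zip(steps, old_keys, new_keys)
        if old != new
    ]


# Order used by the broken Dockerfile: source code first, then dependencies.
code_first = [
    ("COPY yourscript.py /", "yourscript.py"),
    ("RUN pip install flask", None),
]
# Reordered: dependencies first, source code last.
code_last = [
    ("COPY requirements.txt /tmp/", "requirements.txt"),
    ("RUN pip install -r /tmp/requirements.txt", None),
    ("COPY yourscript.py .", "yourscript.py"),
]

# Only the source file changes between the two builds.
before = {"yourscript.py": "print('v1')", "requirements.txt": "flask==1.0.2"}
after = {"yourscript.py": "print('v2')", "requirements.txt": "flask==1.0.2"}

print(rebuilt_steps(code_first, before, after))
print(rebuilt_steps(code_last, before, after))
```

In this model, editing yourscript.py forces both steps to rerun with the first ordering, including the pip install, but only the final COPY with the second ordering.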

Problem #4: Running as root, which is insecure

By default Docker containers run as root, which is a security risk.

Solution: It’s much better to run as a non-root user, and to set that user in the image itself, so that the application is built from the start not to listen on ports below 1024 or do other operations that require a subset of root’s permissions.
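
As an optional extra on top of setting the user in the image, the application itself could refuse to start as root. This is a minimal sketch, not something the original Dockerfile does: running_as_root and refuse_root are hypothetical helper names, and os.geteuid() is only available on Unix-like systems, which includes Linux containers.

```python
import os


def running_as_root(euid=None):
    """Return True if the given (or current) effective user ID is root's."""
    if euid is None:
        euid = os.geteuid()  # Unix-only
    return euid == 0


def refuse_root(euid=None):
    """Raise an error instead of starting the application as root."""
    if running_as_root(euid):
        raise RuntimeError("Refusing to run as root; set USER in the Dockerfile.")
```

Calling refuse_root() at the top of yourscript.py would turn a forgotten USER line into an immediate, visible failure rather than a silent security risk.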

A somewhat better image

Here’s a somewhat better—though still not ideal—Dockerfile that addresses the issues above:

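# Problem #1: pin the Python version and the OS.
FROM python:3.7.3-stretch

# Problems #2 and #3: install pinned dependencies before copying in the
# source code, so that code changes don't invalidate this layer.
COPY requirements.txt /tmp/
RUN pip install -r /tmp/requirements.txt

# Problem #4: create a non-root user and switch to it.
RUN useradd --create-home appuser
WORKDIR /home/appuser
USER appuser

COPY yourscript.py .

CMD [ "python", "./yourscript.py" ]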

Even if the resulting image were something you’d want to run in production (and it almost certainly isn’t!), the image is still insufficient on its own.

For example, you also need to regularly update requirements.txt in a controlled manner, in order to get security updates and bug fixes, and you’ll need to regularly rebuild your images without caching to get security updates.

Note: Outside any specific best practice being demonstrated, the Dockerfiles in this article are not examples of best practices, since the added complexity would obscure the main point of the article.

Be careful what you learn from

A broken Docker image can lead to production outages, and building best-practices images is a lot harder than it seems. So don’t just copy the first example you find on the web: do your research, and spend some time reading about best practices.

As a starting point, I recommend the official Dockerfile best practices documentation, Hynek’s articles, and of course the various articles on this site about Docker packaging for Python.