Lighten your Python image with Docker multi-stage builds
I will explain the basics of Docker multi-stage builds needed to follow this post, but I won't repeat the documentation (see further reading).
⚙️ Multi-stage builds
Basically, a multi-stage build lets you use several images sequentially in one Dockerfile and pass data between them.
This is especially useful for projects in statically compiled languages such as Go, in which the output is a completely standalone binary: you can use an image containing the Go toolchain to build your project and copy your binary to a barebones image to distribute it.
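A minimal sketch of such a build, assuming a Go module whose main package sits at the root of the build context. The golang:1.15 tag, the stage name builder and the /bin/app path are illustrative assumptions:

```dockerfile
# Stage 1: compile the project with the full Go toolchain.
FROM golang:1.15 AS builder
WORKDIR /src
COPY . .
# CGO_ENABLED=0 produces a statically linked binary, which is what
# lets it run on an image that has no libc at all.
RUN CGO_ENABLED=0 go build -o /bin/app .

# Stage 2: start over from an empty image and keep only the binary.
FROM scratch
COPY --from=builder /bin/app /app
ENTRYPOINT ["/app"]
```

Everything the first stage downloaded or compiled is left behind; only the file named in COPY --from reaches the final image.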
This example [1] produces a working Docker image containing only the binary built from the project. It also perfectly illustrates the basics of multi-stage builds.
Notice the second FROM instruction? It tells Docker to start again from a new image, as at the beginning of a build, except that the new stage has access to the final filesystem of every previous stage.
COPY --from is used to retrieve the built binary from the first stage.
In this extreme case, the final image weighs nothing more than the binary itself, since scratch is a special empty image with no operating system.
🐍 Applying to Python & Poetry
Install the dependencies
Let's start with a basic Dockerfile with a single stage that will just install this blog's dependencies and run the project. [2]
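A single-stage Dockerfile along those lines might look like the sketch below. It assumes poetry is installed with pip (unpinned here, pin it in a real build), that the virtual environment is created inside the project directory, and that the project starts with a hypothetical python -m blog command:

```dockerfile
# Single stage: build tools, poetry and the application all end up
# in the final image.
FROM python:3.8.6-buster

# Assumption: poetry installed with pip; pin the version in practice.
RUN pip install poetry

WORKDIR /app

# Assumption: keep the virtual environment in /app/.venv.
ENV POETRY_VIRTUALENVS_IN_PROJECT=true

# Copy only the dependency manifests first, so this layer and the
# install below stay cached until one of the two files changes.
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --no-interaction

# Copy the source code last: editing it does not invalidate the
# dependency layer above.
COPY . .

# Hypothetical entry point; replace with the project's real command.
CMD ["poetry", "run", "python", "-m", "blog"]
```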
It's already not that bad! We are taking advantage of the cache by copying only the files that describe our dependencies before installing them, and the Dockerfile is easy to read.
Now, the attack surface of our final image could be reduced: we're using a full Debian buster with all the build tools included, and we have poetry installed in our image when we don't need it at runtime.
We'll add another stage to this build. First, we will install poetry and the project's dependencies, and in a second stage we will copy the virtual environment and our source code.
Multi-staged dependencies & code
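As a sketch of the two stages just described, under the same assumptions as before (pip-installed poetry, an in-project virtual environment in /app/.venv, a hypothetical python -m blog entry point) and using the python:3.8.6-slim-buster tag for the runtime stage:

```dockerfile
# Stage 1: full image with compilers, used only to install dependencies.
FROM python:3.8.6-buster AS builder
RUN pip install poetry
WORKDIR /app
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
COPY pyproject.toml poetry.lock ./
RUN poetry install --no-root --no-interaction

# Stage 2: slim runtime image, without poetry or build toolchains.
FROM python:3.8.6-slim-buster
WORKDIR /app

# Bring over only the installed dependencies. Both images keep Python
# under /usr/local, so the interpreter the virtual environment points
# to still resolves.
COPY --from=builder /app/.venv ./.venv

# Assumption: a .dockerignore excludes any local .venv directory, so
# this copy does not overwrite the one taken from the builder stage.
COPY . .

# Put the virtual environment first on PATH instead of calling poetry.
ENV PATH="/app/.venv/bin:$PATH"

# Hypothetical entry point; replace with the project's real command.
CMD ["python", "-m", "blog"]
```

The virtual environment is copied to the same path it was created at, since the scripts inside it embed absolute paths.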
See? We didn't have to change much but our final image is already much slimmer!
Without accounting for what we install or add inside, the base python:3.8.6-buster image weighs 882 MB vs. 113 MB for the slim version. Of course, this comes at the expense of many tools, such as build toolchains [3], but you probably don't need them in your production image. [4]
Your ops teams should be happier with these lighter images: less attack surface, less code that can break, less transfer time, less disk space used, and so on. And our Dockerfile is still readable, so it should be easy to maintain.
For this blog, I use a slightly modified version of what we just saw:
There are not many differences between this and the previous one, except for an added stage to retrieve the git commit hash and some tweaking when copying the code.
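One way such a commit-hash stage could be written is sketched below. This is an illustration rather than the exact stage used for this blog: the stage name git, the /commit_hash path and the choice of base image are assumptions, and it relies on the .git directory being part of the build context (not excluded by .dockerignore):

```dockerfile
# Hypothetical stage: record the current commit hash in a file so the
# .git directory itself never reaches the final image.
FROM python:3.8.6-buster AS git
WORKDIR /repo
COPY .git ./.git
RUN git rev-parse HEAD > /commit_hash

# Runtime stage (other steps omitted): copy only the small text file.
FROM python:3.8.6-slim-buster
WORKDIR /app
COPY --from=git /commit_hash ./commit_hash
```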
There is also the addition of the POETRY_OPTIONS build argument. It allows me to build the same Dockerfile with two different outputs: one with the development dependencies like pre-commit and the other without.
I use it like this:
Again, this is in the spirit of minimizing the production image.
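To make the build argument concrete, here is a sketch of how POETRY_OPTIONS could be wired into the builder stage. The empty default, the image tags blog:dev and blog:prod, and the flag passed for production are assumptions: --no-dev is the Poetry 1.1 spelling, while Poetry 1.2 and later use --without dev.

```dockerfile
FROM python:3.8.6-buster AS builder

# Extra flags forwarded to `poetry install`. Empty by default, so the
# development dependencies are installed unless the caller opts out.
ARG POETRY_OPTIONS=""

RUN pip install poetry
WORKDIR /app
ENV POETRY_VIRTUALENVS_IN_PROJECT=true
COPY pyproject.toml poetry.lock ./

# Left unquoted on purpose so that several flags are split into
# separate arguments by the shell.
RUN poetry install --no-root --no-interaction $POETRY_OPTIONS

# Development image (default):
#   docker build -t blog:dev .
# Production image, without development dependencies (Poetry 1.1 flag):
#   docker build --build-arg POETRY_OPTIONS=--no-dev -t blog:prod .
```

Because the argument is declared inside the stage, changing its value only invalidates the cache from the ARG line onwards.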
🗒 Closing thoughts
Docker multi-stage builds helped me reduce my image sizes and attack surface, sometimes by a lot, without compromising on features.
I hope that you enjoyed reading this article and that you found it interesting or helpful! Please feel free to contact me if you want to comment on the subject.
In a future post, I'll talk about reducing Docker image build times in a CI environment where the filesystem isn't guaranteed to persist between runs.
📚 Further reading
[3] You often need these tools to install some Python dependencies that require compiling. That's why I don't use the slim version to install my dependencies.
[4] Except of course if your goal is to compile stuff on the go or provide a platform for people to build their code.