How to Dockerize a Python Application

Published in Python in Plain English

4 min read, Sep 26, 2022

Docker and Python logos. Image by Author

Python has become one of the most popular programming languages for building scalable and reliable applications. One of the preferred ways to build and deliver them is with Docker containers, which bring portability among many other advantages.

When it comes to creating a Docker image, there are some tips and best practices that can help us make it more reliable and secure.
In some tutorials, you might see a Dockerfile like the following one being used:
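A minimal sketch of what such a tutorial-style file tends to look like, assuming a hypothetical `main.py` entry point and a `requirements.txt` at the project root (both names are illustrative and not taken from a specific tutorial):

```dockerfile
# Illustrative tutorial-style Dockerfile (file names are assumptions).
# The untagged "python" image resolves to whatever "latest" points to.
FROM python

WORKDIR /app

# Copy the whole project into the image.
COPY . .

# Install the dependencies listed in requirements.txt.
RUN pip install -r requirements.txt

# Start the application (hypothetical entry point).
CMD ["python", "main.py"]
```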

Certainly the file above can be used to run almost any application.

However, there is a lot we can do to optimize it for usage in a real production environment.

Choosing a Small Base Image

Notice that we have used the python image as the starting point for ours. The problem is that this image contains many dependencies we are not necessarily going to use, and they carry a lot of vulnerabilities with them.

As a security measure, we should choose a base image that is as small as possible. In this case, we could use the alpine or slim variants of the python image as alternatives. As a rule of thumb, the smaller the base image, the smaller its attack surface.

Let’s see how these images compare in terms of the number of known vulnerabilities.

At the time of writing, the bare python image name resolves to python:3.10.7-bullseye. We can run a security scan against it:

docker scan python

It gave us the following result:

Package manager: deb
Project name: docker-image|python
Docker image: python
Platform: linux/amd64
Base image: python:3.10.7-bullseye
Tested 427 dependencies for known vulnerabilities, found 281 vulnerabilities.
Base Image              Vulnerabilities  Severity
python:3.10.7-bullseye  281              4 critical, 35 high, 3 medium, 239 low

On the other hand, taking the python:3.10.7-alpine3.16 image we have:

Package manager: apk
Project name: docker-image|python
Docker image: python:3.10.7-alpine3.16
Platform: linux/amd64
Base image: python:3.10.7-alpine3.16
Tested 37 dependencies for known vulnerabilities, found 1 vulnerability.

That is a huge difference: 281 vulnerabilities versus 1! Therefore, we can start from the python:3.10.7-alpine3.16 image and add dependencies only as we need them.

Another problem is that untagged image names such as python, ubuntu or golang default to the latest tag, which can point to a different version at any time. It is a much better choice to use a specific tag that pins the version. If you prefer, you can even reference the image by its digest hash to ensure you are always using exactly the same base image.
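As a small sketch of the two pinning options, the tag form below uses the Alpine image discussed earlier, while the digest form is left commented out because the real digest has to be looked up for the image you actually pulled (the `<digest>` placeholder is not a real value):

```dockerfile
# Pin the base image to an exact version tag instead of the implicit "latest".
FROM python:3.10.7-alpine3.16

# Stricter alternative: pin by content digest, so that even a re-pushed tag
# cannot change the base image. Replace <digest> with the value reported for
# your image, for example by "docker images --digests".
# FROM python@sha256:<digest>
```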

Running as a Non-Root User

By default, a process inside a Docker container runs as root. This means that if someone gains access to the container and manages to have a process break out of it to the underlying machine, they will gain root access to the host itself, since by default root inside the container is mapped to root outside it as well.

To mitigate this issue, we can create a dedicated user to run the processes inside the container:

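# Create a dedicated group and an unprivileged system user (BusyBox syntax, as on Alpine)
RUN addgroup app_user && adduser -S -u 1000 -G app_user app_user
# Run all subsequent instructions and the container process as that user
USER app_user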

To prevent permission issues inside the container, we add the --chown flag to whatever we copy into it, so the files are owned by the new user rather than by root:

COPY --chown=app_user:app_user . .
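Putting these instructions in order, a minimal sketch might look like the following. The `/app` working directory and the `main.py` entry point are assumptions made for illustration only:

```dockerfile
FROM python:3.10.7-alpine3.16

# Create an unprivileged group and user (BusyBox syntax, as shipped in Alpine).
RUN addgroup app_user && adduser -S -u 1000 -G app_user app_user

# Assumed application directory.
WORKDIR /app

# Copy the sources so that they are owned by app_user rather than root.
COPY --chown=app_user:app_user . .

# From here on, build steps and the container's main process run as app_user.
USER app_user

# Hypothetical entry point.
CMD ["python", "main.py"]
```

Switching to `USER app_user` only after the steps that need root (such as creating the user) keeps the build working while still ensuring the running container is unprivileged.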

Furthermore, as an additional defensive layer, you can run the Docker engine as a non-root user. This is a way of making sure that the user running the container does not get root access to the underlying host.

Using Multi-Stage Builds

The multi-stage build is a great feature for optimizing Docker images. Normally, the tools we need to build an application and its dependencies are not the same ones we need to run it.

Usually, we define a builder image at the beginning of the file, then we copy all the artifacts we need from it into the final image:

# Build stage
FROM python:3.10.7-alpine3.16 AS builder
# Set up the environment and install
# whatever we need for the build phase

# Final stage
FROM python:3.10.7-alpine3.16
# Get specifically what we need to run the application
# (app_user must already exist in this stage for --chown to resolve)
COPY --chown=app_user:app_user --from=builder /app/artifacts .
ENTRYPOINT ["app-entrypoint"]

The multi-stage pattern lets us perform the build and setup steps inside Docker itself, while the final image contains only what is needed to run the application properly. The official Docker docs cover the details in more depth.

When installing Python dependencies, we can list everything in a requirements.txt file. We can also install them into a virtual environment, which makes it easier to transfer them to the final image.

Putting together everything we have seen in this post, we arrive at the final Dockerfile:
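A sketch of how these pieces can fit together, under stated assumptions: the Alpine package names (`gcc`, `musl-dev`, `postgresql-dev`, `libpq`), the `/opt/venv` and `/app` paths, and the `manage.py` command are illustrative choices for a Django project using PostgreSQL, not a definitive setup:

```dockerfile
# ---- Build stage ----
FROM python:3.10.7-alpine3.16 AS builder

# Compiler toolchain and PostgreSQL headers, needed only to build packages
# such as psycopg2 from source (assumed dependency set).
RUN apk add --no-cache gcc musl-dev postgresql-dev

# Create the virtual environment and put it first on PATH.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

# Install the dependencies into the virtual environment.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# ---- Final stage ----
FROM python:3.10.7-alpine3.16

# Runtime-only shared library for PostgreSQL clients (assumed requirement).
RUN apk add --no-cache libpq

# Dedicated unprivileged user.
RUN addgroup app_user && adduser -S -u 1000 -G app_user app_user

# Bring over the populated virtual environment from the build stage.
COPY --chown=app_user:app_user --from=builder /opt/venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /app
COPY --chown=app_user:app_user . .

USER app_user

# Hypothetical start command; swap in your production WSGI/ASGI server.
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
```

Copying the virtual environment works here because both stages use the same base image and the same `/opt/venv` path, so the interpreter paths recorded inside the environment remain valid.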

In the file above, I have added a few dependencies that are needed especially if you are building an image of a Django application using PostgreSQL.

In addition, you can also choose to install the dependencies using wheels instead of a virtual environment:

RUN pip install --no-cache-dir /wheels/*
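For context, one way the `/wheels` directory could be produced is to build the wheels in the builder stage with `pip wheel` and copy them across. This is a sketch only; the build packages and paths are assumptions carried over from a PostgreSQL-backed project:

```dockerfile
# ---- Build stage: compile every requirement into a wheel ----
FROM python:3.10.7-alpine3.16 AS builder

# Assumed build-time packages for compiling native extensions.
RUN apk add --no-cache gcc musl-dev postgresql-dev

COPY requirements.txt .

# Build wheels for all requirements (and their dependencies) into /wheels.
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# ---- Final stage: install from the prebuilt wheels, no compiler needed ----
FROM python:3.10.7-alpine3.16

# Assumed runtime library for PostgreSQL clients.
RUN apk add --no-cache libpq

COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*
```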

Key Takeaways

These were some of the main tips that have helped me create better and more optimized Docker images for Python applications. Summing up, in this post we have seen:

How the choice of the base image matters and can help us eliminate a whole series of vulnerabilities in our containers.
Running both the container and the Docker engine as a non-root user adds important security layers against attacks.
Multi-stage builds can reduce the size of the image, leaving many unnecessary dependencies behind.

Also, don’t forget to check the references at the end of this post for additional tips and best practices when building images for your Python application. I hope this post has been helpful.

References
