Curious about Docker? Eager to strengthen your skills with containers?
In this blog post, I’ll share five tips, tricks, and best practices for using Docker. Let’s start with a short analogy for everything that will be covered.
The Dockerfile is a recipe for creating Docker images. Hence it should be treated as if it’s the recipe for your favorite cake 🍰. It should be concise, readable, and easy to follow; this will make the whole baking (development) process easier.
As part of writing an “easy” recipe (Dockerfile), it’s important to enable baking (building) the cake (Docker image) in any kitchen (machine and any UID:GID). After all, if the cake is so good, we’ll want to bake (build) the same cake (Docker image) over and over again, anywhere, and speed up 🚀 the baking (build) process over time by memorizing parts of the recipe (layers) in our heads (cache).
It’s best to split ✂️ the baking (building) process into steps (multi-stage build), where the final product (Docker image) includes only the relevant contents. We don’t want to serve (publish) the cake (Docker image) with a bag of sugar (source-code) or with an oven (build packages/compilers) as it might be embarrassing (and heavy). 🙄
Other than that, keeping the cake (Docker image/container) secured and safe 🔒 from unwanted people or animals 🐈 (hackers) should be taken care of as part of the process of baking the cake (writing a Dockerfile).
And finally, if the cake’s recipe (Dockerfile) contains reusable keywords (ARG) such as “double sweet” 🍫 for “2 sugar”, and it is used repeatedly in the recipe (Dockerfile), it should be declared once at the top of the recipe (Dockerfile Global ARGs) which will make it possible to use it as a reference ($MY_ARG).
Enough with that.
A Dockerfile instruction (ARG, ENV, RUN, etc.) that is not supposed to be re-executed when the source-code changes should be placed as close to the top as possible. When comparing to cakes, the base of the cake is the bottom layer, while in a Dockerfile the base of the image is at the top of the file.
The cache of the “requirements packages” layer should be purged only when a package is added, removed, or its version changes, but not when something in the source-code changes, because that happens a lot.
In the following code snippet, the source-code is copied into the image, followed by the installation of the requirements (packages). This means that every time one of the source-code files is modified, all the “requirements packages” are reinstalled. Any change in the source-code purges the cache of the “requirements packages”, which is bad since we want to keep them cached.
# BAD
# Copy everything from the build context
COPY . /code/
# Install packages - on any change in the source-code
RUN pip install --user -r "requirements.txt"
A good example of caching the requirements layer would be to first copy the requirements.txt file, or any other lock-file (package-lock.json, yarn.lock, go.mod, etc.), then install the requirements, and only then copy the source-code.
# GOOD
# Copy and install requirements - only if requirements.txt was changed
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"
# Copy everything from the build context
COPY . /code/
Now, there’s an “extra” command (COPY) that copies the requirements.txt file twice. This might look like a bad thing if you see it for the first time, but its beauty is that it caches the installation of the “requirements packages” and only then copies the source-code. Amazing!
NOTE: During the build process, Docker reuses a cached layer as long as the instruction hasn’t changed and, for COPY, the copied files haven’t changed; once a layer’s cache is invalidated, all the following layers are rebuilt. This is why the order of RUN, WORKDIR, and COPY is crucial.
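To tie the note together, here’s a minimal sketch (hypothetical paths, same Python example as above) that keeps the rarely-changing instructions at the top and the frequently-changing ones at the bottom:
# Sketch - stable instructions first, volatile instructions last
FROM python:3.9.1-slim
WORKDIR /code/
# Changes only when dependencies change - stays cached
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"
# Changes on every code edit - keep it last
COPY . /code/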
Multi-Stage Build enables releasing slim images, including only packages and artifacts the application needs.
Let’s investigate the following Dockerfile
# BAD - Not that bad, but it could be better
FROM python:3.9.1-slim
# Upgrade pip and then install build tools
RUN pip install --upgrade pip && \
    pip install --upgrade setuptools wheel check-wheel-contents
# Copy and install requirements - better caching
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"
# Copy everything from the build context
COPY . /code/
### Build the application
### COMMANDS ...
ENTRYPOINT ["app"]
A few things about this solution: the final image contains build tools, such as setuptools, wheel, and check-wheel-contents, which are not needed for running the application, and every command is executed by the root user; I’ll cover the root user issue in the next topic.
With Multi-Stage Build, it’s possible to create an intermediate image, let’s call it build, which includes the source-code and the packages required for building. The build stage is followed by the app stage, which is the “final image” that will be published to the Docker registry (DockerHub, ECR, ACR, GCR, etc.) and eventually deployed to the Cloud or On-Premise infrastructure.
Now let’s break the above snippet into a Multi-Stage Build pattern.
# GOOD
FROM python:3.9.1-slim as build
# Upgrade pip and then install build tools
RUN pip install --upgrade pip && \
    pip install --upgrade setuptools wheel check-wheel-contents
### Consider the comments as commands
# Copy and install requirements - better caching
# Copy the application from Docker build context to WORKDIR
# Build the application, validate wheel contents and install the application
FROM python:3.9.1-slim as app
WORKDIR /myapp/
COPY --from=build /dist/ /myapp/
ENTRYPOINT ["app"]
In general, the last FROM command in a Dockerfile indicates the final image. This is how we know to name it app (or prod) and make sure that it contains only the relevant contents. I called it app even though the name isn’t used anywhere else in the code; this is just for brevity and better documentation.
NOTE: If you’re curious why I didn’t need to install anything in the final image, it’s because the build process puts all the packages in the /dist/lib directory. This is by design, and I totally recommend adopting this practice.
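If you’re wondering what the commented build-stage steps could look like, here’s a rough sketch; the exact commands depend on your packaging setup, so treat the wheel-based flow below as an assumption rather than the project’s actual build:
# Hypothetical body for the build stage's commented steps
WORKDIR /code/
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"
COPY . /code/
# Build a wheel, validate its contents, and install it under /dist/
RUN pip wheel --no-deps --wheel-dir /tmp/wheels . && \
    check-wheel-contents /tmp/wheels/*.whl && \
    pip install --prefix=/dist/ /tmp/wheels/*.whl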
The code snippets above didn’t mention anything about which user is running the commands. The default user is root, so all the commands to build the application are executed with superuser permissions, which is okay since this stage is done behind the scenes. What troubles me is this: why should I allow the user that runs the application (container) to execute everything as a superuser (root)?
Picture this - your application is running in the cloud, and you haven’t followed the principle of least privilege.
John, the nifty hacker, was able to hack into your application. Do you realize that John can execute apt-get install ANYTHING? If John is really good at what he’s doing, he can access any back-end service exposed to your application. Let’s take some “negligible” service, such as your database, where John can install mysql and communicate with your database.
To solve this problem, you can use the USER command in the Dockerfile to switch the user from root to some appuser, whose sole purpose (and permission) is to execute the application, nothing more.
Omitting the build stage, let’s focus on the app stage
# GOOD
FROM python:3.9.1-slim as app
WORKDIR /myapp/
# Create `appuser` and `appgroup`, and set permissions on the app's directory
RUN addgroup appgroup --gid 1000 && \
    useradd appuser --uid 1000 --gid appgroup --home-dir /myapp/ && \
    chown -R appuser:appgroup /myapp/
# All the following commands will be executed by `appuser`, instead of `root`
USER appuser
# Copy artifacts from the build stage and set `appuser` as the owner
COPY --from=build --chown=appuser:appgroup /dist/ /myapp/
ENTRYPOINT ["app"]
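A quick sanity check, assuming the image is tagged myapp (a hypothetical name), is to ask the container which user it runs as:
$ docker run --rm --entrypoint whoami myapp
# appuser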
Back to John, the nifty hacker; John tries to execute apt-get install ANYTHING, and fails, since apt-get requires superuser permissions. John tries to write malicious code in /root/ and gets permission denied because this directory’s permissions are set to 700 - read, write, and execute for the owner (root) only, and nothing more.
I’m sure that if John is very talented, he’ll still be able to do some harm, but still, it’s best to minimize the collateral damage and isolate applications as much as possible. We also don’t want John to laugh about the fact that we could’ve prevented him from using apt-get install ANYTHING, and we simply didn’t do it.
As you can see in the code snippet above, I used --uid 1000 and --gid 1000. The values 1000:1000 are the defaults for the first regular user and group on Ubuntu, and I used 1000:1000 because I’m on WSL2 Ubuntu 20.04, so I could’ve just omitted those arguments. Here’s what my user looks like
$ cat /etc/passwd | grep "$(whoami)"
myuser:x:1000:1000:,,,:/home/myuser:/bin/bash
If the numbers are not the same as those on your machine, then adjusting them with --uid UID and --gid GID will ease the development process. Sounds interesting, right? …
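One way to do that, sketched below with hypothetical APP_UID and APP_GID build arguments, is to pass the IDs at build time instead of hardcoding them:
# Hypothetical - make the runtime user's IDs configurable at build time
ARG APP_UID=1000
ARG APP_GID=1000
RUN addgroup appgroup --gid "$APP_GID" && \
    useradd appuser --uid "$APP_UID" --gid appgroup --home-dir /myapp/ && \
    chown -R appuser:appgroup /myapp/
The matching build command would then pick up the local machine’s IDs:
$ docker build --build-arg APP_UID="$(id -u)" --build-arg APP_GID="$(id -g)" -t myapp .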
I’ll use a real containerized Python application; here’s its Dockerfile, unfor19/frigga/Dockerfile (yes, yes, I wrote it). Imagine that I hadn’t used the USER command in the Dockerfile; let’s imagine it together by forcing the container to run as root with docker run --user=root ...
# BAD
# Reminder - My machine's UID:GID is 1000:1000
# root UID:GID is 0:0
$ docker run --rm -it -v $PWD/:/code/ --user=root --workdir=/code/ --entrypoint=bash unfor19/frigga
root@987c5784a52e:/code# cat /etc/passwd | grep "$(whoami)"
root:x:0:0:root:/root:/bin/bash
# UID:GID = 0:0
root@987c5784a52e:/code# echo "root contents" > root-file.txt
root@987c5784a52e:/code# ls -lh root-file.txt
# -rw-r--r-- 1 root root 14 Feb 12 14:03 root-file.txt
root@987c5784a52e:/code# exit
# Local machine
$ ls -lh root-file.txt
# -rw-r--r-- 1 root root 14 Feb 12 14:04 root-file.txt
$ echo "more contents" >> root-file.txt
# bash: root-file.txt: Permission denied
The above could be resolved by running the command with sudo. Note that the redirection itself also needs elevated permissions, so the whole command has to run as root:
$ sudo bash -c 'echo "more contents" >> root-file.txt'
# success
But do we really want to use sudo for editing files? What about our IDE? Do we need to run it with sudo to edit files? I hope not. A better approach would be adjusting the application’s (container’s) UID:GID according to the local machine’s UID:GID. In my case, I didn’t have to use --uid and --gid in the Dockerfile, since I’m using the same IDs as my application (container) uses.
# GOOD
# Reminder - My machine's UID:GID is 1000:1000
# frigga's user UID:GID - 1000:1000
$ docker run --rm -it -v $PWD/:/code/ --workdir=/code/ --entrypoint=bash unfor19/frigga
appuser@52ad885a9ad5:/code$ echo "file contents" > some-file.txt
appuser@52ad885a9ad5:/code$ ls -lh some-file.txt
# -rw-r--r-- 1 appuser appgroup 14 Feb 12 14:15 some-file.txt
appuser@52ad885a9ad5:/code$ exit
# Local machine
$ ls -lh some-file.txt
# -rw-r--r-- 1 meir meir 14 Feb 12 14:16 some-file.txt
$ echo "more contents" >> some-file.txt
# success
The file some-file.txt is set with the permissions rw-r--r-- (644), so only the file’s owner can edit it. Luckily (or is it?), my UID and GID are also 1000, so I’m able to edit the file with my current user, without adding sudo every time.
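If the IDs baked into the image ever differ from yours, another option (a run-time workaround, not something this Dockerfile requires) is to override the user when starting the container:
# Run the container with the local machine's UID:GID
$ docker run --rm -it -v $PWD/:/code/ --user "$(id -u):$(id -g)" --workdir=/code/ --entrypoint=bash unfor19/frigga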
Going back to the Dockerfile - it’s possible to declare global ARGs and pass them along to the stages. This helps with following the Don’t Repeat Yourself (DRY) principle. For example, providing the PYTHON_VERSION as a global argument, instead of hardcoding it for each stage, is superb! Let’s see it in action.
# BAD - 3.9.1 is hardcoded
FROM python:3.9.1-slim as build
# Build stage commands
FROM python:3.9.1-slim as app
# App stage commands
ENTRYPOINT ["app"]
Consider this instead:
# GOOD - 3.9.1 is declared once at the top of the file
ARG PYTHON_VERSION="3.9.1"
FROM python:"$PYTHON_VERSION"-slim as build
# Build stage commands
FROM python:"$PYTHON_VERSION"-slim as app
# App stage commands
ENTRYPOINT ["app"]
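As a quick usage sketch, the version can now be changed at build time without editing the Dockerfile (3.8.7 below is just an arbitrary example tag):
# Uses the default 3.9.1 declared at the top of the Dockerfile
$ docker build -t myapp .
# Overrides the Python version for this build only
$ docker build --build-arg PYTHON_VERSION="3.8.7" -t myapp:py387 .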
If you’re still here, it means you’re really into it. For a full example of a containerized Python application, essentially a CLI, see unfor19/frigga. I’ve implemented all the best practices I could think of in this project. To take it even further, check out this project’s GitHub Actions (CI/CD); I added a full-blown test-suite to make sure that frigga runs on both docker-compose and Kubernetes, so you might find it handy.
That would be all. Feel free to ask questions or leave a comment with your best practices for using Docker.