Curious about Docker? Eager to strengthen your skills with containers?
In this blog post, I’ll share five tips, tricks, and best practices for using Docker. Let’s start with a short analogy for everything that will be covered.
The Dockerfile is a recipe for creating Docker images. Hence it should be treated as if it’s the recipe for your favorite cake 🍰. It should be concise, readable, and easy to follow; this will make the whole baking (development) process easier.
As part of writing an “easy” recipe (Dockerfile), it’s important to enable baking (building) the cake (Docker image) in any kitchen (machine and any UID:GID). After all, if the cake is so good, we’ll want to bake (build) the same cake (Docker image) over and over again, anywhere, and speed up 🚀 the baking (build) process over time by memorizing parts of the recipe (layers) in our heads (cache).
It’s best to split ✂️ the baking (building) process into steps (multi-stage build), where the final product (Docker image) includes only the relevant contents. We don’t want to serve (publish) the cake (Docker image) with a bag of sugar (source-code) or with an oven (build packages/compilers) as it might be embarrassing (and heavy). 🙄
Other than that, keeping the cake (Docker image/container) secured and safe 🔒 from unwanted people or animals 🐈 (hackers) should be taken care of as part of the process of baking the cake (writing a Dockerfile).
And finally, if the cake’s recipe (Dockerfile) contains reusable keywords (ARG) such as “double sweet” 🍫 for “2 sugar”, and it is used repeatedly in the recipe (Dockerfile), it should be declared once at the top of the recipe (Dockerfile Global ARGs) which will make it possible to use it as a reference ($MY_ARG).
Enough with that.
A Dockerfile instruction (ARG, ENV, RUN, etc.) that doesn’t need to run again when the source-code changes should be pushed as close to the top of the file as possible. When comparing to cakes, the base of the cake is the bottom layer, while in a Dockerfile the base of the image is at the top of the file.
The cache of the “requirements packages” should be invalidated only if a package was added, removed, or its version was changed, but not when something in the code was changed, because that happens a lot.
In the following code snippet, the source-code is copied to the image, followed by the installation of requirements (packages). This means that every time one of the source-code files is modified, all the “requirements packages” are installed again. In other words, any change in the source-code invalidates the cache of the “requirements packages”, which is bad since we want to keep them cached.
```dockerfile
# BAD
# Copy everything from the build context
COPY . /code/
# Install packages - on any change in the source-code
RUN pip install --user -r "requirements.txt"
```
A good way to cache the requirements layer is to first copy the `requirements.txt` file, or any other dependency or lock file (package-lock.json, yarn.lock, go.mod, etc.), then install the packages listed in `requirements.txt`, and only then copy the source-code.
```dockerfile
# GOOD
# Copy and install requirements - only if requirements.txt was changed
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"
# Copy everything from the build context
COPY . /code/
```
Now, there’s an “extra” command (`COPY`), so `requirements.txt` is copied twice: once on its own, and once again with the rest of the build context. This might look like a bad thing if you see it for the first time. Its beauty is that it caches the installation of the “requirements packages” and only then copies the source-code. Amazing!
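The same ordering applies to the other lock files mentioned above. Here is a minimal sketch for a Node.js project; the `node:lts-slim` base image, the `/code/` directory and the `server.js` entry file are assumptions made for illustration, not something taken from a real project.

```dockerfile
# A sketch only: the base image, /code/ and server.js are placeholders
FROM node:lts-slim
WORKDIR /code/

# Copy only the dependency manifests first;
# this layer and the next one stay cached until these two files change
COPY package.json package-lock.json ./
RUN npm ci

# Copy the rest of the build context last;
# source-code changes invalidate the cache only from this point on
COPY . .

CMD ["node", "server.js"]
```

Editing a source file and rebuilding should reuse the cached `npm ci` layer, while editing `package-lock.json` should trigger a fresh install.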
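NOTE: Docker reuses a cached layer as long as the instruction and the files it depends on haven’t changed. Once a layer is invalidated, every layer after it is rebuilt. This is why the order of the instructions in a Dockerfile matters.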
Multi-Stage Build enables releasing slim images, including only packages and artifacts the application needs.
Let’s investigate the following Dockerfile:
```dockerfile
# BAD - Not that bad, but it could be better
FROM python:3.9.1-slim
WORKDIR /code/

# Upgrade pip and then install build tools
RUN pip install --upgrade pip && \
    pip install --upgrade wheel setuptools check-wheel-contents

# Copy and install requirements - better caching
COPY requirements.txt /code/
RUN pip install --user -r "requirements.txt"

# Copy everything from the build context
COPY . /code/

### Build the application
### COMMANDS ...

ENTRYPOINT ["app"]
```
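A few things about this solution: the final image still contains the build tools and the source-code, which the application doesn’t need at runtime, and the application runs as the `root` user; I’ll cover that in the next topic.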
With Multi-Stage Build, it’s possible to create an intermediate image, let’s call it `build`, including the source-code and required packages for building. The `build` stage is followed by the `app` stage, which is the “final image” that will be published to the Docker registry (DockerHub, ECR, ACR, GCR, etc.) and eventually deployed to the Cloud or On-Premise infrastructure.
Now let’s break the above snippet into a Multi-Stage Build pattern.
```dockerfile
# GOOD
FROM python:3.9.1-slim AS build

# Upgrade pip and then install build tools
RUN pip install --upgrade pip && \
    pip install --upgrade wheel setuptools check-wheel-contents

### Consider the comments as commands
# Copy and install requirements - better caching
# Copy the application from Docker build context to WORKDIR
# Build the application, validate wheel contents and install the application

FROM python:3.9.1-slim AS app
WORKDIR /myapp/
COPY --from=build /dist/ /myapp/
ENTRYPOINT ["app"]
```
In general, the last `FROM` command in a Dockerfile indicates that this is the final image. This is how we know to name it `app` (or `prod`) and make sure that it contains only the relevant contents. I called it `app` even though the name is not used anywhere else in the code; this is just for clarity and better documentation.
NOTE: If you’re curious why I didn’t need to install anything in the final image, it’s because the build process includes all the packages in the `/dist/lib` directory. This is by design, and I totally recommend adopting this practice.
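To make the commented-out build steps more concrete, here is one possible way to fill them in. It is a sketch under assumptions, not the actual build of any project: the `myapp` package name, the `/code/` directory, and the use of `pip install --target` together with `PYTHONPATH` are illustrative choices, and the project is assumed to be installable with pip and runnable with `python -m myapp`.

```dockerfile
FROM python:3.9.1-slim AS build
WORKDIR /code/
RUN pip install --upgrade pip wheel setuptools

# Install third-party packages into /dist/lib (cached until requirements.txt changes)
COPY requirements.txt .
RUN pip install --target /dist/lib -r requirements.txt

# Install the application itself into the same directory;
# assumes the build context contains a setup.py or pyproject.toml
COPY . .
RUN pip install --target /dist/lib --no-deps .

FROM python:3.9.1-slim AS app
WORKDIR /myapp/
# Only the installed packages are copied; build tools and source-code stay behind
COPY --from=build /dist/ /myapp/
# Make the copied packages importable (hypothetical layout: /myapp/lib)
ENV PYTHONPATH=/myapp/lib
ENTRYPOINT ["python", "-m", "myapp"]
```

The point of the layout is that the final stage needs no `pip install` at all: everything the application imports arrives through a single `COPY --from=build`.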
The code snippets above didn’t mention anything about which user is running the commands. The default user is `root`, so all the commands to build the application are executed with superuser permissions, which is okay since this stage is done behind the scenes. What troubles me is this: why should I allow whoever runs the application (container) to execute everything as a superuser (root)?
Picture this - your application is running in the cloud, and you haven’t followed the principle of least privilege.
John, the nifty hacker, was able to hack into your application. Do you realize that John can execute `apt-get install ANYTHING`? If John is really good at what he’s doing, he can access any back-end service exposed to your application. Let’s take some “negligible” service, such as your database: John can install the `mysql` client and communicate with your database.
To solve this problem, you can use the `USER` command in the Dockerfile to switch the user from `root` to some `appuser` whose sole purpose (and permission) is to execute the application, nothing more.
Omitting the build stage, let’s focus on the app stage:
```dockerfile
# GOOD
FROM python:3.9.1-slim AS app
WORKDIR /myapp/

# Creates `appuser` and `appgroup` and sets permissions on the app's directory
RUN addgroup appgroup --gid 1000 && \
    useradd appuser --uid 1000 --gid appgroup --home-dir /myapp/ && \
    chown -R appuser:appgroup /myapp/

# All the following commands will be executed by `appuser`, instead of `root`
USER appuser

# Copy artifacts from the build stage and set `appuser` as the owner
COPY --from=build --chown=appuser:appgroup /dist/ /myapp/

ENTRYPOINT ["app"]
```
Back to John, the nifty hacker; John tries to execute `apt-get install ANYTHING`, and fails, since `apt-get` requires super-user permissions. John tries to write malicious code in `/root/` and gets `permission denied` because this directory’s permission set is `700`: read, write and execute for the owner (`root`) only, and nothing for the group or anyone else.
I’m sure that if John is very talented, he’ll still be able to do some harm, but still, it’s best to minimize the collateral damage and isolate applications as much as possible. We also don’t want John to laugh about the fact that we could’ve prevented him from using `apt-get install ANYTHING`, and we simply didn’t do it.
As you can see in the code snippet above, I used `--uid 1000` and `--gid 1000`. On Debian and Ubuntu, `1000` is the first UID and GID handed out to a regular user or group, so I could’ve just omitted those arguments. I spelled them out because `1000:1000` matches my own user on WSL2 Ubuntu 20.04. Here’s what my user looks like:
```
$ cat /etc/passwd | grep "$(whoami)"
myuser:x:1000:1000:,,,:/home/myuser:/bin/bash
```
If the numbers are not the same as those on your machine, then adjusting them with `--uid UID` and `--gid GID` will ease the development process. Sounds interesting, right? …
I’ll use a real containerized Python application; here’s the Dockerfile of unfor19/frigga/Dockerfile (yes, yes, I wrote it). Imagine that I hadn’t used the `USER` command in the Dockerfile; let’s imagine it together by forcing the container to run as root with `docker run --user=root ...`
```
# BAD
# Reminder - My machine's UID:GID is 1000:1000
# root UID:GID is 0:0
$ docker run --rm -it -v $PWD/:/code/ --user=root --workdir=/code/ --entrypoint=bash unfor19/frigga
root@<container-id>:/code# cat /etc/passwd | grep "$(whoami)"
root:x:0:0:root:/root:/bin/bash
# UID:GID = 0:0
root@<container-id>:/code# echo "root contents" > root-file.txt
root@<container-id>:/code# ls -lh root-file.txt
# -rw-r--r-- 1 root root 14 Feb 12 14:03 root-file.txt
root@<container-id>:/code# exit

# Local machine
$ ls -lh root-file.txt
# -rw-r--r-- 1 root root 14 Feb 12 14:04 root-file.txt
$ echo "more contents" >> root-file.txt
# bash: root-file.txt: Permission denied
```
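The above could be resolved by writing to the file with `sudo`. Note that a plain `sudo echo "more contents" >> root-file.txt` would still fail, because the redirection is performed by your own unprivileged shell, so the write has to go through something like `sudo tee`:
```
$ echo "more contents" | sudo tee -a root-file.txt > /dev/null
# success
```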
But do we really want to use `sudo` for editing files? What about our IDE? Do we need to run it with `sudo` to edit files? I hope not. A better approach would be adjusting the application’s (container) UID:GID according to the local machine’s UID:GID. In my case, I didn’t have to change `--uid` and `--gid` in the Dockerfile, since my machine uses the same IDs as my application (container).
```
# GOOD
# Reminder - My machine's UID:GID is 1000:1000
# frigga's user UID:GID - 1000:1000
$ docker run --rm -it -v $PWD/:/code/ --workdir=/code/ --entrypoint=bash unfor19/frigga
appuser@<container-id>:/code$ echo "file contents" > some-file.txt
appuser@<container-id>:/code$ ls -lh some-file.txt
# -rw-r--r-- 1 appuser appgroup 28 Feb 12 14:15 some-file.txt
appuser@<container-id>:/code$ exit

# Local machine
$ ls -lh some-file.txt
# -rw-r--r-- 1 meir meir 14 Feb 12 14:16 some-file.txt
$ echo "more contents" >> some-file.txt
# success
```
`some-file.txt` is set with the permissions `rw-r--r--` (644), so only the file owner can edit this file. Luckily (or is it?), my UID and GID are also 1000, so I’m able to edit the file with my current user, without adding `sudo` every time.
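If your machine’s IDs are not 1000:1000, one way to avoid editing the Dockerfile by hand is to turn the IDs into build arguments. The following is a minimal sketch; `APP_UID` and `APP_GID` are hypothetical argument names introduced here for illustration, and `myapp` is a placeholder image tag.

```dockerfile
FROM python:3.9.1-slim AS app

# Hypothetical build arguments; they default to 1000:1000 like the snippet above
ARG APP_UID=1000
ARG APP_GID=1000

WORKDIR /myapp/

# Create the group and the user with the requested IDs
RUN groupadd --gid "$APP_GID" appgroup && \
    useradd --uid "$APP_UID" --gid appgroup --home-dir /myapp/ appuser && \
    chown -R appuser:appgroup /myapp/

USER appuser

# Build with the IDs of the current local user, for example:
#   docker build --build-arg APP_UID="$(id -u)" --build-arg APP_GID="$(id -g)" -t myapp .
```

Files that the container writes to a mounted directory should then be owned by your local user, so no `sudo` is needed to edit them.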
Going back to the Dockerfile - it’s possible to declare global ARGs, before the first `FROM`, and use them in every stage’s `FROM` line. This helps with following the Don’t Repeat Yourself (DRY) principle. For example, providing the `PYTHON_VERSION` as a global argument, instead of hardcoding it for each stage, is superb! Let’s see it in action.
```dockerfile
# BAD - 3.9.1 is hardcoded
FROM python:3.9.1-slim AS build
# Build stage commands

FROM python:3.9.1-slim AS app
# App stage commands
ENTRYPOINT ["app"]
```
Consider this instead:
```dockerfile
# GOOD - 3.9.1 is declared once at the top of the file
ARG PYTHON_VERSION="3.9.1"

FROM python:${PYTHON_VERSION}-slim AS build
# Build stage commands

FROM python:${PYTHON_VERSION}-slim AS app
# App stage commands
ENTRYPOINT ["app"]
```
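One detail worth knowing: an ARG declared before the first `FROM` is visible to the `FROM` lines only. To read it inside a stage, for example in a `RUN` command, the stage has to declare it again without a value. A small sketch of that behavior, where the `echo` and the `ENTRYPOINT` are there only for demonstration:

```dockerfile
# Declared before the first FROM: usable in FROM lines only
ARG PYTHON_VERSION="3.9.1"

FROM python:${PYTHON_VERSION}-slim AS build
# Re-declare without a value to bring the global default into this stage
ARG PYTHON_VERSION
RUN echo "Building with Python ${PYTHON_VERSION}"

FROM python:${PYTHON_VERSION}-slim AS app
ENTRYPOINT ["python", "--version"]

# Override the version at build time without touching the file, for example:
#   docker build --build-arg PYTHON_VERSION=3.9.2 .
```

Without the second `ARG PYTHON_VERSION` line, the variable would expand to an empty string inside the `RUN` command.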
If you are here, then it means you’re really into it. For a full example of a containerized Python application, essentially a CLI, see unfor19/frigga. I’ve implemented all the best practices I could think of in this project. To take it even further, check the GitHub Actions (CI/CD) of this project; I added a full-blown test-suite to make sure that frigga can run on both docker-compose and Kubernetes, so you might find it handy.
That would be all. Feel free to ask questions or leave a comment with your best practices for using Docker.