When you scale the number of containers up or down in the data center, container size should stay as small as possible. A multistage Docker build process makes it possible to build out an application and then remove any unnecessary development tools from the container. This approach reduces the container's final size, but it adds complexity to the build.
A few terms are useful to define first. A Dockerfile contains a set of instructions on how to build a Docker container-based application. A Docker image is a snapshot of a Docker container at some point in time, providing a template to execute containers. A base image is the starting point that a build extends, roughly equivalent to a fresh install of an OS. Finally, an artifact is anything created during a build process of an application or Docker image.
A multistage build enables IT teams to optimize Dockerfiles and maintain them over time, regardless of infrastructure requirements. To set up a successful multistage build, you should first identify when to use the process, assess the advantages and disadvantages, and review these guidelines.
An introduction to Docker multistage builds
A multistage Docker build is a process to create a Docker image through a series of steps. Each stage accomplishes a task -- such as loading or removing build tools -- specific to the base Docker image, as part of the compilation process.
The primary use case of the multistage process is to clean up after a development build. Once you build a target application on the target image, you can leave the build tools behind to reduce the total image size.
Another use case is a compiled library that multiple applications share and that the build process links into each final executable. In this case, you only need to build the library once, rather than once for every application that uses it.
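As a minimal sketch of this cleanup pattern, the Dockerfile below compiles a program in one stage and copies only the resulting binary into a small final image. It assumes a hypothetical Go application, with go.mod and a main package, in the build context. The image tags and paths are illustrative choices, not requirements.

```dockerfile
# Stage 1: compile with the full Go toolchain (hypothetical app in the build context)
FROM golang:1.21 AS build
WORKDIR /src
COPY . .
# CGO_ENABLED=0 produces a statically linked binary that does not need glibc at runtime
RUN CGO_ENABLED=0 go build -o /bin/app .

# Stage 2: start again from a small base image; the compiler and sources stay in stage 1
FROM alpine:3.13
COPY --from=build /bin/app /usr/local/bin/app
ENTRYPOINT ["/usr/local/bin/app"]
```

Only the last stage ends up in the tagged image, so the Go toolchain never ships to production.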
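The shared-library case could look roughly like the sketch below. The directory layout (lib/, app1/, app2/) and the file names mylib.c, mylib.h and main.c are hypothetical, and a static C library is only one way to illustrate the idea.

```dockerfile
# Build the shared library once (hypothetical sources in lib/)
FROM gcc:12 AS lib
WORKDIR /work
COPY lib/ .
RUN gcc -c mylib.c -o mylib.o && ar rcs libmylib.a mylib.o

# First application: reuse the compiled library instead of rebuilding it
FROM gcc:12 AS app1
WORKDIR /work
COPY --from=lib /work/libmylib.a /work/mylib.h ./
COPY app1/ .
RUN gcc main.c -L. -lmylib -o app1

# Second application: links against the same library artifact
FROM gcc:12 AS app2
WORKDIR /work
COPY --from=lib /work/libmylib.a /work/mylib.h ./
COPY app2/ .
RUN gcc main.c -L. -lmylib -o app2
```

You can select one application at build time with the --target flag, for example docker build --target app1 .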
Benefits and disadvantages of a multistage Docker build
Multistage builds let the developer automate the creation process of applications that require some amount of compilation. Developers can create versions that target different OS versions or any other process dependency, which is a big benefit of the approach. It does, however, require some amount of scripting to build the appropriate image based on parameters such as a switch setting.
This process also offers security and caching benefits for the containers. For security, there is the option to have prestaged trusted local container images. A cached build can take less time than a typical build, because Docker can pull parts from a locally cached image and can reuse unchanged pieces from a build stage.
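One way to parameterize the target is a build argument, as in the sketch below. The argument name BASE_TAG is hypothetical, and the default value is an arbitrary choice.

```dockerfile
# An ARG declared before the first FROM can be used in FROM lines
ARG BASE_TAG=3.13
FROM alpine:${BASE_TAG} AS base
# Print the OS release at build time to confirm which base version was selected
RUN cat /etc/alpine-release
```

Passing a different value at build time, such as docker build --build-arg BASE_TAG=3.14 ., selects another base version without editing the Dockerfile.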
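The caching benefit depends on instruction order. The sketch below assumes a hypothetical Node.js project with package.json, package-lock.json and index.js in the build context. It installs dependencies before copying the rest of the source, so Docker can reuse the dependency layer when only application code changes.

```dockerfile
FROM node:18-alpine AS dependencies
WORKDIR /app
# Copy only the manifests first; this layer stays cached until they change
COPY package.json package-lock.json ./
RUN npm ci

FROM dependencies AS release
# Source changes invalidate the cache from this line onward, not the npm ci step above
# (a .dockerignore that excludes node_modules keeps local modules out of the image)
COPY . .
CMD ["node", "index.js"]
```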
Flexibility is both an advantage and a disadvantage. Flexibility often brings complexity for build script creation and maintenance. You must take care to isolate any dependencies so they don't break a release targeted at any given version. Clear comments and documentation help shed light on intentions and desired outcomes with the container deployment.
Docker multistage build guidelines
The key to creating a multistage build Dockerfile is to use multiple FROM statements, each referencing the specific image necessary for that stage. Docker recommends that you name each stage with the AS qualifier, which simplifies the process of copying results from one stage into the final image. For example:
# --- Base Node ---
FROM alpine:3.13 AS base
You can then reference a named stage in a subsequent stage to pick up where the previous stage left off. Later in the Dockerfile, you would reference this image as follows:
# --- Dependency Node ---
FROM base AS dependencies
You can then use the COPY instruction with the --from flag to include items from a previous stage. The same flag also accepts the name of an external image, so you can copy items from it in the same way you would from a build stage.
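Putting the pieces together, a complete file might look like the sketch below. The RUN step that writes artifact.txt is a placeholder for real dependency installation, and the /app paths and the release stage name are hypothetical.

```dockerfile
# --- Base Node ---
FROM alpine:3.13 AS base
WORKDIR /app

# --- Dependency Node ---
FROM base AS dependencies
# Placeholder for a real build or install step
RUN echo "built artifact" > /app/artifact.txt

# --- Release ---
FROM base AS release
# Copy from a named earlier stage
COPY --from=dependencies /app/artifact.txt /app/artifact.txt
# Copy from an external image rather than a stage in this file
COPY --from=nginx:latest /etc/nginx/nginx.conf /app/nginx.conf
```

The release stage starts from the clean base stage, so it contains only what the two COPY instructions bring in.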
You can find multistage Docker build examples online to gain familiarity before attempting it on your own. The great thing about building out applications with Docker is you can easily test the build tools on your own development machine. Once the Dockerfile works locally, you can then push it to a build/test stage of development and, eventually, onto production.