# 3. Building Docker Images

## What Is a Dockerfile?

A **Dockerfile**:

* is a small "program" that creates an image
* is run with `docker build -t name_of_image .`
  * `.` is the build context: the directory whose Dockerfile is used, here the current one
  * `-t name_of_image` tags the resulting image (not a container)
* When the build finishes, the result is in the local Docker image store, ready to be run

### Producing the Next Image with Each Step

* Each line (step) in a Dockerfile takes the image from the previous line and makes another image
* The previous image is unchanged
* Files written in one step carry forward as part of the image; running processes and shell state do not
* Several commands on one line therefore behave differently from the same commands on separate lines
* Hence, you don't want large temporary files to span lines; otherwise the image becomes too large
  * e.g. download a large file, edit it, and delete it: done in one line, the image stays small; done across several lines, the large file remains in an earlier layer and the image is big

Details of working with Dockerfiles are available in the [Dockerfile reference](https://docs.docker.com/engine/reference/builder/).

### Caching with Each Step

* `docker build` saves the output of each step in a cache
* Watch the build output for "Using cache"
* Docker skips lines that have not changed since the last build, which saves time and resources
* Caching saves huge amounts of time
* Tip for editing a Dockerfile: put the parts that change most often at the end, so the earlier steps stay cached

### Dockerfile != Shell Scripts

* Dockerfiles look like shell scripts
* But they are not the same
* A process started in one line will not be running on the next line
* Each line runs for the duration of its container; the container is then shut down and saved into an image, and the next line gets a fresh start
* If two programs need to pass values to each other in the same container, they have to be on the same line
* Environment variables can be passed to later lines with the `ENV` instruction

Key point to remember: each `RUN` line in a Dockerfile is like its own call to `docker run`.

## Building Dockerfiles

### The Most Basic Dockerfile

Create a simple Dockerfile with the following lines:

```dockerfile
FROM busybox
RUN echo "building simple docker image"
CMD echo "hello container"
```

Build an image from this Dockerfile with `docker build -t hello .`. The build output looks like this:

```
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM busybox
latest: Pulling from library/busybox
5f5dd3e95e9f: Pull complete
Digest: sha256:9f1c79411e054199210b4d489ae600a061595967adb643cd923f8515ad8123d2
Status: Downloaded newer image for busybox:latest
 ---> dc3bacd8b5ea
Step 2/3 : RUN echo "building simple docker image"
 ---> Running in 1990ae4f8398
building simple docker image
Removing intermediate container 1990ae4f8398
 ---> cf5a3650fa24
Step 3/3 : CMD echo "hello container"
 ---> Running in 3e3b64874c2c
Removing intermediate container 3e3b64874c2c
 ---> 4a0a8c1c2d1b
Successfully built 4a0a8c1c2d1b
Successfully tagged hello:latest
```

1. In step 1/3, the base image `dc3bac...` is pulled.
2. In step 2/3, a container `1990ae...` is created from that image, and `echo "building ..."` is executed by the `RUN` instruction.
   1. The container is removed at the end of step 2/3 because nobody is using it any more; its result is kept as image `cf5a...`.
3. In step 3/3, a new container `3e3b...` is created and the `echo` command is recorded via `CMD` (it is not executed during the build). The result is saved as the new image `4a0a...`.

Running the image `4a0a...` with `docker run --rm hello` prints "hello container".

### Installing a Program with Docker Build

Create a new Dockerfile:

```Dockerfile
FROM debian:sid
RUN apt-get -y update
RUN apt-get -y upgrade
RUN apt-get -y install nano
CMD "nano" "/tmp/notes"
```

The next section starts from this image under the name `example/nanoer`, so build it with that tag: `docker build -t example/nanoer .`

### Adding a File through Docker Build

Start from the previously built image:

```Dockerfile
FROM example/nanoer
ADD notes.txt /notes.txt
CMD "nano" "/notes.txt"
```

In the same directory, create a `notes.txt` file containing some text.
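
The earlier point that every step produces its own image applies here too. The following is a minimal sketch, not the author's file: a self-contained variant that does not depend on the `example/nanoer` tag. It folds the install into one `RUN` step so the apt package lists are deleted in the same layer that created them, then adds `notes.txt` as above. Dropping the `upgrade` step, the cleanup of `/var/lib/apt/lists/`, and the exec-form `CMD` are choices made for this sketch.

```dockerfile
FROM debian:sid

# One RUN step: update, install, and clean up inside the same layer,
# so the downloaded package lists never persist in the image.
RUN apt-get -y update \
 && apt-get -y install nano \
 && rm -rf /var/lib/apt/lists/*

# Copy notes.txt from the build context into the image.
ADD notes.txt /notes.txt

# Exec form: nano is started directly, without a wrapping shell.
CMD ["nano", "/notes.txt"]
```

Because nano is interactive, an image built from this would be started with a terminal attached, for example `docker run --rm -ti <image>`.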
This Dockerfile adds the required file to the image.

## Dockerfile syntax

### The FROM statement

* Indicates which image to download and start from
* Must be the first instruction in the Dockerfile (only an optional `ARG` may come before it)

### The MAINTAINER Statement

* Defines the author of this Dockerfile
* Deprecated in current Docker versions in favor of `LABEL`

```Dockerfile
MAINTAINER Firstname Lastname
```

### The RUN Statement

* Runs the command line, waits for it to finish, and saves the result

```Dockerfile
RUN unzip install.zip -d /opt/install/
```

### The ADD Statement

* Adds local files: `ADD run.sh /run.sh`
* Adds the contents of local tar archives
  * `ADD project.tar.gz /install/` uncompresses the tar.gz and adds its contents to the image
* Works with URLs as well (downloaded files are not unpacked)
  * `ADD https://project.example.com/download/1.0/project.rpm /project/`

### The ENV Statement

* Sets environment variables
* They apply both during the build and when running the image

```Dockerfile
ENV DB_HOST=db.production.example.com
ENV DB_PORT=5432
```

### The ENTRYPOINT and CMD Statement

* **ENTRYPOINT** specifies the start of the command to run
* **CMD** specifies the whole command to run; when an ENTRYPOINT is also set, CMD supplies its default arguments
* If the container acts like a command-line program, you can use ENTRYPOINT
* If you are unsure, use CMD; it is the more common choice

### Shell Form vs. Exec Form

* ENTRYPOINT and CMD can use both forms
* **Shell form** looks like a normal shell command and is run through `/bin/sh -c`:
  * `nano notes.txt`
* **Exec form** is a JSON array and runs the program directly:
  * `["/bin/nano", "notes.txt"]`

### The EXPOSE Statement

* Declares a port that the container listens on
* `EXPOSE 8080` only documents the port; it does not publish it. Publishing still needs `-p 8080:8080` or `-P` in `docker run`

### VOLUME Statement

* Defines shared or ephemeral volumes
* `VOLUME ["/shared-data"]` creates a volume that can be inherited by later containers
* A Dockerfile cannot map a host path into the container; that is done at run time with `docker run -v /host/path/:/container/path/`

Tip: keep host-specific shared folders out of the image definition, because a setup tied to host paths only works on the current computer.

### WORKDIR Statement

* Sets the working directory for the following instructions and the directory the container starts in after `docker run`

```dockerfile
WORKDIR /install/
```

### The USER Statement

* Sets which user the container will run as
* Useful when a shared network directory requires a fixed user name or numeric ID

```dockerfile
USER arthur
USER 1000
```

### TODO: Read the Docker reference guide

## Multi-project Docker files

> It was actually very common to have one Dockerfile to use for development (which contained everything needed to build your application), and a slimmed-down one to use for production, which only contained your application and exactly what was needed to run it. This has been referred to as the “builder pattern”. Maintaining two Dockerfiles is not ideal.
>
> With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.

```dockerfile
# Stage 1: a large image with the tools needed to produce the artifact
FROM ubuntu:16.04 AS builder
RUN apt-get -y update
RUN apt-get -y install curl
RUN curl https://google.com | wc -c > google-size

# Stage 2: a small image that receives only the artifact
FROM alpine
COPY --from=builder /google-size /google-size
ENTRYPOINT echo google is this big; cat /google-size
```

## Avoid golden images

A golden image is a locally modified image, such as a legacy from a previous developer that nobody dares to modify.

### Preventing the Golden Image Problem

* Include installers in the project. If building the image needs any dependencies, check them in
* Have a canonical (authoritative) build system that builds everything from scratch
  * Start from a base image
  * Build until the final stage
* Tag builds with the git hash of the code that built them
* Use small base images, e.g. Alpine
* Always build the images you share publicly from Dockerfiles
* Don't leave passwords in layers
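
Several of these points can be combined in one Dockerfile. The following is a minimal sketch under stated assumptions: `GIT_HASH` and `APP_GIT_HASH` are hypothetical names chosen here, and the label key is the standard OCI revision key rather than anything the notes prescribe. It starts from a small base image and records which commit produced the image.

```dockerfile
# Small base image, as suggested in the list above
FROM alpine

# GIT_HASH is a hypothetical build argument; supply it with --build-arg
ARG GIT_HASH=unknown

# Record the commit that produced this image as image metadata
LABEL org.opencontainers.image.revision=$GIT_HASH

# Keep the hash available inside the running container.
# Never pass passwords this way: ARG and ENV values are visible
# to anyone who runs `docker history` or `docker inspect` on the image.
ENV APP_GIT_HASH=$GIT_HASH

# sh -c is used so the variable is expanded when the container starts
CMD ["sh", "-c", "echo built from commit $APP_GIT_HASH"]
```

Assuming the project directory is a git checkout, a build could look like `docker build --build-arg GIT_HASH=$(git rev-parse --short HEAD) -t example/app:$(git rev-parse --short HEAD) .`, where `example/app` is a placeholder image name. The tag and the label then both point back to the exact code that built the image.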