# 3. Building Docker Images

## What Is a Dockerfile?

A **Dockerfile**:
* is a small "program" that creates an image
* is run with `docker build -t name_of_image .`
  * `.` is the build context: the current directory, where Docker looks for the Dockerfile by default
  * `-t name_of_image` tags the resulting image with that name
* When the build finishes, the image is stored in the local Docker image store, ready to be run

### Producing the Next Image with Each Step

* Each line (step) in a Dockerfile takes the image from the previous line and makes another image
* The previous image is unchanged
* Only the filesystem result is carried forward from line to line; running processes and shell state are not
  * Several commands on one line therefore behave differently from the same commands on separate lines
* Every step's result is saved, so a large file should not live across lines, otherwise the image gets too large
  * e.g. download a large file, edit it, and delete it: done on one line, the image stays small; done on separate lines, the image stays big because the earlier step still holds the file
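
The download, edit, delete case can be sketched as below. This is only an illustration: the base image `debian:stable-slim`, the URL, and the paths are placeholders that do not come from these notes.

```dockerfile
# Assumed base image and hypothetical URL; adjust both for a real project.
FROM debian:stable-slim

# Download, unpack, and delete in ONE step, so the archive never
# ends up in a saved layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && mkdir -p /opt/big \
 && curl -fsSL -o /tmp/big.tar.gz https://example.com/big.tar.gz \
 && tar -xzf /tmp/big.tar.gz -C /opt/big \
 && rm /tmp/big.tar.gz

# Splitting the same work across three RUN lines would keep the archive
# in the layer created by the download step, even though a later step
# deletes it:
#   RUN curl -fsSL -o /tmp/big.tar.gz https://example.com/big.tar.gz
#   RUN tar -xzf /tmp/big.tar.gz -C /opt/big
#   RUN rm /tmp/big.tar.gz
```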

Details of working with Dockerfiles are available in the [Dockerfile reference](https://docs.docker.com/engine/reference/builder/).

### Caching with Each Step

* `docker build` saves the output of each step in a cache
  * Watch the build output for "Using cache" (or "CACHED" with BuildKit)
* Docker skips steps that have not changed since the last build, which can save a lot of time and resources
* Once a step changes, that step and every step after it are rebuilt
* Tip for editing a Dockerfile: put the parts that change most often at the end
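
One common way to apply the ordering tip is sketched here for a hypothetical Python project. The file names `requirements.txt` and `main.py` and the base image are assumptions, not something these notes prescribe.

```dockerfile
# Hypothetical Python project; file names are placeholders.
FROM python:3.12-slim
WORKDIR /app

# Rarely changes: the dependency list. This step stays cached until
# requirements.txt itself changes.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Changes often: the application source. Only the steps from here on
# are rebuilt after a code edit.
COPY . .
CMD ["python", "main.py"]
```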

### Dockerfile != Shell Scripts

* Dockerfiles look like shell scripts
* But they are not the same
  * A process started on one line will not be running on the next line
  * Each line runs for the duration of its container; then the container is shut down and saved into an image. The next line gets a fresh start
* If two programs need to pass values to each other in the same container, they have to be on the same line
* Environment variables can be passed to later lines with the `ENV` instruction

To summarize: each line in a Dockerfile is effectively its own call to `docker run`.
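
A small sketch of the difference, using `busybox` as an assumed base image. The variable name `GREETING` is made up for the illustration.

```dockerfile
FROM busybox

# The cd only affects this one step; the next step starts in / again.
RUN cd /tmp && pwd
RUN pwd

# A shell variable is gone once its step ends.
RUN GREETING=hello && echo "same step: $GREETING"
RUN echo "next step: $GREETING"

# ENV is stored in the image, so later steps and the running
# container both see it.
ENV GREETING=hello
RUN echo "after ENV: $GREETING"
```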

## Building Dockerfiles

### The Most Basic Dockerfile

Create a simple Dockerfile with the following lines:

```dockerfile
FROM busybox
RUN echo "building simple docker image"
CMD echo "hello container"
```

Build an image from this Dockerfile with `docker build -t hello .`; the build output looks like this:

```
Sending build context to Docker daemon  2.048kB
Step 1/3 : FROM busybox
latest: Pulling from library/busybox
5f5dd3e95e9f: Pull complete
Digest: sha256:9f1c79411e054199210b4d489ae600a061595967adb643cd923f8515ad8123d2
Status: Downloaded newer image for busybox:latest
 ---> dc3bacd8b5ea
Step 2/3 : RUN echo "building simple docker image"
 ---> Running in 1990ae4f8398
building simple docker image
Removing intermediate container 1990ae4f8398
 ---> cf5a3650fa24
Step 3/3 : CMD echo "hello container"
 ---> Running in 3e3b64874c2c
Removing intermediate container 3e3b64874c2c
 ---> 4a0a8c1c2d1b
Successfully built 4a0a8c1c2d1b
Successfully tagged hello:latest
```

1. In step 1/3, the base image `dc3bac...` is pulled.
2. In step 2/3, a container `1990ae...` is created from that image, and `echo "building ..."` is executed in it by the `RUN` instruction.
   1. The container is removed at the end of step 2/3, since nobody is using it any more; its result is saved as image `cf5a36...`.
3. In step 3/3, a new container `3e3b...` is created, the default command `echo` is recorded with `CMD`, and the result is saved as the new image `4a0a...`

Running the image `4a0a...` with `docker run --rm hello` prints "hello container".

### Installing a Program with Docker Build

Create a new Dockerfile:

```Dockerfile
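FROM debian:sid
# Refresh the package index, then upgrade what is already installed
RUN apt-get -y update
RUN apt-get -y upgrade
# Install the nano editor
RUN apt-get -y install nano
# Open /tmp/notes in nano when the container starts
CMD "nano" "/tmp/notes"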
```

### Adding a File through Docker Build

Start from the previously built image, assuming it was tagged `example/nanoer`:

```Dockerfile
FROM example/nanoer
ADD notes.txt /notes.txt
CMD "nano" "/notes.txt"
```

In the same directory, create a `notes.txt` containing some text. The `ADD` line in this Dockerfile copies that file into the image.

## Dockerfile Syntax

### The FROM Statement

* Indicates which image to download and start from
* Must be the first instruction in a Dockerfile (only comments, parser directives, and `ARG` may come before it)

### The MAINTAINER Statement

* Defines the author of this Dockerfile
* Deprecated in current Docker releases in favor of `LABEL`

```Dockerfile
MAINTAINER Firstname Lastname <email@example.com>
```
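
The Dockerfile reference recommends recording the author with a `LABEL` instead. A minimal sketch, with `busybox` as an assumed base image and the label key taken from the OCI image annotation convention:

```dockerfile
FROM busybox
# Key/value metadata; shows up in the output of `docker inspect`.
LABEL org.opencontainers.image.authors="Firstname Lastname <email@example.com>"
```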

### The RUN Statement

* Runs the command line, waits for it to finish, and saves the result

```Dockerfile
RUN unzip install.zip -d /opt/install/
```

### The ADD Statement

* Adds local files: `ADD run.sh /run.sh`
* Adds the contents of tar archives
  * `ADD project.tar.gz /install/` uncompresses the local `tar.gz` and adds its contents to the image
* Works with URLs as well (downloaded files are not unpacked)
  * `ADD https://project.example.com/download/1.0/project.rpm /project/`

### The ENV Statement

* Sets environment variables, both during the build and when running the image

```Dockerfile
ENV DB_HOST=db.production.example.com
ENV DB_PORT=5432
```

### The ENTRYPOINT and CMD Statement

* **ENTRYPOINT** specifies the start of the command to run
* **CMD** specifies the whole command to run; when an ENTRYPOINT is also set, CMD supplies its default arguments
* If the container acts like a command-line program, use ENTRYPOINT
* If you are unsure, use CMD; it is the more common choice
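
A minimal sketch of the two instructions working together, with `busybox` as an assumed base image:

```dockerfile
FROM busybox
# Fixed start of the command.
ENTRYPOINT ["echo", "Hello"]
# Default arguments, replaced by anything given after the image name.
CMD ["Container"]
```

If this were built as a hypothetical image named `greeter`, `docker run --rm greeter` would print `Hello Container`, while `docker run --rm greeter World` would print `Hello World`.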

### Shell Form vs. Exec Form

* ENTRYPOINT and CMD accept both forms
* **Shell form** looks like a normal shell command and is run through `/bin/sh -c`:
  * `nano notes.txt`
* **Exec form** is a JSON array that is executed directly, without a shell:
  * `["/bin/nano", "notes.txt"]`
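
The practical difference shows up with shell features such as variable expansion. The sketch below uses `RUN`, which accepts the same two forms, so the effect is visible in the build output. `busybox` and the variable `NAME` are assumptions made for the illustration.

```dockerfile
FROM busybox
ENV NAME=Container

# Shell form: executed as /bin/sh -c "...", so the shell expands $NAME.
RUN echo "shell form: $NAME"

# Exec form: a JSON array executed directly, without a shell,
# so $NAME is passed through literally.
RUN ["echo", "exec form: $NAME"]
```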

### The EXPOSE Statement

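* Declares a port that the container listens on
  * `EXPOSE 8080` only documents the port; to reach it from the host, publish it with `-p 8080:8080` (or `-P` for all exposed ports) in `docker run`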
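
A small sketch of how `EXPOSE` and publishing relate, assuming the `busybox` image and its built-in `httpd` applet. The `/www` directory and the page content are placeholders.

```dockerfile
FROM busybox
# A tiny page to serve.
RUN mkdir /www && echo "hello" > /www/index.html

# Documents that the service inside listens on 8080; nothing is
# published on the host by this line alone.
EXPOSE 8080

# Serve /www in the foreground on port 8080.
CMD ["httpd", "-f", "-p", "8080", "-h", "/www"]

# Publishing happens at run time, e.g.:
#   docker run --rm -p 8080:8080 <image>   (explicit host port)
#   docker run --rm -P <image>             (random host port per EXPOSE)
```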

### The VOLUME Statement

* Defines shared or ephemeral volumes
  * `VOLUME ["/shared-data"]` creates a volume at `/shared-data` that can be inherited by later containers
  * A Dockerfile cannot name a host path; host-to-container mappings are set at run time with `docker run -v /host/path/:/container/path/`

Tip: Avoid tying shared host folders to an image, as that makes it only work with the current computer
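
As a sketch: the Dockerfile reference notes that `VOLUME` declares only the container-side mount point, so a host directory has to be chosen when the container is started. The base image and the `data` directory below are assumptions.

```dockerfile
FROM busybox
# Declare /shared-data as a volume mount point. No host path appears here.
VOLUME ["/shared-data"]
CMD ["ls", "/shared-data"]

# A host directory, if wanted, is chosen at run time instead, e.g.:
#   docker run --rm -v "$PWD/data":/shared-data <image>
```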

### The WORKDIR Statement

* Sets the working directory for the build steps that follow and for the container started by `docker run`

```dockerfile
WORKDIR /install/
```
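
A short sketch of the effect, with `busybox` as an assumed base image and `marker.txt` as a placeholder file name:

```dockerfile
FROM busybox
# Created automatically if it does not exist yet.
WORKDIR /install/
# Relative paths now resolve against /install/.
RUN echo "demo" > marker.txt
# Prints /install when the container starts.
CMD ["pwd"]
```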

### The USER Statement

* Sets which user the container will run as
* Useful when a shared network directory requires a fixed user name or numeric ID

```dockerfile
USER arthur
USER 1000
```
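
A named user has to exist in the image before `USER` can switch to it. A minimal sketch, assuming an Alpine base image and its BusyBox `adduser`:

```dockerfile
FROM alpine
# Create the account first: -D skips the password, -u sets the numeric ID.
RUN adduser -D -u 1000 arthur
USER arthur
# Prints "arthur" when the container starts.
CMD ["whoami"]
```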

### TODO: Read Docker reference guide

## Multi-project Dockerfiles

> It was actually very common to have one Dockerfile to use for development (which contained everything needed to build your application), and a slimmed-down one to use for production, which only contained your application and exactly what was needed to run it. This has been referred to as the “builder pattern”. Maintaining two Dockerfiles is not ideal.

> With multi-stage builds, you use multiple FROM statements in your Dockerfile. Each FROM instruction can use a different base, and each of them begins a new stage of the build. You can selectively copy artifacts from one stage to another, leaving behind everything you don’t want in the final image.

```dockerfile
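# Stage 1: a full Ubuntu image with the tools needed to produce the artifact
FROM ubuntu:16.04 AS builder
RUN apt-get -y update
RUN apt-get -y install curl
# Save the size of the downloaded page to /google-size
RUN curl https://google.com | wc -c > google-size

# Stage 2: a small final image that receives only the artifact
FROM alpine
COPY --from=builder /google-size /google-size
ENTRYPOINT echo google is this big; cat google-size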
```

## Avoid Golden Images

Golden image: a locally modified image (for example, a legacy from a previous developer that nobody dares to modify)

### Preventing the Golden Image Problem

* Include installers in the project. If any dependencies are needed to build the image, check them in with the project
* Have a canonical (authoritative) build system that builds everything from scratch
  * From a base image
  * Build until the final stage
* Tag builds with the git hash of the code that built them
* Use small base images, e.g. Alpine
* Build images you share publicly from Dockerfiles, always
* Don't leave passwords in layers
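
Two of these points, a small pinned base image and tagging with the git hash, can be sketched as follows. The build argument `GIT_COMMIT` and the image name `myapp` are hypothetical names chosen for this illustration.

```dockerfile
# Pin the small base image to a specific tag instead of a moving "latest".
FROM alpine:3.20

# Hypothetical build argument carrying the commit hash, supplied with:
#   docker build --build-arg GIT_COMMIT=$(git rev-parse --short HEAD) \
#                -t myapp:$(git rev-parse --short HEAD) .
ARG GIT_COMMIT=unknown
# Record the hash inside the image as well, using the OCI revision key.
LABEL org.opencontainers.image.revision="${GIT_COMMIT}"

CMD ["echo", "built from a Dockerfile, not by hand"]
```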