Me vs Docker Round 2: Optimizing Dockerfiles

June 5, 2020

In the interest of making a solid foundation for my app I made some passes at optimizing my Dockerfiles and splitting out dev dependencies. I also finally wrapped my head around how a / effects a Dockerfiles COPY function, which had previously confused me.

Dockerfile RUN Optimizations

An interesting concept I came across while reading and working through Pluralsights Docker course is that every single RUN command (and I think every command in general) in the Dockerfile creates its own layer with its own resulting contents from that command. These layers are then connected together to create the final image. This can have some initially unintuitive effects with regards to final image size.

A rather practical example of this would be as follows. Lets say you were basing a container off of a debian image. Upon creating the image FROM debian you wished to run an update of that debian system and install wget. But at the end of that you know it would be good practice to remove update files that would be in apt/lists directory post update, thus keeping the image only as large as it needs to be to be functional. You may come up with something like this:

FROM debian
RUN apt-get update
RUN apt-get install -y wget
RUN rm -rf /var/lib/apt/lists/*

And that should clean things up nicely right? No quite. While the resulting final file system of the image will indeed have the apt/lists files removed, there will still be an image deeper in that contains all those /lists/ files. This is because every single individual run command creates a layer. And that layer is part of the final image. They are additive.

As a result, the first RUN command creates an image of about 9mb, the second one installing wget creates one that is about 40mb, and the final RUN command that deletes the contents of the /list/ dir is 0mb, for an overall image size of around 50mb.

Instead this can all be achieved in a single RUN command, that updates, adds, and removes file all in one go. This results in less layers to maintain, and a smaller overall image.

FROM debian
RUN apt-get update && \
    apt-get install -y wget && \
    rm -rf /var/lib/apt/lists/*

The overall image size will now be only 40mb and a single layer.

The Frontend Dockerfile

I took this new found knowledge and looked to see if I had any groups of commands that could be optimized. Though I did not have any situations (yet) removing unneeded files that were sneakily creating excess layers and bloating overall image size, I did have a situation where I could reduce the overall number of layers.

The Dockerfile for my /frontend/ dir looked like this previously:

FROM node:13.12.0-alpine

WORKDIR /app

ENV PATH /app/node_modules/.bin:$PATH

COPY package.json ./
COPY package-lock.json ./

RUN npm install
RUN npm install react-scripts@3.4.1 -g

COPY . ./

CMD ["npm", "start"]

See those two back to back RUN commands? They were able to be rewritten as a single RUN command like so:

RUN npm install && \
    npm install react-scripts@3.4.1 -g

I really like setting up precedents like this in my code because it serves a good reminder down the road for me to follow these kinds of practices. Did this refactor make a huge difference? Absolutely not. It removed 1 layer. Overall image size was unchanged which I conformed to myself by using the docker history command to observe the build history of the Frontend image.

Heres the old image history with two RUN commands:

two_layers

The two npm install commands create two layers, 167mb and 203mb respectively for a total between them of 370mb. And heres the result of the refactor to one RUN command:

one_layer

One layer, 370mb.

But inevitably this Dockerfile will get more stuff some day, and having run commands already chained like that really idiot proofs it against future me.

Another small layer reduction could be done by reducing the number of COPY commands. Because I seem to be allergic to reading instructions first, I first tried to unify my COPY commands the same way I did the RUN commands like so:

COPY package.json ./ &&\
    package-lock.json ./

Yeah that didn't work.

ERROR: Service 'frontend' failed to build: COPY failed: stat /var/lib/docker/tmp/docker-builder932974682/&&: no such file or directory

COPY commands to the same location are unified like this:

COPY package.json package-lock.json ./

Another layer bites the dust. Worth noting is that if files need to be copied to multiple different locations in the image, multiple COPY instructions will be needed. No way around that.

Another small optimization to my Dockerfile was changing my install command to:

npm install --production

Which since I had been so far good about installing JavaScript dependencies like eslint that are only needed for development by using the --save-dev flag, this tells npm to only install the packages needed for production. Thus further reduced the size of my resulting image, and reduced build time a bit. A drop in the npm ocean but hey, I'll take it.

However. The word "production" got the gears turning in my head, which is usually dangerous. I remember looking at console outputs for the react app and seeing warnings about "not running an optimized production build" or something along those lines. Sure enough in the CRA documentation I found this article about how to build an optimized copy of the source code and serve it using the serve library. After testing this locally and confirming it works I took the dive on getting it working in Docker. The file then looked like this:

FROM node:13.12.0-alpine

WORKDIR /app

ENV PATH /app/node_modules/.bin:$PATH

COPY package.json package-lock.json ./

RUN npm install --production && \
    npm install react-scripts@3.4.1 -g && \
    npm install serv

COPY . ./

RUN npm run build

CMD ["serve", "-s", "build", "-l", "3000"]

There were some things that could still be improved here, like reordering some copy and run commands to unify layers, but the build worked, and my full optimized code was now served on the port instead of the development build. I also suspected I would now be able to rm -rf all my old source code, just leaving the /build/ directory in place, but that all sounded adventurous enough to save for a later time. Progress.

The Backend Dockerfile

Speaking of dependencies, could I remove dev dependencies for the backend side of things? The Django app had very few dev dependencies but there were two, pycodestyle and pylint which I was using in conjunction with VScode to lint my Python code. To solve this I split up my dependencies into two files in a new directory called /dependencies/ like so:

deps

With the contents of the dev.txt file being just

pycodestyle==2.6.0
pylint==2.5.2

The prod.txt file containing all the rest. Then in my Dockerfile for the backend I simply changed my RUN command for installing dependencies to:

RUN pip install -r requirements/prod.txt

This removed two dependencies from the final Docker image and sets a nice precedent for the future while developing. The only wrinkle being if I want to set this up for local dev on another machine I need to run a pip install against both files now.

The COPY Command and Trailing Slashes

Whether a copy command ended with a / or not had vexed me previously. Looking at other peoples Dockerfiles I saw times when it was there, and times when it wasn't and hadn't been able to decern why at a glance. It turns out the COPY command is based on the GO languages filepath.Match() function.

COPY source_code /destination

will copy a file or or a directory named source_code and rename it as /destination. The leading / indicating an absolute path.

COPY source_code /destination/

indicates that destination is a directory. This means that a file named source_code will be placed in a directory named destination. For example source_code.txt will land in the image as /destination/source_code.txt. However, if source_code is a directory, it will be copied into the image and renamed as /destination. NOT copied into a directory like /destination/source_code/ which I had previously misunderstood.

To copy a lot of things you can use a command like

COPY path/source_code* /destination/

which will copy all files or directories located on that path, to the directory /destination.

COPY source_code destination

will copy a file or directory named source_code as destination, located relative (instead of absolute) to the previous WORKDIR instruction.

Guess I should learn GO eh?

Thank you for reading! Part 3 will contain some actually React and UI building I promise.

Dashdashforce

Me vs Docker Round 2: Optimizing Dockerfiles

Dockerfile RUN Optimizations

The Frontend Dockerfile

The Backend Dockerfile

The COPY Command and Trailing Slashes