How to Work with Docker? — Part 3 — Dockerfile

Ido Magor
8 min read · Jan 22, 2021

In this part of the series, we’re going to see a fast and elegant way to build an image.

If we go back a little to part 1 of this series, we’ll remember that if we wanted to create an image, we needed to download a base image, run a container out of it, do a docker commit with all the details, and the result would be an image to use.

Now we’re going to see how to write a Dockerfile, which is a simple text file.

You can think of a Dockerfile as a cooking recipe that tells Docker how to build the image for you easily, with no human interaction for installations or any configuration.

Imagine you want someone to update the image for production use-cases, or simply to test it out, but they also want to change something in the image building process.

The Dockerfile acts as a cooking recipe that can be altered in many ways, so you can get a whole different image for your use case.

Show me a Dockerfile!

FROM ubuntu:18.04
ENV HTTP_SERVER_PORT=8000
RUN apt-get update && \
    apt-get install -y python3 && \
    mkdir app
COPY ./my-app.py /app/
WORKDIR /app
EXPOSE $HTTP_SERVER_PORT
CMD ["python3", "my-app.py"]

It looks a little bit frightening, but believe me, it’s as simple as making an omelet.

Let’s start from top to bottom.

The FROM Command

The FROM command tells Docker which base image, and which version of it, the image we wish to build should start from.

That means the new image we build will have everything the base image has inside it, so if we wanted to run a Node.js application, we could have taken the node image as the base image, but for this one we took ubuntu.
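For example, for that Node.js case, the first line might instead be (a hypothetical alternative, not part of our Dockerfile above):

FROM node:14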

The ENV Command

The ENV command tells Docker to define an environment variable called HTTP_SERVER_PORT, which can be accessed like any OS environment variable inside the container.

This allows defining many variables, which can also be overridden in the docker run command. All of them can be accessed in the same way you access environment variables in your OS.
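For example, overriding the variable at run time would look like this (my-image is a hypothetical image name):

docker run -e HTTP_SERVER_PORT=9000 my-image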

The ARG Command

You don’t see that command here but it’s important to know it.
The ARG command allows defining variables that are accessible in the build phase of the image only, and not when we’re inside the container because that’s the ENV variable's job.

Why use that, you ask?
In most cases, CI machines build our images, and they know which environment and which configuration the image is meant for. Because of that, we can pass an ARG into the image build process and copy its value into an ENV variable, so the value can also be accessed in the runtime of the container.
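A minimal sketch of that pattern (the variable names here are made up for illustration):

ARG BUILD_ENV=dev
ENV APP_ENV=$BUILD_ENV

And the CI machine would pass the value at build time:

docker build --build-arg BUILD_ENV=production .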

The RUN Command

The RUN command executes a shell command when building our image. It’s a very important command because it allows us to install dependencies or configure anything else while the image itself is being created.

If you aren’t familiar with the apt-get install than the use of ‘\’ allows to continue for the same command in another new line, and the && allows to combine multiple commands in a single line, so the value the RUN command receive is a single line of command that could be represented like this:

apt-get update && apt-get install -y python3 && mkdir app
OR
apt-get update
apt-get install -y python3
mkdir app

The COPY Command

The COPY command, as you might’ve already guessed, allows us to copy files from the folder the Dockerfile is located in (and runs from) into the image we’re creating.

It’s useful in cases where we wish to include complex setup or execution scripts inside our image to execute them directly, and/or to pass our application binary or scripts into the image.
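For example, copying in a setup script and running it might look like this (setup.sh is a hypothetical script name):

COPY ./setup.sh /app/
RUN chmod +x /app/setup.sh && /app/setup.sh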

The WORKDIR Command

The WORKDIR command basically tells Docker what the working directory from that point on is. You can compare this command to executing “cd /app” inside the image.
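That also means later instructions resolve relative paths against it. A small sketch:

WORKDIR /app
# '.' below resolves to /app because of the WORKDIR above,
# so the file lands at /app/my-app.py
COPY ./my-app.py .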

The EXPOSE Command

The EXPOSE command documents which port our image expects to receive traffic on. Note that on its own it doesn’t actually publish the port; we still need to add a port mapping when executing our container.
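For example, mapping the exposed port when running the container (my-image is again a hypothetical image name):

docker run -p 8000:8000 my-image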

The CMD Command

The command CMD is actually very interesting.
Some of you might’ve asked, “We have RUN, so why CMD?”.

That’s a great question, and the simple difference between them is the context in which they run.

When a RUN command exists in a Dockerfile, it executes while the image is being built.

The CMD command, on the other hand, executes when we make a container out of that image.
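To make it concrete, here’s a minimal sketch of the difference:

# executes once, at build time, and its result is baked into a layer
RUN echo "hello from the build"
# executes every time a container is started from the image
CMD ["echo", "hello from the container"]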

Simple, right?
But how do we create an image out of that Dockerfile?

Another great question!
We’re now going to add another command to our toolbox.

docker build -t my-new-image-name:my-image-tag .

The build command tells Docker to look for a Dockerfile in the directory path we give it, which for this example is the current directory, and from that Dockerfile it builds an image. (Note that image names must be lowercase.)

The ‘.’ actually has another meaning.
It defines the build context: when the COPY command executes inside the Dockerfile, its source paths are resolved relative to that ‘.’, or to whatever full path we give in the build command.
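For example, building from another directory (the path here is hypothetical):

docker build -t my-new-image-name:my-image-tag /home/user/my-project

In that case, COPY ./my-app.py would look for the file inside /home/user/my-project.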

How did the Dockerfile come about?

Remember when we talked about creating images by downloading them, customizing them by hand in bash, and then saving them?

Today we’ve already many pre-created images for most of the popular projects at least, and we can create new customized images out of them which eventually could be used by others to create their custom images.

You can treat the base image as specifying which operating system we wish to use for our application; I personally like that analogy very much.

The Dockerfile’s main purposes are allowing us to create images very fast, and having them defined in a script-like way.

What we’ve seen until now and will see, it’s only a portion of the capabilities of what Dockerfile is being used as of today, so believe me when I say there’s room to improve after this post.

How does the build process work?

When you execute the build command, you can see steps being logged during the build process, and for those steps you can see a container id being created and removed.

That container id belongs to the container that is actually run to build the image for us, and in the end its result is saved as an image.

For example, when Docker sees a RUN command in a Dockerfile, it understands it needs to make some customization on top of the base image, and because of that it creates an intermediate container for that step.

That container executes the commands defined for it in the Dockerfile, in the relevant step of course, and afterward the end result is saved as a layer of the final image.

As you might understand already, each Dockerfile command creates an image layer. But to be precise: if a RUN command contains multiple bash commands, that’s still one single command as far as the Dockerfile is concerned, so only one layer is created for a multi-line command.

Each layer, as you might guess, is stacked on top of the previous one, so in the end the resulting image is the combination of all the layers. These layers are cached locally, so when we build another image that needs a similar layer, the cached layer can be re-used instead of being rebuilt.

This means that if we pull a 1GB file into the image in a RUN command and don’t delete it within that same command, the image grows by the size of that file. Even if we delete the file in another RUN command, the 1GB won’t disappear, because it’s still stored in the earlier layer.
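A minimal sketch of the trap (the URL is made up for illustration):

# The 1GB file stays in the first layer forever,
# even though the second RUN deletes it:
RUN wget https://example.com/big-file.tar.gz
RUN rm big-file.tar.gz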

Again, just as a reminder, and as you already understand: each command that builds the image acts like an onion, so we start from its core and add a layer on top of another layer.

In case you want to debug your image regarding its layers, you can execute the following command to see the layers of it:

docker image history <image-id>

A few best practices

Images can easily reach 1GB in size, and once we’ve reached those sizes, just downloading them adds a lot of downtime for new, failed, or updated deployments.

So the goal is to hit the sweet spot of optimizing the image size as best we can, and because of that, I wanted to have a section about it with some concentrated points.

If I’m being honest, these are only the basic rules for optimizing Docker images sizes, but I don’t wanna overwhelm you so we’re gonna stick to some basics.

Try to do the following:

  • Don’t pass huge compiled applications inside the container to be extracted and installed.
    Reason — The COPY command would create a layer that you can’t delete.
    Solution — Move to download your project applications or any other installation files from a network resource, and once you’re done with them simply delete them.
  • Delete any leftovers or unneeded dependencies that aren’t needed for the running phase of the container like dev packages, which are needed for compiling your applications.
  • Instead of running complicated commands in your RUN commands, try creating a separation of concerns to installation bash scripts, which could be understood better by anyone else reading it.
  • Create a single RUN command which executes your scripts/commands, and also delete everything in the end. If you don’t know how you’re in luck, there’s an example downstairs.
  • For any packages, you install you can tell the package managers to not install the recommended packages, which could also save up more place.

Here’s the example for the fourth point:

RUN apt-get update \
    && apt-get install -y python3 some-dev-package-for-compilation \
    && chmod +x my_bash_script.sh \
    && ./my_bash_script.sh \
    && apt-get remove -y some-dev-package-for-compilation
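And for the last point, on apt-based images the flag looks like this (python3 is just a stand-in package here, and cleaning the apt cache in the same RUN is part of the same idea):

RUN apt-get update \
    && apt-get install -y --no-install-recommends python3 \
    && rm -rf /var/lib/apt/lists/*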

In Conclusion

Again, as with any subject, we’ve seen only the simple features of the Dockerfile, and believe me, there are a lot more.

We’ve seen that Dockerfile simplifies for us the process of building an image in multiple ways.

Beyond simplifying the building process of images and preventing human errors, it also gives us a cooking recipe that allows understanding pretty fast what the image contains.

I’ve seen and also heard from a lot of friends, who are working with Docker today, writing a Dockerfile is not that easy and sometimes could get messy, so it’s an art you could say for creating a well documented Dockerfile that is well understood, but as with everything, the more the experience the better the results 😇 so don't be afraid to get your hands dirty by creating them yourselves some time.

If not, then again, I hope you had a great time reading this piece, and if you have any further questions, I would be delighted to answer them.
Also, if you have any opinions or suggestions for improving this piece, I would love to hear them :)

Thank you all for your time and I wish you a great journey!
