A Dockerfile is the single place that defines what happens during the build of a Docker image, and afterward how the resulting container will run.
In most cases we deal with a simple Docker image that just takes a binary and runs it, but in some cases we aren’t so lucky.
We can have a Docker container that runs multiple apps, brings in additional applications for managing internal logs or configurations, sets up extra debugging tools, and more …
Basically, it can get pretty complicated over time.
So a question worth asking is: how can we simplify the flow as much as possible, so that the Dockerfile stays understandable, both for the build phase and for the run phase, when we come back to it after a while, and is also easy to extend with new features?
Well, the first answer is that it isn’t that easy at first, and the reason behind it is quite simple.
We don’t build deployment environments and configurations every day, let alone ones that revolve around a Docker container. Because of that, we might not anticipate what happens when someone uses that container in ways we didn’t expect.
Therefore, when creating a Dockerfile, we should think about how the image will be built in terms of time and efficiency, avoiding duplicated flows and files, and also about how it will run in deployment environments in the cloud as well as in local environments like Windows laptops.
After a big Docker project of mine, I’ve had occasional talks with friends who asked for a few tips on building a Dockerfile that helps deploy containers efficiently and is easy to scale or test in any manner.
Not that I’m an expert or anything, but I wanted to share my personal insights after running into a few things that really gave me headaches at times, and out of all that this post was born.
In this post, we’re going to cover a few best practices we can follow in our Dockerfiles, allowing us to maintain them better, test them easily (not necessarily in the same environment), find errors quickly, extend them with little effort, and more …
Some may seem very intuitive and some may even seem foolish to mention, but from experience, some of them unfortunately don’t come to mind when they should.
I hope you’re ready, so let’s go!
Show Me the Money First
If you just want the bullet points without the deep explanations, you can find them here; otherwise, skip to the next part.
- Mainly for CI/CD pipelines, pass the environment (Dev, QA, Prod, etc.) the image is supposed to run in. This way, an environment variable (the ENV instruction) can be declared at the top of the Dockerfile with the passed argument as its default value, and it can also be altered later for testing purposes.
- Don’t do environment-specific work (compilation, build, configuration, etc.) in the build phase of the image, because that couples the image to specific requirements, so use your ENV variables carefully ;)
- Keep your image small by not leaving unneeded files inside it. There’s a whole separate post on this subject, so I highly suggest you check it out.
- Write an entrypoint script and wire it up with the ENTRYPOINT instruction, so it becomes the single place to look at when anyone wants to understand what is going on inside the container.
- Use scripts with a clear separation of concerns, print out the steps being executed in the container, and make sure to pass them their values as parameters.
Our CI Building Procedure
Today we mostly compile/build/deploy our applications from a CI/CD pipeline. I know there are of course cases without a fully functional CI/CD pipeline, but I assume that in those cases the images are being built manually by someone.
In our CI/CD jobs we usually know which environment the flow was started for (Dev, QA, Prod, etc.), and we can use the ARG instruction to pass an argument that states which environment should be used by default.
This argument can then be copied from the ARG into an ENV variable, so it can be overridden if desired when running a container from that image.
This enables several things. For starters, when we run a container from that image we know which environment it needs to be configured for. For example, we can download configurations or perform environment-specific procedures at container runtime, without having to specify the ENV every time we run the container.
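A minimal sketch of this pattern, assuming the pipeline passes a build argument named TARGET_ENV (the name is just an illustration):

```dockerfile
# The CI/CD job passes the environment at build time:
#   docker build --build-arg TARGET_ENV=qa -t my-app .
ARG TARGET_ENV=dev

# Copy it into an ENV so a running container can still override it:
#   docker run -e TARGET_ENV=dev my-app
ENV TARGET_ENV=${TARGET_ENV}
```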
Something to be careful about is not building the image for a specific environment. Configurations, or anything else environment-specific, should not be decided in the build phase but in the container phase, using the environment variable.
Why, you ask?
If we build an image that is coupled to an environment, that image becomes usable only for that specific environment, and it can no longer be run easily locally or in another environment for testing purposes, for example when hunting bugs during our development/deployment cycles.
Just as an example: say your container calls a URL over HTTP to receive some data. Instead of baking that URL into the image, have it passed in as an argument.
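A hedged sketch of that idea (DATA_API_URL is an illustrative name, not something from a real project):

```dockerfile
# Declare a harmless default; every deployment overrides it at run time, e.g.
#   docker run -e DATA_API_URL=https://data.example.com/items my-app
ENV DATA_API_URL=http://localhost:8080/data
```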
So what we can take away from this section is that when building an image, try to do the following:
- In the case of CI/CD, make sure to pass an ARG that states which environment the image is being built for, and copy it into an ENV variable so it can be overridden for testing purposes.
- Don’t couple the build of the image to specific environments or configurations.
Configuration, Configuration, and Configuration…
If you have multiple configuration files for one or more applications, I highly suggest keeping a template/base form of them, copying the templates into the Docker image during the build phase, and replacing the values at build time or at runtime.
If the configurations are environment-specific then, as before, do the replacement at runtime; if not, do it in the build phase.
This simply makes it easier to understand which configuration files exist in the Docker image/container and how they flow.
Another important aspect is making the container as configurable as possible. So I highly encourage you to define as many ENV variables as reasonable and use them to replace values when you render your configurations.
For example:
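A minimal entrypoint sketch of the idea (the variable names, placeholder patterns, and config path are illustrative):

```sh
#!/bin/sh
set -e

# Defaults that `docker run -e DB_HOST=... -e DB_PORT=...` can override.
: "${DB_HOST:=localhost}"
: "${DB_PORT:=5432}"

# Replace the pre-defined placeholder patterns in the templated config file
# with the actual values taken from the environment variables.
sed -i \
  -e "s|__DB_HOST__|${DB_HOST}|g" \
  -e "s|__DB_PORT__|${DB_PORT}|g" \
  /app/config/app.conf

# Hand control over to the application command.
exec "$@"
```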
As the sketch shows, we replace pre-defined patterns, in the places we want to customize, with actual configuration values that can easily be overridden in the docker run command through environment variables.
In the end, making each configuration configurable from the outside allows easier changes to our containers. Of course, there’s no need to expose every possible configuration, otherwise the Dockerfile will become huge, but try to find the weak spots that need to be updated frequently or are likely to be changed.
Another option is to keep the configurations in a secured storage location, such as AWS S3, separated per environment, and download them when you run your container.
You can also combine the two: bake in the configurations that rarely change, and update them at runtime if needed. It all comes down to your personal needs and choices :)
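A sketch of the download-at-runtime approach, assuming the container has AWS credentials and that CONFIG_BUCKET and TARGET_ENV are provided at run time (both names are illustrative):

```sh
# Inside the entrypoint: pull the environment-specific configuration
# before starting the application.
aws s3 cp "s3://${CONFIG_BUCKET}/${TARGET_ENV}/app.conf" /app/config/app.conf
```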
Optimal Image Size & Layer Utilization
This subject alone could fill a post, if not several.
What I do wish to say on the matter is that you should strive to make the image as small as possible, but not at any cost, and certainly not at the cost of other important features.
Today we mostly work with cloud providers like AWS/GCP/Azure and many more … They all have high-connectivity environments, so whether the image is 400MB or 800MB won’t make much of a difference; but if the difference is 400MB versus 2GB+, it may be due to an inefficient build phase for the image.
I remember a friend telling me that only when you work on on-premise deployments, which don’t have that high-speed connectivity, do you really need to strive for the best practices around small images and efficient builds/pulls, by managing your image layers as efficiently as possible.
Still, utilizing our layers well when we build or download images can save a lot of time in these flows, and shorten the time windows of our workflows.
If we still want some kind of cookbook for the matter, it would be the following (see the sketch after this list):
- Don’t install the recommended packages of the applications you install (for example with apt-get’s --no-install-recommends flag), because in most cases you won’t need them.
- Every package you don’t need at runtime can be removed once you’re done building your image, but make sure to remove it in the same layer in which you install the packages/applications.
- Don’t COPY huge files and then try to remove them later, because the layer containing that file will always be present. Instead, download the file in a RUN command and remove it at the end of that same command.
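A hedged sketch of these points combined, assuming a Debian/Ubuntu base image and an illustrative download URL:

```dockerfile
# Install only what we need (no recommended packages), fetch and unpack a
# large archive, and clean everything up inside the same RUN instruction,
# so the temporary files never end up in a committed layer.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && mkdir -p /opt/app \
 && curl -fsSL https://example.com/big-archive.tar.gz -o /tmp/big-archive.tar.gz \
 && tar -xzf /tmp/big-archive.tar.gz -C /opt/app \
 && rm /tmp/big-archive.tar.gz \
 && rm -rf /var/lib/apt/lists/*
```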
Don’t worry, there will be a dedicated post on this subject, so stay tuned for updates :)
Be a Script Master
In a Dockerfile we can use RUN instructions, which basically let us execute shell commands inside the Docker image we are building.
As our needs grow over time, those commands might not be that well understood by everyone anymore, and especially not why they are necessary.
Therefore, for the sake of future documentation and the ease of updating or adding things for the relevant applications, it’s highly advisable to use scripts with a clear separation of concerns, split into functions, so it’s easier to understand what happens inside.
Unfortunately, scripts have a downside regarding image layers: steps bundled into one script won’t be cached and reused as separate layers the way individual RUN commands are. So be careful here, either keep the cache-sensitive steps as separate RUN instructions so their built layers are reused, or consciously group the script’s commands into a single RUN.
One more crucial thing: Docker caches a RUN instruction by its command string on top of the previous layers, so a change inside the script itself isn’t reflected in that command. Try passing parameters when executing the script; as long as the parameters don’t change, the cached layer is reused, and when they do change, the layer is rebuilt.
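A brief sketch of the parameter idea (the script path and APP_VERSION are illustrative):

```dockerfile
COPY scripts/install-app.sh /tmp/install-app.sh

# The version is part of the RUN command, so changing APP_VERSION rebuilds
# this layer, while keeping the same value lets Docker reuse the cached one.
ARG APP_VERSION=1.0.0
RUN /tmp/install-app.sh "${APP_VERSION}"
```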
Another thing is using an entrypoint script instead of placing a long command at the end of the Dockerfile.
I can’t even tell you how many problems it has saved me so far. It allows the entrypoint script to be the single place to go to when you want to understand what is being run inside the container, especially if it’s split into functions.
It simply makes reading and understanding easier, so being clean and organized can help a lot here :)
Also, don’t be afraid of printing things out!
This not only lets you debug the image while it’s being built locally or in your CI/CD pipeline, but also helps you understand (or re-understand) the flow of starting a container, for example when you’re using it to debug multiple services.
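A minimal entrypoint sketch along these lines, wired up in the Dockerfile with something like ENTRYPOINT ["/entrypoint.sh"] (the function and variable names are illustrative):

```sh
#!/bin/sh
set -e

prepare_configuration() {
  # Announce the step so the startup flow is visible in the container logs.
  echo "[entrypoint] preparing configuration for ${TARGET_ENV:-dev}"
  # render templates, download environment-specific files, etc.
}

start_application() {
  echo "[entrypoint] starting application: $*"
  # Replace the shell with the application process passed as CMD/arguments.
  exec "$@"
}

prepare_configuration
start_application "$@"
```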
Love the Generics
When dealing with a simple Dockerfile that just runs a single app with one configuration file that it loads, and that’s it, this part may not be that special for you.
But heavy Docker images may contain 3rd party applications and/or multiple configuration files, which sometimes end up as “static configurations” that can’t be changed unless you enter the container and update them by hand.
So a cool option for debugging cases is to have variables the Docker container can receive, which let us customize the way the container prepares itself for use.
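For instance, a small sketch of such a switch inside the entrypoint (ENABLE_DEBUG is an illustrative name):

```sh
# Enable extra diagnostics when the container is started with
#   docker run -e ENABLE_DEBUG=true ...
if [ "${ENABLE_DEBUG:-false}" = "true" ]; then
  echo "[entrypoint] debug mode enabled"
  set -x
fi
```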
3rd Party Apps
This is a short note, but in case you use 3rd party applications, make sure not to rely on the default master branch if you take them from git.
Also make sure to have an environment variable that can be configured externally, which lets you decide from the outside which versions of the 3rd party applications you want inside your Docker image in the first place.
It’s mainly related to the earlier points about using ENV variables and keeping everything generic, but sometimes we think we won’t be updating those 3rd party applications that often, so we say “we’ll just update the commit id or branch in the Dockerfile when we need to”. Trust me, use an ENV to configure that commit id.
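A hedged sketch of pinning such a dependency (the repository URL and version are illustrative placeholders):

```dockerfile
# Pin an explicit tag/commit instead of the default branch,
# and keep it overridable from the build pipeline.
ARG TOOL_REPO=https://github.com/example/some-tool.git
ARG TOOL_VERSION=v1.2.3
ENV TOOL_VERSION=${TOOL_VERSION}

RUN git clone "${TOOL_REPO}" /opt/some-tool \
 && git -C /opt/some-tool checkout "${TOOL_VERSION}"
```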
In Conclusion
This was a post with a few best practices for building, configuring, and deploying our Docker containers, so that development cycles, analysis, and the many other cases that come up all the time become easier for us.
Of course, these are the personal best practices I’ve gathered so far, and I’m sure there are more, so feel free to keep searching online, because different people always have different insights on any matter.
Again, if you’d like me to extend this with detailed examples, or even add your own, feel free to contact me about it.
I hope you had a great time reading this piece, and if you have any further questions I would be delighted to answer them.
Also, if you have any opinions or suggestions for improving this piece, I would love to hear them :)
Thank you all for your time and I wish you a great journey!