YOUR DOCKER IMAGE
MIGHT BE BROKEN
without you knowing it


Learn the right way to build your Dockerfile.





So you're building a Docker image. What might be wrong with it?

You learned about Docker. It's awesome and you're excited. You go and create a Dockerfile:

FROM ubuntu:16.04

RUN apt-get install all_my_dependencies
ADD my_app_files /my_app

CMD ["/my_app/start.sh"]

Cool, it seems to work. Pretty easy, right?

Not so fast.

You just built a container which contains a minimal operating system, and which only runs your app. But the operating system inside the container is not configured correctly. A proper Unix system should run all kinds of important system services. You're not running them, you're only running your app.

"What do you mean? I'm just using Ubuntu in Docker. Doesn't the OS inside the container take care of everything automatically?"

Not quite. You have Ubuntu installed in Docker. The files are there. But that doesn't mean Ubuntu's running as it should.

When your Docker container starts, only the CMD command is run. The only processes that will be running inside the container is the CMD command, and all processes that it spawns. That's why all kinds of important system services are not run automatically – you have to run them yourself.

Furthermore, Ubuntu is not designed to be run inside Docker. Its init system, Upstart, assumes that it's running on either real hardware or virtualized hardware, but not inside a Docker container, which is a locked down environment with e.g. no direct access to many kernel resources. Normally, that's okay: inside a container you don't want to run Upstart anyway. You don't want a full system, you want a minimal system. But configuring that minimal system for use within a container has many strange corner cases that are hard to get right if you are not intimately familiar with the Unix system model. This can cause a lot of strange problems.

"What important system services am I missing?"

A correct init process

Main article: Docker and the PID 1 zombie reaping problem

Here's how the Unix process model works. When a system is started, the first process in the system is called the init process, with PID 1. The system halts when this processs halts. If you call CMD ["/my_app/start.sh"] in your Dockerfile, then start.sh is your init process.

But the init process has an extra responsibility. It inherits all orphaned child processes. It is expected that the init process reaps them.

Most likely, your init process is not doing that at all. As a result your container will become filled with zombie processes over time.

Furthermore, docker stop sends SIGTERM to the init process, which is then supposed to stop all services. If your init process is your app, then it'll probably only shut down itself, not all the other processes in the container. The kernel will then forcefully kill those other processes, not giving them a chance to gracefully shut down, potentially resulting in file corruption, stale temporary files, etc. You really want to shut down all your processes gracefully.

syslog

Syslog is the standard Unix logging service. A syslog daemon is necessary so that many services - including the kernel itself - can correctly log to /var/log/syslog. If no syslog daemon is running, a lot of important messages are silently swallowed. You don't want warnings and errors to be silently swallowed, do you?

The syslog daemon is not run automatically. You have to start it yourself.

cron

Many apps use cron services. But cron jobs never get run until the cron daemon is running in your container.

The cron daemon is not run automatically. You have to start it yourself.

SSH daemon (sometimes)

Occasionally, you may want to run a command inside the container for contingency reasons. For example you may want to debug your misbehaving app. docker exec provides a great way of doing this, but unfortunately there are a number of drawbacks. For example, users who run docker exec must have access to the Docker daemon, and that way they essentially have root access over the Docker host.

If that is problematic, then you should use SSH to log into the container instead. SSH has its own issues, like requiring key management, but that way you can prevent people from getting root access on the Docker host.

"Does all this apply too if I'm using CentOS inside the container, or another Linux distribution?"

Yes. The problem exist in those cases too.

"But I thought Docker is about running a single process in a container?"

Absolutely not true. Docker runs fine with multiple processes in a container. In fact, there is no technical reason why you should limit yourself to one process – it only makes things harder for you and breaks all kinds of essential system functionality, e.g. syslog.

We encourage you to use multiple processes.

Managing multiple processes can be painful, but it doesn't have to. We have a solution for that, so read on.

Getting everything right: baseimage-docker

Solving all the aforementioned problems is a huge pain. I'm sure you have better things to do than to worry about them. That's where baseimage-docker jumps in.

Baseimage-docker is a special Docker image that is configured for correct use within Docker containers. It is Ubuntu, plus:

  • Modifications for Docker-friendliness.
  • Administration tools that are especially useful in the context of Docker.
  • Mechanisms for easily running multiple processes, without violating the Docker philosophy.

Also, every single one of the aforementioned problems are taken care of for you.

You can use it as a base for your own Docker images. That means it's available for pulling from the Docker registry!

Github Docker registry Discussion forum Twitter Blog

Why use baseimage-docker?

You can configure the stock ubuntu image yourself from your Dockerfile, so why bother using baseimage-docker?

  • Stop reinventing the wheel.
    Configuring the base system for Docker-friendliness is no easy task. As stated before, there are many corner cases. By the time that you've gotten all that right, you've reinvented baseimage-docker. Using baseimage-docker will save you from this effort.
  • Reduce development time.
    It reduces the time needed to write a correct Dockerfile. You won't have to worry about the base system and can focus on your stack and your app.
  • Reduce building time.
    It reduces the time needed to run docker build, allowing you to iterate your Dockerfile more quickly.
  • Reduce deployment time.
    It reduces download time during redeploys. Docker only needs to download the base image once: during the first deploy. On every subsequent deploys, only the changes you make on top of the base image are downloaded.
  • What's included?

    A correct init process
    Baseimage-docker comes with an init process /sbin/my_init that reaps orphaned child processes correctly, and responds to SIGTERM correctly. This way your container won't become filled with zombie processes, and docker stop will work correctly.
    Fixes APT incompatibilities with Docker
    See Docker issue #1024.
    syslog-ng
    It runs a syslog daemon so that important system messages don't get lost.
    cron daemon
    It runs a cron daemon so that cronjobs work.
    SSH server

    Allows you to easily login to your container to inspect or administer things.

    SSH is only one of the methods provided by baseimage-docker for this purpose. The other method is through `docker exec`. SSH is also provided as an option because `docker exec` has issues.

    Password and challenge-response authentication are disabled by default. Only key authentication is allowed.

    The SSH daemon is disabled by default.

    runit

    Used for service supervision and management. Much easier to use than SysV init and supports restarting daemons when they crash. Much easier to use and more lightweight than Upstart.

    Baseimage-docker encourages you to run multiple processes through the use of runit.

    You might be familiar with supervisord. Runit (written in C) is much lighter weight than supervisord (written in Python).

    setuser
    A custom tool for running a command as another user. Easier to use than su, has a smaller attack vector than sudo, and unlike chpst this tool sets $HOME correctly. Available as /sbin/setuser.

    Despite all these components, baseimage-docker is extremely lightweight: it only consumes 6 MB of memory.

GETTING STARTED NOW

The image is called phusion/baseimage, and is available on the Docker registry.

Example Dockerfile:

# Use phusion/baseimage as base image. To make your builds
# reproducible, make sure you lock down to a specific version, not
# to `latest`! See
# https://github.com/phusion/baseimage-docker/blob/master/Changelog.md
# for a list of version numbers.
FROM phusion/baseimage:<VERSION>

# Use baseimage-docker's init system.
CMD ["/sbin/my_init"]

# ...put your own build instructions here...

# Clean up APT when done.
RUN apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

Adding additional daemons

You can add additional daemons (e.g. your own app) to the image by creating runit entries. You only have to write a small shell script which runs your daemon, and runit will keep it up and running for you, restarting it when it crashes, etc.

The shell script must be called run, must be executable, and is to be placed in the directory /etc/service/<NAME>.

Here's an example showing you how a memcached server runit entry can be made.

### In memcached.sh (make sure this file is chmod +x):
#!/bin/sh
# `/sbin/setuser memcache` runs the given command as the user `memcache`.
# If you omit that part, the command will be run as root.
exec /sbin/setuser memcache /usr/bin/memcached >>/var/log/memcached.log 2>&1

### In Dockerfile:
RUN mkdir /etc/service/memcached
ADD memcached.sh /etc/service/memcached/run

Note that the shell script must run the daemon without letting it daemonize/fork it. Usually, daemons provide a command line flag or a config file option for that.

More documentation

This website only covers the basics. Please refer to the Github repository for more documentation. Topics include:

Having problems? Want to participate in development? Please post a message at the discussion forum.