This article shows how to access GPUs from Docker Swarm services. In essence, we need to do two things:
- Set the nodes in the cluster to advertise their GPUs as Docker generic resources;
- Have the service specify the constraint that it needs GPU resources.
Once these are both in place, the swarm orchestrator can automatically allocate services that need GPUs to nodes that have GPUs, without us needing to manually place tasks on specific nodes. Yay!
However, please note that only one Docker service replica can be assigned to a given GPU; there is no time-sharing between services on a single node. Practically this means you need at least as many nodes with GPUs as tasks that require them. If you have 5 nodes with GPUs and start 6 replicas of your service, one replica will stay pending due to lack of resources.
This article assumes you are already familiar with a number of concepts. Here are some resources for more background information:
- A nice introduction to Docker images and containers;
- A tutorial on Docker Swarm and service creation;
- An introduction to specifying constraints on Docker services.
Why GPUs and Docker Swarm?
Why might you want to access GPUs from Docker Swarm services? For this article I’ll assume that you want to rapidly train a lot of neural networks using Apache Spark. We can use Docker Swarm to manage our Spark cluster, deploying the Spark master on one node and replicating the Spark workers across the remaining nodes. With this architecture, we can direct each worker to train a single network, and use the GPU on a given worker node to speed up the training time.
Accessing the GPU from your own software
Before we get to Spark workers and Docker services, we need to ensure that our neural network training code can access the GPU in the first place. The nodes in the cluster should have an NVIDIA GPU (e.g. AWS EC2 instance types whose names start with ‘p’, such as p2 or p3), and the NVIDIA CUDA toolkit installed.
You also need a framework for designing and training neural networks such as Tensorflow or Theano, or the higher-level wrapper Keras. If installing these Python packages yourself, make sure to install the GPU-enabled versions, e.g.:
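```bash
# GPU-enabled builds (package names as of the time of writing)
pip install tensorflow-gpu     # TensorFlow built with GPU support
conda install keras-gpu        # or, with conda: Keras plus a GPU-enabled backend
```
These names are only illustrative of the GPU-enabled variants available at the time of writing; check your framework’s installation guide for the build that matches your CUDA version.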
If running on EC2, Amazon provides an AMI for their GPU-enabled nodes that comes with CUDA, Tensorflow, and Python already installed.
Now your Keras or Tensorflow neural network program should run on the GPU!
Accessing the GPU from a Docker container
Containers are great for abstracting away the details of the native system that we’re running on, but the GPU is one of the details that gets abstracted away! In order for a Docker container to access the GPU, we need to use `nvidia-docker` instead of `docker` to run containers.
On Linux we install `nvidia-docker` through the package manager in the usual way (e.g. `apt-get`). Then launching a container becomes:
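```bash
# same as `docker run`, but the container can see the GPU
nvidia-docker run --rm nvidia/cuda nvidia-smi
```
The `nvidia/cuda` image and the `nvidia-smi` command are just a quick way to confirm that the GPU is visible from inside a container; you launch your own image the same way.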
If we launch our Keras program in this container, it will run on the GPU!
However, originally `nvidia-docker` didn’t support Docker Swarm. This meant that Spark workers couldn’t be replicated across nodes in a cluster. The work-around was to manually allocate a Spark worker to a specific node by issuing an `nvidia-docker run` command on that node, instead of issuing a `service create --replicas` request to the swarm manager. It gets the job done, but it misses all the nice benefits of orchestration.
In December 2017, `nvidia-docker2` was released, which supports Docker Swarm. Yay! The rest of this article draws from a GitHub comment from January 2018 explaining how to use `nvidia-docker` with Docker Swarm. If you previously had `nvidia-docker` installed, you need to uninstall it and switch to `nvidia-docker2` for swarm support. For example:
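```bash
# Ubuntu sketch, assuming the nvidia-docker apt repository is already configured
sudo apt-get purge -y nvidia-docker      # remove the old nvidia-docker 1.0
sudo apt-get install -y nvidia-docker2   # install the swarm-capable replacement
sudo pkill -SIGHUP dockerd               # reload the Docker daemon configuration
```
This is only a sketch based on NVIDIA’s upgrade notes; the exact commands depend on your distribution and how the nvidia-docker repositories are set up.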
Accessing the GPU from a Docker service
So how do we get Docker services to use the GPU? Well, in addition to the requirements above (CUDA, `keras-gpu`, `nvidia-docker2`) we need to do three more things:
- Configure the Docker daemon on each node to advertise its GPU
- Make the Docker daemon on each node default to using `nvidia-docker`
- Add a constraint to our Docker service specifying that it needs a GPU
Once we take these steps, the orchestrator will be able to see which nodes have GPUs and which services require them, and deploy our services accordingly!
Configuring the Docker daemon
The first step is to find the identifier of the GPU on a specific node, so we can pass it to the daemon later. We find it and store it in an environment variable with this command:
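```bash
# keep the first 12 characters of the 4th column of the UUID line from nvidia-smi
GPU_ID=$(nvidia-smi -a | grep UUID | awk '{print substr($4,1,12)}')
```
This is one way to write it; anything that pulls the UUID prefix out of the `nvidia-smi -a` output will do.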
What this is doing is running `nvidia-smi -a`, finding the line containing ‘UUID’, then extracting the first 12 characters of the 4th column of this line. You can see an example of the output of `nvidia-smi -a` in the comment here. Line 19 contains the UUID; columns 1, 2, and 3 are ‘GPU’, ‘UUID’, and ‘:’ respectively. The first 12 characters of column 4 should be enough to uniquely identify this GPU.
If we `echo $GPU_ID`, we can see it looks something like `GPU-c143e771` or `GPU-c5c84263`.
Docker is launched and managed as a service through systemd. We can change its default behaviour by adding an override file, called `/etc/systemd/system/docker.service.d/override.conf`.
This file should contain the following lines:
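```ini
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --default-runtime=nvidia --node-generic-resource gpu=GPU-c143e771
```
Here `GPU-c143e771` stands in for the value of `$GPU_ID` on this particular node (systemd won’t expand shell variables, so the real value has to be substituted in; the script further down does this), and `-H fd://` should match whatever `ExecStart` options your distribution’s docker.service already uses.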
Note: the second line is essential, because it clears any previously set `ExecStart` commands. You’ll get an error if this is missing.
What is the third line doing? Three things: it’s saying that when we start the Docker daemon we want the default runtime to be `nvidia-docker` (instead of `docker`), and that this node provides a generic resource of type `gpu`. (The name `gpu` could be anything, but it should be the same across all the nodes in our cluster so that the orchestrator sees which nodes offer the same resource type.) Finally, it’s saying that on this specific node, the generic `gpu` resource has the identifier we previously stored in `$GPU_ID`.
Next, we modify the file `/etc/nvidia-container-runtime/config.toml` to allow the GPU to be advertised as a swarm resource. Uncomment or add the following line to this file:
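```toml
swarm-resource = "DOCKER_RESOURCE_GPU"
```
This option ships commented out in the default config.toml at the time of writing; the name `DOCKER_RESOURCE_GPU` corresponds to the generic resource being called `gpu`.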
After taking these three steps, we need to reload the systemd configuration (to pick up the new override file) and restart the Docker daemon:
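```bash
sudo systemctl daemon-reload   # pick up the new override file
sudo systemctl restart docker  # restart the Docker daemon with the new settings
```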
Scripting these steps
It’s a bit tedious to manually take these steps on every node in our cluster. They can be scripted as follows:
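```bash
#!/bin/bash
# Sketch of a per-node setup script: run as root on each GPU node.
# Paths and the ExecStart line follow the assumptions described above.
set -e

# 1. Find the identifier of this node's GPU
GPU_ID=$(nvidia-smi -a | grep UUID | awk '{print substr($4,1,12)}')

# 2. Write the systemd override so dockerd defaults to the nvidia runtime
#    and advertises the generic gpu resource
mkdir -p /etc/systemd/system/docker.service.d
cat > /etc/systemd/system/docker.service.d/override.conf <<EOF
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --default-runtime=nvidia --node-generic-resource gpu=${GPU_ID}
EOF

# 3. Uncomment swarm-resource in the nvidia-container-runtime config
sed -i 's/^#\s*swarm-resource/swarm-resource/' /etc/nvidia-container-runtime/config.toml

# 4. Reload systemd and restart Docker
systemctl daemon-reload
systemctl restart docker
```
Treat this as a sketch rather than a drop-in script: adjust the `ExecStart` line to match your distribution’s docker.service, and check that the `sed` pattern matches your config.toml before relying on it.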
Adding a service constraint
Now our cluster nodes are advertising to the swarm that they offer access to a GPU. The final step is to ensure that the service requests a GPU. We do this by adding `--generic-resource "gpu=1"` to the `docker service create` command. The full command looks something like this:
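```bash
docker service create \
  --name spark-worker \
  --replicas 5 \
  --generic-resource "gpu=1" \
  <your-spark-worker-image>
```
The service name, replica count, and image are placeholders for whatever your Spark worker setup uses; the important part is the `--generic-resource "gpu=1"` flag.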
The name of the generic resource being requested (`gpu` here) should match the name of the resource being advertised by the nodes.
Congratulations! The Docker swarm orchestrator will now distribute your Spark workers onto nodes with GPU capability.