In a recent LakeTide project, we wished to emulate a 12-server rack by setting up VMs on our beefy local workstations. The aim was to include one VM with 1 GTX 1080 and another with 4 GTX 1080s. Ultimately, the combination of Fedora, KVM, and nvidia-docker made deploying GPU-accelerated workloads a snap! Here’s how we did it:

The first task is to install the virtualization tools.
Then add yourself to the groups required to run virtualization commands.
Finally, make the changes needed to bridge your NIC so that VMs can access the network.

sudo dnf install qemu-kvm qemu-img libvirt virt-install
sudo usermod -G libvirt -a $(whoami)
sudo usermod -G kvm -a $(whoami)
echo "net.ipv4.ip_forward = 1"|sudo tee /etc/sysctl.d/99-ipforward.conf
sudo sysctl -p /etc/sysctl.d/99-ipforward.conf
sudo vi /etc/sysconfig/network-scripts/ifcfg-eno1
	TYPE="Ethernet"
	#BOOTPROTO="dhcp"
	DEVICE="eno1"
	BRIDGE=virbr0
sudo vi /etc/sysconfig/network-scripts/ifcfg-virbr0
	DEVICE="virbr0"
	TYPE=BRIDGE
	ONBOOT=yes
	BOOTPROTO="dhcp"

Next we need to load a kernel module called vfio-pci, which maps memory regions from the PCI bus into the VM, and activate support for IOMMU groups. We also need to modify GRUB to load vfio-pci early, so that framebuffer drivers, nouveau, nvidia, and friends don’t grab the GPU first during boot. After these modifications we commit the changes to GRUB and generate a new initrd image.

cat /etc/modprobe.d/vfio.conf
	options vfio-pci ids=10de:1b80,10de:10f0
	options vfio-pci disable_vga=1
cat /etc/default/grub
	[…]
	GRUB_CMDLINE_LINUX="intel_iommu=on iommu=pt rd.driver.pre=vfio-pci video=efifb:off"
sudo grub2-mkconfig -o /etc/grub2-efi.cfg
cat /etc/dracut.conf.d/vfio.conf
	add_drivers+="vfio vfio_iommu_type1 vfio_pci"
sudo dracut -f --kver $(uname -r)

Reboot. Make sure the firmware on your motherboard has IOMMU support enabled (VT-d on Intel boards). Now you can check that vfio-pci claimed the card and find your GPU’s PCI address.

lspci -nnk
	[...]
	02:00.0 VGA compatible controller [0300]: NVIDIA Corporation GP104 [GeForce GTX 1080] [10de:1b80] (rev a1)
		Subsystem: ASUSTeK Computer Inc. Device [1043:8591]
		Kernel driver in use: vfio-pci
		Kernel modules: nouveau, nvidia_drm, nvidia, vfio-pci
	[...]
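
One more thing worth checking: passthrough happens per IOMMU group, so if the card shares a group with other devices, those would have to be passed through as well. This short loop (a staple of the passthrough guides) shows who lives with whom:

# list every device by IOMMU group; ideally the 1080 and its
# HDMI audio function are alone in theirs
for g in /sys/kernel/iommu_groups/*; do
	echo "IOMMU group ${g##*/}:"
	for d in "$g"/devices/*; do
		echo -e "\t$(lspci -nns "${d##*/}")"
	done
done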

Now we can create a VM pointing to device 02:00.0.
We hide the KVM signature (libvirt’s kvm_hidden=on, which becomes kvm=off on the QEMU command line) so that the Nvidia driver running in the guest OS doesn’t realize it’s on a virtual machine; Nvidia does not support running consumer-grade cards in VMs. We also tell the VM to use EFI firmware and emulate Intel’s Q35 chipset.
For this specific VM we specify 120 GB of RAM, 20 vCPUs, and a dedicated SSD for the VM to use. Lastly, we pass kernel arguments that route output to the serial console, so we can quickly attach to the machine with a terminal if we need to troubleshoot.

virt-install --name leviathan --memory 120000 --vcpus 20 --network bridge=virbr0 --disk /dev/sdc,bus=virtio,sparse=false,device=disk --location /opt/qemu/Fedora.iso --graphics none --extra-args='console=ttyS0,115200' --host-device 02:00.0 --cpu host --features kvm_hidden=on --boot uefi --machine q35
virsh console leviathan
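
We built the four-GPU VM the same way; the extra cards can be passed with additional --host-device flags at install time, or attached afterwards with virsh. Here’s a sketch with a hypothetical second card at 03:00.0 (substitute the address lspci gives you):

cat gpu2.xml
	<hostdev mode='subsystem' type='pci' managed='yes'>
	  <source>
	    <address domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
	  </source>
	</hostdev>
virsh attach-device leviathan gpu2.xml --config

managed='yes' tells libvirt to handle the vfio-pci binding itself, and --config makes the device persist across reboots.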

Specify in the installer that you want a console install rather than a graphical one.
With that step complete we can get to the fun part.
Download the Nvidia software from https://developer.nvidia.com/cuda-downloads

sudo dnf install kernel-devel kernel-headers gcc dkms acpid perl-Getopt-Long
echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist.conf
sudo sed -i 's_quiet_quiet rd.driver.blacklist=nouveau_g' /etc/sysconfig/grub
sudo grub2-mkconfig -o /boot/efi/EFI/fedora/grub.cfg
sudo dracut /boot/initramfs-$(uname -r).img $(uname -r) --force
chmod 755 cuda_8*.run
sudo ./cuda_8*.run --override
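
Before rebooting, take a second to confirm that the blacklist edits landed (nouveau will still show up in lsmod until the restart, but should be gone afterwards):

# the kernel command line should now carry the nouveau blacklist
grep rd.driver.blacklist /etc/sysconfig/grub
# after the reboot this should print nothing
lsmod | grep nouveau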

Restart the VM. If everything went well, you should be able to see GPU(s) within the VM!

nvidia-smi
Wed Mar  1 18:06:34 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48                 Driver Version: 367.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 0000:05:00.0      On |                  N/A |
|100%   32C    P8     9W / 180W |     71MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 1080    Off  | 0000:06:00.0      On |                  N/A |
|100%   31C    P8     9W / 180W |     14MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 1080    Off  | 0000:09:00.0      On |                  N/A |
|100%   32C    P8     9W / 180W |     14MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 1080    Off  | 0000:0A:00.0      On |                  N/A |
|100%   29C    P8    13W / 180W |    730MiB /  8110MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     20144    G   /usr/bin/X                                      67MiB |
|    1     20144    G   /usr/bin/X                                      10MiB |
|    2     20144    G   /usr/bin/X                                      10MiB |
|    3     15858    C   /usr/local/julia/bin/julia                     237MiB |
|    3     17352    C   /usr/local/julia/bin/julia                     479MiB |
|    3     20144    G   /usr/bin/X                                      10MiB |
+-----------------------------------------------------------------------------+
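
nvidia-smi only proves the driver is alive. To exercise CUDA end to end, building the bundled deviceQuery sample is a quick test, assuming you let the runfile install the samples in their default location:

# default CUDA 8 paths; adjust if you changed them during install
export PATH=/usr/local/cuda-8.0/bin:$PATH
cd ~/NVIDIA_CUDA-8.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery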

It’s worth noting that now that the GPUs are dedicated to the VM, they are no longer visible to the host, so you’ll need to use ssh to work on the host. Now that everything is set up, spin up some GPU-accelerated code and watch it go!
Don’t know where to start? Check out ArrayFire or MXNet, and browse our blog posts to see neat examples of how to use them. To troubleshoot your GPU-passthrough setup further, the most comprehensive collection of tips and tricks can be found on the Arch Linux wiki.
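
And to close the loop on the nvidia-docker part of the story: once Docker and the nvidia-docker wrapper are installed inside the VM, a one-liner is enough to confirm that containers can see the cards:

# pulls Nvidia’s CUDA base image and runs nvidia-smi inside it
nvidia-docker run --rm nvidia/cuda nvidia-smi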