JADSB Just another data science blog

Guide to Using Proxmox, Containers, and GPU Pass Through for Machine Learning

Setting up a home lab with GPU support can be helpful for learning machine learning and other technologies. While we could use virtual machines on Proxmox, this guide focuses on using containers with GPU pass through. Aside from a few updates, this post has most of the details necessary to set up the home lab properly. I am going to detail my experience setting up a lab with an Nvidia GTX 1070 Ti and a Core i7.

Installing Proxmox

To install Proxmox, you will need to create a USB key with the installation software. I downloaded Proxmox 6.3 and used a Mac to create the installation media.

  1. Download Proxmox

    You can download the latest version of Proxmox from the official downloads page.

  2. Creating the USB key

    I created the key on macOS Catalina. For other OSes, you can find the instructions in the Proxmox wiki.

     hdiutil convert -format UDRW -o proxmox-ve_*.dmg proxmox-ve_*.iso # convert the proxmox image into proper format
    

    After converting the image file, plug the USB key into the Mac and look for the disk.

     # look for the external disk
     diskutil list # list all the disk attached to the computer
    

    From the list of disks, there should be an external disk. We need to unmount it prior to writing the installation media.

     diskutil unmountDisk /dev/diskX # replace X with the number corresponding to the external disk
    

    Now, you can create the install media using dd

     sudo dd if=proxmox-ve_*.dmg of=/dev/rdiskX bs=1m
    

  3. Installing Proxmox

    Aside from making a couple of choices about the file system and the address of the home lab, the installation of Proxmox is straightforward.

  4. Logging into Proxmox’s Web Interface

    At this point, you can unplug the home lab box’s monitor and keyboard. The rest of the guide can be done via a terminal or Proxmox’s web interface.

     https://address.you.chose:8006
    

    Port 8006 is the default port number if you did not change it during the installation process. On visiting the web interface from a browser, the server will prompt for a user name and password. The default user name is root and the password is the one you chose during the install. If the page fails to load, you may have forgotten the s in https. Because Proxmox uses a self-signed certificate, your browser may warn you that the page is insecure. You may safely ignore the warning.
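
    If you want to confirm that the web interface is reachable before opening a browser, a quick check from another machine on the network is shown below; the -k flag is needed because of the self-signed certificate, and the address is whatever you chose during installation.

     curl -k https://address.you.chose:8006 # an HTML login page in the response means the server is up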

Configuring Proxmox for GPU Pass Through

Making GPU pass through work on Proxmox and its containers is essentially a two-step process:

1. configure the drivers on the server, and
2. configure the containers. 
  1. Configure the Nvidia Drivers on the Server

    We need command line access to the server, either through the shell in the Proxmox web interface or by logging in directly via SSH. Since Proxmox is based on Debian, we follow the steps outlined in the Debian wiki to install the drivers on Proxmox. You can find references to the package repositories in the Proxmox wiki.
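
    For example, to get a shell on the server over SSH (using the address you chose during installation):

     ssh root@address.you.chose # log in as root with the password set during the install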

    We need to add the following lines to /etc/apt/sources.list:

     # security updates
     deb http://security.debian.org buster/updates main contrib
        
     # PVE pve-no-subscription repository provided by proxmox.com,
     # NOT recommended for production use
     deb http://download.proxmox.com/debian buster pve-no-subscription
        
     # buster-backports
     deb http://deb.debian.org/debian buster-backports main contrib non-free
    

    I used Proxmox 6.3 in writing this guide, and it is based on Debian 10. If you are using a different version of Proxmox, it might be based on a different Debian release. In that case, you need to change the word buster to the corresponding codename.
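
    If you are not sure which Debian release your Proxmox installation is based on, you can check the codename directly on the server:

     grep VERSION_CODENAME /etc/os-release # prints, e.g., VERSION_CODENAME=buster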

    Update all the packages to the latest versions and reboot

     apt-get update
     apt-get dist-upgrade
     shutdown -r now
    

    We need to verify the kernel version and the corresponding headers with

     uname -r
     apt-cache search pve-header
    

    For me, the output looked like this:

     root@machinename:~# uname -r
     5.4.78-2-pve
     root@machinename:~# apt-cache search pve-header
     pve-headers-5.0.12-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.15-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.18-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.21-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.21-2-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.21-3-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.21-4-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.21-5-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.8-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0.8-2-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.0 - Latest Proxmox VE Kernel Headers
     pve-headers-5.3.1-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3.10-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3.13-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3.13-2-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3.13-3-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3.18-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3.18-2-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3.18-3-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3.7-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.3 - Latest Proxmox VE Kernel Headers
     pve-headers-5.4.22-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.24-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.27-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.30-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.34-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.41-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.44-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.44-2-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.55-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.60-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.65-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.73-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.78-1-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4.78-2-pve - The Proxmox PVE Kernel Headers
     pve-headers-5.4 - Latest Proxmox VE Kernel Headers
     pve-headers - Default Proxmox VE Kernel Headers
    

    We can install the proper version with

     apt-get install pve-headers-5.4.78-2-pve
    

    and install Nvidia drivers with

     apt-get install -t buster-backports nvidia-driver
    

    As I mentioned before, you may need to change the version numbers if you are installing a different version of Proxmox.
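
    If you would rather not copy the version string by hand, you can derive the header package name from the running kernel. This is a minimal one-liner sketch; it assumes a pve-headers package exists for your exact kernel version:

     apt-get install pve-headers-$(uname -r) # resolves to pve-headers-5.4.78-2-pve in my case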

    You can install some tools to go along with the driver:

     apt-get install i7z nvidia-smi htop iotop
    

    If you check /dev now, there should be some Nvidia related files:

     root@machinename:~# ls -alh /dev/nvid*
     crw-rw-rw- 1 root root 195, 254 Dec 27 01:16 /dev/nvidia-modeset
     crw-rw-rw- 1 root root 235,   0 Dec 27 01:16 /dev/nvidia-uvm
     crw-rw-rw- 1 root root 235,   1 Dec 27 01:16 /dev/nvidia-uvm-tools
     crw-rw-rw- 1 root root 195,   0 Dec 27 01:16 /dev/nvidia0
     crw-rw-rw- 1 root root 195, 255 Dec 27 01:16 /dev/nvidiactl
    

    To ensure that the kernel modules are loaded at boot time, edit /etc/modules-load.d/modules.conf with your favourite editor and add

     # /etc/modules: kernel modules to load at boot time.
     #
     # This file contains the names of kernel modules that should be loaded
     # at boot time, one per line. Lines beginning with "#" are ignored.
    
     nvidia
     nvidia_uvm
    

    Because the /dev/nvidia* device files are not created automatically until an X server or nvidia-smi is run, we need to add the following lines to /etc/udev/rules.d/70-nvidia.rules:

     # /etc/udev/rules.d/70-nvidia.rules
     # Create /dev/nvidia0, /dev/nvidia1 … and /dev/nvidiactl when the nvidia module is loaded
     KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
     # Create the CUDA node when the nvidia_uvm CUDA module is loaded
     KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
    

    Now, reboot the server with shutdown -r now and check that everything works by running nvidia-smi in a new shell.
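
    Beyond nvidia-smi, a couple of extra checks on the host can confirm that the modules and device nodes are in place; the exact driver version and GPU name will differ on your hardware.

     nvidia-smi            # should print a table listing the GPU and the driver version
     lsmod | grep nvidia   # the nvidia and nvidia_uvm modules should be listed
     ls -alh /dev/nvidia*  # the device nodes should exist with crw-rw-rw- permissions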

  2. Configure the Containers:

    Find the container ID in Proxmox’s web interface and then edit the corresponding file under /etc/pve/lxc/ (the file is named after the ID, e.g. 100.conf). I am using an Ubuntu container with ID 100, so here’s my config file:

     #Ubuntu 20.04 with GPU passthrough
     arch: amd64
     cores: 10
     hostname: CT100
     memory: 16384
     net0: name=eth0,bridge=vmbr0,hwaddr=AA:98:43:03:D4:41,ip=dhcp,ip6=dhcp,type=veth
     ostype: ubuntu
     rootfs: local-zfs:basevol-100-disk-1,size=30G
     swap: 16384
     template: 1
     unprivileged: 1
    
     # GPU passthrough configs
     lxc.cgroup.devices.allow: c 195:* rwm
     lxc.cgroup.devices.allow: c 243:* rwm
     lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
     lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
     lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
     lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
     lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
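
    Note that the major numbers in the two lxc.cgroup.devices.allow lines (195 and 243 above) are not universal; they need to match the major numbers your host reports for the corresponding /dev/nvidia* devices. You can check them on the Proxmox host with:

     ls -l /dev/nvidia* # the number before the comma in each line is the major number to allow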
    

    Reboot the container for the settings to take effect. After the reboot, check whether the configuration worked with the following commands and expected outputs (a host-side alternative using pct is sketched after the outputs):

     # nvidia-smi
     Tue Jan  5 02:57:07 2021
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |                               |                      |               MIG M. |
     |===============================+======================+======================|
     |   0  GeForce GTX 107...  Off  | 00000000:01:00.0  On |                  N/A |
     |  0%   45C    P8     8W / 180W |      1MiB /  8116MiB |      0%      Default |
     |                               |                      |                  N/A |
     +-------------------------------+----------------------+----------------------+
    
     +-----------------------------------------------------------------------------+
     | Processes:                                                                  |
     |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
     |        ID   ID                                                   Usage      |
     |=============================================================================|
     |  No running processes found                                                 |
     +-----------------------------------------------------------------------------+
    

    and

     # ls /dev/nvidia* -l
     -rw-r--r-- 1 root root 0 16.01.2017 20:11 /dev/nvidia-modeset
     crw-rw-rw- 1 nobody nobody 243, 0 16.01.2017 20:05 /dev/nvidia-uvm
     -rw-r--r-- 1 root root 0 16.01.2017 20:11 /dev/nvidia-uvm-tools
     crw-rw-rw- 1 nobody nobody 195, 0 16.01.2017 20:05 /dev/nvidia0
     crw-rw-rw- 1 nobody nobody 195, 255 16.01.2017 20:05 /dev/nvidiactl
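
    If you prefer to do the restart-and-check cycle from the Proxmox host instead of a shell inside the container, pct (Proxmox’s container management tool) offers one way. A minimal sketch, assuming the container ID is 100:

     pct stop 100 && pct start 100 # restart the container so the new config is applied
     pct exec 100 -- nvidia-smi    # run nvidia-smi inside the container from the host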
    

Creating Snapshots and Templates

After all this hard work, you should save your progress on the container by creating a snapshot. If you chose ZFS as the base file system, this would be an option in the web interface under the specific container’s menu.
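
If you prefer the command line, pct can take the snapshot from the Proxmox host. A minimal sketch, assuming container ID 100 and a snapshot name of your choosing:

     pct snapshot 100 gpu-ready # take a snapshot named gpu-ready of container 100
     pct listsnapshot 100       # list snapshots to confirm it was created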

You could also make the new GPU-enabled container a template to use as a basis for other containers. The option to convert the container into a template can be found by right-clicking on the name of the container.
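
The same conversion can be done from the command line with pct; note that turning a container into a template is not meant to be reversed, so snapshot or back up first. A sketch, again assuming container ID 100:

     pct template 100 # convert container 100 into a template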

Final Thoughts

That’s it. You are done. You can now move on to the more important work of using the home lab for learning or actual projects. Good luck, and let me know about the cool projects you build with your GPU-enabled Proxmox home lab.

Copyright © B.S. Chan