Guide to Using Proxmox, Containers, and GPU Pass Through for Machine Learning
04 Jan 2021

Setting up a home lab with GPU support can be helpful in learning machine learning and other technologies. While we could use virtual machines on Proxmox, this guide focuses on containers with GPU pass through. Aside from a few updates, this post has most of the details necessary to set up the home lab properly. I am going to detail my experience in setting up a lab with an Nvidia GeForce GTX 1070 Ti and a Core i7.
Installing Proxmox
To install Proxmox, you will need to create a USB key with the installation software. I downloaded Proxmox 6.3 and used a Mac to create the installation media.
- Download Proxmox
You can download the latest version of Proxmox directly from the official Proxmox downloads page.
- Creating the USB key
I created the key on macOS Catalina. For other OSes, you can find the instructions in the Proxmox wiki.
hdiutil convert -format UDRW -o proxmox-ve_*.dmg proxmox-ve_*.iso # convert the proxmox image into proper format
After converting the image file, plug the USB key into the Mac and look for the disk.
# look for the external disk
diskutil list # list all the disks attached to the computer
From the list of disks, there should be an external disk. We need to unmount it prior to writing the installation media.
diskutil unmountDisk /dev/diskX # replace X with the number corresponding to the external disk
Now, you can create the install media using dd
sudo dd if=proxmox-ve_*.dmg of=/dev/rdiskX bs=1m
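Once dd finishes, you can eject the key before unplugging it (diskX is the same placeholder as above):

diskutil eject /dev/diskX # replace X with the number corresponding to the external disk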
- Installing Proxmox
Aside from making a couple of choices about the file system and the address of the home lab, installation of Proxmox is straightforward.
- Logging into Proxmox's Web Interface
At this point, you can unplug the home lab box’s monitor and keyboard. The rest of the guide can be done via a terminal or Proxmox’s web interface.
https://address.you.chose:8006
Port 8006 is the default port number if you did not change it during the installation process. On visiting the web interface from a browser, the server prompts for a user name and password. The default user name is root, and the password is the one chosen during the install. If the page fails to load, you may have forgotten the s in https. Because https is required, your browser may give you a warning about the webpage being insecure (Proxmox ships with a self-signed certificate). You may safely ignore it.
Configuring Proxmox for GPU Pass Through
Making GPU pass through work on Proxmox and containers is essentially a two-step process:
1. configure the drivers on the server, and
2. configure the containers.
- Configure the Nvidia Drivers on the Server
We need command line access to the server, either through the Proxmox web interface's shell or by logging in directly via SSH. Since Proxmox is based on Debian, we follow the steps outlined in the Debian wiki to install the drivers on Proxmox. You can find references to package repositories in the Proxmox wiki.
We need to add the following lines to /etc/apt/sources.list:

# security updates
deb http://security.debian.org buster/updates main contrib

# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian buster pve-no-subscription

# buster-backports
deb http://deb.debian.org/debian buster-backports main contrib non-free
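If you installed Proxmox without a paid subscription, you will likely also want to comment out the enterprise repository, since apt-get update will otherwise complain that it cannot authenticate against enterprise.proxmox.com. A small sketch, assuming the default /etc/apt/sources.list.d/pve-enterprise.list that the installer creates:

# comment out the pve-enterprise repository (requires a subscription)
sed -i 's/^deb/#deb/' /etc/apt/sources.list.d/pve-enterprise.list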
I used Proxmox 6.3 in writing this guide, and it is based on Debian 10 (buster). If you are using a different version of Proxmox, it might be based on a different Debian release; in that case, change the word buster to the corresponding codename.

Update all the packages to the latest versions and reboot:
apt-get update
apt-get dist-upgrade
shutdown -r now
We need to verify the kernel version and the corresponding headers with
uname -r
apt-cache search pve-header
For me, this was the output:
root@machinename:~# uname -r
5.4.78-2-pve
root@machinename:~# apt-cache search pve-header
pve-headers-5.0.12-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.15-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.18-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-3-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-4-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-5-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.8-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.8-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0 - Latest Proxmox VE Kernel Headers
pve-headers-5.3.1-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.10-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.13-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.13-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.13-3-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.18-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.18-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.18-3-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.7-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3 - Latest Proxmox VE Kernel Headers
pve-headers-5.4.22-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.24-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.27-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.30-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.34-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.41-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.44-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.44-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.55-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.60-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.65-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.73-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.78-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.78-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4 - Latest Proxmox VE Kernel Headers
pve-headers - Default Proxmox VE Kernel Headers
We can install the proper version with
apt-get install pve-headers-5.4.78-2-pve
and install Nvidia drivers with
apt-get install -t buster-backports nvidia-driver
Like I mentioned before, you may need to change the version number if you are installing with a different version of Proxmox.
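Since the running kernel version is exactly what uname -r prints, you can also let the shell substitute it for you rather than copying the version string by hand (a small sketch, assuming the matching headers package exists in the repository, as it does in the listing above):

apt-get install pve-headers-$(uname -r)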
You can install some tools to go along with the driver:
apt-get install i7z nvidia-smi htop iotop
If you check /dev now, there should be some Nvidia related files:

root@machinename:~# ls -alh /dev/nvid*
crw-rw-rw- 1 root root 195, 254 Dec 27 01:16 /dev/nvidia-modeset
crw-rw-rw- 1 root root 235,   0 Dec 27 01:16 /dev/nvidia-uvm
crw-rw-rw- 1 root root 235,   1 Dec 27 01:16 /dev/nvidia-uvm-tools
crw-rw-rw- 1 root root 195,   0 Dec 27 01:16 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Dec 27 01:16 /dev/nvidiactl
To ensure that these drivers are loaded at boot time, edit /etc/modules-load.d/modules.conf with your favourite editor and add:

# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
nvidia
nvidia_uvm
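Before rebooting, you can confirm that the modules are currently loaded (a quick check; modprobe will load them manually if they are not):

lsmod | grep nvidia   # should list nvidia and nvidia_uvm, among others

# if nothing shows up, load the modules manually:
modprobe nvidia
modprobe nvidia_uvm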
Because the /dev/nvidia* and /dev/nvidia-uvm* device nodes are not automatically created until the X server or nvidia-smi is called, we need to add the following lines to /etc/udev/rules.d/70-nvidia.rules:

# /etc/udev/rules.d/70-nvidia.rules

# Create /dev/nvidia0, /dev/nvidia1 ... and /dev/nvidiactl when the nvidia module is loaded
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"

# Create the CUDA node when the nvidia_uvm CUDA module is loaded
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
Now, reboot the server with shutdown -r now and check whether everything worked by running nvidia-smi in a new shell.
- Configure the Containers
Find the container ID in Proxmox's web interface and then edit the corresponding file in /etc/pve/lxc/ (the file is named after the container ID). I am using an Ubuntu container with ID 100, so here is my config file, /etc/pve/lxc/100.conf:

#Ubuntu 20.04 with GPU passthrough
arch: amd64
cores: 10
hostname: CT100
memory: 16384
net0: name=eth0,bridge=vmbr0,hwaddr=AA:98:43:03:D4:41,ip=dhcp,ip6=dhcp,type=veth
ostype: ubuntu
rootfs: local-zfs:basevol-100-disk-1,size=30G
swap: 16384
template: 1
unprivileged: 1
# GPU passthrough configs
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 243:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
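One caveat: the major numbers in the two lxc.cgroup.devices.allow lines (195 and 243 above) must match the numbers the host actually assigned to the Nvidia device nodes. In the earlier ls output on my host, /dev/nvidia-uvm had major number 235 rather than 243, so check yours and adjust that line if necessary. A quick way to look them up on the Proxmox host:

# the major number is the first value in the "195, 0"-style column
ls -l /dev/nvidia*

# or query a single node directly; %t prints the major number in hex (eb = 235)
stat -c '%t' /dev/nvidia-uvm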
Reboot the container for the settings to take effect. After the reboot, check whether the configuration worked with the following commands and outputs:
# nvidia-smi
Tue Jan  5 02:57:07 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 107...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   45C    P8     8W / 180W |      1MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
and
# ls /dev/nvidia* -l
-rw-r--r-- 1 root   root          0 16.01.2017 20:11 /dev/nvidia-modeset
crw-rw-rw- 1 nobody nobody 243,   0 16.01.2017 20:05 /dev/nvidia-uvm
-rw-r--r-- 1 root   root          0 16.01.2017 20:11 /dev/nvidia-uvm-tools
crw-rw-rw- 1 nobody nobody 195,   0 16.01.2017 20:05 /dev/nvidia0
crw-rw-rw- 1 nobody nobody 195, 255 16.01.2017 20:05 /dev/nvidiactl
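Since the point of this lab is machine learning, it is also worth confirming that a framework inside the container can actually use the GPU, not just see the device nodes. A minimal sketch, assuming Python 3 and pip are available in the container, that you use PyTorch (any CUDA-enabled framework works similarly), and that the Nvidia user-space libraries matching the host driver (450.80.02 above) are installed inside the container:

# inside the container
pip3 install torch   # PyTorch wheel with a bundled CUDA runtime

# ask PyTorch whether the GPU is usable and what it is called
python3 -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"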
Creating Snapshots and Templates
After all this hard work, you should save your progress on the container by creating a snapshot. If you chose ZFS as the base file system, this would be an option in the web interface under the specific container’s menu.
You could also make the new GPU-enabled container a template to use as a basis for other containers. The option of making that container a template can be found by right-clicking on the name of the container.
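If you prefer the command line, cloning a new container from the template can also be done with pct. A brief sketch, where 100 is the template container from above and 101 and the hostname are hypothetical values of your choosing:

pct clone 100 101 --hostname ml-box   # create a new container from the template
pct start 101
pct enter 101                         # get a shell inside the new container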
Final Thoughts
That’s it. You are done. You can now move on to the more important work of using the home lab for learning or actual work. Good luck, and let me know about the cool projects you do with your GPU-enabled Proxmox home lab.