As someone working in the mobile gaming industry, I wanted to learn more about the nuts and bolts of game making. To get hands-on experience, I am taking C# Programming for Unity Game Development from . The first step in learning any programming language is to set up the development environment. The course instructor suggested using either Visual Studio or MonoDevelop, but I am a fan of Vim. In this post, I will document my initial experience programming in C# from the terminal on macOS.
So far, writing C# code on macOS seems straightforward. That said, the course involves using Unity, so I might end up needing the functionality of Visual Studio later.
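As a quick sanity check of the terminal workflow, here is a minimal sketch of creating and running a console program, assuming the .NET SDK is installed (for example via Homebrew); the project name is arbitrary.
# create a console "Hello World" project and run it from the terminal
dotnet new console -o HelloFromVim
cd HelloFromVim
dotnet run   # should print "Hello World!"
From there, editing Program.cs in Vim and re-running dotnet run is all that is needed for simple exercises.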
Setting up a home lab with GPU support can be helpful for learning machine learning and other technologies. While we could use virtual machines on Proxmox, this guide focuses on containers with GPU passthrough. Aside from a few updates you may need over time, this post should have most of the details necessary to set up the home lab properly. I am going to detail my experience setting up a lab with an Nvidia GTX 1070 Ti and a Core i7.
Installing Proxmox
To install Proxmox, you will need to create a USB key with the installation software. I downloaded Proxmox 6.3 and used a Mac to create the installation media.
Download Proxmox
You can download the latest version of Proxmox from the official downloads page.
Creating the USB key
I created the key on macOS Catalina. For other OSes, you can find the instructions in the Proxmox wiki.
hdiutil convert -format UDRW -o proxmox-ve_*.dmg proxmox-ve_*.iso # convert the proxmox image into proper format
After converting the image file, plug the USB key into the Mac and look for the disk.
# look for the external disk
diskutil list # list all the disks attached to the computer
From the list of disks, there should be an external disk. We need to unmount it prior to writing the installation media.
diskutil unmountDisk /dev/diskX # replace X with the number corresponding to the external disk
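With the disk unmounted, the converted image can be written to the key. This is a minimal sketch, assuming the .dmg produced above (hdiutil may have appended a .dmg extension to the file name) and the same disk number X as before:
# write the image to the USB key; /dev/rdiskX is the raw device and is faster than /dev/diskX
sudo dd if=proxmox-ve_*.dmg of=/dev/rdiskX bs=1m
Once the write finishes, eject the key, plug it into the home lab box, and boot from it.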
Aside from making a couple of choices about the file system and the address of the home lab, the installation of Proxmox is straightforward.
Logging into Proxmox’s Web Interface
At this point, you can unplug the home lab box’s monitor and keyboard. The rest of the guide can be done via a terminal or Proxmox’s web interface.
https://address.you.chose:8006
Port 8006 is the default port if you did not change it during the installation process. On visiting the web interface from a browser, the server will prompt for a username and password. The default username is root, and the password is the one you chose during the install.
If the page fails to load, you may have forgotten the s in https. Because Proxmox uses a self-signed certificate, your browser may also warn you that the page is insecure. You may safely ignore that warning.
Configuring Proxmox for GPU Passthrough
Making GPU passthrough work with Proxmox and containers is essentially a two-step process:
1. configure the drivers on the server, and
2. configure the containers.
Configure the Nvidia Drivers on the Server
We need command-line access to the server, either through the Proxmox web interface or by logging in directly via SSH. Since Proxmox is based on Debian, we follow the steps outlined in the Debian wiki to install the drivers on Proxmox. You can find references to the package repositories in the Proxmox wiki.
We need to add the following lines to /etc/apt/sources.list:
# security updates
deb http://security.debian.org buster/updates main contrib
# PVE pve-no-subscription repository provided by proxmox.com,
# NOT recommended for production use
deb http://download.proxmox.com/debian buster pve-no-subscription
# buster-backports
deb http://deb.debian.org/debian buster-backports main contrib non-free
I used Proxmox 6.3 when writing this guide, and it is based on Debian 10 (buster). If you are using a different version of Proxmox, it might be based on a different Debian release. In that case, change the word buster to the corresponding codename.
Update all the packages to the latest versions and reboot:
apt-get update
apt-get dist-upgrade
shutdown -r now
We need to verify the kernel version and find the corresponding headers with:
uname -r
apt-cache search pve-header
For me, I got this output:
root@machinename:~# uname -r
5.4.78-2-pve
root@machinename:~# apt-cache search pve-header
pve-headers-5.0.12-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.15-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.18-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-3-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-4-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.21-5-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.8-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0.8-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.0 - Latest Proxmox VE Kernel Headers
pve-headers-5.3.1-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.10-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.13-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.13-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.13-3-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.18-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.18-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.18-3-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3.7-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.3 - Latest Proxmox VE Kernel Headers
pve-headers-5.4.22-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.24-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.27-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.30-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.34-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.41-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.44-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.44-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.55-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.60-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.65-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.73-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.78-1-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4.78-2-pve - The Proxmox PVE Kernel Headers
pve-headers-5.4 - Latest Proxmox VE Kernel Headers
pve-headers - Default Proxmox VE Kernel Headers
We can install the proper version with
apt-get install pve-headers-5.4.78-2-pve
and install Nvidia drivers with
apt-get install -t buster-backports nvidia-driver
As I mentioned before, you may need to change the codename if you are installing on a different version of Proxmox.
You can install some tools to go along with the driver:
apt-get install i7z nvidia-smi htop iotop
If you check /dev now, there should be some Nvidia-related device files:
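A quick way to check is to list the device nodes; the exact set varies with the driver version, but /dev/nvidia0 and /dev/nvidiactl are typical:
ls -l /dev/nvidia*   # expect entries such as /dev/nvidia0, /dev/nvidiactl and /dev/nvidia-uvm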
To ensure that these drivers are loaded at boot time, you need to edit /etc/modules-load.d/modules.conf with your favourite editor and add the following lines:
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
nvidia
nvidia_uvm
Because the nvidia and nvidia_uvm device files are not automatically created until the X server or nvidia-smi is called, we need to add the following lines to /etc/udev/rules.d/70-nvidia.rules:
# /etc/udev/rules.d/70-nvidia.rules
# Create /dev/nvidia0, /dev/nvidia1 ... and /dev/nvidiactl when the nvidia module is loaded
KERNEL=="nvidia", RUN+="/bin/bash -c '/usr/bin/nvidia-smi -L && /bin/chmod 666 /dev/nvidia*'"
# Create the CUDA node when the nvidia_uvm CUDA module is loaded
KERNEL=="nvidia_uvm", RUN+="/bin/bash -c '/usr/bin/nvidia-modprobe -c0 -u && /bin/chmod 0666 /dev/nvidia-uvm*'"
Now, reboot the server with shutdown -r now and check that everything works by running nvidia-smi in a new shell.
Configure the Containers
Find the container ID in Proxmox’s web interface and then edit the corresponding file under /etc/pve/lxc/. I am using an Ubuntu container with ID 100, so the file is /etc/pve/lxc/100.conf.
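The exact contents depend on your container, but as a sketch, the GPU-related lines for an LXC container on Proxmox 6.x usually consist of cgroup device permissions plus bind mounts of the Nvidia device nodes. The major numbers below (195 and 243) are assumptions; verify yours with ls -l /dev/nvidia* on the host:
# GPU-related lines in /etc/pve/lxc/100.conf (a sketch; adjust the device major numbers to your host)
lxc.cgroup.devices.allow: c 195:* rwm
lxc.cgroup.devices.allow: c 243:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
For nvidia-smi to work inside the container, the container also needs the Nvidia user-space tools (nvidia-smi and the driver libraries, without the kernel modules) at a version matching the host driver.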
Reboot the container for the settings to take effect. After the reboot, check that the configuration worked; running nvidia-smi inside the container should produce output like this:
# nvidia-smi
Tue Jan 5 02:57:07 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce GTX 107... Off | 00000000:01:00.0 On | N/A |
| 0% 45C P8 8W / 180W | 1MiB / 8116MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
After all this hard work, you should save your progress by creating a snapshot of the container. If you chose ZFS as the base file system, snapshotting is an option in the web interface under the specific container’s menu.
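If you prefer the command line, the pct tool on the Proxmox host can do the same thing; the container ID 100 and the snapshot name below are just examples:
pct snapshot 100 gpu-ready   # snapshot the container in its current state
pct listsnapshot 100         # confirm the snapshot exists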
You could also turn the new GPU-enabled container into a template to use as a basis for other containers. The option to convert the container into a template can be found by right-clicking on the name of the container.
Final Thoughts
That’s it. You are done. You can now move on to the more important work of using the home lab for learning or for actual work. Good luck, and let me know about the cool projects you do with your GPU-enabled Proxmox home lab.
You could also set up SSH keys so that a username and password are not required for every login to the remote server.
Generate a key pair on the local machine
mkdir ~/.ssh # if .ssh does not already exist
cd ~/.ssh
ssh-keygen # follow the on-screen instructions; use no passphrase
ssh-add -K newly_created_key # add key to ssh-agent
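Next, copy the public key to the server and log in once to confirm that a password is no longer requested. This is a sketch; the key file name matches the step above, and the root user and address are just the ones from the example output below:
ssh-copy-id -i ~/.ssh/newly_created_key.pub root@203.0.113.1  # append the public key to the server's authorized_keys
ssh root@203.0.113.1                                          # should log in without prompting for the account password
On the first connection, you will be asked to confirm the host’s fingerprint: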
The authenticity of host '203.0.113.1 (203.0.113.1)' can't be established.
ECDSA key fingerprint is fd:fd:d4:f9:77:fe:73:84:e1:55:00:ad:d6:6d:22:fe.
Are you sure you want to continue connecting (yes/no)? yes
Once you have picked a couple of topics based on the guidelines provided in my previous post, you need to find data sets to go with these topics. In this post, we will discuss what makes a data set good for data science projects. Then, we will list a couple of ways to find them.
Picking the Data Set
Narrow, a.k.a. Tall and Skinny, Data
Of the many components that make up data science, employers tend to focus on machine learning more than the other disciplines. Aside from being relevant to the topic of your choice, the data set should showcase your machine learning skills. To facilitate machine learning, a data set with many records and relatively few attributes, i.e., a narrow data set, is preferred. If you need examples of what a narrow data set looks like, I highly recommend downloading some of the data sets on Kaggle. The main data set in each competition is usually in the right form.
Right Size
While a larger data set is generally preferred in machine learning, a data set can be so large that it impedes progress. Available computational power becomes an issue, and it would be difficult to iterate on the project. Until you are comfortable using more advanced tools, such as Apache Spark, you should avoid data sets that are too large to fit into memory. Obviously, if you find a great data set that is too large, you can always use only a fraction of the data. When downsampling, you should be mindful not to alter the distribution of the original data set.
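A uniform random sample, rather than simply the first N rows, keeps the original distribution in expectation. A command-line sketch, assuming a CSV called full.csv with a header row (the file names and the sample size of 100,000 rows are placeholders):
# keep the header, then append a random sample of the remaining rows
# shuf is part of GNU coreutils (available as gshuf via Homebrew on macOS)
head -n 1 full.csv > sample.csv
tail -n +2 full.csv | shuf -n 100000 >> sample.csv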
Finding Data Sets on the Internet
There is so much data available on the web. I am going to list some standard ways for data scientists to find, scrape, or curate the right data set for their projects.
Search Engines
i. Try Google and Google Dataset Search
Google should be the obvious first tool for finding a data set. It is the most popular search engine in the world because it works well. In fact, Google has a dedicated service for data sets called Dataset Search.
ii. Go beyond the first couple of pages of search results
Google and Google Dataset Search should suffice for many data sets, but what if you do not find the data you are looking for? Search engines are very good at common queries, since the top results tend to satisfy most searchers. However, a data set search is not a typical query, so the result may not appear readily. In the past, I have found results relevant to me ten, maybe twenty, pages down.
iii. Try other search engines
While Google is the dominant search engine in the world, there are other search engines out in the wild. I often find useful information from the alternatives.
Scraping
Sometimes, data might be formatted for viewing on the web but not readily available for download. While cutting and pasting is a solution, it would likely be too laborious for any data set suitable for machine learning. In some cases, you might be able to scrape the data off the pages directly. There are many great tutorials on web scraping, so I am not going to repeat them here.
Finding and Calling Public APIs
Aside from downloading data sets whole or scraping them from websites, you can also find public APIs that allow you to pull the right data. Calling APIs directly is a great way to practice your programming skills as well. You can find many public APIs available with a simple search.
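As one illustration, here is a sketch using GitHub’s public REST API; the repository is arbitrary, and jq is only used to pick out a few fields:
# fetch repository metadata from a public API and extract a few fields
curl -s https://api.github.com/repos/pandas-dev/pandas | jq '{name, stargazers_count, forks_count}'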
Final Thoughts
I have listed here a couple of ways to find a data set for data science projects. There are many great resources on finding data sets. My hope is that this post will provide you with some ideas to get started.
As I mentioned in my previous post, projects are important for aspiring data scientists as a way to gain experience and exhibit their skills. A question naturally arises from the need to do self-directed projects: what topics are appropriate for such projects? If you are looking to enhance your resume and your Github profile, you do not have constraints from a manager or client. While this freedom seems wonderful at first, many of my mentees have found it overwhelming.
To guide my mentees, I always advise them to pick a topic that is important to them on a personal level. In comparison to the standard topics that can be found on Kaggle and other data science focused sites, this approach has several advantages:
1. Originality
Anyone who has been around data science long enough would have seen Twitter sentiment analysis or classifying images of cats and dogs a few dozen times. I am personally tired of seeing the same old capstone projects over and over again. While one could always put a new spin on an old topic, this is a difficult task for an inexperienced data scientist.
On the other hand, a topic of personal interest is unlikely to be a retread of the tired old themes in data science. Given the fierce competition in the data science job market, any opportunity to stand out from the crowd is good. As I mentioned in the previous post, the purpose of the capstone project is to hone and show off your data science skills. Without constraints from prior work, you can explore the topic freely using any technique and thus show your best self.
2. Passion
In all the data science interviews that I have been involved in, both as an interviewee and as an interviewer, there was always a presentation component. It can be difficult to make these presentations engaging because the content is mostly determined by industry standards. In each presentation, one must include an introduction of the topic and the data set, an exploratory data analysis, a statistical or machine learning model, and finally an analysis of the results from the model. Given the mechanical nature of these presentations, a personal passion for the topic will shine through. Actually caring about the topic injects an excitement that is not commonly found in corporate presentations. This small difference may help you stand out from other candidates.
3. Expertise
As a burgeoning data scientist, one is unlikely to be the expert on the typical capstone project during the interview process. In fact, the interviewers would likely have more expertise on both the topic and the techniques involved. In my experience, it is difficult to present to people who are more knowledgeable than me. While you cannot change the fact that the interviewers are technically more proficient, you could at least speak with authority on the topic if it is of your personal interest.
Final Thoughts
Picking a topic based on personal interest is a way to take advantage of the freedom of not having anyone to answer to. I have shown several advantages of picking a project topic that can show off your passion and knowledge. For the next step in a data science project, there has to be a data set. In my next post, I will discuss how to pick an appropriate one.