# kube-cascade/nvidia

## Installing NVIDIA stuff on a new k3s node

```shell
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit nvidia-kernel-dkms
```
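Once the toolkit is installed, k3s should detect the NVIDIA runtime the next time it starts and wire it into its generated containerd config. A quick sanity check (paths assume a standard k3s install; use `k3s-agent` instead of `k3s` on agent-only nodes):

```shell
# Restart k3s so it re-detects the newly installed NVIDIA runtime
sudo systemctl restart k3s

# k3s writes detected runtimes into its generated containerd config
grep -A2 nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml

# The driver itself should also be working on the host
nvidia-smi
```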

## Installing k3s stuff

There is a deploy.sh in this folder.

It installs the RuntimeClass needed to target the nvidia runtime, and installs the NVIDIA device plugin via helm with GFD (GPU Feature Discovery, which labels nodes that have GPUs) enabled.
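deploy.sh is the authoritative version of these steps; a minimal sketch of the two pieces it installs looks like this (release name, namespace, and chart values here are assumptions, not copied from the script):

```shell
# Apply the RuntimeClass from this folder so pods can target the
# nvidia containerd runtime via runtimeClassName: nvidia
kubectl apply -f nvidia-runtime-class.yaml

# Install the NVIDIA device plugin with GPU Feature Discovery enabled
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace \
  --set runtimeClassName=nvidia \
  --set gfd.enabled=true   # GFD adds the nvidia.com/gpu.* node labels
```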

## Testing

With these two pieces installed, any GPU-bearing node should show up carrying GFD's labels:

```shell
kubectl get node -l 'nvidia.com/gpu.count'
```
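To actually exercise a GPU end to end, run a test pod in the spirit of gpu-pod.yaml in this folder (the real file targets the CUDA runtime on Pascal; the name, image, and tag below are assumptions, so check the actual manifest):

```yaml
# Hypothetical GPU test pod; see gpu-pod.yaml for the real one
apiVersion: v1
kind: Pod
metadata:
  name: gpu-test
spec:
  restartPolicy: Never
  runtimeClassName: nvidia        # matches nvidia-runtime-class.yaml
  containers:
  - name: cuda
    image: nvidia/cuda:12.2.0-base-ubuntu22.04   # assumed image/tag
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1         # claimed from the device plugin
```

If everything is wired up, `kubectl logs gpu-test` should show the familiar `nvidia-smi` table.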