Nvidia A100 Environment Setup
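I recently had the chance to use an A100 machine with eight 80 GB GPUs. I assumed the setup would be the same as on earlier machines, but ran into problems when running an LLM with CUDA, so I'm recording the fix here.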
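If you are using GPUs such as V100 / A100 / A30 in a system where they are interconnected through NVSwitch (for example an 8-GPU HGX/DGX board), you also need step 3 onward (from Install DCGM) before the NVIDIA GPU features work properly.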
-
Install CUDA
Follow: CUDA Toolkit 12.6 Downloads | NVIDIA Developer
Base Installer:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.6.0/local_installers/cuda-repo-ubuntu2204-12-6-local_12.6.0-560.28.03-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-6-local_12.6.0-560.28.03-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-6-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-6
Driver Installer:
sudo apt-get install -y nvidia-open
Setting NVCC:
Replace cuda-12 in the commands below with the version you installed.
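vim ~/.bashrc
(No sudo is needed to edit your own ~/.bashrc.) Append:
export PATH=/usr/local/cuda-12/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
Then reload it in the current shell:
source ~/.bashrc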
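If you configure several machines, this edit can be scripted. A minimal sketch, assuming bash and the /usr/local/cuda-12 layout used above; `add_cuda_env` is a hypothetical helper name, and the marker comment only exists so that running it twice does not add the lines twice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append the CUDA PATH / LD_LIBRARY_PATH exports to an rc
# file once. The marker comment lets repeated runs detect the earlier edit.
add_cuda_env() {
  local cuda_home="$1"                 # assumption: e.g. /usr/local/cuda-12
  local rc_file="${2:-$HOME/.bashrc}"  # defaults to the current user's ~/.bashrc
  local marker="# cuda env (${cuda_home})"
  if grep -qF "$marker" "$rc_file" 2>/dev/null; then
    return 0                           # already added, nothing to do
  fi
  {
    echo "$marker"
    # \$ keeps the PATH expansions literal so they run when the rc file is sourced
    echo "export PATH=${cuda_home}/bin\${PATH:+:\${PATH}}"
    echo "export LD_LIBRARY_PATH=${cuda_home}/lib64\${LD_LIBRARY_PATH:+:\${LD_LIBRARY_PATH}}"
  } >> "$rc_file"
}

# Usage: add_cuda_env /usr/local/cuda-12 && source ~/.bashrc && nvcc --version
```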
-
Install cuDNN
Follow: cuDNN 9.3.0 Downloads | NVIDIA Developer
Base Installer:
wget https://developer.download.nvidia.com/compute/cudnn/9.3.0/local_installers/cudnn-local-repo-ubuntu2204-9.3.0_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2204-9.3.0_1.0-1_amd64.deb
sudo cp /var/cudnn-local-repo-ubuntu2204-9.3.0/cudnn-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cudnn
To install the package built for a specific CUDA version:
sudo apt-get -y install cudnn-cuda-<CUDA-Version>
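The `<CUDA-Version>` placeholder can be derived from the installed toolkit. A small sketch, assuming (as on NVIDIA's cuDNN download page) that the suffix is the CUDA major version, e.g. `cudnn-cuda-12`, and that `nvcc --version` prints a `release X.Y,` fragment; `cudnn_pkg_from_nvcc` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvcc --version` output on stdin and print the
# matching cuDNN package name, e.g. "cudnn-cuda-12" for "release 12.6,".
cudnn_pkg_from_nvcc() {
  local release major
  # Extract "12.6" from the "release 12.6," fragment (first match only)
  release="$(sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\),.*/\1/p' | head -n 1)"
  [ -n "$release" ] || return 1   # no release line found
  major="${release%%.*}"          # keep only the major version
  printf 'cudnn-cuda-%s\n' "$major"
}

# Usage: sudo apt-get -y install "$(nvcc --version | cudnn_pkg_from_nvcc)"
```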
Install libfreeimage:
sudo apt install libfreeimage-dev
Test cuDNN:
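git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/bandwidthTest
make
./bandwidthTest
(If this directory has no Makefile in your checkout, build the sample as described in the repository README.)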
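To make the pass/fail check scriptable, here is a small sketch that runs a sample, keeps its output in a log, and looks for a pass marker. `run_sample` is a hypothetical name; the markers assume `bandwidthTest` ends with `Result = PASS` and `mnistCUDNN` with `Test passed!`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a CUDA/cuDNN sample, save its output to a log file,
# and succeed only if the log contains a known pass marker.
run_sample() {
  local log="$1"
  shift
  # Capture stderr too, so error messages end up in the log
  "$@" 2>&1 | tee "$log"
  # Assumed markers: bandwidthTest -> "Result = PASS", mnistCUDNN -> "Test passed!"
  grep -qE 'Result = PASS|Test passed!' "$log"
}

# Usage: run_sample bandwidthTest.log ./bandwidthTest && echo "CUDA OK"
```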
If this fails, continue with the steps below.
-
Install DCGM
Follow: NVIDIA DCGM | NVIDIA Developer
sudo apt-get update
sudo apt-get install -y datacenter-gpu-manager
-
Install nvidia-fabricmanager
Follow: fabric-manager-user-guide.pdf (nvidia.com) - Chapter 2.6
version=<your-gpu-Driver Version>
main_version=$(echo $version | awk -F '.' '{print $1}')
sudo apt-get update
sudo apt-get -y install nvidia-fabricmanager-${main_version}=${version}-*
Take the driver version from the nvidia-smi header, for example:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03 Driver Version: 560.35.03 CUDA Version: 12.6 |
|-----------------------------------------+------------------------+----------------------+
Here, version = 560.35.03.
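Instead of copying the number by hand, the driver version can be read with `nvidia-smi --query-gpu=driver_version --format=csv,noheader` and turned into the package spec. A minimal sketch, assuming every GPU reports the same driver version and following the `nvidia-fabricmanager-<major>=<version>-*` pattern of the install command; `fm_package_spec` is a hypothetical name. The full version is pinned because the Fabric Manager version needs to match the installed driver.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the apt package spec for Fabric Manager from a
# driver version string such as 560.35.03.
fm_package_spec() {
  local version="$1"
  # Reject anything that is not a dotted numeric version
  case "$version" in
    ''|*[!0-9.]*) echo "bad driver version: '$version'" >&2; return 1 ;;
  esac
  local main_version="${version%%.*}"   # same as awk -F '.' '{print $1}'
  printf 'nvidia-fabricmanager-%s=%s-*\n' "$main_version" "$version"
}

# Usage (on the GPU host):
#   version="$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)"
#   sudo apt-get -y install "$(fm_package_spec "$version")"
```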
-
Disable nv-hostengine
sudo nv-hostengine -t
-
Start the fabricmanager
sudo service nvidia-fabricmanager start
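On NVSwitch systems, CUDA programs started before Fabric Manager is up typically fail with error 802 (`cudaErrorSystemNotReady`, "system not yet initialized"), so it can help to wait for the service before testing. A minimal retry sketch, assuming systemd manages the service; `wait_until` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry a command once per second until it succeeds
# or the given number of attempts runs out.
wait_until() {
  local attempts="$1"
  shift
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt follows
    if [ "$i" -lt "$attempts" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage: wait_until 30 systemctl is-active --quiet nvidia-fabricmanager && echo "fabric manager is up"
```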
-
Test cuDNN again
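git clone https://github.com/NVIDIA/cuda-samples.git
cd cuda-samples/Samples/1_Utilities/bandwidthTest
make
./bandwidthTest
For comparison, here is the output of the mnistCUDNN sample (from cudnn_samples_v9) on this machine after the steps above: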
root@test-ORACLE-SERVER-E4-2c:~/cudnn_samples_v9/mnistCUDNN# ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 90400 , CUDNN_VERSION from cudnn.h : 90400 (9.4.0)
Host compiler version : GCC 11.4.0
There are 8 CUDA capable devices on your machine :
device 0 : sms 108 Capabilities 8.0, SmClock 1410.0 Mhz, MemSize (Mb) 81155, MemClock 1593.0 Mhz, Ecc=1, boardGroupID=0
device 1 : sms 108 Capabilities 8.0, SmClock 1410.0 Mhz, MemSize (Mb) 81155, MemClock 1593.0 Mhz, Ecc=1, boardGroupID=1
device 2 : sms 108 Capabilities 8.0, SmClock 1410.0 Mhz, MemSize (Mb) 81155, MemClock 1593.0 Mhz, Ecc=1, boardGroupID=2
device 3 : sms 108 Capabilities 8.0, SmClock 1410.0 Mhz, MemSize (Mb) 81155, MemClock 1593.0 Mhz, Ecc=1, boardGroupID=3
device 4 : sms 108 Capabilities 8.0, SmClock 1410.0 Mhz, MemSize (Mb) 81155, MemClock 1593.0 Mhz, Ecc=1, boardGroupID=4
device 5 : sms 108 Capabilities 8.0, SmClock 1410.0 Mhz, MemSize (Mb) 81155, MemClock 1593.0 Mhz, Ecc=1, boardGroupID=5
device 6 : sms 108 Capabilities 8.0, SmClock 1410.0 Mhz, MemSize (Mb) 81155, MemClock 1593.0 Mhz, Ecc=1, boardGroupID=6
device 7 : sms 108 Capabilities 8.0, SmClock 1410.0 Mhz, MemSize (Mb) 81155, MemClock 1593.0 Mhz, Ecc=1, boardGroupID=7
Using device 0
Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.027648 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.036864 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.062464 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.070656 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.091136 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.092160 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 129072 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.055296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.064512 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.090112 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.093184 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.098304 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.189440 time requiring 129072 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.023552 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.026624 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.028672 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.057344 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.057344 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.064512 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 129072 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.053248 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.055296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.063488 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.064512 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.092160 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.102400 time requiring 129072 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Once you see Test passed, congratulations: the GPUs are ready to use!
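Besides the sample passing, it is worth confirming that every GPU is visible (8 on this machine, matching the device list in the mnistCUDNN output). A sketch assuming `nvidia-smi -L` prints one `GPU <n>: ...` line per GPU; `expect_gpu_count` is a hypothetical helper, and lines that do not start with `GPU` are ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi -L` output on stdin and check that it
# lists the expected number of GPUs.
expect_gpu_count() {
  local expected="$1" found
  # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps that 0
  found="$(grep -cE '^GPU [0-9]+:' || true)"
  if [ "$found" -ne "$expected" ]; then
    echo "expected $expected GPUs, found $found" >&2
    return 1
  fi
}

# Usage: nvidia-smi -L | expect_gpu_count 8
```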
Additional notes:
NVLink Topology Command:
nvidia-smi topo -m
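To turn the topology matrix into a quick pass/fail check, here is a sketch that reads `nvidia-smi topo -m` output and lists every GPU pair whose link type is not `NV#` (NVLink). It assumes the layout where the first row lists `GPU0`, `GPU1`, … and each following `GPUn` row gives one link type per GPU in the same order; `check_nvlink_matrix` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi topo -m` output on stdin, print GPU
# pairs that are not connected by NV#, and exit non-zero if any exist.
check_nvlink_matrix() {
  awk '
    # Header: count the GPU columns (GPU0, GPU1, ...) in the first row that has them
    ngpu == 0 {
      for (i = 1; i <= NF; i++) if ($i ~ /^GPU[0-9]+$/) ngpu++
      next
    }
    # GPU rows: fields 2..ngpu+1 hold the link type to each GPU
    $1 ~ /^GPU[0-9]+$/ {
      for (j = 2; j <= ngpu + 1; j++) {
        if ($j != "X" && $j !~ /^NV[0-9]+$/) {
          printf "%s <-> GPU%d: %s\n", $1, j - 2, $j
          bad++
        }
      }
    }
    END { exit (ngpu == 0 || bad > 0) }
  '
}

# Usage: nvidia-smi topo -m | check_nvlink_matrix && echo "all GPU pairs use NVLink"
```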
NVLink Status Command:
nvidia-smi nvlink --status
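The status output can be summarized per GPU in the same way. A sketch that counts active links, assuming each GPU block starts with a `GPU n:` line and each active link prints as `Link k: <speed> GB/s`; lines in any other form (such as inactive links) are not counted. `count_active_links` is a hypothetical name; compare its numbers against the link count you expect for your GPU model.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `nvidia-smi nvlink --status` output on stdin and
# print how many links report a GB/s speed for each GPU.
count_active_links() {
  awk '
    # Start of a GPU block, e.g. "GPU 0: NVIDIA A100-SXM4-80GB (UUID: ...)"
    /^GPU [0-9]+:/ { gpu = $2; sub(/:$/, "", gpu); order[++n] = gpu; active[gpu] = 0; next }
    # Active link line, e.g. "Link 0: 25 GB/s"
    /Link [0-9]+: [0-9.]+ GB\/s/ { active[gpu]++ }
    END { for (i = 1; i <= n; i++) printf "GPU %s: %d active links\n", order[i], active[order[i]] }
  '
}

# Usage: nvidia-smi nvlink --status | count_active_links
```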
More nvidia-smi commands for checking NVLink:
Checking NVIDIA NVLink with the nvidia-smi tool - Docs