Deploying an LLM with Ollama
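This article documents a full deployment: installing the NVIDIA driver, downloading an LLM from Hugging Face, serving it with Ollama, and chatting with it through Open WebUI.
Overall flow: LLM model (Hugging Face) => download the .gguf model => Ollama (backend) => Open WebUI (front-end web UI)
Environment setup: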
- Disable the nouveau driver
    # Add the following two lines to /etc/modprobe.d/blacklist-nouveau.conf:
    #   blacklist nouveau
    #   options nouveau modeset=0
    echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf && echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf
    # Rebuild the kernel initramfs
    sudo update-initramfs -u
    # Reboot
    sudo reboot
    # If this prints nothing, nouveau has been disabled successfully
    lsmod | grep nouveau
    # You can also check whether "nouveau" still appears in the configuration field
    sudo lshw -numeric -C display
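The `lsmod | grep nouveau` check above reports success by printing nothing, which is easy to misread. The following is a minimal sketch that turns it into an explicit verdict. The function name `nouveau_loaded` is hypothetical; it only inspects the `lsmod` listing passed to it on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads `lsmod` output on stdin and succeeds (exit 0)
# only if a module named exactly "nouveau" appears in the first column.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Only run the check when lsmod is available on this machine.
if command -v lsmod >/dev/null 2>&1; then
    if lsmod | nouveau_loaded; then
        echo "nouveau is still loaded"
    else
        echo "nouveau is not loaded"
    fi
fi
```

Matching on the first column avoids false positives from other modules whose dependency list merely mentions nouveau.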
- Install prerequisite packages with apt
    sudo apt-get update
    sudo apt-get install libc-dev -y
    sudo apt-get install linux-headers-$(uname -r) -y
    sudo apt-get install ubuntu-drivers-common
- Install the NVIDIA CUDA Toolkit
    Remove the old NVIDIA driver (the patterns are quoted so the shell does not expand them):
    sudo apt-get --purge remove 'nvidia*'
    sudo apt-get --purge remove 'libnvidia*'
    Base Installer:
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-12-4
    Driver Installer:
    sudo apt-get install -y cuda-drivers
Reference: the official NVIDIA CUDA installation page
Note:
Check the NVIDIA driver version: modinfo nvidia | grep version
- Add nvcc to the environment in ~/.bashrc and verify nvcc
    => nano ~/.bashrc
    # Append at the bottom of the file:
    export PATH="/usr/local/<cuda-version-folder>/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/<cuda-version-folder>/lib64:$LD_LIBRARY_PATH"
    => source ~/.bashrc
    => nvcc --version
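Because ~/.bashrc may be sourced more than once in a session, plain `export PATH=...` lines can pile up duplicate entries. The following is a minimal sketch of a guard against that. The helper name `path_prepend` is hypothetical, and the directory `/usr/local/cuda-12.4` is an assumption based on the `cuda-toolkit-12-4` package installed earlier; substitute your actual `<cuda-version-folder>`.

```shell
#!/bin/sh
# Hypothetical helper: print $2 (a colon-separated PATH-like list) with
# directory $1 prepended, unless $1 is already one of its entries.
path_prepend() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Assumption: the toolkit lives here; replace with your <cuda-version-folder>.
CUDA_DIR="/usr/local/cuda-12.4"

PATH="$(path_prepend "$CUDA_DIR/bin" "$PATH")"
LD_LIBRARY_PATH="$(path_prepend "$CUDA_DIR/lib64" "${LD_LIBRARY_PATH:-}")"
export PATH LD_LIBRARY_PATH
```

Placed at the bottom of ~/.bashrc, this has the same effect as the two export lines, but sourcing the file repeatedly leaves both variables unchanged after the first time.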
- Install cuDNN
    wget https://developer.download.nvidia.com/compute/cudnn/9.1.0/local_installers/cudnn-local-repo-ubuntu2204-9.1.0_1.0-1_amd64.deb
    sudo dpkg -i cudnn-local-repo-ubuntu2204-9.1.0_1.0-1_amd64.deb
    sudo cp /var/cudnn-local-repo-ubuntu2204-9.1.0/cudnn-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cudnn
    =>
    Get:1 file:/var/cudnn-local-repo-ubuntu2204-9.1.0 libcudnn9-cuda-12 9.1.0.70-1 [439 MB]
    Get:2 file:/var/cudnn-local-repo-ubuntu2204-9.1.0 libcudnn9-dev-cuda-12 9.1.0.70-1 [34.1 kB]
    Get:3 file:/var/cudnn-local-repo-ubuntu2204-9.1.0 libcudnn9-static-cuda-12 9.1.0.70-1 [436 MB]
    Get:4 file:/var/cudnn-local-repo-ubuntu2204-9.1.0 cudnn9-cuda-12-4 9.1.0.70-1 [12.3 kB]
    Get:5 file:/var/cudnn-local-repo-ubuntu2204-9.1.0 cudnn9-cuda-12 9.1.0.70-1 [12.3 kB]
    Get:6 file:/var/cudnn-local-repo-ubuntu2204-9.1.0 libcudnn9-samples 9.1.0.70-1 [1670 kB]
    Get:7 file:/var/cudnn-local-repo-ubuntu2204-9.1.0 cudnn9 9.1.0-1 [2442 B]
    Get:8 file:/var/cudnn-local-repo-ubuntu2204-9.1.0 cudnn 9.1.0-1 [2414 B]
    Selecting previously unselected package libcudnn9-cuda-12.
    (Reading database ... 216732 files and directories currently installed.)
    Preparing to unpack .../0-libcudnn9-cuda-12_9.1.0.70-1_amd64.deb ...
    Unpacking libcudnn9-cuda-12 (9.1.0.70-1) ...
Reference: the official NVIDIA cuDNN installation page
Check cuDNN:
- Check the version:
    cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
    or
    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
    (With an apt-based install, the header may be located under /usr/include instead.)
- Check that cuDNN is usable:
    sudo apt install libfreeimage3 libfreeimage-dev
    cp -r /usr/src/cudnn_samples_v9/ /home/cuDNN-test/
    cd /home/cuDNN-test/cudnn_samples_v9/mnistCUDNN
    make clean && make
    ./mnistCUDNN
    =>
    Resulting weights from Softmax:
    0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
    Result of classification: 1 3 5
    Test passed!
- Install Miniconda
    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm -rf ~/miniconda3/miniconda.sh
    ~/miniconda3/bin/conda init bash
    # Open a new shell (or run: source ~/.bashrc) so that the conda command is available
    conda create --name llama_py python=3.10
Reference: the official Miniconda page
- Install PyTorch
    pip3 install torch torchvision torchaudio
Ref: the official PyTorch page
- Try running Llama 2
    git clone https://github.com/meta-llama/llama.git
    cd llama
    pip install -e .
    # Download the model (e.g. llama-2-7b-chat)
    torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir llama-2-7b-chat/ --tokenizer_path tokenizer.model --max_seq_len 512 --max_batch_size 6
- Run Ollama and Open WebUI with Docker
In the LLM folder, create docker-compose.yaml.
docker-compose.yaml:
    version: '3.8'
    services:
      ollama:
        image: ollama/ollama:latest
        ports:
          - 11434:11434
        runtime: nvidia
        environment:
          NVIDIA_VISIBLE_DEVICES: all
        volumes:
          - .:/code
          - ./ollama/ollama:/root/.ollama
        container_name: ollama
        pull_policy: always
        tty: true
        restart: always
      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        container_name: open-webui
        volumes:
          - ./ollama/open-webui:/app/backend/data
        depends_on:
          - ollama
        ports:
          - 8080:8080
        environment:
          # Address of the Ollama service on the compose network
          - 'OLLAMA_BASE_URL=http://ollama:11434'
        extra_hosts:
          - host.docker.internal:host-gateway
        restart: unless-stopped
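Once the stack has been started with `docker compose up -d`, it can take a while before both services answer. The following is a minimal sketch that polls the two ports published in docker-compose.yaml. The helper name `wait_for_http` and its retry defaults are hypothetical, and it assumes `curl` is installed on the host.

```shell
#!/bin/sh
# Hypothetical helper: poll URL $1 up to $2 times (default 30), sleeping
# $3 seconds (default 2) between attempts. Succeeds as soon as the server
# answers with a non-error HTTP status.
wait_for_http() {
    url=$1
    tries=${2:-30}
    delay=${3:-2}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage, with the ports from docker-compose.yaml:
#   wait_for_http http://localhost:11434/ && echo "Ollama is up"
#   wait_for_http http://localhost:8080/  && echo "Open WebUI is up"
#   curl -s http://localhost:11434/api/tags   # lists the models Ollama has
```

The `/api/tags` endpoint is handy later on: after a model has been created it should show up in that listing.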
- Create an Ollama Modelfile so that Ollama can use the model
Create ./Modelfile
Modelfile: (see the Ollama GitHub repository for other detailed LLM settings)
    FROM ./<model_name>.gguf
Then start the containers and create the model inside the ollama container:
    docker compose up -d
    docker exec -it ollama /bin/bash
    cd code
    ollama create <Ollama_Show_Model_Name> -f Modelfile
Example output:
    root@454866b8147a:/code# ollama create llama3-8B-chat -f ./Modelfile
    transferring model data
    creating model layer
    using already created layer sha256:ce22d8a49a949089fd2b50a4c19fd043b8480da951d9ace3aa50446d64d4468c
    writing layer sha256:6e8dc213cf73dab521788f5a7e506d202db50b73d104d7d1bbc343089dfd1e8a
    writing manifest
    success
    root@454866b8147a:/code# ls
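A Modelfile can carry more than the `FROM` line. The following is a minimal sketch that generates one with a few common instructions (`PARAMETER` and `SYSTEM`). The helper name `write_modelfile`, the parameter values, and the system prompt are illustrative assumptions rather than settings from this deployment.

```shell
#!/bin/sh
# Hypothetical helper: write a Modelfile for the GGUF file $1 to path $2.
# The parameter values and system prompt are examples, not recommendations.
write_modelfile() {
    cat > "$2" <<EOF
FROM ./$1
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
SYSTEM """You are a helpful assistant."""
EOF
}

# Usage, inside the folder that is mounted at /code:
#   write_modelfile <model_name>.gguf Modelfile
#   ollama create <Ollama_Show_Model_Name> -f Modelfile
```

Generating the file from a function makes it easy to build several model variants from the same .gguf file with different parameters.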
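- Open the web UI in a browser (http://localhost:8080, the port published in docker-compose.yaml) and start using the model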
Additional notes:
Docker install:
- Uninstall old Docker packages
    => for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
- Add the Docker apt repository
    # Add Docker's official GPG key:
    sudo apt-get update
    sudo apt-get install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    # Add the repository to Apt sources:
    echo \
      "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
      $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
      sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get update
- Install Docker
    Latest version:
    => sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    Specific version:
    # List the available versions:
    => apt-cache madison docker-ce | awk '{ print $3 }'
    5:26.1.0-1~ubuntu.24.04~noble
    5:26.0.2-1~ubuntu.24.04~noble
    ...
    => VERSION_STRING=5:26.1.0-1~ubuntu.24.04~noble
    => sudo apt-get install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io docker-buildx-plugin docker-compose-plugin
- Verify Docker
    => sudo docker run hello-world
Ref: the official Docker page
NVIDIA Container Toolkit:
- Install the NVIDIA Container Toolkit
    => curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
         && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
           sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
           sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    => sudo apt-get update
    => sudo apt-get install -y nvidia-container-toolkit
    # Configure NVIDIA Container Toolkit
    => sudo nvidia-ctk runtime configure --runtime=docker
    => sudo systemctl restart docker
    # Test GPU integration
    => docker run --gpus all nvidia/cuda:11.5.2-base-ubuntu20.04 nvidia-smi
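If the GPU test container fails, one thing worth checking is whether `nvidia-ctk runtime configure` actually registered the runtime in Docker's daemon configuration. The following is a minimal sketch of that check. The helper name `has_nvidia_runtime` is hypothetical, it assumes the default config path `/etc/docker/daemon.json`, and it does a plain text match rather than parsing the JSON.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the Docker daemon config file $1
# (default: /etc/docker/daemon.json) mentions an "nvidia" entry.
# This is a simple text match, not a full JSON validation.
has_nvidia_runtime() {
    cfg=${1:-/etc/docker/daemon.json}
    [ -r "$cfg" ] && grep -q '"nvidia"' "$cfg"
}

if has_nvidia_runtime; then
    echo "nvidia runtime is registered with Docker"
else
    echo "nvidia runtime not found; re-run: sudo nvidia-ctk runtime configure --runtime=docker"
fi
```

Remember that Docker only picks up changes to this file after `sudo systemctl restart docker`.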