This article documents the full deployment process: installing the NVIDIA driver, downloading an LLM model from HuggingFace, serving it with Ollama, and chatting with it through Open WebUI.

Overall flow: LLM model (HuggingFace) => download .gguf model => Ollama (backend runtime) => Open WebUI (frontend web)

Environment setup:

  1. Disable the nouveau driver

    # Add the two lines below to /etc/modprobe.d/blacklist-nouveau.conf
    # blacklist nouveau
    # options nouveau modeset=0
    echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf && echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf
    
    # Update the kernel initramfs
    sudo update-initramfs -u
    
    # Reboot
    reboot
    
    # If this command prints no output, nouveau was disabled successfully
    lsmod | grep nouveau
    
    # You can also check that 'nouveau' no longer appears in the display configuration
    sudo lshw -numeric -C display
    

    Ref: How to disable Nouveau kernel driver - askubuntu

  2. Install prerequisite packages with apt

    sudo apt-get update
    sudo apt-get install libc-dev -y
    sudo apt-get install linux-headers-$(uname -r) -y
    sudo apt-get install ubuntu-drivers-common
    
  3. Install the NVIDIA CUDA Toolkit

    Remove any old NVIDIA drivers first:
    sudo apt-get --purge remove "nvidia*"
    sudo apt-get --purge remove "libnvidia*"
    
    Base Installer:
    wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
    sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
    wget https://developer.download.nvidia.com/compute/cuda/12.4.1/local_installers/cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
    sudo dpkg -i cuda-repo-ubuntu2204-12-4-local_12.4.1-550.54.15-1_amd64.deb
    sudo cp /var/cuda-repo-ubuntu2204-12-4-local/cuda-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cuda-toolkit-12-4
    
    Driver Installer:
    sudo apt-get install -y cuda-drivers
    

    Ref: NVIDIA CUDA official installation page

    Notes:

    Check the NVIDIA driver version:
    modinfo nvidia | grep version
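
    nvidia-smi gives a fuller picture: the driver version, the highest CUDA version the driver supports, and every detected GPU (a quick sanity check, assuming the driver installed above has loaded):

    # Should list the GPU together with the driver and CUDA versions
    nvidia-smi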
    
  4. Set the nvcc environment variables in ~/.bashrc and verify nvcc
    => nano ~/.bashrc

    # Append at the bottom (e.g. /usr/local/cuda-12.4 for the toolkit version installed above)
    export PATH="/usr/local/<cuda-version-folder>/bin:$PATH"
    export LD_LIBRARY_PATH="/usr/local/<cuda-version-folder>/lib64:$LD_LIBRARY_PATH"
    

    => source ~/.bashrc => nvcc --version

  5. Install cuDNN

    wget https://developer.download.nvidia.com/compute/cudnn/9.1.0/local_installers/cudnn-local-repo-ubuntu2204-9.1.0_1.0-1_amd64.deb
    sudo dpkg -i cudnn-local-repo-ubuntu2204-9.1.0_1.0-1_amd64.deb
    sudo cp /var/cudnn-local-repo-ubuntu2204-9.1.0/cudnn-*-keyring.gpg /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get -y install cudnn
    =>
    Get:1 file:/var/cudnn-local-repo-ubuntu2204-9.1.0  libcudnn9-cuda-12 9.1.0.70-1 [439 MB]
    Get:2 file:/var/cudnn-local-repo-ubuntu2204-9.1.0  libcudnn9-dev-cuda-12 9.1.0.70-1 [34.1 kB]
    Get:3 file:/var/cudnn-local-repo-ubuntu2204-9.1.0  libcudnn9-static-cuda-12 9.1.0.70-1 [436 MB]
    Get:4 file:/var/cudnn-local-repo-ubuntu2204-9.1.0  cudnn9-cuda-12-4 9.1.0.70-1 [12.3 kB]
    Get:5 file:/var/cudnn-local-repo-ubuntu2204-9.1.0  cudnn9-cuda-12 9.1.0.70-1 [12.3 kB]
    Get:6 file:/var/cudnn-local-repo-ubuntu2204-9.1.0  libcudnn9-samples 9.1.0.70-1 [1670 kB]
    Get:7 file:/var/cudnn-local-repo-ubuntu2204-9.1.0  cudnn9 9.1.0-1 [2442 B]
    Get:8 file:/var/cudnn-local-repo-ubuntu2204-9.1.0  cudnn 9.1.0-1 [2414 B]
    Selecting previously unselected package libcudnn9-cuda-12.
    (Reading database ... 216732 files and directories currently installed.)
    Preparing to unpack .../0-libcudnn9-cuda-12_9.1.0.70-1_amd64.deb ...
    Unpacking libcudnn9-cuda-12 (9.1.0.70-1) ...
    

    Ref: NVIDIA cuDNN official installation page

    Check cuDNN:

    • Check the cuDNN version:
      cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
          or
      cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
      
    • Verify cuDNN works by building the bundled samples:
      sudo apt install libfreeimage3 libfreeimage-dev
      cp -r /usr/src/cudnn_samples_v9/ /home/cuDNN-test/
      cd /home/cuDNN-test/cudnn_samples_v9/mnistCUDNN
      make clean && make
      ./mnistCUDNN
      =>
      Resulting weights from Softmax:
      0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
      
      Result of classification: 1 3 5
      
      Test passed!
      
  6. Install Miniconda

    mkdir -p ~/miniconda3
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
    bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
    rm -rf ~/miniconda3/miniconda.sh
    ~/miniconda3/bin/conda init bash
    conda create --name llama_py python=3.10
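
    Reload the shell and activate the environment before installing Python packages (a minimal sketch; llama_py is the env created above):

    source ~/.bashrc          # reload so conda is on PATH
    conda activate llama_py
    python --version          # expect Python 3.10.x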
    

    Ref: Miniconda official page

  7. Install PyTorch

    pip3 install torch torchvision torchaudio
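
    A quick check that PyTorch actually sees the GPU (a minimal sketch, assuming the CUDA setup above succeeded):

    # Expect "True" followed by the GPU model name
    python3 -c "import torch; print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"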
    

    Ref: PyTorch official page

  8. Try running Llama 2

    git clone https://github.com/meta-llama/llama.git
    cd llama
    pip install -e .
    # download the model weights (e.g. llama-2-7b-chat) per the repo's instructions
    torchrun --nproc_per_node 1 example_chat_completion.py \
        --ckpt_dir llama-2-7b-chat/ \
        --tokenizer_path tokenizer.model \
        --max_seq_len 512 --max_batch_size 6
    

    Ref: Llama GitHub page

  9. Run Ollama & Open WebUI with Docker Compose

    In the LLM working folder, create a docker-compose.yaml:
    

    docker-compose.yaml:

    version: '3.8'
    services:
      ollama:
        image: ollama/ollama:latest
        ports:
          - 11434:11434
        runtime: nvidia
        environment:
          NVIDIA_VISIBLE_DEVICES: all
        volumes:
          - .:/code
          - ./ollama/ollama:/root/.ollama
        container_name: ollama
        pull_policy: always
        tty: true
        restart: always

      open-webui:
        image: ghcr.io/open-webui/open-webui:main
        container_name: open-webui
        volumes:
          - ./ollama/open-webui:/app/backend/data
        depends_on:
          - ollama
        ports:
          - 8080:8080
        environment:
          - 'OLLAMA_BASE_URL=http://ollama:11434'
        extra_hosts:
          - host.docker.internal:host-gateway
        restart: unless-stopped
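
    After bringing the stack up (docker compose up -d, see the next step), it is worth confirming the ollama container actually sees the GPU (assuming the NVIDIA Container Toolkit from the appendix below is configured):

    # Should print the same GPU table as nvidia-smi on the host
    docker exec -it ollama nvidia-smi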
    
  10. Create an Ollama Modelfile so Ollama can load the model

    create ./Modelfile
    

    Modelfile: (other model parameters can be set here; see the Ollama GitHub repo for details)

    FROM ./<model_name>.gguf
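
    The .gguf weights referenced by the Modelfile must already be in this folder (mounted into the container at /code) before building. A hedged example of fetching one from HuggingFace; <repo_id> and <file> are placeholders for whichever model you picked, assuming the huggingface_hub CLI is installed:

    pip install -U huggingface_hub
    # download <file>.gguf from the HuggingFace repo <repo_id> into the current folder
    huggingface-cli download <repo_id> <file>.gguf --local-dir .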
    
    docker compose up -d
    docker exec -it ollama /bin/bash
    cd code
    ollama create <Ollama_Show_Model_Name> -f Modelfile
    
    root@454866b8147a:/code# ollama create llama3-8B-chat -f ./Modelfile
    transferring model data
    creating model layer
    using already created layer sha256:ce22d8a49a949089fd2b50a4c19fd043b8480da951d9ace3aa50446d64d4468c
    writing layer sha256:6e8dc213cf73dab521788f5a7e506d202db50b73d104d7d1bbc343089dfd1e8a
    writing manifest
    success
    root@454866b8147a:/code# ls
    

    Ref: Ollama GitHub Page - import

  11. Open the web UI and start chatting
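
    A quick sanity check from the host before opening the browser (ports follow the compose file above):

    # Ollama API; should answer "Ollama is running"
    curl http://localhost:11434
    # Open WebUI: browse to http://localhost:8080, create the first account,
    # then select the model created in step 10 and start chatting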

Appendix:

Docker install:

  • Uninstall old Docker packages

    => for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
    
  • Set up Docker's apt repository

    # Add Docker's official GPG key:
    sudo apt-get update
    sudo apt-get install ca-certificates curl
    sudo install -m 0755 -d /etc/apt/keyrings
    sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
    sudo chmod a+r /etc/apt/keyrings/docker.asc
    
    # Add the repository to Apt sources:
    echo \
    "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
    $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
    sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
    sudo apt-get update
    
  • Install Docker

    Latest version:
    => sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    
    Specific version:
    
    # List the available versions:
    => apt-cache madison docker-ce | awk '{ print $3 }'
    
    5:26.1.0-1~ubuntu.24.04~noble
    5:26.0.2-1~ubuntu.24.04~noble
    ...
    
    => VERSION_STRING=5:26.1.0-1~ubuntu.24.04~noble
    => sudo apt-get install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io docker-buildx-plugin docker-compose-plugin
    
  • Verify Docker

    => sudo docker run hello-world
    

    Ref: Docker official page

Nvidia container toolkit:

  • Install the NVIDIA Container Toolkit

    => curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
    && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
        sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
        sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
    => sudo apt-get update
    => sudo apt-get install -y nvidia-container-toolkit
    
    # Configure NVIDIA Container Toolkit
    => sudo nvidia-ctk runtime configure --runtime=docker
    => sudo systemctl restart docker
    
    # Test GPU integration
    => docker run --gpus all nvidia/cuda:11.5.2-base-ubuntu20.04 nvidia-smi
    

    Ref:

    Nvidia container-toolkit

    傻瓜 LLM 架設 - Ollama + Open WebUI 之 Docker Compose 懶人包