Offline Private Deployment of DeepSeek-R1 on a Domestic Chinese Platform: GPU Driver + CUDA + Ollama + Open WebUI

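1. Overview

Most tutorials online cover online deployment, and few describe a fully offline private deployment. This article walks through a completely offline installation of the GPU driver, CUDA, Ollama, the DeepSeek model, and Open WebUI, so that even a beginner with no background can deploy the DeepSeek-R1 large model privately.

My machine runs the Kylin V10 (银河麒麟) operating system on a Hygon (海光) CPU, with 128 GB of RAM and three NVIDIA T4 cards (16 GB of GPU memory each). Note that Hygon's domestic C86 architecture is derived from x86, so in practice it behaves the same as x86 CPUs from Intel and AMD.

I deployed three models: deepseek-R1-32B-Q4, 32B-Q8, and 70B-Q4. Q4 means the weights are quantized to 4-bit integers; the larger the quantization number, the higher the precision and the larger the model, and generally the better the output quality. In my own test, deepseek 32B-Q4 on a single T4 ran sluggishly, so more than 16 GB of GPU memory is preferable.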
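As a rough back-of-the-envelope check on these sizes, weight storage is approximately the parameter count times the bits per weight, divided by 8. The sketch below uses a hypothetical helper, estimate_weights_gb, and deliberately ignores the KV cache, runtime overhead, and the extra per-block scale data that real quantized files carry, so actual files are somewhat larger.

```shell
# Rough estimate of weight size in GB (decimal): params (billions) * bits / 8.
# estimate_weights_gb is a hypothetical helper, not part of any tool used here.
estimate_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 32 4   # 32B at 4 bits -> 16.0
estimate_weights_gb 32 8   # 32B at 8 bits -> 32.0
estimate_weights_gb 70 4   # 70B at 4 bits -> 35.0
```

By this estimate the 32B model at 4 bits already needs about 16 GB for the weights alone, which fits the observation that a single 16 GB T4 struggles with it.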

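2. GPU driver installation

After installing the graphics card, run lspci | grep NVIDIA to check whether the GPU is detected; lspci lists all devices connected to the PCI bus.

The operating system still needs a driver before it can use the card. If the nvidia-smi command is not found or prints nothing, the driver is not installed.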
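The two checks above can be combined into one small script. This is only a sketch: count_nvidia_devices is a hypothetical helper name, and the live checks are skipped when lspci or nvidia-smi is not present.

```shell
# Count lines mentioning NVIDIA in lspci-style text read from stdin.
# count_nvidia_devices is a hypothetical helper name.
count_nvidia_devices() {
  grep -ci 'nvidia' || true
}

# Live checks, skipped when the tools are absent.
if command -v lspci >/dev/null 2>&1; then
  echo "NVIDIA devices on the PCI bus: $(lspci | count_nvidia_devices)"
fi
if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi >/dev/null 2>&1; then
  echo "driver is installed and responding"
else
  echo "driver is not installed or not loaded"
fi
```

A non-zero device count together with the "not installed" message is the situation described here: the hardware is visible, but the driver is still missing.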

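Download the driver from the official NVIDIA driver download page.

There is a pitfall here: when choosing the operating system on the download page, do not pick "Linux 64-bit KylinOS 10" for Kylin. With that build, nvidia-smi still does not work after installation.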

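[root@localhost ~]# sudo modprobe nvidia             # manually load the driver kernel module
modprobe: FATAL: Module nvidia not found in directory /lib/modules/4.19.90-89.11.v2401.ky10.x86_64
This error means that no NVIDIA kernel module exists for the running kernel. Likely causes are that the driver was not built or installed correctly, or that the driver version is incompatible with the current kernel.

Choose the plain "Linux 64-bit" build instead.

Download and run the driver file NVIDIA-Linux-x86_64-570.86.15.run: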


chmod 755 NVIDIA-Linux-x86_64-570.86.15.run
sudo ./NVIDIA-Linux-x86_64-570.86.15.run

Prompts shown during installation, with the choices made:

1. Multiple kernel module types are available for this system. Which would you like to use?
MIT/GPL    NVIDIA Proprietary

2. An alternate method of installing the NVIDIA driver was detected. (This is usually a package provided by your distributor.) A driver installed via that method may integrate better with your system than a driver installed by nvidia-installer.

3. Please review the message provided by the maintainer of this alternate installation method and decide how to proceed:
Continue installation    Abort installation

4. The NVIDIA driver provided by Ubuntu can be installed by launching the "Software & Updates" application, and by selecting the NVIDIA driver from the "Additional Drivers" tab.
Continue installation

5. Would you like to register the kernel module source with DKMS? This will allow DKMS....
Yes

6. Install NVIDIA's 32-bit compatibility libraries?
No

7. Would you like to run the nvidia-xconfig utility to automatically update your X configuration file so that the NVIDIA X driver will be used when you restart X? Any pre-existing X configuration file will be backed up.
Yes

If the following message appears, the installation succeeded. Reboot the operating system afterwards.

Your X configuration file has been successfully updated. Installation of the NVIDIA Accelerated Graphics Driver for Linux-x86_64 (version:570.86.15) is now complete.

After installation, the nvidia-smi command shows the GPU information.

lshw -c video | grep configuration # check whether the video driver in use is nvidia
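To see at a glance how much GPU memory is available across all cards, nvidia-smi's CSV query output can be summed. This is a sketch only; total_vram_mib is a hypothetical helper name, and the live call is skipped when nvidia-smi is absent.

```shell
# Sum per-GPU memory (MiB) from CSV lines of the form "name, memory" on stdin.
# total_vram_mib is a hypothetical helper name.
total_vram_mib() {
  awk -F', *' '{ sum += $2 } END { print sum + 0 }'
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits | total_vram_mib
fi
```

The total helps when deciding which model size and quantization level to try.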

3. Install CUDA

GPUs use CUDA and cuDNN to accelerate computation, so install both.

Run nvidia-smi to see the highest CUDA version the driver supports.
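The supported version appears in the header of the nvidia-smi output as "CUDA Version: ...". A minimal sketch for pulling out just that field follows; max_cuda_version is a hypothetical helper name, and the header layout is assumed to match what recent drivers print.

```shell
# Extract the "CUDA Version" field from nvidia-smi's header text on stdin.
# max_cuda_version is a hypothetical helper name.
max_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi 2>/dev/null | max_cuda_version
fi
```

Pick a CUDA toolkit whose version is no higher than the value printed.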

Go to NVIDIA's CUDA archive at https://developer.nvidia.com/cuda-toolkit-archive and download the matching CUDA version.

For the operating system, this time choose Kylin V10.

$ sudo sh cuda_12.8.0_570.86.10_linux.run

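Select "accept".

My server already has the NVIDIA driver installed, so it does not need to be installed again. If you have not installed the driver yet, you can install it in the same step. Press the space bar to deselect Driver, then choose Install.

After a successful installation, CUDA is located in /usr/local/cuda-12.8.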

Next, configure the environment variables. Open your shell profile from the Linux command line:

vim ~/.bashrc

Append the following at the end:

export PATH=/usr/local/cuda-12.8/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.8/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda-12.8
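The ${VAR:+:${VAR}} form in these exports appends the separator and the old value only when the variable is already set and non-empty, which avoids a stray trailing colon (an empty entry in a search path is treated as the current directory). The following demonstration uses a hypothetical helper, prepend_dir, purely to show the expansion:

```shell
# Show how ${VAR:+:${VAR}} behaves: the ":" separator and the old value are
# appended only when the old value is non-empty.
# prepend_dir is a hypothetical helper used only for this demonstration.
prepend_dir() {
  new_dir="$1"
  old_value="$2"
  echo "${new_dir}${old_value:+:${old_value}}"
}

prepend_dir /usr/local/cuda-12.8/lib64 ""          # no trailing colon
prepend_dir /usr/local/cuda-12.8/lib64 /opt/lib    # joined with ":"
```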

Apply the environment variables:

source ~/.bashrc
sudo ldconfig

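If nvcc -V prints the CUDA version number, the installation succeeded.

4. Install cuDNN

Install the cuDNN version that matches your CUDA version. Download it from the cuDNN Archive page on the NVIDIA Developer site.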

tar -xvf cudnn-linux-x86_64-8.9.7.29_cuda12-archive.tar.xz
cd cudnn-linux-x86_64-8.9.7.29_cuda12-archive/

sudo cp -P lib/* /usr/local/cuda-12.8/lib64/ # -P keeps the libcudnn symlinks as symlinks instead of duplicating the files
sudo cp include/* /usr/local/cuda-12.8/include/
sudo chmod a+r /usr/local/cuda-12.8/lib64/*
sudo chmod a+r /usr/local/cuda-12.8/include/*

Verify that cuDNN is installed:

cat /usr/local/cuda-12.8/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
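The grep above prints the three raw #define lines. If a single version string is more convenient, the same header can be parsed as in the sketch below; cudnn_version is a hypothetical helper name, and the header path is the one used in this article.

```shell
# Print MAJOR.MINOR.PATCH from a cudnn_version.h-style header.
# cudnn_version is a hypothetical helper name.
cudnn_version() {
  awk '$1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
       $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
       $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
       END { print major "." minor "." patch }' "$1"
}

header=/usr/local/cuda-12.8/include/cudnn_version.h
if [ -f "$header" ]; then
  cudnn_version "$header"
fi
```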

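5. Install Ollama

Ollama is an open-source tool for serving large language models that makes it quick to run them locally. Download the Linux package on a machine with internet access:

https://ollama.com/download/ollama-linux-amd64.tgz

sudo tar -C /usr -xzf ollama-linux-amd64.tgz # extract and install

Add a systemd service file by creating /etc/systemd/system/ollama.service: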

[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"
Environment="OLLAMA_MODELS=/opt/llm_work/deepseek-r1/models"
Environment="OLLAMA_HOST=0.0.0.0"

[Install]
WantedBy=default.target

The file above sets the OLLAMA_MODELS location; the model directory also needs permissions that let the service access it:

chmod +777 /opt/llm_work/deepseek-r1/models
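The unit file runs the service as User=ollama and Group=ollama, so that account has to exist before the service starts. The sketch below follows the useradd command from Ollama's manual Linux installation instructions; handing the directory to that user with chown is my own assumption, shown as a narrower alternative to world-writable permissions. The run wrapper is a hypothetical dry-run helper: it only prints the commands unless DRY_RUN=0 is set.

```shell
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Create the ollama system user if it is missing, then give it the model
# directory. setup_ollama_user is a hypothetical helper name.
setup_ollama_user() {
  models_dir="$1"
  if ! id ollama >/dev/null 2>&1; then
    run sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
  fi
  run sudo mkdir -p "$models_dir"
  run sudo chown -R ollama:ollama "$models_dir"
}

setup_ollama_user /opt/llm_work/deepseek-r1/models
```

Review the printed commands first, then rerun with DRY_RUN=0 to execute them.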

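If the service must be reachable from other machines, remember to open port 11434 in the firewall.

sudo systemctl daemon-reload # reload the service configuration
sudo systemctl enable ollama # start at boot
sudo systemctl start ollama # start ollama
sudo systemctl status ollama # check ollama's status

ollama -v # show the ollama version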

[root@localhost ~]# ollama -v
ollama version is 0.5.11
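The running server can also be checked over HTTP, which confirms that the service is listening on port 11434 and not just that the binary is installed. A minimal sketch, assuming the server exposes Ollama's /api/version endpoint on localhost; ollama_version_from_json is a hypothetical helper, and sed is used only because jq may not be available on an offline machine.

```shell
# Pull the "version" value out of a small JSON object read from stdin.
# ollama_version_from_json is a hypothetical helper name.
ollama_version_from_json() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Live check; prints nothing if the server is not reachable.
if command -v curl >/dev/null 2>&1; then
  curl -fsS --max-time 3 http://localhost:11434/api/version 2>/dev/null \
    | ollama_version_from_json || true
fi
```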

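6. Install the DeepSeek model offline

Download the model's GGUF file from https://huggingface.co/models or the ModelScope community (魔搭社区). Because GGUF packs the model into a single file, it can be placed directly in the Ollama model directory and run from there.

With 16 GB of GPU memory you can use the 32B Q4 quantized model; it runs, but slowly. Choose a model size that suits your GPU memory.

In the model directory /opt/llm_work/deepseek-r1/models/blobs (Ollama creates blobs automatically at runtime; you can also create it manually), create a file named Modelfile next to the downloaded GGUF file.

The Modelfile content follows Ollama's official template for this model: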

FROM ./DeepSeek-R1-Distill-Qwen-32B-Q4_0.gguf

TEMPLATE """{{- if .System }}{{ .System }}{{ end }}
{{- range $i, $_ := .Messages }}
{{- $last := eq (len (slice $.Messages $i)) 1}}
{{- if eq .Role "user" }}<｜User｜>{{ .Content }}
{{- else if eq .Role "assistant" }}<｜Assistant｜>{{ .Content }}{{- if not $last }}<｜end▁of▁sentence｜>{{- end }}
{{- end }}
{{- if and $last (ne .Role "assistant") }}<｜Assistant｜>{{- end }}
{{- end }}
"""

PARAMETER stop <｜begin▁of▁sentence｜>
PARAMETER stop <｜end▁of▁sentence｜>
PARAMETER stop <｜User｜>
PARAMETER stop <｜Assistant｜>

The most important line is the first one: FROM followed by your model file.
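Since a wrong FROM path is the most likely mistake, it can be worth checking it before creating the model. This is a sketch only: check_modelfile_from is a hypothetical helper, and it assumes that a relative FROM path is resolved against the directory containing the Modelfile. It does not apply when FROM names an existing Ollama model instead of a file.

```shell
# Check that the file named on the Modelfile's FROM line exists.
# Relative paths are resolved against the Modelfile's own directory (assumed).
# check_modelfile_from is a hypothetical helper name.
check_modelfile_from() {
  modelfile="$1"
  from_path="$(awk '$1 == "FROM" { print $2; exit }' "$modelfile")"
  [ -n "$from_path" ] || { echo "no FROM line in $modelfile" >&2; return 1; }
  case "$from_path" in
    /*) resolved="$from_path" ;;
    *)  resolved="$(dirname "$modelfile")/$from_path" ;;
  esac
  if [ -f "$resolved" ]; then
    echo "ok: $resolved"
  else
    echo "missing: $resolved" >&2
    return 1
  fi
}

# Typical use, run from the directory that holds the Modelfile:
#   check_modelfile_from ./Modelfile
```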

ollama create model-name -f Modelfile # model-name is a name of your choice

Once it has been created, check that the model is registered in Ollama:

ollama list

Run ollama run model-name to start the model and chat with DeepSeek on the command line; press Ctrl+d to leave the conversation.

You can also call the model through the API:

curl http://localhost:11434/api/generate -d "{\"model\": \"deepseek-r1-32B-Q4:latest\",\"prompt\": \"Who are you?\",\"stream\":false}"

"stream":true streams the response as it is generated; false returns the whole response at once after generation finishes.

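7. Install Open WebUI offline

With DeepSeek alone, the model can only be used from the command line; Open WebUI adds a convenient web interface. Open WebUI is deployed with Docker, so the image has to be pulled in a networked environment first and then imported into the offline environment.

Official Open WebUI installation documentation: https://github.com/open-webui/open-webui

(1) Install Docker offline

Download the package from https://download.docker.com/linux/static/stable/x86_64/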

tar -zxvf docker-24.0.6.tgz

Copy the extracted docker binaries into /usr/bin:

sudo cp docker/* /usr/bin/

Register Docker as a service:

vim /etc/systemd/system/docker.service

[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target

[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s

[Install]
WantedBy=multi-user.target

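chmod +x /etc/systemd/system/docker.service # set permissions

systemctl daemon-reload # reload the unit files

systemctl enable docker.service # start at boot

systemctl start docker # start Docker

Configure registry mirrors hosted in China. These mirrors may be unreliable, in which case the image has to be downloaded through a proxy.

vi /etc/docker/daemon.json

# Contents: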
{
  "registry-mirrors": [
    "https://xx4bwyg2.mirror.aliyuncs.com",
    "http://f1361db2.m.daocloud.io",
    "https://registry.docker-cn.com",
    "http://hub-mirror.c.163.com",
    "https://docker.mirrors.ustc.edu.cn"
  ]
}

# Save and exit
:wq

# Apply the configuration
systemctl daemon-reload

# Restart Docker
systemctl restart docker

(2) Download the image

Pull the image in a networked environment:

docker pull ghcr.io/open-webui/open-webui:main

docker images # list local images

Export the downloaded image to open-webui.tar, then copy it into the offline environment:

docker save -o open-webui.tar ghcr.io/open-webui/open-webui:main # image name and tag as listed by docker images
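Because the archive is large and has to be moved by hand, a checksum is a cheap way to confirm that it arrived intact before loading it. A minimal sketch using sha256sum from coreutils follows; make_checksum and verify_checksum are hypothetical helper names.

```shell
# Write a SHA-256 checksum file next to the archive (run on the online machine).
make_checksum() {
  ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

# Verify the archive against its checksum file (run on the offline machine).
verify_checksum() {
  ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null 2>&1 )
}

# Typical use:
#   make_checksum open-webui.tar      # before copying; copy the .sha256 file too
#   verify_checksum open-webui.tar && echo "archive intact"
```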

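(3) Install offline

Import open-webui.tar in the offline environment:

docker load -i open-webui.tar # import the image offline

docker images # list local images
docker ps -a # list all containers and their status

docker run # creates and starts a container for the first time
docker start # starts an existing container

Start the open-webui container: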

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

Once it is running, open http://localhost:3000 in a browser and register an account on first login.