Installing tensorflow-gpu on Ubuntu 20.04
Setup:
OS: Ubuntu 20.04 LTS
GPU: GTX 1060 6G
1 Install the CUDA Toolkit (I chose CUDA Toolkit 12.2)
NVIDIA CUDA Installation Guide for Linux
https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#prepare-ubuntu
Follow step 2.7: download the deb package and install it locally.
2.7. Download the NVIDIA CUDA Toolkit
https://developer.nvidia.com/cuda-downloads
Select:
Linux | x86_64 | Ubuntu 20.04 | deb (local)
Then run the following on the command line:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.2.1/local_installers/cuda-repo-ubuntu2004-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-2-local_12.2.1-535.86.10-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-2-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
Installation complete.
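The deb packages do not add the toolkit to the shell's search path on their own. Below is a minimal sketch of the usual post-install step. It assumes the default install prefix /usr/local/cuda-12.2 (an assumption; adjust it if your install landed elsewhere) and then checks whether the compiler and driver are reachable.

```shell
# Assumption: the toolkit was installed under /usr/local/cuda-12.2.
CUDA_HOME="${CUDA_HOME:-/usr/local/cuda-12.2}"
export CUDA_HOME
export PATH="$CUDA_HOME/bin${PATH:+:$PATH}"
export LD_LIBRARY_PATH="$CUDA_HOME/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Report the compiler and driver if they are reachable; otherwise print a hint.
if command -v nvcc >/dev/null 2>&1; then
    nvcc --version || echo "nvcc is present but failed to run"
else
    echo "nvcc not found under $CUDA_HOME/bin"
fi
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || echo "nvidia-smi failed; the new driver may need a reboot"
else
    echo "nvidia-smi not found; the driver may not be installed yet"
fi
```

Appending the export lines to ~/.bashrc should make the setting persistent across shells.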
2 Install cuDNN (I chose cuDNN v8.9.3, the version that matches CUDA Toolkit 12.2)
You need to register and log in with an NVIDIA account,
then download from this address:
https://developer.nvidia.com/rdp/cudnn-download
I chose the following; pick the one that matches your CUDA Toolkit version:
Local Installer for Ubuntu20.04 x86_64 (Deb):
Download cuDNN v8.9.3 (July 11th, 2023), for CUDA 12.x
Then download it and install it locally:
sudo chmod 777 cudnn-local-repo-ubuntu2004-8.9.3.28_1.0-1_amd64.deb
sudo dpkg -i cudnn-local-repo-ubuntu2004-8.9.3.28_1.0-1_amd64.deb
Done.
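Note that the dpkg -i step above only registers a local apt repository; the cuDNN libraries themselves most likely still have to be installed from it. Below is a sketch of that follow-up. It assumes the repository was unpacked under /var/cudnn-local-repo-* and that the packages are named libcudnn8 and libcudnn8-dev, as in NVIDIA's cuDNN 8.x packaging. Check the message dpkg printed for the exact keyring path on your machine.

```shell
# Assumption: `dpkg -i` unpacked the local repository under /var/cudnn-local-repo-*.
keyring=""
for f in /var/cudnn-local-repo-*/cudnn-local-*-keyring.gpg; do
    if [ -e "$f" ]; then keyring="$f"; fi
done

if [ -n "$keyring" ]; then
    # Trust the local repository, then install the runtime and development packages.
    sudo cp "$keyring" /usr/share/keyrings/
    sudo apt-get update
    sudo apt-get install -y libcudnn8 libcudnn8-dev
else
    echo "cuDNN local repository not found; run the dpkg -i step first"
fi
```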
3 Pull the Docker image tensorflow/tensorflow:devel-gpu
Reference: the Docker Linux build section of https://tensorflow.google.cn/install/source?hl=zh-cn
Run the command from a working directory of your choice; in my case $PWD
is /home/wmx/software/tensorDocker
sudo docker run --gpus all -it -w /tensorflow -v $PWD:/mnt -e HOST_PERMS="$(id -u):$(id -g)" tensorflow/tensorflow:devel-gpu bash
This fails with an error:
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].
ERRO[0000] error waiting for container: context canceled
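This error generally means Docker has nothing installed that can satisfy the --gpus flag. A quick check, offered as a sketch: it assumes the NVIDIA container tooling provides an nvidia-container-cli binary on the PATH, which is how the current packages are laid out.

```shell
# Check whether the NVIDIA container tooling that `--gpus` relies on is on the PATH.
if command -v nvidia-container-cli >/dev/null 2>&1; then
    toolkit_state="present"
else
    toolkit_state="missing"
fi
echo "NVIDIA container tooling: $toolkit_state"
```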
Fix: install the NVIDIA Container Toolkit and restart Docker:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
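Once Docker has restarted, GPU passthrough can be confirmed before starting the full development container. The helper below is a sketch; the function name check_gpu_in_container is hypothetical, and it assumes the NVIDIA runtime exposes nvidia-smi inside the container when --gpus all is given.

```shell
# Hypothetical helper: run nvidia-smi in a throwaway container to confirm
# that the GPU is visible from inside Docker.
check_gpu_in_container() {
    image="${1:-tensorflow/tensorflow:devel-gpu}"
    if sudo docker run --rm --gpus all "$image" nvidia-smi; then
        echo "GPU is visible inside the container"
    else
        echo "GPU check failed; see the Docker error above"
        return 1
    fi
}
```

Calling check_gpu_in_container with no arguments uses the same image as the run command in this section.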
The container now starts successfully: