<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Server GPU Configuration]]></title><description><![CDATA[<h1>1. Install the NVIDIA driver</h1>
<h2>1.1 Installation</h2>
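<pre><code class="language-bash"># Add the NVIDIA graphics drivers PPA
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update

# List the available driver versions
ubuntu-drivers devices

# Pick a driver version from the list above
# (the package is named nvidia-xxx on some machines and nvidia-driver-xxx on others)
sudo apt install nvidia-driver-440

# Note: if Secure Boot is enabled in the BIOS, the installation above may prompt you to set a Secure Boot password. If your security requirements are not strict, consider turning Secure Boot off.


# Reboot, then check whether the NVIDIA driver is listed under Additional Drivers in System Settings
# smi = System Management Interface; on success it prints the driver version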
nvidia-smi

</code></pre>
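<p dir="auto">As a rough sketch of the "pick a driver from the list" step, the helper below reads the output of <code>ubuntu-drivers devices</code> and prints the package marked as recommended. The function names are hypothetical, and the parsing assumes lines of the form <code>driver   : nvidia-driver-440 - distro non-free recommended</code>, which may differ between releases.</p>
<pre><code class="language-bash">recommended_driver() {
    # Reads `ubuntu-drivers devices` output on stdin and prints the first
    # package whose line is tagged "recommended" (assumed output format).
    sed -n 's/^driver *: *\([^ ]*\) .*recommended.*/\1/p' | head -n 1
}

install_recommended_driver() {
    # Hypothetical wrapper: installs whatever recommended_driver reports.
    pkg=$(ubuntu-drivers devices | recommended_driver)
    if [ -z "$pkg" ]; then
        echo "no recommended driver found"
        return 1
    fi
    sudo apt install -y "$pkg"
}
</code></pre>
<p dir="auto">Running <code>ubuntu-drivers devices | recommended_driver</code> on its own only prints the package name, so the choice can be reviewed before anything is installed.</p>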
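<h2>1.2 Errors and fixes</h2>
<p dir="auto">The error below means that the version of the loaded NVIDIA kernel module does not match the driver installed on the system. Uninstall the installed driver and reinstall a version that matches the NVIDIA kernel module.</p>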
<pre><code class="language-bash"># Checking the current version fails with an error
anneng@anneng01:~$ nvidia-smi
Failed to initialize NVML: Driver/library version mismatch
</code></pre>
<pre><code class="language-bash"># Check the version of the kernel module used by the GPU driver
anneng@anneng01:~$ cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  440.100  Fri May 29 08:45:51 UTC 2020
GCC version:  gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) 

# 440.100 is the version of the currently loaded driver

# Check which driver packages are installed on the system
anneng@anneng01:~$ cat /var/log/dpkg.log | grep nvidia
</code></pre>
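<p dir="auto">The comparison described above can be sketched as a small script: extract the kernel module version from <code>/proc/driver/nvidia/version</code>, extract the most recently installed <code>nvidia-driver-*</code> version from <code>dpkg.log</code>, and compare the two. The function names are hypothetical, and the sketch assumes the log line formats shown in this post; a rotated <code>dpkg.log</code> may no longer contain the relevant entry.</p>
<pre><code class="language-bash">nvrm_version() {
    # Reads /proc/driver/nvidia/version on stdin; prints e.g. 440.100
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p'
}

installed_driver_version() {
    # Reads dpkg.log lines on stdin; prints the upstream version of the
    # last "status installed nvidia-driver-NNN" entry, e.g. 460.91.03
    grep -E ' status installed nvidia-driver-[0-9]+:' | tail -n 1 |
        sed -E 's/.* ([0-9]+(\.[0-9]+)+)-.*/\1/'
}

compare_driver_versions() {
    # Hypothetical check: returns 1 when the two versions differ.
    loaded=$(cat /proc/driver/nvidia/version | nvrm_version)
    installed=$(cat /var/log/dpkg.log | installed_driver_version)
    if [ "$loaded" = "$installed" ]; then
        echo "kernel module and installed package agree: $loaded"
    else
        echo "mismatch: kernel module $loaded, installed package $installed"
        return 1
    fi
}
</code></pre>
<p dir="auto">A mismatch reported by <code>compare_driver_versions</code> points at the same condition as the <code>Driver/library version mismatch</code> message, but it also shows which two versions are involved.</p>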
<p dir="auto">If you see the error below, one possible fix is to install a different driver version, preferably a newer one, and then reboot.</p>
<pre><code class="language-bash">NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
</code></pre>
<h1>2. Install CUDA</h1>
<h2>Download</h2>
<p dir="auto">Choose a CUDA version to download from the <a href="https://developer.nvidia.com/cuda-toolkit-archive" rel="nofollow ugc">official site</a>.</p>
<p dir="auto">This guide uses <code>CUDA Toolkit 10.2 (Nov 2019) -&gt; Linux Ubuntu 18.04 x86_64  deb [local]</code>; pick whichever version suits your setup.</p>
<p dir="auto">The installation commands given on the official site are:</p>
<pre><code class="language-bash">wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-2-local-10.2.89-440.33.01/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
</code></pre>
<h2>Add environment variables</h2>
<pre><code class="language-bash">vim ~/.bashrc

# Append the following to the end of the file
export CUDA_HOME=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.2/bin:$PATH


# Reload the environment variables
source ~/.bashrc
</code></pre>
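<p dir="auto"><code>nvcc</code> only works once these environment variables are in effect. The <code>nvcc</code> binary should be located in <code>/usr/local/cuda-{xx}/bin</code>.</p>
<h2>Verify</h2>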
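<p dir="auto">For unattended setups, the manual edit of <code>~/.bashrc</code> can be sketched as a pair of shell functions that append each line only if it is not already present, so the step can be re-run safely. The names <code>append_once</code> and <code>add_cuda_env</code> are hypothetical helpers, not part of CUDA, and the sketch assumes the file ends with a newline.</p>
<pre><code class="language-bash">append_once() {
    # $1: file to edit, $2: exact line to add if it is missing.
    # tee -a echoes each appended line, so the changes are visible.
    touch "$1"
    if ! grep -qxF -- "$2" "$1"; then
        printf '%s\n' "$2" | tee -a "$1"
    fi
}

add_cuda_env() {
    # $1: rc file (e.g. ~/.bashrc), $2: CUDA install dir (e.g. /usr/local/cuda-10.2)
    append_once "$1" "export CUDA_HOME=$2"
    append_once "$1" 'export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH'
    append_once "$1" 'export PATH=$CUDA_HOME/bin:$PATH'
}
</code></pre>
<p dir="auto">For example, <code>add_cuda_env ~/.bashrc /usr/local/cuda-10.2</code> followed by <code>source ~/.bashrc</code> has the same effect as the manual edit, written in terms of <code>CUDA_HOME</code>.</p>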
<pre><code class="language-bash">anneng@anneng01:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Nov__3_21:07:56_CDT_2017
Cuda compilation tools, release 9.1, V9.1.85

anneng@anneng01:~$ nvidia-smi
</code></pre>
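<p dir="auto">It can be worth checking that the release printed by <code>nvcc --version</code> matches the toolkit that <code>CUDA_HOME</code> points to. A release of 9.1 alongside <code>CUDA_HOME=/usr/local/cuda-10.2</code>, for example, would suggest that a different <code>nvcc</code> comes first on <code>PATH</code>. The following is a minimal sketch with hypothetical function names; it assumes <code>CUDA_HOME</code> ends in <code>cuda-&lt;version&gt;</code>.</p>
<pre><code class="language-bash">nvcc_release() {
    # Reads `nvcc --version` output on stdin; prints the release, e.g. 9.1
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

cuda_home_release() {
    # $1: a path such as /usr/local/cuda-10.2; prints 10.2
    printf '%s\n' "$1" | sed -n 's/.*cuda-\([0-9][0-9.]*\)$/\1/p'
}

check_nvcc() {
    # Compares the nvcc found on PATH with the toolkit named by CUDA_HOME.
    want=$(cuda_home_release "$CUDA_HOME")
    have=$(nvcc --version | nvcc_release)
    if [ "$want" = "$have" ]; then
        echo "nvcc release $have matches CUDA_HOME"
    else
        echo "nvcc release $have differs from CUDA_HOME ($want)"
        echo "nvcc resolves to: $(command -v nvcc)"
        return 1
    fi
}
</code></pre>
<p dir="auto">When <code>check_nvcc</code> reports a difference, the printed path shows which <code>nvcc</code> the shell is actually picking up.</p>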
<h1>3. Install cuDNN</h1>
<h2>3.1 Download</h2>
<p dir="auto">Choose a cuDNN version to download from the <a href="https://developer.nvidia.com/rdp/cudnn-download" rel="nofollow ugc">official site</a>; it must match your CUDA version.</p>
<p dir="auto">This guide uses <code>Download cuDNN v8.0.1 RC2 (June 26th, 2020), for CUDA 10.2</code>. Download all three of the following files:</p>
<ul>
<li><code>cuDNN Runtime Library for Ubuntu18.04 (Deb)</code></li>
<li><code>cuDNN Developer Library for Ubuntu18.04 (Deb)</code></li>
<li><code>cuDNN Code Samples and User Guide for Ubuntu18.04 (Deb)</code></li>
</ul>
<p dir="auto">Once the downloads finish you will have the following three files:</p>
<ul>
<li><code>libcudnn8_8.0.1.13-1+cuda10.2_amd64.deb</code></li>
<li><code>libcudnn8-dev_8.0.1.13-1+cuda10.2_amd64.deb</code></li>
<li><code>libcudnn8-doc_8.0.1.13-1+cuda10.2_amd64.deb</code></li>
</ul>
<p dir="auto">Choose whichever version suits your setup.</p>
<blockquote>
<p dir="auto">[Note:] You must register before you can download.</p>
</blockquote>
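<p dir="auto">Because the cuDNN packages must match the CUDA version, a quick check of the downloaded file names can catch a wrong download before installing. The sketch below relies on the <code>+cuda10.2</code> part of the file names listed above; the function names are hypothetical.</p>
<pre><code class="language-bash">deb_cuda_version() {
    # $1: a cuDNN .deb file name; prints the CUDA version it targets, e.g. 10.2
    printf '%s\n' "$1" | sed -n 's/.*+cuda\([0-9][0-9.]*\)_.*/\1/p'
}

check_cudnn_debs() {
    # $1: expected CUDA version; remaining arguments: .deb file names.
    want=$1
    shift
    for deb in "$@"; do
        have=$(deb_cuda_version "$deb")
        if [ "$have" != "$want" ]; then
            echo "mismatch: $deb targets CUDA '$have', expected $want"
            return 1
        fi
    done
    echo "all packages target CUDA $want"
}
</code></pre>
<p dir="auto">For example, <code>check_cudnn_debs 10.2 libcudnn8*.deb</code> run in the download directory checks all three packages at once.</p>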
<h2>3.2 Installation</h2>
<pre><code class="language-bash">sudo dpkg -i libcudnn8_8.0.1.13-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn8-dev_8.0.1.13-1+cuda10.2_amd64.deb
sudo dpkg -i libcudnn8-doc_8.0.1.13-1+cuda10.2_amd64.deb
</code></pre>
<p dir="auto"><code>lspci | grep -i nvidia</code><br />
<code>cat /proc/driver/nvidia/version</code><br />
<code>lsmod | grep nouveau</code></p>
<h1>4. Install Docker CE and nvidia-docker2</h1>
<h2>4.1 Environment</h2>
<ul>
<li>OS: Ubuntu 18.04 64 bit</li>
<li>GPU: NVIDIA Tesla T4</li>
<li>CUDA: 10.2</li>
<li>cuDNN: 7.5</li>
</ul>
<h2>4.2 Configure the Docker repository</h2>
<pre><code class="language-bash"># Refresh the package index
sudo apt update

# Install the packages apt needs to use repositories over HTTPS
sudo apt install -y \
    apt-transport-https \
    ca-certificates \
    curl \
    gnupg-agent \
    software-properties-common

# Add the GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -

# Add the stable repository
sudo add-apt-repository \
   "deb [arch=amd64] https://download.docker.com/linux/ubuntu \
   $(lsb_release -cs) \
   stable"
</code></pre>
<h2>4.3 Install Docker CE</h2>
<pre><code class="language-bash">sudo apt update
# Install Docker CE
sudo apt install -y docker-ce

# Verify: the install succeeded if the output contains "Hello from Docker!"
sudo docker run hello-world
</code></pre>
<h2>4.4 Configure the <code>nvidia-docker</code> repository and install</h2>
<pre><code class="language-bash"># Add the repository
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -

curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install, then restart Docker
sudo apt update &amp;&amp; sudo apt install -y nvidia-container-toolkit

sudo systemctl restart docker
</code></pre>
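<p dir="auto">To make the <code>distribution</code> line above less opaque, the sketch below shows what it computes: it sources an os-release file in a subshell and joins <code>ID</code> and <code>VERSION_ID</code>, which on Ubuntu 18.04 yields <code>ubuntu18.04</code>, and that string becomes part of the repository list URL. The function names are hypothetical, and the <code>nvidia/cuda:10.2-base</code> image tag in the smoke test is an assumption that may no longer be available to pull.</p>
<pre><code class="language-bash">distribution_id() {
    # $1: path to an os-release file; prints ID followed by VERSION_ID.
    # The subshell keeps the sourced variables out of the caller's shell.
    ( . "$1"; echo "$ID$VERSION_ID" )
}

nvidia_docker_list_url() {
    # $1: distribution string such as ubuntu18.04
    echo "https://nvidia.github.io/nvidia-docker/$1/nvidia-docker.list"
}

gpu_smoke_test() {
    # Assumption: Docker 19.03 or later (for --gpus) and a pullable image tag.
    sudo docker run --rm --gpus all nvidia/cuda:10.2-base nvidia-smi
}
</code></pre>
<p dir="auto">For example, <code>nvidia_docker_list_url "$(distribution_id /etc/os-release)"</code> prints the URL that the <code>curl</code> command above fetches. If <code>gpu_smoke_test</code> prints the usual <code>nvidia-smi</code> table from inside the container, the toolkit is most likely wired up correctly.</p>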
<pre><code class="language-bash">sudo curl -L https://github.com/docker/compose/releases/download/1.17.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
 
sudo chmod +x /usr/local/bin/docker-compose
docker-compose --version
</code></pre>
]]></description><link>http://an.forum.genostack.com/topic/39/服务器gpu配置</link><generator>RSS for Node</generator><lastBuildDate>Sat, 13 Jun 2026 12:37:12 GMT</lastBuildDate><atom:link href="http://an.forum.genostack.com/topic/39.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 30 Jul 2020 06:22:09 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Server GPU Configuration on Mon, 30 May 2022 09:14:59 GMT]]></title><description><![CDATA[<p dir="auto">nvidia-smi<br />
Failed to initialize NVML: Driver/library version mismatch<br />
cat /var/log/dpkg.log | grep nvidia<br />
2021-07-26 06:50:34 status installed nvidia-driver-460:amd64 460.91.03-0ubuntu0.18.04.1</p>
<p dir="auto">The system had upgraded the driver automatically; a reboot fixes it!</p>
<p dir="auto">The method in this post avoids the reboot:<br />
<a href="https://stackoverflow.com/questions/43022843/nvidia-nvml-driver-library-version-mismatch#comment73133147_43022843" rel="nofollow ugc">https://stackoverflow.com/questions/43022843/nvidia-nvml-driver-library-version-mismatch#comment73133147_43022843</a></p>
<p dir="auto">How can we do that?<br />
First, we should know which drivers are loaded.</p>
<p dir="auto">lsmod | grep nvidia<br />
You may get</p>
<p dir="auto">nvidia_uvm            634880  8<br />
nvidia_drm             53248  0<br />
nvidia_modeset        790528  1 nvidia_drm<br />
nvidia              12312576  86 nvidia_modeset,nvidia_uvm<br />
Our final goal is to unload the nvidia module, so we should first unload the modules that depend on it:</p>
<p dir="auto">sudo rmmod nvidia_drm<br />
sudo rmmod nvidia_modeset<br />
sudo rmmod nvidia_uvm<br />
Then, unload nvidia</p>
<p dir="auto">sudo rmmod nvidia<br />
Troubleshooting<br />
If you get an error like rmmod: ERROR: Module nvidia is in use, the kernel module is still in use and you should kill the processes that are using it:</p>
<p dir="auto">sudo lsof /dev/nvidia*<br />
Kill those processes, then continue unloading the kmods.</p>
<p dir="auto">Test<br />
Confirm that you successfully unloaded those kmods:</p>
<p dir="auto">lsmod | grep nvidia<br />
You should get nothing. Then confirm you can load the correct driver:</p>
<p dir="auto">nvidia-smi<br />
You should get the correct output.</p>
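<p dir="auto">The unload steps above can be sketched as a script. This is a minimal sketch with hypothetical function names: it assumes the four module names shown in the <code>lsmod</code> output above and a fixed order in which the modules that depend on <code>nvidia</code> are removed first.</p>
<pre><code class="language-bash">unload_order() {
    # Reads `lsmod` output on stdin and prints the NVIDIA modules that are
    # loaded, dependents first and the core nvidia module last.
    loaded=$(awk '{ print $1 }')
    for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
        if printf '%s\n' "$loaded" | grep -qxF "$mod"; then
            printf '%s\n' "$mod"
        fi
    done
}

unload_nvidia() {
    # Hypothetical wrapper: rmmod fails with "in use" while a process still
    # holds the device; see `sudo lsof /dev/nvidia*` in that case.
    lsmod | unload_order | while read -r mod; do
        sudo rmmod "$mod"
    done
}
</code></pre>
<p dir="auto">Running <code>lsmod | unload_order</code> alone is a dry run: it only lists what <code>unload_nvidia</code> would remove, and in which order.</p>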
]]></description><link>http://an.forum.genostack.com/post/712</link><guid isPermaLink="true">http://an.forum.genostack.com/post/712</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Mon, 30 May 2022 09:14:59 GMT</pubDate></item><item><title><![CDATA[Reply to Server GPU Configuration on Sat, 24 Jul 2021 07:29:20 GMT]]></title><description><![CDATA[<p dir="auto"><a href="https://zhuanlan.zhihu.com/p/91334380" rel="nofollow ugc">https://zhuanlan.zhihu.com/p/91334380</a><br />
What exactly are the GPU, the GPU driver, nvcc, the CUDA driver, cudatoolkit, and cuDNN?</p>
]]></description><link>http://an.forum.genostack.com/post/693</link><guid isPermaLink="true">http://an.forum.genostack.com/post/693</guid><dc:creator><![CDATA[anneng]]></dc:creator><pubDate>Sat, 24 Jul 2021 07:29:20 GMT</pubDate></item></channel></rss>