Installing CUDA on Ubuntu 18.04 for pytorch or tensorflow
I recently needed to update some servers running an old Ubuntu LTS (Xenial, 16.04) to a slightly less old Ubuntu LTS (Bionic, 18.04). I had been putting it off for some time, mostly due to the noise I heard about problems installing the Nvidia CUDA toolkit. But that was two years ago, which seems like enough time for things to have been figured out.
It turns out that installing CUDA and CUDNN on Bionic is actually easy, as long as you ignore most of what has been written about it on the internet. Many instructions out there describe half-apt-managed installations, padded out with manual symlinking and hand-edited files 😬. That includes the instructions Google provides for tensorflow here, which failed to locate the appropriate packages when I ran them locally. I tried about four varieties of installation steps, to no profit at all, before falling back to the official instructions from Nvidia themselves, which you can find here.
Unfortunately, enough blog posts were written around the initial difficulties installing cuda that they overshadow anything from Nvidia in Google's rankings:
If you, like me, are installing CUDA in 2020 or later for pytorch or tensorflow, here is a set of instructions that has worked at least twice. The good news is that we can get relatively new versions of all the things, though the newest pytorch takes a few extra steps. Briefly, we are going to install CUDA version 10.1, CUDNN version 7.6.5, and tensorflow-gpu version 2.1.0. Then we are going to clone the latest releases of pytorch, torchvision, and torchtext, and build them from source.
1. Make sure your GPU can run modern CUDA
Do not start by installing any device drivers by hand. This will be done for you during the CUDA installation process. You should, however, verify that your GPU supports the CUDA versions required by popular neural network libraries. You can query your device name with:
lspci | grep -i nvidia
Once you know which GPU you have, you should check that it is listed here: https://developer.nvidia.com/cuda-gpus. If your model doesn't appear, the rest of these instructions will not be useful for you.
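If you want to pull the card name out of that output programmatically, the relevant part is the text after "NVIDIA Corporation". A minimal sketch, using an illustrative lspci line (run lspci yourself to get the real one for your machine):

```python
import re

# Illustrative lspci output line for an Nvidia card; yours will differ.
line = "01:00.0 VGA compatible controller: NVIDIA Corporation GP104 [GeForce GTX 1080] (rev a1)"

# The device name sits between the vendor string and the revision suffix.
match = re.search(r"NVIDIA Corporation (.+?) \(rev", line)
print(match.group(1))  # GP104 [GeForce GTX 1080]
```

The bracketed marketing name ("GeForce GTX 1080" here) is what you should look up on the CUDA GPUs page.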
2. Install CUDA
CUDA is the low-level language that describes how to perform distributed linear algebraic operations across the many cores of your graphics processor. To download and install it, execute the following:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget http://developer.download.nvidia.com/compute/cuda/10.1/Prod/local_installers/cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-10-1-local-10.1.243-418.87.00_1.0-1_amd64.deb
sudo apt-key add /var/cuda-repo-10-1-local-10.1.243-418.87.00/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
During this process, Nvidia will almost certainly have replaced the driver for your card. You need to restart your system for the new driver to take effect:
sudo shutdown -r now
Once it has booted back up run
nvidia-smi
to make sure you see your device and driver listed. The driver version should be larger than 400 (e.g. for me it is 418).
To make use of CUDA, programs need to be able to find it. The easy way to do this is to add it to your .bashrc, by doing something like the following:
echo 'export PATH=/usr/local/cuda-10.1/bin:$PATH' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.1/lib64:$LD_LIBRARY_PATH' >> ~/.bashrc
. ~/.bashrc
Now we want to test the installation to make sure everything worked. Start by copying the test suite from Nvidia into the current directory:
cuda-install-samples-10.1.sh .
Then, enter the directory and build the tests (this will take a while, so this is a good time to grab a snack).
cd NVIDIA_CUDA-10.1_Samples
make clean
make
Once you have the tests built, we are first going to test whether the device is visible:
./bin/deviceQuery
You should see a readout of device info, including the CUDA driver version (it should be 10.1) followed by "Result = PASS". Then we can test the data transfer speeds:
./bin/bandwidthTest
You should see a readout of data transfer speeds followed by "Result = PASS". If either of these tests fail, something bad happened during your installation.
3. Install CUDNN
Navigate to https://developer.nvidia.com/rdp/cudnn-archive. If you have a developer account with Nvidia, you'll need to sign in; if not, you'll need to sign up. In the archive list, choose CUDNN version 7.6.5 for CUDA version 10.1. You'll see three files to download: the runtime, the developer kit, and the code samples. Once you have all three, change into the directory where you have downloaded them, and install them with dpkg:
sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.1_amd64.deb
sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.1_amd64.deb
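You can double-check which CUDNN version actually landed by reading the version macros in /usr/include/cudnn.h. A small sketch of that parse; the macro names are the ones cuDNN 7.x really defines, but the header excerpt here is illustrative (on a real machine, read the file instead):

```python
import re

# Illustrative excerpt of the version defines in /usr/include/cudnn.h.
header = """
#define CUDNN_MAJOR 7
#define CUDNN_MINOR 6
#define CUDNN_PATCHLEVEL 5
"""

fields = dict(re.findall(r"#define CUDNN_(\w+) (\d+)", header))
version = "{MAJOR}.{MINOR}.{PATCHLEVEL}".format(**fields)
print(version)  # 7.6.5
```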
After they are installed, copy the test directory to your user home, build the tests, and run them:
cp -r /usr/src/cudnn_samples_v7/ $HOME
cd $HOME/cudnn_samples_v7/mnistCUDNN
make clean
make
./mnistCUDNN
If everything worked correctly, you should see the classification output followed by "Test passed!"
4. Install tensorflow and pytorch
Now, finally, we can install our neural network libraries. Tensorflow 2.1.0 has been compiled against CUDA 10.1, and should work correctly out of the box:
conda install "tensorflow-gpu==2.1.0"
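A quick way to confirm the whole chain (driver, CUDA, CUDNN, tensorflow) is to ask tensorflow which GPUs it can see. A minimal sketch; `tf.config.list_physical_devices` is the TF 2.1 API, and the try/except simply keeps the snippet harmless on a machine without tensorflow installed:

```python
# Post-install sanity check: list the GPUs tensorflow can reach.
try:
    import tensorflow as tf
    gpus = tf.config.list_physical_devices("GPU")
except ImportError:
    gpus = None  # tensorflow is not installed in this environment
print("GPUs visible to tensorflow:", gpus)
```

On a working install this should print a non-empty list of PhysicalDevice entries; an empty list means tensorflow imported but cannot see your card.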
Pytorch, on the other hand, does not provide compiled binaries for older GPU cards (see the comments from this Github issue), so I had to compile mine from source. You may not need to do this, but if you do, the commands are as follows:
git clone --recursive https://github.com/pytorch/pytorch.git
git clone https://github.com/pytorch/vision.git
git clone https://github.com/pytorch/text.git
cd pytorch
python setup.py install
cd ../vision
python setup.py install
cd ../text
python setup.py install
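Once the builds finish, the same kind of sanity check works for pytorch: import it and ask whether CUDA is reachable. Again a hedged sketch, guarded so it runs harmlessly on a machine without torch:

```python
# Final sanity check for the source build: torch should import cleanly and
# report whether it can reach the GPU.
try:
    import torch
    cuda_ok = torch.cuda.is_available()
    version = torch.__version__
except ImportError:
    cuda_ok, version = None, None
print("pytorch:", version, "| CUDA available:", cuda_ok)
```

On a successful build against CUDA 10.1 this should report CUDA available: True; False usually means the build did not pick up your CUDA install or the driver is mismatched.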