Ubuntu 18.04.1 - CUDA 10.1 installation updates NVIDIA driver to 455, which is not compatible with TensorFlow

My TensorFlow 2.3.1 setup with CUDA 10.1 was working fine until I mistakenly updated the NVIDIA drivers and CUDA.

These are the steps I am using to install CUDA 10.1.

Purge all CUDA and NVIDIA drivers:

sudo apt-get --purge remove "*cublas*" "cuda*" "nsight*"

sudo apt-get --purge remove "nvidia*"

sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*

Reboot

After this, I follow the instructions from the TensorFlow page:

wget

sudo apt-key adv --fetch-keys

sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb

sudo apt-get update

wget

sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb

sudo apt-get update

sudo apt-get install --no-install-recommends nvidia-driver-450
sudo apt-get install --no-install-recommends cuda-10-1

This creates two folders in /usr/local: cuda-10.1 and cuda-10.2.

At this step, apt removes the 450 driver and installs 455. These are some of the messages I get:

The following packages will be REMOVED:
  libnvidia-cfg1-450 libnvidia-compute-450 libnvidia-decode-450
  libnvidia-encode-450 libnvidia-extra-450 libnvidia-fbc1-450
  libnvidia-gl-450 libnvidia-ifr1-450 nvidia-compute-utils-450
  nvidia-dkms-450 nvidia-driver-450 nvidia-kernel-common-450
  nvidia-kernel-source-450 nvidia-utils-450 xserver-xorg-video-nvidia-450
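One way to see a replacement like this coming is to ask apt for a dry run before installing. The sketch below assumes the usual `apt-get -s` (simulate) output, where every package that would be removed appears on a line starting with `Remv`. The function name `list_removals` is made up for illustration, and the version string in the comment is only an example.

```shell
# Print the names of packages that a simulated apt run would remove.
# Reads `apt-get -s ...` output on stdin; removal lines look like:
#   Remv nvidia-driver-450 [450.80.02]      (version shown is illustrative)
list_removals() {
  awk '$1 == "Remv" { print $2 }'
}

# Example (not run here):
#   apt-get -s install --no-install-recommends cuda-10-1 | list_removals
```

If any package ending in -450 shows up in that list, the real install would swap out the driver as it did here.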

If I go ahead and install libcudnn7 and TensorFlow:

sudo apt-get install --no-install-recommends \
    libcudnn7=7.6.5.32-1+cuda10.1 \
    libcudnn7-dev=7.6.5.32-1+cuda10.1

I get this in Python:

tf.config.list_physical_devices("GPU")

2020-10-07 13:10:02.262260: E tensorflow/stream_executor/cuda/cuda_diagnostics.cc:313] kernel version 450.80.2 does not match DSO version 455.23.5 -- cannot find working devices in this configuration
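The message compares the version of the NVIDIA kernel module that is currently loaded with the version of the user-space driver library (the DSO) that TensorFlow opened. A small sketch for pulling both numbers out of such a line follows; `parse_driver_mismatch` is a hypothetical helper name, not part of TensorFlow or the driver.

```shell
# Extract both versions from TensorFlow's
# "kernel version X does not match DSO version Y" diagnostic.
# Reads log text on stdin and prints "kernel=X dso=Y" for each matching line.
parse_driver_mismatch() {
  sed -n 's/.*kernel version \([0-9.]*\) does not match DSO version \([0-9.]*\).*/kernel=\1 dso=\2/p'
}

# The loaded kernel module normally reports its own version here (not run):
#   cat /proc/driver/nvidia/version
```

For the line above this prints kernel=450.80.2 dso=455.23.5, which matches the situation described: the 450 module is still loaded while the 455 libraries are already installed.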

To fix this, I tried uninstalling 455:

sudo apt purge nvidia-455*

and reinstalling TensorFlow. Now I get this error in Python:

tf.config.list_physical_devices("GPU")

2020-10-07 13:20:46.923513: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-10-07 13:20:46.959289: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:982] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-10-07 13:20:46.959608: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1716] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: GeForce RTX 2070 computeCapability: 7.5
coreClock: 1.62GHz coreCount: 36 deviceMemorySize: 7.79GiB deviceMemoryBandwidth: 417.29GiB/s
2020-10-07 13:20:46.959626: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudart.so.10.1
2020-10-07 13:20:46.959769: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcublas.so.10'; dlerror: libcublas.so.10: cannot open shared object file: No such file or directory
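In a longer log the one warning that matters is easy to miss among the informational lines. As a rough sketch, the filter below keeps only the names of libraries that TensorFlow reported it could not load; `missing_from_log` is a hypothetical helper name.

```shell
# Print the name of every library TensorFlow reported it could not load.
# Reads TensorFlow log text on stdin.
missing_from_log() {
  sed -n "s/.*Could not load dynamic library '\([^']*\)'.*/\1/p"
}

# Example (not run here; TensorFlow writes its log to stderr):
#   python3 -c 'import tensorflow as tf; tf.config.list_physical_devices("GPU")' 2>&1 | missing_from_log
```

For the output above it prints libcublas.so.10, the only library that failed to load.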

How can I fix this? Thanks.

1 Answer

Terrance's reply helped fix the driver upgrade issue, but I had to install additional packages and set up the config files.

This helped with the additional steps.

These are the steps I used for CUDA 10.1 with the NVIDIA 450 driver on Ubuntu 18.04.

Steps:

Before installing CUDA from the runfile, install the driver.

##Driver: this is per the TensorFlow requirement; 455 doesn't work with the current TensorFlow version

sudo apt-get install --no-install-recommends nvidia-driver-450

##get runfile for cuda 10.1

wget

##install dependencies

sudo apt-get install freeglut3 freeglut3-dev libxi-dev libxmu-dev

##Run the installer and follow its steps

sudo sh cuda_10.1.243_418.87.00_linux.run

#The installer warns about a preexisting driver; continue
#Select everything except the driver in the menu; CUDA will be installed
#Check with ls /usr/local, which should show the folder cuda-10.1

Create a shell script for the CUDA profile

#you can use any text editor

sudo vim /etc/profile.d/cuda.sh

##Add the following lines to this file to set the path

export PATH=$PATH:/usr/local/cuda-10.1/bin
export CUDADIR=/usr/local/cuda-10.1
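Because files in /etc/profile.d are read on every login shell, a plain append can add the same directory to PATH more than once in nested shells. A possible variant of the profile script that guards against this is sketched below; `path_append` is a helper name invented for this example, and the paths are the ones used above.

```shell
# Append a directory to PATH only if it is not already listed.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
}

path_append /usr/local/cuda-10.1/bin
export PATH
export CUDADIR=/usr/local/cuda-10.1
```

The `case` pattern is plain POSIX shell, so the script does not depend on bash.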

##Create another file so the dynamic linker can find the CUDA libraries (this replaces setting LD_LIBRARY_PATH)

sudo vim /etc/ld.so.conf.d/cuda.conf

#add this line

/usr/local/cuda-10.1/lib64

#run

sudo ldconfig
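After `ldconfig` has run, the linker cache can be checked for the libraries TensorFlow tries to open. This is a minimal sketch that assumes the usual `ldconfig -p` listing format (library name, a space, then details); `missing_libs` is a hypothetical helper, and the library names are the ones that appear in the logs and links on this page.

```shell
# Print each wanted library that does not appear in `ldconfig -p` style output.
# Reads the listing on stdin; library names are given as arguments.
missing_libs() {
  ml_cache=$(cat)
  for ml_lib in "$@"; do
    # The trailing space keeps libfoo.so.10 from matching libfoo.so.100.
    printf '%s\n' "$ml_cache" | grep -qF "$ml_lib " || echo "$ml_lib"
  done
}

# Example (not run here):
#   ldconfig -p | missing_libs libcudart.so.10.1 libcublas.so.10 libcudnn.so.7
```

An empty result means every listed library is in the cache; any name printed still needs to be installed or added to the linker path.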

For cuDNN, use these steps for the tar file installation.

There are four commands:

tar -xzvf cudnn-10.1-linux-x64-v7.6.5.32.tgz

sudo cp cuda/include/cudnn*.h /usr/local/cuda/include

sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64

sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
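To confirm which cuDNN version was copied, the version macros in the header can be read back. The sketch below assumes the cuDNN 7 layout, where `CUDNN_MAJOR`, `CUDNN_MINOR` and `CUDNN_PATCHLEVEL` are defined in cudnn.h; `cudnn_version` is a helper name made up for this example.

```shell
# Print the cuDNN version as MAJOR.MINOR.PATCH from header text on stdin.
# Prints nothing if the CUDNN_MAJOR macro is not found.
cudnn_version() {
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { if (major != "") print major "." minor "." patch }
  '
}

# Example (not run here):
#   cudnn_version < /usr/local/cuda/include/cudnn.h
```

With the tarball used above, the result should start with 7.6.5.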

If you get this error while using TensorFlow:

failed call to cuInit: CUDA_ERROR_UNKNOWN

#use this

sudo apt install nvidia-modprobe

If somebody wants to install TensorRT, these links are helpful:

Why do I get "/sbin/ldconfig.real: /usr/local/cuda/lib64/libcudnn.so.7 is not a symbolic link"?
