This is going to be a tutorial on how to install TensorFlow with GPU support (CUDA 10.0) on Ubuntu.
TensorFlow is an open-source software library developed and used by Google that is widely used by students, researchers, and developers for deep learning applications such as neural networks. It is available in both CPU and GPU versions, and although the CPU version works quite well, realistically, if you are doing serious deep learning you will want the GPU version. To use the GPU version of TensorFlow, you need an NVIDIA GPU with a compute capability greater than 3.0.
Using the latest version of TensorFlow gives you the latest features and optimizations, the latest CUDA Toolkit brings speed improvements and support for recent GPUs, and the latest cuDNN greatly reduces deep learning training time.
You must have 64-bit Python installed; TensorFlow does not work with a 32-bit Python installation.
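To confirm that your Python is 64-bit, a quick check (not part of the original steps, just a sanity test) is:
python3 -c "import struct; print(struct.calcsize('P') * 8)"
It should print 64 on a 64-bit installation.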
Step 1: Update and Upgrade your system:
sudo apt-get update
sudo apt-get upgrade
Step 2: Verify You Have a CUDA-Capable GPU:
lspci | grep -i nvidia
Note your GPU model, e.g. GeForce 840M.
If you do not see any settings, update the PCI hardware database that Linux maintains by entering update-pciids (generally found in /sbin) at the command line and rerun the previous lspci command.
If your graphics card is from NVIDIA, go to http://developer.nvidia.com/cuda-gpus and verify that it is listed in the CUDA-enabled GPU list.
Note down its compute capability; e.g. the GeForce 840M has compute capability 5.0.
Step 3: Verify You Have a Supported Version of Linux:
To determine which distribution and release number you’re running, type the following at the command line:
uname -m && cat /etc/*release
The x86_64 line indicates that you are running on a 64-bit system, which is required for CUDA 10.0.
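For example, on a 64-bit Ubuntu 18.04 machine the output includes lines roughly like the following (exact values will differ on your system):
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04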
Step 4: Install Dependencies:
Required to compile from source:
sudo apt-get install build-essential
sudo apt-get install cmake git unzip zip
sudo apt-get install python-dev python3-dev python-pip python3-pip
Step 5: Install the Linux kernel headers:
Go to the terminal and type:
uname -r
You will get something like "4.15.0-36-generic". Note down your Linux kernel version.
To install the headers for your running kernel, do the following:
sudo apt-get install linux-headers-$(uname -r)
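To double-check that headers matching your running kernel were installed, you can list them (optional sanity check):
ls /usr/src | grep "$(uname -r)"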
Step 6: Install NVIDIA CUDA 10.0:
Remove any previous CUDA installation:
sudo apt-get purge nvidia*
sudo apt-get autoremove
sudo apt-get autoclean
sudo rm -rf /usr/local/cuda*
Install CUDA:
For Ubuntu 16.04 :
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
For Ubuntu 18.04 :
sudo apt-key adv --fetch-keys http://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
echo "deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64 /" | sudo tee /etc/apt/sources.list.d/cuda.list
For Both Options:
sudo apt-get update
sudo apt-get -o Dpkg::Options::="--force-overwrite" install cuda-10-0 cuda-drivers
You can also install the CUDA toolkit by following NVIDIA's official instructions; the deb (network) installer is recommended.
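Once the packages are installed, you can confirm that the toolkit files landed in /usr/local (the exact version string may differ slightly):
cat /usr/local/cuda-10.0/version.txt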
Step 7: Reboot the system to load the NVIDIA drivers.
Step 8: Go to the terminal and type:
echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
sudo ldconfig
nvidia-smi
Check the driver version; nvidia-smi should report a driver that supports CUDA 10.0 (e.g. Driver Version: 410.xx or newer).
(Unlikely) If you get "nvidia-smi is not found", you have an unsupported Linux kernel installed. Leave a comment with the kernel version you noted in step 5.
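You can also confirm that the CUDA toolkit on your PATH is the one you just installed (assuming the exports above have been sourced):
nvcc --version
The last line should report release 10.0.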
You can check your CUDA installation using the following sample:
cuda-install-samples-10.0.sh ~
cd ~/NVIDIA_CUDA-10.0_Samples/5_Simulations/nbody
make
./nbody
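If the nbody build complains about missing OpenGL libraries, the deviceQuery sample is a simpler, display-free check (path assumed from the default samples layout used above):
cd ~/NVIDIA_CUDA-10.0_Samples/1_Utilities/deviceQuery
make
./deviceQuery
It should end with Result = PASS.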
Hello,
thank you for the tutorial.
I have the following problem when importing TF.
Do you know how to fix it? Thanks.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/__init__.py", line 24, in <module>
    from tensorflow.python import pywrap_tensorflow # pylint: disable=unused-import
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/__init__.py", line 88, in <module>
    from tensorflow.python import keras
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/keras/__init__.py", line 24, in <module>
    from tensorflow.python.keras import activations
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/keras/activations/__init__.py", line 22, in <module>
    from tensorflow.python.keras._impl.keras.activations import elu
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/__init__.py", line 21, in <module>
    from tensorflow.python.keras._impl.keras import activations
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/activations.py", line 23, in <module>
    from tensorflow.python.keras._impl.keras import backend as K
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/keras/_impl/keras/backend.py", line 36, in <module>
    from tensorflow.python.layers import base as tf_base_layers
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 25, in <module>
    from tensorflow.python.keras.engine import base_layer
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/__init__.py", line 23, in <module>
    from tensorflow.python.keras.engine.base_layer import InputSpec
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 35, in <module>
    from tensorflow.python.keras import backend
  File "/home/lia/.local/lib/python3.6/site-packages/tensorflow/python/keras/backend/__init__.py", line 22, in <module>
    from tensorflow.python.keras._impl.keras.backend import abs
ImportError: cannot import name 'abs'
Change directory out of the TensorFlow source tree (to any other directory) and try importing again.
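For example, something like this (assuming the pip package was installed for python3):
cd ~
python3 -c "import tensorflow as tf; print(tf.__version__)"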
After installing CUDA 10 and rebooting I had an error:
NVIDIA-SMI has failed because it couldn’t communicate with the NVIDIA driver
The solution was to disable Secure Boot in the BIOS:
https://www.gigabyte.com/us/Support/Faq/3001
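You can check whether Secure Boot is currently enabled from Linux before going into the BIOS (requires the mokutil package):
mokutil --sb-state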
nvidia-smi is not found.
Linux kernel version: 4.15.0-39-generic
What should I do now?
Try booting an older Linux kernel from the boot menu -> Advanced options. Also disable Secure Boot in the BIOS.
Thank you!
Having tried many nvidia-docker and other solutions that failed for one reason or another,
these build steps have me up and running natively on my machine, with cuda-10.0, cudnn-7.4.1.5 and nccl_2.3.7.
Very nice tutorial.
Using your tutorial I got TensorFlow running on an RTX 2070.
My setup: Linux Mint 19, Python 3.6, Tensorflow 1.12.0, Cuda 10.0, cudNN 7.4.1, NCCL 2.3.7, Keras 2.2.4.
The build using Bazel 0.17.2 took about 2 hours.
For Linux Mint users: to build the CUDA example and test the CUDA installation, modify the file ~/NVIDIA_CUDA-10.0_Samples/5_Simulations/nbody/findgllib.mk.
replace: UBUNTU = $(shell echo $(DISTRO) | grep -i ubuntu >/dev/null 2>&1; echo $?)
with: UBUNTU = $(shell echo $(DISTRO) | grep -i linuxmint >/dev/null 2>&1; echo $?)
Otherwise the OpenGL libraries can't be found (libGL.so and libGLU.so).
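The same edit can be done with a one-line sed command (a sketch, assuming the default samples path from the tutorial):
sed -i 's/grep -i ubuntu/grep -i linuxmint/' ~/NVIDIA_CUDA-10.0_Samples/5_Simulations/nbody/findgllib.mk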
Thank you!
Thanks so much
Oh wow this worked like a miracle, thanks so much!
Just a tip to others: use the Bazel version the author says to use. And before you run the bazel command, do: export TMP="/tmp". Make sure to also install protobuf first, following these steps:
sudo apt-get install autoconf automake libtool curl make g++ unzip -y
git clone https://github.com/google/protobuf.git
cd protobuf
git submodule update --init --recursive
./autogen.sh
./configure
make
make check
sudo make install
sudo ldconfig
[Source: https://gist.github.com/diegopacheco/cd795d36e6ebcd2537cd18174865887b]
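After the install finishes, you can verify which protoc ended up on your PATH:
protoc --version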
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scc_Cops_Slogging_Uops_Ugen_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]′
bazel-out/host/bin/_solib_local/_U_S_Stensorflow_Scc_Cops_Slogging_Uops_Ugen_Ucc___Utensorflow/libtensorflow_framework.so: undefined reference to `[email protected]′
collect2: error: ld returned 1 exit status
Target //tensorflow/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 1.186s, Critical Path: 0.48s
INFO: 0 processes.
FAILED: Build did NOT complete successfully
Does someone know why this happens?
Thanks for this tutorial. I am unable to get it to work though, with cuDNN 7.4.1 and NCCL 2.3.7. What does the following error message mean?
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
WARNING: The following configs were expanded more than once: [cuda]. For repeatable flags, repeats are counted twice and may lead to unexpected behavior.
ERROR: /home/hegerber/.cache/bazel/_bazel_kontron/8a1cf3c7d840757bff354f793f430a39/external/local_config_cc/BUILD:57:1: in cc_toolchain rule @local_config_cc//:cc-compiler-k8: Error while selecting cc_toolchain: Toolchain identifier 'local' was not found, valid identifiers are [local_linux, local_darwin, local_windows]
ERROR: Analysis of target '//tensorflow/tools/pip_package:build_pip_package' failed; build aborted: Analysis of target '@local_config_cc//:cc-compiler-k8' failed; build aborted
INFO: Elapsed time: 0.344s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded, 0 targets configured)
currently loading: @protobuf_archive// … (2 packages)
Any advice will be appreciated. Thanks in advance.
Thanks a lot!! Great tutorial; previous descriptions I found missed the initial cleanup steps.