What should tensorflow_root be set to when using Google's bin? #10
-
Hi, here is my command:

```shell
cmake \
  -DXDRFILE_ROOT=$XDRFILE_HOME \
  -DTENSORFLOW_ROOT=$TENSORFLOW_HOME \
  -DCMAKE_INSTALL_PREFIX=$HOME/pkgs/deepmd-kit-gpu \
  -DTF_GOOGLE_BIN=true \
  ..
```

but it can't find tensorflow.

I'm running Ubuntu 16.04.5 LTS.

```
g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
Replies: 10 comments
-
Google provides binaries for the Python interface of tensorflow, which is needed by the training part of deepmd-kit. deepmd-kit also needs the C++ interface of tensorflow for MD simulations with deep potentials. The variable TENSORFLOW_ROOT should be set to the directory where the C++ interface of tensorflow is installed. You can follow the instructions at https://github.com/deepmodeling/deepmd-kit#install-tensorflows-c-interface to compile the C++ interface.
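To make the expectation concrete, a quick sanity check can verify that TENSORFLOW_ROOT points at a C++ install prefix. This is only a sketch: the exact library and header names are assumptions based on the copy steps shown later in this thread.

```shell
# check_tf_root: return 0 if the given prefix looks like a TensorFlow C++
# install, i.e. has lib/ with the shared libraries and include/ with the
# public headers; print what is missing otherwise
check_tf_root() {
  root=$1
  ok=0
  for f in lib/libtensorflow_cc.so \
           lib/libtensorflow_framework.so \
           include/tensorflow/core/public/session.h; do
    if [ ! -e "$root/$f" ]; then
      echo "MISSING: $root/$f"
      ok=1
    fi
  done
  return $ok
}

# example usage (hypothetical prefix):
# check_tf_root "$HOME/pkgs/tensorflow-1.8" || echo "not a usable TENSORFLOW_ROOT"
```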
-
I actually tried to compile tensorflow-gpu and use its home as my $TENSORFLOW_HOME. Here's what I've done:

```shell
sudo apt-get install python3-pip
pip3 install tensorflow-gpu
pip3 freeze
# tensorflow-gpu==1.11.0

git clone https://github.com/tensorflow/tensorflow.git tensorflow-1.8 -b v1.8.0
cd tensorflow-1.8
source ~/pkgs/cuda-9.0/activate
./configure
bazel \
  build \
  --config=opt \
  --config=cuda \
  --verbose_failures \
  //tensorflow:libtensorflow_cc.so \
  //tensorflow:libtensorflow_framework.so

export tensorflow_root=$HOME/pkgs/tensorflow-1.8
mkdir -p $tensorflow_root/lib
cp bazel-bin/tensorflow/libtensorflow_cc.so $tensorflow_root/lib/
cp bazel-bin/tensorflow/libtensorflow_framework.so $tensorflow_root/lib/
mkdir -p $tensorflow_root/include/tensorflow
cp -r bazel-genfiles/* $tensorflow_root/include/
cp -r tensorflow/cc $tensorflow_root/include/tensorflow
cp -r tensorflow/core $tensorflow_root/include/tensorflow
cp -r third_party $tensorflow_root/include

tensorflow/contrib/makefile/download_dependencies.sh
cd tensorflow/contrib/makefile/downloads/protobuf/
./autogen.sh
./configure --prefix=$tensorflow_root
\make -j 22
\make -j 22 install
cd ../eigen
mkdir build_dir
cd build_dir
cmake -DCMAKE_INSTALL_PREFIX=$tensorflow_root ..
\make -j 22 install
cd ../../nsync
mkdir build_dir
cd build_dir
cmake -DCMAKE_INSTALL_PREFIX=$tensorflow_root ..
\make -j 22
\make -j 22 install
cd ../../../../../..

source ~/pkgs/xdrfile-1.1.4/activate
source ~/pkgs/cuda-9.0/activate
source ~/pkgs/tensorflow-1.8/activate
cd source
mkdir build
cd build
rm -fr *; \
cmake \
  -DXDRFILE_ROOT=$XDRFILE_HOME \
  -DTENSORFLOW_ROOT=$TENSORFLOW_HOME \
  -DCMAKE_INSTALL_PREFIX=$HOME/pkgs/deepmd-kit-gpu \
  -DTF_GOOGLE_BIN=true \
  ..
\make -j 22
\make install
```

Then I get an error running DeePMD's dp_train:

```
$HOME/pkgs/deepmd-kit-gpu/bin/dp_train
Traceback (most recent call last):
  File "$HOME/pkgs/deepmd-kit-gpu/bin/dp_train", line 16, in <module>
    from deepmd.Model import NNPModel
  File "$HOME/pkgs/deepmd-kit-gpu/bin/../lib/deepmd/Model.py", line 15, in <module>
    op_module = tf.load_op_library(module_path + "libop_abi.so")
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: $HOME/pkgs/deepmd-kit-gpu/lib/deepmd/libop_abi.so: undefined symbol: _ZN10tensorflow6StatusC1ENS_5error4CodeENS_11StringPieceE
```
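For what it's worth, the unresolved name in an error like this can be demangled to see which TensorFlow symbol the op library expects. A sketch: `c++filt` ships with binutils, and the commented `nm` line assumes the library layout used above.

```shell
# demangle the missing symbol to a readable C++ signature
sym=_ZN10tensorflow6StatusC1ENS_5error4CodeENS_11StringPieceE
if command -v c++filt >/dev/null 2>&1; then
  c++filt "$sym"
  # -> tensorflow::Status::Status(tensorflow::error::Code, tensorflow::StringPiece)
fi

# then check whether the installed framework library actually exports it:
# nm -D $tensorflow_root/lib/libtensorflow_framework.so | c++filt | grep 'tensorflow::Status::Status'
```

If the library does not export the symbol, the op library was built against a different (ABI-incompatible) TensorFlow than the one being loaded.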
-
What is the version of your gcc?
-
```
g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
-
Humm!

```shell
cd tensorflow/contrib/makefile/downloads/absl
bazel build
rsync -n -a --include '*/' --include '*.h' --exclude '*' absl $tensorflow_root/include/
```

I will revise my notes, and I'm tempted to start from scratch to make sure I'm not missing anything. Thanks for your help.
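A note on that rsync invocation: the `-n` flag makes it a dry run, so nothing is actually copied until it is dropped. The same header-only copy can be done with `find` and GNU `cp --parents`; this is a sketch with placeholder paths, not part of the original notes.

```shell
# copy_headers SRC DST: copy only *.h files from the SRC tree into DST,
# preserving the directory structure (relies on GNU cp's --parents)
copy_headers() {
  src=$1
  dst=$2
  mkdir -p "$dst"
  ( cd "$(dirname "$src")" && \
    find "$(basename "$src")" -name '*.h' -exec cp --parents {} "$dst" \; )
}

# example usage (hypothetical paths):
# copy_headers tensorflow/contrib/makefile/downloads/absl "$tensorflow_root/include"
```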
-
This bug may happen when the version of tensorflow's Python interface (1.11) is inconsistent with that of the C++ interface (1.8). We have updated the installation instructions and hope they will help. Thanks @SamuelLarkin a lot for reporting.
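A hedged sketch of the check implied here: the Python wheel's version string must match the tag the C++ interface was built from. The two values below are the mismatched ones from this thread, filled in by hand; in practice the first would come from `python3 -c 'import tensorflow as tf; print(tf.__version__)'`.

```shell
py_ver="1.11.0"   # version of the pip-installed tensorflow-gpu wheel
cc_ver="1.8.0"    # git tag used when building libtensorflow_cc.so
if [ "$py_ver" = "$cc_ver" ]; then
  echo "OK: python and C++ interfaces are both $py_ver"
else
  echo "MISMATCH: python=$py_ver c++=$cc_ver"
fi
```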
-
@SamuelLarkin
-
@Johndoni, let me check this for you. It's been a while since I've done this. I'll get back to you.
-
@Johndoni here are my notes that I took, but I haven't tried to rerun them to see if they still work. I hate having to install tensorflow: you need a very specific version of bazel (one you can't easily compile yourself) that matches the version of tensorflow you want to build, because why not tie your dependency handler to a random project and make them inter-dependent. Throw in some incoherent third-party dependencies to add some fun to it. Enough ranting; here are my instructions, I hope they help you a bit.

## Tensorflow

### Compiling Tensorflow

```shell
source ~/pkgs/cuda-9.0/activate
git clone https://github.com/tensorflow/tensorflow.git tensorflow-1.11.0 -b v1.11.0
cd tensorflow-1.11.0
bazel test -c opt -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/lite/...
./configure
```

My answers to `./configure`:

```
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
No jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon AWS Platform support? [Y/n]: n
No Amazon AWS Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with nGraph support? [y/N]: n
No nGraph support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-9.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-9.0]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]:
Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-9.0]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,3.5]:
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
```

Then build:

```shell
bazel \
  build \
  --config=opt \
  --config=cuda \
  --verbose_failures \
  //tensorflow:libtensorflow_cc.so \
  //tensorflow:libtensorflow_framework.so \
  //tensorflow/tools/pip_package:build_pip_package
```

### Installing Tensorflow

```shell
export tensorflow_root=$HOME/pkgs/tensorflow-1.11.0
mkdir -p $tensorflow_root/lib
cp bazel-bin/tensorflow/libtensorflow_cc.so $tensorflow_root/lib/
cp bazel-bin/tensorflow/libtensorflow_framework.so $tensorflow_root/lib/
mkdir -p $tensorflow_root/include/tensorflow
cp -r bazel-genfiles/* $tensorflow_root/include/
cp -r tensorflow/cc $tensorflow_root/include/tensorflow
cp -r tensorflow/core $tensorflow_root/include/tensorflow
cp -r third_party $tensorflow_root/include
```

### Installing Tensorflow's dependencies

```shell
tensorflow/contrib/makefile/download_dependencies.sh
cd tensorflow/contrib/makefile/downloads/protobuf/
./autogen.sh
./configure --prefix=$tensorflow_root
\make -j $(nproc)
\make -j $(nproc) install
cd ../eigen
mkdir build_dir
cd build_dir
cmake -DCMAKE_INSTALL_PREFIX=$tensorflow_root ..
\make -j $(nproc) install
cd ../../nsync
mkdir build_dir
cd build_dir
cmake -DCMAKE_INSTALL_PREFIX=$tensorflow_root ..
\make -j $(nproc)
\make -j $(nproc) install
cd ../absl
bazel build
rsync -n -a --include '*/' --include '*.h' --exclude '*' absl $tensorflow_root/include/
# note: -n makes the rsync a dry run; drop it to actually copy the headers
# OR
find . -name '*.h' -exec cp --parents \{\} $tensorflow_root/include/ \;
# OR YARK!
cp -r absl $tensorflow_root/include/
find $tensorflow_root/include/absl/ -not -name \*.h -type f -delete
cd ../../../../../..
```

### Matching Tensorflow's protobuf version

It would have been nice if protobuf's version were the same throughout the build, but no, that would have been TOO convenient.

```shell
wget -O protobuf-3.6.0.tar.gz 'https://mirror.bazel.build/github.com/google/protobuf/archive/v3.6.0.tar.gz'
tar xf protobuf-3.6.0.tar.gz
cd protobuf-3.6.0/
./autogen.sh
./configure --prefix=$tensorflow_root
\make -j $(nproc)
\make -j $(nproc) install
```

## DeepMD

### Compiling DeepMD-kit

Note that tensorflow-1.11.0 uses protobuf-3.6.0 when building itself, but the dependency script downloads protobuf-3.5.0, which creates an error.

```shell
source ~/pkgs/xdrfile-1.1.4/activate
source ~/pkgs/cuda-9.0/activate
source ~/pkgs/tensorflow-1.11.0/activate
#source ~/pkgs/protobuf-3.6.0/activate # not needed since we are installing protobuf-3.6.0 in $tensorflow_root
cd source
mkdir build
cd build
rm -fr *; \
cmake \
  -DXDRFILE_ROOT=$XDRFILE_HOME \
  -DTENSORFLOW_ROOT=$TENSORFLOW_HOME \
  -DCMAKE_INSTALL_PREFIX=$HOME/pkgs/deepmd-kit-gpu \
  -DTF_GOOGLE_BIN=true \
  ..
\make -j $(nproc)
\make install
```

### Training with deepMD-kit

#### Successful setup

```shell
source ~/pkgs/deepmd-kit-gpu/activate
source ~/pkgs/cuda-9.0/activate
cd examples/train/
dp_train water.json
```

#### Failure

```
$HOME/pkgs/deepmd-kit-gpu/bin/dp_train
Traceback (most recent call last):
  File "$HOME/pkgs/deepmd-kit-gpu/bin/dp_train", line 16, in <module>
    from deepmd.Model import NNPModel
  File "$HOME/pkgs/deepmd-kit-gpu/bin/../lib/deepmd/Model.py", line 15, in <module>
    op_module = tf.load_op_library(module_path + "libop_abi.so")
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: $HOME/pkgs/deepmd-kit-gpu/lib/deepmd/libop_abi.so: undefined symbol: _ZN10tensorflow6StatusC1ENS_5error4CodeENS_11StringPieceE
```

## LAMMPS with DeepMD

### Compiling

```shell
source ~/pkgs/cuda-9.0/activate
source ~/pkgs/tensorflow-1.11.0/activate
cd $HOME/git/deepmd-kit
git clone https://github.com/lammps/lammps.git lammps.git
cd lammps.git/src
rsync -Parz $HOME/git/deepmd-kit/source/build.gpu.2/USER-DEEPMD .
\make yes-user-deepmd
\make serial -j $(nproc)
# There is no make install?!?
cp lmp_serial ~/pkgs/deepmd-kit-gpu/bin/
```

### Testing

```shell
#source ~/pkgs/tensorflow-1.11.0/activate # dependency on tensorflow is hardcoded in lmp_serial's rpath
source ~/pkgs/cuda-9.0/activate
source ~/pkgs/deepmd-kit-gpu/activate
cd $HOME/git/deepmd-kit/examples/lmp
lmp_serial < lammps.in
```

## Tensorflow's ./configuration

```
cat .tf_configure.bazelrc
build --action_env PYTHON_BIN_PATH="/usr/bin/python3"
build --action_env PYTHON_LIB_PATH="/usr/local/lib/python3.5/dist-packages"
build --python_path="/usr/bin/python3"
build:gcp --define with_gcp_support=true
build:hdfs --define with_hdfs_support=true
build:aws --define with_aws_support=true
build:kafka --define with_kafka_support=true
build:xla --define with_xla_support=true
build:gdr --define with_gdr_support=true
build:verbs --define with_verbs_support=true
build:ngraph --define with_ngraph_support=true
build --action_env TF_NEED_OPENCL_SYCL="0"
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda-9.0"
build --action_env TF_CUDA_VERSION="9.0"
build --action_env CUDNN_INSTALL_PATH="/usr/local/cuda-9.0"
build --action_env TF_CUDNN_VERSION="7"
build --action_env NCCL_INSTALL_PATH="/usr/local/cuda-9.0"
build --action_env TF_NCCL_VERSION="2"
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="3.5,3.5"
build --action_env LD_LIBRARY_PATH="/usr/local/cuda-9.0/lib:/usr/local/cuda-9.0/lib64"
build --action_env TF_CUDA_CLANG="0"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
build --config=cuda
test --config=cuda
build --define grpc_no_ares=true
build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
```

## Versions

### OS

```
cat /etc/issue
Ubuntu 16.04.5 LTS \n \l
```

### Compiler

```
g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

### Bazel

It is important to use a bazel version that is known to compile the version of tensorflow you want.

```
bazel version
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:$HOME/.cache/bazel/_bazel_larkins/install/792a28b07894763eaa2bd870f8776b23/_embedded_binaries/A-server.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
Build label: 0.17.2
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Sep 21 10:31:42 2018 (1537525902)
Build timestamp: 1537525902
Build timestamp as int: 1537525902
```
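One extra guard for the protobuf pitfall mentioned in those notes: verify that the protoc you end up with matches the version tensorflow was built against (3.6.0 for the 1.11.0 build above). This is a sketch; the helper just compares version strings, relying on `protoc --version` printing `libprotoc X.Y.Z`.

```shell
# check_protoc: succeed only when a `protoc --version` output matches the
# protobuf version TensorFlow itself was built against
check_protoc() {
  have=${1#libprotoc }   # strip the leading "libprotoc " prefix
  want=$2
  [ "$have" = "$want" ]
}

# example usage (assumes $tensorflow_root/bin/protoc exists):
# check_protoc "$($tensorflow_root/bin/protoc --version)" 3.6.0 || echo "protobuf mismatch"
```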
-
I have encountered the same problem. I installed the C++ interface of tensorflow 2.1.2, and when I tried `cmake -DTENSORFLOW=$tensorflow_root -DXDRFILE=$xdrfile -DPREFIX=$deepmd_kit_root ..`, the program cannot find tensorflow's session.h, which is really there. Have you solved the problem? Please help me if possible. Thanks a lot.
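When cmake reports a header as missing even though it exists, it can help to see where the header actually sits relative to the prefix being passed in. A sketch: `find_session_h` is a made-up helper, and the argument is whatever install prefix the build was given.

```shell
# find_session_h PREFIX: list any session.h found under PREFIX/include,
# so the actual location can be compared against what cmake expects
find_session_h() {
  find "$1/include" -name session.h 2>/dev/null
}

# example usage (hypothetical prefix):
# find_session_h "$tensorflow_root"
```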