What should tensorflow_root be set to when using Google's bin? #10
-
Hi, here is my command:

```shell
cmake \
  -DXDRFILE_ROOT=$XDRFILE_HOME \
  -DTENSORFLOW_ROOT=$TENSORFLOW_HOME \
  -DCMAKE_INSTALL_PREFIX=$HOME/pkgs/deepmd-kit-gpu \
  -DTF_GOOGLE_BIN=true \
  ..
```

but it can't find tensorflow.

I'm running Ubuntu 16.04.5 LTS.

```
g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
Replies: 10 comments
-
Google provides binaries for the Python interface of tensorflow, which is needed by the training part of deepmd-kit. deepmd-kit also needs the C++ interface of tensorflow for MD simulations with deep potentials. The variable TENSORFLOW_ROOT should be set to the directory where the C++ interface of tensorflow is installed. You can follow the instructions at https://github.com/deepmodeling/deepmd-kit#install-tensorflows-c-interface to compile the C++ interface.
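To make the expectation concrete, a quick sanity check can verify that TENSORFLOW_ROOT points at a C++ install prefix. This is only a sketch: the exact library and header names are assumptions based on the copy steps shown later in this thread.

```shell
# check_tf_root: return 0 if the given prefix looks like a TensorFlow C++
# install, i.e. has lib/ with the shared libraries and include/ with the
# public headers; print what is missing otherwise
check_tf_root() {
  root=$1
  ok=0
  for f in lib/libtensorflow_cc.so \
           lib/libtensorflow_framework.so \
           include/tensorflow/core/public/session.h; do
    if [ ! -e "$root/$f" ]; then
      echo "MISSING: $root/$f"
      ok=1
    fi
  done
  return $ok
}

# example usage (hypothetical prefix):
# check_tf_root "$HOME/pkgs/tensorflow-1.8" || echo "not a usable TENSORFLOW_ROOT"
```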
-
I actually tried to compile tensorflow-gpu and use its home as my $TENSORFLOW_HOME. Here's what I've done:

```shell
sudo apt-get install python3-pip
pip3 install tensorflow-gpu
pip3 freeze
# tensorflow-gpu==1.11.0

git clone https://github.com/tensorflow/tensorflow.git tensorflow-1.8 -b v1.8.0
cd tensorflow-1.8
source ~/pkgs/cuda-9.0/activate
./configure
bazel \
  build \
  --config=opt \
  --config=cuda \
  --verbose_failures \
  //tensorflow:libtensorflow_cc.so \
  //tensorflow:libtensorflow_framework.so

export tensorflow_root=$HOME/pkgs/tensorflow-1.8
mkdir -p $tensorflow_root/lib
cp bazel-bin/tensorflow/libtensorflow_cc.so $tensorflow_root/lib/
cp bazel-bin/tensorflow/libtensorflow_framework.so $tensorflow_root/lib/
mkdir -p $tensorflow_root/include/tensorflow
cp -r bazel-genfiles/* $tensorflow_root/include/
cp -r tensorflow/cc $tensorflow_root/include/tensorflow
cp -r tensorflow/core $tensorflow_root/include/tensorflow
cp -r third_party $tensorflow_root/include

tensorflow/contrib/makefile/download_dependencies.sh
cd tensorflow/contrib/makefile/downloads/protobuf/
./autogen.sh
./configure --prefix=$tensorflow_root
\make -j 22
\make -j 22 install
cd ../eigen
mkdir build_dir
cd build_dir
cmake -DCMAKE_INSTALL_PREFIX=$tensorflow_root ..
\make -j 22 install
cd ../../nsync
mkdir build_dir
cd build_dir
cmake -DCMAKE_INSTALL_PREFIX=$tensorflow_root ..
\make -j 22
\make -j 22 install
cd ../../../../../..

source ~/pkgs/xdrfile-1.1.4/activate
source ~/pkgs/cuda-9.0/activate
source ~/pkgs/tensorflow-1.8/activate
cd source
mkdir build
cd build
rm -fr *; \
cmake \
  -DXDRFILE_ROOT=$XDRFILE_HOME \
  -DTENSORFLOW_ROOT=$TENSORFLOW_HOME \
  -DCMAKE_INSTALL_PREFIX=$HOME/pkgs/deepmd-kit-gpu \
  -DTF_GOOGLE_BIN=true \
  ..
\make -j 22
\make install
```

Then I get an error running DeePMD's dp_train:

```
$HOME/pkgs/deepmd-kit-gpu/bin/dp_train
Traceback (most recent call last):
  File "$HOME/pkgs/deepmd-kit-gpu/bin/dp_train", line 16, in <module>
    from deepmd.Model import NNPModel
  File "$HOME/pkgs/deepmd-kit-gpu/bin/../lib/deepmd/Model.py", line 15, in <module>
    op_module = tf.load_op_library(module_path + "libop_abi.so")
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: $HOME/pkgs/deepmd-kit-gpu/lib/deepmd/libop_abi.so: undefined symbol: _ZN10tensorflow6StatusC1ENS_5error4CodeENS_11StringPieceE
```
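For what it's worth, the unresolved name in an error like this can be demangled to see which TensorFlow symbol the op library expects. A sketch: `c++filt` ships with binutils, and the commented `nm` line assumes the library layout used above.

```shell
# demangle the missing symbol to a readable C++ signature
sym=_ZN10tensorflow6StatusC1ENS_5error4CodeENS_11StringPieceE
if command -v c++filt >/dev/null 2>&1; then
  c++filt "$sym"
  # -> tensorflow::Status::Status(tensorflow::error::Code, tensorflow::StringPiece)
fi

# then check whether the installed framework library actually exports it:
# nm -D $tensorflow_root/lib/libtensorflow_framework.so | c++filt | grep 'tensorflow::Status::Status'
```

If the library does not export the symbol, the op library was built against a different (ABI-incompatible) TensorFlow than the one being loaded.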
-
What is the version of your gcc?
-
```
g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```
-
Humm!

```shell
cd tensorflow/contrib/makefile/downloads/absl
bazel build
rsync -n -a --include '*/' --include '*.h' --exclude '*' absl $tensorflow_root/include/
```

I will revise my notes, and I'm tempted to start from scratch to make sure I'm not missing anything. Thanks for your help.
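A note on that rsync invocation: the `-n` flag makes it a dry run, so nothing is actually copied until it is dropped. The same header-only copy can be done with `find` and GNU `cp --parents`; this is a sketch with placeholder paths, not part of the original notes.

```shell
# copy_headers SRC DST: copy only *.h files from the SRC tree into DST,
# preserving the directory structure (relies on GNU cp's --parents)
copy_headers() {
  src=$1
  dst=$2
  mkdir -p "$dst"
  ( cd "$(dirname "$src")" && \
    find "$(basename "$src")" -name '*.h' -exec cp --parents {} "$dst" \; )
}

# example usage (hypothetical paths):
# copy_headers tensorflow/contrib/makefile/downloads/absl "$tensorflow_root/include"
```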
-
This bug may happen when the version of tensorflow's Python interface (1.11) is inconsistent with that of the C++ interface (1.8). We have updated the installation instructions and hope they will help. Thanks @SamuelLarkin a lot for reporting.
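A hedged sketch of the check implied here: the Python wheel's version string must match the tag the C++ interface was built from. The two values below are the mismatched ones from this thread, filled in by hand; in practice the first would come from `python3 -c 'import tensorflow as tf; print(tf.__version__)'`.

```shell
py_ver="1.11.0"   # version of the pip-installed tensorflow-gpu wheel
cc_ver="1.8.0"    # git tag used when building libtensorflow_cc.so
if [ "$py_ver" = "$cc_ver" ]; then
  echo "OK: python and C++ interfaces are both $py_ver"
else
  echo "MISMATCH: python=$py_ver c++=$cc_ver"
fi
```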
-
@SamuelLarkin
-
@Johndoni, let me check this for you. It's been a while since I've done this. I'll get back to you.
-
@Johndoni here are my notes that I took, but I haven't tried to rerun them to see if they still work. I hate having to install tensorflow: you need a very specific version of bazel (one you can't easily compile yourself) that matches the version of tensorflow you want to build, because why not tie your dependency handler to a random project and make them inter-dependent. Throw in some incoherent third-party dependencies to add some fun to it. Enough ranting; here are my instructions, I hope they help you a bit.

## Tensorflow

### Compiling Tensorflow

```shell
source ~/pkgs/cuda-9.0/activate
git clone https://github.com/tensorflow/tensorflow.git tensorflow-1.11.0 -b v1.11.0
cd tensorflow-1.11.0
bazel test -c opt -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/lite/...
./configure
```

My answers to `./configure`:

```
Please specify the location of python. [Default is /usr/bin/python]: /usr/bin/python3
Please input the desired Python library path to use. Default is [/usr/local/lib/python3.5/dist-packages]
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: n
No jemalloc as malloc support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]: n
No Google Cloud Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Hadoop File System support? [Y/n]: n
No Hadoop File System support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Amazon AWS Platform support? [Y/n]: n
No Amazon AWS Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with Apache Kafka Platform support? [Y/n]: n
No Apache Kafka Platform support will be enabled for TensorFlow.
Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.
Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.
Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.
Do you wish to build TensorFlow with nGraph support? [y/N]: n
No nGraph support will be enabled for TensorFlow.
Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.
Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.
Please specify the CUDA SDK version you want to use. [Leave empty to default to CUDA 9.0]:
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]: /usr/local/cuda-9.0
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 7.0]: 7.1
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-9.0]:
Do you wish to build TensorFlow with TensorRT support? [y/N]: n
No TensorRT support will be enabled for TensorFlow.
Please specify the NCCL version you want to use. If NCCL 2.2 is not installed, then you can use version 1.3 that can be fetched automatically but it may have worse performance with multiple GPUs. [Default is 2.2]:
Please specify the location where NCCL 2 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda-9.0]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,3.5]:
Do you want to use clang as CUDA compiler? [y/N]: n
nvcc will be used as CUDA compiler.
Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]: n
No MPI support will be enabled for TensorFlow.
Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Would you like to interactively configure ./WORKSPACE for Android builds? [y/N]: n
Not configuring the WORKSPACE for Android builds.
Preconfigured Bazel build configs. You can use any of the below by adding "--config=<>" to your build command. See tools/bazel.rc for more details.
--config=mkl # Build with MKL support.
--config=monolithic # Config for mostly static monolithic build.
Configuration finished
```

Then build:

```shell
bazel \
  build \
  --config=opt \
  --config=cuda \
  --verbose_failures \
  //tensorflow:libtensorflow_cc.so \
  //tensorflow:libtensorflow_framework.so \
  //tensorflow/tools/pip_package:build_pip_package
```

### Installing Tensorflow

```shell
export tensorflow_root=$HOME/pkgs/tensorflow-1.11.0
mkdir -p $tensorflow_root/lib
cp bazel-bin/tensorflow/libtensorflow_cc.so $tensorflow_root/lib/
cp bazel-bin/tensorflow/libtensorflow_framework.so $tensorflow_root/lib/
mkdir -p $tensorflow_root/include/tensorflow
cp -r bazel-genfiles/* $tensorflow_root/include/
cp -r tensorflow/cc $tensorflow_root/include/tensorflow
cp -r tensorflow/core $tensorflow_root/include/tensorflow
cp -r third_party $tensorflow_root/include
```

### Installing Tensorflow's dependencies

```shell
tensorflow/contrib/makefile/download_dependencies.sh
cd tensorflow/contrib/makefile/downloads/protobuf/
./autogen.sh
./configure --prefix=$tensorflow_root
\make -j $(nproc)
\make -j $(nproc) install
cd ../eigen
mkdir build_dir
cd build_dir
cmake -DCMAKE_INSTALL_PREFIX=$tensorflow_root ..
\make -j $(nproc) install
cd ../../nsync
mkdir build_dir
cd build_dir
cmake -DCMAKE_INSTALL_PREFIX=$tensorflow_root ..
\make -j $(nproc)
\make -j $(nproc) install
cd ../absl
bazel build
rsync -n -a --include '*/' --include '*.h' --exclude '*' absl $tensorflow_root/include/
# note: -n makes the rsync a dry run; drop it to actually copy the headers
# OR
find . -name '*.h' -exec cp --parents \{\} $tensorflow_root/include/ \;
# OR YARK!
cp -r absl $tensorflow_root/include/
find $tensorflow_root/include/absl/ -not -name \*.h -type f -delete
cd ../../../../../..
```

### Matching Tensorflow's protobuf version

It would have been nice if protobuf's version were the same throughout the build, but no, that would have been TOO convenient.

```shell
wget -O protobuf-3.6.0.tar.gz 'https://mirror.bazel.build/github.com/google/protobuf/archive/v3.6.0.tar.gz'
tar xf protobuf-3.6.0.tar.gz
cd protobuf-3.6.0/
./autogen.sh
./configure --prefix=$tensorflow_root
\make -j $(nproc)
\make -j $(nproc) install
```

## DeepMD

### Compiling DeepMD-kit

Note that tensorflow-1.11.0 uses protobuf-3.6.0 when building itself, but the dependency script downloads protobuf-3.5.0, which creates an error.

```shell
source ~/pkgs/xdrfile-1.1.4/activate
source ~/pkgs/cuda-9.0/activate
source ~/pkgs/tensorflow-1.11.0/activate
#source ~/pkgs/protobuf-3.6.0/activate # not needed since we are installing protobuf-3.6.0 in $tensorflow_root
cd source
mkdir build
cd build
rm -fr *; \
cmake \
  -DXDRFILE_ROOT=$XDRFILE_HOME \
  -DTENSORFLOW_ROOT=$TENSORFLOW_HOME \
  -DCMAKE_INSTALL_PREFIX=$HOME/pkgs/deepmd-kit-gpu \
  -DTF_GOOGLE_BIN=true \
  ..
\make -j $(nproc)
\make install
```

### Training with deepMD-kit

#### Successful setup

```shell
source ~/pkgs/deepmd-kit-gpu/activate
source ~/pkgs/cuda-9.0/activate
cd examples/train/
dp_train water.json
```

#### Failure

```
$HOME/pkgs/deepmd-kit-gpu/bin/dp_train
Traceback (most recent call last):
  File "$HOME/pkgs/deepmd-kit-gpu/bin/dp_train", line 16, in <module>
    from deepmd.Model import NNPModel
  File "$HOME/pkgs/deepmd-kit-gpu/bin/../lib/deepmd/Model.py", line 15, in <module>
    op_module = tf.load_op_library(module_path + "libop_abi.so")
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: $HOME/pkgs/deepmd-kit-gpu/lib/deepmd/libop_abi.so: undefined symbol: _ZN10tensorflow6StatusC1ENS_5error4CodeENS_11StringPieceE
```

## LAMMPS with DeepMD

### Compiling

```shell
source ~/pkgs/cuda-9.0/activate
source ~/pkgs/tensorflow-1.11.0/activate
cd $HOME/git/deepmd-kit
git clone https://github.com/lammps/lammps.git lammps.git
cd lammps.git/src
rsync -Parz $HOME/git/deepmd-kit/source/build.gpu.2/USER-DEEPMD .
\make yes-user-deepmd
\make serial -j $(nproc)
# There is no make install?!?
cp lmp_serial ~/pkgs/deepmd-kit-gpu/bin/
```

### Testing

```shell
#source ~/pkgs/tensorflow-1.11.0/activate # dependency on tensorflow is hardcoded in lmp_serial's rpath
source ~/pkgs/cuda-9.0/activate
source ~/pkgs/deepmd-kit-gpu/activate
cd $HOME/git/deepmd-kit/examples/lmp
lmp_serial < lammps.in
```

## Tensorflow's ./configuration

```
cat .tf_configure.bazelrc
build --action_env PYTHON_BIN_PATH="/usr/bin/python3"
build --action_env PYTHON_LIB_PATH="/usr/local/lib/python3.5/dist-packages"
build --python_path="/usr/bin/python3"
build:gcp --define with_gcp_support=true
build:hdfs --define with_hdfs_support=true
build:aws --define with_aws_support=true
build:kafka --define with_kafka_support=true
build:xla --define with_xla_support=true
build:gdr --define with_gdr_support=true
build:verbs --define with_verbs_support=true
build:ngraph --define with_ngraph_support=true
build --action_env TF_NEED_OPENCL_SYCL="0"
build --action_env TF_NEED_CUDA="1"
build --action_env CUDA_TOOLKIT_PATH="/usr/local/cuda-9.0"
build --action_env TF_CUDA_VERSION="9.0"
build --action_env CUDNN_INSTALL_PATH="/usr/local/cuda-9.0"
build --action_env TF_CUDNN_VERSION="7"
build --action_env NCCL_INSTALL_PATH="/usr/local/cuda-9.0"
build --action_env TF_NCCL_VERSION="2"
build --action_env TF_CUDA_COMPUTE_CAPABILITIES="3.5,3.5"
build --action_env LD_LIBRARY_PATH="/usr/local/cuda-9.0/lib:/usr/local/cuda-9.0/lib64"
build --action_env TF_CUDA_CLANG="0"
build --action_env GCC_HOST_COMPILER_PATH="/usr/bin/gcc"
build --config=cuda
test --config=cuda
build --define grpc_no_ares=true
build:opt --copt=-march=native
build:opt --host_copt=-march=native
build:opt --define with_default_optimizations=true
```

## Versions

### OS

```
cat /etc/issue
Ubuntu 16.04.5 LTS \n \l
```

### Compiler

```
g++ --version
g++ (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```

### Bazel

It is important to use a bazel version that is known to compile the version of tensorflow you want.

```
bazel version
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:$HOME/.cache/bazel/_bazel_larkins/install/792a28b07894763eaa2bd870f8776b23/_embedded_binaries/A-server.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
WARNING: --batch mode is deprecated. Please instead explicitly shut down your Bazel server using the command "bazel shutdown".
Build label: 0.17.2
Build target: bazel-out/k8-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Fri Sep 21 10:31:42 2018 (1537525902)
Build timestamp: 1537525902
Build timestamp as int: 1537525902
```
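One extra guard for the protobuf pitfall mentioned in those notes: verify that the protoc you end up with matches the version tensorflow was built against (3.6.0 for the 1.11.0 build above). This is a sketch; the helper just compares version strings, relying on `protoc --version` printing `libprotoc X.Y.Z`.

```shell
# check_protoc: succeed only when a `protoc --version` output matches the
# protobuf version TensorFlow itself was built against
check_protoc() {
  have=${1#libprotoc }   # strip the leading "libprotoc " prefix
  want=$2
  [ "$have" = "$want" ]
}

# example usage (assumes $tensorflow_root/bin/protoc exists):
# check_protoc "$($tensorflow_root/bin/protoc --version)" 3.6.0 || echo "protobuf mismatch"
```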
-
I have encountered the same problem. I installed the C++ interface of tensorflow 2.1.2, and when I tried `cmake -DTENSORFLOW=$tensorflow_root -DXDRFILE=$xdrfile -DPREFIX=$deepmd_kit_root ..`, the program cannot find tensorflow's session.h, which is really there. Have you solved the problem? Please help me if possible. Thanks a lot.
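When cmake reports a header as missing even though it exists, it can help to see where the header actually sits relative to the prefix being passed in. A sketch: `find_session_h` is a made-up helper, and the argument is whatever install prefix the build was given.

```shell
# find_session_h PREFIX: list any session.h found under PREFIX/include,
# so the actual location can be compared against what cmake expects
find_session_h() {
  find "$1/include" -name session.h 2>/dev/null
}

# example usage (hypothetical prefix):
# find_session_h "$tensorflow_root"
```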