Commit 92818ba

Merge branch 'develop' of https://github.com/PaddlePaddle/Paddle into threadpool_for_io
2 parents b851c07 + faa752a commit 92818ba

49 files changed: +4939 / -61 lines
Lines changed: 142 additions & 4 deletions
@@ -1,5 +1,143 @@
-############################
-Install, Build and Unit test
-############################
+.. _install_faq:
 
-TBD
+###############################
+Compile, Install, and Unit Test
+###############################
+
+.. contents::
+
+1. Insufficient CUDA driver version
+----------------------------------------------------------------
+
+Users often see errors like `Cuda Error: CUDA driver version is insufficient for CUDA runtime version` when running the PaddlePaddle GPU Docker image. The usual cause is that the local CUDA driver has not been mapped into the container.
+You can solve the issue by running the following commands:
+
+.. code-block:: bash
+
+    $ export CUDA_SO="$(\ls /usr/lib64/libcuda* | xargs -I{} echo '-v {}:{}') $(\ls /usr/lib64/libnvidia* | xargs -I{} echo '-v {}:{}')"
+    $ export DEVICES=$(\ls /dev/nvidia* | xargs -I{} echo '--device {}:{}')
+    $ docker run ${CUDA_SO} ${DEVICES} -it paddlepaddle/paddle:latest-gpu
+
+For more information about Docker's installation and usage, please refer to the `PaddlePaddle Docker documentation <http://www.paddlepaddle.org/docs/0.11.0/documentation/zh/getstarted/build_and_install/docker_install_en.html>`_ .
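Before running the mapping commands above, it can help to confirm that the driver libraries actually exist on the host. A minimal sketch; the paths are common defaults (`/usr/lib64` on Red Hat-style systems, `/usr/lib/x86_64-linux-gnu` on Debian/Ubuntu) and may differ on your distribution:

```shell
# Look for the host CUDA driver libraries in the usual 64-bit Linux locations.
found="missing"
for candidate in /usr/lib64/libcuda* /usr/lib/x86_64-linux-gnu/libcuda*; do
  # An unmatched glob stays a literal string, so -e is false for it.
  [ -e "$candidate" ] && found="found"
done
echo "CUDA driver libraries: ${found}"
```

If the result is "missing", the `-v` mappings above would mount nothing useful into the container.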
+
+
+2. Version mismatch between PythonLibs and PythonInterpreter
+----------------------------------------------------------------
+
+This is a common problem when CMake looks for Python. If multiple Python versions are installed, CMake may pick a PythonLibs version that does not match the PythonInterpreter version. In that case, specify the Python version explicitly, as follows:
+
+.. code-block:: bash
+
+    cmake .. -DPYTHON_EXECUTABLE=<exc_path> -DPYTHON_LIBRARY=<lib_path> -DPYTHON_INCLUDE_DIR=<inc_path>
+
+Replace ``<exc_path>``, ``<lib_path>``, and ``<inc_path>`` with your local paths.
+
+3. PaddlePaddle version is 0.0.0
+------------------------------------------------
+This issue can occur when you run `paddle version` or `cmake ..`:
+
+.. code-block:: bash
+
+    CMake Warning at cmake/version.cmake:20 (message):
+      Cannot add paddle version from git tag
+
+To fix it, pull all remote branches to your local machine with :code:`git fetch upstream` and then rerun :code:`cmake`.
+
+4. paddlepaddle\*.whl is not a supported wheel on this platform.
+------------------------------------------------------------------------
+
+The primary cause of this issue is that pip cannot find a PaddlePaddle installation package that matches your current system. The latest PaddlePaddle Python installation package supports Linux x86_64 and MacOS 10.12, with Python 2.7 and pip 9.0.1.
+
+You can upgrade pip with the following command:
+
+.. code-block:: bash
+
+    pip install --upgrade pip
+
+If that does not work, run :code:`python -c "import pip; print(pip.pep425tags.get_supported())"` to get the package suffixes your system supports, and compare them with the suffix of your installation package.
+
+If the system supports :code:`linux_x86_64` and the installation package is :code:`manylinux1_x86_64`, upgrade pip to the latest version;
+if the system supports :code:`manylinux1_x86_64` and the local installation package is :code:`linux_x86_64`, rename the whl package to :code:`manylinux1_x86_64` and try again.
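The rename step can be sketched as follows. The wheel file name here is hypothetical and only illustrates the pattern; the `touch` merely creates an empty placeholder so the sketch is self-contained — use your actual downloaded wheel instead:

```shell
# Hypothetical wheel name for illustration only; substitute your real file.
touch paddlepaddle-0.11.0-cp27-cp27mu-linux_x86_64.whl

# Rename the platform tag from linux_x86_64 to manylinux1_x86_64:
for whl in paddlepaddle-*-linux_x86_64.whl; do
  mv "$whl" "${whl%linux_x86_64.whl}manylinux1_x86_64.whl"
done
ls paddlepaddle-*.whl
```

After the rename, `pip install` the resulting `manylinux1_x86_64` wheel as usual.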
+
+
+5. ImportError: No module named v2
+----------------------------------
+Please uninstall Paddle V1 if you installed it before:
+
+.. code-block:: bash
+
+    pip uninstall py_paddle paddle
+
+Then install the PaddlePaddle Python packages: enter the build directory and run the following command.
+
+.. code-block:: bash
+
+    pip install python/dist/paddle*.whl && pip install ../paddle/dist/py_paddle*.whl
+
+6. Illegal instruction
+-----------------------
+This issue is usually caused by running a PaddlePaddle binary built with AVX SIMD instructions (used to speed up CPU computation) on a CPU that does not support AVX. Please choose the build that matches your CPU.
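On Linux, you can check whether your CPU supports AVX before choosing a build; a minimal sketch (relies on `/proc/cpuinfo`, so it reports "not supported" on systems without it):

```shell
# The CPU flags in /proc/cpuinfo show whether AVX is available.
if grep -q avx /proc/cpuinfo 2>/dev/null; then
  avx_status="supported"
else
  avx_status="not supported"
fi
echo "AVX: ${avx_status}"
```

If AVX is not supported, pick the no-avx PaddlePaddle build.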
+
+7. Python unittest fails
+--------------------------------
+
+If the following Python unittest testcases fail:
+
+.. code-block:: bash
+
+    24 - test_PyDataProvider (Failed)
+    26 - test_RecurrentGradientMachine (Failed)
+    27 - test_NetworkCompare (Failed)
+    28 - test_PyDataProvider2 (Failed)
+    32 - test_Prediction (Failed)
+    33 - test_Compare (Failed)
+    34 - test_Trainer (Failed)
+    35 - test_TrainerOnePass (Failed)
+    36 - test_CompareTwoNets (Failed)
+    37 - test_CompareTwoOpts (Failed)
+    38 - test_CompareSparse (Failed)
+    39 - test_recurrent_machine_generation (Failed)
+    40 - test_PyDataProviderWrapper (Failed)
+    41 - test_config_parser (Failed)
+    42 - test_swig_api (Failed)
+    43 - layers_test (Failed)
+
+Please check the PaddlePaddle unittest logs, which may suggest the following:
+
+.. code-block:: bash
+
+    paddle package is already in your PYTHONPATH. But unittest need a clean environment.
+    Please uninstall paddle package before start unittest. Try to 'pip uninstall paddle'.
+
+The solution is:
+
+* Remove the old PaddlePaddle installation to make a clean environment for the unit tests. If the PaddlePaddle package is already in Python's site-packages, the unit tests will import the package from site-packages instead of the one in the :code:`/python` directory of the source tree. Setting :code:`PYTHONPATH` to :code:`/python` does not help either, because Python's search path gives priority to the installed package.
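A quick way to check whether an installed package would shadow the source tree, assuming `python` is the interpreter you run the unit tests with:

```shell
# If 'import paddle' succeeds, a site-packages copy would shadow the source tree.
if command -v python >/dev/null 2>&1 && python -c "import paddle" >/dev/null 2>&1; then
  paddle_status="installed (uninstall it before running the unit tests)"
else
  paddle_status="not installed (environment is clean)"
fi
echo "paddle in site-packages: ${paddle_status}"
```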
+
+
+8. Failed to download the MKLML library
+----------------------------------------------
+
+.. code-block:: bash
+
+    make[2]: *** [third_party/mklml/src/extern_mklml-stamp/extern_mklml-download] error 4
+    make[1]: *** [CMakeFiles/extern_mklml.dir/all] error 2
+    make[1]: *** waiting for the unfinished jobs....
+
+Cause: a slow network or an SSL problem causes the MKLML library download to fail.
+
+The solution is to download and install it manually; the specific steps are as follows:
+
+.. code-block:: bash
+
+    # 1. enter the directory
+    cd build/third_party/mklml/src/extern_mklml
+
+    # 2. check the size of the package; it is normally 75M. If it is smaller, the download failed
+    du -sh mklml_lnx_2018.0.1.20171007.tgz
+
+    # 3. manually download, unzip, and create the download-success stamp:
+    wget --no-check-certificate https://github.com/01org/mkl-dnn/releases/download/v0.11/mklml_lnx_2018.0.1.20171007.tgz -c -O mklml_lnx_2018.0.1.20171007.tgz
+    tar zxf mklml_lnx_2018.0.1.20171007.tgz
+    touch ../extern_mklml-stamp/extern_mklml-download
+
+    # 4. then compile

paddle/fluid/operators/nccl_op_test.cu.cc

Lines changed: 6 additions & 0 deletions
@@ -137,6 +137,8 @@ class NCCLTester : public ::testing::Test {
 TEST_F(NCCLTester, ncclInitOp) {}
 
 // ncclAllReduceOp with desc
+// TODO(helin): https://github.com/PaddlePaddle/Paddle/issues/9367
+/*
 TEST_F(NCCLTester, ncclAllReduceOp) {
   std::unique_ptr<f::OpDesc> op2(new f::OpDesc);
   op2->SetType("ncclAllReduce");
@@ -184,6 +186,7 @@ TEST_F(NCCLTester, ncclAllReduceOp) {
     }
   }
 }
+*/
 
 // ncclReduceOp with desc
 TEST_F(NCCLTester, ncclReduceOp) {
@@ -236,6 +239,8 @@ TEST_F(NCCLTester, ncclReduceOp) {
 }
 
 // ncclBcastOp with desc
+// TODO(helin): https://github.com/PaddlePaddle/Paddle/issues/9540
+/*
 TEST_F(NCCLTester, ncclBcastOp) {
   std::unique_ptr<f::OpDesc> op2(new f::OpDesc);
   const int kRoot = 0;
@@ -281,3 +286,4 @@ TEST_F(NCCLTester, ncclBcastOp) {
     ASSERT_NEAR(ct[j], result, 1e-5);
   }
 }
+*/

paddle/fluid/operators/reader/create_double_buffer_reader_op.cc

Lines changed: 72 additions & 53 deletions
@@ -20,12 +20,29 @@ namespace paddle {
 namespace operators {
 namespace reader {
 
-static constexpr size_t kDoubleBufferSize = 2;
+// 'Double buffer' means we shall maintain two batches of input data at the same
+// time. So the kCacheSize should be at least 2.
+static constexpr size_t kCacheSize = 2;
+// There will be two batches out of the channel during training:
+// 1. the one waiting to be sent to the channel
+// 2. the one just received from the channel, which is also being used by
+//    subsequent operators.
+// So the channel size should be kCacheSize - 2
+static constexpr size_t kChannelSize = 0;  // kCacheSize - 2
 
 class DoubleBufferReader : public framework::DecoratedReader {
  public:
   struct Item {
     Item() : ctx_(nullptr) {}
+    Item(Item&& b) {
+      payloads_ = std::move(b.payloads_);
+      ctx_ = std::move(b.ctx_);
+    }
+    Item& operator=(Item&& b) {
+      payloads_ = std::move(b.payloads_);
+      ctx_ = std::move(b.ctx_);
+      return *this;
+    }
 
     std::vector<framework::LoDTensor> payloads_;
     platform::DeviceContext* ctx_;
@@ -34,42 +51,44 @@ class DoubleBufferReader : public framework::DecoratedReader {
   explicit DoubleBufferReader(
       ReaderBase* reader, platform::Place target_place = platform::CPUPlace())
       : DecoratedReader(reader), place_(target_place) {
-    for (size_t i = 0; i < kDoubleBufferSize; ++i) {
-      if (platform::is_gpu_place(place_)) {
 #ifdef PADDLE_WITH_CUDA
+    for (size_t i = 0; i < kCacheSize; ++i) {
+      if (platform::is_gpu_place(place_)) {
         ctxs_.emplace_back(new platform::CUDADeviceContext(
             boost::get<platform::CUDAPlace>(place_)));
-#endif
       }
     }
-
-    start_thread();
-  }
-
-  void start_thread() {
-    buffer_ = framework::MakeChannel<Item>(kDoubleBufferSize);
-    prefetcher_ = std::thread([this] { PrefetchThreadFunc(); });
+#endif
+    StartPrefetcher();
   }
 
+  bool HasNext() const override;
   void ReadNext(std::vector<framework::LoDTensor>* out) override;
   void ReInit() override;
 
-  ~DoubleBufferReader() {
-    buffer_->Close();
-    prefetcher_.join();
-    delete buffer_;
+  ~DoubleBufferReader() { EndPrefetcher(); }
+
+ private:
+  void StartPrefetcher() {
+    channel_ = framework::MakeChannel<Item>(kChannelSize);
+    prefetcher_ = std::thread([this] { PrefetchThreadFunc(); });
   }
 
-  bool HasNext() const override;
+  void EndPrefetcher() {
+    channel_->Close();
+    if (prefetcher_.joinable()) {
+      prefetcher_.join();
+    }
+    delete channel_;
+    channel_ = nullptr;
+  }
 
- private:
   void PrefetchThreadFunc();
 
   std::thread prefetcher_;
-  framework::Channel<Item>* buffer_;
+  framework::Channel<Item>* channel_;
   platform::Place place_;
   std::vector<std::unique_ptr<platform::DeviceContext>> ctxs_;
-  mutable Item local_buffer_;
 };
 
 class CreateDoubleBufferReaderOp : public framework::OperatorBase {
@@ -123,70 +142,70 @@ class CreateDoubleBufferReaderOpMaker : public DecoratedReaderMakerBase {
   }
 };
 
+bool DoubleBufferReader::HasNext() const {
+  while (!channel_->IsClosed() && !channel_->CanReceive()) {
+  }
+  return channel_->CanReceive();
+}
+
 void DoubleBufferReader::ReadNext(std::vector<framework::LoDTensor>* out) {
   if (!HasNext()) {
     PADDLE_THROW("There is no next data!");
   }
 
-  if (local_buffer_.payloads_.empty()) {
-    buffer_->Receive(&local_buffer_);
-  }
-  *out = local_buffer_.payloads_;
-  local_buffer_.payloads_.clear();
-  if (local_buffer_.ctx_) {
-    local_buffer_.ctx_->Wait();
+  Item batch;
+  channel_->Receive(&batch);
+  *out = batch.payloads_;
+  if (batch.ctx_) {
+    batch.ctx_->Wait();
   }
 }
 
 void DoubleBufferReader::ReInit() {
   reader_->ReInit();
-  buffer_->Close();
-  prefetcher_.join();
-  delete buffer_;
-  start_thread();
+  EndPrefetcher();
+  StartPrefetcher();
 }
 
 void DoubleBufferReader::PrefetchThreadFunc() {
   VLOG(5) << "A new prefetch thread starts.";
-  size_t gpu_ctx_offset = 0;
+  std::vector<std::vector<framework::LoDTensor>> cpu_tensor_cache(kCacheSize);
+  std::vector<std::vector<framework::LoDTensor>> gpu_tensor_cache(kCacheSize);
+  size_t cached_tensor_id = 0;
+
   while (reader_->HasNext()) {
     Item batch;
-    reader_->ReadNext(&batch.payloads_);
+    auto& cpu_batch = cpu_tensor_cache[cached_tensor_id];
+    reader_->ReadNext(&cpu_batch);
     if (platform::is_gpu_place(place_)) {
-      std::vector<framework::LoDTensor> gpu_batch;
-      auto& gpu_ctx = this->ctxs_[gpu_ctx_offset++];
-      gpu_ctx_offset %= this->ctxs_.size();
-      gpu_batch.resize(batch.payloads_.size());
-      for (size_t i = 0; i < batch.payloads_.size(); ++i) {
-        framework::TensorCopy(batch.payloads_[i], place_, *gpu_ctx,
-                              &gpu_batch[i]);
-        gpu_batch[i].set_lod(batch.payloads_[i].lod());
+      auto& gpu_batch = gpu_tensor_cache[cached_tensor_id];
+      auto* gpu_ctx = ctxs_[cached_tensor_id].get();
+      gpu_batch.resize(cpu_batch.size());
+      for (size_t i = 0; i < cpu_batch.size(); ++i) {
+        framework::TensorCopy(cpu_batch[i], place_, *gpu_ctx, &gpu_batch[i]);
+        gpu_batch[i].set_lod(cpu_batch[i].lod());
      }
-      batch.ctx_ = gpu_ctx.get();
-      std::swap(gpu_batch, batch.payloads_);
+      batch.payloads_ = gpu_batch;
+      batch.ctx_ = gpu_ctx;
+    } else {
+      // CPUPlace
+      batch.payloads_ = cpu_batch;
    }
+    ++cached_tensor_id;
+    cached_tensor_id %= kCacheSize;
 
    try {
-      buffer_->Send(&batch);
+      channel_->Send(&batch);
    } catch (paddle::platform::EnforceNotMet e) {
      VLOG(5) << "WARNING: The double buffer channel has been closed. The "
                 "prefetch thread will terminate.";
      break;
    }
  }
-  buffer_->Close();
+  channel_->Close();
  VLOG(5) << "Prefetch thread terminates.";
}
 
-bool DoubleBufferReader::HasNext() const {
-  if (local_buffer_.payloads_.empty()) {
-    bool ok = buffer_->Receive(&local_buffer_);
-    return ok;
-  } else {
-    return true;
-  }
-}
-
 }  // namespace reader
 }  // namespace operators
 }  // namespace paddle

python/CMakeLists.txt

Lines changed: 1 addition & 0 deletions
@@ -81,6 +81,7 @@ if (WITH_TESTING)
     # enable v2 API unittest only when paddle swig api is compiled
     add_subdirectory(paddle/v2/tests)
     add_subdirectory(paddle/v2/plot/tests)
+    add_subdirectory(paddle/v2/reader/tests)
   endif()
 endif()
 add_subdirectory(paddle/fluid/tests)

python/paddle/dataset/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -37,7 +37,7 @@
     'cifar',
     'movielens',
     'conll05',
-    'sentiment'
+    'sentiment',
     'uci_housing',
     'wmt14',
     'wmt16',
