
Commit fd2eb55

dzhwinter authored and JiayiFeng committed

"Serialize LoDTensor, Save/Restore model" (#4602)

* "add model format design doc"
* "add restore function"
* "add parse protobuf"
* "move necessary information to saver.proto"
* "format code"
* "add gpu option"
* "add lod info"
* "add saveop python test wrapper"
* "checkpoint reuse save operator"
* "rewrite model format design doc"
* "async support needed"
* "fix run once"
* "fix doc based on comments"
* "refine based on comments"
* "fix based comments"
* "remove persistable flag from framework.proto"
* "add IndicateDataType to restore op"
* "add save test"
* "modify save restore code"
* "modified the restore logic"
* rm checkpoint_op.cc
* rm test_checkpoint_op.py
* "get inputs outputs name from execution context"
* Saving each variable to an independent file
* Fix bugs
* Rewrite save_restore_op_test with new Python framework
* Move `SaveOp` and `RestoreOp` from OpWithKernel to OpBase
* Refine unit test of SaveOp and RestoreOp
* fix compile error

1 parent d78d119 commit fd2eb55

File tree

15 files changed: +569 −6 lines changed


doc/design/model_format.md

Lines changed: 36 additions & 0 deletions
New file:

# Design Doc: Model Format

## Motivation

A model is the output of the training process. One complete model consists of two parts: the **topology** and the **parameters**. To support industrial deployment, the model format must be self-contained and must not expose any training source code.

In PaddlePaddle, the **topology** is represented as a [ProgramDesc](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/doc/design/program.md), which describes the model structure. The **parameters** contain all the trainable weights in the model, so we must support large parameters and efficient serialization/deserialization.

## Implementation

The topology is saved as plain text, specifically as a self-contained protobuf file.

The parameters are saved as a binary file. Protobuf messages have a [64M size limit](https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.io.coded_stream#CodedInputStream.SetTotalBytesLimit.details). We ran a [benchmark experiment](https://github.com/PaddlePaddle/Paddle/pull/4610), and its result shows that protobuf does not fit this scenario.

As a result, we designed a particular format for tensor serialization. By default, any tensor in Paddle is a [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/lod_tensor.md) and has a description proto, [LoDTensorDesc](https://github.com/PaddlePaddle/Paddle/blob/develop/paddle/framework/framework.proto#L99). We save the DescProto as the byte-string header; it contains the necessary information, such as the `dims` and `name` of the tensor, and the `LoD` information of the [LoDTensor](https://github.com/PaddlePaddle/Paddle/blob/1c0a4c901c9fc881d120249c703b15d1c50dae7d/paddle/framework/lod_tensor.md). A tensor stores its values in a contiguous memory buffer; for speed, we dump the raw memory to disk and save it as the byte-string content. So the binary format of one tensor is:

|HeaderLength|ContentLength|**LoDTensorDesc**|**TensorValue**|

In detail, the tensor's byte view is shown in the table below. Note that all signed values are written in little-endian.

```text
[offset] [type]            [description]
0004     4 bytes integer   HeaderLength, the length of LoDTensorDesc
0008     4 bytes integer   ContentLength, the length of the LoDTensor buffer
0009     1 byte  char      TensorDesc
0010     1 byte  char      TensorDesc
...
0100     1 byte  char      TensorValue
0101     1 byte  char      TensorValue
0102     1 byte  char      TensorValue ..
...
```

## Summary

We introduced the model format: a `ProgramDesc` describes the **topology**, and a collection of binary tensors in the particular format above describes the **parameters**.
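The `|HeaderLength|ContentLength|LoDTensorDesc|TensorValue|` layout can be sketched in a few lines of Python. This is an illustrative model of the layout only, not Paddle's actual writer; `pack_tensor` and `unpack_tensor` are hypothetical names, and the desc/value payloads are stand-in bytes.

```python
import struct

def pack_tensor(desc_bytes, value_bytes):
    # |HeaderLength|ContentLength|LoDTensorDesc|TensorValue|
    # "<ii": two 4-byte little-endian signed integers.
    header = struct.pack("<ii", len(desc_bytes), len(value_bytes))
    return header + desc_bytes + value_bytes

def unpack_tensor(blob):
    # Read the two length fields, then slice out the two payloads.
    header_len, content_len = struct.unpack_from("<ii", blob, 0)
    desc = blob[8:8 + header_len]
    value = blob[8 + header_len:8 + header_len + content_len]
    return desc, value

blob = pack_tensor(b"desc-proto-bytes", b"\x00\x01\x02\x03")
desc, value = unpack_tensor(blob)
```

Because both payload lengths are stored up front, a reader can skip a tensor it does not need without parsing its desc proto.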

paddle/framework/CMakeLists.txt

Lines changed: 5 additions & 3 deletions
```diff
@@ -1,4 +1,7 @@
 # ddim lib
+proto_library(framework_proto SRCS framework.proto)
+proto_library(saver_proto SRCS framework.proto saver.proto)
+
 cc_library(ddim SRCS ddim.cc DEPS eigen3)
 cc_test(ddim_test SRCS ddim_test.cc DEPS ddim)
 nv_test(dim_test SRCS dim_test.cu DEPS ddim)
@@ -7,16 +10,15 @@ cc_library(tensor SRCS tensor.cc DEPS ddim place paddle_memory device_context)
 cc_test(tensor_test SRCS tensor_test.cc DEPS tensor)
 cc_test(eigen_test SRCS eigen_test.cc DEPS tensor)

-cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor)
-cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor)
+cc_library(lod_tensor SRCS lod_tensor.cc DEPS ddim place tensor saver_proto framework_proto)
+cc_test(lod_tensor_test SRCS lod_tensor_test.cc DEPS lod_tensor paddle_memory)
 nv_test(lod_tensor_gpu_test SRCS lod_tensor_test.cu DEPS lod_tensor)

 cc_test(variable_test SRCS variable_test.cc)

 cc_library(scope SRCS scope.cc)
 cc_test(scope_test SRCS scope_test.cc DEPS scope)

-proto_library(framework_proto SRCS framework.proto)

 cc_library(attribute SRCS attribute.cc DEPS framework_proto)
 cc_test(program_desc_test SRCS program_desc_test.cc DEPS proto_desc)
```

paddle/framework/lod_tensor.cc

Lines changed: 144 additions & 0 deletions
```diff
@@ -13,6 +13,15 @@
 limitations under the License. */

 #include "paddle/framework/lod_tensor.h"
+#include "paddle/framework/saver.pb.h"
+
+#include "paddle/memory/memcpy.h"
+#include "paddle/memory/memory.h"
+
+#include <stdint.h>
+#include <string.h>
+#include <algorithm>
+#include <iterator>

 #include <glog/logging.h>

@@ -112,5 +121,140 @@ void LoDTensor::ShrinkInLevel(size_t level, size_t elem_begin,
   lod_ = new_lod;
 }

+std::string LoDTensor::SerializeToString() const {
+  LoDTensorProto desc;
+
+  // set data_type
+  if (this->type() == typeid(int8_t)) desc.set_data_type(DataType::BOOL);
+  if (this->type() == typeid(int16_t)) desc.set_data_type(DataType::INT16);
+  if (this->type() == typeid(int32_t)) desc.set_data_type(DataType::INT32);
+  if (this->type() == typeid(int64_t)) desc.set_data_type(DataType::INT64);
+  // FIXME(dzh): there is no fp16 in standard c++
+
+  if (this->type() == typeid(float))  // NOLINT
+    desc.set_data_type(DataType::FP32);
+  if (this->type() == typeid(double))  // NOLINT
+    desc.set_data_type(DataType::FP64);
+
+  for (int i = 0; i < dims().size(); ++i) {
+    desc.add_dims(dims()[i]);
+  }
+
+  // set lod information
+  desc.set_lod_level(this->NumLevels());
+  for (size_t i = 0; i < this->NumLevels(); ++i) {
+    LoDInfo* lod = desc.add_levels();
+    for (size_t j = 0; j < lod_[i].size(); ++j) {
+      lod->add_level(lod_[i][j]);
+    }
+  }
+
+  desc.set_version(0);
+
+  std::string desc_bytes = desc.SerializeAsString();
+
+  // FIXME(dzh) : implement fix chunk size buffer.
+  size_t DESC_SIZE = desc_bytes.size();
+  size_t DATA_SIZE = holder_->size() - offset_;
+
+  const size_t BUFFER_SIZE = DESC_SIZE + DATA_SIZE + 2 * sizeof(size_t);
+  char* buffer =
+      static_cast<char*>(memory::Alloc(platform::CPUPlace(), BUFFER_SIZE));
+
+  // format: desc_size data_size, desc_bytes, data_bytes.
+  platform::CPUPlace src_place;
+  platform::CPUPlace dst_place;
+
+  memory::Copy(dst_place, buffer, src_place, &BUFFER_SIZE, sizeof(size_t));
+  memory::Copy(dst_place, buffer + sizeof(size_t), src_place, &DESC_SIZE,
+               sizeof(size_t));
+  memory::Copy(dst_place, buffer + sizeof(size_t) * 2, src_place,
+               desc_bytes.c_str(), desc_bytes.size());
+
+  PADDLE_ENFORCE(this->numel() != 0, "Serialize a empty Tensor!");
+
+  platform::Place place = holder_->place();
+  int element_width = holder_->size() / this->numel();
+
+  if (platform::is_cpu_place(place)) {
+    memory::Copy(dst_place, buffer + sizeof(size_t) * 2 + desc_bytes.size(),
+                 boost::get<platform::CPUPlace>(place),
+                 static_cast<char*>(holder_->ptr()) + offset_ / element_width,
+                 DATA_SIZE);
+  }
+#ifdef PADDLE_WITH_GPU
+  if (platform::is_gpu_place(place)) {
+    memory::Copy(dst_place, buffer + sizeof(size_t) * 2 + desc_bytes.size(),
+                 boost::get<platform::GPUPlace>(place),
+                 static_cast<char*>(holder_->ptr()) + offset_ / element_width,
+                 DATA_SIZE);
+  }
+#endif
+
+  std::string ret(buffer, BUFFER_SIZE);
+  memory::Free(platform::CPUPlace(), buffer);
+  return ret;
+}
+
+void LoDTensor::DeserializeFromString(const std::string& s,
+                                      const platform::Place& dst_place) {
+  size_t DESC_SIZE, BUFFER_SIZE;
+  platform::CPUPlace src_place;
+
+  memory::Copy(src_place, &BUFFER_SIZE, src_place, s.c_str(), sizeof(size_t));
+  memory::Copy(src_place, &DESC_SIZE, src_place, s.c_str() + sizeof(size_t),
+               sizeof(size_t));
+
+  const size_t DATA_SIZE = BUFFER_SIZE - DESC_SIZE - sizeof(size_t) * 2;
+
+  // parse LoDTensorDesc
+  LoDTensorProto desc;
+  desc.ParseFromArray(s.c_str() + sizeof(size_t) * 2, DESC_SIZE);
+
+  std::vector<int64_t> dims;
+  std::copy(desc.dims().begin(), desc.dims().end(), std::back_inserter(dims));
+  this->Resize(make_ddim(dims));
+
+  // parse data type
+  void* ptr = nullptr;
+  if (desc.data_type() == DataType::BOOL)
+    ptr = this->mutable_data<bool>(dst_place);
+  if (desc.data_type() == DataType::INT16)
+    ptr = this->mutable_data<int16_t>(dst_place);
+  if (desc.data_type() == DataType::INT32)
+    ptr = this->mutable_data<int32_t>(dst_place);
+  if (desc.data_type() == DataType::INT64)
+    ptr = this->mutable_data<int64_t>(dst_place);
+  // FIXME(dzh): there is no fp16 in standard c++
+
+  if (desc.data_type() == DataType::FP32)
+    ptr = this->mutable_data<float>(dst_place);
+  if (desc.data_type() == DataType::FP64)
+    ptr = this->mutable_data<double>(dst_place);
+
+  LoD lod;
+  std::vector<size_t> levels;
+  for (int i = 0; i < desc.levels().size(); ++i) {
+    auto current_level = desc.levels()[i].level();
+    std::copy(current_level.begin(), current_level.end(),
+              std::back_inserter(levels));
+    lod.emplace_back(levels);
+    levels.clear();
+  }
+
+  this->set_lod(lod);
+
+  if (platform::is_cpu_place(dst_place)) {
+    memory::Copy(boost::get<platform::CPUPlace>(dst_place), ptr, src_place,
+                 s.c_str() + sizeof(size_t) * 2 + DESC_SIZE, DATA_SIZE);
+  }
+#ifdef PADDLE_WITH_GPU
+  if (platform::is_gpu_place(dst_place)) {
+    memory::Copy(boost::get<platform::GPUPlace>(dst_place), ptr, src_place,
+                 s.c_str() + sizeof(size_t) * 2 + DESC_SIZE, DATA_SIZE);
+  }
+#endif
+}
+
 }  // namespace framework
 }  // namespace paddle
```
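Note that the buffer written by `SerializeToString` is framed as `buffer_size, desc_size, desc_bytes, data_bytes`, with the two sizes stored as native `size_t`. A Python sketch of that framing (an illustrative model only: `frame`/`unframe` are hypothetical names, and `"<QQ"` assumes a 64-bit little-endian `size_t`):

```python
import struct

SIZE_T_PAIR = struct.Struct("<QQ")  # assumes 64-bit little-endian size_t

def frame(desc_bytes, data_bytes):
    # buffer_size counts the two size fields plus both payloads,
    # mirroring BUFFER_SIZE = DESC_SIZE + DATA_SIZE + 2 * sizeof(size_t).
    buffer_size = len(desc_bytes) + len(data_bytes) + SIZE_T_PAIR.size
    return SIZE_T_PAIR.pack(buffer_size, len(desc_bytes)) + desc_bytes + data_bytes

def unframe(blob):
    # Recover DATA_SIZE the same way DeserializeFromString does:
    # total size minus desc size minus the two size fields.
    buffer_size, desc_size = SIZE_T_PAIR.unpack_from(blob, 0)
    data_size = buffer_size - desc_size - SIZE_T_PAIR.size
    desc = blob[SIZE_T_PAIR.size:SIZE_T_PAIR.size + desc_size]
    data = blob[SIZE_T_PAIR.size + desc_size:SIZE_T_PAIR.size + desc_size + data_size]
    return desc, data
```

Because the sizes are written as raw `size_t`, the on-disk bytes are only portable between machines with the same word width and endianness; the design doc's fixed 4-byte little-endian fields avoid that caveat.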

paddle/framework/lod_tensor.h

Lines changed: 22 additions & 0 deletions
```diff
@@ -25,6 +25,7 @@
 #include "paddle/framework/ddim.h"
 #include "paddle/framework/tensor.h"
 #include "paddle/platform/enforce.h"
+#include "paddle/platform/place.h"

 namespace paddle {
 namespace framework {
@@ -132,6 +133,27 @@ class LoDTensor : public Tensor {
   */
   void ShrinkInLevel(size_t level, size_t elem_begin, size_t elem_end);

+  /**
+   * @brief Serialize tensor to char bytes.
+   * Please check model_format.md for the format detail.
+   * NOTE: GPUTensor will copy data to cpu implicitly.
+   * @return return string
+   */
+
+  // FIXME(dzh) : Currently, this interface should only be used in
+  // save/restore model and checkpoint. ParameterServer do not use shape
+  // information to do the optimization, as a result, when we serialize
+  // parameter/gradient to string, we should serialize the tensor
+  // to string in the ps trainer instead of LoDTensor.
+  std::string SerializeToString() const;
+
+  /**
+   * @brief Deserialize char bytes to tensor.
+   * @return return string
+   */
+  void DeserializeFromString(const std::string& s,
+                             const platform::Place& dst_place);
+
  private:
   LoD lod_;
 };
```

paddle/framework/lod_tensor_test.cc

Lines changed: 23 additions & 1 deletion
```diff
@@ -17,10 +17,13 @@
 #include <gtest/gtest.h>
 #include <algorithm>
 #include <memory>
+#include <vector>

 namespace paddle {
 namespace framework {

+const int kLodTensorSize = 20 * 128;
+
 class LoDTensorTester : public ::testing::Test {
  public:
   virtual void SetUp() override {
@@ -38,7 +41,10 @@ class LoDTensorTester : public ::testing::Test {

     lod_tensor_.Resize({20 /*batch size*/, 128 /*dim*/});
     // malloc memory
-    lod_tensor_.mutable_data<float>(place);
+    float* dst_ptr = lod_tensor_.mutable_data<float>(place);
+    for (int i = 0; i < kLodTensorSize; ++i) {
+      dst_ptr[i] = i;
+    }

     lod_tensor_.set_lod(lod);
   }
@@ -101,5 +107,21 @@ TEST_F(LoDTensorTester, ShrinkInLevel) {
   ASSERT_EQ(new_lod_tensor.data<float>(), lod_tensor_.data<float>());
 }

+TEST_F(LoDTensorTester, SerializeDeserialize) {
+  LoDTensor new_lod_tensor = lod_tensor_;
+  float* src_ptr = lod_tensor_.data<float>();
+  std::string s = lod_tensor_.SerializeToString();
+  LoDTensor dst;
+  dst.DeserializeFromString(s, platform::CPUPlace());
+  float* dst_ptr = dst.data<float>();
+  for (int i = 0; i < kLodTensorSize; ++i) {
+    EXPECT_EQ(dst_ptr[i], src_ptr[i]);
+  }
+
+  ASSERT_EQ(dst.NumElements(0), 2UL);
+  ASSERT_EQ(dst.NumElements(1), 3UL);
+  ASSERT_EQ(dst.NumElements(2), 8UL);
+}
+
 }  // namespace framework
 }  // namespace paddle
```

paddle/framework/lod_tensor_test.cu

Lines changed: 27 additions & 0 deletions
```diff
@@ -48,3 +48,30 @@ TEST(LoDTensor, LoDInGPU) {
     CHECK_EQ(lod[0].data()[i], src_lod[0].data()[i] * 2);
   }
 }
+
+TEST(LoDTensor, SerializeDeserialize) {
+  paddle::framework::LoDTensor lod_tensor;
+  paddle::platform::GPUPlace place(0);
+
+  paddle::framework::LoD src_lod;
+  src_lod.push_back(std::vector<size_t>{0, 2, 4, 6, 8, 10, 12, 14});
+
+  lod_tensor.Resize({14, 16});
+  lod_tensor.mutable_data<float>(place);
+
+  lod_tensor.set_lod(src_lod);
+  CHECK_EQ(lod_tensor.lod_element(0, 2).first, 4UL);
+  CHECK_EQ(lod_tensor.lod_element(0, 4).first, 8UL);
+
+  test<<<1, 8>>>(src_lod[0].data(), src_lod[0].size());
+  cudaDeviceSynchronize();
+
+  std::string s = lod_tensor.SerializeToString();
+  paddle::framework::LoDTensor dst;
+  dst.DeserializeFromString(s, place);
+  paddle::framework::LoD dst_lod = dst.lod();
+
+  for (size_t i = 0; i < dst_lod[0].size(); ++i) {
+    CHECK_EQ(src_lod[0].data()[i], dst_lod[0].data()[i] * 2);
+  }
+}
```

paddle/framework/saver.proto

Lines changed: 39 additions & 0 deletions
New file:

```protobuf
/* Copyright (c) 2016 PaddlePaddle Authors. All Rights Reserve.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. */

syntax = "proto2";
option optimize_for = LITE_RUNTIME;
package paddle.framework;

import "framework.proto";

/**
 * This file contains necessary information for model, checkpoint.
 * etc.
 */

message LoDInfo { repeated int64 level = 1; }

/**
 * Save the LoDTensorDesc information through LoDTensorProto, its data memory
 * is copyed to c buffer immediately. See model_format.md for details.
 */

message LoDTensorProto {
  optional DataType data_type = 1;
  repeated int64 dims = 2; // [UNK, 640, 480] is saved as [-1, 640, 480]
  repeated LoDInfo levels = 3;
  optional int32 lod_level = 4 [ default = 0 ];
  optional int32 version = 5;
}
```
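To make the `levels` field concrete: a two-level LoD such as `[[0, 2, 4], [0, 1, 3, 4]]` becomes one `LoDInfo` message per level inside the `LoDTensorProto`. A small Python sketch of that mapping, using plain dicts in place of generated protobuf classes (`lod_to_proto_dict` is a hypothetical helper, not part of Paddle):

```python
def lod_to_proto_dict(lod, dims, data_type="FP32"):
    # Mirror the LoDTensorProto fields: one LoDInfo entry per LoD level,
    # and lod_level recording how many levels there are.
    return {
        "data_type": data_type,
        "dims": list(dims),                                   # [-1, ...] for unknown dims
        "levels": [{"level": list(level)} for level in lod],  # repeated LoDInfo
        "lod_level": len(lod),
        "version": 0,
    }

proto = lod_to_proto_dict([[0, 2, 4], [0, 1, 3, 4]], dims=[4, 128])
```

A dense tensor with no sequence structure simply has an empty `levels` list and `lod_level = 0`, which matches the proto's default.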

paddle/framework/scope.cc

Lines changed: 17 additions & 0 deletions
```diff
@@ -65,6 +65,23 @@ void Scope::DropKids() {
   kids_.clear();
 }

+std::vector<std::string> Scope::GetAllNames(bool recursive) const {
+  std::vector<std::string> known_vars(vars_.size());
+
+  if (recursive) {
+    for (auto& kid : kids_) {
+      auto kid_vars = kid->GetAllNames();
+      for (auto& p : kid_vars) {
+        known_vars.emplace_back(p);
+      }
+    }
+  }
+  for (auto& p : vars_) {
+    known_vars.emplace_back(p.first);
+  }
+  return known_vars;
+}
+
 void Scope::DeleteScope(Scope* scope) {
   auto it = std::find(this->kids_.begin(), this->kids_.end(), scope);
   PADDLE_ENFORCE(it != this->kids_.end(), "Cannot find %p as kid scope", scope);
```
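The name collection above can be sketched with a minimal Python scope model (illustrative only; `Scope` and `get_all_names` here are stand-ins, not Paddle's Python API). One caveat about the C++ as committed: `std::vector<std::string> known_vars(vars_.size())` pre-fills the vector with empty strings before `emplace_back` appends, so the sketch below shows the intended behavior rather than a literal translation:

```python
class Scope:
    def __init__(self, parent=None):
        self.vars = {}   # name -> variable
        self.kids = []   # child scopes
        if parent is not None:
            parent.kids.append(self)

    def get_all_names(self, recursive=False):
        # Collect child names first (when recursive), then this scope's own.
        names = []
        if recursive:
            for kid in self.kids:
                names.extend(kid.get_all_names(recursive=True))
        names.extend(self.vars.keys())
        return names

root = Scope()
root.vars["w"] = object()
child = Scope(parent=root)
child.vars["b"] = object()
```

A save operator can use such a listing to enumerate every persistable variable in a scope tree and write each one out with `SerializeToString`.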
