
Commit 6519f6c

committed
merge
1 parent 788c600 commit 6519f6c

2 files changed, 63 additions and 20 deletions


doc/design/cpp_data_feeding.md

Lines changed: 49 additions & 17 deletions
@@ -1,6 +1,6 @@
 # C++ Data Feeding

-While using Paddle V2 API for Training, data feeding completely depends on the Python code. To get rid of the Python environment and achieve the goal of "wrapping the whole training by a while loop op" in Paddle Fluid, a C++ data feeding mechanism is required.
+While using Paddle V2 API for training, data feeding completely depends on the Python code. To get rid of the Python environment and achieve the goal of "wrapping the whole training by a while loop op" in Paddle Fluid, a C++ data feeding mechanism is required.

 In this document we show the fundamental design of a C++ data feeding process, which includes data reading, shuffling and batching.

@@ -16,35 +16,67 @@ In order to handle the above mentioned problem, a new concept called 'Reader' is
 ```cpp
 class ReaderBase {
  public:
-  explicit ReaderBase(const std::vector<DDim>& shapes) : shapes_(shapes) {
-    PADDLE_ENFORCE(!shapes_.empty());
-  }
-  // Read the next batch of data. (A 'batch' can be only one instance)
-  // If the next batch doesn't exist, '*out' will be an empty std::vector.
+  // Reads the next batch of data. (A 'batch' can be only one instance)
+  // If the next batch doesn't exist, it throws an exception
   virtual void ReadNext(std::vector<LoDTensor>* out) = 0;

-  // Reinitialize the reader and read the file from the beginning.
-  virtual void ReInit() = 0;
+  // Checks whether the next instance exists.
+  virtual bool HasNext() = 0;

-  // Get a certain read in data's shape.
-  DDim shape(size_t idx) const;
-  // Get shapes of all read in data.
-  std::vector<DDim> shapes() const { return shapes_; }
-  // Set shapes of read in data.
-  void set_shapes(const std::vector<DDim>& shapes) { shapes_ = shapes; }
+  // Reinitializes the reader and reads the file from the beginning.
+  virtual void ReInit() = 0;

   virtual ~ReaderBase() {}
+};
+```
+
+### FileReader
+
+`FileReader` is derived from `ReaderBase`. It is still an abstract class and will be further derived by readers for specific file formats.
+
+```cpp
+class FileReader : public ReaderBase {
+ public:
+  explicit FileReader(const std::vector<DDim>& shapes) : shapes_(shapes) {}
+
+  void ReadNext(std::vector<LoDTensor>* out) override final {
+    ReadNextImpl(out);
+    CheckShapes(out);
+  }
+
+  virtual void ReadNextImpl(std::vector<LoDTensor>* out) = 0;

  protected:
+  // Checks whether the shapes of 'out' are consistent with shapes_.
+  void CheckShapes(const std::vector<LoDTensor>* out);
+
   std::vector<DDim> shapes_;
 };
 ```

-### `FileReader` and `DecoratedReader`
+A file reader is bound to a single file and reads one instance of data from it at a time. Each type of file reader shall implement its own `ReadNextImpl()`, `HasNext()` and `ReInit()`.
+
+### DecoratedReader
+
+A decorated reader takes another reader (either a file reader or a decorated reader) as its 'underlying reader'. It gets data from its underlying reader, processes them (shuffling, batching or something else), and then yields the processed data. The output data of a decorated reader can be a single instance or a batch. `ShuffleReader` and `BatchReader` are both decorated readers.
+
+```cpp
+class DecoratedReader : public ReaderBase {
+ public:
+  explicit DecoratedReader(ReaderBase* reader) : reader_(reader) {
+    PADDLE_ENFORCE_NOT_NULL(reader_);
+  }
+
+  void ReInit() override { reader_->ReInit(); }
+
+ protected:
+  ReaderBase* reader_;
+};
+```

-These two classes are derived from the `ReaderBase` and will further be derived by more specific readers. Thus, in our design, there are two kinds of readers: file readers and decorated readers. A file reader reads from a file of some specific format, and yield only one instance of data at a time. For example, RecordIO reader, jpg reader, .... A decorated reader takes another reader(both file reader and decorated reader are OK) as its 'underlying reader'. It gets data from its underlying reader, does some processing on them(shuffling, or batching), then yields processed data. The output data of a decorated reader can be a single instance or a batch. `ShuffleReader` and `BatchReader` are both decorated readers.
+All `FileReader`s and `DecoratedReader`s share exactly the same interface as defined in `ReaderBase`, so they can be decorated more than once: we can **shuffle** a reader's outputs and then **batch** the shuffled outputs. The consistent interface also allows related ops to use readers without knowing their concrete types.

-All the readers share exactly the same interface as defined in `ReaderBase`. So they can be decorated for more than one time: We can **shuffle** a reader's outputs and then **batch** the shuffle outputs. The interface consistency also allows related ops use readers without knowing what they are exactly.
+### ThreadedReader


 ### `ReaderHolder`
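The `FileReader` introduced in this commit is a template method: `ReadNext()` is `final` and runs the format-specific `ReadNextImpl()` before checking the result against `shapes_`. The following is a minimal, self-contained sketch of that pattern under stated assumptions, not Paddle's implementation: `DDim` and `LoDTensor` are hypothetical stand-ins that only carry a shape, `VectorFileReader` is a hypothetical reader that serves in-memory records instead of a real file format, and the exact-match rule in `CheckShapes()` and the exception types are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-ins: only the shape of a tensor is modeled.
using DDim = std::vector<std::int64_t>;
struct LoDTensor {
  DDim dims;
};

class ReaderBase {
 public:
  // Reads the next instance; throws if none is left.
  virtual void ReadNext(std::vector<LoDTensor>* out) = 0;
  // Checks whether the next instance exists.
  virtual bool HasNext() = 0;
  // Rewinds the reader to the beginning.
  virtual void ReInit() = 0;
  virtual ~ReaderBase() {}
};

class FileReader : public ReaderBase {
 public:
  explicit FileReader(const std::vector<DDim>& shapes) : shapes_(shapes) {}

  // Final: every subclass gets shape validation after its own read logic.
  void ReadNext(std::vector<LoDTensor>* out) override final {
    ReadNextImpl(out);
    CheckShapes(*out);
  }

 protected:
  // Format-specific reading, supplied by each concrete file reader.
  virtual void ReadNextImpl(std::vector<LoDTensor>* out) = 0;

  // Assumption: each slot must match its declared shape exactly.
  void CheckShapes(const std::vector<LoDTensor>& out) const {
    if (out.size() != shapes_.size()) {
      throw std::runtime_error("number of slots does not match shapes_");
    }
    for (std::size_t i = 0; i < out.size(); ++i) {
      if (out[i].dims != shapes_[i]) {
        throw std::runtime_error("slot shape does not match shapes_");
      }
    }
  }

  std::vector<DDim> shapes_;
};

// Hypothetical reader whose "file" is an in-memory list of instances.
class VectorFileReader : public FileReader {
 public:
  VectorFileReader(const std::vector<DDim>& shapes,
                   std::vector<std::vector<LoDTensor>> records)
      : FileReader(shapes), records_(std::move(records)) {}

  bool HasNext() override { return pos_ < records_.size(); }
  void ReInit() override { pos_ = 0; }

 protected:
  void ReadNextImpl(std::vector<LoDTensor>* out) override {
    if (!HasNext()) throw std::out_of_range("no more instances");
    *out = records_[pos_++];
  }

 private:
  std::vector<std::vector<LoDTensor>> records_;
  std::size_t pos_ = 0;
};
```

One choice here differs from the design doc: `ReadNextImpl()` is `protected` rather than `public`, so callers cannot skip the shape check by calling it directly. Consumers follow the new `HasNext()`/`ReadNext()` protocol, e.g. `while (reader.HasNext()) reader.ReadNext(&out);`, instead of testing for an empty output vector.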

paddle/fluid/framework/reader.h

Lines changed: 14 additions & 3 deletions
@@ -43,13 +43,24 @@ class ReaderBase {

 class FileReader : public ReaderBase {
  public:
-  explicit FileReader(const std::vector<DDim>& shapes) : ReaderBase(shapes) {}
+  explicit FileReader(const std::vector<DDim>& shapes) : shapes_(shapes) {}
+
+  void ReadNext(std::vector<LoDTensor>* out) override final {
+    ReadNextImpl(out);
+    CheckShapes(out);
+  }
+
+  virtual void ReadNextImpl(std::vector<LoDTensor>* out) = 0;
+
+ protected:
+  void CheckShapes(const std::vector<LoDTensor>* out);
+
+  std::vector<DDim> shapes_;
 };

 class DecoratedReader : public ReaderBase {
  public:
-  explicit DecoratedReader(ReaderBase* reader)
-      : ReaderBase(reader->shapes()), reader_(reader) {
+  explicit DecoratedReader(ReaderBase* reader) : reader_(reader) {
     PADDLE_ENFORCE_NOT_NULL(reader_);
   }

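The `DecoratedReader` change above drops the dependency on `reader->shapes()`, so decorators work purely through the `ReaderBase` interface. The sketch below shows how such decorators can be stacked (shuffle, then batch) and rests on several assumptions: `CountingReader`, `ShuffleReader` and `BatchReader` are hypothetical implementations written for this example, the buffered shuffle and the slot-wise concatenation used for batching are illustrative choices rather than Paddle's algorithms, `LoDTensor` is reduced to a flat `std::vector<int>`, and `PADDLE_ENFORCE_NOT_NULL` is replaced with a standard exception.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <stdexcept>
#include <vector>

struct LoDTensor { std::vector<int> data; };  // Hypothetical stand-in: flat ints.

class ReaderBase {
 public:
  virtual void ReadNext(std::vector<LoDTensor>* out) = 0;
  virtual bool HasNext() = 0;
  virtual void ReInit() = 0;
  virtual ~ReaderBase() {}
};

class DecoratedReader : public ReaderBase {
 public:
  explicit DecoratedReader(ReaderBase* reader) : reader_(reader) {
    if (reader_ == nullptr) throw std::invalid_argument("null underlying reader");
  }
  // Rewinding a decorator rewinds every reader beneath it.
  void ReInit() override { reader_->ReInit(); }

 protected:
  ReaderBase* reader_;
};

// Hypothetical source reader: yields 0, 1, ..., n-1, one instance per call.
class CountingReader : public ReaderBase {
 public:
  explicit CountingReader(int n) : n_(n) {}
  bool HasNext() override { return next_ < n_; }
  void ReadNext(std::vector<LoDTensor>* out) override {
    if (!HasNext()) throw std::out_of_range("no more instances");
    *out = {LoDTensor{{next_++}}};
  }
  void ReInit() override { next_ = 0; }
 private:
  int n_;
  int next_ = 0;
};

// Hypothetical: buffers up to buffer_size instances, shuffles, hands them out.
class ShuffleReader : public DecoratedReader {
 public:
  ShuffleReader(ReaderBase* reader, std::size_t buffer_size, unsigned seed)
      : DecoratedReader(reader), buffer_size_(buffer_size), rng_(seed) {}
  bool HasNext() override { return pos_ < buffer_.size() || reader_->HasNext(); }
  void ReadNext(std::vector<LoDTensor>* out) override {
    if (pos_ == buffer_.size()) Refill();
    if (pos_ == buffer_.size()) throw std::out_of_range("no more instances");
    *out = buffer_[pos_++];
  }
  void ReInit() override {
    DecoratedReader::ReInit();
    buffer_.clear();
    pos_ = 0;
  }
 private:
  void Refill() {
    buffer_.clear();
    pos_ = 0;
    std::vector<LoDTensor> instance;
    while (buffer_.size() < buffer_size_ && reader_->HasNext()) {
      reader_->ReadNext(&instance);
      buffer_.push_back(instance);
    }
    std::shuffle(buffer_.begin(), buffer_.end(), rng_);
  }
  std::size_t buffer_size_;
  std::mt19937 rng_;
  std::vector<std::vector<LoDTensor>> buffer_;
  std::size_t pos_ = 0;
};

// Hypothetical: concatenates up to batch_size instances, slot by slot.
class BatchReader : public DecoratedReader {
 public:
  BatchReader(ReaderBase* reader, std::size_t batch_size)
      : DecoratedReader(reader), batch_size_(batch_size) {}
  bool HasNext() override { return reader_->HasNext(); }
  void ReadNext(std::vector<LoDTensor>* out) override {
    if (!HasNext()) throw std::out_of_range("no more instances");
    out->clear();
    std::vector<LoDTensor> instance;
    for (std::size_t n = 0; n < batch_size_ && reader_->HasNext(); ++n) {
      reader_->ReadNext(&instance);
      if (out->empty()) out->resize(instance.size());
      for (std::size_t s = 0; s < instance.size(); ++s) {
        auto& dst = (*out)[s].data;
        dst.insert(dst.end(), instance[s].data.begin(), instance[s].data.end());
      }
    }
  }
 private:
  std::size_t batch_size_;
};
```

Because every reader exposes the same `HasNext()`/`ReadNext()`/`ReInit()` trio, a consumer can drive `BatchReader batched(&shuffled, 3)` with the same loop it would use for the raw `CountingReader`, and calling `ReInit()` on the outermost reader rewinds the whole chain down to the source.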