Commit cf0a057

add document for ctr reader
test=develop
1 parent 45578c1 commit cf0a057

2 files changed: 29 additions, 5 deletions

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+## CTR READER
+
+A multi-threaded C++ reader that has the same interface as py_reader. It
+uses multiple C++ threads to read files and is much faster than the Python read
+thread in py_reader.
+
+Currently, it supports two types of file:
+ - gzip
+ - plain text file
+
+and two types of data format:
+ - the csv data format is:
+   * label dense_fea,dense_fea sparse_fea,sparse_fea
+ - the svm data format is:
+   * label slot1:fea_sign slot2:fea_sign slot1:fea_sign
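The two line formats described in the README above can be sketched in plain Python. This is an illustration only, not the C++ reader's actual parsing code. The function names `parse_csv_line` and `parse_svm_line` are hypothetical, and the value types (integer label, float dense features, integer sparse features and feature signs, string slot ids) are assumptions that the README does not state.

```python
from typing import Dict, List, Tuple


def parse_csv_line(line: str) -> Tuple[int, List[float], List[int]]:
    """Parse 'label dense_fea,dense_fea sparse_fea,sparse_fea'.

    Assumption: the three groups are separated by whitespace and the
    values inside a group are separated by commas.
    """
    label, dense, sparse = line.split()
    dense_feas = [float(v) for v in dense.split(",")]
    sparse_feas = [int(v) for v in sparse.split(",")]
    return int(label), dense_feas, sparse_feas


def parse_svm_line(line: str) -> Tuple[int, Dict[str, List[int]]]:
    """Parse 'label slot1:fea_sign slot2:fea_sign slot1:fea_sign'.

    A slot may appear more than once on a line, so feature signs are
    collected into one list per slot, in order of appearance.
    """
    fields = line.split()
    slots: Dict[str, List[int]] = {}
    for field in fields[1:]:
        slot, fea_sign = field.split(":", 1)
        slots.setdefault(slot, []).append(int(fea_sign))
    return int(fields[0]), slots
```

Grouping by slot in the svm parser reflects the README's own example, where `slot1` occurs twice on one line.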

python/paddle/fluid/contrib/reader/ctr_reader.py

Lines changed: 14 additions & 5 deletions
@@ -54,8 +54,8 @@ def ctr_reader(
         feed_dict,
         file_type,  # gzip or plain
         file_format,  # csv or svm
-        dense_slot_indexs,
-        sparse_slot_indexs,
+        dense_slot_index,
+        sparse_slot_index,
         capacity,
         thread_num,
         batch_size,
@@ -78,11 +78,20 @@ def ctr_reader(
     Note that :code:`Program.clone()` method cannot clone :code:`py_reader`.
 
     Args:
+        feed_dict(list(variable)): a list of data variables.
+        file_type('gzip'|'plain'): the type of the data file.
+        file_format('csv'|'svm'): csv or svm data format.
+            the csv data format is:
+                label dense_fea,dense_fea sparse_fea,sparse_fea
+            the svm data format is:
+                label slot1:fea_sign slot2:fea_sign slot1:fea_sign
+        dense_slot_index(list(int)): the indices of the dense slots.
+        sparse_slot_index(list(int)): the indices of the sparse slots.
         capacity(int): The buffer capacity maintained by :code:`py_reader`.
         thread_num(int): The number of C++ threads used to read files.
         batch_size(int): The number of samples in each batch.
         file_list(list|tuple): List of paths of the data files to read.
-        slots(bool): Whether use double buffer or not.
+        slots(list(int)): the slot ids of all sparse features.
         name(basestring): The prefix of the Python queue name and Reader name.
         If None, a name is generated automatically.
 
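To make the roles of `file_type`, `thread_num`, `capacity`, and `batch_size` concrete, here is a minimal pure-Python sketch of a multi-threaded file reader feeding a bounded buffer. It is not the C++ implementation behind `ctr_reader`. The helpers `open_data_file` and `read_batches` are hypothetical, and the sketch assumes that `thread_num` is a worker count, `capacity` bounds the number of buffered batches, and `batch_size` is the number of lines per batch.

```python
import gzip
import queue
import threading
from typing import Iterator, List


def open_data_file(path: str, file_type: str):
    """Open a 'gzip' or 'plain' data file in text mode."""
    if file_type == "gzip":
        return gzip.open(path, "rt")
    if file_type == "plain":
        return open(path, "r")
    raise ValueError("file_type must be 'gzip' or 'plain'")


def read_batches(file_list, file_type, thread_num, capacity,
                 batch_size) -> Iterator[List[str]]:
    """Yield batches of raw lines read by thread_num worker threads."""
    files = queue.Queue()
    for path in file_list:
        files.put(path)
    # Bounded buffer: workers block once `capacity` batches are waiting.
    batches = queue.Queue(maxsize=capacity)
    done = object()  # sentinel each worker sends when it has finished

    def worker():
        batch = []
        while True:
            try:
                path = files.get_nowait()
            except queue.Empty:
                break
            with open_data_file(path, file_type) as f:
                for line in f:
                    line = line.rstrip("\n")
                    if not line:
                        continue
                    batch.append(line)
                    if len(batch) == batch_size:
                        batches.put(batch)
                        batch = []
        if batch:  # flush this worker's final partial batch
            batches.put(batch)
        batches.put(done)

    for _ in range(thread_num):
        threading.Thread(target=worker, daemon=True).start()

    finished = 0
    while finished < thread_num:
        item = batches.get()
        if item is done:
            finished += 1
        else:
            yield item
```

Because each worker pulls whole files from a shared queue, the order of batches across files is not deterministic, and each worker may emit one final batch shorter than `batch_size`.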
@@ -116,8 +125,8 @@ def ctr_reader(
             'file_list': file_list,
             'file_type': file_type,
             'file_format': file_format,
-            'dense_slot_index': dense_slot_indexs,
-            'sparse_slot_index': sparse_slot_indexs,
+            'dense_slot_index': dense_slot_index,
+            'sparse_slot_index': sparse_slot_index,
             'sparse_slots': slots,
             'ranks': [],
             'lod_levels': [],