
Commit 1e29b12

follow comments
1 parent 5dd7586 commit 1e29b12

8 files changed: 55 additions, 45 deletions


python/paddle/v2/dataset/cifar.py

Lines changed: 9 additions & 6 deletions

@@ -14,14 +14,17 @@
 """
 CIFAR dataset.

-This module will download dataset from https://www.cs.toronto.edu/~kriz/cifar.html and
-parse train/test set into paddle reader creators.
+This module will download dataset from
+https://www.cs.toronto.edu/~kriz/cifar.html and parse train/test set into
+paddle reader creators.

-The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000
-images per class. There are 50000 training images and 10000 test images.
+The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes,
+with 6000 images per class. There are 50000 training images and 10000 test
+images.

-The CIFAR-100 dataset is just like the CIFAR-10, except it has 100 classes containing
-600 images each. There are 500 training images and 100 testing images per class.
+The CIFAR-100 dataset is just like the CIFAR-10, except it has 100 classes
+containing 600 images each. There are 500 training images and 100 testing
+images per class.

 """
 
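The "paddle reader creators" these docstrings keep referring to follow a simple pattern that is worth spelling out. Below is a minimal sketch with made-up data, not the actual paddle.v2 implementation: a reader creator is a zero-argument callable that returns a fresh generator over samples, so a trainer can re-iterate the dataset on every epoch.

```python
def make_reader(samples):
    """Return a reader creator: each call yields a fresh iterator over samples."""
    def reader():
        for sample in samples:
            yield sample
    return reader

# Hypothetical (feature, label) pairs; the real readers yield image/label arrays.
train_reader = make_reader([(0.1, 0), (0.2, 1), (0.3, 0)])

# Calling the creator again restarts iteration from the beginning.
first_pass = list(train_reader())
second_pass = list(train_reader())
```

Because a plain generator would be exhausted after one epoch, the extra level of indirection (creator returning a generator) is what makes multi-epoch training possible.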

python/paddle/v2/dataset/conll05.py

Lines changed: 9 additions & 7 deletions

@@ -13,10 +13,11 @@
 # limitations under the License.
 """
 Conll05 dataset.
-Paddle semantic role labeling Book and demo use this dataset as an example. Because
-Conll05 is not free in public, the default downloaded URL is test set of
-Conll05 (which is public). Users can change URL and MD5 to their Conll dataset.
-And a pre-trained word vector model based on Wikipedia corpus is used to initialize SRL model.
+Paddle semantic role labeling Book and demo use this dataset as an example.
+Because Conll05 is not free in public, the default downloaded URL is test set
+of Conll05 (which is public). Users can change URL and MD5 to their Conll
+dataset. And a pre-trained word vector model based on Wikipedia corpus is used
+to initialize SRL model.
 """

 import tarfile

@@ -198,9 +199,10 @@ def test():
 """
 Conll05 test set creator.

-Because the train dataset is not free, the test dataset is used for training.
-It returns a reader creator, each sample in the reader is nine features, including sentence
-sequence, predicate, predicate context, predicate context flag and tagged sequence.
+Because the train dataset is not free, the test dataset is used for
+training. It returns a reader creator, each sample in the reader is nine
+features, including sentence sequence, predicate, predicate context,
+predicate context flag and tagged sequence.

 :return: Train reader creator
 :rtype: callable
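The docstring says users can point the module at their own Conll dataset by changing the URL and MD5. The checksum side of that works roughly as below; this is a sketch of the idea with made-up bytes, not the module's actual download code:

```python
import hashlib

def md5_of_bytes(data):
    """Return the hex MD5 digest used to validate a downloaded archive."""
    return hashlib.md5(data).hexdigest()

# A cached archive would be re-downloaded whenever its digest does not
# match the expected MD5 string supplied alongside the URL.
digest = md5_of_bytes(b"hello")
```

In practice the digest is computed over the downloaded file's bytes (read in chunks for large archives) and compared against the configured MD5 constant.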

python/paddle/v2/dataset/imdb.py

Lines changed: 10 additions & 10 deletions

@@ -14,11 +14,10 @@
 """
 IMDB dataset.

-This module download IMDB dataset from
-http://ai.stanford.edu/%7Eamaas/data/sentiment/, which contains a set of 25,000
-highly polar movie reviews for training, and 25,000 for testing. Besides, this
-module also provides API for build dictionary and parse train set and test set
-into paddle reader creators.
+This module downloads IMDB dataset from
+http://ai.stanford.edu/%7Eamaas/data/sentiment/. This dataset contains a set
+of 25,000 highly polar movie reviews for training, and 25,000 for testing.
+Besides, this module also provides API for building dictionary.
 """

 import paddle.v2.dataset.common

@@ -37,7 +36,7 @@

 def tokenize(pattern):
 """
-Read files that match pattern. Tokenize and yield each file.
+Read files that match the given pattern. Tokenize and yield each file.
 """

 with tarfile.open(paddle.v2.dataset.common.download(URL, 'imdb',

@@ -57,7 +56,8 @@ def tokenize(pattern):

 def build_dict(pattern, cutoff):
 """
-Build a word dictionary, the key is word, and the value is index.
+Build a word dictionary from the corpus. Keys of the dictionary are words,
+and values are zero-based IDs of these words.
 """
 word_freq = collections.defaultdict(int)
 for doc in tokenize(pattern):

@@ -123,7 +123,7 @@ def train(word_idx):
 """
 IMDB train set creator.

-It returns a reader creator, each sample in the reader is an index
+It returns a reader creator, each sample in the reader is an zero-based ID
 sequence and label in [0, 1].

 :param word_idx: word dictionary

@@ -140,7 +140,7 @@ def test(word_idx):
 """
 IMDB test set creator.

-It returns a reader creator, each sample in the reader is an index
+It returns a reader creator, each sample in the reader is an zero-based ID
 sequence and label in [0, 1].

 :param word_idx: word dictionary

@@ -155,7 +155,7 @@ def test(word_idx):

 def word_dict():
 """
-Build word dictionary.
+Build a word dictionary from the corpus.

 :return: Word dictionary
 :rtype: dict
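The build_dict behaviour described in these docstrings (count word frequencies, drop words at or below a cutoff, assign zero-based IDs) can be sketched as follows. The helper name and the tie-breaking rule are assumptions for illustration; the real function's ordering may differ:

```python
import collections

def build_dict_sketch(docs, cutoff):
    """Map each word whose frequency exceeds the cutoff to a zero-based ID."""
    word_freq = collections.defaultdict(int)
    for doc in docs:
        for word in doc:
            word_freq[word] += 1
    # Keep frequent words, most frequent first; ties broken alphabetically.
    kept = [w for w, f in word_freq.items() if f > cutoff]
    kept.sort(key=lambda w: (-word_freq[w], w))
    return {w: i for i, w in enumerate(kept)}

# Two tiny "documents" standing in for tokenized reviews.
word_idx = build_dict_sketch([["good", "movie"], ["good", "film"]], cutoff=0)
```

The resulting dictionary is exactly what the train/test readers need to turn each review into the "zero-based ID sequence" the creator docstrings mention.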

python/paddle/v2/dataset/imikolov.py

Lines changed: 7 additions & 5 deletions

@@ -14,8 +14,9 @@
 """
 imikolov's simple dataset.

-This module will download dataset from http://www.fit.vutbr.cz/~imikolov/rnnlm/ and
-parse train/test set into paddle reader creators.
+This module will download dataset from
+http://www.fit.vutbr.cz/~imikolov/rnnlm/ and parse train/test set into paddle
+reader creators.
 """
 import paddle.v2.dataset.common
 import collections

@@ -42,7 +43,8 @@ def word_count(f, word_freq=None):

 def build_dict():
 """
-Build a word dictionary, the key is word, and the value is index.
+Build a word dictionary from the corpus, Keys of the dictionary are words,
+and values are zero-based IDs of these words.
 """
 train_filename = './simple-examples/data/ptb.train.txt'
 test_filename = './simple-examples/data/ptb.valid.txt'

@@ -91,7 +93,7 @@ def train(word_idx, n):
 """
 imikolov train set creator.

-It returns a reader creator, each sample in the reader is an index
+It returns a reader creator, each sample in the reader is a word ID
 tuple.

 :param word_idx: word dictionary

@@ -108,7 +110,7 @@ def test(word_idx, n):
 """
 imikolov test set creator.

-It returns a reader creator, each sample in the reader is an index
+It returns a reader creator, each sample in the reader is a word ID
 tuple.

 :param word_idx: word dictionary
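The "word ID tuple" samples yielded by the imikolov readers are n-grams: each sample is n consecutive word IDs from one sentence, with the parameter n passed to train/test. A sketch of the windowing (hypothetical helper, not the module's code):

```python
def ngrams(word_ids, n):
    """Yield every consecutive n-tuple of word IDs from one sentence."""
    for i in range(len(word_ids) - n + 1):
        yield tuple(word_ids[i:i + n])

# Made-up word IDs for a four-word sentence; n=3 gives trigram samples.
samples = list(ngrams([4, 2, 7, 9], n=3))
```

For language modeling, the first n-1 IDs of each tuple serve as context and the last ID is the word to predict.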

python/paddle/v2/dataset/movielens.py

Lines changed: 7 additions & 6 deletions

@@ -14,10 +14,11 @@
 """
 Movielens 1-M dataset.

-Movielens 1-M dataset contains 1 million ratings from 6000 users on 4000 movies, which was
-collected by GroupLens Research. This module will download Movielens 1-M dataset from
-http://files.grouplens.org/datasets/movielens/ml-1m.zip and parse train/test set
-into paddle reader creators.
+Movielens 1-M dataset contains 1 million ratings from 6000 users on 4000
+movies, which was collected by GroupLens Research. This module will download
+Movielens 1-M dataset from
+http://files.grouplens.org/datasets/movielens/ml-1m.zip and parse train/test
+set into paddle reader creators.

 """

@@ -50,7 +51,7 @@ def __init__(self, index, categories, title):

 def value(self):
 """
-Get information of a movie.
+Get information from a movie.
 """
 return [
 self.index, [CATEGORIES_DICT[c] for c in self.categories],

@@ -78,7 +79,7 @@ def __init__(self, index, gender, age, job_id):

 def value(self):
 """
-Get information of a user.
+Get information from a user.
 """
 return [self.index, 0 if self.is_male else 1, self.age, self.job_id]

python/paddle/v2/dataset/uci_housing.py

Lines changed: 4 additions & 4 deletions

@@ -75,8 +75,8 @@ def train():
 """
 UCI_HOUSING train set creator.

-It returns a reader creator, each sample in the reader is features after normalization
-and price number.
+It returns a reader creator, each sample in the reader is features after
+normalization and price number.

 :return: Train reader creator
 :rtype: callable

@@ -95,8 +95,8 @@ def test():
 """
 UCI_HOUSING test set creator.

-It returns a reader creator, each sample in the reader is features after normalization
-and price number.
+It returns a reader creator, each sample in the reader is features after
+normalization and price number.

 :return: Test reader creator
 :rtype: callable
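The "features after normalization" in these docstrings refers to per-column feature scaling. A minimal sketch of one common scheme, centering on the mean and dividing by the value range; whether the actual module uses exactly these statistics is an assumption here, and it precomputes them over the full dataset rather than per batch:

```python
def normalize(column):
    """Center a feature column on its mean and scale by its value range."""
    mean = sum(column) / len(column)
    spread = max(column) - min(column)
    # Assumes the column is not constant (spread > 0).
    return [(x - mean) / spread for x in column]

# A made-up feature column; the real dataset has 13 such columns per house.
scaled = normalize([0.0, 5.0, 10.0])
```

Scaling keeps all features in a comparable numeric range so that no single raw feature dominates the regression.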

python/paddle/v2/dataset/wmt14.py

Lines changed: 8 additions & 6 deletions

@@ -13,8 +13,8 @@
 # limitations under the License.
 """
 WMT14 dataset.
-The original WMT14 dataset is too large and a small set of data for set is provided.
-This module will download dataset from
+The original WMT14 dataset is too large and a small set of data for set is
+provided. This module will download dataset from
 http://paddlepaddle.cdn.bcebos.com/demo/wmt_shrinked_data/wmt14.tgz and
 parse train/test set into paddle reader creators.

@@ -107,8 +107,9 @@ def train(dict_size):
 """
 WMT14 train set creator.

-It returns a reader creator, each sample in the reader is source language word index
-sequence, target language word index sequence and next word index sequence.
+It returns a reader creator, each sample in the reader is source language
+word ID sequence, target language word ID sequence and next word ID
+sequence.

 :return: Train reader creator
 :rtype: callable

@@ -121,8 +122,9 @@ def test(dict_size):
 """
 WMT14 test set creator.

-It returns a reader creator, each sample in the reader is source language word index
-sequence, target language word index sequence and next word index sequence.
+It returns a reader creator, each sample in the reader is source language
+word ID sequence, target language word ID sequence and next word ID
+sequence.

 :return: Train reader creator
 :rtype: callable
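The three sequences per WMT14 sample (source IDs, target IDs, next-word IDs) are related by a one-step shift: at each decoder step, the "next word" is the target word one position later. A sketch with made-up IDs and a hypothetical end-of-sequence ID, not the module's actual preprocessing:

```python
def make_sample(src_ids, trg_ids, eos_id):
    """Pair target IDs with the next-word IDs the decoder must predict."""
    # The next-word sequence is the target shifted left by one position,
    # padded with the end-of-sequence ID so both sequences stay equal length.
    return src_ids, trg_ids, trg_ids[1:] + [eos_id]

src, trg, nxt = make_sample([5, 8, 3], [7, 4, 9], eos_id=1)
```

This shifted pairing is what lets a sequence-to-sequence model be trained with teacher forcing: the decoder sees trg as input and is scored against nxt.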

python/paddle/v2/trainer.py

Lines changed: 1 addition & 1 deletion

@@ -1,5 +1,5 @@
 """
-Trainer package
+Module Trainer
 """
 import collections
