Commit cd0973e

[CodeStyle][Typos][V-[1-7]] Fix typo(startswith V) (#7561)
* [CodeStyle][Typos][V-[1-7]] Fix typo(startswith V)
* fix Variable-length
1 parent d366198 commit cd0973e

9 files changed: +12 −21 lines changed

_typos.toml

Lines changed: 0 additions & 9 deletions
@@ -68,9 +68,6 @@ Transfomed = "Transfomed"
 Tthe = "Tthe"
 Ture = "Ture"
 Useage = "Useage"
-Varialble = "Varialble"
-Varible = "Varible"
-Varient = "Varient"
 Wether = "Wether"
 accordding = "accordding"
 accoustic = "accoustic"
@@ -242,12 +239,6 @@ unqiue = "unqiue"
 unsupport = "unsupport"
 updte = "updte"
 utill = "utill"
-varialbes = "varialbes"
-varibale = "varibale"
-varibales = "varibales"
-varience = "varience"
-varient = "varient"
-visting = "visting"
 warpped = "warpped"
 wether = "wether"
 wiht = "wiht"

docs/design/concepts/tensor.md

Lines changed: 1 addition & 1 deletion
@@ -161,7 +161,7 @@ Please reference the section of `Learn from Majel` for more details.
 
 `ArrayView` is an encapsulation of `Array`, which introduces extra iterator methods, such as `begin()` and `end()`. The `begin()` method returns an iterator pointing to the first element in the ArrayView. And the `end()` method returns an iterator pointing to the pass-the-end element in the ArrayView.
 
-`ArrayView` make the visting and manipulating an array more efficiently, flexibly and safely.
+`ArrayView` make the visiting and manipulating an array more efficiently, flexibly and safely.
 
 
 A global function `make_view` is provided to transform an array to corresponding arrayview.

docs/design/concepts/tensor_array.md

Lines changed: 4 additions & 4 deletions
@@ -212,7 +212,7 @@ class TensorArray:
 ```
 
 ## DenseTensor-related Supports
-The `RecurrentGradientMachine` in Paddle serves as a flexible RNN layer; it takes varience-length sequences as input, and output sequences too.
+The `RecurrentGradientMachine` in Paddle serves as a flexible RNN layer; it takes variable-length sequences as input, and output sequences too.
 
 Since each step of RNN can only take a tensor-represented batch of data as input,
 some preprocess should be taken on the inputs such as sorting the sentences by their length in descending order and cut each word and pack to new batches.
@@ -244,10 +244,10 @@ def pack(level, indices_map):
 pass
 ```
 
-With these two methods, a varience-length sentence supported RNN can be implemented like
+With these two methods, a variable-length sentence supported RNN can be implemented like
 
 ```c++
-// input is the varient-length data
+// input is the variable-length data
 LodTensor sentence_input(xxx);
 TensorArray ta;
 Tensor indice_map;
@@ -268,4 +268,4 @@ for (int step = 0; step = ta.size(); step++) {
 DenseTensor rnn_output = ta.pack(ta, indice_map);
 ```
 the code above shows that by embedding the DenseTensor-related preprocess operations into `TensorArray`,
-the implementation of a RNN that supports varient-length sentences is far more concise than `RecurrentGradientMachine` because the latter mixes all the codes together, hard to read and extend.
+the implementation of a RNN that supports variable-length sentences is far more concise than `RecurrentGradientMachine` because the latter mixes all the codes together, hard to read and extend.
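
As context for the hunk above, a toy Python sketch of the unpack/pack idea the design doc describes. The function names, the length-descending sort, and the index map here are illustrative assumptions, not the actual `TensorArray` API:

```python
# Toy sketch: split variable-length sequences into per-step batches
# (sorted by length, descending) and restore the original order afterwards.

def unpack(sequences):
    """Return per-step batches and an index map back to the original order."""
    order = sorted(range(len(sequences)), key=lambda i: len(sequences[i]), reverse=True)
    sorted_seqs = [sequences[i] for i in order]
    max_len = len(sorted_seqs[0])
    # step_batches[t] holds the t-th token of every sequence long enough to have one
    step_batches = [[seq[t] for seq in sorted_seqs if len(seq) > t] for t in range(max_len)]
    return step_batches, order

def pack(step_outputs, order):
    """Gather per-step outputs back into per-sequence outputs in the original order."""
    packed = [[] for _ in order]
    for step in step_outputs:
        for pos, value in enumerate(step):
            packed[pos].append(value)
    # undo the length-descending sort
    restored = [None] * len(order)
    for pos, original_index in enumerate(order):
        restored[original_index] = packed[pos]
    return restored

steps, order = unpack([[1, 2], [3, 4, 5], [6]])
outputs = [[x * 10 for x in batch] for batch in steps]   # stand-in for one RNN step
print(pack(outputs, order))  # [[10, 20], [30, 40, 50], [60]]
```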

docs/design/dynamic_rnn/rnn_design_en.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-# Varient Length supported RNN Design
+# Variable Length supported RNN Design
 For the learning of variable length sequences, the existing mainstream frameworks such as tensorflow, pytorch, caffe2, mxnet and so on all use padding.
 
 Different-length sequences in a mini-batch will be padded with zeros and transformed to same length.

docs/design/modules/backward.md

Lines changed: 1 addition & 1 deletion
@@ -61,7 +61,7 @@ def _append_backward_ops_(target,
 target_block(Block): the block which is going to hold new generated grad ops
 no_grad_dict(dict):
 key(int) block index
-val(set) a set of varibale names. These varibales have no gradient
+val(set) a set of variable names. These variables have no gradient
 grad_to_var(dict)(output argument):
 key(str): grad variable name
 val(str): corresponding forward variable name
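
As a reading aid for the docstring above, a small sketch of plausible contents for the two dictionaries. The variable names and the `@GRAD` suffix are illustrative assumptions:

```python
# Hypothetical contents that match the documented key/value types.

# no_grad_dict: block index -> set of variable names that need no gradient
no_grad_dict = {
    0: {"image", "label"},      # variables in block 0 without gradients
    1: {"lookup_table"},        # variables in a sub-block without gradients
}

# grad_to_var: gradient variable name -> corresponding forward variable name
grad_to_var = {
    "fc_0.w_0@GRAD": "fc_0.w_0",
    "fc_0.b_0@GRAD": "fc_0.b_0",
}
```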

docs/design/modules/net_op_design.md

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ class PlainNet : public Net {
 // Create a network describe by `def`. NetDesc is the definition of a network.
 PlainNet(const NetDesc &def);
 
-// Infer all the operators' input and output varialbes' shapes, will be called before every mini-batch
+// Infer all the operators' input and output variables' shapes, will be called before every mini-batch
 training.
 virtual Error InferShape(Scope *scope) override;

docs/design/others/gan_api.md

Lines changed: 2 additions & 2 deletions
@@ -58,7 +58,7 @@ class DCGAN:
 self.D_b0 = pd.Variable(np.zeros(128)) # variable also support initialization using a numpy data
 self.D_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer())
 self.D_b1 = pd.Variable(np.zeros(128)) # variable also support initialization using a numpy data
-self.D_W2 = pd.Varialble(np.random.rand(128, 1))
+self.D_W2 = pd.Variable(np.random.rand(128, 1))
 self.D_b2 = pd.Variable(np.zeros(128))
 self.theta_D = [self.D_W0, self.D_b0, self.D_W1, self.D_b1, self.D_W2, self.D_b2]
 
@@ -67,7 +67,7 @@ class DCGAN:
 self.G_b0 = pd.Variable(np.zeros(128)) # variable also support initialization using a numpy data
 self.G_W1 = pd.Variable(shape=[784, 128], data=pd.gaussian_normal_randomizer())
 self.G_b1 = pd.Variable(np.zeros(128)) # variable also support initialization using a numpy data
-self.G_W2 = pd.Varialble(np.random.rand(128, 1))
+self.G_W2 = pd.Variable(np.random.rand(128, 1))
 self.G_b2 = pd.Variable(np.zeros(128))
 self.theta_G = [self.G_W0, self.G_b0, self.G_W1, self.G_b1, self.G_W2, self.G_b2]
 ```

docs/design/others/graph_survey.md

Lines changed: 1 addition & 1 deletion
@@ -28,7 +28,7 @@ def get_symbol(num_classes=10, **kwargs):
 
 
 
-Varible here is actually a Symbol. Every basic Symbol will correspond to one Node, and every Node has its own AnyAttr. There is a op field in AnyAttr class, when a Symbol represents Variable(often input data), the op field is null.
+Variable here is actually a Symbol. Every basic Symbol will correspond to one Node, and every Node has its own AnyAttr. There is a op field in AnyAttr class, when a Symbol represents Variable(often input data), the op field is null.
 
 Symbol contains a data member, std::vector<NodeEntry> outputs, and NodeEntry cantains a poniter to Node. We can follow the Node pointer to get all the Graph.
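
For the paragraph above, a minimal snippet of the pattern it describes, assuming the classic `mxnet` Symbol API (1.x): an input `Variable` and an operator output are both `Symbol` objects, and only the latter's node records an op.

```python
import mxnet as mx

# An input Variable is itself a Symbol; its underlying node has no op.
data = mx.sym.Variable("data")

# An operator output is also a Symbol; its node records the FullyConnected op.
fc = mx.sym.FullyConnected(data=data, num_hidden=128, name="fc1")

print(type(data), type(fc))   # both print <class 'mxnet.symbol.Symbol'>
print(fc.list_arguments())    # ['data', 'fc1_weight', 'fc1_bias']
```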

docs/guides/06_distributed_training/group_sharded_parallel_cn.rst

Lines changed: 1 addition & 1 deletion
@@ -16,7 +16,7 @@
 1.1 GroupSharded
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-GroupSharded implements a ZeRO-DP-like training strategy: the model state, including model parameters (parameter), parameter gradients (gradient), and the corresponding optimizer state (for Adam, moment and varience), is sharded across all GPUs, so the GPU memory occupied by the model state decreases as the number of parallel cards increases.
+GroupSharded implements a ZeRO-DP-like training strategy: the model state, including model parameters (parameter), parameter gradients (gradient), and the corresponding optimizer state (for Adam, moment and variance), is sharded across all GPUs, so the GPU memory occupied by the model state decreases as the number of parallel cards increases.
 Through the simple, easy-to-use interface provided by paddle.distributed.sharding.group_sharded_parallel, users only need to add a few lines of code to apply this strategy to existing training.
 
 The GPU memory consumed during model training consists of two main parts: model parameters plus optimizer state, and the intermediate variables (activations) produced during training.
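
A minimal sketch of how the interface named above can be applied, using the `paddle.distributed.sharding.group_sharded_parallel` API mentioned in the doc; the model, optimizer, and `level` value here are placeholder assumptions:

```python
import paddle
from paddle.distributed import fleet
from paddle.distributed.sharding import group_sharded_parallel

fleet.init(is_collective=True)  # set up the collective environment

# Placeholder model and optimizer, only to show where sharding is applied.
model = paddle.nn.Linear(1024, 1024)
optimizer = paddle.optimizer.AdamW(learning_rate=1e-3, parameters=model.parameters())

# "p_g_os" shards parameters, gradients, and optimizer states;
# "os" / "os_g" shard progressively less state.
model, optimizer, scaler = group_sharded_parallel(
    model=model, optimizer=optimizer, level="p_g_os"
)
```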

0 commit comments
