
Commit ebf0776

[CodeStyle][Typos][A-[2-5],C-[8-9],C-[16-18]] Fix typo('Adventages','Archetecture','Asynchoronous','Attrbute','Attribtue','cliping','colunms','contruction','containg','cantains') (#7595)
* fix-a2-a5
* fix-c8-c9-c16-c18
1 parent 57c3ee6 commit ebf0776

11 files changed: 13 additions & 23 deletions


_typos.toml

Lines changed: 0 additions & 10 deletions
@@ -23,11 +23,6 @@ Nervana = "Nervana"

 # These words need to be fixed
 Accuray = "Accuray"
-Adventages = "Adventages"
-Archetecture = "Archetecture"
-Asynchoronous = "Asynchoronous"
-Attrbute = "Attrbute"
-Attribtue = "Attribtue"
 Creenshot = "Creenshot"
 Embeddding = "Embeddding"
 Embeding = "Embeding"
@@ -45,12 +40,7 @@ Successed = "Successed"
 accordding = "accordding"
 accoustic = "accoustic"
 accpetance = "accpetance"
-cantains = "cantains"
 classfy = "classfy"
-cliping = "cliping"
-colunms = "colunms"
-containg = "containg"
-contruction = "contruction"
 contxt = "contxt"
 convertion = "convertion"
 convinience = "convinience"

docs/api/paddle/nn/functional/sparse_attention_cn.rst

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ sparse_attention

 Sparsifies the Attention matrix in the Transformer module, reducing memory consumption and computation.

-The sparse data layout is expressed in CSR format, which consists of two parameters, ``offset`` and ``colunms``. The computation is:
+The sparse data layout is expressed in CSR format, which consists of two parameters, ``offset`` and ``columns``. The computation is:

 .. math::
    result=softmax(\frac{ Q * K^T }{\sqrt{d}}) * V
@@ -24,7 +24,7 @@ sparse_attention
 - **key** (Tensor) - Input Tensor representing the ``key`` in the attention module; a 4-D Tensor of shape [batch_size, num_heads, seq_len, head_dim], with data type float32 or float64.
 - **value** (Tensor) - Input Tensor representing the ``value`` in the attention module; a 4-D Tensor of shape [batch_size, num_heads, seq_len, head_dim], with data type float32 or float64.
 - **sparse_csr_offset** (Tensor) - Input Tensor describing the sparsity of the attention module in CSR format; ``offset`` records the number of non-zero elements in each row of the matrix. A 3-D Tensor of shape [batch_size, num_heads, seq_len + 1], with data type int32.
-- **sparse_csr_columns** (Tensor) - Input Tensor describing the sparsity of the attention module in CSR format; ``colunms`` records the column indices of the non-zero elements in each row of the matrix. A 3-D Tensor of shape [batch_size, num_heads, sparse_nnz], with data type int32.
+- **sparse_csr_columns** (Tensor) - Input Tensor describing the sparsity of the attention module in CSR format; ``columns`` records the column indices of the non-zero elements in each row of the matrix. A 3-D Tensor of shape [batch_size, num_heads, sparse_nnz], with data type int32.

 Returns
 :::::::::
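
For readers unfamiliar with the CSR layout this hunk refers to, here is a minimal NumPy sketch (not part of the documentation; the banded mask and all variable names are illustrative) of how ``offset`` and ``columns`` encode a sparsity pattern for one (batch, head) slice:

```python
import numpy as np

# Illustrative only: a banded attention mask for a single (batch, head) slice.
seq_len, bandwidth = 6, 1
mask = np.zeros((seq_len, seq_len), dtype=bool)
for i in range(seq_len):
    lo, hi = max(0, i - bandwidth), min(seq_len, i + bandwidth + 1)
    mask[i, lo:hi] = True                      # each token attends to its neighbours

# CSR row pointer: offset[i+1] - offset[i] = number of non-zeros in row i
offset = np.zeros(seq_len + 1, dtype=np.int32)
offset[1:] = np.cumsum(mask.sum(axis=1))
# Column indices of the non-zero entries, row by row
columns = np.concatenate(
    [np.flatnonzero(mask[i]) for i in range(seq_len)]
).astype(np.int32)

print(offset)   # shape [seq_len + 1]
print(columns)  # shape [sparse_nnz]
```

Stacking such pairs over the batch and head dimensions gives the 3-D ``sparse_csr_offset`` and ``sparse_csr_columns`` tensors described in the parameter list above.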

docs/design/concurrent/parallel_do.md

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ We can avoid this step by making each device have a copy of the parameter. This
 1. In the backward, allreduce param@grad at different devices, this requires
 1. `backward.py` add `allreduce` operators at parallel_do_grad
 1. `allreduce` operators need to be called in async mode to achieve maximum throughput
-1. apply gradients related op(i.e. cliping, normalization, decay, sgd) on different devices in parallel
+1. apply gradients related op(i.e. clipping, normalization, decay, sgd) on different devices in parallel

 By doing so, we also avoided "backward: accumulate param@grad from different devices to the first device".
 And the ProgramDesc looks like the following

docs/design/mkldnn/gru/gru.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Proof:
 PaddlePaddle allows user to choose activation functions for update/reset gate and output gate. However, oneDNN supports only default `sigmoid` activation for gates and `tanh` for output. Currently oneDNN operator throws an error when user tries to execute it with other activations.

 ## oneDNN GRU operator
-oneDNN `GRU` operator is based on Paddle Paddle `fusion_gru` operator. It uses primitive/memory caching mechanism called `AcquireAPI`. Handler containg 2 caching key, one dependent on sentence length used in caching input/output and primitive. The other key (`memory_key`) depends only on other, not changing during inference, parameters and is used to cache weights and bias memory.
+oneDNN `GRU` operator is based on Paddle Paddle `fusion_gru` operator. It uses primitive/memory caching mechanism called `AcquireAPI`. Handler containing 2 caching key, one dependent on sentence length used in caching input/output and primitive. The other key (`memory_key`) depends only on other, not changing during inference, parameters and is used to cache weights and bias memory.

 ### Dimensions in oneDNN RNN primitives

docs/design/mkldnn/inplace/inplace.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Currently assumption is that if operator can have in-place processing then all i
 - gelu*
 - sum**

-Adventages of in-place computation are:
+Advantages of in-place computation are:
 * lower memory usage
 * improved performance of operators

docs/design/network/deep_speech_2.md

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ The classical DS2 network contains 15 layers (from bottom to top):

 <div align="center">
 <img src="https://raw.githubusercontent.com/PaddlePaddle/Paddle/develop/doc/fluid/images/ds2_network.png" width=350><br/>
-Figure 1. Archetecture of Deep Speech 2 Network.
+Figure 1. Architecture of Deep Speech 2 Network.
 </div>

 We don't have to persist on this 2-3-7-1-1-1 depth \[[2](#references)\]. Similar networks with different depths might also work well. As in \[[1](#references)\], authors use a different depth (e.g. 2-2-3-1-1-1) for final experiments.

docs/design/others/graph_survey.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ def get_symbol(num_classes=10, **kwargs):

 Variable here is actually a Symbol. Every basic Symbol will correspond to one Node, and every Node has its own AnyAttr. There is a op field in AnyAttr class, when a Symbol represents Variable(often input data), the op field is null.

-Symbol contains a data member, std::vector<NodeEntry> outputs, and NodeEntry cantains a pointer to Node. We can follow the Node pointer to get all the Graph.
+Symbol contains a data member, std::vector<NodeEntry> outputs, and NodeEntry contains a pointer to Node. We can follow the Node pointer to get all the Graph.

 And Symbol can be saved to a JSON file.
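
The Symbol → NodeEntry → Node relationship this hunk describes can be mimicked with a short, purely illustrative Python analogue (the real structures are NNVM's C++ classes; every name below is a stand-in):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Node:
    op: Optional[str]                        # None when the node is a plain Variable
    inputs: List["NodeEntry"] = field(default_factory=list)

@dataclass
class NodeEntry:
    node: Node                               # the "pointer to Node" mentioned in the text

@dataclass
class Symbol:
    outputs: List[NodeEntry]                 # mirrors std::vector<NodeEntry> outputs

def collect_graph(symbol: Symbol) -> List[Node]:
    """Follow the Node pointers from the outputs to recover every reachable node."""
    seen: List[Node] = []
    stack = [entry.node for entry in symbol.outputs]
    while stack:
        node = stack.pop()
        if node not in seen:                 # identity comparison (eq=False above)
            seen.append(node)
            stack.extend(entry.node for entry in node.inputs)
    return seen

# A variable node feeding one operator node; the traversal finds both.
data = Node(op=None)
fc = Node(op="FullyConnected", inputs=[NodeEntry(data)])
print(len(collect_graph(Symbol(outputs=[NodeEntry(fc)]))))  # -> 2
```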

docs/dev_guides/api_contributing_guides/api_design_guidelines_standard_cn.md

Lines changed: 1 addition & 1 deletion
@@ -538,7 +538,7 @@
 | 级联 | coalesced | |
 | 数据并行 | data parallelism | |
 | 模型并行 | model parallelism | |
-| 异步随机梯度下降 | Asynchoronous Stochastic Gradient Descent | |
+| 异步随机梯度下降 | Asynchronous Stochastic Gradient Descent | |
 | 参数服务器 | parameter server | |
 | 模型压缩 | model compression | |
 | 动态结构 | dynamic structure | |

docs/guides/advanced/layer_and_model_en.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ In this guide, you will learn how to define and make use of models in Paddle, an

 In Paddle, most models consist of a series of layers. Layer serves as the foundamental logical unit of a model, composed of two parts: the variable that participates in the computation and the operator(s) that actually perform the execution.

-Constructing a model from scratch could be painful, with tons of nested codes to write and maintain. To make life easier, Paddle provides foundamental data structure ``paddle.nn.Layer`` to simplify the contruction of layer or model. One may easily inherit from ``paddle.nn.Layer`` to define their custom layers or models. In addition, since both model and layer are essentially inherited from ``paddle.nn.Layer``, model is nothing but a special layer in Paddle.
+Constructing a model from scratch could be painful, with tons of nested codes to write and maintain. To make life easier, Paddle provides foundamental data structure ``paddle.nn.Layer`` to simplify the construction of layer or model. One may easily inherit from ``paddle.nn.Layer`` to define their custom layers or models. In addition, since both model and layer are essentially inherited from ``paddle.nn.Layer``, model is nothing but a special layer in Paddle.

 Now let us construct a model using ``paddle.nn.Layer``:
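
The sentence that closes this hunk naturally leads into a subclassing example. A minimal sketch, assuming the standard ``paddle.nn.Layer`` and ``paddle.nn.Linear`` APIs (the layer sizes and names are made up, and this is not the guide's own example):

```python
import paddle

class MyModel(paddle.nn.Layer):                    # a model is just a special Layer
    def __init__(self):
        super().__init__()
        self.fc1 = paddle.nn.Linear(10, 32)        # sub-layers are registered automatically
        self.fc2 = paddle.nn.Linear(32, 1)

    def forward(self, x):                          # runs when the layer is called
        return self.fc2(paddle.nn.functional.relu(self.fc1(x)))

model = MyModel()
out = model(paddle.randn([4, 10]))
print(out.shape)                                   # -> [4, 1]
```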

docs/guides/paddle_v3_features/paddle_ir_cn.md

Lines changed: 2 additions & 2 deletions
@@ -9,7 +9,7 @@
 </figure>

 In deep learning framework IR terminology, "sequentiality" and "graph semantics" are two very frequently used concepts. The old intermediate representation system was carried jointly by two core classes: the "sequential" ProgramDesc and the "graph-semantic" Graph. Under the static graph API or the dynamic-to-static module, the intermediate representation a user produces is an Op-by-Op Program; to apply higher-level optimization strategies (such as operator fusion, inplace strategies, or pruning), the framework constructs a Graph from the Program, made up of data nodes, operator nodes, and the edges connecting them.
-In the new Paddle IR, PaddlePaddle abstracts a set of highly extensible basic components at the bottom layer, including Type, Attrbute, Op, Trait, and Interface, and introduces the concept of Dialect, letting developers extend and customize freely and providing complete and robust semantic expressiveness. At the model representation layer, modular management of multiple Dialects unifies the representation across devices, achieving a single architecture-wide representation for both training and inference that seamlessly connects composite operators and the compiler and supports automatic optimization and multi-hardware adaptation. At the graph transformation layer, unified underlying modules and simplified basic concepts give users a low-cost, easy-to-use, high-performance, and richly pluggable Pass optimization mechanism.
+In the new Paddle IR, PaddlePaddle abstracts a set of highly extensible basic components at the bottom layer, including Type, Attribute, Op, Trait, and Interface, and introduces the concept of Dialect, letting developers extend and customize freely and providing complete and robust semantic expressiveness. At the model representation layer, modular management of multiple Dialects unifies the representation across devices, achieving a single architecture-wide representation for both training and inference that seamlessly connects composite operators and the compiler and supports automatic optimization and multi-hardware adaptation. At the graph transformation layer, unified underlying modules and simplified basic concepts give users a low-cost, easy-to-use, high-performance, and richly pluggable Pass optimization mechanism.
 PaddlePaddle's new-generation IR adheres to the SSA (static single assignment) principle, so a model is equivalent to a directed acyclic graph. The computation graph is abstracted with Value and Operation, where Operations are the nodes and Values are the edges.

 * Operation represents a node in the computation graph: an Operation represents an operator and contains zero or more Regions; a Region represents a closure and contains zero or more Blocks; a Block represents an SSA-compliant basic block and contains zero or more Operations; the three nest recursively and can express arbitrarily complex syntactic structures
@@ -96,7 +96,7 @@ print(out)

 As shown in the upper-left figure, the overall design of the new-generation IR is divided into three layers, from bottom to top:
 ### 1. Flexible basic components
-PaddlePaddle provides two important mechanisms, Trait and Interface, to abstractly mark the characteristics and interfaces of operators (Ops). For example, InplaceTrait indicates that an Op has the Inplace characteristic, and InferShapeInterface indicates that an operator defines the InferShape function interface; both can be extended arbitrarily, as long as the new ones derive from the corresponding base class and follow the corresponding implementation rules. The core concepts of the operator system are abstracted into Type, Attrbute, and Op, all three defined on top of Trait and Interface and associated with the Traits and Interfaces they own. Dialect manages Type, Attribtue, and Op in a modular way, for example BuiltinDialect, PaddleDialect, CinnDialect, and so on. A Dialect contains a series of definitions of Type, Attribtue, and Op; correspondingly, every Type, Attribtue, and Op is defined in exactly one Dialect. For the IR framework as a whole, Dialects are freely pluggable and arbitrarily extensible.
+PaddlePaddle provides two important mechanisms, Trait and Interface, to abstractly mark the characteristics and interfaces of operators (Ops). For example, InplaceTrait indicates that an Op has the Inplace characteristic, and InferShapeInterface indicates that an operator defines the InferShape function interface; both can be extended arbitrarily, as long as the new ones derive from the corresponding base class and follow the corresponding implementation rules. The core concepts of the operator system are abstracted into Type, Attribute, and Op, all three defined on top of Trait and Interface and associated with the Traits and Interfaces they own. Dialect manages Type, Attribute, and Op in a modular way, for example BuiltinDialect, PaddleDialect, CinnDialect, and so on. A Dialect contains a series of definitions of Type, Attribute, and Op; correspondingly, every Type, Attribute, and Op is defined in exactly one Dialect. For the IR framework as a whole, Dialects are freely pluggable and arbitrarily extensible.

 This layer is the foundation that lets the IR adapt to many different scenarios. Every element in this layer can be customized and extended: in general, for a concrete scenario such as distributed training or the compiler, one defines the Traits and Interfaces it needs, then defines its own Dialect, and within that Dialect defines the Types, Attributes, and Ops it requires.
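
To make the Operation / Region / Block nesting described in this diff more concrete, here is a purely illustrative Python model of the containment rules (these dataclasses and op names are stand-ins, not Paddle's actual IR classes):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Operation:
    name: str
    regions: List["Region"] = field(default_factory=list)   # zero or more Regions

@dataclass
class Region:
    blocks: List["Block"] = field(default_factory=list)     # zero or more Blocks

@dataclass
class Block:
    ops: List[Operation] = field(default_factory=list)      # zero or more Operations

# A control-flow style operator whose single region holds one block of two ops;
# the same nesting can recurse to arbitrary depth, as the text describes.
inner = Block(ops=[Operation("matmul"), Operation("relu")])
cond_op = Operation("if", regions=[Region(blocks=[inner])])
print(len(cond_op.regions[0].blocks[0].ops))                 # -> 2
```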
