
Commit bba7376

fix-c8-c9-c16-c18
1 parent cc0397e commit bba7376

7 files changed: +8 -13 lines changed


_typos.toml

Lines changed: 0 additions & 5 deletions
@@ -40,12 +40,7 @@ Successed = "Successed"
 accordding = "accordding"
 accoustic = "accoustic"
 accpetance = "accpetance"
-cantains = "cantains"
 classfy = "classfy"
-cliping = "cliping"
-colunms = "colunms"
-containg = "containg"
-contruction = "contruction"
 contxt = "contxt"
 convertion = "convertion"
 convinience = "convinience"

docs/api/paddle/nn/functional/sparse_attention_cn.rst

Lines changed: 2 additions & 2 deletions
@@ -8,7 +8,7 @@ sparse_attention

 Sparsifies the Attention matrix in the Transformer module, reducing memory consumption and computation.

-The sparse data layout is described in CSR format, which consists of two parameters, ``offset`` and ``colunms``. The computation formula is:
+The sparse data layout is described in CSR format, which consists of two parameters, ``offset`` and ``columns``. The computation formula is:

 .. math::
     result = softmax(\frac{Q * K^T}{\sqrt{d}}) * V
@@ -24,7 +24,7 @@ sparse_attention
 - **key** (Tensor) - An input Tensor representing the ``key`` in the attention module; a 4-D Tensor of shape [batch_size, num_heads, seq_len, head_dim] with dtype float32 or float64.
 - **value** (Tensor) - An input Tensor representing the ``value`` in the attention module; a 4-D Tensor of shape [batch_size, num_heads, seq_len, head_dim] with dtype float32 or float64.
 - **sparse_csr_offset** (Tensor) - An input Tensor describing the sparsity of the attention module in CSR format; ``offset`` records the number of non-zero elements in each row of the matrix. A 3-D Tensor of shape [batch_size, num_heads, seq_len + 1] with dtype int32.
-- **sparse_csr_columns** (Tensor) - An input Tensor describing the sparsity of the attention module in CSR format; ``colunms`` records the column indices of the non-zero elements in each row. A 3-D Tensor of shape [batch_size, num_heads, sparse_nnz] with dtype int32.
+- **sparse_csr_columns** (Tensor) - An input Tensor describing the sparsity of the attention module in CSR format; ``columns`` records the column indices of the non-zero elements in each row. A 3-D Tensor of shape [batch_size, num_heads, sparse_nnz] with dtype int32.

 Returns
 :::::::::
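
For context on the ``columns`` rename above, a minimal sketch of calling ``paddle.nn.functional.sparse_attention`` with a fully dense CSR pattern might look like the following. The shapes follow the parameter descriptions in the diff, and the call assumes a Paddle build and device on which the sparse_attention operator is available (it is typically GPU-only), so treat it as illustrative rather than as part of this change.

```python
# Illustrative sketch: a fully dense CSR pattern for a tiny attention head.
import numpy as np
import paddle
import paddle.nn.functional as F

batch_size, num_heads, seq_len, head_dim = 1, 1, 4, 8

q = paddle.rand([batch_size, num_heads, seq_len, head_dim], dtype="float32")
k = paddle.rand([batch_size, num_heads, seq_len, head_dim], dtype="float32")
v = paddle.rand([batch_size, num_heads, seq_len, head_dim], dtype="float32")

# offset: running count of non-zeros per row -> shape [1, 1, seq_len + 1]
offset = paddle.to_tensor(np.array([[[0, 4, 8, 12, 16]]], dtype=np.int32))
# columns: column indices of the non-zeros  -> shape [1, 1, sparse_nnz]
columns = paddle.to_tensor(
    np.tile(np.arange(seq_len, dtype=np.int32), seq_len).reshape(1, 1, -1))

# result = softmax(Q * K^T / sqrt(d)) * V, evaluated only at the
# positions described by (offset, columns).
out = F.sparse_attention(q, k, v, offset, columns)
print(out.shape)  # [1, 1, 4, 8]
```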

docs/design/concurrent/parallel_do.md

Lines changed: 1 addition & 1 deletion
@@ -113,7 +113,7 @@ We can avoid this step by making each device have a copy of the parameter. This
 1. In the backward, allreduce param@grad at different devices, this requires
 1. `backward.py` add `allreduce` operators at parallel_do_grad
 1. `allreduce` operators need to be called in async mode to achieve maximum throughput
-1. apply gradients related op(i.e. cliping, normalization, decay, sgd) on different devices in parallel
+1. apply gradients related op(i.e. clipping, normalization, decay, sgd) on different devices in parallel

 By doing so, we also avoided "backward: accumulate param@grad from different devices to the first device".
 And the ProgramDesc looks like the following
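
The ProgramDesc referred to in the trailing context line is outside this hunk. Separately, purely as a framework-free sketch of the per-device flow in the fixed list item (hypothetical helper names, not the parallel_do implementation): each replica's param@grad is allreduced, and the gradient-related ops (clipping, decay, the sgd step) then run on every replica in parallel.

```python
# Framework-free sketch of the flow above; not Paddle code.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def allreduce(grads):
    """Sum per-device gradients so every replica sees the same param@grad."""
    total = np.sum(grads, axis=0)
    return [total.copy() for _ in grads]

def apply_update(param, grad, lr=0.01, max_norm=1.0, decay=1e-4):
    norm = np.linalg.norm(grad)
    if norm > max_norm:              # gradient clipping
        grad = grad * (max_norm / norm)
    grad = grad + decay * param      # weight decay
    return param - lr * grad         # sgd step

params = [np.ones(4) for _ in range(2)]          # one parameter replica per device
grads = [np.random.randn(4) for _ in range(2)]   # per-device param@grad

reduced = allreduce(grads)
with ThreadPoolExecutor() as pool:               # updates run per device, in parallel
    params = list(pool.map(apply_update, params, reduced))
```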

docs/design/mkldnn/gru/gru.md

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ Proof:
 PaddlePaddle allows user to choose activation functions for update/reset gate and output gate. However, oneDNN supports only default `sigmoid` activation for gates and `tanh` for output. Currently oneDNN operator throws an error when user tries to execute it with other activations.

 ## oneDNN GRU operator
-oneDNN `GRU` operator is based on Paddle Paddle `fusion_gru` operator. It uses primitive/memory caching mechanism called `AcquireAPI`. Handler containg 2 caching key, one dependent on sentence length used in caching input/output and primitive. The other key (`memory_key`) depends only on other, not changing during inference, parameters and is used to cache weights and bias memory.
+oneDNN `GRU` operator is based on Paddle Paddle `fusion_gru` operator. It uses primitive/memory caching mechanism called `AcquireAPI`. Handler containing 2 caching key, one dependent on sentence length used in caching input/output and primitive. The other key (`memory_key`) depends only on other, not changing during inference, parameters and is used to cache weights and bias memory.

 ### Dimensions in oneDNN RNN primitives
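
Stepping back to the corrected sentence (the dimensions heading above is only trailing context of this hunk): it describes a two-key caching scheme. A plain-Python illustration of that idea, with hypothetical placeholder objects rather than the actual C++ `AcquireAPI` handler, is sketched below: one key embeds the sentence length and caches the primitive plus input/output memory, while the length-independent `memory_key` caches weights and bias once.

```python
# Conceptual sketch of the two caching keys described above; placeholders only.
cache = {}

def acquire_gru(base_key, sentence_len):
    # Key 1: depends on the sentence length -> primitive and input/output memory.
    primitive_key = f"{base_key}@len{sentence_len}"
    if primitive_key not in cache:
        cache[primitive_key] = {"kind": "gru_primitive", "seq_len": sentence_len}

    # Key 2 (memory_key): independent of sentence length -> weights and bias
    # memory are created once and reused for every sequence length.
    memory_key = f"{base_key}@weights"
    if memory_key not in cache:
        cache[memory_key] = {"kind": "weights_and_bias"}

    return cache[primitive_key], cache[memory_key]

acquire_gru("fusion_gru", sentence_len=12)   # creates both entries
acquire_gru("fusion_gru", sentence_len=30)   # new primitive, reused weights
```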

docs/design/others/graph_survey.md

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ def get_symbol(num_classes=10, **kwargs):

 Variable here is actually a Symbol. Every basic Symbol will correspond to one Node, and every Node has its own AnyAttr. There is a op field in AnyAttr class, when a Symbol represents Variable(often input data), the op field is null.

-Symbol contains a data member, std::vector<NodeEntry> outputs, and NodeEntry cantains a pointer to Node. We can follow the Node pointer to get all the Graph.
+Symbol contains a data member, std::vector<NodeEntry> outputs, and NodeEntry contains a pointer to Node. We can follow the Node pointer to get all the Graph.

 And Symbol can be saved to a JSON file.
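
Since the corrected sentence is about following NodeEntry pointers through the Symbol graph, a minimal sketch of the surveyed MXNet Symbol API (assuming the classic `mxnet` symbolic front end is installed) shows both the graph construction and the JSON serialization mentioned in the last context line:

```python
# Sketch of the MXNet Symbol behaviour surveyed above (assumes `mxnet` is installed).
import mxnet as mx

data = mx.sym.Variable("data")                        # a Symbol whose op field is null
fc = mx.sym.FullyConnected(data=data, num_hidden=10)  # a Symbol backed by an op Node
out = mx.sym.SoftmaxOutput(data=fc, name="softmax")

# Following the Node pointers recovers the whole graph, e.g. all argument nodes:
print(out.list_arguments())   # e.g. ['data', 'fullyconnected0_weight', ...]

# "And Symbol can be saved to a JSON file."
with open("symbol.json", "w") as f:
    f.write(out.tojson())
```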

docs/guides/advanced/layer_and_model_en.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ In this guide, you will learn how to define and make use of models in Paddle, an

 In Paddle, most models consist of a series of layers. Layer serves as the foundamental logical unit of a model, composed of two parts: the variable that participates in the computation and the operator(s) that actually perform the execution.

-Constructing a model from scratch could be painful, with tons of nested codes to write and maintain. To make life easier, Paddle provides foundamental data structure ``paddle.nn.Layer`` to simplify the contruction of layer or model. One may easily inherit from ``paddle.nn.Layer`` to define their custom layers or models. In addition, since both model and layer are essentially inherited from ``paddle.nn.Layer``, model is nothing but a special layer in Paddle.
+Constructing a model from scratch could be painful, with tons of nested codes to write and maintain. To make life easier, Paddle provides foundamental data structure ``paddle.nn.Layer`` to simplify the construction of layer or model. One may easily inherit from ``paddle.nn.Layer`` to define their custom layers or models. In addition, since both model and layer are essentially inherited from ``paddle.nn.Layer``, model is nothing but a special layer in Paddle.

 Now let us construct a model using ``paddle.nn.Layer``:
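
The context paragraph above ends right before the guide's own example, which is outside this hunk. As an illustrative stand-in (not the guide's exact code), a custom model built by inheriting from ``paddle.nn.Layer`` could look like this:

```python
import paddle
import paddle.nn as nn
import paddle.nn.functional as F

class MnistModel(nn.Layer):
    """A tiny model: in Paddle, a model is just a special Layer."""

    def __init__(self):
        super().__init__()
        # sub-layers hold the variables (parameters) that join the computation
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(784, 512)
        self.fc2 = nn.Linear(512, 10)

    def forward(self, x):
        # operators that actually perform the execution
        x = self.flatten(x)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

model = MnistModel()
logits = model(paddle.rand([4, 1, 28, 28]))  # forward pass on a random batch
print(logits.shape)                          # [4, 10]
```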

docs/templates/common_docs.py

Lines changed: 2 additions & 2 deletions
@@ -23,12 +23,12 @@
 stride (tuple|int): The stride size. It can be a single integer or a tuple containing two integers, representing the strides of the convolution along the height and width. If it is a single integer, the height and width are equal to the integer. Default is 1.
 groups (int, optional): The group number of convolution layer. When group=n, the input and convolution kernels are divided into n groups equally, the first group of convolution kernels and the first group of inputs are subjected to convolution calculation, the second group of convolution kernels and the second group of inputs are subjected to convolution calculation, ……, the nth group of convolution kernels and the nth group of inputs perform convolution calculations. Default is 1.
 regularization (WeightDecayRegularizer, optional): The strategy of regularization. There are two method: :ref:`api_fluid_regularizer_L1Decay` 、 :ref:`api_fluid_regularizer_L2Decay` . If a parameter has set regularizer using :ref:`api_fluid_ParamAttr` already, the regularization setting here in optimizer will be ignored for this parameter. Otherwise, the regularization setting here in optimizer will take effect. Default None, meaning there is no regularization.
-grad_clip (GradientClipBase, optional): Gradient cliping strategy, it's an instance of some derived class of ``GradientClipBase`` . There are three cliping strategies ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` ). Default None, meaning there is no gradient clipping.
+grad_clip (GradientClipBase, optional): Gradient clipping strategy, it's an instance of some derived class of ``GradientClipBase`` . There are three clipping strategies ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` ). Default None, meaning there is no gradient clipping.
 dilation (tuple|int): The dilation size. It can be a single integer or a tuple containing two integers, representing the height and width of dilation of the convolution kernel elements. If it is a single integer, the height and width of dilation are equal to the integer. Default is 1.
 stop_gradient (bool, optional): A boolean that mentions whether gradient should flow. Default is True, means stop calculate gradients.
 force_cpu (bool, optional): Whether force to store the output tensor in CPU memory. If force_cpu is False, the output tensor will be stored in running device memory, otherwise it will be stored to the CPU memory. Default is False.
 data_format (str, optional): Specify the input data format, the output data format will be consistent with the input, which can be ``NCHW`` or ``NHWC`` . N is batch size, C is channels, H is height, and W is width. Default is ``NCHW`` .
-grad_clip (GradientClipBase, optional): Gradient cliping strategy, it's an instance of some derived class of ``GradientClipBase`` . There are three cliping strategies ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` ). Default is None, meaning there is no gradient clipping.
+grad_clip (GradientClipBase, optional): Gradient clipping strategy, it's an instance of some derived class of ``GradientClipBase`` . There are three clipping strategies ( :ref:`api_fluid_clip_GradientClipByGlobalNorm` , :ref:`api_fluid_clip_GradientClipByNorm` , :ref:`api_fluid_clip_GradientClipByValue` ). Default is None, meaning there is no gradient clipping.
 num_filters (int): The number of filter. It is as same as the output channals numbers.
 dim (int, optional): A dimension along which to operate. Default is 0.
 is_sparse (bool, optional): Whether use sparse updating. For more information, please refer to :ref:`api_guide_sparse_update_en` . If it's True, it will use sparse updating.
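
Since both fixed lines document the ``grad_clip`` argument, a short sketch of plugging one of the three clipping strategies into an optimizer (using the current ``paddle`` 2.x class names rather than the ``fluid`` references in the template; illustrative only) could be:

```python
import paddle

linear = paddle.nn.Linear(10, 10)

# Clip by global norm; ClipGradByNorm and ClipGradByValue are the other two strategies.
clip = paddle.nn.ClipGradByGlobalNorm(clip_norm=1.0)

# grad_clip defaults to None, i.e. no gradient clipping.
sgd = paddle.optimizer.SGD(learning_rate=0.1,
                           parameters=linear.parameters(),
                           grad_clip=clip)

loss = linear(paddle.rand([4, 10])).mean()
loss.backward()
sgd.step()          # gradients are clipped before the parameter update
sgd.clear_grad()
```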
