Commit d988721

complete merge: resolve conflicts from origin/develop

Parents: 7562307 + 1cb4e2d

File tree: 19 files changed (+725 / -735 lines)

_typos.toml

Lines changed: 0 additions & 10 deletions
```diff
@@ -29,25 +29,15 @@ feeded = "feeded"
 
 # These words need to be fixed
 Learing = "Learing"
-Moible = "Moible"
 Operaton = "Operaton"
 Optimizaing = "Optimizaing"
 Optimzier = "Optimzier"
 Setment = "Setment"
 Simle = "Simle"
 Sovler = "Sovler"
 libary = "libary"
-mantained = "mantained"
 matrics = "matrics"
-mdule = "mdule"
-mechnism = "mechnism"
-memeory = "memeory"
-memroy = "memroy"
-messege = "messege"
-metaphore = "metaphore"
 metrices = "metrices"
-muliply = "muliply"
-mulitplying = "mulitplying"
 mutbale = "mutbale"
 occurence = "occurence"
 opeartor = "opeartor"
```

docs/api/paddle/static/accuracy_cn.rst

Lines changed: 1 addition & 1 deletion
```diff
@@ -10,7 +10,7 @@ accuracy
 
 accuracy layer. Reference: https://en.wikipedia.org/wiki/Precision_and_recall
 
-Computes the accuracy using the input and the label. If the correct label is among the top-k predictions, the result is incremented by 1. Note: the type of the output accuracy is determined by the type of input; input and lable may have different types.
+Computes the accuracy using the input and the label. If the correct label is among the top-k predictions, the result is incremented by 1. Note: the type of the output accuracy is determined by the type of input; input and label may have different types.
 
 Parameters
 ::::::::::::
```
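
Note: as an aside for readers of this hunk, here is a minimal sketch of the top-k accuracy computation the corrected sentence describes, assuming the `paddle.metric.accuracy` API and made-up example values:

```python
import paddle

# Toy predictions for 4 samples over 3 classes (values are illustrative).
pred = paddle.to_tensor([[0.1, 0.7, 0.2],
                         [0.8, 0.1, 0.1],
                         [0.3, 0.3, 0.4],
                         [0.2, 0.5, 0.3]], dtype="float32")
label = paddle.to_tensor([[1], [0], [2], [0]], dtype="int64")

# A sample counts as correct when its label appears among the top-k predictions.
acc = paddle.metric.accuracy(input=pred, label=label, k=1)
print(acc)  # 3 of 4 samples are correct -> 0.75
```

Here `pred` is float32 while `label` is int64, which matches the note that input and label may have different types.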

docs/design/concurrent/go_op.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -218,7 +218,7 @@ for more details.
 
 #### Green Threads
 
-Golang utilizes `green threads`, which is a mechnism for the runtime library to
+Golang utilizes `green threads`, which is a mechanism for the runtime library to
 manage multiple threads (instead of natively by the OS). Green threads usually
 allows for faster thread creation and switching, as there is less overhead
 when spawning these threads. For the first version of CSP, we only support
```

docs/design/memory/README.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -116,7 +116,7 @@ I got inspiration from Majel and Caffe2, though above design look different from
 
 ### Caffe2
 
-In Caffe2, `Tensor<Context>::mutable_data()` allocates the memroy. In particular, [`Tensor<Context>::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor<Context>::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).
+In Caffe2, `Tensor<Context>::mutable_data()` allocates the memory. In particular, [`Tensor<Context>::mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L523) calls [`Tensor<Context>::raw_mutable_data`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L459), which in turn calls [`Context::New`](https://github.com/caffe2/caffe2/blob/v0.7.0/caffe2/core/tensor.h#L479).
 
 There are two implementations of `Context`:
 
```

docs/design/memory/memory_optimization.md

Lines changed: 2 additions & 2 deletions
````diff
@@ -53,7 +53,7 @@ In compilers, the front end of the compiler translates programs into an intermed
 
 Therefore, the compiler needs to analyze the intermediate-representation program to determine which temporary variables are in use at the same time. We say a variable is "live" if it holds a value that may be needed in the future, so this analysis is called liveness analysis.
 
-We can leran these techniques from compilers. There are mainly two stages to make live variable analysis:
+We can learn these techniques from compilers. There are mainly two stages to make live variable analysis:
 
 - construct a control flow graph
 - solve the dataflow equations
@@ -197,7 +197,7 @@ After op1, we can process variable b and variable c; After op2, we can process v
 
 #### memory sharing policy
 
-A memory pool will be mantained in the stage of memory optimization. Each operator node will be scanned to determine memory optimization is done or not. If an operator satisfies the requirement, following policy will be taken to handle input/output variables.
+A memory pool will be maintained in the stage of memory optimization. Each operator node will be scanned to determine memory optimization is done or not. If an operator satisfies the requirement, following policy will be taken to handle input/output variables.
 
 ```
 if op.support_inplace():
````
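
Note: to make the two-stage liveness analysis mentioned in the first hunk concrete, here is a small self-contained sketch for a straight-line program; the op list and variable names are invented for illustration and are not Paddle's real IR.

```python
# Each op reads `uses` and writes `defs`; the program is straight-line,
# so the control flow graph is a simple chain (illustrative toy example).
ops = [
    {"name": "op1", "uses": ["a"],      "defs": ["b"]},
    {"name": "op2", "uses": ["b"],      "defs": ["c"]},
    {"name": "op3", "uses": ["b", "c"], "defs": ["d"]},
]

# Backward dataflow equations:
#   live_out[i] = live_in[i + 1]
#   live_in[i]  = uses[i] | (live_out[i] - defs[i])
live_in = [set() for _ in ops]
live_out = [set() for _ in ops]
for i in reversed(range(len(ops))):
    live_out[i] = live_in[i + 1] if i + 1 < len(ops) else set()
    live_in[i] = set(ops[i]["uses"]) | (live_out[i] - set(ops[i]["defs"]))

for op, lin, lout in zip(ops, live_in, live_out):
    print(op["name"], "live_in:", sorted(lin), "live_out:", sorted(lout))
# A variable that is absent from live_out after an op will never be read again,
# so its memory can be returned to the pool and reused.
```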

docs/design/mkldnn/gru/gru.md

Lines changed: 2 additions & 2 deletions
```diff
@@ -99,12 +99,12 @@ Because oneDNN assumes that all sentences are of equal length, before reorder, w
 ![](images/input_is_reverse.svg)
 
 * PaddlePaddle WeightX -> oneDNN WeightX\
-WeightX does not need custom reorders because memory arrangement is the same for both PP and oneDNN. However, it has to be modified if `origin_mode==false` by mulitplying update gate part by `-1`. At the end, oneDNN reorder is called to convert weights to correct type and strides selected by primitive.
+WeightX does not need custom reorders because memory arrangement is the same for both PP and oneDNN. However, it has to be modified if `origin_mode==false` by multiplying update gate part by `-1`. At the end, oneDNN reorder is called to convert weights to correct type and strides selected by primitive.
 * PaddlePaddle WeightH -> oneDNN WeightH\
 WeightH tensor has different representation in PP and oneDNN. PaddlePaddle stores it as 2 connected blocks of memory, where first contains reset and update gate recurrent weights, and second stores output gate recurrent weights. In oneDNN, these weights are stored in a single memory block of size `[OC, 3, OC]`. Therefore, custom reorder is needed here. After that, if `origin_mode==false`, update gate part is multiplied by `-1`. At the end, oneDNN reorder is called to convert weights to correct type and strides selected by primitive.
 ![](images/different_tensor_memory_arrangement.svg)
 * PaddlePaddle Bias -> oneDNN Bias\
-Bias does not require reorder from PP to oneDNN. However, if it is not provided by user, it has to be created and filled with `0.0f` because oneDNN requires it. If it was provided, it has to be modified when `origin_mode==false` by mulitplying update gate part by `-1`. Note: bias is always of `float` data type, even in `int8` and `bfloat16` kernels.
+Bias does not require reorder from PP to oneDNN. However, if it is not provided by user, it has to be created and filled with `0.0f` because oneDNN requires it. If it was provided, it has to be modified when `origin_mode==false` by multiplying update gate part by `-1`. Note: bias is always of `float` data type, even in `int8` and `bfloat16` kernels.
 * oneDNN TNC/NTC -> PaddlePaddle Output LoD\
 After execution of oneDNN GRU primitive, output tensor has to be converted back to PP representation. It is done in the same way as input reorder but in a reverse manner.
 
```
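
Note: a rough NumPy illustration of the `origin_mode==false` adjustment mentioned for WeightX and Bias; the `[update, reset, output]` gate order along the last axis is an assumption made only for this sketch.

```python
import numpy as np

ic, oc = 4, 3  # toy input and hidden sizes
weight_x = np.random.rand(ic, 3 * oc).astype(np.float32)
# oneDNN expects a float bias even when the user supplies none, so create a zero one.
bias = np.zeros((1, 3 * oc), dtype=np.float32)

origin_mode = False
if not origin_mode:
    # Negate only the update-gate slice, as the reorder description above requires.
    weight_x[:, :oc] *= -1.0
    bias[:, :oc] *= -1.0
```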

docs/dev_guides/custom_device_docs/memory_api_en.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -180,7 +180,7 @@ It copies synchronous memory in the device.
 
 device - the device to be used
 
-dst - the address of the destination device memroy
+dst - the address of the destination device memory
 
 src - the address of the source device memory
 
```

docs/eval/evaluation_of_docs_system.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -271,7 +271,7 @@ TensorFlow 的文档规划,比较直接地匹配了本文所介绍的分类标
 - Training Transformer models using Pipeline Parallelism
 - Training Transformer models using Distributed Data Parallel and Pipeline Parallelism
 - Distributed Training with Uneven Inputs Using the Join Context Manager
-- Moible
+- Mobile
 - Image Segmentation DeepLabV3 on iOS
 - Image Segmentation DeepLabV3 on Android
 - Recommendation Systems
```

docs/eval/【Hackathon No.69】PR.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -97,7 +97,7 @@ torch.tensor(data,
 
 In paddle.to_tensor, stop_gradient indicates whether to block gradient propagation, while PyTorch's requires_grad indicates whether not to block it.
 
-In torch.tensor, pin_memeory indicates whether to use pinned (page-locked) memory, whereas PaddlePaddle has no such parameter.
+In torch.tensor, pin_memory indicates whether to use pinned (page-locked) memory, whereas PaddlePaddle has no such parameter.
 
 ------
 
```
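
Note: a short side-by-side sketch of the parameter difference discussed in this hunk, assuming the current `torch.tensor` and `paddle.to_tensor` signatures:

```python
import torch
import paddle

data = [1.0, 2.0, 3.0]

# PyTorch: requires_grad enables gradient tracking; pin_memory asks for
# page-locked host memory (needs a CUDA-capable build).
t = torch.tensor(data, requires_grad=True, pin_memory=True)

# PaddlePaddle: stop_gradient=False is the counterpart of requires_grad=True;
# paddle.to_tensor has no pin_memory parameter.
p = paddle.to_tensor(data, stop_gradient=False)
```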

docs/faq/train_cn.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -8,7 +8,7 @@
 
 ##### Question: What is the difference between `paddle.matmul` and `paddle.multiply`?
 
-+ Answer: `matmul` performs matrix multiplication on two tensors, while `muliply` performs element-wise multiplication of two tensors.
++ Answer: `matmul` performs matrix multiplication on two tensors, while `multiply` performs element-wise multiplication of two tensors.
 
 ----------
 
```
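
Note: a small example illustrating the distinction drawn in the corrected answer (values are arbitrary):

```python
import paddle

x = paddle.to_tensor([[1., 2.], [3., 4.]])
y = paddle.to_tensor([[5., 6.], [7., 8.]])

print(paddle.matmul(x, y))    # matrix product:       [[19., 22.], [43., 50.]]
print(paddle.multiply(x, y))  # element-wise product: [[ 5., 12.], [21., 32.]]
```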

0 commit comments