labmlai
diff --git a/translate_cache/transformers/configs.zh.json b/translate_cache/transformers/configs.zh.json
Lines changed: 1 addition & 1 deletion
diff --git a/translate_cache/transformers/feed_forward.zh.json b/translate_cache/transformers/feed_forward.zh.json
Lines changed: 1 addition & 1 deletion
diff --git a/translate_cache/transformers/label_smoothing_loss.zh.json b/translate_cache/transformers/label_smoothing_loss.zh.json
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 {
  "<h1>Configurable Transformer Components</h1>\n": "<h1>\u53ef\u914d\u7f6e\u7684 Transformer \u7ec4\u4ef6</h1>\n",
- "<h2>GLU Variants</h2>\n<p>These are variants with gated hidden layers for the FFN as introduced in paper <a href=\"https://arxiv.org/abs/2002.05202\">GLU Variants Improve Transformer</a>. We have omitted the bias terms as specified in the paper. </p>\n": "<h2>GLU \u53d8\u4f53</h2>\n<p>\u8fd9\u4e9b\u662f\u5728\u8bba\u6587 <a href=\"https://arxiv.org/abs/2002.05202\">\u300a GLU Variants Improve Transformer \u300b</a>\u4e2d\u5305\u542b\u7684\u5404\u79cd\u5e26\u95e8\u63a7\u9690\u85cf\u5c42\u7684 ffn \u53d8\u4f53\u3002\u6211\u4eec\u5df2\u6309\u7167\u8bba\u6587\u89c4\u5b9a\u7701\u7565\u4e86\u504f\u7f6e\u9879\u3002</p>\n",
+ "<h2>GLU Variants</h2>\n<p>These are variants with gated hidden layers for the FFN as introduced in paper <a href=\"https://arxiv.org/abs/2002.05202\">GLU Variants Improve Transformer</a>. We have omitted the bias terms as specified in the paper. </p>\n": "<h2>GLU \u53d8\u4f53</h2>\n<p>\u8fd9\u4e9b\u662f\u5728\u8bba\u6587 <a href=\"https://arxiv.org/abs/2002.05202\">\u300a GLU Variants Improve Transformer \u300b</a>\u4e2d\u5305\u542b\u7684\u5404\u79cd\u5e26\u95e8\u63a7\u9690\u85cf\u5c42\u7684 FFN \u53d8\u4f53\u3002\u6211\u4eec\u5df2\u6309\u7167\u8bba\u6587\u89c4\u5b9a\u7701\u7565\u4e86\u504f\u7f6e\u9879\u3002</p>\n",
  "<h3>FFN with Bilinear hidden layer</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26\u53cc\u7ebf\u6027\u9690\u85cf\u5c42\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
  "<h3>FFN with GELU gate</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26 GELU \u95e8\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
  "<h3>FFN with Gated Linear Units</h3>\n<p><span translate=no>_^_0_^_</span> </p>\n": "<h3>\u5e26\u95e8\u63a7\u7ebf\u6027\u5355\u5143\u7684 FFN</h3>\n<p><span translate=no>_^_0_^_</span></p>\n",
 
@@ -1,5 +1,5 @@
 {
- "<h1>Position-wise Feed-Forward Network (FFN)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of position-wise feedforward network used in transformer.</p>\n<p>FFN consists of two fully connected layers. Number of dimensions in the hidden layer <span translate=no>_^_0_^_</span>, is generally set to around four times that of the token embedding <span translate=no>_^_1_^_</span>. So it is sometime also called the expand-and-contract network.</p>\n<p>There is an activation at the hidden layer, which is usually set to ReLU (Rectified Linear Unit) activation, <span translate=no>_^_2_^_</span></p>\n<p>That is, the FFN function is, <span translate=no>_^_3_^_</span> where <span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>, <span translate=no>_^_6_^_</span> and <span translate=no>_^_7_^_</span> are learnable parameters.</p>\n<p>Sometimes the GELU (Gaussian Error Linear Unit) activation is also used instead of ReLU. <span translate=no>_^_8_^_</span> where <span translate=no>_^_9_^_</span></p>\n<h3>Gated Linear Units</h3>\n<p>This is a generic implementation that supports different variants including <a href=\"https://arxiv.org/abs/2002.05202\">Gated Linear Units</a> (GLU). 
We have also implemented experiments on these:</p>\n<ul><li><a href=\"glu_variants/experiment.html\">experiment that uses <span translate=no>_^_10_^_</span></a> </li>\n<li><a href=\"glu_variants/simple.html\">simpler version from scratch</a></li></ul>\n": "<h1>\u4f4d\u7f6e\u524d\u9988\u7f51\u7edc （FFN）</h1>\n<p>\u8fd9\u662f Transformer \u4e2d\u4f7f\u7528\u7684\u4f4d\u7f6e\u524d\u9988\u7f51\u7edc\u7684 <a href=\"https://pytorch.org\"> PyTorch </a> \u5b9e\u73b0\u3002</p>\n<p> FFN \u7531\u4e24\u4e2a\u5168\u8fde\u63a5\u5c42\u7ec4\u6210\u3002\u9690\u85cf\u5c42\u4e2d\u7684\u7ef4\u5ea6\u6570<span translate=no>_%5e_0_%5e_</span>\u901a\u5e38\u8bbe\u7f6e\u4e3a\u6807\u8bb0\u5d4c\u5165\u7ef4\u5ea6<span translate=no>_%5e_1_%5e_</span>\u7684\u56db\u500d\u5de6\u53f3\u3002\u56e0\u6b64\uff0c\u5b83\u6709\u65f6\u4e5f\u88ab\u79f0\u4e3a\u6269\u5f20-\u538b\u7f29\u7f51\u7edc\u3002</p>\n<p>\u9690\u85cf\u5c42\u6709\u4e00\u4e2a\u6fc0\u6d3b\u51fd\u6570\uff0c\u901a\u5e38\u8bbe\u7f6e\u4e3a ReLU (Rectified Linear Unit) \u6fc0\u6d3b\u51fd\u6570\uff0c<span translate=no>_%5e_2_%5e_</span></p>\n<p>\u5728\u6b64\u57fa\u7840\u4e0a\uff0c FFN \u51fd\u6570\u53ef\u4ee5\u5199\u4f5c\uff1a<span translate=no>_%5e_3_%5e_</span>\u5176\u4e2d<span translate=no>_%5e_4_%5e_</span><span translate=no>_%5e_5_%5e_</span>\u3001<span translate=no>_%5e_6_%5e_</span>\u548c<span translate=no>_%5e_7_%5e_</span>\u662f\u53ef\u5b66\u4e60\u7684\u53c2\u6570\u3002</p>\n<p>\u6709\u65f6\u8fd8\u4f1a\u4f7f\u7528 GELU (Gaussian Error Linear Unit) \u6fc0\u6d3b\u51fd\u6570\u6765\u4ee3\u66ff ReLU \u3002<span translate=no>_%5e_8_%5e_</span>\u5176\u4e2d<span translate=no>_%5e_9_%5e_</span></p>\n<h3>\u95e8\u63a7\u7ebf\u6027\u5355\u5143</h3>\n<p>\u8fd9\u662f\u4e00\u4e2a\u901a\u7528\u5b9e\u73b0\uff0c\u652f\u6301\u5305\u62ec<a href=\"https://arxiv.org/abs/2002.05202\">\u95e8\u63a7\u7ebf\u6027\u5355\u5143(GLU)</a> 
\u5728\u5185\u7684\u4e0d\u540c\u53d8\u4f53\u3002\u6211\u4eec\u8fd8\u5bf9\u8fd9\u4e9b\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff1a</p>\n<ul><li><a href=\"glu_variants/experiment.html\">\u4f7f\u7528<span translate=no>_%5e_10_%5e_</span></a>\u7684\u5b9e\u9a8c</li>\n<li><a href=\"glu_variants/simple.html\">\u4ece\u5934\u5f00\u59cb\u7684\u7b80\u5316\u7248\u672c</a></li></ul>\n",
+ "<h1>Position-wise Feed-Forward Network (FFN)</h1>\n<p>This is a <a href=\"https://pytorch.org\">PyTorch</a> implementation of position-wise feedforward network used in transformer.</p>\n<p>FFN consists of two fully connected layers. Number of dimensions in the hidden layer <span translate=no>_^_0_^_</span>, is generally set to around four times that of the token embedding <span translate=no>_^_1_^_</span>. So it is sometime also called the expand-and-contract network.</p>\n<p>There is an activation at the hidden layer, which is usually set to ReLU (Rectified Linear Unit) activation, <span translate=no>_^_2_^_</span></p>\n<p>That is, the FFN function is, <span translate=no>_^_3_^_</span> where <span translate=no>_^_4_^_</span>, <span translate=no>_^_5_^_</span>, <span translate=no>_^_6_^_</span> and <span translate=no>_^_7_^_</span> are learnable parameters.</p>\n<p>Sometimes the GELU (Gaussian Error Linear Unit) activation is also used instead of ReLU. <span translate=no>_^_8_^_</span> where <span translate=no>_^_9_^_</span></p>\n<h3>Gated Linear Units</h3>\n<p>This is a generic implementation that supports different variants including <a href=\"https://arxiv.org/abs/2002.05202\">Gated Linear Units</a> (GLU). 
We have also implemented experiments on these:</p>\n<ul><li><a href=\"glu_variants/experiment.html\">experiment that uses <span translate=no>_^_10_^_</span></a> </li>\n<li><a href=\"glu_variants/simple.html\">simpler version from scratch</a></li></ul>\n": "<h1>\u4f4d\u7f6e\u524d\u9988\u7f51\u7edc （FFN）</h1>\n<p>\u8fd9\u662f Transformer \u4e2d\u4f7f\u7528\u7684\u4f4d\u7f6e\u524d\u9988\u7f51\u7edc\u7684 <a href=\"https://pytorch.org\"> PyTorch </a> \u5b9e\u73b0\u3002</p>\n<p> FFN \u7531\u4e24\u4e2a\u5168\u8fde\u63a5\u5c42\u7ec4\u6210\u3002\u9690\u85cf\u5c42\u4e2d\u7684\u7ef4\u5ea6\u6570<span translate=no>_^_0_^_</span>\u901a\u5e38\u8bbe\u7f6e\u4e3a\u6807\u8bb0\u5d4c\u5165\u7ef4\u5ea6<span translate=no>_^_1_^_</span>\u7684\u56db\u500d\u5de6\u53f3\u3002\u56e0\u6b64\uff0c\u5b83\u6709\u65f6\u4e5f\u88ab\u79f0\u4e3a\u6269\u5f20-\u538b\u7f29\u7f51\u7edc\u3002</p>\n<p>\u9690\u85cf\u5c42\u6709\u4e00\u4e2a\u6fc0\u6d3b\u51fd\u6570\uff0c\u901a\u5e38\u8bbe\u7f6e\u4e3a ReLU (Rectified Linear Unit) \u6fc0\u6d3b\u51fd\u6570\uff0c<span translate=no>_^_2_^_</span></p>\n<p>\u5728\u6b64\u57fa\u7840\u4e0a\uff0c FFN \u51fd\u6570\u53ef\u4ee5\u5199\u4f5c\uff1a<span translate=no>_^_3_^_</span>\u5176\u4e2d<span translate=no>_^_4_^_</span><span translate=no>_^_5_^_</span>\u3001<span translate=no>_^_6_^_</span>\u548c<span translate=no>_^_7_^_</span>\u662f\u53ef\u5b66\u4e60\u7684\u53c2\u6570\u3002</p>\n<p>\u6709\u65f6\u8fd8\u4f1a\u4f7f\u7528 GELU (Gaussian Error Linear Unit) \u6fc0\u6d3b\u51fd\u6570\u6765\u4ee3\u66ff ReLU \u3002<span translate=no>_^_8_^_</span>\u5176\u4e2d<span translate=no>_^_9_^_</span></p>\n<h3>\u95e8\u63a7\u7ebf\u6027\u5355\u5143</h3>\n<p>\u8fd9\u662f\u4e00\u4e2a\u901a\u7528\u5b9e\u73b0\uff0c\u652f\u6301\u5305\u62ec<a href=\"https://arxiv.org/abs/2002.05202\">\u95e8\u63a7\u7ebf\u6027\u5355\u5143(GLU)</a> \u5728\u5185\u7684\u4e0d\u540c\u53d8\u4f53\u3002\u6211\u4eec\u8fd8\u5bf9\u8fd9\u4e9b\u8fdb\u884c\u4e86\u5b9e\u9a8c\uff1a</p>\n<ul><li><a 
href=\"glu_variants/experiment.html\">\u4f7f\u7528<span translate=no>_^_10_^_</span></a>\u7684\u5b9e\u9a8c</li>\n<li><a href=\"glu_variants/simple.html\">\u4ece\u5934\u5f00\u59cb\u7684\u7b80\u5316\u7248\u672c</a></li></ul>\n",
  "<h2>FFN module</h2>\n": "<h2>FFN \u6a21\u5757</h2>\n",
  "<p><span translate=no>_^_0_^_</span> </p>\n": "<p><span translate=no>_^_0_^_</span></p>\n",
  "<p><span translate=no>_^_0_^_</span> or <span translate=no>_^_1_^_</span> depending on whether it is gated </p>\n": "<p>\u6839\u636e\u662f\u5426\u8fdb\u884c\u95e8\u63a7\uff0c\u8fd4\u56de<span translate=no>_^_0_^_</span>\u6216\u8005<span translate=no>_^_1_^_</span></p>\n",
 
@@ -1,7 +1,7 @@
 {
  "<h1>Label Smoothing Loss</h1>\n": "<h1>\u6807\u7b7e\u5e73\u6ed1\u635f\u5931</h1>\n",
  "<p>Show the target distributions expected by the system. </p>\n": "<p>\u5c55\u793a\u7cfb\u7edf\u671f\u671b\u7684\u76ee\u6807\u5206\u5e03\u3002</p>\n",
- "<p>print(predict) </p>\n": "<p>\u8f93\u51fa\uff08\u9884\u6d4b\uff09</p>\n",
+ "<p>print(predict) </p>\n": "<p>\u8f93\u51fa\u9884\u6d4b\u503c</p>\n",
  "Label Smoothing Loss": "\u6807\u7b7e\u5e73\u6ed1\u635f\u5931",
  "This is an implementation of label smoothing loss, that can be used as an alternative to cross entropy loss for improved accuracy.": "\u8fd9\u662f\u6807\u7b7e\u5e73\u6ed1\u635f\u5931\u7684\u5b9e\u73b0\uff0c\u53ef\u4f5c\u4e3a\u4ea4\u53c9\u71b5\u635f\u5931\u7684\u66ff\u4ee3\u54c1\u4ee5\u63d0\u9ad8\u51c6\u786e\u6027\u3002"
 }
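The `feed_forward.zh.json` hunk above replaces percent-encoded placeholders (`_%5e_0_%5e_`) in the translated value with the literal form used in the source key (`_^_0_^_`). A check along the following lines could flag such entries before they are committed. This is only a sketch: the helper names `placeholders`, `find_broken_entries`, and `repair` are hypothetical, and the placeholder pattern is inferred from the entries in this diff rather than taken from the project's tooling.

```python
import re

# Placeholder tokens as they appear in the cache entries above, e.g. _^_0_^_.
# The pattern is an assumption inferred from this diff.
PLACEHOLDER = re.compile(r"_\^_(\d+)_\^_")


def placeholders(text):
    """Return the placeholder indices found in text, sorted numerically."""
    return sorted(PLACEHOLDER.findall(text), key=int)


def find_broken_entries(cache):
    """Return source keys whose translation lost or mangled a placeholder."""
    return [src for src, dst in cache.items()
            if placeholders(src) != placeholders(dst)]


def repair(translation):
    """Decode percent-encoded carets (%5e or %5E) back to a literal '^'."""
    return re.sub(r"%5[eE]", "^", translation)


if __name__ == "__main__":
    # Tiny hand-made cache: the first value has encoded placeholders.
    cache = {
        "hidden size _^_0_^_ vs _^_1_^_": "x _%5e_0_%5e_ y _%5e_1_%5e_",
        "<p>_^_0_^_</p>": "<p>_^_0_^_</p>",
    }
    for key in find_broken_entries(cache):
        print("broken:", key, "->", repair(cache[key]))
```

Comparing placeholder indices between key and value, rather than searching only for `%5e`, would also catch a translation that drops or renumbers a placeholder.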