labmlai
diff --git a/docs/sitemap.xml b/docs/sitemap.xml
Lines changed: 1 addition & 1 deletion
diff --git a/docs/zh/index.html b/docs/zh/index.html
Lines changed: 2 additions & 2 deletions
diff --git a/docs/zh/sitemap.xml b/docs/zh/sitemap.xml
Lines changed: 1 addition & 1 deletion
diff --git a/docs/zh/transformers/configs.html b/docs/zh/transformers/configs.html
Lines changed: 42 additions & 42 deletions
diff --git a/docs/zh/transformers/feed_forward.html b/docs/zh/transformers/feed_forward.html
Lines changed: 28 additions & 29 deletions
diff --git a/docs/zh/transformers/index.html b/docs/zh/transformers/index.html
Lines changed: 47 additions & 47 deletions
diff --git a/docs/zh/transformers/label_smoothing_loss.html b/docs/zh/transformers/label_smoothing_loss.html
Lines changed: 5 additions & 5 deletions
@@ -1450,7 +1450,7 @@
 
     <url>
       <loc>https://nn.labml.ai/rl/ppo/gae.html</loc>
-      <lastmod>2023-10-24T16:30:00+00:00</lastmod>
+      <lastmod>2024-06-24T16:30:00+00:00</lastmod>
       <priority>1.00</priority>
     </url>
 
 
@@ -72,7 +72,7 @@
             <h1><a href="index.html">labml.ai 带注释的 PyTorch 版论文实现</a></h1>
 <p>这是一个用 PyTorch 实现各种神经网络和相关算法的集合。每个算法的<a href="https://github.com/labmlai/annotated_deep_learning_paper_implementations">代码实现</a>都有详细的解释说明，且在<a href="index.html">网站</a>上与代码逐行对应。我们相信，这些内容将帮助您更好地理解这些算法。</p>
 <p><img alt="Screenshot" src="dqn-light.png"></p>
-<p>我们正在积极维护这个仓库并添加新的代码实现<a href="https://twitter.com/labmlai"><img alt="Twitter" src="https://img.shields.io/twitter/follow/labmlai?style=social"></a>以获取更新。</p>
+<p>我们正在积极维护这个仓库并添加新的代码实现。<a href="https://twitter.com/labmlai"><img alt="Twitter" src="https://img.shields.io/twitter/follow/labmlai?style=social"></a>以获取更新。</p>
 <h2>翻译</h2>
 <h3><strong><a href="https://nn.labml.ai">英语（原版）</a></strong></h3>
 </a><h3><strong><a href="https://nn.labml.ai/zh/">中文（翻译）</strong></h3>
@@ -102,7 +102,7 @@ <h4>✨ <a href="transformers/index.html">Transformers</a></h4>
 <li><a href="transformers/primer_ez/index.html">Primer</a></li>
 <li><a href="transformers/hour_glass/index.html">沙漏网络</a></li></ul>
 <h4>✨ <a href="neox/index.html">Eleuther GPT-neox</a></h4>
-<li><a href="neox/samples/generate.html">在一块 48GB GPU 上生成</a></li> <ul>
+<ul><li><a href="neox/samples/generate.html">在一块 48GB GPU 上生成</a></li> 
 <li><a href="neox/samples/finetune.html">在两块 48GB GPU 上微调</a></li>
 <li><a href="neox/utils/llm_int8.html">llm.int8 ()</a></li></ul>
 <h4>✨ <a href="diffusion/index.html">扩散模型</a></h4>
 
@@ -1450,7 +1450,7 @@
 
     <url>
       <loc>https://nn.labml.ai/rl/ppo/gae.html</loc>
-      <lastmod>2023-10-24T16:30:00+00:00</lastmod>
+      <lastmod>2024-06-24T16:30:00+00:00</lastmod>
       <priority>1.00</priority>
     </url>
 
 
@@ -3,24 +3,24 @@
 <head>
     <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
     <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
-    <meta name="description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合。"/>
+    <meta name="description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集。"/>
 
     <meta name="twitter:card" content="summary"/>
     <meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
-    <meta name="twitter:title" content="变压器"/>
-    <meta name="twitter:description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合。"/>
+    <meta name="twitter:title" content="Transformers"/>
+    <meta name="twitter:description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集。"/>
     <meta name="twitter:site" content="@labmlai"/>
     <meta name="twitter:creator" content="@labmlai"/>
 
     <meta property="og:url" content="https://nn.labml.ai/transformers/index.html"/>
-    <meta property="og:title" content="变压器"/>
+    <meta property="og:title" content="Transformers"/>
     <meta property="og:image" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
-    <meta property="og:site_name" content="变压器"/>
+    <meta property="og:site_name" content="Transformers"/>
     <meta property="og:type" content="object"/>
-    <meta property="og:title" content="变压器"/>
-    <meta property="og:description" content="这是变压器和相关技术的 PyTorch 实现/教程的集合。"/>
+    <meta property="og:title" content="Transformers"/>
+    <meta property="og:description" content="这是一个包含 Transformers 及相关技术的 PyTorch 实现和教程的合集。"/>
 
-    <title>变压器</title>
+    <title>Transformers</title>
     <link rel="shortcut icon" href="/icon.png"/>
     <link rel="stylesheet" href="../pylit.css?v=1">
     <link rel="canonical" href="https://nn.labml.ai/transformers/index.html"/>
@@ -70,50 +70,50 @@
             <div class='section-link'>
                 <a href='#section-0'>#</a>
             </div>
-            <h1>变压器</h1>
-</a><p>本模块包含 <a href="https://pytorch.org/">PyTorch 实现和论文 Attention Is <a href="https://arxiv.org/abs/1706.03762">All You Need</a> 中对原创变压器的解释，以及它的衍生品和增强功能。</p>
-<ul><li><a href="mha.html">多头关注</a></li>
-<li><a href="models.html">变压器编码器和解码器型号</a></li>
+            <h1>Transformers</h1>
+</a><p>本节内容包含对论文<a href="https://arxiv.org/abs/1706.03762">《 Attention is All You Need 》</a>中原始 Transformer 的解释与<a href="https://pytorch.org/">PyTorch</a> 实现，以及对其衍生和增强版本的解释与实现。</p>
+<ul><li><a href="mha.html">多头注意力</a></li>
+<li><a href="models.html">Transformer 编码器和解码器模型</a></li>
 <li><a href="feed_forward.html">位置前馈网络 (FFN)</a></li>
 <li><a href="positional_encoding.html">固定位置编码</a></li></ul>
-<h2><a href="xl/index.html">变压器 XL</a></h2>
-<p>这使用<a href="xl/relative_mha.html">相对的多头注意力</a>实现了变形金刚 XL 模型</p>
-<h2><a href="rope/index.html">旋转位置嵌入</a></h2>
-<p>这实现了旋转位置嵌入 (roPE)</p>
-<h2><a href="alibi/index.html">注意线性偏差</a></h2>
-<p>这实现了线性偏差注意力（AliBI）。</p>
-<h2><a href="retro/index.html">复古</a></h2>
-<p>这实现了检索增强型转换器（RETRO）。</p>
-<h2><a href="compressive/index.html">压缩变压器</a></h2>
-<p>这是一种压缩变压器的实现，它通过压缩最古老的存储<a href="xl/index.html">器来延长注意力跨度，从而在Transformer XL</a> 上扩展。</p>
+<h2><a href="xl/index.html">Transformer XL</a></h2>
+<p>这是使用<a href="xl/relative_mha.html">相对多头注意力</a>的 Transformer XL 模型的实现。</p>
+<h2><a href="rope/index.html">旋转式位置编码</a></h2>
+<p>这是旋转式位置编码（ ROPE ）的实现。</p>
+<h2><a href="alibi/index.html">线性偏差注意力</a></h2>
+<p>这是线性偏差注意力（ ALIBI ）的实现。</p>
+<h2><a href="retro/index.html">RETRO</a></h2>
+<p>这是对检索增强 Transformer （ RETRO ）的实现。</p>
+<h2><a href="compressive/index.html">压缩 Transformer</a></h2>
+<p>这是一个压缩transformer的实现，它在<a href="xl/index.html">Transformer XL</a> 的基础上，通过压缩最早期的记忆来延长注意力跨度。</p>
 <h2><a href="gpt/index.html">GPT 架构</a></h2>
-<p>这是 GPT-2 体系结构的实现。</p>
+<p>这是 GPT-2 结构的实现。</p>
 <h2><a href="glu_variants/simple.html">GLU 变体</a></h2>
-<p>这是论文 <a href="https://arxiv.org/abs/2002.05202">GLU 变体改进变压器的</a>实现。</p>
-<h2><a href="knn/index.html">knn-lm</a></h2>
-<p>这是论文《<a href="https://arxiv.org/abs/1911.00172">通过记忆推广：最近邻语言模型</a>》的实现。</p>
-<h2><a href="feedback/index.html">反馈变压器</a></h2>
-<p>这是一篇论文《使用<a href="https://arxiv.org/abs/2002.09402">反馈存储器访问顺序变压器中的更高层次表示》的</a>实现。</p>
-<h2><a href="switch/index.html">开关变压器</a></h2>
-<p>这是论文《<a href="https://arxiv.org/abs/2101.03961">开关变压器：以简单高效的稀疏度缩放到万亿参数模型</a>》的微型实现。我们的实现只有几百万个参数，不对并行分布式训练进行建模。它进行单个 GPU 训练，但我们实现了白皮书中描述的切换概念。</p>
-<h2><a href="fast_weights/index.html">快速重量变压器</a></h2>
-<p>这是 <a href="https://arxiv.org/abs/2102.11174">PyTorch 中线性变压器是秘密的快速重量存储系统论文的</a>实现。</p>
-<h2><a href="fnet/index.html">FNet：将令牌与傅里叶变换混合</a></h2>
-<p>这是论文《<a href="https://arxiv.org/abs/2105.03824">FNet：将令牌与傅里叶变换混合</a>》的实现。</p>
-<h2><a href="aft/index.html">免注意变压器</a></h2>
-<p>这是论文《<a href="https://arxiv.org/abs/2105.14103">无注意力变压器》的</a>实现。</p>
-<h2><a href="mlm/index.html">屏蔽语言模型</a></h2>
-<p>这是在论文《B <a href="https://arxiv.org/abs/1810.04805">ERT：用于语言理解的深度双向变换器的预训练》中用于预训练的蒙面语言模型的</a>实现。</p>
-<h2><a href="mlp_mixer/index.html">MLP 混音器：面向视觉的全 MLP 架构</a></h2>
-<p>这是论文 <a href="https://arxiv.org/abs/2105.01601">MLP-Mixer：视觉的全 MLP 架构的</a>实现。</p>
-<h2><a href="gmlp/index.html">注意 MLP (gMLP)</a></h2>
-<p>这是 “<a href="https://arxiv.org/abs/2105.08050">注意 MLP” 一文的</a>实现。</p>
-<h2><a href="vit/index.html">视觉变压器 (ViT)</a></h2>
-<p>这是论文《<a href="https://arxiv.org/abs/2010.11929">图像值得 16x16 Words：大规模图像识别的变形金刚》的</a>实现。</p>
+<p>这是论文 <a href="https://arxiv.org/abs/2002.05202">《 GLU Variants Improve Transformer 》</a>的实现。</p>
+<h2><a href="knn/index.html">kNN-LM</a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/1911.00172">《 Generalization through Memorization: Nearest Neighbor Language Models 》</a>的实现。</p>
+<h2><a href="feedback/index.html">自反馈 Transformer</a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/2002.09402">《 Accessing Higher-level Representations in Sequential Transformers with Feedback Memory 》</a>的实现。</p>
+<h2><a href="switch/index.html">Switch Transformer</a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/2101.03961">《 Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity 》</a>的一个简化实现。我们的实现仅包含几百万个参数，并且只在单 GPU 上进行训练，不涉及并行分布式训练，但我们仍然实现了论文中描述的 Switch 概念。</p>
+<h2><a href="fast_weights/index.html">快速权重 Transformer</a></h2>
+<p>这是论文 <a href="https://arxiv.org/abs/2102.11174">《 Linear Transformers Are Secretly Fast Weight Memory Systems in PyTorch 》</a>的实现。</p>
+<h2><a href="fnet/index.html">Fnet：使用傅里叶变换混合 token </a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/2105.03824">《 FNet: Mixing Tokens with Fourier Transforms 》</a>的实现。</p>
+<h2><a href="aft/index.html">无注意力 Transformer</a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/2105.14103">《 An Attention Free Transformer 》</a>的实现。</p>
+<h2><a href="mlm/index.html">掩码语言模型</a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/1810.04805">《 BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding 》</a>中用于预训练的掩码语言模型的实现</p>
+<h2><a href="mlp_mixer/index.html">MLP-Mixer：一种用于视觉的全 MLP 架构</a></h2>
+<p>这是论文 <a href="https://arxiv.org/abs/2105.01601">《 MLP-Mixer: An all-MLP Architecture for Vision 》</a>的实现。</p>
+<h2><a href="gmlp/index.html">门控多层感知器 (gMLP)</a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/2105.08050">《 Pay Attention to MLPs 》</a>的实现。</p>
+<h2><a href="vit/index.html">视觉 Transformer (ViT)</a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/2010.11929">《 An Image Is Worth 16x16 Words: Transformers For Image Recognition At Scale 》</a>的实现。</p>
 <h2><a href="primer_ez/index.html">Primer</a></h2>
-<p>这是论文《入<a href="https://arxiv.org/abs/2109.08668">门：为语言建模寻找高效的变换器》的</a>实现。</p>
-<h2><a href="hour_glass/index.html">沙漏</a></h2>
-<p>这是论文《<a href="https://arxiv.org/abs/2110.13711">分层变换器是更有效的语言模型</a>》的实现</p>
+<p>这是论文<a href="https://arxiv.org/abs/2109.08668">《 Primer: Searching for Efficient Transformers for Language Modeling 》</a>的实现。</p>
+<h2><a href="hour_glass/index.html">沙漏网络</a></h2>
+<p>这是论文<a href="https://arxiv.org/abs/2110.13711">《 Hierarchical Transformers Are More Efficient Language Models 》</a>的实现</p>
 
         </div>
         <div class='code'>
 
@@ -3,12 +3,12 @@
 <head>
     <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
     <meta name="viewport" content="width=device-width, initial-scale=1.0"/>
-    <meta name="description" content="这是标签平滑损失的实现，可以用作交叉熵损失的替代方案，以提高准确性。"/>
+    <meta name="description" content="这是标签平滑损失的实现，可作为交叉熵损失的替代品以提高准确性。"/>
 
     <meta name="twitter:card" content="summary"/>
     <meta name="twitter:image:src" content="https://avatars1.githubusercontent.com/u/64068543?s=400&amp;v=4"/>
     <meta name="twitter:title" content="标签平滑损失"/>
-    <meta name="twitter:description" content="这是标签平滑损失的实现，可以用作交叉熵损失的替代方案，以提高准确性。"/>
+    <meta name="twitter:description" content="这是标签平滑损失的实现，可作为交叉熵损失的替代品以提高准确性。"/>
     <meta name="twitter:site" content="@labmlai"/>
     <meta name="twitter:creator" content="@labmlai"/>
 
@@ -18,7 +18,7 @@
     <meta property="og:site_name" content="标签平滑损失"/>
     <meta property="og:type" content="object"/>
     <meta property="og:title" content="标签平滑损失"/>
-    <meta property="og:description" content="这是标签平滑损失的实现，可以用作交叉熵损失的替代方案，以提高准确性。"/>
+    <meta property="og:description" content="这是标签平滑损失的实现，可作为交叉熵损失的替代品以提高准确性。"/>
 
     <title>标签平滑损失</title>
     <link rel="shortcut icon" href="/icon.png"/>
@@ -154,7 +154,7 @@ <h1>标签平滑损失</h1>
             <div class='section-link'>
                 <a href='#section-5'>#</a>
             </div>
-            <p>显示系统预期的目标分布。</p>
+            <p>展示系统期望的目标分布。</p>
 
         </div>
         <div class='code'>
@@ -183,7 +183,7 @@ <h1>标签平滑损失</h1>
             <div class='section-link'>
                 <a href='#section-7'>#</a>
             </div>
-            <p>打印（预测）</p>
+            <p>输出（预测）</p>
 
         </div>
         <div class='code'>