Commit abde8f5

finish bpe series
1 parent f074c48 commit abde8f5

24 files changed: 1854 additions, 12 deletions

_posts/2025-09-05-bpe-trainer-0.md

Lines changed: 14 additions & 1 deletion
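@@ -298,4 +298,17 @@ gunzip owt_valid.txt.gz
 cd ..
 ```

-
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2/) Implements incremental updates for pair_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6/) Uses OpenMP to find the max pair in pair_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered_map](/2025/09/18/bpe-trainer-7/) Uses flat hashmap to replace std::unordered_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8/) Implements a fine-grained update algorithm for pair_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11/) Wraps C++ code using Cython.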

_posts/2025-09-05-bpe-trainer-0_en.md

Lines changed: 14 additions & 1 deletion
@@ -297,4 +297,17 @@ gunzip owt_valid.txt.gz
 cd ..
 ```

-
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0_en/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1_en/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2_en/) Implements incremental updates for pair\_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3_en/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4_en/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5_en/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered\_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6_en/) Uses OpenMP to find the max pair in pair\_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered\_map](/2025/09/18/bpe-trainer-7_en/) Uses flat hashmap to replace std::unordered\_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8_en/) Implements a fine-grained update algorithm for pair\_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9_en/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10_en/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11_en/) Wraps C++ code using Cython.
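
The series index above talks about `pair_counts` and a "max pair" without showing either. As a rough illustration only, and not the code from the posts, one BPE training step could look like the sketch below: count adjacent symbol pairs weighted by word frequency, pick the most frequent pair, and merge it everywhere. The helper names (`count_pairs`, `merge_pair`, `train_step`), the toy word table, and the tie-break rule are all assumptions made for this example.

```python
from collections import Counter


def count_pairs(word_freqs):
    """Count adjacent symbol pairs, weighted by how often each word occurs."""
    pair_counts = Counter()
    for symbols, freq in word_freqs.items():
        for left, right in zip(symbols, symbols[1:]):
            pair_counts[(left, right)] += freq
    return pair_counts


def merge_pair(symbols, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_step(word_freqs):
    """Run one merge: recount all pairs, take the max, rewrite every word."""
    pair_counts = count_pairs(word_freqs)
    # Assumed tie-break for this sketch: among equal counts, prefer the
    # lexicographically larger pair so the result is deterministic.
    best = max(pair_counts, key=lambda p: (pair_counts[p], p))
    new_freqs = Counter()
    for symbols, freq in word_freqs.items():
        new_freqs[merge_pair(symbols, best)] += freq
    return best, dict(new_freqs)


# Hypothetical toy data: words as tuples of single-character symbols.
word_freqs = {
    ("l", "o", "w"): 5,
    ("l", "o", "w", "e", "r"): 2,
    ("n", "e", "w"): 3,
}
best, word_freqs_after = train_step(word_freqs)
```

Recounting every pair on each step is the expensive part of this naive version; the incremental and fine-grained updates named in Parts 2 and 8 are aimed at avoiding that full recount.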

_posts/2025-09-07-bpe-trainer-1.md

Lines changed: 14 additions & 1 deletion
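@@ -403,4 +403,17 @@ total train time: 2187.20 seconds
 Version | Data | Total Time (s) | Count Time (s) | Merge Time (s) | Other
 bpe_v1_time | tinystory |2187/2264/2247|622/642/628| count_pair:944/995/984 max:167/173/174/174 update:453/453/460 |

-
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2/) Implements incremental updates for pair_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6/) Uses OpenMP to find the max pair in pair_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered_map](/2025/09/18/bpe-trainer-7/) Uses flat hashmap to replace std::unordered_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8/) Implements a fine-grained update algorithm for pair_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11/) Wraps C++ code using Cython.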

_posts/2025-09-07-bpe-trainer-1_en.md

Lines changed: 15 additions & 0 deletions
@@ -395,3 +395,18 @@ For easy comparison, each article will summarize all test results up to the curr
 | Version | Data | Total Time (s) | Count Time (s) | Merge Time (s) | Other |
 |---|---|---|---|---|---|
 | bpe\_v1\_time | tinystory | 2187/2264/2247 | 622/642/628 | count\_pair: 944/995/984 <br> max: 167/173/174/174 <br> update: 453/453/460 | |
+
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0_en/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1_en/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2_en/) Implements incremental updates for pair\_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3_en/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4_en/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5_en/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered\_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6_en/) Uses OpenMP to find the max pair in pair\_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered\_map](/2025/09/18/bpe-trainer-7_en/) Uses flat hashmap to replace std::unordered\_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8_en/) Implements a fine-grained update algorithm for pair\_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9_en/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10_en/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11_en/) Wraps C++ code using Cython.

_posts/2025-09-08-bpe-trainer-2.md

Lines changed: 16 additions & 0 deletions
@@ -504,3 +504,19 @@ total train time: 35358.17 seconds
 bpe_v1_time | tinystory |2187/2264/2247|622/642/628| count_pair:944/995/984 max:167/173/174/174 update:453/453/460 |
 bpe_v2_time | tinystory |757/738/746|639/621/627| 118/117/118 |
 bpe_v2_time | openweb |35358/34265/35687|2870/2949/2930| 32437/31264/32708 |
+
+
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2/) Implements incremental updates for pair_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6/) Uses OpenMP to find the max pair in pair_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered_map](/2025/09/18/bpe-trainer-7/) Uses flat hashmap to replace std::unordered_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8/) Implements a fine-grained update algorithm for pair_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11/) Wraps C++ code using Cython.

_posts/2025-09-08-bpe-trainer-2_en.md

Lines changed: 15 additions & 0 deletions
@@ -497,3 +497,18 @@ The total training time is 35,358 seconds. The `_pretokenize_and_count` step too
 | bpe\_v2\_time | tinystory | 757/738/746 | 639/621/627 | 118/117/118 | |
 | bpe\_v2\_time | openweb | 35358/34265/35687 | 2870/2949/2930 | 32437/31264/32708 | |

+
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0_en/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1_en/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2_en/) Implements incremental updates for pair\_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3_en/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4_en/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5_en/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered\_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6_en/) Uses OpenMP to find the max pair in pair\_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered\_map](/2025/09/18/bpe-trainer-7_en/) Uses flat hashmap to replace std::unordered\_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8_en/) Implements a fine-grained update algorithm for pair\_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9_en/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10_en/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11_en/) Wraps C++ code using Cython.

_posts/2025-09-09-bpe-trainer-3.md

Lines changed: 16 additions & 0 deletions
@@ -711,3 +711,19 @@ bpe_v3_time | openweb | |120/130/130| | num_counter=64, num_merger=4
 bpe_v3_bytes_time | openweb | |80/90/80| | num_counter=64, num_merger=8, chunk_size 1mb
 bpe_v3_bytes_time | openweb | |70/70/80| | num_counter=64, num_merger=8, chunk_size 4mb
 bpe_v3_bytes_time | openweb | |70/70/70| | num_counter=64, num_merger=8, chunk_size 8mb
+
+
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2/) Implements incremental updates for pair_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6/) Uses OpenMP to find the max pair in pair_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered_map](/2025/09/18/bpe-trainer-7/) Uses flat hashmap to replace std::unordered_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8/) Implements a fine-grained update algorithm for pair_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11/) Wraps C++ code using Cython.

_posts/2025-09-09-bpe-trainer-3_en.md

Lines changed: 14 additions & 0 deletions
@@ -698,3 +698,17 @@ bpe\_v3\_bytes\_time| openweb | | 80/90/80 |
 bpe\_v3\_bytes\_time| openweb | | 70/70/80 | | num\_counter=64, num\_merger=8, chunk\_size 4mb
 bpe\_v3\_bytes\_time| openweb | | 70/70/70 | | num\_counter=64, num\_merger=8, chunk\_size 8mb

+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0_en/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1_en/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2_en/) Implements incremental updates for pair\_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3_en/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4_en/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5_en/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered\_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6_en/) Uses OpenMP to find the max pair in pair\_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered\_map](/2025/09/18/bpe-trainer-7_en/) Uses flat hashmap to replace std::unordered\_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8_en/) Implements a fine-grained update algorithm for pair\_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9_en/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10_en/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11_en/) Wraps C++ code using Cython.

_posts/2025-09-10-bpe-trainer-4.md

Lines changed: 14 additions & 1 deletion
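@@ -558,4 +558,17 @@ if __name__ == '__main__':

 This exploration shows that Python's multiprocessing is not suitable for computations in which I/O and CPU work overlap like this. Using multiple CPUs does reduce the time spent computing max (compute_time), but the overhead of inter-process communication is so large that the result is a net loss. This scenario calls for multithreading, because threads in the same process share memory and therefore avoid inter-process communication overhead. However, because of CPython's GIL, we cannot use multithreading to solve this problem. So in what follows we temporarily turn to C++ and try to parallelize the max function with multithreading.

-
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2/) Implements incremental updates for pair_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6/) Uses OpenMP to find the max pair in pair_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered_map](/2025/09/18/bpe-trainer-7/) Uses flat hashmap to replace std::unordered_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8/) Implements a fine-grained update algorithm for pair_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11/) Wraps C++ code using Cython.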

_posts/2025-09-10-bpe-trainer-4_en.md

Lines changed: 16 additions & 0 deletions
@@ -547,3 +547,19 @@ With the `fork` method, the total time was 1.07s and the subprocess traversal ti
 This exploration shows that Python's `multiprocessing` is not suitable for tasks with frequent I/O and CPU overlap. While using multiple CPUs does reduce the `compute_time` for the `max` operation, the large overhead of inter-process communication makes it counterproductive. This type of problem is better suited for multithreading, where threads within the same process can share memory, thus avoiding IPC overhead. However, due to Python's Global Interpreter Lock (GIL), we cannot use multithreading to solve this.

 Therefore, we will now turn to C++ to attempt to parallelize the `max` function using multithreading.
+
+
+## Full Series
+
+* [Part 0: Introduction](/2025/09/05/bpe-trainer-0_en/) Introduces the basic BPE training algorithm and related tasks, as well as the development environment.
+* [Part 1: The Simplest Implementation](/2025/09/07/bpe-trainer-1_en/) The simplest implementation of BPE training.
+* [Part 2: Optimized Algorithm](/2025/09/08/bpe-trainer-2_en/) Implements incremental updates for pair\_counts.
+* [Part 3: Parallel Tokenization and Frequency Counting](/2025/09/09/bpe-trainer-3_en/) Uses multiprocessing to implement a multi-process parallel algorithm.
+* [Part 4: A Failed Parallel Optimization](/2025/09/10/bpe-trainer-4_en/) An attempt to parallelize the max pair calculation using multiple processes.
+* [Part 5: Implementing the Merge Algorithm in C++](/2025/09/12/bpe-trainer-5_en/) Implements a C++ merge algorithm equivalent to the Python version, and compares two ways of iterating through std::unordered\_map.
+* [Part 6: Parallelizing the Max Pair Search with OpenMP](/2025/09/15/bpe-trainer-6_en/) Uses OpenMP to find the max pair in pair\_counts in parallel.
+* [Part 7: Using Flat Hashmap to Replace std::unordered\_map](/2025/09/18/bpe-trainer-7_en/) Uses flat hashmap to replace std::unordered\_map.
+* [Part 8: Implementing Fine-Grained Updates](/2025/09/19/bpe-trainer-8_en/) Implements a fine-grained update algorithm for pair\_counts using an inverted index.
+* [Part 9: Using a Heap to Find the Max Pair](/2025/09/21/bpe-trainer-9_en/) Uses a heap to find the max pair and improve performance.
+* [Part 10: Using Cython and PyPy for Acceleration](/2025/09/24/bpe-trainer-10_en/) Uses Cython and PyPy to accelerate Python code.
+* [Part 11: Wrapping C++ Code with Cython](/2025/09/25/bpe-trainer-11_en/) Wraps C++ code using Cython.
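
The context paragraph in the `bpe-trainer-4_en.md` hunk above argues that the max-pair search wants shared-memory threads rather than processes. The sketch below shows the shape of that idea: split the `(pair, count)` items into chunks, take a local max per chunk in a thread pool, then reduce the partial results. It is an illustration under assumptions, not the author's code: `chunked`, `local_max`, `parallel_max`, the tie-break rule, and the toy data are hypothetical. On a standard CPython build with the GIL, this is not expected to run faster than a plain `max`, which is the limitation the paragraph describes.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, num_chunks):
    """Split a list into at most `num_chunks` contiguous slices."""
    size = max(1, (len(items) + num_chunks - 1) // num_chunks)
    return [items[i:i + size] for i in range(0, len(items), size)]


def local_max(chunk):
    """Return the (pair, count) entry with the highest count in one chunk.

    Assumed tie-break: among equal counts, the lexicographically larger pair.
    """
    return max(chunk, key=lambda kv: (kv[1], kv[0]))


def parallel_max(pair_counts, num_workers=4):
    """Chunk-then-reduce max over a non-empty dict of pair counts."""
    items = list(pair_counts.items())
    # Threads read the same in-memory list, so nothing is pickled or copied
    # between workers, unlike a process pool.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial = list(pool.map(local_max, chunked(items, num_workers)))
    # Final reduction over one candidate per chunk.
    return max(partial, key=lambda kv: (kv[1], kv[0]))


# Hypothetical toy counts.
pair_counts = {("a", "b"): 3, ("b", "c"): 9, ("c", "d"): 9, ("d", "e"): 1}
best_pair, best_count = parallel_max(pair_counts)
```

The same chunk-then-reduce structure carries over to a language without a GIL, where the per-chunk scans can actually run concurrently.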
