Commit 5e47475

update
1 parent ae67a00 commit 5e47475

2 files changed: 2 additions and 2 deletions

_posts/2025-09-05-bpe-trainer-0.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ tags:
 - bpe tokenizer
 ---

-本系列文章完成Stanford CS336作业1的一个子任务——实现BPE Tokenizer的高效训练算法。通过一系列优化,我们的算法在OpenWebText上的训练时间从最初的10多个小时优化到小于15分钟。本系列文章解释这一系列优化过程,包括:算法的优化,数据结构的优化,并行(openmp)优化,cython优化,用c++实现关键代码和c++库的cython集成等内容。本文是第一篇,内容包括这个任务的介绍,获取源代码和设置开发环境。
+本系列文章完成Stanford CS336作业1的一个子任务——实现BPE Tokenizer的高效训练算法。通过一系列优化,我们的算法在OpenWebText上的训练时间从最初的10多个小时优化到小于10分钟。本系列文章解释这一系列优化过程,包括:算法的优化,数据结构的优化,并行(openmp)优化,cython优化,用c++实现关键代码和c++库的cython集成等内容。本文是第一篇,内容包括这个任务的介绍,获取源代码和设置开发环境。

 <!--more-->

_posts/2025-09-05-bpe-trainer-0_en.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ tags:
 - bpe tokenizer
 ---

-This series of articles implements a subtask of Stanford's CS336 Assignment 1: building an efficient training algorithm for a BPE Tokenizer. Through a series of optimizations, our algorithm's training time on OpenWebText was reduced from over 10 hours to less than 15 minutes. This series explains these optimizations, including algorithmic improvements, data structure enhancements, parallelization with OpenMP, Cython optimization, and implementing key code in C++ along with its integration via Cython. This first article covers the task's introduction, how to get the source code, and how to set up the development environment.
+This series of articles implements a subtask of Stanford's CS336 Assignment 1: building an efficient training algorithm for a BPE Tokenizer. Through a series of optimizations, our algorithm's training time on OpenWebText was reduced from over 10 hours to less than 10 minutes. This series explains these optimizations, including algorithmic improvements, data structure enhancements, parallelization with OpenMP, Cython optimization, and implementing key code in C++ along with its integration via Cython. This first article covers the task's introduction, how to get the source code, and how to set up the development environment.

 <!--more-->
