Commit 40f80ae

committed
update
1 parent c57a022 commit 40f80ae

1 file changed: +1 -1 lines changed

_posts/2025-09-07-bpe-trainer-1_en.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ tags:
 - bpe tokenizer
 ---
 
-This article series is a sub-project of Stanford's CS336 Assignment 1, focusing on implementing an efficient training algorithm for a BPE Tokenizer. Through a series of optimizations, we managed to reduce the training time on OpenWebText from over 10 hours to less than 10 minutes. This series explains that entire optimization process, covering: algorithm optimization, data structure optimization, parallelization with OpenMP, Cython optimization, and the implementation and Cython integration of key components in C++. This is the second article, covering the implementation of the simplest algorithm.
+This series of articles implements a subtask of Stanfords CS336 Assignment 1: building an efficient training algorithm for a BPE Tokenizer. Through a series of optimizations, our algorithm’s training time on OpenWebText was reduced from over 10 hours to less than 10 minutes. This series explains these optimizations, including algorithmic improvements, data structure enhancements, parallelization with OpenMP, Cython optimization, and implementing key code in C++ along with its integration via Cython. This is the second article, covering the implementation of the simplest algorithm.
 
 <!--more-->
