Commit 0f01dc9
2 files changed: +5 -7 lines changed
Submodule tokenizers updated 20 files
- .github/workflows/pull.yml +7
- CMakeLists.txt -24
- MANIFEST.in -39
- README.md -7
- include/pytorch/tokenizers/bpe_tokenizer_base.h +1 -2
- include/pytorch/tokenizers/error.h +75 -6
- setup.py +5 -42
- src/bpe_tokenizer_base.cpp +2 -5
- src/hf_tokenizer.cpp +49 -119
- src/llama2c_tokenizer.cpp +1 -3
- src/normalizer.cpp +2 -7
- src/pre_tokenizer.cpp +2 -13
- src/re2_regex.cpp +2 -3
- src/tekken.cpp +7 -27
- src/tiktoken.cpp +7 -31
- test/resources/hf_tokenizer_dir/special_tokens_map.json -16
- test/resources/hf_tokenizer_dir/tokenizer.json -152
- test/resources/hf_tokenizer_dir/tokenizer_config.json -42
- test/test_hf_tokenizer.cpp -15
- test/test_hf_tokenizer.py -20
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 465 | 465 | |
| 466 | 466 | |
| 467 | 467 | |
| 468 | | - |
| 469 | | - |
| 470 | | - |
| 471 | | - |
| | 468 | + |
| 472 | 469 | |
| 473 | 470 | |
| 474 | 471 | |
| | 472 | + |
| 475 | 473 | |
| 476 | 474 | |
| 477 | 475 | |
| ... | ... | ... |
| 537 | 535 | |
| 538 | 536 | |
| 539 | 537 | |
| 540 | | - |
| 541 | | - |
| | 538 | + |
| | 539 | + |
| 542 | 540 | |
| 543 | 541 | |
| 544 | 542 | |