Commit d291819
Fix tokenizer loading for GPT2
Fixes #752
Fix the issue with loading the tokenizer for 'gpt2'.
* **scrapegraphai/utils/tokenizer.py**
  - Add a check for `GPT2TokenizerFast` in the `num_tokens_calculus` function.
  - Import `GPT2TokenizerFast` from `transformers`.
* **scrapegraphai/utils/tokenizers/tokenizer_ollama.py**
  - Modify the `num_tokens_ollama` function to handle `GPT2TokenizerFast`.
* **tests/graphs/smart_scraper_ollama_test.py**
  - Add a test case to verify the tokenizer loading for `GPT2TokenizerFast`.
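
The diff body is not reproduced here, so the following is only a minimal sketch of what a `GPT2TokenizerFast` type check in a token-counting helper might look like. The names `count_tokens_sketch` and `FakeModel` are hypothetical and are not taken from the repository; the whitespace fallback is likewise an assumption made so the example runs without downloading any tokenizer files.

```python
from typing import Any

try:
    # Real class from Hugging Face transformers; optional so the sketch runs without it.
    from transformers import GPT2TokenizerFast
except ImportError:
    GPT2TokenizerFast = None


def count_tokens_sketch(text: str, model_or_tokenizer: Any) -> int:
    """Hypothetical helper; not the project's num_tokens_calculus."""
    # Case 1: an already-loaded GPT2TokenizerFast. encode() returns a list of token ids.
    if GPT2TokenizerFast is not None and isinstance(model_or_tokenizer, GPT2TokenizerFast):
        return len(model_or_tokenizer.encode(text))
    # Case 2: an object exposing get_num_tokens(text), as LangChain language models do.
    if hasattr(model_or_tokenizer, "get_num_tokens"):
        return model_or_tokenizer.get_num_tokens(text)
    # Case 3 (assumption): rough whitespace count when nothing else applies.
    return len(text.split())


class FakeModel:
    """Stand-in used only to exercise the dispatch without loading a real model."""

    def get_num_tokens(self, text: str) -> int:
        return len(text.split()) + 1
```

Checking the concrete tokenizer type first keeps the model-specific path explicit, while the later branches let the helper degrade gracefully when `transformers` is not installed.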
---
For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/ScrapeGraphAI/Scrapegraph-ai/issues/752?shareId=XXXX-XXXX-XXXX-XXXX).

1 parent 488821a, commit d291819
File tree
3 files changed, +23 -1 lines changed
- scrapegraphai/utils
  - tokenizers
- tests/graphs