@yusefes yusefes commented Oct 17, 2024

Fixes #752

Fix the issue with loading the tokenizer for 'gpt2'.

* **scrapegraphai/utils/tokenizer.py**
  - Add a check for `GPT2TokenizerFast` in the `num_tokens_calculus` function.
  - Import `GPT2TokenizerFast` from `transformers`.

* **scrapegraphai/utils/tokenizers/tokenizer_ollama.py**
  - Modify the `num_tokens_ollama` function to handle `GPT2TokenizerFast`.

* **tests/graphs/smart_scraper_ollama_test.py**
  - Add a test case to verify the tokenizer loading for `GPT2TokenizerFast`.
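A minimal sketch of the kind of type check described above, not the code merged in this PR. The helper names `is_gpt2_fast` and `count_tokens` are hypothetical, and the whitespace fallback is an assumption made only so the example runs without `transformers` installed. The only real API used is `GPT2TokenizerFast` and its `encode` method.

```python
from typing import Any

try:
    # Real class from Hugging Face transformers.
    from transformers import GPT2TokenizerFast
except ImportError:
    # transformers is not installed, so the GPT-2 branch below is skipped.
    GPT2TokenizerFast = None


def is_gpt2_fast(tokenizer: Any) -> bool:
    """Hypothetical helper: True only for a GPT2TokenizerFast instance."""
    return GPT2TokenizerFast is not None and isinstance(tokenizer, GPT2TokenizerFast)


def count_tokens(text: str, tokenizer: Any = None) -> int:
    """Hypothetical helper: count tokens, with a special case for GPT-2."""
    if is_gpt2_fast(tokenizer):
        # encode() returns a list of token ids; its length is the token count.
        return len(tokenizer.encode(text))
    # Assumed fallback for this sketch only: a rough whitespace word count.
    return len(text.split())
```

Guarding the import this way keeps the module importable when `transformers` is absent, at the cost of a less accurate count in that case.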

---

For more details, open the [Copilot Workspace session](https://copilot-workspace.githubnext.com/ScrapeGraphAI/Scrapegraph-ai/issues/752?shareId=XXXX-XXXX-XXXX-XXXX).
Collaborator

thank you for the test

Collaborator

never seen before thank you

@VinciGit00 VinciGit00 merged commit bde1e0f into ScrapeGraphAI:main Oct 18, 2024
3 checks passed
@github-actions

🎉 This PR is included in version 1.26.6 🎉

The release is available on:

Your semantic-release bot 📦🚀

@VinciGit00
Collaborator

Hi
I think there is a problem here, I do not wanna install TensorFlow or PyTorch btw
Screenshot 2024-10-18 alle 17 35 58

@github-actions

🎉 This PR is included in version 1.27.0-beta.2 🎉

The release is available on:

Your semantic-release bot 📦🚀

Development

Successfully merging this pull request may close these issues.

Can't load tokenizer for 'gpt2'
