Describe the bug
The model_tokens parameter in the graph_config dictionary is not applied to the Ollama model used by SmartScraperGraph. Even with model_tokens set to 128000, the run emits a warning that the token sequence length exceeds the model's limit (2231 > 1024), which leads to indexing errors.
To Reproduce
Steps to reproduce the behavior:
- Set up a SmartScraperGraph using the code below.
- Configure the graph_config dictionary, specifying model_tokens: 128000 under the "llm" section.
- Run the scraper with smart_scraper_graph.run().
- Observe the error regarding token sequence length.
Expected behavior
The model_tokens parameter should be applied to the Ollama model, so that the model respects the specified 128000-token context length without raising indexing errors.
Code
from scrapegraphai.graphs import SmartScraperGraph

ollama_base_url = 'http://localhost:11434'

graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "temperature": 1,
        "format": "json",
        "model_tokens": 128000,
        "base_url": ollama_base_url
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": ollama_base_url
    },
}

smart_scraper_graph = SmartScraperGraph(
    prompt='What is this website about?',
    source="my.example.com",
    config=graph_config
)

result = smart_scraper_graph.run()
print(result)

Error Message
Token indices sequence length is longer than the specified maximum sequence length for this model (2231 > 1024). Running this sequence through the model will result in indexing errors.

Desktop:
- OS: Ubuntu 22.04.5 LTS
- Browser: Chromium
- Version:
- Python 3.12.7
- scrapegraphai 1.26.7
- Torch 2.5 (Torch should not be necessary as Ollama is being used)
Additional context: Ollama sets its context window through the num_ctx parameter. It appears that model_tokens does not influence the model's context length, which suggests either an oversight in how SmartScraperGraph maps token-length parameters onto Ollama models or a misconfiguration on my side.
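For reference, below is a minimal sketch of setting the context window directly through LangChain's ChatOllama wrapper, which exposes num_ctx. This assumes the langchain_ollama package is installed; it only checks that the local Ollama model accepts a larger context window and does not go through SmartScraperGraph at all.

from langchain_ollama import ChatOllama

# Sketch only: num_ctx is Ollama's native context-window setting.
# The model name, base_url, and value mirror the graph_config above.
llm = ChatOllama(
    model="mistral",
    base_url="http://localhost:11434",
    num_ctx=128000,
)

# If Ollama honors num_ctx here, the limitation would seem to be in how
# SmartScraperGraph translates model_tokens rather than in Ollama itself.
print(llm.invoke("Summarize what num_ctx controls in one sentence.").content)

If that works, a possible (unverified) workaround might be to also pass num_ctx inside the "llm" section of graph_config, in case extra keys are forwarded to the underlying client, though I have not confirmed that ScrapeGraphAI does this.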
Thank you for taking the time to look into this issue! I appreciate any guidance or suggestions you can provide to help resolve this problem. Your assistance means a lot, and I'm looking forward to any insights you might have on how to apply the model_tokens parameter correctly with Ollama. Thanks again for your help!