Commit b313916

Merge pull request #74 from VinciGit00/llama3: add integration for llama3

2 parents b683854 + 8aa2cad

File tree

10 files changed: +224 additions, -11 deletions

examples/benchmarks/GenerateScraper/Readme.md

Lines changed: 7 additions & 9 deletions
The time is measured in seconds.

The models for this benchmark run on Ollama with nomic-embed-text.

| Hardware           | Model                                   | Example 1 | Example 2 |
| ------------------ | --------------------------------------- | --------- | --------- |
| Macbook 14' m1 pro | Mistral on Ollama with nomic-embed-text | 30.54s    | 35.76s    |
| Macbook m2 max     | Mistral on Ollama with nomic-embed-text |           |           |
| Macbook 14' m1 pro | Llama3 on Ollama with nomic-embed-text  | 27.82s    | 29.986s   |
| Macbook m2 max     | Llama3 on Ollama with nomic-embed-text  |           |           |

**Note**: the Docker examples were not run on devices other than the Macbook because performance is too slow (about 10x slower than Ollama).

# Performance on API services

### Example 1: personal portfolio

**URL**: https://perinim.github.io/projects
Lines changed: 62 additions & 0 deletions
```python
"""
Basic example of scraping pipeline using ScriptCreatorGraph from text
"""

import os
from dotenv import load_dotenv
from scrapegraphai.graphs import ScriptCreatorGraph
from scrapegraphai.utils import prettify_exec_info

load_dotenv()

# ************************************************
# Read the text files
# ************************************************
files = ["inputs/example_1.txt", "inputs/example_2.txt"]
tasks = ["List me all the projects with their description.",
         "List me all the articles with their description."]

# ************************************************
# Define the configuration for the graph
# ************************************************

openai_key = os.getenv("GPT4_KEY")  # loaded but unused in this local-model example

graph_config = {
    "llm": {
        "model": "ollama/llama3",
        "temperature": 0,
        # "model_tokens": 2000,  # set context length arbitrarily
        "base_url": "http://localhost:11434",  # set Ollama URL arbitrarily
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
        "base_url": "http://localhost:11434",  # set Ollama URL arbitrarily
    },
    "library": "beautifoulsoup"
}

# ************************************************
# Create the ScriptCreatorGraph instance and run it
# ************************************************

for i in range(0, 2):
    with open(files[i], 'r', encoding="utf-8") as file:
        text = file.read()

    script_creator_graph = ScriptCreatorGraph(
        prompt=tasks[i],
        source=text,
        config=graph_config
    )

    result = script_creator_graph.run()
    print(result)

    # ************************************************
    # Get graph execution info
    # ************************************************

    graph_exec_info = script_creator_graph.get_execution_info()
    print(prettify_exec_info(graph_exec_info))
```
Lines changed: 39 additions & 2 deletions
# Local models

The two websites benchmarked are:

- Example 1: https://perinim.github.io/projects
- Example 2: https://www.wired.com (as of 17/4/2024)

Both are stored locally as .txt files so the benchmark does not depend on an internet connection.

| Hardware           | Model                                   | Example 1 | Example 2 |
| ------------------ | --------------------------------------- | --------- | --------- |
| Macbook 14' m1 pro | Mistral on Ollama with nomic-embed-text | 11.60s    | 26.61s    |
| Macbook m2 max     | Mistral on Ollama with nomic-embed-text | 8.05s     | 12.17s    |
| Macbook 14' m1 pro | Llama3 on Ollama with nomic-embed-text  | 29.871s   | 35.32s    |
| Macbook m2 max     | Llama3 on Ollama with nomic-embed-text  |           |           |

**Note**: the Docker examples were not run on devices other than the Macbook because performance is too slow (about 10x slower than Ollama). The Docker results are:

| Hardware           | Example 1 | Example 2 |
| ------------------ | --------- | --------- |
| Macbook 14' m1 pro | 139.89s   | Too long  |

# Performance on API services

### Example 1: personal portfolio

**URL**: https://perinim.github.io/projects
**Task**: List me all the projects with their description.

| Name                | Execution time (seconds) | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
| ------------------- | ------------------------ | ------------ | ------------- | ----------------- | ------------------- | -------------- |
| gpt-3.5-turbo       | 25.22                    | 445          | 272           | 173               | 1                   | 0.000754       |
| gpt-4-turbo-preview | 9.53                     | 449          | 272           | 177               | 1                   | 0.00803        |

### Example 2: Wired

**URL**: https://www.wired.com
**Task**: List me all the articles with their description.

| Name                | Execution time (seconds) | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
| ------------------- | ------------------------ | ------------ | ------------- | ----------------- | ------------------- | -------------- |
| gpt-3.5-turbo       | 25.89                    | 445          | 272           | 173               | 1                   | 0.000754       |
| gpt-4-turbo-preview | 64.70                    | 3573         | 2199          | 1374              | 1                   | 0.06321        |
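The execution times in the tables above are wall-clock seconds. A minimal timing wrapper in the spirit of these benchmarks could look like this (a sketch only; the workload below is a stand-in, not one of the example scripts):

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in workload instead of a real scraping-graph run:
result, elapsed = timed(sum, range(1_000_000))
print(f"sum={result}, time={elapsed:.2f}s")
```

`time.perf_counter()` is preferred over `time.time()` here because it is monotonic and has the highest available resolution for interval measurement.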
Lines changed: 54 additions & 0 deletions
```python
"""
Basic example of scraping pipeline using SmartScraperGraph from text
"""

import os
from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

files = ["inputs/example_1.txt", "inputs/example_2.txt"]
tasks = ["List me all the projects with their description.",
         "List me all the articles with their description."]

# ************************************************
# Define the configuration for the graph
# ************************************************

graph_config = {
    "llm": {
        "model": "ollama/llama3",
        "temperature": 0,
        "format": "json",  # Ollama needs the format to be specified explicitly
        # "model_tokens": 2000,  # set context length arbitrarily
        "base_url": "http://localhost:11434",
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
        "base_url": "http://localhost:11434",
    }
}

# ************************************************
# Create the SmartScraperGraph instance and run it
# ************************************************

for i in range(0, 2):
    with open(files[i], 'r', encoding="utf-8") as file:
        text = file.read()

    smart_scraper_graph = SmartScraperGraph(
        prompt=tasks[i],
        source=text,
        config=graph_config
    )

    result = smart_scraper_graph.run()
    print(result)

    # ************************************************
    # Get graph execution info
    # ************************************************

    graph_exec_info = smart_scraper_graph.get_execution_info()
    print(prettify_exec_info(graph_exec_info))
```

scrapegraphai/helpers/models_tokens.py

Lines changed: 1 addition & 0 deletions
The `llama3` context length is added to the `ollama` token map (excerpt):

```python
"ollama": {
    "llama2": 4096,
    "llama3": 8192,  # added in this commit
    "mistral": 8192,
    "codellama": 16000,
    "dolphin-mixtral": 32000,
```
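A self-contained sketch of how such a provider-to-model map is typically consumed; the `get_context_length` helper and its fallback default are illustrative assumptions, not part of scrapegraphai's API:

```python
# Hypothetical standalone lookup over a map shaped like models_tokens.py
# (only the ollama entries from the excerpt above are reproduced here).
models_tokens = {
    "ollama": {
        "llama2": 4096,
        "llama3": 8192,  # entry added by this commit
        "mistral": 8192,
    },
}

def get_context_length(provider: str, model: str, default: int = 4096) -> int:
    """Return the context window for a model, falling back to a default.

    The fallback behavior is an assumption for this sketch.
    """
    return models_tokens.get(provider, {}).get(model, default)

print(get_context_length("ollama", "llama3"))  # 8192
```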

tests/Readme.md

Lines changed: 5 additions & 0 deletions
Regarding the tests for the graphs and nodes folders, a specific repo was created as an example ([link of the repo](https://github.com/VinciGit00/Scrapegrah-ai-website-for-tests)). The test website is hosted [here](https://scrapegrah-ai-website-for-tests.onrender.com).
Remember to activate Ollama and to have the LLM installed on your machine.

To run the tests, run:

```bash
pytest
```
Lines changed: 56 additions & 0 deletions
```python
"""
Module for the tests
"""
import os
import pytest
from scrapegraphai.graphs import SmartScraperGraph


@pytest.fixture
def sample_text():
    """
    Example of text
    """
    file_name = "inputs/plain_html_example.txt"
    curr_dir = os.path.dirname(os.path.realpath(__file__))
    file_path = os.path.join(curr_dir, file_name)

    with open(file_path, 'r', encoding="utf-8") as file:
        text = file.read()

    return text


@pytest.fixture
def graph_config():
    """
    Configuration of the graph
    """
    return {
        "llm": {
            "model": "ollama/llama3",
            "temperature": 0,
            "format": "json",
            "base_url": "http://localhost:11434",
        },
        "embeddings": {
            "model": "ollama/nomic-embed-text",
            "temperature": 0,
            "base_url": "http://localhost:11434",
        }
    }


def test_scraping_pipeline(sample_text: str, graph_config: dict):
    """
    Start of the scraping pipeline
    """
    smart_scraper_graph = SmartScraperGraph(
        prompt="List me all the news with their description.",
        source=sample_text,
        config=graph_config
    )

    result = smart_scraper_graph.run()

    assert result is not None
```
File renamed without changes.
