* added verbose flag to suppress print statements ([2dd7817](https://github.com/VinciGit00/Scrapegraph-ai/commit/2dd7817cfb37cfbeb7e65b3a24655ab238f48026))
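
The entry above refers to a verbosity switch. The sketch below shows how such a flag would plausibly be passed alongside the other options in `graph_config`; this is a hedged illustration based only on the commit message, and the exact key name (`verbose`) as well as the Ollama model name are assumptions, not content from the diff.

```python
from scrapegraphai.graphs import SmartScraperGraph

# Hedged sketch: assumes the new flag is a top-level `verbose` key in the
# graph configuration; the model name and base_url are illustrative only.
graph_config = {
    "llm": {
        "model": "ollama/mistral",
        "base_url": "http://localhost:11434",
    },
    "verbose": False,  # suppress the library's progress print statements
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their description.",
    source="https://perinim.github.io/projects",
    config=graph_config,
)
print(smart_scraper_graph.run())
```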
**README.md** (37 additions, 1 deletion)
````diff
@@ -23,6 +23,10 @@ The reference page for Scrapegraph-ai is available on the official page of PyPI:
 ```bash
 pip install scrapegraphai
 ```
+You will also need to install Playwright for JavaScript-based scraping:
+```bash
+playwright install
+```
 ## 🔍 Demo
 Official streamlit demo:
````
````diff
@@ -46,6 +50,7 @@ You can use the `SmartScraper` class to extract information from a website using
 The `SmartScraper` class is a direct graph implementation that uses the most common nodes present in a web scraping pipeline. For more information, please see the [documentation](https://scrapegraph-ai.readthedocs.io/en/latest/).
 ### Case 1: Extracting information using Ollama
 Remember to download the model on Ollama separately!
+
 ```python
 from scrapegraphai.graphs import SmartScraperGraph
````
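
The hunk above is cut off right after the import, so the Ollama configuration itself is not visible here. As a point of reference only, this is a hedged sketch of what such a configuration might look like, mirroring the structure of the Groq example further down; the model name `ollama/mistral` and the `base_url` are assumptions, not content from the diff.

```python
from scrapegraphai.graphs import SmartScraperGraph

# Hedged sketch mirroring the Groq example below; model names are assumptions.
graph_config = {
    "llm": {
        "model": "ollama/mistral",             # assumes `ollama pull mistral` was run first
        "temperature": 0,
        "base_url": "http://localhost:11434",  # default local Ollama endpoint
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "base_url": "http://localhost:11434",
    },
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their description and the author.",
    source="https://perinim.github.io/projects",
    config=graph_config,
)
print(smart_scraper_graph.run())
```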
````diff
@@ -129,7 +134,38 @@ result = smart_scraper_graph.run()
 print(result)
 ```

-### Case 4: Extracting information using Gemini
+### Case 4: Extracting information using Groq
+```python
+from scrapegraphai.graphs import SmartScraperGraph
+from scrapegraphai.utils import prettify_exec_info
+
+groq_key = os.getenv("GROQ_APIKEY")
+
+graph_config = {
+    "llm": {
+        "model": "groq/gemma-7b-it",
+        "api_key": groq_key,
+        "temperature": 0
+    },
+    "embeddings": {
+        "model": "ollama/nomic-embed-text",
+        "temperature": 0,
+        "base_url": "http://localhost:11434",
+    },
+    "headless": False
+}
+
+smart_scraper_graph = SmartScraperGraph(
+    prompt="List me all the projects with their description and the author.",
+    source="https://perinim.github.io/projects",
+    config=graph_config
+)
+
+result = smart_scraper_graph.run()
+print(result)
+```
+
+### Case 5: Extracting information using Gemini
 ```python
 from scrapegraphai.graphs import SmartScraperGraph
````
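
Two small gaps in the Groq snippet above: it calls `os.getenv` without an `import os` line appearing in the hunk, and it imports `prettify_exec_info` without showing it used. The hedged sketch below fills both in; the `get_execution_info()` call is an assumption about the graph API (it follows the project's examples, but check the documentation for the exact method name).

```python
import os  # needed for os.getenv; not shown in the hunk above

from scrapegraphai.graphs import SmartScraperGraph
from scrapegraphai.utils import prettify_exec_info

graph_config = {
    "llm": {
        "model": "groq/gemma-7b-it",
        "api_key": os.getenv("GROQ_APIKEY"),  # assumes GROQ_APIKEY is set in the environment
        "temperature": 0,
    },
    "embeddings": {
        "model": "ollama/nomic-embed-text",
        "temperature": 0,
        "base_url": "http://localhost:11434",
    },
    "headless": False,
}

smart_scraper_graph = SmartScraperGraph(
    prompt="List me all the projects with their description and the author.",
    source="https://perinim.github.io/projects",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(result)

# Assumption: the graph collects per-node execution statistics and exposes them
# via get_execution_info(); prettify_exec_info() formats them for printing.
graph_exec_info = smart_scraper_graph.get_execution_info()
print(prettify_exec_info(graph_exec_info))
```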
````diff
 | Macbook 14' m1 pro | Mistral on Ollama with nomic-embed-text | 11.60s | 26.61s |
 | Macbook m2 max | Mistral on Ollama with nomic-embed-text | 8.05s | 12.17s |
-| Macbook 14' m1 pro | Llama3 on Ollama with nomic-embed-text | 29.871s| 35.32s |
+| Macbook 14' m1 pro | Llama3 on Ollama with nomic-embed-text | 29.87s | 35.32s |
 | Macbook m2 max | Llama3 on Ollama with nomic-embed-text | 18.36s | 78.32s |
-
 **Note**: the Docker examples were run only on the Macbook because performance elsewhere is too slow (about 10 times slower than Ollama). The results are the following:

 | Hardware | Example 1 | Example 2 |
 | ------------------ | --------- | --------- |
-| Macbook 14' m1 pro | 139.89s| Too long |
+| Macbook 14' m1 pro | 139.89s | Too long |

 # Performance on API services
 ### Example 1: personal portfolio
 **URL**: https://perinim.github.io/projects
 **Task**: List me all the projects with their description.

-| Name | Execution time | total_tokens | prompt_tokens | completion_tokens | successful_requests | total_cost_USD |
````