README.md
5 additions & 2 deletions
@@ -43,11 +43,14 @@ The documentation for ScrapeGraphAI can be found [here](https://scrapegraph-ai.r
 Check out also the Docusaurus [here](https://scrapegraph-doc.onrender.com/).
 
 ## 💻 Usage
-There are three main scraping pipelines that can be used to extract information from a website (or local file):
+There are multiple standard scraping pipelines that can be used to extract information from a website (or local file):
 - `SmartScraperGraph`: single-page scraper that only needs a user prompt and an input source;
 - `SearchGraph`: multi-page scraper that extracts information from the top n search results of a search engine;
 - `SpeechGraph`: single-page scraper that extracts information from a website and generates an audio file.
-- `SmartScraperMultiGraph`: multiple page scraper given a single prompt
+- `ScriptCreatorGraph`: single-page scraper that extracts information from a website and generates a Python script.
+
+- `SmartScraperMultiGraph`: multi-page scraper that extracts information from multiple pages given a single prompt and a list of sources;
+- `ScriptCreatorMultiGraph`: multi-page scraper that generates a Python script for extracting information from multiple pages given a single prompt and a list of sources.
 
 It is possible to use different LLM through APIs, such as **OpenAI**, **Groq**, **Azure** and **Gemini**, or local models using **Ollama**.
docs/source/scrapers/graphs.rst
39 additions & 2 deletions
@@ -6,11 +6,15 @@ Graphs are scraping pipelines aimed at solving specific tasks. They are composed
 There are several types of graphs available in the library, each with its own purpose and functionality. The most common ones are:
 
 - **SmartScraperGraph**: one-page scraper that requires a user-defined prompt and a URL (or local file) to extract information using LLM.
-- **SmartScraperMultiGraph**: multi-page scraper that requires a user-defined prompt and a list of URLs (or local files) to extract information using LLM. It is built on top of SmartScraperGraph.
 - **SearchGraph**: multi-page scraper that only requires a user-defined prompt to extract information from a search engine using LLM. It is built on top of SmartScraperGraph.
 - **SpeechGraph**: text-to-speech pipeline that generates an answer as well as a requested audio file. It is built on top of SmartScraperGraph and requires a user-defined prompt and a URL (or local file).
 - **ScriptCreatorGraph**: script generator that creates a Python script to scrape a website using the specified library (e.g. BeautifulSoup). It requires a user-defined prompt and a URL (or local file).
 
+There are also two additional graphs that can handle multiple sources:
+
+- **SmartScraperMultiGraph**: similar to `SmartScraperGraph`, but with the ability to handle multiple sources.
+- **ScriptCreatorMultiGraph**: similar to `ScriptCreatorGraph`, but with the ability to handle multiple sources.
+
 With the introduction of `GPT-4o`, two new powerful graphs have been created:
 
 - **OmniScraperGraph**: similar to `SmartScraperGraph`, but with the ability to scrape images and describe them.
@@ -186,4 +190,37 @@ It will fetch the data from the source, extract the information based on the pro
    )
 
    result = speech_graph.run()
-   print(result)
+   print(result)
+
+
+ScriptCreatorGraph & ScriptCreatorMultiGraph
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+.. image:: ../../assets/scriptcreatorgraph.png
+   :align: center
+   :width: 90%
+   :alt: ScriptCreatorGraph
+
+First we define the graph configuration, which includes the LLM model and other parameters.
+Then we create an instance of the ScriptCreatorGraph class, passing the prompt, source, and configuration as arguments. Finally, we run the graph and print the result.
+
+.. code-block:: python
+
+   from scrapegraphai.graphs import ScriptCreatorGraph
+
+   graph_config = {
+       "llm": {...},
+       "library": "beautifulsoup4"
+   }
+
+   script_creator_graph = ScriptCreatorGraph(
+       prompt="Create a Python script to scrape the projects.",
+       source="https://perinim.github.io/projects/",
+       config=graph_config,
+       schema=schema
+   )
+
+   result = script_creator_graph.run()
+   print(result)
+
+**ScriptCreatorMultiGraph** is similar to ScriptCreatorGraph, but it can handle multiple sources. We define the graph configuration, create an instance of the ScriptCreatorMultiGraph class, and run the graph.
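
Note: the new documentation describes ScriptCreatorMultiGraph without an example. A minimal sketch of what one might look like follows, mirroring the ScriptCreatorGraph example in the diff. It assumes that `ScriptCreatorMultiGraph` is importable from `scrapegraphai.graphs` and accepts `prompt`, `source` (a list) and `config` keyword arguments, as the descriptions in this change suggest. The helper names `build_graph_config` and `create_multi_source_script` are hypothetical and exist only for this illustration.

```python
from typing import Any, Dict, List


def build_graph_config(
    llm_config: Dict[str, Any], library: str = "beautifulsoup4"
) -> Dict[str, Any]:
    """Assemble the graph configuration: LLM settings plus the scraping library.

    Hypothetical helper for this sketch; the "llm" and "library" keys are the
    ones used in the ScriptCreatorGraph example.
    """
    return {"llm": dict(llm_config), "library": library}


def create_multi_source_script(
    prompt: str, sources: List[str], config: Dict[str, Any]
) -> Any:
    """Run ScriptCreatorMultiGraph over several sources and return its result.

    Hypothetical helper. Assumes the class takes prompt, source (a list of
    URLs or local files) and config, like its single-source counterpart.
    """
    # Imported lazily so build_graph_config works without scrapegraphai installed.
    from scrapegraphai.graphs import ScriptCreatorMultiGraph

    graph = ScriptCreatorMultiGraph(
        prompt=prompt,
        source=sources,  # a list of sources instead of a single one
        config=config,
    )
    return graph.run()
```

With a real LLM configuration in place of the elided `{...}`, a call such as `create_multi_source_script("Create a Python script to scrape the projects.", [first_url, second_url], build_graph_config(llm_settings))` would return the graph's result. The sketch leaves out the `schema` argument that the single-source example passes, since that example does not define it.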