Commit b8079f8

Merge pull request #211 from DiTo97/fix/fetch-node-proxybroker
feat: flexible web driver and proxy broker in fetch node
2 parents 30758b4 + 7e8acd8 commit b8079f8

29 files changed: +1235 -392 lines changed

docs/assets/searchgraph.png

50.2 KB

docs/assets/smartscrapergraph.png

58.2 KB

docs/assets/speechgraph.png

45.8 KB

docs/source/conf.py

Lines changed: 0 additions & 1 deletion
@@ -30,4 +30,3 @@
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output

 html_theme = 'sphinx_rtd_theme'
-html_static_path = ['_static']

docs/source/getting_started/examples.rst

Lines changed: 5 additions & 2 deletions
@@ -1,7 +1,9 @@
 Examples
 ========

-Here some example of the different ways to scrape with ScrapegraphAI
+Let's suppose you want to scrape a website to get a list of projects with their descriptions.
+You can use the `SmartScraperGraph` class to do that.
+The following examples show how to use the `SmartScraperGraph` class with OpenAI models and local models.

 OpenAI models
 ^^^^^^^^^^^^^
@@ -78,7 +80,7 @@ After that, you can run the following code, using only your machine resources br
 # ************************************************

 smart_scraper_graph = SmartScraperGraph(
-    prompt="List me all the news with their description.",
+    prompt="List me all the projects with their description.",
     # also accepts a string with the already downloaded HTML code
     source="https://perinim.github.io/projects",
     config=graph_config
@@ -87,3 +89,4 @@ After that, you can run the following code, using only your machine resources br
 result = smart_scraper_graph.run()
 print(result)

+To find out how you can customize the `graph_config` dictionary, by using a different LLM and adding new parameters, check the `Scrapers` section!
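
As a rough illustration of the customization mentioned above, the sketch below assembles two `graph_config`-style dictionaries, one for an OpenAI model and one for a local Ollama model. The key names (`llm`, `model`, `api_key`, `base_url`), the model identifiers, and the local URL are assumptions based on typical usage and may differ from what the `Scrapers` section documents; `build_graph_config` is a hypothetical helper, not part of the library.

```python
import os
from typing import Any, Dict, Optional


def build_graph_config(
    model: str,
    api_key: Optional[str] = None,
    base_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble a graph_config-style dictionary."""
    llm: Dict[str, Any] = {"model": model}
    # Only include optional keys that were actually provided.
    if api_key is not None:
        llm["api_key"] = api_key
    if base_url is not None:
        llm["base_url"] = base_url
    return {"llm": llm}


# Assumed layout for a hosted OpenAI model (key read from the environment).
openai_config = build_graph_config(
    "gpt-3.5-turbo",
    api_key=os.getenv("OPENAI_API_KEY", "sk-placeholder"),
)

# Assumed layout for a local model served by Ollama on its default port.
local_config = build_graph_config(
    "ollama/mistral",
    base_url="http://localhost:11434",
)
```

Either dictionary would then be passed as `config=` to `SmartScraperGraph`, in the same way `graph_config` is passed in the hunk above.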

docs/source/getting_started/installation.rst

Lines changed: 15 additions & 6 deletions
@@ -7,26 +7,35 @@ for this project.
 Prerequisites
 ^^^^^^^^^^^^^

-- `Python 3.8+ <https://www.python.org/downloads/>`_
-- `pip <https://pip.pypa.io/en/stable/getting-started/>`
-- `ollama <https://ollama.com/>` *optional for local models
+- `Python >=3.9,<3.12 <https://www.python.org/downloads/>`_
+- `pip <https://pip.pypa.io/en/stable/getting-started/>`_
+- `Ollama <https://ollama.com/>`_ (optional for local models)


 Install the library
 ^^^^^^^^^^^^^^^^^^^^

+The library is available on PyPI, so it can be installed using the following command:
+
 .. code-block:: bash

     pip install scrapegraphai

+**Note:** It is highly recommended to install the library in a virtual environment (conda, venv, etc.)
+
+If you clone the repository, you can install the library using `poetry <https://python-poetry.org/docs/>`_:
+
+.. code-block:: bash
+
+    poetry install
+
 Additionally on Windows when using WSL
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+If you are using Windows Subsystem for Linux (WSL) and you are facing issues with the installation of the library, you might need to install the following packages:
+
 .. code-block:: bash

     sudo apt-get -y install libnss3 libnspr4 libgbm1 libasound2

-As simple as that! You are now ready to scrape gnamgnamgnam 👿👿👿
-
-

docs/source/index.rst

Lines changed: 13 additions & 6 deletions
@@ -3,12 +3,6 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.

-Welcome to scrapegraphai-ai's documentation!
-=======================================
-
-Here you will find all the information you need to get started.
-The following sections will guide you through the installation process and the usage of the library.
-
 .. toctree::
    :maxdepth: 2
    :caption: Introduction
@@ -22,6 +16,19 @@ The following sections will guide you through the installation process and the u

    getting_started/installation
    getting_started/examples
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Scrapers
+
+   scrapers/graphs
+   scrapers/llm
+   scrapers/graph_config
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Modules
+
    modules/modules

 Indices and tables

docs/source/introduction/contributing.rst

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@ Contributing
 ============

 Hey, you want to contribute? Awesome!
-Just fork the repo, make your changes, and send me a pull request.
+Just fork the repo, make your changes, and send a pull request.
 If you're not sure if it's a good idea, open an issue and we'll discuss it.

 Go and check out the `contributing guidelines <https://github.com/VinciGit00/Scrapegraph-ai/blob/main/CONTRIBUTING.md>`__ for more information.
Lines changed: 18 additions & 14 deletions
@@ -1,31 +1,35 @@
+.. image:: ../../assets/scrapegraphai_logo.png
+   :align: center
+   :width: 50%
+   :alt: ScrapegraphAI
+
 Overview
 ========

-In a world where web pages are constantly changing and in a data-hungry world there is a need for a new generation of scrapers, and this is where ScrapegraphAI was born.
-An opensource library with the aim of starting a new era of scraping tools that are more flexible and require less maintenance by developers, with the use of LLMs.
+ScrapeGraphAI is an open-source web scraping Python library designed to usher in a new era of scraping tools.
+In today's rapidly evolving and data-intensive digital landscape, this library stands out by integrating LLMs and
+direct graph logic to automate the creation of scraping pipelines for websites and various local documents, including XML,
+HTML, JSON, and more.

-.. image:: ../../assets/scrapegraphai_logo.png
-   :align: center
-   :width: 100px
-   :alt: ScrapegraphAI
+Simply specify the information you need to extract, and ScrapeGraphAI handles the rest,
+providing a more flexible and low-maintenance solution compared to traditional scraping tools.

 Why ScrapegraphAI?
 ==================

-ScrapegraphAI in our vision represents a significant step forward in the field of web scraping, offering an open-source solution designed to meet the needs of a constantly evolving web landscape. Here's why ScrapegraphAI stands out:
-
-Flexibility and Adaptability
-^^^^^^^^^^^^^^^^^^^^^^^^^^^
-Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages. ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
+Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages.
+ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
 This flexibility ensures that scrapers remain functional even when website layouts change.

+We support many Large Language Models (LLMs) including GPT, Gemini, Groq, Azure, Hugging Face, etc.,
+as well as local models which can run on your machine using Ollama.

-Overview
-========
+Diagram
+=======
 With ScrapegraphAI you first construct a pipeline of steps you want to execute by combining nodes into a graph.
 Executing the graph takes care of all the steps that are often part of scraping: fetching, parsing etc...
 Finally the scraped and processed data gets fed to an LLM which generates a response.

 .. image:: ../../assets/project_overview_diagram.png
    :align: center
-   :alt: ScrapegraphAI Overview
+   :alt: ScrapegraphAI Overview
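
The "Diagram" text in the hunk above describes combining nodes into a graph that fetches, parses, and finally hands the data to an LLM. A minimal sketch of that idea in plain Python follows. It is not the library's implementation: every name here (`fetch_node`, `parse_node`, `answer_node`, `run_graph`) is hypothetical, the fetch step assumes already-downloaded HTML so no network call is made, and the LLM step is replaced by a stub that simply packages the prompt with the parsed text.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List

State = Dict[str, object]
Node = Callable[[State], State]


class _TextExtractor(HTMLParser):
    """Collects the non-empty text fragments of an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def handle_data(self, data: str) -> None:
        text = data.strip()
        if text:
            self.chunks.append(text)


def fetch_node(state: State) -> State:
    # Assumption: the source is already-downloaded HTML, so nothing is fetched.
    return {**state, "html": state["source"]}


def parse_node(state: State) -> State:
    # Reduce the HTML to a list of text fragments.
    extractor = _TextExtractor()
    extractor.feed(str(state["html"]))
    extractor.close()
    return {**state, "chunks": extractor.chunks}


def answer_node(state: State) -> State:
    # Stand-in for the LLM call: pair the prompt with the parsed text.
    answer = {"prompt": state["prompt"], "context": state["chunks"]}
    return {**state, "answer": answer}


def run_graph(nodes: List[Node], state: State) -> State:
    """Run the nodes in order, passing the state from one to the next."""
    for node in nodes:
        state = node(state)
    return state


result = run_graph(
    [fetch_node, parse_node, answer_node],
    {
        "prompt": "List me all the projects.",
        "source": "<ul><li>Rotary Pendulum</li><li>DQN</li></ul>",
    },
)
print(result["answer"])
```

Keeping each step as a function over a shared state dictionary makes it easy to reorder or swap nodes, which is the flexibility the overview text attributes to the graph approach.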

docs/source/modules/modules.rst

Lines changed: 0 additions & 3 deletions
@@ -1,6 +1,3 @@
-scrapegraphai
-=============
-
 .. toctree::
    :maxdepth: 4
