
Commit 1bb2a11

UPD readthedocs
1 parent b683854 commit 1bb2a11

File tree

4 files changed, +142 -82 lines changed

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
Overview
========

Here are some examples of the different ways to scrape with ScrapegraphAI.

OpenAI models
^^^^^^^^^^^^^

.. code-block:: python

   import os
   from dotenv import load_dotenv
   from scrapegraphai.graphs import SmartScraperGraph

   load_dotenv()

   openai_key = os.getenv("OPENAI_APIKEY")

   graph_config = {
       "llm": {
           "api_key": openai_key,
           "model": "gpt-3.5-turbo",
       },
   }

   # ************************************************
   # Create the SmartScraperGraph instance and run it
   # ************************************************

   smart_scraper_graph = SmartScraperGraph(
       prompt="List me all the projects with their description.",
       # also accepts a string with the already downloaded HTML code
       source="https://perinim.github.io/projects/",
       config=graph_config
   )

   result = smart_scraper_graph.run()
   print(result)
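The example relies on ``load_dotenv()`` to pull ``OPENAI_APIKEY`` out of a local ``.env`` file. The sketch below is a simplified, stdlib-only stand-in that illustrates roughly what that call does (the real ``python-dotenv`` also handles quoting, ``export`` prefixes, and interpolation); the file contents here are invented for the demo:

```python
import os
import tempfile

def load_env_file(path):
    """Minimal .env loader: read KEY=VALUE lines, skip blanks and '#' comments."""
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # do not overwrite variables already set in the real environment
            os.environ.setdefault(key.strip(), value.strip())

# demo with a throwaway .env file
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as fh:
    fh.write("# secrets go here\nOPENAI_APIKEY=sk-not-a-real-key\n")
    env_path = fh.name

load_env_file(env_path)
openai_key = os.getenv("OPENAI_APIKEY")
print(openai_key)
os.unlink(env_path)
```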
Local models
^^^^^^^^^^^^

Remember to have `ollama <https://ollama.com/>`_ installed on your machine.
Remember to pull the right models for the LLM and for the embeddings, matching the configuration below:

.. code-block:: bash

   ollama pull mistral
   ollama pull nomic-embed-text

After that, you can run the following code, using only your machine's resources (brum brum brum):

.. code-block:: python

   from scrapegraphai.graphs import SmartScraperGraph

   graph_config = {
       "llm": {
           "model": "ollama/mistral",
           "temperature": 1,
           "format": "json",  # Ollama needs the format to be specified explicitly
           "model_tokens": 2000,  # set the context length depending on the model
           "base_url": "http://localhost:11434",  # Ollama URL (change it if you use a different endpoint)
       },
       "embeddings": {
           "model": "ollama/nomic-embed-text",
           "temperature": 0,
           "base_url": "http://localhost:11434",  # Ollama URL
       }
   }

   # ************************************************
   # Create the SmartScraperGraph instance and run it
   # ************************************************

   smart_scraper_graph = SmartScraperGraph(
       prompt="List me all the news with their description.",
       # also accepts a string with the already downloaded HTML code
       source="https://perinim.github.io/projects",
       config=graph_config
   )

   result = smart_scraper_graph.run()
   print(result)
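Since the config above points at ``http://localhost:11434``, a common failure mode is running the script while the Ollama server is not up. Here is a small stdlib-only sketch to check the endpoint first; ``ollama_is_up`` is a hypothetical helper, not part of ScrapegraphAI (``/api/tags`` is Ollama's endpoint for listing locally pulled models):

```python
import json
import urllib.error
import urllib.request

def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an Ollama server answers at base_url, else False."""
    try:
        with urllib.request.urlopen(base_url + "/api/tags", timeout=timeout) as resp:
            json.load(resp)  # the payload lists the locally pulled models
        return True
    except (urllib.error.URLError, OSError, ValueError):
        return False

if not ollama_is_up():
    print("Ollama is not reachable; start it and pull the models first.")
```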

docs/source/getting_started/installation.rst

Lines changed: 5 additions & 63 deletions
@@ -8,77 +8,19 @@ Prerequisites
 ^^^^^^^^^^^^^

 - `Python 3.8+ <https://www.python.org/downloads/>`_
-- `Visual Studio Code <https://code.visualstudio.com/download>`_ or IDE of your choice
+- `pip <https://pip.pypa.io/en/stable/getting-started/>`_
+- `ollama <https://ollama.com/>`_ (optional, for local models)

-External dependencies
-^^^^^^^^^^^^^^^^^^^^^
-
-Windows
-+++++++
-
-Insert external dependencies for Windows if there are any.
-
-Linux
-++++++
-
-You don't need to install any external dependencies.
-
-Clone the repository
+Install the library
 ^^^^^^^^^^^^^^^^^^^^

 .. code-block:: bash

-   git clone https://github.com/VinciGit00/yoso-ai.git
-   cd AmazScraper
-
-Create a virtual environment
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-It is recommended to create a virtual environment to install the dependencies in order to avoid conflicts with other projects.
-
-.. code-block:: bash
-
-   python -m venv venv
-   # python3 -m venv venv
-
-Activate the virtual environment
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-
-All the commands must be executed in the virtual environment. If you are not familiar with virtual environments, please read the `Python Virtual Environments: A Primer <https://realpython.com/python-virtual-environments-a-primer/>`_ article.
-
-To activate the virtual environment, run the following command:
-
-.. code-block:: bash
-
-   # Windows
-   .\venv\Scripts\activate
-   # Linux
-   source venv/bin/activate
-
-Install the dependencies
-^^^^^^^^^^^^^^^^^^^^^^^^
-
-In the **requirements.txt** file you will find all the dependencies needed to run the code. To install them, run the following command:
-
-.. code-block:: bash
-
-   pip install -r requirements.txt
-   # pip3 install -r requirements.txt
-
-Test the installation
-^^^^^^^^^^^^^^^^^^^^^
-
-- Let's test if the installation was successful. Run the following command:
+   pip install scrapegraphai

-.. code-block:: bash
+As simple as that! You are now ready to scrape gnamgnamgnam 👿👿👿

-   python some_example.py
-   # python3 .some_example.py
-
-- Let's test if the modules works. Run the following command:
-
-.. code-block:: bash
-
-   python -m examples.values_scraping
-   # or
-   python -m examples.html_scraping
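After ``pip install scrapegraphai``, a quick sanity check is to confirm the package is importable and which version pip resolved. This generic, stdlib-only sketch works for any package name (it does not assume scrapegraphai is installed):

```python
import importlib.util
from importlib import metadata

def installed_version(distribution):
    """Return the installed distribution's version string, or None if absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None

def importable(module_name):
    """True if `import module_name` would succeed for a top-level module."""
    return importlib.util.find_spec(module_name) is not None

print(importable("scrapegraphai"), installed_version("scrapegraphai"))
```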

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ The following sections will guide you through the installation process and the u
     :caption: Getting Started

     getting_started/installation
+    getting_started/examples
     modules/modules

 Indices and tables
Lines changed: 15 additions & 19 deletions
@@ -1,24 +1,20 @@
-Overview
+Overview
 ========

-This is an open source project aimed at developing a scraping library using LLM through Langchain.
-The goal is to be able to scrape data using natural language queries and store them in a structured format.
+In a world where web pages are constantly changing and data is in ever-greater demand, there is a need for a new generation of scrapers, and this is where ScrapegraphAI was born.
+ScrapegraphAI is an open-source library that aims to start a new era of scraping tools: more flexible and requiring less maintenance by developers, thanks to the use of LLMs.

-.. image:: ../../assets/apikey_1.png
+.. image:: ../../assets/scrapegraphai_logo.png
    :align: center
-   :width: 400px
-   :alt: OpenAI Key
-
-.. image:: ../../assets/apikey_2.png
-   :align: center
-   :width: 400px
-   :alt: OpenAI Key
+   :width: 100px
+   :alt: ScrapegraphAI

-.. image:: ../../assets/apikey_3.png
-   :align: center
-   :width: 400px
-   :alt: OpenAI Key
-.. image:: ../../assets/apikey_4.png
-   :align: center
-   :width: 400px
-   :alt: OpenAI Key
+Why ScrapegraphAI?
+==================
+
+ScrapegraphAI in our vision represents a significant step forward in the field of web scraping, offering an open-source solution designed to meet the needs of a constantly evolving web landscape. Here's why ScrapegraphAI stands out:
+
+Flexibility and Adaptability
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Traditional web scraping tools often rely on fixed patterns or manual configuration to extract data from web pages. ScrapegraphAI, leveraging the power of LLMs, adapts to changes in website structures, reducing the need for constant developer intervention.
+This flexibility ensures that scrapers remain functional even when website layouts change.
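To make the "fixed patterns" point concrete, here is a toy illustration (the HTML snippets and class names are invented for the example): a scraper hard-wired to one CSS class returns nothing after a site redesign, whereas the natural-language prompt given to an LLM-based scraper would stay unchanged.

```python
import re

OLD_HTML = '<div class="project-title">ScrapegraphAI</div>'
NEW_HTML = '<div class="card-heading">ScrapegraphAI</div>'  # same data, renamed class

def fixed_pattern_titles(html):
    """Extract titles with a hard-coded pattern (brittle across redesigns)."""
    return re.findall(r'class="project-title">([^<]+)</div>', html)

print(fixed_pattern_titles(OLD_HTML))  # ['ScrapegraphAI']
print(fixed_pattern_titles(NEW_HTML))  # [] -- the scraper breaks silently
```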
