
Commit 2ccb608

Merge pull request #129 from VinciGit00/main
reallignement
2 parents: c11331a + 35ae76f


61 files changed (+977 / -611 lines)

CHANGELOG.md

Lines changed: 62 additions & 4 deletions
@@ -1,17 +1,75 @@
-## [0.5.0-beta.8](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.5.0-beta.7...v0.5.0-beta.8) (2024-05-02)
+## [0.6.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.5.2...v0.6.0) (2024-05-02)
 
 
 ### Features
 
+* added node and graph for CSV scraping ([4d542a8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4d542a88f7d949a5ba360dcd880716c8110a5d14))
 * Allow end users to pass model instances for llm and embedding model ([b86aac2](https://github.com/VinciGit00/Scrapegraph-ai/commit/b86aac2188887642564a34d13d55d0fcff220ec1))
+* modified node name ([02d1af0](https://github.com/VinciGit00/Scrapegraph-ai/commit/02d1af006cb89bf860ee4f1186f582e2049a8e3d))
+
+
+### CI
+
+* **release:** 0.5.0-beta.7 [skip ci] ([40b2a34](https://github.com/VinciGit00/Scrapegraph-ai/commit/40b2a346d57865ca21915ecaa658096c52a2cc6b))
+* **release:** 0.5.0-beta.8 [skip ci] ([c11331a](https://github.com/VinciGit00/Scrapegraph-ai/commit/c11331a26ac325dfcf489272442ceeed13225a39))
+
+## [0.5.2](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.5.1...v0.5.2) (2024-05-02)
+
+
+### Bug Fixes
 
-## [0.5.0-beta.7](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.5.0-beta.6...v0.5.0-beta.7) (2024-05-01)
+* bug on script_creator_graph.py ([4a3bc37](https://github.com/VinciGit00/Scrapegraph-ai/commit/4a3bc37f2fbb24953edd68f28234ff14302ac120))
+
+## [0.5.1](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.5.0...v0.5.1) (2024-05-02)
+
+
+### Bug Fixes
+
+* examples and graphs ([5cf4e4f](https://github.com/VinciGit00/Scrapegraph-ai/commit/5cf4e4f92f024041c44211aebd2e3bdf73351a00))
+
+
+### Docs
+
+* added venv suggestion ([ba2b24b](https://github.com/VinciGit00/Scrapegraph-ai/commit/ba2b24b4cd82d63f9235051eb0e95519c51fd639))
+* base and fetch node ([e981796](https://github.com/VinciGit00/Scrapegraph-ai/commit/e9817963c8e98e35662cc5a140b0348792d25307))
+* change contributing.md with new ci/cd workflow ([3e91a46](https://github.com/VinciGit00/Scrapegraph-ai/commit/3e91a46522ab1f6b2f733efd234b06df4687c695))
+* fixed basegraph docstring ([29427c2](https://github.com/VinciGit00/Scrapegraph-ai/commit/29427c233485816967c4ecd6c1951351be9b27ce))
+* graphs and helpers docstrings ([0631985](https://github.com/VinciGit00/Scrapegraph-ai/commit/0631985e6156bd21ec5317faff9e345c8aa7f88b))
+* refactor examples ([c11fc28](https://github.com/VinciGit00/Scrapegraph-ai/commit/c11fc288963e1a2818e451279a3bf53eb33e22be))
+* refactor models docstrings ([18c20eb](https://github.com/VinciGit00/Scrapegraph-ai/commit/18c20eb03de183a0311be5ffe21f53ec4edf1b87))
+* refactor nodes docstrings ([1409797](https://github.com/VinciGit00/Scrapegraph-ai/commit/140979747598210674131befadd786800c9fb5ec))
+* update utils docstrings ([cf038b3](https://github.com/VinciGit00/Scrapegraph-ai/commit/cf038b33eaae42f65d7d9c782b5729092b272dd0))
+
+## [0.5.0](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.4.1...v0.5.0) (2024-04-30)
 
 
 ### Features
 
-* added node and graph for CSV scraping ([4d542a8](https://github.com/VinciGit00/Scrapegraph-ai/commit/4d542a88f7d949a5ba360dcd880716c8110a5d14))
-* modified node name ([02d1af0](https://github.com/VinciGit00/Scrapegraph-ai/commit/02d1af006cb89bf860ee4f1186f582e2049a8e3d))
+* add cluade integration ([e0ffc83](https://github.com/VinciGit00/Scrapegraph-ai/commit/e0ffc838b06c0f024026a275fc7f7b4243ad5cf9))
+* add co-author ([719a353](https://github.com/VinciGit00/Scrapegraph-ai/commit/719a353410992cc96f46ec984a5d3ec372e71ad2))
+* **fetch:** added playwright support ([42ab0aa](https://github.com/VinciGit00/Scrapegraph-ai/commit/42ab0aa1d275b5798ab6fc9feea575fe59b6e767))
+* added verbose flag to suppress print statements ([2dd7817](https://github.com/VinciGit00/Scrapegraph-ai/commit/2dd7817cfb37cfbeb7e65b3a24655ab238f48026))
+* base groq + requirements + toml update with groq ([7dd5b1a](https://github.com/VinciGit00/Scrapegraph-ai/commit/7dd5b1a03327750ffa5b2fb647eda6359edd1fc2))
+* **refactor:** changed variable names ([8fba7e5](https://github.com/VinciGit00/Scrapegraph-ai/commit/8fba7e5490f916b325588443bba3fff5c0733c17))
+* **llm:** implemented groq model ([dbbf10f](https://github.com/VinciGit00/Scrapegraph-ai/commit/dbbf10fc77b34d99d64c6cd7f74524b6d8e57fa5))
+* updated requirements.txt ([d368725](https://github.com/VinciGit00/Scrapegraph-ai/commit/d36872518a6d234eba5f8b7ddca7da93797874b2))
+
+
+### Bug Fixes
+
+* script generator and add new benchmarks ([e3d0194](https://github.com/VinciGit00/Scrapegraph-ai/commit/e3d0194dc93b20dc254fc48bba11559bf8a3a185))
+
+
+### CI
+
+* **release:** 0.4.0-beta.3 [skip ci] ([d13321b](https://github.com/VinciGit00/Scrapegraph-ai/commit/d13321b2f86d98e2a3a0c563172ca0dd29cdf5fb))
+* **release:** 0.5.0-beta.1 [skip ci] ([450291f](https://github.com/VinciGit00/Scrapegraph-ai/commit/450291f52e48cd35b2b8cc50ff66f5336326fa25))
+* **release:** 0.5.0-beta.2 [skip ci] ([ff7d12f](https://github.com/VinciGit00/Scrapegraph-ai/commit/ff7d12f1389d8eed87e9f6b2fc8b099767a904a9))
+* **release:** 0.5.0-beta.3 [skip ci] ([7e81f7c](https://github.com/VinciGit00/Scrapegraph-ai/commit/7e81f7c03f79c43219743be52affabbaf0d66387))
+* **release:** 0.5.0-beta.4 [skip ci] ([14e56f6](https://github.com/VinciGit00/Scrapegraph-ai/commit/14e56f6ab1711a08e749edbda860d349db491dae))
+* **release:** 0.5.0-beta.5 [skip ci] ([5ac97e2](https://github.com/VinciGit00/Scrapegraph-ai/commit/5ac97e2fb321be40c9787fbf8cb53fa62cf0ce06))
+* **release:** 0.5.0-beta.6 [skip ci] ([9356124](https://github.com/VinciGit00/Scrapegraph-ai/commit/9356124ce39568e88f7d2965181579c4ff0a5752))
+
 
 ## [0.5.0-beta.6](https://github.com/VinciGit00/Scrapegraph-ai/compare/v0.5.0-beta.5...v0.5.0-beta.6) (2024-04-30)

CONTRIBUTING.md

Lines changed: 16 additions & 6 deletions
@@ -15,22 +15,31 @@ Thank you for your interest in contributing to **ScrapeGraphAI**! We welcome con
 
 To get started with contributing, follow these steps:
 
-1. Fork the repository on GitHub.
+1. Fork the repository on GitHub **(FROM pre/beta branch)**.
 2. Clone your forked repository to your local machine.
-3. Install the necessary dependencies.
+3. Install the necessary dependencies from requirements.txt or via pyproject.toml as you prefere :).
 4. Make your changes or additions.
 5. Test your changes thoroughly.
 6. Commit your changes with descriptive commit messages.
 7. Push your changes to your forked repository.
-8. Submit a pull request to the main repository.
+8. Submit a pull request to the pre/beta branch.
+
+N.B All the pull request to the main branch will be rejected!
 
 ## Contributing Guidelines
 
 Please adhere to the following guidelines when contributing to ScrapeGraphAI:
 
 - Follow the code style and formatting guidelines specified in the [Code Style](#code-style) section.
-- Make sure your changes are well-documented and include any necessary updates to the project's documentation.
-- Write clear and concise commit messages that describe the purpose of your changes.
+- Make sure your changes are well-documented and include any necessary updates to the project's documentation and requirements if needed.
+- Write clear and concise commit messages that describe the purpose of your changes and the last commit before the pull request has to follow the following format:
+  - `feat: Add new feature`
+  - `fix: Correct issue with existing feature`
+  - `docs: Update documentation`
+  - `style: Improve formatting and style`
+  - `refactor: Restructure code`
+  - `test: Add or update tests`
+  - `perf: Improve performance`
 - Be respectful and considerate towards other contributors and maintainers.
 
 ## Code Style
@@ -42,6 +51,7 @@ Please make sure to format your code accordingly before submitting a pull reques
 - [Style Guide for Python Code](https://www.python.org/dev/peps/pep-0008/)
 - [Google Python Style Guide](https://google.github.io/styleguide/pyguide.html)
 - [The Hitchhiker's Guide to Python](https://docs.python-guide.org/writing/style/)
+- [Pylint style of code for the documentation](https://pylint.pycqa.org/en/1.6.0/tutorial.html)
 
 ## Submitting a Pull Request

@@ -53,7 +63,7 @@ To submit your changes for review, please follow these steps:
 4. Select your forked repository and the branch containing your changes.
 5. Provide a descriptive title and detailed description for your pull request.
 6. Reviewers will provide feedback and discuss any necessary changes.
-7. Once your pull request is approved, it will be merged into the main repository.
+7. Once your pull request is approved, it will be merged into the pre/beta branch.
 
 ## Reporting Issues

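The commit-message format added to the contributing guidelines can be checked mechanically before opening a pull request. The following is a minimal sketch, not part of ScrapeGraphAI: the name `check_commit_message` is hypothetical, and it accepts only the seven prefixes listed in the guideline. The optional `(scope)` part (as in `fix(fetch): ...`) is an assumption borrowed from the Conventional Commits convention; the guideline itself does not mention scopes.

```python
import re

# Prefixes listed in the CONTRIBUTING.md guideline.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "test", "perf")

# Matches "type: description" or "type(scope): description".
# The optional "(scope)" group is an assumption (Conventional Commits style).
_PATTERN = re.compile(
    r"^(?:" + "|".join(ALLOWED_TYPES) + r")(?:\([\w-]+\))?: \S.*$"
)


def check_commit_message(message: str) -> bool:
    """Return True if the first line of `message` follows the required format."""
    lines = message.strip().splitlines()
    first_line = lines[0] if lines else ""
    return _PATTERN.match(first_line) is not None
```

For example, `check_commit_message("feat: Add new feature")` is accepted, while `"Fixed stuff"` or `"feature: Add thing"` are rejected because their prefix is not one of the listed types. A script like this could be wired into a local `commit-msg` hook, but that setup is not described in the guidelines.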
README.md

Lines changed: 3 additions & 0 deletions
@@ -27,6 +27,9 @@ you will also need to install Playwright for javascript-based scraping:
 ```bash
 playwright install
 ```
+
+**Note**: it is recommended to install the library in a virtual environment to avoid conflicts with other libraries 🐱
+
 ## 🔍 Demo
 Official streamlit demo:

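The virtual-environment recommendation added to the README is usually followed from the shell with `python -m venv .venv`. The same thing can be done from Python with the standard-library `venv.EnvBuilder`, which is the API behind that command. This is a sketch: the helper name `make_env` and the default directory `.venv` are assumptions, not something the README prescribes.

```python
import sys
import venv
from pathlib import Path


def make_env(env_dir: str = ".venv", with_pip: bool = True) -> Path:
    """Create a virtual environment in env_dir and return the path to its interpreter."""
    # EnvBuilder is the standard-library class used by `python -m venv`.
    builder = venv.EnvBuilder(with_pip=with_pip)
    builder.create(env_dir)
    # The interpreter sits in Scripts\ on Windows and in bin/ elsewhere.
    if sys.platform == "win32":
        return Path(env_dir) / "Scripts" / "python.exe"
    return Path(env_dir) / "bin" / "python"
```

After `make_env()`, running the returned interpreter with `-m pip install scrapegraphai` installs the library's Python dependencies into the isolated environment instead of the system interpreter.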
examples/groq/.env.example

Lines changed: 2 additions & 1 deletion
@@ -1 +1,2 @@
-GROQ_APIKEY= "your groq key"
+GROQ_APIKEY= "your groq key"
+OPENAI_APIKEY="your openai api key"
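With this change the Groq example needs two keys, `GROQ_APIKEY` for the LLM and `OPENAI_APIKEY` for the embeddings. Since `os.getenv` returns `None` for an unset variable, a forgotten key only shows up later as an authentication error from the provider. The hypothetical helper `require_env` below (not part of ScrapeGraphAI) fails early instead. It assumes the `.env` file has already been loaded, for instance by `load_dotenv()` as the Groq example in this commit does.

```python
import os


def require_env(*names: str) -> dict:
    """Return {name: value} for each variable, raising if any is unset or empty."""
    values = {name: os.getenv(name) for name in names}
    # Treat both unset (None) and empty-string values as missing.
    missing = [name for name, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return values
```

In the example script, `keys = require_env("GROQ_APIKEY", "OPENAI_APIKEY")` could replace the two separate `os.getenv` calls, with `keys["GROQ_APIKEY"]` and `keys["OPENAI_APIKEY"]` passed into `graph_config`.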
File renamed without changes.
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+"""
+Basic example of scraping pipeline using SmartScraper
+"""
+
+import os
+from dotenv import load_dotenv
+from scrapegraphai.graphs import SmartScraperGraph
+from scrapegraphai.utils import prettify_exec_info
+
+load_dotenv()
+
+
+# ************************************************
+# Define the configuration for the graph
+# ************************************************
+
+groq_key = os.getenv("GROQ_APIKEY")
+openai_key = os.getenv("OPENAI_APIKEY")
+
+graph_config = {
+    "llm": {
+        "model": "groq/gemma-7b-it",
+        "api_key": groq_key,
+        "temperature": 0
+    },
+    "embeddings": {
+        "api_key": openai_key,
+        "model": "gpt-3.5-turbo",
+    },
+    "headless": False
+}
+
+# ************************************************
+# Create the SmartScraperGraph instance and run it
+# ************************************************
+
+smart_scraper_graph = SmartScraperGraph(
+    prompt="List me all the projects with their description.",
+    # also accepts a string with the already downloaded HTML code
+    source="https://perinim.github.io/projects/",
+    config=graph_config
+)
+
+result = smart_scraper_graph.run()
+print(result)
+
+# ************************************************
+# Get graph execution info
+# ************************************************
+
+graph_exec_info = smart_scraper_graph.get_execution_info()
+print(prettify_exec_info(graph_exec_info))

examples/openai/custom_graph_openai.py

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@
 fetch_node = FetchNode(
     input="url | local_dir",
     output=["doc"],
+    node_config={"headless": True, "verbose": True}
 )
 parse_node = ParseNode(
     input="doc",

examples/openai/scrape_plain_text_openai.py

Lines changed: 0 additions & 8 deletions
@@ -53,11 +53,3 @@
 
 graph_exec_info = smart_scraper_graph.get_execution_info()
 print(prettify_exec_info(graph_exec_info))
-
-
-# ************************************************
-# Get graph execution info
-# ************************************************
-
-graph_exec_info = smart_scraper_graph.get_execution_info()
-print(prettify_exec_info(graph_exec_info))

examples/openai/script_generator_openai.py

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
         "api_key": openai_key,
         "model": "gpt-3.5-turbo",
     },
-    "library": "beautifoulsoup"
+    "library": "beautifulsoup"
 }
 
 # ************************************************

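This hunk fixes a misspelt `"library"` value (`beautifoulsoup` to `beautifulsoup`). A check built on the standard-library `difflib.get_close_matches` can catch that kind of typo before the graph runs. This is a hypothetical sketch: `KNOWN_LIBRARIES` is an illustrative list, not the set of values the ScrapeGraphAI script generator actually supports, and `check_library` is not a library function.

```python
import difflib

# Illustrative list only; replace it with the values your script generator accepts.
KNOWN_LIBRARIES = ["beautifulsoup", "scrapy", "selenium", "playwright"]


def check_library(name: str, known: list = KNOWN_LIBRARIES) -> str:
    """Return `name` if it is known; otherwise raise, suggesting the closest match."""
    if name in known:
        return name
    # cutoff=0.8 keeps only near-identical spellings as suggestions.
    suggestions = difflib.get_close_matches(name, known, n=1, cutoff=0.8)
    hint = f" Did you mean {suggestions[0]!r}?" if suggestions else ""
    raise ValueError(f"Unknown library {name!r}.{hint}")
```

With the old configuration, `check_library("beautifoulsoup")` raises a `ValueError` that suggests `'beautifulsoup'`, so the mistake is reported at configuration time rather than surfacing in the generated script.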