Closed
Description
Problem statement:
When I supply a client-side rendered page (the server sends a minimal HTML page to the client, and JavaScript in the browser renders the full content), the code doesn't work, and I have no way to pass parameters that control the JavaScript behaviour.
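For context, one rough way to confirm that a page is client-side rendered is to check whether the raw HTML the server returns contains script tags but almost no visible text. Below is a minimal sketch using only the standard library's `html.parser`. `looks_client_rendered` and its 200-character threshold are hypothetical choices for illustration. They are not part of ScrapeGraphAI.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.text_chars = 0
        self.script_tags = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
            if tag == "script":
                self.script_tags += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore JavaScript/CSS source; only count text a reader would see.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_client_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Heuristic: scripts are present but the server-sent HTML has little visible text."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.script_tags > 0 and counter.text_chars < min_text_chars
```

For a JavaScript-rendered job board, the HTML from a plain HTTP request would typically make this return `True`, because the listings only appear after the scripts run in a browser.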
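What I want to happen:
I want to be able to pass parameters that control the JavaScript behaviour, so that client-rendered pages are scraped after their content has rendered. Playwright lets me do this manually.
The docloaders submodule documents the class that takes these options:
https://scrapegraph-ai.readthedocs.io/en/latest/modules/scrapegraphai.docloaders.html
However, I have no option to set them when using SmartScraperGraph.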
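As a manual workaround, Playwright's sync API can load the page, let its JavaScript run, and return the rendered DOM. This is a minimal sketch, assuming `playwright` is installed (`pip install playwright`, then `playwright install chromium`). `render_page_html` and its parameters are hypothetical names, not a ScrapeGraphAI API.

```python
from typing import Optional


def render_page_html(
    url: str,
    headless: bool = True,
    wait_selector: Optional[str] = None,
    timeout_ms: int = 30_000,
) -> str:
    """Load `url` in Chromium, let client-side JavaScript run, and return the rendered HTML."""
    # Imported lazily so this module can be imported without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            # "networkidle" waits until there have been no network connections for
            # at least 500 ms, which usually means the JS app has fetched its data.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            if wait_selector is not None:
                # Optionally wait for a specific element (e.g. a job card) to appear.
                page.wait_for_selector(wait_selector, timeout=timeout_ms)
            return page.content()  # serialized DOM after JavaScript has run
        finally:
            browser.close()
```

The returned string could then be passed as `source` instead of the URL. Whether SmartScraperGraph treats a non-URL `source` as raw HTML depends on the library version, so that part is an assumption to verify.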
My code snippet:
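```python
import json
import os

from scrapegraphai.graphs import SmartScraperGraph

# Read the API key from the environment instead of hard-coding it
OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]

graph_config = {
    "llm": {
        "api_key": OPENAI_API_KEY,
        "model": "openai/gpt-4o",
    },
    "verbose": True,
    "headless": False,  # show the browser window while the page loads
    # "max_results": 5,
    # "format": "json"
}

# Fields to extract for each job posting (renamed from `format` to avoid shadowing the built-in)
output_format = {
    "title": "job title",
    "company": "Company Name",
    "location": "Location of the job",
    "experience": "Years of experience required",
    "salary": "Compensation amount",
    "sourceUrl": "Url of the job",
}

smart_scraper_graph = SmartScraperGraph(
    prompt=f"Give me details about the job postings here in the following format: {output_format}",
    source="https://jobfound.org/?jobType=full+time%2Cinternship",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
```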