CSR (client-side rendering) web pages don't work #786

@obscuredotsh

Description

Problem statement:

When I supply a client-side rendered web page, the server sends only a minimal HTML page to the client, and JavaScript in the browser is responsible for rendering the full content. In other words, the HTML is generated dynamically on the client side.
The code doesn't work in this case, and I have no way to pass the parameters needed to control how the page's JavaScript is handled.
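To make the failure mode concrete: a plain HTTP fetch of a CSR page returns an HTML shell with almost no visible text, only script tags and an empty mount point. The sketch below is a rough heuristic using only the Python standard library; the function name `looks_client_rendered` and the 200-character threshold are hypothetical choices for illustration, not part of ScrapeGraphAI.

```python
from html.parser import HTMLParser


class _VisibleTextCounter(HTMLParser):
    """Counts visible text characters, ignoring script/style/noscript/template content."""

    _SKIP = {"script", "style", "noscript", "template"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag whose text is not rendered
        self.chars = 0        # visible, whitespace-stripped text characters
        self.scripts = 0      # number of <script> tags seen

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1
        if tag == "script":
            self.scripts += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chars += len(data.strip())


def looks_client_rendered(html: str, min_text_chars: int = 200) -> bool:
    """Hypothetical heuristic: scripts present but little visible text suggests a CSR shell."""
    counter = _VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.scripts > 0 and counter.chars < min_text_chars
```

For example, a typical single-page-app shell such as `<body><div id="root"></div><script src="/app.js"></script></body>` is flagged, while a server-rendered page with paragraphs of job listings is not. A threshold-based check like this can misfire on very short static pages, so it is only a diagnostic aid.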

What I want to happen:

Playwright lets me do it manually.
Here is the docloaders module documentation, which lists the loader options:
https://scrapegraph-ai.readthedocs.io/en/latest/modules/scrapegraphai.docloaders.html

However, I have no way to set these options through the graph config.
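For reference, doing this manually with Playwright's sync API looks roughly like the sketch below: open the page in Chromium, let the client-side JavaScript run, optionally wait for a specific element, and return the rendered HTML. The function name `render_page` and its parameters are hypothetical; the Playwright calls (`sync_playwright`, `chromium.launch`, `page.goto(..., wait_until=...)`, `page.wait_for_selector`, `page.content`) are part of Playwright's public Python API.

```python
from typing import Optional


def render_page(url: str, wait_for_selector: Optional[str] = None,
                headless: bool = True, timeout_ms: int = 30_000) -> str:
    """Load `url` in Chromium, let client-side JS render it, and return the final HTML."""
    # Imported lazily so this module can be imported even without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        try:
            page = browser.new_page()
            # "networkidle" waits until there have been no network connections for ~500 ms.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            if wait_for_selector:
                # Block until the JS framework has inserted the element we care about.
                page.wait_for_selector(wait_for_selector, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

For example, `render_page("https://jobfound.org/?jobType=full+time%2Cinternship", headless=False)` would open a visible browser on the page from the snippet below and return its HTML after rendering. Assuming `SmartScraperGraph` treats a `source` string that is not an `http` URL as local HTML content, that returned markup could be passed as `source` instead of the URL as a stopgap.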

My code snippet:

```python
import json
import os

from scrapegraphai.graphs import SmartScraperGraph

# Read the API key from the environment instead of hard-coding it.
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")

graph_config = {
    "llm": {
        "api_key": OPENAI_API_KEY,
        "model": "openai/gpt-4o",
    },
    "verbose": True,
    "headless": False,  # run the browser with a visible window
    # "max_results": 5,
    # "format": "json",
}

# Renamed from `format` to avoid shadowing the built-in format().
output_format = {
    "title": "job title",
    "company": "Company Name",
    "location": "Location of the job",
    "experience": "Years of experience required",
    "salary": "Compensation amount",
    "sourceUrl": "Url of the job",
}

smart_scraper_graph = SmartScraperGraph(
    prompt=f"Give me details about job postings here in the following format: {output_format}",
    source="https://jobfound.org/?jobType=full+time%2Cinternship",
    config=graph_config,
)

result = smart_scraper_graph.run()
print(json.dumps(result, indent=4))
```
