-
In the Contributing section, four methods of scraping are provided.
I'd like to check that I understand how each works. Beautiful Soup lets me parse the contents of an HTML page. Is that right?
-
Basically yes. 1 and 2 are used in tandem (you request the page, then scrape with BS4). 2 is also useful if you want to reverse-engineer a council's process for displaying the data (you can use curlconverter to do the heavy lifting). Also, for external files, it's mainly just CSV or text-based files - PDFs aren't really workable due to their unreliable structures.
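To make "1 and 2 in tandem" concrete, here's a minimal sketch: requests fetches the page and BeautifulSoup parses it. The URL and CSS selector are hypothetical placeholders, not a real council endpoint.

```python
import requests
from bs4 import BeautifulSoup

# Method 1: fetch the page (URL is a made-up example).
response = requests.get("https://example-council.gov.uk/bin-collections")
response.raise_for_status()

# Method 2: parse the HTML with BS4.
soup = BeautifulSoup(response.text, "html.parser")

# Pull out whatever elements hold the schedule data - the selector
# here is illustrative; each council's markup differs.
for row in soup.select("table.collection-dates tr"):
    print(row.get_text(strip=True))
```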
-
Selenium is just an automation engine. It takes commands via an API ("go to page x") - it's the closest thing to a human doing it. The councils go to great lengths to protect their infrastructure from web scraping. Are we web scraping? Yes, but it's being done by the user of that council (a customer) to get at their schedule data, much like if you did it manually. Are we using it to mass-harvest data? No. Nice work on diving deep @davida72
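As a rough sketch of that "commands via an API" idea, Selenium drives a real browser through the same steps a person would. The URL, element IDs, and postcode below are hypothetical placeholders.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
try:
    # "Go to page x" - the browser loads the page just as a user would.
    driver.get("https://example-council.gov.uk/bin-collections")

    # Fill in the form and submit, mimicking a customer looking up
    # their own schedule (IDs and postcode are illustrative).
    driver.find_element(By.ID, "postcode").send_keys("AB1 2CD")
    driver.find_element(By.ID, "submit").click()

    # Read back the rendered schedule data.
    print(driver.find_element(By.ID, "results").text)
finally:
    driver.quit()
```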
-
Thanks both. Getting there...