# 'Have your Say' scraper

[PyPI version](https://badge.fury.io/py/hys_scraper) [License](https://github.com/felixrech/hys_scraper/blob/master/LICENSE)

A small utility to scrape the European Commission's 'Have your Say' platform ([https://ec.europa.eu/info/law/better-regulation/have-your-say](https://ec.europa.eu/info/law/better-regulation/have-your-say)). It can scrape an initiative's feedback submissions, the attachments of these submissions, and the statistics by country and by category.

## Installation

```bash
pip3 install hys_scraper
```

Tested with Python 3.9 on Linux and in Google Colab notebooks.

## Getting started

To get started, you need the publication id of the initiative you want to scrape. Navigate to the initiative on 'Have your Say' and look at its URL: the number after `p_id=` at the end is the publication id. For example, for the [AI Act Commission adoption initiative](https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=24212003), the publication id is `24212003`.
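
If you collect many initiatives, you can also extract the publication id from a URL in code. The following is a minimal sketch using only the Python standard library; `publication_id_from_url` is a hypothetical helper for illustration and is not part of `hys_scraper`:

```python
from urllib.parse import parse_qs, urlparse


def publication_id_from_url(url: str) -> str:
    """Return the value of the ``p_id`` query parameter of a 'Have your Say' URL.

    Hypothetical helper for illustration; it is not part of hys_scraper.
    """
    query = parse_qs(urlparse(url).query)
    values = query.get("p_id")
    if not values:
        raise ValueError(f"no p_id parameter in {url!r}")
    return values[0]


url = (
    "https://ec.europa.eu/info/law/better-regulation/have-your-say/initiatives/"
    "12527-Artificial-intelligence-ethical-and-legal-requirements/feedback_en?p_id=24212003"
)
print(publication_id_from_url(url))  # 24212003
```

The returned string can be passed directly to the command shown next.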

To scrape an initiative, run the following (replace `24212003` with the publication id of the initiative you want to scrape):

```bash
python3 -m hys_scraper 24212003
```

This creates a new folder in your current working directory with the following layout:

```
24212003_requirements_for_artificial_intelligence/
├── attachments
│   ├── 2488672.pdf
│   ├── 2596917.pdf
│   └── ...
├── attachments.csv
├── categories.csv
├── countries.csv
└── feedbacks.csv

1 directory, 263 files
```
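
To check a finished run, a quick summary of the output folder can help. The sketch below is a hypothetical helper, not part of `hys_scraper`. It assumes the layout shown above and UTF-8 encoded CSV files with one header row, and it makes no assumption about column names:

```python
import csv
from pathlib import Path


def summarize_output(folder):
    """Count data rows per CSV file and the files in attachments/ (hypothetical helper).

    Assumes UTF-8 CSV files with a single header row; column names are not used.
    """
    folder = Path(folder)
    summary = {}
    for csv_path in sorted(folder.glob("*.csv")):
        with csv_path.open(newline="", encoding="utf-8") as f:
            rows = list(csv.reader(f))
        # Subtract the header row, if there is one.
        summary[csv_path.name] = max(len(rows) - 1, 0)
    attachments = folder / "attachments"
    summary["attachments/"] = (
        sum(1 for p in attachments.iterdir() if p.is_file()) if attachments.is_dir() else 0
    )
    return summary


if __name__ == "__main__":
    for name, count in summarize_output("24212003_requirements_for_artificial_intelligence").items():
        print(f"{name}: {count}")
```

It uses `csv.reader` rather than counting lines because a quoted field, such as a feedback text, may contain line breaks.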

## Advanced usage

The command line interface accepts a few more arguments. For example, instead of having `hys_scraper` create a folder in the current working directory to save the results into, you can specify a target directory with `--dir`.

```
$ python3 -m hys_scraper -h
Scrape feedback and statistics from the European Commission's 'Have your Say' plattform.

positional arguments:
  PID                   The publication id - what comes after 'p_id=' in the initiative's URL.

optional arguments:
  -h, --help            show this help message and exit
  --dir target_dir, --target_dir target_dir
                        Directory to save the feedback and statistics dataframes to. Defaults to creating a new
                        folder in the current working directory.
  --no_attachments      Whether to skip the download of attachments.
  --sleep_time t        Minimum time between consecutive HTTP requests (in seconds).
```
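
When driving the command line interface from a script, you can assemble the arguments from the options in the help text above. A minimal sketch; `build_command` and `run_scraper` are hypothetical helpers, and `sys.executable` is used so the scraper runs under the same interpreter as the calling script:

```python
import subprocess
import sys


def build_command(pid, target_dir=None, no_attachments=False, sleep_time=None):
    """Return an argument list for `python -m hys_scraper` (hypothetical helper).

    Only the flags shown in the help output are used.
    """
    cmd = [sys.executable, "-m", "hys_scraper", str(pid)]
    if target_dir is not None:
        cmd += ["--dir", str(target_dir)]
    if no_attachments:
        cmd.append("--no_attachments")
    if sleep_time is not None:
        # Minimum time between consecutive HTTP requests, in seconds.
        cmd += ["--sleep_time", str(sleep_time)]
    return cmd


def run_scraper(pid, **options):
    """Run the scraper in a subprocess; raises CalledProcessError if it fails."""
    subprocess.run(build_command(pid, **options), check=True)


if __name__ == "__main__":
    print(build_command("24212003", target_dir="ai_act", no_attachments=True, sleep_time=2))
```

Calling `run_scraper("24212003", no_attachments=True)` is then roughly equivalent to running `python3 -m hys_scraper 24212003 --no_attachments` in a shell.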

You can also use `hys_scraper` from Python:

```python
from hys_scraper import HYS_Scraper

# Scrape the initiative with publication id 24212003
feedbacks, countries, categories = HYS_Scraper("24212003").scrape()
```

The Python interface offers options similar to those of the command line interface; see `help(HYS_Scraper)` for details.