The final pdfs are posted on Google Cloud Storage: https://storage.googleapis.com/in-electoral-rolls/dadra_pdfs.tar.gz
The requester pays the charges associated with downloading the data. For more information, see: https://cloud.google.com/storage/docs/requester-pays
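
Because the bucket is requester-pays, a download has to name a project to bill. The following is a minimal sketch, not part of this repository. It assumes the `google-cloud-storage` Python client library is installed and authenticated. The helper names `split_gcs_url` and `download_requester_pays` and the project ID `my-billing-project` are placeholders for illustration.

```python
from urllib.parse import urlparse


def split_gcs_url(url):
    """Split a storage.googleapis.com URL into (bucket, object_name)."""
    path = urlparse(url).path.lstrip("/")
    bucket, _, object_name = path.partition("/")
    return bucket, object_name


def download_requester_pays(url, billing_project, dest):
    """Download one object, charging the request to `billing_project`."""
    # Imported here so the URL helper above works without the client library.
    from google.cloud import storage

    bucket_name, object_name = split_gcs_url(url)
    client = storage.Client(project=billing_project)
    # user_project names the project that is billed for the request.
    bucket = client.bucket(bucket_name, user_project=billing_project)
    bucket.blob(object_name).download_to_filename(dest)


# Example call (placeholder project ID, replace with your own):
# download_requester_pays(
#     "https://storage.googleapis.com/in-electoral-rolls/dadra_pdfs.tar.gz",
#     "my-billing-project",
#     "dadra_pdfs.tar.gz",
# )
```
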
URL = http://ceodnh.nic.in/Electoral2017.aspx
Year = Final Electoral Roll for 2017
The script does three things:

- Produces `dadra.csv`, which contains metadata about the pdfs. The CSV has the following fields:
  `language, main_or_supplementary, part_no, file_name`
- Downloads all the pdfs to a directory called `dadra_pdfs/`
- Renames files as follows:
  - English language rolls have the prefix `eng` and Gujarati language rolls have the prefix `guj`.
  - The main rolls have the word `main` in them and supplementary rolls have `supp`.
  - The last segment is the 3 digit `part_no`.

  So a sample name = `eng_main_001.pdf`
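
The naming scheme can be sketched as follows. This is an illustration, not the code in `dadra.py`: the function names are hypothetical, and it is an assumption that the CSV stores the short codes (`eng`/`guj`, `main`/`supp`) in its `language` and `main_or_supplementary` fields.

```python
def build_file_name(language, roll_type, part_no):
    """Return a name such as eng_main_001.pdf."""
    if language not in ("eng", "guj"):
        raise ValueError(f"unknown language prefix: {language!r}")
    if roll_type not in ("main", "supp"):
        raise ValueError(f"unknown roll type: {roll_type!r}")
    # part_no is zero-padded to three digits.
    return f"{language}_{roll_type}_{int(part_no):03d}.pdf"


def build_csv_row(language, roll_type, part_no):
    """Return one metadata row using the field names from dadra.csv."""
    return {
        "language": language,
        "main_or_supplementary": roll_type,
        "part_no": int(part_no),
        "file_name": build_file_name(language, roll_type, part_no),
    }
```

For example, `build_file_name("eng", "main", 1)` yields `eng_main_001.pdf`.
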

To run the script:

    pip install -r requirements.txt
    python dadra.py


Number of files by language and type:

| lang | type | number of files |
|---|---|---|
| eng | main | 266 |
| eng | supp | 255 |
| guj | main | 266 |
| guj | supp | 252 |
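
A tally like this can be recomputed from `dadra.csv`. The sketch below assumes the CSV has a header row with the field names `language` and `main_or_supplementary`. The function names are illustrative and not taken from `dadra.py`.

```python
import csv
from collections import Counter


def count_rolls(rows):
    """Count rows per (language, main_or_supplementary) pair."""
    return Counter(
        (row["language"], row["main_or_supplementary"]) for row in rows
    )


def count_rolls_in_csv(path="dadra.csv"):
    """Read the metadata CSV and return the per-group counts."""
    with open(path, newline="", encoding="utf-8") as handle:
        return count_rolls(csv.DictReader(handle))


# Example:
# for (lang, roll_type), n in sorted(count_rolls_in_csv().items()):
#     print(lang, roll_type, n)
```
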
Some supplementary files are missing: requests for them return error 404 (File or directory not found), so the supplementary counts are lower than the main counts.
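
One way to cope with the missing files is to record 404 responses and keep going instead of aborting the run. The sketch below uses only the standard library and is not necessarily how `dadra.py` handles it. The names `fetch` and `download_available` are hypothetical.

```python
import urllib.error
import urllib.request


def fetch(url):
    """Return the body of `url` as bytes; raises HTTPError on failure."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_available(urls, fetch=fetch):
    """Fetch every URL, collecting 404s instead of stopping.

    Returns (found, missing): a dict of url -> bytes and a list of
    the URLs that returned 404.
    """
    found, missing = {}, []
    for url in urls:
        try:
            found[url] = fetch(url)
        except urllib.error.HTTPError as err:
            if err.code != 404:
                raise  # other HTTP errors are unexpected, so surface them
            missing.append(url)
    return found, missing
```

Passing `fetch` as a parameter keeps the loop testable without network access.
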
A draft roll for 2018 is also available.