This is a large rewrite of ice_detention_scraper. Many thanks to @johnseekins and a couple of volunteers who tested the many new refinements and additions. This brings us up to more than 190 facilities, many of them not managed by ICE directly but still part of its network.
This release should generally be okay to use. It has been tested in various modes.
The scraper can now obtain and present more data, including how many detainees are held in each facility. To address the poor and incorrect postal addresses provided by ICE, the project now includes a large number of manually matched addresses. The project now uses uv and mise to orchestrate Python containers. An additional ICE detention facility spreadsheet, which is updated about twice a month, is now downloaded and parsed to get this more detailed facility data.
Export formats are now expanded! JSON is an option and more are in the works.
The --load-existing option is mainly intended for developers and has been trimmed to 20 facilities.
The enrichment process has much improved searching: facility names are 'stemmed' to produce alternative search terms, which should make it easier to find matching entries in OpenStreetMap and Wikipedia.
For developers and overall maintainability, there are better type-safety tools, git commit hooks for checking code quality, and dependabot integration. Please see the readme.md for additional details on all this.
If you want to help, check out the issue queue and the pull requests.
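As a rough illustration of the name 'stemming' idea, here is a minimal sketch that trims generic trailing words from a facility name to produce shorter search terms. This is not the project's actual code: the function name name_variants, the GENERIC_WORDS list, and the sample facility name are all assumptions made up for this example.

```python
import re

# Hypothetical list of generic trailing words; not taken from the project.
GENERIC_WORDS = {
    "county", "detention", "center", "facility",
    "jail", "correctional", "processing", "ice",
}


def name_variants(name: str) -> list[str]:
    """Return progressively shorter search strings for a facility name."""
    # Replace punctuation (except apostrophes) with spaces, then normalize whitespace.
    words = re.sub(r"[^\w\s']", " ", name).split()
    variants = [" ".join(words)]
    # Drop generic trailing words one at a time, keeping at least one word.
    while len(words) > 1 and words[-1].lower() in GENERIC_WORDS:
        words = words[:-1]
        candidate = " ".join(words)
        if candidate not in variants:
            variants.append(candidate)
    return variants


# Sample name used purely for illustration.
for variant in name_variants("Adelanto ICE Processing Center"):
    print(variant)
```

Each variant could then be tried in turn against a search backend, stopping at the first one that returns a plausible match.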
Known issue
- The flags --debug-wikipedia, --debug-osm, and --debug-wikidata are not currently working. They will just kick a notice at you. (#19)
What's Changed
- Improve project structure some by @johnseekins in #14
- Bump ruff from 0.12.12 to 0.13.0 by @dependabot[bot] in #21
- Bump polars from 1.33.0 to 1.33.1 by @dependabot[bot] in #20
- fix default data so --load-existing works without enrich by @johnseekins in #22
- update sheet from ice.gov (as it gets updated weekly) by @johnseekins in #25
- shrink default data set to a random subset of all facilities by @johnseekins in #26
- find pages to scrape rather than hard-coding by @johnseekins in #28
- Bump mypy from 1.17.1 to 1.18.1 by @dependabot[bot] in #30
- Bump types-requests from 2.32.4.20250809 to 2.32.4.20250913 by @dependabot[bot] in #29
- more matching fixes by @johnseekins in #33
New Contributors
- @johnseekins made their first contribution in #14
- @dependabot[bot] made their first contribution in #21
Full Changelog: 1.0.0...1.1.0-alpha1