Releases: Open-Security-Mapping-Project/ice_detention_scraper
1.1.0-alpha1
This is a large rewrite of the ice_detention_scraper. Many thanks to @johnseekins and the volunteers who tested the many new refinements and additions. The project now covers more than 190 facilities, many of them not managed by ICE directly but still part of its network.
This release should generally be safe to use; it has been tested in its various modes.
The scraper can now obtain and present additional data, including how many detainees are held in each facility. To correct the poor and often inaccurate postal addresses provided by ICE, a large number of addresses have been manually matched. The project now uses uv and mise to orchestrate Python containers. An additional ICE detention facility spreadsheet, updated roughly twice a month, is now downloaded and parsed to provide this more detailed facility data.
Export formats have been expanded! JSON is now an option, and more are in the works.
The --load-existing option is mainly intended for developers, and its bundled dataset has been trimmed to 20 facilities.
The enrichment process now does much-improved searching on facility names, 'stemming' them to look for alternatives in OpenStreetMap and Wikipedia, which should make matches easier to find.
For developers and overall maintainability, there are now better type-safety tools, git commit hooks for code-quality checks, and Dependabot integration. Please see the README.md for additional details on all of this.
If you want to help, check out the issue queue and the open pull requests.
Known issue
- The --debug-wikipedia, --debug-osm, and --debug-wikidata flags are not currently working; they will just kick a notice at you. (#19)
What's Changed
- Improve project structure some by @johnseekins in #14
- Bump ruff from 0.12.12 to 0.13.0 by @dependabot[bot] in #21
- Bump polars from 1.33.0 to 1.33.1 by @dependabot[bot] in #20
- fix default data so --load-existing works without enrich by @johnseekins in #22
- update sheet from ice.gov (as it gets updated weekly) by @johnseekins in #25
- shrink default data set to a random subset of all facilities by @johnseekins in #26
- find pages to scrape rather than hard-coding by @johnseekins in #28
- Bump mypy from 1.17.1 to 1.18.1 by @dependabot[bot] in #30
- Bump types-requests from 2.32.4.20250809 to 2.32.4.20250913 by @dependabot[bot] in #29
- more matching fixes by @johnseekins in #33
New Contributors
- @johnseekins made their first contribution in #14
- @dependabot[bot] made their first contribution in #21
Full Changelog: 1.0.0...1.1.0-alpha1
v1.0.0
ICE Detention Facilities Scraper - Initial Release 1.0.0.
ICE Detention Facilities Data Scraper and Enricher, a Python script managed by the Open Security Mapping Project.
In short, this helps identify the online profile of each ICE detention facility. Please see the project home page for more about mapping these facilities and other detailed information sources.
This script scrapes ICE detention facility data from ICE.gov and enriches it with information from Wikipedia, Wikidata, and OpenStreetMap.
The main purpose right now is to identify whether the detention facilities have entries on Wikipedia, Wikidata, and OpenStreetMap, which will help with documenting the facilities appropriately. As these entries get fixed up, you should see your CSV results change almost immediately.
You can also use --load-existing to reuse an existing scrape of the ICE.gov data. This is stored in data_loader.py and includes the facilities' official current addresses. (Note that ICE has been renaming known "detention center" sites to "processing center", and so on.)
Please see the README.md for additional information.