Vera.org Facility data and additional facility types/groupings #49
Signed-off-by: John Seekins <[email protected]>
|
This isn't done, but I thought it'd be worth exploring more. Currently matching 107 facilities, so getting that up to 191/192 would be nice. |
|
I added the Susupe, Saipan place to OpenStreetMap; it was not marked at all. Let's see if the alt_name attribute can help carry this forward. Nominatim now returns the way for the query "Vicente T Seman Bldg Civic Center". I set alt_name using semicolon separators, which seems to be the right approach: https://wiki.openstreetmap.org/wiki/Key:alt_name
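For anyone consuming those tags on our side, here is a minimal sketch of splitting a semicolon-separated alt_name into individual candidate names for matching. The function name `split_alt_names` and the example tag values are hypothetical; the real OSM object may be tagged differently.

```python
def split_alt_names(tags: dict) -> list:
    """Return the primary name followed by each semicolon-separated alt_name value."""
    names = []
    primary = tags.get("name", "").strip()
    if primary:
        names.append(primary)
    for part in tags.get("alt_name", "").split(";"):
        part = part.strip()
        # Skip empty fragments (e.g. a trailing semicolon) and exact duplicates.
        if part and part not in names:
            names.append(part)
    return names


# Hypothetical tags, for illustration only.
example_tags = {
    "name": "Example Detention Facility",
    "alt_name": "Example Civic Center; Example Bldg",
}
print(split_alt_names(example_tags))
```

Each returned name could then be tried in turn against whatever facility name we have, rather than relying on a single `name` value.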
|
This may help in addressing #47, although it won't completely solve it.
|
Can we add a skip option for the Vera data? Probably a CLI toggle like --skip-vera to avoid, I think, running this command. It is taking quite a long time to go through 1419 facilities in the broader American prison industrial complex. Also, maybe add a switch like --vera-ice-facilities-only, which would skip over all the non-ICE-managed ones (that is, limit the processing to only the ones we have already grabbed via the other methods). I am running the test now; it will take some time to see how the results come out. Also, using Ctrl-C I was not able to get a clean break from the processing in the midst of it (I think at around 600 processed facilities), which is the first time I have had this issue so far.
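A rough sketch of how the two proposed toggles might behave, using `argparse`. The flag names come from the suggestion above; `build_parser`, `select_vera_rows`, and the `"name"` key are hypothetical stand-ins, not the project's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Facility enrichment (illustrative sketch)")
    parser.add_argument(
        "--skip-vera",
        action="store_true",
        help="do not process any Vera facility rows",
    )
    parser.add_argument(
        "--vera-ice-facilities-only",
        action="store_true",
        help="only process Vera rows for facilities we already collected elsewhere",
    )
    return parser


def select_vera_rows(vera_rows, known_names, args):
    """Pick which Vera rows to process based on the two flags."""
    if args.skip_vera:
        return []
    if args.vera_ice_facilities_only:
        # Assumes each row carries a "name" key comparable to our own facility names.
        return [row for row in vera_rows if row["name"] in known_names]
    return list(vera_rows)


if __name__ == "__main__":
    cli_args = build_parser().parse_args()
    sample_rows = [{"name": "Facility A"}, {"name": "Facility B"}]
    print(select_vera_rows(sample_rows, {"Facility A"}, cli_args))
```

In practice the "already collected" check would probably need fuzzier matching than exact name equality; the sketch only shows where the flags would plug in.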
|
You're talking about skipping Vera data during enrichment, aren't you? I had only tested it during the initial scrape, which was nice and fast. Maybe we don't want to add this just yet? Not hard to leave a PR as a thought for later... The Ctrl-C not working actually makes sense. I used multiprocessing to speed up enrichment, which forks subprocesses, so you have to cancel all of the subprocesses to fully exit. It's a pain, and we could probably look at using threads instead (although historically that's been messy in Python), but it seemed a pretty reasonable trade-off for the speed-up.
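For reference, a minimal sketch of what the thread-based alternative mentioned above could look like, with a shared `threading.Event` so that a Ctrl-C in the main thread stops queued work. This is not what the PR does (it uses multiprocessing); `enrich_one` and `enrich_all` are hypothetical names, and the per-facility work is a placeholder.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def enrich_one(facility: dict) -> dict:
    # Placeholder for the real per-facility enrichment (network lookups, etc.).
    return {**facility, "enriched": True}


def enrich_all(facilities, workers=3, stop_event=None):
    """Enrich facilities on a thread pool; skip remaining work once stop_event is set."""
    if stop_event is None:
        stop_event = threading.Event()

    def guarded(facility):
        # Tasks that start after cancellation return immediately.
        if stop_event.is_set():
            return None
        return enrich_one(facility)

    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        try:
            for result in pool.map(guarded, facilities):
                if result is not None:
                    results.append(result)
        except KeyboardInterrupt:
            # Ctrl-C lands in the main thread; tell workers to stop, then re-raise.
            stop_event.set()
            raise
    return results


if __name__ == "__main__":
    print(enrich_all([{"name": "Facility A"}, {"name": "Facility B"}]))
```

Tasks already in flight still run to completion before the pool shuts down, so the exit is clean but not instantaneous. Threads also only help if enrichment is mostly waiting on I/O; CPU-bound work would still be limited by the GIL.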
|
|
@HongPong With the changes I've made, enrichment performance is decent again: Not amazing (that's about 10 minutes?), but significantly better than before when enriching all Vera data. |
|
Using the (just added) |
|
Interestingly, there are diminishing returns on the worker count. Going to 7 workers (up from the default of 3) only nets an extra minute or so:
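One possible way to reason about this is Amdahl's law: if some fraction of the run is inherently serial, extra workers only shrink the remaining part. The sketch below uses a made-up parallel fraction of 0.8, not a measurement from this project; other causes, such as rate limits on external services, would produce a similar flattening.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup when only parallel_fraction of the work scales with workers."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical parallel fraction, chosen purely for illustration.
PARALLEL_FRACTION = 0.8

for workers in (1, 3, 7):
    speedup = amdahl_speedup(PARALLEL_FRACTION, workers)
    print(f"{workers} workers: {speedup:.2f}x")
```

Under that assumption, more than doubling the worker count from 3 to 7 yields well under double the throughput, which matches the general shape of what was observed.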
|
alright i got it in thank you!!! sorry about the delay on that |
vera.org has a list of >1300 facilities we may be able to leverage to expand our dataset:
https://github.com/vera-institute/ice-detention-trends/blob/main/metadata/facilities.csv
One caveat is that their license is somewhat restrictive: https://github.com/vera-institute/ice-detention-trends/blob/main/License.md