Description
We have added a GeoJSON file to the repo that contains all of the hospital locations to date. I have tested it and can confirm that the data is complete as of this writing, but we haven't fully integrated it into the workflow yet. That integration is what this ticket is about.
Background: Each row in an input CSV contains the name of a hospital, and that name has to be matched to a lat/long, street address, etc., which is then written to the output CSV. The input CSVs do come with some of this information for some of the hospitals, but it is not reliable, so we disregard it.
Currently, the matching process uses two different files, geocode_cache.csv and pa_hospitals, and combines them in HospitalLocations() in geo_utils.py. process_csv then uses the result to match coordinates to hospitals by name. Here's an example of where that happens: https://github.com/RTCovid/PADataIngestion/blob/master/operators/process_csv.py#L54 (also scroll down to lines 120 and 133).
Instead of that process, we can consolidate greatly by loading the GeoJSON file, matching each hospital name to a feature, and then taking all of the necessary information from that feature. We recently started using that matching process in the Validator class here:
Line 39 in bf1197f: def validate_locations(self, input_csv):
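As a rough illustration of the GeoJSON-based matching described above, here is a minimal sketch. It is not the code in the Validator class. The property keys ("name", "address"), the helper names, and the sample feature are all assumptions made for illustration; the real file may use different keys and may need fuzzier name matching.

```python
import json

# Hypothetical sample standing in for the repo's GeoJSON file.
# The property keys "name" and "address" are assumptions.
SAMPLE_GEOJSON = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {"type": "Point", "coordinates": [-75.5, 40.25]},
      "properties": {"name": "Example General Hospital",
                     "address": "1 Example St"}
    }
  ]
}
""")


def normalize(name):
    """Lowercase and collapse whitespace so small formatting
    differences in the input CSV do not block a match."""
    return " ".join(name.lower().split())


def build_index(geojson):
    """Map each normalized hospital name to its GeoJSON feature."""
    index = {}
    for feature in geojson["features"]:
        index[normalize(feature["properties"]["name"])] = feature
    return index


def locate(index, hospital_name):
    """Return location fields for a hospital name, or None if unmatched."""
    feature = index.get(normalize(hospital_name))
    if feature is None:
        return None
    coords = feature["geometry"]["coordinates"]
    # GeoJSON Point coordinates are ordered [longitude, latitude].
    return {
        "longitude": coords[0],
        "latitude": coords[1],
        "address": feature["properties"].get("address"),
    }


index = build_index(SAMPLE_GEOJSON)
match = locate(index, "  example general   HOSPITAL ")
missing = locate(index, "Unknown Clinic")
```

Building the index once and looking up each CSV row against it keeps all location data in a single source, which is the consolidation this ticket is after. Returning None for unmatched names leaves it to the caller to decide how to report them.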
Completing this ticket means revamping process_csv to use the new matching method, so that HospitalLocations (and therefore the CSV files mentioned above) are no longer needed.