
Consolidate geo matching of hospitals #14

@mradamcox


We have added a geojson file to the repo that shows all of the hospital locations to date. I have tested it and can confirm that the data is complete as of this writing, but we still haven't fully integrated it into the workflow, so that's what this ticket is about.

Background: Each row in an input CSV contains the name of a hospital, and that name has to be matched to a latitude/longitude, street address, etc., which are then written to the output CSV. The input CSVs do come with some of this information for some of the hospitals, but it is not reliable, so we disregard it.

Currently, the matching process uses two different files, geocode_cache.csv and pa_hospitals, and combines them in HospitalLocations() in geo_utils.py. process_csv then uses the result to match coordinates to the hospitals based on their name. Here's an example of where that is ultimately implemented: https://github.com/RTCovid/PADataIngestion/blob/master/operators/process_csv.py#L54 (also scroll down to lines 120 and 133).

Instead of that process, we can consolidate greatly by loading the GeoJSON file, matching a name to each feature, and then taking all of the necessary information from the feature. We recently started using that matching process in the Validator class, in this method:

def validate_locations(self, input_csv):

That example also shows the simple pattern in place to handle misspellings or new names for hospitals: a "HospitalNameAliases" field is stored in the GeoJSON that can hold pipe-delimited alternate spellings, and it is parsed in the load_geojson function. If a name doesn't immediately match one of the features, the list is iterated again.
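
As a rough illustration of the alias pattern described above, here is a minimal sketch. It is not the code from the Validator class: the "HospitalName" property, the sample feature, the helper names, and the case-insensitive comparison are all assumptions made for the example. Only the "HospitalNameAliases" field and its pipe-delimited format come from this ticket.

```python
# Hypothetical sample data; only "HospitalNameAliases" is named in the ticket.
SAMPLE_FEATURES = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-75.0, 40.0]},
        "properties": {
            "HospitalName": "Example General Hospital",
            "HospitalNameAliases": "Example General|Example Gen Hosp",
        },
    }
]


def parse_aliases(raw):
    """Split a pipe-delimited alias string into a list of clean names."""
    if not raw:
        return []
    return [alias.strip() for alias in raw.split("|") if alias.strip()]


def find_feature(name, features, name_field="HospitalName"):
    """Return the feature matching `name`, or None if nothing matches.

    `name_field` is an assumed property name. Comparison ignores case and
    surrounding whitespace, which is also an assumption.
    """
    target = name.strip().lower()

    # First pass: compare against each feature's primary name.
    for feature in features:
        primary = feature["properties"].get(name_field) or ""
        if primary.strip().lower() == target:
            return feature

    # Second pass: compare against each feature's alternate spellings.
    for feature in features:
        aliases = parse_aliases(feature["properties"].get("HospitalNameAliases"))
        if target in (alias.lower() for alias in aliases):
            return feature

    return None
```

With this shape, adding a new spelling for a hospital only requires editing the "HospitalNameAliases" value in the GeoJSON, not the matching code.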

Completing this ticket means revamping process_csv to use the new matching method, so that HospitalLocations (and therefore the CSV files mentioned above) is no longer needed.
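
One possible shape for the revamped step is sketched below, under stated assumptions: the column names ("HospitalName", "Latitude", "Longitude"), the helper names, and the sample data are hypothetical and are not taken from process_csv. The only details relied on are the pipe-delimited "HospitalNameAliases" field from this ticket and the GeoJSON convention that point coordinates are ordered longitude, latitude.

```python
import csv
import io


def build_lookup(features, name_field="HospitalName"):
    """Map every known spelling (primary name and aliases) to its feature."""
    lookup = {}
    for feature in features:
        props = feature["properties"]
        names = [props.get(name_field) or ""]
        names += (props.get("HospitalNameAliases") or "").split("|")
        for name in names:
            key = name.strip().lower()
            if key:
                # Keep the first feature seen for a given spelling.
                lookup.setdefault(key, feature)
    return lookup


def enrich_rows(rows, lookup, input_field="HospitalName"):
    """Yield copies of the input rows with coordinates taken from the feature."""
    for row in rows:
        key = (row.get(input_field) or "").strip().lower()
        feature = lookup.get(key)
        out = dict(row)
        if feature is None:
            # Unmatched names are left blank so they are easy to spot.
            out["Latitude"] = out["Longitude"] = ""
        else:
            # GeoJSON point coordinates are ordered (longitude, latitude).
            lon, lat = feature["geometry"]["coordinates"][:2]
            out["Latitude"], out["Longitude"] = lat, lon
        yield out


# Hypothetical usage with in-memory data instead of real files.
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [-75.0, 40.0]},
        "properties": {
            "HospitalName": "Example General Hospital",
            "HospitalNameAliases": "Example General",
        },
    }
]
input_csv = io.StringIO("HospitalName,Beds\nExample General,12\nUnknown,3\n")
enriched = list(enrich_rows(csv.DictReader(input_csv), build_lookup(features)))
```

Building the lookup once keeps each row to a single dictionary access, and any location fields already present in the input CSV are simply overwritten by the values from the GeoJSON feature.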
