Work is under way on get_locations(locations, resolution) for issue #36 (PR #37) that checks incoming values against the filesystem to see whether they might be a CSV file to load from.
This might be better off split into a separate get_locations_csv(file, resolution) function for a few reasons:
- Speculatively calling Path(location_id).exists() on every incoming location ID might cause performance issues when a large number of locations are being requested
- There's the possibility that files could exist that have the same name as a location ID string, leading to unexpected errors or erroneous results
- Security-wise, it's good to guard against people probing a system via unexpected avenues. For example, if I were calling this library function with input passed through from a web form, errors from something like get_locations(["CHC", "AKL", ".dockerenv"]) could tell an attacker whether the code was running in a Docker container.
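
To illustrate the name-collision concern in the list above, here is a minimal sketch. The helper looks_like_file is hypothetical and only stands in for the kind of speculative check described; it is not the code in PR #37.

```python
import os
import tempfile
from pathlib import Path


def looks_like_file(location_id: str) -> bool:
    # Hypothetical stand-in for a speculative "is this a CSV path?" check.
    return Path(location_id).exists()


with tempfile.TemporaryDirectory() as tmp:
    previous = os.getcwd()
    os.chdir(tmp)
    try:
        # No file named "CHC" yet, so the ID is treated as a location ID.
        before = looks_like_file("CHC")
        # An unrelated file that happens to share the ID's name appears.
        Path("CHC").write_text("unrelated contents")
        # The same ID is now mistaken for a file to load from.
        after = looks_like_file("CHC")
    finally:
        os.chdir(previous)
```

The same input string is interpreted differently depending on what happens to be in the working directory, which is the ambiguity a dedicated CSV function would avoid.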
A separate get_locations_csv(file_location, resolution) function would make interacting with the host file system more intentional. It also provides us with an opportunity to take in either a Path or a file-like object (with a type under typing.TextIO) to read from, which would allow users to read CSV data in from non-traditional sources (e.g. S3 buckets) as well as from their local filesystems.
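
A rough sketch of what the split could look like follows. Everything here is an assumption for illustration: the CSV layout (one location ID in the first column, no header), the read_location_ids helper, and the get_locations stub, which merely stands in for the library's real function.

```python
import csv
import io
from pathlib import Path
from typing import List, TextIO, Tuple, Union


def get_locations(locations: List[str], resolution: str) -> List[Tuple[str, str]]:
    # Placeholder for the real library function; it never touches the filesystem.
    return [(location_id, resolution) for location_id in locations]


def read_location_ids(source: Union[Path, TextIO]) -> List[str]:
    # Hypothetical helper: assumes location IDs are in the first CSV column.
    if isinstance(source, Path):
        with source.open(newline="", encoding="utf-8") as handle:
            return _first_column(handle)
    # Anything else is treated as an already-open text stream.
    return _first_column(source)


def _first_column(handle: TextIO) -> List[str]:
    # Skip blank rows and rows whose first cell is empty.
    return [row[0].strip() for row in csv.reader(handle) if row and row[0].strip()]


def get_locations_csv(
    file_location: Union[Path, TextIO], resolution: str
) -> List[Tuple[str, str]]:
    # File access happens only here, and only because the caller asked for it.
    return get_locations(read_location_ids(file_location), resolution)
```

Under these assumptions, a caller holding CSV text from a non-local source could wrap it in a text stream, e.g. get_locations_csv(io.StringIO("CHC\nAKL\n"), "hourly"), while local users would pass a Path. The resolution value "hourly" is made up for the example.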