Description
Currently this package includes functions to 1) download, 2) open, and 3) query two different datasets. If we want to expand the package to cover a wider variety of data, step 3 will be difficult, because it is hard to anticipate ahead of time how users will want to use the data.
So I propose removing step 3 from the scope of the project, at least in the near term, until common workflows for querying datasets become apparent.
To make steps 1 and 2 more general, maybe the following setup would be useful:
```julia
export open, cache

abstract type GeoDataset end
abstract type NCDataset <: GeoDataset end
abstract type ZarrDataset <: GeoDataset end
abstract type ShapefileDataset <: GeoDataset end

struct GSHHG <: ShapefileDataset
    resolution::Int
    level::Int
end

# Make sure the file is available locally, then open it.
function Base.open(d::GeoDataset)
    cache(d)
    _open(d)
end

function cache(d::GeoDataset)
    # ... check for a local copy and download it if missing
end

# This method would go in an extension that is loaded when GeoDataFrames.jl (or a similar package) is loaded.
function _open(d::ShapefileDataset)
    # ... read the shapefile
end

# ... etc.
```

In the example above there are two main functions, `open` and `cache`. `cache` would check whether the file is available locally and download it if not, and `open` would open the file.
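A minimal, self-contained sketch of the proposed `open`/`cache` split, so it can run without GeoDataFrames.jl. The plain-text format type `TextDataset`, the dataset `ExampleText`, its `url`/`cachedir` fields and the `localpath` helper are all hypothetical. `cache` only downloads (via the `Downloads` standard library) when the file is missing, and the reader is picked by dispatching `_open` on the abstract format type.

```julia
module OpenCacheSketch

using Downloads

abstract type GeoDataset end
abstract type TextDataset <: GeoDataset end   # hypothetical format family, for illustration only

# Hypothetical dataset: a plain-text file fetched from `url` into `cachedir`.
struct ExampleText <: TextDataset
    url::String
    cachedir::String
end

# Local path for the dataset's file (assumes `url` and `cachedir` fields, true only in this sketch).
localpath(d::GeoDataset) = joinpath(d.cachedir, basename(d.url))

# Download the file only if it is not already present; return the local path.
function cache(d::GeoDataset)
    path = localpath(d)
    if !isfile(path)
        mkpath(dirname(path))
        Downloads.download(d.url, path)
    end
    return path
end

# Format-specific reader, selected by dispatch on the abstract format type.
_open(d::TextDataset) = readlines(localpath(d))

# Single entry point: ensure the file is local, then read it.
function Base.open(d::GeoDataset)
    cache(d)
    return _open(d)
end

end # module
```

Because `_open` dispatches on the abstract format type, supporting a new format would only mean adding an abstract type and one `_open` method; `open` and `cache` stay unchanged.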
Each specific dataset would be a type with whatever fields are needed to configure or specify it, e.g. `resolution` and `level` as above, or information about mirrors, authentication, etc.
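As a sketch of how such configuration fields might look: `Base.@kwdef` provides keyword defaults, while an extra positional constructor keeps `GSHHG(2, 1)` working. The `mirrors` and `token` fields, the mirror URLs, the file-naming scheme, and the `needs_auth`/`filename`/`candidate_urls` helpers are all hypothetical, for illustration only.

```julia
module DatasetConfigSketch

abstract type GeoDataset end
abstract type ShapefileDataset <: GeoDataset end

# Configuration lives in the fields. `mirrors` and `token` are hypothetical;
# the real set of fields would depend on each data provider.
Base.@kwdef struct GSHHG <: ShapefileDataset
    resolution::Int = 1
    level::Int = 1
    mirrors::Vector{String} = ["https://mirror-a.example.org/gshhg",
                               "https://mirror-b.example.org/gshhg"]
    token::Union{Nothing,String} = nothing   # credentials, if a mirror requires them
end

# Positional constructor so `GSHHG(2, 1)` still works.
GSHHG(resolution::Int, level::Int) = GSHHG(; resolution, level)

# Whether downloading this dataset needs credentials.
needs_auth(d::GeoDataset) = false
needs_auth(d::GSHHG) = d.token !== nothing

# Hypothetical file-naming scheme: one file per (resolution, level) pair.
filename(d::GSHHG) = "gshhg_r$(d.resolution)_l$(d.level).shp"

# Candidate URLs, in the order a `cache` function could try them.
candidate_urls(d::GSHHG) = [string(m, "/", filename(d)) for m in d.mirrors]

end # module
```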
I think the `cache` function could be relatively straightforward, perhaps with separate branches for HTTP vs. S3, for whether authentication is required, for whether hash checking is possible, etc. I think `cache` should be exported to facilitate downloading data on HPC systems where only the login node has an internet connection.
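A sketch of how `cache` might branch on the URL scheme and optionally verify a checksum, using the `Downloads` and `SHA` standard libraries. To stay self-contained it works on a plain `(url, path)` pair rather than a dataset type. The bearer-token header is an assumed authentication scheme, the `expected_sha256`/`token` keywords and `filehash` helper are hypothetical, and the `s3://` branch is only a placeholder.

```julia
module CacheSketch

using Downloads, SHA

# SHA-256 of a file's contents as a lowercase hex string.
filehash(path::AbstractString) = open(io -> bytes2hex(sha256(io)), path)

# Return `path`, downloading `url` to it first if it is not already present.
function cache(url::AbstractString, path::AbstractString;
               expected_sha256::Union{Nothing,AbstractString} = nothing,
               token::Union{Nothing,AbstractString} = nothing)
    if !isfile(path)
        mkpath(dirname(abspath(path)))
        scheme = lowercase(first(split(url, "://")))
        if scheme == "http" || scheme == "https"
            # Assumed bearer-token authentication; real providers differ.
            headers = token === nothing ? Pair{String,String}[] :
                      ["Authorization" => "Bearer $token"]
            tmp = path * ".part"   # avoid leaving a half-written file at `path`
            Downloads.download(url, tmp; headers = headers)
            mv(tmp, path; force = true)
        elseif scheme == "s3"
            # Would need an S3 client (e.g. loaded via a package extension); not sketched here.
            error("s3:// URLs are not handled in this sketch: $url")
        else
            error("unsupported URL scheme: $url")
        end
    end
    # Verify the local copy whenever a checksum is known.
    if expected_sha256 !== nothing && filehash(path) != lowercase(expected_sha256)
        error("checksum mismatch for $path")
    end
    return path
end

end # module
```

On an HPC system, the same call could be run on the login node ahead of time; when the file is later opened on a compute node, `isfile(path)` is already true and no network access is attempted.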
The `_open` function is trickier because there will potentially be many different file formats, but I think the way forward would be to have an extension for each file format that provides an `_open` method for that type of file.
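A runnable imitation of the per-format extension idea: the core module defines a fallback `_open` that raises an informative error, and a separate module adds the shapefile method the way a package extension would. The module and type names (`FormatDispatchSketch`, `ExampleGrid`, `ExampleShapes`, `GeoDataFramesExtSketch`) are hypothetical, and the Project.toml fragment in the comments only shows the general shape of the `[weakdeps]`/`[extensions]` sections used by package extensions in Julia 1.9 and later.

```julia
module FormatDispatchSketch

abstract type GeoDataset end
abstract type NCDataset <: GeoDataset end
abstract type ShapefileDataset <: GeoDataset end

struct ExampleGrid <: NCDataset end            # hypothetical NetCDF-backed dataset
struct ExampleShapes <: ShapefileDataset end   # hypothetical shapefile-backed dataset

# Fallback used when no extension has added a reader for this format.
_open(d::GeoDataset) =
    error("no reader loaded for $(nameof(typeof(d))); load the package for its file format")

end # module

# Stand-in for an extension module. In a real package it would live in ext/
# and be declared in Project.toml, along the lines of:
#   [weakdeps]
#   GeoDataFrames = "<UUID of GeoDataFrames.jl>"
#   [extensions]
#   GeoDataFramesExt = "GeoDataFrames"
module GeoDataFramesExtSketch

import ..FormatDispatchSketch: _open, ShapefileDataset

# A real extension would read the cached file here, e.g. with GeoDataFrames.read(path).
_open(d::ShapefileDataset) = :shapefile_reader

end # module
```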
The user would ultimately run `open(GSHHG(2, 1))` to get a GeoDataFrame with the GSHHG data.
Let me know what you think.