Zenodo now offers Data Package as an export format for the metadata (e.g. https://zenodo.org/records/10054230/export/datapackage). It includes the deposit metadata (contributors, license, etc.) and all files as resources. These resources are generic (with name, path, format, mimetype, bytes, hash): they are not specified as tabular (even if they are) and do not contain a schema.
For deposits that have a datapackage.json file, one of the resources listed will be that datapackage.json:
library(frictionless)
(p <- read_package("https://zenodo.org/records/10054230/export/datapackage"))
#> A Data Package with 22 resources:
#> • HG_OOSTENDE-acceleration-2017.csv.gz
#> • HG_OOSTENDE-gps-2013.csv.gz
#> • HG_OOSTENDE-gps-2019.csv.gz
#> • HG_OOSTENDE-gps-2021.csv.gz
#> • HG_OOSTENDE-acceleration-2016.csv.gz
#> • HG_OOSTENDE-gps-2017.csv.gz
#> • HG_OOSTENDE-gps-2016.csv.gz
#> • HG_OOSTENDE-acceleration-2022.csv.gz
#> • HG_OOSTENDE-acceleration-2020.csv.gz
#> • HG_OOSTENDE-acceleration-2021.csv.gz
#> • HG_OOSTENDE-acceleration-2014.csv.gz
#> • HG_OOSTENDE-acceleration-2018.csv.gz
#> • HG_OOSTENDE-acceleration-2019.csv.gz
#> • HG_OOSTENDE-acceleration-2013.csv.gz
#> • HG_OOSTENDE-gps-2014.csv.gz
#> • HG_OOSTENDE-acceleration-2015.csv.gz
#> • HG_OOSTENDE-gps-2015.csv.gz
#> • HG_OOSTENDE-gps-2018.csv.gz
#> • datapackage.json
#> • HG_OOSTENDE-gps-2022.csv.gz
#> • HG_OOSTENDE-gps-2020.csv.gz
#> • HG_OOSTENDE-reference-data.csv
#> For more information, see <https://doi.org/10.5281/zenodo.10054230>.
#> Use `unclass()` to print the Data Package as a list.
read_resource(p, "datapackage.json")
#> Error in `get_schema()` at frictionless-r/R/read_from_path.R:13:3:
#> ! Resource "datapackage.json" must have a profile property with value
#> "tabular-data-resource".
datapackage_path <- frictionless:::get_resource(p, "datapackage.json")$path
read_package(datapackage_path)
#> A Data Package with 3 resources:
#> • reference-data
#> • gps
#> • acceleration
#> For more information, see <https://doi.org/10.5281/zenodo.10054230>.
#> Use `unclass()` to print the Data Package as a list.
It would be nice if read_package() could notice this and suggest that the user read that file instead:
p <- read_package("https://zenodo.org/records/10054230/export/datapackage")
#> ...
#> One of the listed resources is a "datapackage.json" which may describe
#> the resources in more detail. Read it with
#> `read_package("https://zenodo.org/records/10054230/files/datapackage.json")`.
This is good as a first approach, but it doesn't allow easy programmatic access. Suggestions to do that:
- An attribute. NULL if there is no datapackage.json resource:
  p1 <- read_package("https://zenodo.org/records/10054230/export/datapackage")
  p1$resource_datapackage_path
  #> "https://zenodo.org/records/10054230/files/datapackage.json"
  p2 <- read_package(p1$resource_datapackage_path)
- Piping read_package(). If you pass a package to read_package(), it attempts to read the deeper datapackage.json, or returns the original one if not found:
  read_package("https://zenodo.org/records/10054230/export/datapackage") |>
    read_package()
- A merge parameter that tries to merge the first (metadata) and second (resources) datapackage.json files. Note: there is no guarantee that the second one contains better resources info and worse metadata, but it is likely for Zenodo deposits.
  read_package(
    "https://zenodo.org/records/10054230/export/datapackage",
    merge = TRUE
  )
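As a rough sketch of what the first two suggestions need, the helper below looks for a resource named datapackage.json and returns its path, or NULL when there is none. The name nested_datapackage_path() is hypothetical and not part of frictionless. It only assumes the plain list structure of a package (a resources element whose items have name and path), and the example package is a hand-built stand-in for the Zenodo export, so it runs without network access.

```r
# Hypothetical helper, not part of frictionless: return the path of a
# resource named "datapackage.json", or NULL if the package has none.
nested_datapackage_path <- function(package) {
  for (resource in package$resources) {
    if (identical(resource$name, "datapackage.json")) {
      return(resource$path)
    }
  }
  NULL
}

# Hand-built stand-in for the Zenodo export (only the fields used here).
# The first path is a placeholder; the second is the URL from this issue.
zenodo_export <- list(
  resources = list(
    list(
      name = "HG_OOSTENDE-reference-data.csv",
      path = "HG_OOSTENDE-reference-data.csv"
    ),
    list(
      name = "datapackage.json",
      path = "https://zenodo.org/records/10054230/files/datapackage.json"
    )
  )
)

path <- nested_datapackage_path(zenodo_export)

# Hint for interactive use, along the lines of the message proposed above
if (!is.null(path)) {
  message(
    "One of the listed resources is a \"datapackage.json\". ",
    "Read it with `read_package(\"", path, "\")`."
  )
}
```

The same lookup could back either the attribute (store the returned path on the package) or the piping approach (call read_package() on the returned path when it is not NULL).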
It would be good to investigate how other implementations do this. @roll how is this implemented in dpkit and/or Python?
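For the merge suggestion, one possible interpretation is sketched below: keep the descriptive metadata of the outer (Zenodo) package and take the resources from the nested datapackage.json. The function merge_packages() is hypothetical, the metadata values are placeholders, and the assumption that the nested file has the better resource descriptions is exactly the caveat noted in the list above.

```r
# Hypothetical helper, not part of frictionless: combine the descriptive
# metadata of `outer` with the resources of `inner`.
merge_packages <- function(outer, inner) {
  merged <- outer
  # Assumption: the nested datapackage.json describes the resources better
  merged$resources <- inner$resources
  # Keep properties that only the nested package defines
  for (key in setdiff(names(inner), names(outer))) {
    merged[[key]] <- inner[[key]]
  }
  merged
}

# Hand-built stand-ins; the title is a placeholder, not the real deposit title
outer <- list(
  title = "Example deposit",
  resources = list(list(name = "datapackage.json", path = "datapackage.json"))
)
inner <- list(
  resources = list(
    list(name = "reference-data"),
    list(name = "gps"),
    list(name = "acceleration")
  )
)

merged <- merge_packages(outer, inner)
resource_names <- vapply(merged$resources, function(r) r$name, character(1))
print(resource_names)
```

A real implementation would also have to resolve relative resource paths against the location of the nested datapackage.json, which this sketch ignores.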