Description
What is your suggestion?
I'm working on improving Data Package compliance in vega-datasets (PR #755). One blocker is that the current datapackage.json uses non-standard paths that fail validation.
This is my design oversight in vega-datasets, not an Altair issue — but I need Altair's help to fix it without breaking things.
The Problem
vega-datasets/
├── datapackage.json   # resource.path = "airports.csv" ← wrong
└── data/
    └── airports.csv
Current datapackage.json:
{
  "name": "vega-datasets",
  "resources": [
    {
      "name": "airports",
      "path": "airports.csv",
      "format": "csv",
      "bytes": 21176,
      "hash": "sha256:608ba6d..."
    }
  ]
}
Per the Data Package spec, paths must be relative to the descriptor. Since datapackage.json is at the repo root and the files live in data/, the correct path is data/airports.csv.
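To make the mismatch concrete, here is a minimal sketch using only the standard library. The helper name `resolve_resource` is hypothetical and not part of any Data Package tooling; it only mimics how a validator might resolve `resource.path` against the directory that holds the descriptor.

```python
from pathlib import PurePosixPath


def resolve_resource(descriptor: str, resource_path: str) -> PurePosixPath:
    """Resolve a resource path relative to the directory holding the descriptor."""
    return PurePosixPath(descriptor).parent / resource_path


descriptor = "vega-datasets/datapackage.json"
actual_file = PurePosixPath("vega-datasets/data/airports.csv")

# Current (non-standard) path resolves next to the descriptor, where no file exists.
current = resolve_resource(descriptor, "airports.csv")
# Fixed path resolves into data/, where the file actually lives.
fixed = resolve_resource(descriptor, "data/airports.csv")

print(current == actual_file)
print(fixed == actual_file)
```

Only the `data/`-prefixed path lands on the real file location, which is why the current descriptor fails validation.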
Altair correctly compensates for this oversight by hardcoding data/ in the base URL (npm.py:62):
def dataset_base_url(self, version: BranchOrTag, /) -> LiteralString:
    return f"{self._prefix(version)}data/"
If I fix vega-datasets now, Altair's URL construction breaks (double data/data/).
Request
Could Altair handle both path formats? This would let me fix vega-datasets without a coordinated release.
| Format | resource.path | Expected URL |
|---|---|---|
| Current | airports.csv | .../data/airports.csv |
| Fixed | data/airports.csv | .../data/airports.csv |
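The table can be expressed as a small pure-Python sketch, independent of Altair's internals. The names `normalize_path` and `dataset_url` and the base URL are hypothetical, chosen only for illustration; the one assumption is that the base URL ends with a slash and no longer includes `data/`.

```python
DATA_DIR = "data/"


def normalize_path(path: str) -> str:
    """Return the path with exactly one leading data/ prefix."""
    return path if path.startswith(DATA_DIR) else f"{DATA_DIR}{path}"


def dataset_url(base_url: str, path: str) -> str:
    """Join a base URL that ends in '/' with a normalized resource path."""
    return f"{base_url}{normalize_path(path)}"


# Hypothetical base URL, used only for illustration.
BASE = "https://example.invalid/vega-datasets/"

current = dataset_url(BASE, "airports.csv")
fixed = dataset_url(BASE, "data/airports.csv")
print(current == fixed)
```

Both path formats produce the same URL, so either version of datapackage.json would keep working during the transition.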
Suggested Approach
# npm.py - remove hardcoded "data/"
def dataset_base_url(self, version: BranchOrTag, /) -> LiteralString:
    return f"{self._prefix(version)}"

# datapackage.py - normalize paths for backwards compatibility
@property
def _url(self) -> Column:
    path_col = col("path")
    normalized = pl.when(path_col.str.starts_with("data/")).then(
        path_col
    ).otherwise(
        pl.concat_str(pl.lit("data/"), path_col)
    )
    expr = pl.concat_str(pl.lit(self._base_url), normalized)
    return Column("url", expr, "Remote url used to access dataset.")
Also, file_name would need to extract just the filename:
Column("file_name", col("path").str.split("/").list.last(), ...)
Migration
- Altair adds backwards-compatible path handling
- vega-datasets fixes paths to data/airports.csv
- Standard validators pass, Altair continues working
Happy to help with the implementation if useful.
Related
- vega-datasets#755: feat: improve Data Package metadata compliance with CKAN licenses and field schemas (blocked by this)
- Data Package path spec: https://datapackage.org/standard/data-resource/#path
Have you considered any alternative solutions?
No response