Description
I wanted to assess the complexity of converting a v1 to a v2 Data Package. Below are the steps that need to be taken. For version detection, see #262. @khusmann could you review these? There are a couple of items I'm unsure about.
Package
Add package.$schema, remove package.profile
Use package.profile, then remove it.
- NULL => https://datapackage.org/profiles/2.0/datapackage.json
- data-package (registered id) => https://datapackage.org/profiles/2.0/datapackage.json
- tabular-data-package (registered id) => https://datapackage.org/profiles/2.0/datapackage.json. This also removes the deprecated tabular-data-package
- fiscal-data-package (registered id) => Unsure, should we use the 1.0 URL for fiscal-data-package?
- A URL => Unsure, the referenced schema will likely point to Data Package v1, making it a v1
- Any other value => Unsure, not allowed by https://specs.frictionlessdata.io/profiles/
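The mapping above can be sketched as follows. This is a minimal illustration in Python, not the package's implementation: the function and constant names are hypothetical, and the undecided cases (fiscal-data-package, URLs, other values) are deliberately left untouched rather than guessed.

```python
V2_PACKAGE_SCHEMA = "https://datapackage.org/profiles/2.0/datapackage.json"

# v1 profile values with an agreed v2 target (None means profile is absent).
KNOWN_PACKAGE_PROFILES = (None, "data-package", "tabular-data-package")


def upgrade_package_profile(package: dict) -> dict:
    """Return a copy with $schema set and profile removed, where the mapping is clear."""
    profile = package.get("profile")
    if profile not in KNOWN_PACKAGE_PROFILES:
        # Assumption: undecided cases are returned unchanged (still v1).
        return dict(package)
    upgraded = {key: value for key, value in package.items() if key != "profile"}
    upgraded["$schema"] = V2_PACKAGE_SCHEMA
    return upgraded
```

Returning a copy keeps the original descriptor available in case the conversion has to be abandoned at a later step.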
Add package.contributors.roles
- For each contributor, set roles (array) based on role (string). Remove role
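A possible shape for this step, as a Python sketch with a hypothetical function name. It assumes that a contributor who somehow already has roles keeps that value; the list above does not say what should happen in that case.

```python
def upgrade_contributors(package: dict) -> dict:
    """Return a copy where each contributor's role (string) becomes roles (array)."""
    if "contributors" not in package:
        return dict(package)
    contributors = []
    for contributor in package["contributors"]:
        contributor = dict(contributor)  # do not modify the input
        role = contributor.pop("role", None)
        # Assumption: an existing roles array is kept as is.
        if role is not None and "roles" not in contributor:
            contributor["roles"] = [role]
        contributors.append(contributor)
    return {**package, "contributors": contributors}
```

For example, `{"title": "A", "role": "author"}` becomes `{"title": "A", "roles": ["author"]}`, while contributors without a role are copied unchanged.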
Other changes
- package.version: documentation change, no action required
- package.contributors: no action required for title, givenName and familyName.
- package.sources: Unsure, but I think no action is required
Each resource
Add resource.$schema, remove resource.profile
Use resource.profile, then remove it
- NULL => https://datapackage.org/profiles/2.0/dataresource.json
- data-resource (registered id) => https://datapackage.org/profiles/2.0/dataresource.json
- tabular-data-resource (registered id) => https://datapackage.org/profiles/2.0/dataresource.json (but see resource.type)
- A URL => Unsure, the referenced schema will likely point to Data Package v1, making it a v1
- Any other value => Unsure, not allowed by https://specs.frictionlessdata.io/profiles/
- There is also the edge case where $schema is already present (i.e. a v1 package with a v2 resource). => Unsure, should the present resource.$schema be left as is then?
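The resource rules could look like this in Python. The names are hypothetical, and two choices are assumptions rather than decisions: a resource that already has $schema is returned unchanged, and so are the undecided profile values (URLs and anything unregistered).

```python
V2_RESOURCE_SCHEMA = "https://datapackage.org/profiles/2.0/dataresource.json"

# v1 profile values with an agreed v2 target (None means profile is absent).
KNOWN_RESOURCE_PROFILES = (None, "data-resource", "tabular-data-resource")


def upgrade_resource_profile(resource: dict) -> dict:
    """Return a copy with $schema set and profile removed, where the mapping is clear."""
    if "$schema" in resource:
        # Assumption: an existing $schema is left as is.
        return dict(resource)
    if resource.get("profile") not in KNOWN_RESOURCE_PROFILES:
        # Assumption: undecided cases are returned unchanged (still v1).
        return dict(resource)
    upgraded = {key: value for key, value in resource.items() if key != "profile"}
    upgraded["$schema"] = V2_RESOURCE_SCHEMA
    return upgraded
```

Because this removes profile, anything that still needs to read resource.profile has to run before it.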
Add resource.type
Use resource.profile:
- NULL => don't set
- tabular-data-resource => table
- Any other value or URL => don't set
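As a small Python sketch (hypothetical function name), the rule reduces to a single check. It assumes an existing type is never overwritten, and it has to run while resource.profile is still present.

```python
def add_resource_type(resource: dict) -> dict:
    """Return a copy with type set to "table" for v1 tabular data resources."""
    is_tabular = resource.get("profile") == "tabular-data-resource"
    # Assumption: a type that is already present is kept as is.
    if is_tabular and "type" not in resource:
        return {**resource, "type": "table"}
    return dict(resource)
```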
Other changes
- resource.sources: no change required
- resource.name: rules are relaxed, existing names can remain as is
- resource.path: dot-paths are now forbidden. If such a path is present, we should not convert it, because it is impossible to know what the correct path would be. These types of paths will be flagged when reading a resource.
- resource.encoding: allows more, no action required
For each dialect
Note that upconverting a remote dialect requires downloading it and including it inline in full.
Add dialect.$schema
- dialect.caseSensitiveHeader is present => https://datapackage.org/profiles/1.0/tabledialect.json
- dialect.csvddfVersion is present => https://datapackage.org/profiles/1.0/tabledialect.json
- Otherwise this can safely be set to https://datapackage.org/profiles/2.0/tabledialect.json
Unsure about this though. For example, if a dialect was absent (very often the case), one will be added with just the $schema property. The alternative is to leave all dialects as v1 (assuming a $schema that defaults to https://datapackage.org/profiles/1.0/tabledialect.json). That would also mean that remote dialects can stay remote.
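The first option (choosing a $schema per dialect) can be sketched in Python as below. The names are hypothetical, and the function assumes an inline dialect, so a remote one would have to be downloaded first. It does not settle whether an absent dialect should be added at all.

```python
V1_DIALECT_SCHEMA = "https://datapackage.org/profiles/1.0/tabledialect.json"
V2_DIALECT_SCHEMA = "https://datapackage.org/profiles/2.0/tabledialect.json"

# Properties whose presence keeps a dialect at v1, per the list above.
V1_ONLY_PROPERTIES = ("caseSensitiveHeader", "csvddfVersion")


def dialect_schema_url(dialect: dict) -> str:
    """Pick the $schema URL for an inline dialect."""
    if any(prop in dialect for prop in V1_ONLY_PROPERTIES):
        return V1_DIALECT_SCHEMA
    return V2_DIALECT_SCHEMA
```

Note that an empty dialect maps to the 2.0 URL here, which is exactly the case questioned above.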
Other changes
- dialect.table: new property, no action required
For each schema
Note that upconverting a remote schema requires downloading it and including it inline in full.
Add schema.$schema
- Set to https://datapackage.org/profiles/2.0/tableschema.json because we will update the schema to that version.
Update schema.primaryKey
- Convert from string to array.
Update schema.foreignKeys
- Convert schema.foreignKeys.fields from string to array
- Convert schema.foreignKeys.reference[x].fields from string to array
- If schema.foreignKeys.reference[x].resource = resource name => remove property
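The primaryKey and foreignKeys conversions above could be combined in one pass. This Python sketch uses hypothetical names and assumes an inline schema plus the name of the resource it belongs to. Values that are already arrays are passed through.

```python
def as_list(value):
    """Wrap a single string in a list; leave other values unchanged."""
    return [value] if isinstance(value, str) else value


def upgrade_schema_keys(schema: dict, resource_name: str) -> dict:
    """Return a copy with primaryKey and foreignKeys in their v2 array form."""
    upgraded = dict(schema)
    if "primaryKey" in upgraded:
        upgraded["primaryKey"] = as_list(upgraded["primaryKey"])
    if "foreignKeys" in upgraded:
        foreign_keys = []
        for foreign_key in upgraded["foreignKeys"]:
            foreign_key = dict(foreign_key)  # do not modify the input
            if "fields" in foreign_key:
                foreign_key["fields"] = as_list(foreign_key["fields"])
            if "reference" in foreign_key:
                reference = dict(foreign_key["reference"])
                if "fields" in reference:
                    reference["fields"] = as_list(reference["fields"])
                # Self-reference: drop resource when it names this resource.
                if reference.get("resource") == resource_name:
                    del reference["resource"]
                foreign_key["reference"] = reference
            foreign_keys.append(foreign_key)
        upgraded["foreignKeys"] = foreign_keys
    return upgraded
```

Foreign keys that point to a different resource keep their reference.resource, so only self-references are affected by the removal.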
No action required
- schema.missingValues: old format still valid, no action required
- schema.fieldMatch: this is exact for all v1, but that is also the default for this field, so no need to set it
- schema.uniqueKeys: new property, no action required
For each field
Other changes
- field.categories: new property, no action => We can't assume that every field with an enum should be converted to a field with categories.
- fields.categoriesOrdered: new property, no action required
- fields.missingValues: new property, no action required
- integer field type: groupChar is a new property, no action required
- list field type: new property, no action required
- datetime field type: default format merely extends current one, no action required
- geopoint field type: documentation update, no action required
- any field type: no conversion needed, but frictionless needs to interpret it differently when reading. See "Do not guess type = any, potentially provide opt-in" (#168)
- min/max constraints: can now be used for duration, no action needed
- exclusiveMin/Max constraints: new property, no action required
- jsonSchema constraint: new property, no action required