Linking StatDCAT-AP datasets to SDMX Dataflow #24

@ChLaaboudi

As raised by OECD during the last webinar (10/03), the discussion needs to explicitly consider the SDMX concept of a Dataflow, which is currently absent from the StatDCAT-AP model.

Notion of Dataflow

The SDMX Glossary defines a Dataflow as "an abstract concept of the Data Sets, i.e. a structure without any data". This aligns exactly with the DCAT-AP definition of dcat:Dataset as "a conceptual entity that represents the information published". The two definitions are semantically equivalent, and the Dataflow might therefore be the canonical SDMX source from which a dcat:Dataset description is derived. The Dataflow also corresponds to a Product in the GSIM 2.0 Exchange Group.

A Dataflow associates a DSD (Data Structure Definition) with one or more Categories. This means the Dataflow is the bridge between:

  • the structural definition (DSD): a Dataflow always points to exactly one DSD, while one DSD can be used by multiple Dataflows
  • the thematic organisation: Categories in a Category Scheme, which map to dcat:theme

Proposal for linking dcat:Dataset to its Dataflow

dct:conformsTo is proposed as a candidate property for linking a dcat:Dataset to its SDMX Dataflow.

A key question is open for community feedback: should dct:conformsTo be attached to dcat:Dataset or to dcat:Distribution?
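
To make the two candidate attachment points concrete, here is a minimal Turtle sketch of each. All example.org URIs, including the Dataflow URI, are hypothetical placeholders, and typing the Dataflow as dct:Standard is an assumption based on the range DCAT-AP uses for dct:conformsTo. It is an illustration of the question, not a recommended modelling.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

# Hypothetical Dataflow resource (placeholder URI, assumed type).
<https://example.org/sdmx/dataflow/AGENCY/FLOW_ID>
  rdf:type           dct:Standard .

# Option A: dct:conformsTo attached to the dcat:Dataset.
<https://example.org/dataset/ds-a>
  rdf:type           dcat:Dataset ;
  dct:conformsTo     <https://example.org/sdmx/dataflow/AGENCY/FLOW_ID> ;
  dcat:distribution  <https://example.org/distribution/ds-a-csv> .

<https://example.org/distribution/ds-a-csv>
  rdf:type           dcat:Distribution ;
  dcat:accessURL     <https://example.org/data/ds-a.csv> .

# Option B: dct:conformsTo attached to the dcat:Distribution.
<https://example.org/dataset/ds-b>
  rdf:type           dcat:Dataset ;
  dcat:distribution  <https://example.org/distribution/ds-b-csv> .

<https://example.org/distribution/ds-b-csv>
  rdf:type           dcat:Distribution ;
  dct:conformsTo     <https://example.org/sdmx/dataflow/AGENCY/FLOW_ID> ;
  dcat:accessURL     <https://example.org/data/ds-b.csv> .
```

With Option A the link is stated once per dataset. With Option B it has to be repeated on every distribution that follows the same Dataflow.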

References: Usage of dct:conformsTo in DCAT and its extensions:

DCAT 3.0 (attached to dcat:Dataset or dcat:DataService)

  • Definition: An established standard to which the described resource conforms.
  • Usage note: This property SHOULD be used to indicate the model, schema, ontology, view or profile that the cataloged resource content conforms to.
  • See 14.2.1 Conformance to a standard

DCAT-AP-3.0 (attached to dcat:Dataset or dcat:DataService)

  • Definition (dcat:Dataset): An implementing rule or other specification.
  • Definition (dcat:DataService): An established (technical) standard to which the Data Service conforms.

HealthDCAT-AP (attached to dcat:Dataset, dcat:DataService and dcat:Distribution)

  • Definition (dcat:Distribution): Linked Schema: An established schema to which the described Distribution conforms

Good and bad practices

The Eurostat example below on data.europa.eu represents the Dataflow as a dcat:Distribution. This directly contradicts both the SDMX Glossary (a Dataflow has no data) and the DCAT-AP specification (a Distribution is a physical embodiment). StatDCAT-AP should provide explicit guidance to prevent this pattern.

Dataset

<http://data.europa.eu/88u/dataset/dpyj6oz3pcclvrsdkyp1w>
  rdf:type                dcat:Dataset;
  dct:title               "Intelligence Artificielle , par activité de la NACE Rév. 2"@fr , "Artificial intelligence by NACE Rev. 2 activity"@en , "Künstliche Intelligenz, nach Aktivitäten der NACE Rev.2"@de;
  dct:type                <http://publications.europa.eu/resource/authority/dataset-type/STATISTICAL>;
  adms:identifier         <https://doi.org/10.2908/ISOC_EB_AIN2>;
  dcat:distribution       <http://data.europa.eu/88u/distribution/b7e3e222-3189-4902-966e-c35aef83013b> , <http://data.europa.eu/88u/distribution/ee684489-260c-40bb-88fa-9d10b58aaacb> , <http://data.europa.eu/88u/distribution/f6714666-1c9d-4cfa-b402-959da5e0fb12> , <http://data.europa.eu/88u/distribution/e1ccb4e1-7789-46bb-8684-07b6f39f92f8> , <http://data.europa.eu/88u/distribution/85559983-36eb-4488-880a-5c57f442bb02> .

Distribution (dataflow)

 <http://data.europa.eu/88u/distribution/e1ccb4e1-7789-46bb-8684-07b6f39f92f8>
   rdf:type            dcat:Distribution;
   dct:compressformat  <http://publications.europa.eu/resource/authority/file-type/GZIP>;
   dct:format          <http://publications.europa.eu/resource/authority/file-type/XML>;
   dct:identifier      "https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/isoc_eb_ain2?references=descendants&detail=referencepartial&format=sdmx_2.1_generic&compressed=true";
   dct:license         <http://publications.europa.eu/resource/authority/licence/CC_BY_4_0>;
   dct:rights          <http://publications.europa.eu/resource/authority/access-right/PUBLIC>;
   dct:title           "Download metadata in SDMX 2.1 format"@en;
   dct:type            <http://publications.europa.eu/resource/authority/distribution-type/DOWNLOADABLE_FILE>;
   spdx:checksum       [ rdf:type            spdx:Checksum;
             spdx:algorithm      spdx:checksumAlgorithm_sha256;
             spdx:checksumValue  "caa364f06885eac84ac4f639da5efb694bb111769d079d8fd15ac8db752ba899"
   ];
   dcat:accessURL      <https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/isoc_eb_ain2?references=descendants&detail=referencepartial&format=sdmx_2.1_generic&compressed=true>;
   dcat:byteSize       "18407";
   dcat:downloadURL    <https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/isoc_eb_ain2?references=descendants&detail=referencepartial&format=sdmx_2.1_generic&compressed=true>;
   dcat:mediaType      <http://www.iana.org/assignments/media-types/application/xml> .
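
For contrast, a possible sketch of the same Eurostat dataset using the dct:conformsTo pattern proposed in this issue instead of a Distribution. The dataset URI is taken from the example above. Identifying the Dataflow by its SDMX REST URL without query parameters, and typing it as dct:Standard, are assumptions made for illustration only and do not reflect what data.europa.eu currently publishes.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<http://data.europa.eu/88u/dataset/dpyj6oz3pcclvrsdkyp1w>
  rdf:type        dcat:Dataset ;
  # Assumption: the Dataflow is referenced as a structure, not as a downloadable file.
  dct:conformsTo  <https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/isoc_eb_ain2> .

# Assumption: the Dataflow URI is the SDMX REST URL stripped of its query string.
<https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/dataflow/ESTAT/isoc_eb_ain2>
  rdf:type        dct:Standard .
```

In this form the Dataflow no longer carries Distribution-only properties such as dcat:byteSize, dcat:downloadURL or spdx:checksum, which is consistent with its definition as a structure without data.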

Labels: feedback-requested (Community feedback requested), release:3.0.0 (Actively being worked on for StatDCAT-AP 3.0.0), type:feature (A yet untackled problem)