Machine-readable metadata not available #25

@RayStick

CPRD's metadata (also known as data specifications) contains all the information we need for this pipeline, but it is only available as PDF. Some steps in the current repo rely on manually copying and pasting the metadata from PDF into CSV (a machine-readable format). This is not ideal: it is error-prone, time-consuming, and does not scale well (for example, when the metadata is updated).

It would be great to have access to machine-readable structural metadata, for example like the files hosted on the HDRUK Gateway. However, that file has three problems: it only contains 'column name', not 'field name' (and field name is the one used in the data files); the order of the columns in the metadata differs from the order in the real/synthetic data files; and we do not know whether it is kept up to date.
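
To make the field name and column order concerns concrete, here is a minimal sketch of a check that compares the fields listed in a metadata file against the header of a data file. The function names (`compare_fields`, `read_header`), the tab delimiter, and the example field names are all assumptions for illustration; they are not taken from the CPRD specification or from this repo.

```python
import csv
import io


def compare_fields(metadata_fields, data_header):
    """Compare field names listed in metadata against a data file header."""
    meta_set, data_set = set(metadata_fields), set(data_header)
    return {
        # Fields the metadata describes but the data file does not contain.
        "missing_from_data": [f for f in metadata_fields if f not in data_set],
        # Fields present in the data file but absent from the metadata.
        "missing_from_metadata": [f for f in data_header if f not in meta_set],
        # True only if both lists have the same names in the same order.
        "same_order": list(metadata_fields) == list(data_header),
    }


def read_header(text, delimiter="\t"):
    """Return the first row of delimited text as a list of field names."""
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))


# Hypothetical field names and data, for illustration only.
metadata_fields = ["patient_id", "practice_id", "gender", "year_of_birth"]
data_text = "patient_id\tgender\tyear_of_birth\tpractice_id\n1\t1\t1980\t20\n"

report = compare_fields(metadata_fields, read_header(data_text))
print(report)
```

A check like this could run whenever the metadata is updated, so that a mismatch in names or ordering is flagged early rather than discovered downstream.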

Labels: help wanted (Extra attention is needed), question (Further information is requested)
