Description
CPRD's metadata (also known as data specifications) contains all the information we want for this pipeline, but it is only available as PDF. Some steps in the current repo involve manually copying and pasting the metadata from PDF into CSV (a machine-readable format). This is not ideal: it is error-prone, time-consuming, and does not scale well (for example, when the metadata is updated).
It would be great to have access to machine-readable structural metadata, for example like the files hosted on the HDRUK Gateway. However, that file has three problems: it only contains 'column name', not 'field name' (and field name is the one that appears in the data files); the order of the columns in the metadata differs from the order in the real/synthetic data files; and we do not know whether it is kept up to date.
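As a rough illustration of the checks a machine-readable metadata file would make possible, here is a minimal sketch that compares a list of metadata field names against the header row of a data file and reports mismatches and ordering differences. The function names, the placeholder field names (`field_a`, etc.), and the tab delimiter are all assumptions made for illustration; they are not taken from CPRD or the HDRUK Gateway.

```python
import csv
import io


def read_header(text, delimiter="\t"):
    """Return the first row of a delimited text file as a list of field names.

    The tab delimiter is an assumption; pass a different one if needed.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return next(reader)


def compare_fields(metadata_fields, data_header):
    """Compare metadata field names with the header of a data file.

    Both arguments are lists of strings. Returns a dict describing which
    fields are missing on either side and whether the order matches.
    """
    meta_set = set(metadata_fields)
    data_set = set(data_header)
    return {
        # Fields present in the data file but absent from the metadata
        "missing_from_metadata": [f for f in data_header if f not in meta_set],
        # Fields described in the metadata but absent from the data file
        "missing_from_data": [f for f in metadata_fields if f not in data_set],
        # True only if both lists are identical, including order
        "same_order": list(metadata_fields) == list(data_header),
        # Shared fields, reordered to follow the data file
        "metadata_in_data_order": [f for f in data_header if f in meta_set],
    }


if __name__ == "__main__":
    # Hypothetical example data; the field names are placeholders.
    sample = "field_b\tfield_a\tfield_c\n1\t2\t3\n"
    header = read_header(sample)
    report = compare_fields(["field_a", "field_b", "field_d"], header)
    for key, value in report.items():
        print(f"{key}: {value}")
```

A check like this could run whenever the metadata is updated, which would flag drift between the specification and the real/synthetic data files without any manual copy and paste.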