Labels
bug (Something isn't working)
Description
Package version (if known): latest
Describe the bug
The invenio vocabularies convert command for the names vocabulary randomly outputs full ORCID record data instead of a normalized names vocabulary entry.
Steps to Reproduce
- Download the ORCID summaries dump (a very large archive).
- Convert the dump by following the [steps here](https://inveniordm.docs.cern.ch/operate/customize/vocabularies/names/#creating-a-namesyaml-file), which will take a day or so.
- Inspect the generated file:
  head -n 1000 names_dump_2025.yaml > test.yaml
- Notice the large file size (13.58 GB); it should usually be around 4 GB.
- Observe entries containing raw ORCID record fields (e.g. @xmlns:*, activities-summary, history) instead of name fields.
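
To make the last step easier to repeat, here is a minimal sketch that scans the first lines of the generated file for the raw ORCID keys mentioned above. It is not part of invenio-vocabularies; the function names `find_raw_lines` and `scan_file` are hypothetical, and it assumes the output is block-style YAML with one key per line.

```python
import re
from itertools import islice

# Raw ORCID keys named in this report; extend the alternation as needed.
# Assumption: block-style YAML, so each key starts its own line
# (optionally after a "- " list marker and an opening quote).
RAW_KEY = re.compile(r"""^\s*(?:-\s+)?["']?(@xmlns|activities-summary|history)\b""")


def find_raw_lines(lines):
    """Return (line_number, key) pairs for lines whose key is a raw ORCID field."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = RAW_KEY.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


def scan_file(path, limit=1000):
    """Scan only the first `limit` lines so a multi-GB file is never fully loaded."""
    with open(path, encoding="utf-8") as handle:
        return find_raw_lines(islice(handle, limit))
```

Calling `scan_file("test.yaml")` on the truncated file from the previous step returns an empty list when no raw fields are found in the scanned lines. Values such as an affiliation named "Department of History" are not flagged, because only keys at the start of a line are matched.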
Expected behavior
Each ORCID record is converted into a normalized names vocabulary entry containing only name fields, identifiers, and optional affiliations, or is skipped if required name data is missing.
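
The expected behavior can be written down as a small check on a single converted entry. This is only a sketch of the rule described above, not the converter's actual logic. The key names in `ALLOWED_KEYS` and `NAME_KEYS` are my assumption of what a names entry looks like, and `check_entry` is a hypothetical helper.

```python
# Assumed set of keys for a normalized names vocabulary entry.
ALLOWED_KEYS = {"id", "name", "given_name", "family_name", "identifiers", "affiliations"}
# Assumed keys that carry the required name data.
NAME_KEYS = {"name", "given_name", "family_name"}


def check_entry(entry):
    """Classify one converted entry as 'ok', 'skip', or 'raw'.

    'raw'  -> the entry has keys outside the allowed set (the bug in this report)
    'skip' -> only allowed keys, but no name data, so it should not be written
    'ok'   -> a normalized entry
    """
    if set(entry) - ALLOWED_KEYS:
        return "raw"
    if not any(entry.get(key) for key in NAME_KEYS):
        return "skip"
    return "ok"
```

With this rule, an entry holding `activities-summary` or `history` is classified as `raw`, which is the case that currently ends up in the output file.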
Screenshots (if applicable)
Additional context
Command used:
invenio vocabularies convert \
--vocabulary names \
--origin /path/to/ORCID_2025_summaries.tar.gz \
--target names.yaml

This commit might be related to the issue: inveniosoftware/invenio-vocabularies@fb68221