DOI registry assumes md5 hashing algorithm #435

@ionathan

Description of the problem:

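While trying to load a registry from a dataverse.nl DOI, I realized that this Dataverse instance reports SHA-1 checksums for its files. In pooch, the hash algorithm for Dataverse registries is hard-coded to md5, so the lookup of the 'md5' key fails.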

Full code that generated the error

import pooch
example = pooch.create(
    path=pooch.os_cache("example"),
    base_url="doi:10.34894/5SOKTV",
)
example.load_registry_from_doi()

Full error message

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[20], line 1
----> 1 example.load_registry_from_doi()

File /usr/local/lib/python3.11/site-packages/pooch/core.py:704, in Pooch.load_registry_from_doi(self)
    701 repository = doi_to_repository(doi)
    703 # Call registry population for this repository
--> 704 return repository.populate_registry(self)

File /usr/local/lib/python3.11/site-packages/pooch/downloaders.py:1162, in DataverseRepository.populate_registry(self, pooch)
   1151 """
   1152 Populate the registry using the data repository's API
   1153 
   (...)
   1157     The pooch instance that the registry will be added to.
   1158 """
   1160 for filedata in self.api_response.json()["data"]["latestVersion"]["files"]:
   1161     pooch.registry[filedata["dataFile"]["filename"]] = (
-> 1162         f"md5:{filedata['dataFile']['md5']}"
   1163     )

KeyError: 'md5'
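
A possible direction for a fix, as a minimal sketch rather than pooch's actual implementation: read the algorithm from the API response instead of assuming md5. The helper name `registry_from_dataverse_files` is hypothetical, and the sketch assumes that each `dataFile` entry carries a `checksum` field shaped like `{"type": "SHA-1", "value": "<hex digest>"}`, with the plain `md5` key as a fallback. That payload shape is an assumption and should be checked against the response of the instance in question.

```python
# Hypothetical helper, not part of pooch.
SUPPORTED_ALGORITHMS = {"md5", "sha1", "sha256", "sha512"}


def registry_from_dataverse_files(files):
    """Build a {filename: "algorithm:digest"} registry from a Dataverse file list."""
    registry = {}
    for filedata in files:
        datafile = filedata["dataFile"]
        checksum = datafile.get("checksum")
        if checksum is not None:
            # Assumed shape: {"type": "SHA-1", "value": "<hex digest>"}.
            # Normalize "SHA-1" to "sha1" so it matches hashlib-style names.
            algorithm = checksum["type"].lower().replace("-", "")
            digest = checksum["value"]
        elif "md5" in datafile:
            # Fall back to the key that the current code relies on.
            algorithm, digest = "md5", datafile["md5"]
        else:
            raise KeyError(f"no checksum found for {datafile['filename']!r}")
        if algorithm not in SUPPORTED_ALGORITHMS:
            raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
        registry[datafile["filename"]] = f"{algorithm}:{digest}"
    return registry
```

With something like this, the loop in populate_registry could assign the returned entries to pooch.registry, and a SHA-1 file would be registered with a "sha1:" prefix instead of raising KeyError.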


Labels: bug (Report a problem that needs to be fixed)
