Fix content provider for Zenodo #1504

Open
rgaiacs wants to merge 3 commits into jupyterhub:main from rgaiacs:1503-zenodo
Contributor

rgaiacs commented Feb 11, 2026

As reported in #1503, the content provider for Zenodo is not working as expected. I believe there were some changes in the Zenodo API but I could not find a reference to the changes.

The first change is that the URL the DOI resolves to now starts with https://zenodo.org/doi/ instead of https://zenodo.org/records/.

The second change is that the record ID must be 18553140 rather than zenodo.18553140, so the "zenodo." prefix has to be stripped.
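
The prefix handling in the second change can be sketched as follows. This is a minimal illustration, assuming a hypothetical helper named normalize_record_id that is not part of repo2docker; it accepts both the prefixed and the bare form, so it does not depend on which form an upstream step produced.

```python
def normalize_record_id(raw: str) -> str:
    """Return the numeric part of a record identifier.

    Hypothetical helper for illustration only; not part of repo2docker.
    """
    # rpartition splits on the last dot. If there is no dot, the whole
    # string lands in the third element, so bare IDs pass through as-is.
    _, _, record_id = raw.rpartition(".")
    if not record_id.isdigit():
        raise ValueError(f"unexpected record identifier: {raw!r}")
    return record_id


print(normalize_record_id("zenodo.18553140"))
print(normalize_record_id("18553140"))
```

Unlike indexing the result of split("."), this form does not raise IndexError when the identifier has no prefix.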

Before this pull request

$ repo2docker --no-build --debug 10.5281/zenodo.18553140
[Repo2Docker] Looking for repo2docker_config in /home/raniere/github.com/jupyterhub/repo2docker
Retrieving dataverse installations from https://iqss.github.io/dataverse-installations/data/data.json
Traceback (most recent call last):
  File "/home/raniere/github.com/jupyterhub/repo2docker/.pixi/envs/default/bin/repo2docker", line 7, in <module>
    sys.exit(main())
             ~~~~^^
  File "/home/raniere/github.com/jupyterhub/repo2docker/repo2docker/__main__.py", line 476, in main
    r2d.start()
    ~~~~~~~~~^^
  File "/home/raniere/github.com/jupyterhub/repo2docker/repo2docker/app.py", line 850, in start
    self.build()
    ~~~~~~~~~~^^
  File "/home/raniere/github.com/jupyterhub/repo2docker/repo2docker/app.py", line 738, in build
    self.fetch(self.repo, self.ref, checkout_path)
    ~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raniere/github.com/jupyterhub/repo2docker/repo2docker/app.py", line 503, in fetch
    spec = cp.detect(url, ref=ref)
  File "/home/raniere/github.com/jupyterhub/repo2docker/repo2docker/contentproviders/mercurial.py", line 16, in detect
    subprocess.check_output(
    ~~~~~~~~~~~~~~~~~~~~~~~^
        ["hg", "identify", source, "--config", "extensions.hggit=!"]
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        + args_enabling_topic,
        ^^^^^^^^^^^^^^^^^^^^^^
        stderr=subprocess.DEVNULL,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/raniere/github.com/jupyterhub/repo2docker/.pixi/envs/default/lib/python3.13/subprocess.py", line 472, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
               **kwargs).stdout
               ^^^^^^^^^
  File "/home/raniere/github.com/jupyterhub/repo2docker/.pixi/envs/default/lib/python3.13/subprocess.py", line 554, in run
    with Popen(*popenargs, **kwargs) as process:
         ~~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raniere/github.com/jupyterhub/repo2docker/.pixi/envs/default/lib/python3.13/subprocess.py", line 1039, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                        pass_fds, cwd, env,
                        ^^^^^^^^^^^^^^^^^^^
    ...<5 lines>...
                        gid, gids, uid, umask,
                        ^^^^^^^^^^^^^^^^^^^^^^
                        start_new_session, process_group)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/raniere/github.com/jupyterhub/repo2docker/.pixi/envs/default/lib/python3.13/subprocess.py", line 1972, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'hg'

After this pull request

$ repo2docker --no-build --debug 10.5281/zenodo.18553140
[Repo2Docker] Looking for repo2docker_config in /home/raniere/github.com/jupyterhub/repo2docker
Picked Zenodo content provider.
Fetching Zenodo record 18553140.
Fetching Zenodo record 18553140 files.
Downloading file https://zenodo.org/api/records/18553140/files/InsightSoftwareConsortium/ITKElastix-v0.24.0.zip/content as InsightSoftwareConsortium/ITKElastix-v0.24.0.zip
Requesting https://zenodo.org/api/records/18553140/files/InsightSoftwareConsortium/ITKElastix-v0.24.0.zip/content
Creating /tmp/repo2dockerp35_7xyk/InsightSoftwareConsortium
Fetching InsightSoftwareConsortium/ITKElastix-v0.24.0.zip
...

Next steps

  • @manics could you review this pull request?

     def fetch(self, spec, output_dir, yield_output=False):
         """Fetch and unpack a Zenodo record"""
-        record_id = spec["record"]
+        record_id = spec["record"].split(".")[1]
Member

Contributor Author

Thanks @manics for the enlightenment.

Do you know why this happens?

Member

Maybe we're assigning semantic meaning to something we shouldn't?

https://support.zenodo.org/help/en-gb/18-general/216-what-is-a-doi says

How to resolve a DOI?
...
Or you can directly resolve it by constructing the following link:
     https://doi.org/10.5281/zenodo.[record ID]

You may need to look into the comments or past PRs to work out why we're following a chain of URLs to get the record ID
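
The pattern quoted from the Zenodo help page suggests the record ID could be read directly from the DOI string. Below is a minimal sketch under that assumption. The function name and regular expression are illustrative and are not repo2docker's code; the sketch only parses the string and does not check that the record exists or cover whatever cases the existing URL chain was written for.

```python
import re
from typing import Optional

# Matches a Zenodo DOI, either bare or at the end of a resolver URL.
# Assumes the form 10.5281/zenodo.<digits> quoted from the help page.
ZENODO_DOI = re.compile(r"(?:^|/)10\.5281/zenodo\.(\d+)$")


def record_id_from_doi(doi_or_url: str) -> Optional[str]:
    """Return the numeric record ID, or None if the input is not a Zenodo DOI."""
    match = ZENODO_DOI.search(doi_or_url.strip())
    return match.group(1) if match else None


for source in (
    "10.5281/zenodo.18553140",
    "https://doi.org/10.5281/zenodo.18553140",
    "10.22002/d1.1235",
):
    print(source, "->", record_id_from_doi(source))
```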

Collaborator

I did a bunch of digging into the DOI -> Zenodo record ID mapping while writing repoproviders, and this comment summarizes it: https://github.com/yuvipanda/repoproviders/blob/6759b84055584ad137e31579da3656d6c4c67593/src/repoproviders/resolvers/doi.py#L208

@yuvipanda
Collaborator

I tested the provided use case on https://github.com/yuvipanda/repoproviders (which has been getting a lot of new development as I build out https://jupyterbook.pub) and it handles the test case fine. @minrk and I have also been discussing, in yuvipanda/repoproviders#30, what else is needed before we can try to integrate repoproviders here. I don't think that should block this PR, though.

Contributor Author

rgaiacs commented Feb 17, 2026

I expanded the test

test_hosts = [
    (
        [
            "https://zenodo.org/record/3232985",
            "10.5281/zenodo.3232985",
            "https://doi.org/10.5281/zenodo.3232985",
        ],
        {"host": test_zen.hosts[1], "record": "3232985"},
    ),
    (
        [
            "https://data.caltech.edu/records/1235",
            "10.22002/d1.1235",
            "https://doi.org/10.22002/d1.1235",
        ],
        {"host": test_zen.hosts[2], "record": "1235"},
    ),
]


@pytest.mark.parametrize("test_input,expected", test_hosts)
def test_detect_zenodo(test_input, expected):
    # valid Zenodo DOIs trigger this content provider
    assert Zenodo().detect(test_input[0]) == expected
    assert Zenodo().detect(test_input[1]) == expected
    assert Zenodo().detect(test_input[2]) == expected

but it is now failing. In https://github.com/jupyterhub/repo2docker/actions/runs/22105137388/job/63885553276?pr=1504 it failed with

  {'record': 'zenodo.18553140'} != {'record': '18553140'}

and in https://github.com/jupyterhub/repo2docker/actions/runs/22105506970/job/63886941250?pr=1504 it failed with

  {'record': '18553140'} != {'record': 'zenodo.18553140'}

My impression is that the three assertions

    assert Zenodo().detect(test_input[0]) == expected
    assert Zenodo().detect(test_input[1]) == expected
    assert Zenodo().detect(test_input[2]) == expected

follow different code paths when resolving.

I will have a look later when I have time.
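
One way to see which input form takes which path is to give every form its own test case rather than three assertions in one test. The following is a minimal sketch: flatten_cases is a hypothetical helper, and the expected dictionaries are simplified to the record key only (the real test also compares the host).

```python
# Simplified copy of the test data: one group of equivalent inputs
# paired with the spec they should all resolve to.
test_hosts = [
    (
        [
            "https://zenodo.org/record/3232985",
            "10.5281/zenodo.3232985",
            "https://doi.org/10.5281/zenodo.3232985",
        ],
        {"record": "3232985"},
    ),
]


def flatten_cases(groups):
    """Expand (inputs, expected) groups into one (input, expected) pair per input."""
    return [(source, expected) for sources, expected in groups for source in sources]


for source, expected in flatten_cases(test_hosts):
    print(source, "->", expected)
```

Passing flatten_cases(test_hosts) to pytest.mark.parametrize("source,expected", ...) would make pytest report each input form as a separate test, so a failure names the form that diverged.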

Development

Successfully merging this pull request may close these issues.

Zenodo builds broken - can't resolve the DOI as zenodo
