Standardize requests made by DOIDownloaders
#514
Respect the arguments that users pass to `requests.get` when defining a `DOIDownloader`, and apply them whenever we call that function. This way, all calls made by `DOIDownloaders` and the repository classes use the same arguments, including `timeout`, `headers`, etc.
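Here is a minimal sketch of the pattern this PR describes. It uses a hypothetical `SketchDOIDownloader` class, not Pooch's actual code. The request arguments are captured once at construction time. Every GET (resolving the DOI, querying the repository API, downloading the file) then goes through one helper that forwards them.

The `http_get` hook, the method names, and the `repository.example.org` URLs are illustrative assumptions. The hook only exists so the example can run offline, and it falls back to `requests.get` when omitted.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple


class SketchDOIDownloader:
    """Hypothetical downloader that reuses one set of request arguments.

    Not Pooch's implementation: it only illustrates routing every GET
    through a single helper so ``timeout``, ``headers``, etc. stay consistent.
    """

    def __init__(
        self,
        headers: Optional[Dict[str, str]] = None,
        http_get: Optional[Callable[..., Any]] = None,
        **kwargs: Any,
    ) -> None:
        # Capture the user's choices once (e.g. timeout, proxies, verify).
        self.headers: Dict[str, str] = dict(headers or {})
        self.kwargs: Dict[str, Any] = dict(kwargs)
        # Hypothetical hook for swapping the HTTP function (used for offline runs).
        self._http_get = http_get

    def _get(self, url: str) -> Any:
        """Single entry point for GET requests, always forwarding the same arguments."""
        http_get = self._http_get
        if http_get is None:
            import requests  # only needed when making real network calls

            http_get = requests.get
        return http_get(url, headers=self.headers, **self.kwargs)

    def resolve_doi(self, doi: str) -> Any:
        # Resolving the DOI at doi.org goes through the shared helper...
        return self._get(f"https://doi.org/{doi}")

    def query_repository(self, api_url: str) -> Any:
        # ...as do calls to the repository's API (e.g. to list files)...
        return self._get(api_url)

    def download(self, file_url: str) -> Any:
        # ...and the final file download.
        return self._get(file_url)


# Offline usage: record every call instead of hitting the network.
calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_get(url: str, **kwargs: Any) -> str:
    calls.append((url, kwargs))
    return url


downloader = SketchDOIDownloader(
    headers={"User-Agent": "my-project/0.1"}, timeout=60, http_get=fake_get
)
downloader.resolve_doi("10.5281/zenodo.18379610")
downloader.query_repository("https://repository.example.org/api/records/123")
downloader.download("https://repository.example.org/files/data.csv")
```

Routing everything through `_get` keeps a user's `timeout` or `headers` from being silently dropped by one of the code paths. In Pooch itself, the user-facing call would presumably look like `pooch.DOIDownloader(headers={...}, timeout=60)`. Check the `DOIDownloader` docstring for the exact signature.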
@dokempf, would you like to review this PR? I want to ensure that if a user defines a custom […] The only case where this is not going to work is on […]
Ok, I'm merging this. I still think this is not the most elegant way of doing this, and I would like to revisit this design as part of #495. I'm annoyed by how many kwargs get passed all around. I also don't like how some arguments ( […] Since this PR doesn't change the interface, we could roll back and improve it later. But based on the recent events (#502), I want to allow users to pass their own headers, even if the solution is not very elegant.
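Letting users pass their own headers raises the question of how those headers combine with a library-wide default User-Agent. Below is one possible policy, sketched under stated assumptions:

- `DEFAULT_HEADERS` and `merge_headers` are hypothetical names. The release notes mention a global `REQUESTS_HEADERS`, whose real contents and merge behaviour may differ.
- User headers override the defaults.
- Header names are compared case-insensitively, because HTTP header names are case-insensitive.

```python
from typing import Dict, Mapping, Optional

# Hypothetical library-wide default; the real value in Pooch may differ.
DEFAULT_HEADERS: Dict[str, str] = {"User-Agent": "example-downloader/1.0"}


def merge_headers(
    user_headers: Optional[Mapping[str, str]] = None,
    defaults: Mapping[str, str] = DEFAULT_HEADERS,
) -> Dict[str, str]:
    """Return the defaults overridden by user headers, matching names case-insensitively."""
    merged: Dict[str, str] = dict(defaults)  # copy, so the defaults are never mutated
    for name, value in (user_headers or {}).items():
        # Drop any existing entry whose name differs from the user's only by case.
        for existing in [key for key in merged if key.lower() == name.lower()]:
            del merged[existing]
        merged[name] = value
    return merged


# Users who do nothing get the default User-Agent...
print(merge_headers())
# ...while a custom one, in any capitalization, replaces it.
print(merge_headers({"user-agent": "my-project/0.1 (me@example.org)"}))
```

Without the case-insensitive step, the merged dict would hold two differently capitalized `User-Agent` entries. Which one is actually sent would then depend on the HTTP client.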
Hey @santisoler. Sorry for being a bit too late here. I agree that the Zenodo situation required immediate action. I share your concerns about bloating the interface. One potential remedy that I proposed in the past was: We could expose a […]
Hi @dokempf. No worries, no need to apologize. I had some time last week and wanted to fix this and make the release. You weren't late, I got rushed 😁 I like that idea of taking a […]
@santisoler My students and I will join next week's community call if you have nothing else on the agenda, so we could discuss it then.
Sure! You are more than welcome!!
Bumps [pooch](https://github.com/fatiando/pooch) from 1.8.2 to 1.9.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/fatiando/pooch/releases">pooch's releases</a>.</em></p> <blockquote> <h2>v1.9.0</h2> <p>Released on: 2026/01/30</p> <p>DOI: <a href="https://doi.org/10.5281/zenodo.18379610">https://doi.org/10.5281/zenodo.18379610</a></p> <p>Breaking changes:</p> <ul> <li>Drop support for Python 3.7 and 3.8 (<a href="https://redirect.github.com/fatiando/pooch/pull/450">#450</a>).</li> </ul> <p>Bug fixes:</p> <ul> <li>Explicitly pass <code>filter</code> to <code>TarFile.extractall</code> on Python >=3.12 (<a href="https://redirect.github.com/fatiando/pooch/pull/458">#458</a>). Pass a <code>filter="data"</code> argument to <code>TarFile.extractall</code> to prevent security issues from malicious archives. The <code>filter</code> argument was added in Python 3.12, so only pass it on versions greater than or equal to that. This matches the default behaviour starting with Python 3.14.</li> <li>Fix TQDM usage (<a href="https://redirect.github.com/fatiando/pooch/pull/465">#465</a>). Newer versions of tqdm behave differently in a terminal than in a Jupyter notebook. Import from <code>tqdm.auto</code> instead so that the downloader looks right in either a notebook or the terminal.</li> <li>Fix bug in file hashing on FIPS enabled system (<a href="https://redirect.github.com/fatiando/pooch/pull/511">#511</a>). Set <code>usedforsecurity=False</code> on <code>hashlib</code> hashing algorithms to make FIPS enabled systems happy.</li> </ul> <p>New features:</p> <ul> <li>Set User-Agent in requests headers for DOI downloaders (<a href="https://redirect.github.com/fatiando/pooch/pull/507">#507</a>). Pass a custom User-Agent when making requests through DOI downloaders in order to avoid the rate limits that services like Zenodo impose to block abusive requests. These services can now tell requests coming from Pooch apart from the rest. 
Add a global <code>REQUESTS_HEADERS</code> variable that is used by the <code>doi_to_url</code> function (which needs to make a request to doi.org to figure out the service provider). Add a new <code>headers</code> argument to the <code>DOIDownloader</code> to pass request headers explicitly. By default, it uses Pooch's default User-Agent.</li> <li>Extend support for Python 3.13 (<a href="https://redirect.github.com/fatiando/pooch/pull/451">#451</a>) and Python 3.14 (<a href="https://redirect.github.com/fatiando/pooch/pull/505">#505</a>).</li> <li>Provide more descriptive errors when a DOI request fails (<a href="https://redirect.github.com/fatiando/pooch/pull/477">#477</a>). Raise an exception from the <code>requests</code> response to provide more informative errors when the status code is between 400 and 600.</li> </ul> <p>Maintenance:</p> <ul> <li>Add testing data to the package distributions (<a href="https://redirect.github.com/fatiando/pooch/pull/421">#421</a>). The test code in <code>pooch/tests</code> is installed but the data in <code>pooch/tests/data</code> are not. This makes it impossible to run tests on the installed package. Add the appropriate setuptools configuration to make it happen.</li> <li>Move push to codecov to its own job in Actions (<a href="https://redirect.github.com/fatiando/pooch/pull/424">#424</a>). Move the push to Codecov from the <code>test</code> job into a new job that depends on the test job. Upload the coverage reports as artifacts after testing, and reuse the artifacts in the new job. Upload all coverage reports in a single push to Codecov to minimize the number of hits.</li> <li>Increase the max positional args allowed by pylint (<a href="https://redirect.github.com/fatiando/pooch/pull/438">#438</a>). 
Configure <code>pylint</code> to increase the maximum number of positional arguments allowed in any function or method.</li> <li>Replace usage of <code>pkg_resources</code> with <code>importlib.resources</code> (<a href="https://redirect.github.com/fatiando/pooch/pull/449">#449</a>).</li> <li>Add mypy to CI job and type hints for one class (<a href="https://redirect.github.com/fatiando/pooch/pull/404">#404</a>). Add type hints to <code>pooch/core.py</code> and create a new <code>typing</code> submodule for custom type classes, and add it to the API Reference. Run <code>mypy</code> on CI to perform type checks, and create new targets in the <code>Makefile</code>. Extend the list of dependencies required to run the type checks.</li> <li>Add pytest <code>figshare</code> mark to tests (<a href="https://redirect.github.com/fatiando/pooch/pull/481">#481</a>). Add a pytest <code>figshare</code> mark to tests that make requests to Figshare. Such a mark allows us to filter tests: use <code>pytest -v -m figshare</code> to only run tests with that mark, or use <code>pytest -v -m "not figshare"</code> to run all tests but the marked ones.</li> <li>Skip Figshare related tests on Actions under MacOS (<a href="https://redirect.github.com/fatiando/pooch/pull/482">#482</a>). Skip tests marked with <code>figshare</code> on Actions that use macOS as runner. Those tests in CI were constantly failing, probably due to too many requests coming from GitHub. Add an optional <code>PYTEST_ARGS_EXTRA</code> variable to <code>Makefile</code> that can be used to pass extra arguments to <code>pytest</code>. Skip doctests that download files from Figshare.</li> <li>List requirements to run type checks in new file (<a href="https://redirect.github.com/fatiando/pooch/pull/492">#492</a>). Create a new <code>env/requirements-types.txt</code> file with the list of required packages to run type checks. This file is used by the GitHub Action workflow that automatically runs the type checks. 
List new requirements for type checks in <code>environment.yml</code>. Stop ignoring missing imports of <code>xxhash</code> in <code>pyproject.toml</code>. Ignore type assignment for <code>xxhash</code> in the test file.</li> <li>Fix uploads of coverage reports to codecov (<a href="https://redirect.github.com/fatiando/pooch/pull/496">#496</a>). Check out the repository in the <code>codecov-upload</code> job before uploading the coverage reports to Codecov.</li> <li>Pin black to v25 (<a href="https://redirect.github.com/fatiando/pooch/pull/506">#506</a>). Pin the black version used in <code>environment.yml</code> and to run style checks on CI to <code>25.*.*</code> and <code><26.0.0</code>, respectively. Since we plan to replace black with Ruff for autoformatting, it's better to pin for now than to reformat the code with the latest version.</li> <li>Only run tests with network access on some CI jobs (<a href="https://redirect.github.com/fatiando/pooch/pull/484">#484</a>). Our CI continuously hits some external network providers, which causes some of them (mostly Figshare for now) to block our traffic. This means that our CI fails randomly, which is annoying. Only run network tests on jobs with the latest Python and optional dependencies installed to try to mitigate this.</li> <li>Use a SPDX expression for license in <code>pyproject.toml</code> (<a href="https://redirect.github.com/fatiando/pooch/pull/476">#476</a>). Use an SPDX expression for the license in <code>pyproject.toml</code> and remove the unneeded license classifier. This removes the warnings we were getting after running <code>make build</code>.</li> <li>Add <code>Typing :: Typed</code> trove classifier (<a href="https://redirect.github.com/fatiando/pooch/pull/472">#472</a>). Let PyPI users know that Pooch supports type hints.</li> <li>Allow to manually trigger test job in Actions (<a href="https://redirect.github.com/fatiando/pooch/pull/475">#475</a>). 
Add <code>workflow_dispatch</code> as an event trigger for the <code>test.yml</code> workflow.</li> <li>Standardize requests made by <code>DOIDownloaders</code> (<a href="https://redirect.github.com/fatiando/pooch/pull/514">#514</a>). Respect user's decisions when defining the <code>DOIDownloader</code> with respect to arguments passed to <code>requests.get</code> whenever we call that function. This way, all calls made by <code>DOIDownloaders</code> and the repository classes make use of the same arguments, including <code>timeout</code>, <code>headers</code>, etc.</li> </ul> <p>Documentation:</p> <ul> <li>Add a link to the Fatiando Forum in the README (<a href="https://redirect.github.com/fatiando/pooch/pull/461">#461</a>).</li> <li>Add <code>scXpand</code> (<a href="https://redirect.github.com/fatiando/pooch/pull/488">#488</a>), <code>xclim</code> (<a href="https://redirect.github.com/fatiando/pooch/pull/445">#445</a>), <code>CLISOPS</code> (<a href="https://redirect.github.com/fatiando/pooch/pull/445">#445</a>), and <code>SPLASH</code> (<a href="https://redirect.github.com/fatiando/pooch/pull/432">#432</a>) to list of projects using Pooch.</li> </ul> <p>Contributors:</p> <ul> <li>Adam Boesky</li> <li>Antonio Valentino</li> <li>Daniel McCloy</li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="https://github.com/fatiando/pooch/commit/6aab6f90569774d335edb7197729005c9e99f7c1"><code>6aab6f9</code></a> Add changelog for Pooch v1.9.0 (<a href="https://redirect.github.com/fatiando/pooch/issues/517">#517</a>)</li> <li><a href="https://github.com/fatiando/pooch/commit/2932f3407131697171b007c4a97f3586250c411f"><code>2932f34</code></a> Standardize requests made by <code>DOIDownloaders</code> (<a href="https://redirect.github.com/fatiando/pooch/issues/514">#514</a>)</li> <li><a href="https://github.com/fatiando/pooch/commit/d2b547edcb3d10d68451e8951d6aceaeb502302f"><code>d2b547e</code></a> Bump actions/checkout from 4 to 6 (<a href="https://redirect.github.com/fatiando/pooch/issues/515">#515</a>)</li> <li><a href="https://github.com/fatiando/pooch/commit/e33707dce81f77a23dfecb457b0f796bdbb4dc3a"><code>e33707d</code></a> Update Santi's affiliation in AUTHORS.md (<a href="https://redirect.github.com/fatiando/pooch/issues/513">#513</a>)</li> <li><a href="https://github.com/fatiando/pooch/commit/e7e59e91f5009d05d8184bf325bed963f724ca36"><code>e7e59e9</code></a> Fix bug in file hashing on FIPS enabled system (<a href="https://redirect.github.com/fatiando/pooch/issues/511">#511</a>)</li> <li><a href="https://github.com/fatiando/pooch/commit/27e3ab2a686040554068a1a08a433588c6524aed"><code>27e3ab2</code></a> Fix TQDM usage (<a href="https://redirect.github.com/fatiando/pooch/issues/465">#465</a>)</li> <li><a href="https://github.com/fatiando/pooch/commit/d9a82e6c7b5ca39b2e4c9207fd4da3cd9552c74e"><code>d9a82e6</code></a> Allow to manually trigger test job in Actions (<a href="https://redirect.github.com/fatiando/pooch/issues/475">#475</a>)</li> <li><a href="https://github.com/fatiando/pooch/commit/00c6cea6328c9c16c32bba20c01dc02e83339da5"><code>00c6cea</code></a> Add <code>Typing :: Typed</code> trove classifier (<a href="https://redirect.github.com/fatiando/pooch/issues/472">#472</a>)</li> 
<li><a href="https://github.com/fatiando/pooch/commit/f4d32da438a18266a7431ed19d7f6f44c0d28cb0"><code>f4d32da</code></a> Use a SPDX expression for license in <code>pyproject.toml</code> (<a href="https://redirect.github.com/fatiando/pooch/issues/476">#476</a>)</li> <li><a href="https://github.com/fatiando/pooch/commit/cddaac9db356d62703f5c5dfcbccc6d4006df5e6"><code>cddaac9</code></a> Bump actions/download-artifact from 4 to 7 (<a href="https://redirect.github.com/fatiando/pooch/issues/478">#478</a>)</li> <li>Additional commits viewable in <a href="https://github.com/fatiando/pooch/compare/v1.8.2...v1.9.0">compare view</a></li> </ul> </details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
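The FIPS fix in the release notes (#511) comes down to telling `hashlib` that a digest is used for integrity checking, not for security. Here is a minimal sketch of chunked file hashing in that style. It assumes Python 3.9 or newer, where `hashlib.new` accepts the `usedforsecurity` keyword. `file_hash_sketch` is a hypothetical helper, not Pooch's own implementation.

```python
import hashlib
import os
import tempfile


def file_hash_sketch(fname: str, alg: str = "sha256", chunksize: int = 65536) -> str:
    """Hash a file in chunks, flagging the digest as a non-security use."""
    # usedforsecurity=False (Python 3.9+) marks this as an integrity check,
    # which FIPS-enabled builds may otherwise refuse for algorithms like MD5.
    hasher = hashlib.new(alg, usedforsecurity=False)
    with open(fname, "rb") as fin:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fin.read(chunksize), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.txt")
    with open(path, "wb") as fout:
        fout.write(b"hello")
    digest = file_hash_sketch(path)
    md5_digest = file_hash_sketch(path, alg="md5")

print(digest)
```

On a FIPS-enabled system, `hashlib` may refuse algorithms such as MD5 unless `usedforsecurity=False` is passed. On other systems, the flag does not change the resulting digest.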
Relevant issues/PRs:
Fixes #508
Inspired by #502 (comment)