Closed
151 commits
11d3f47
changelog: cover 0.2.1
Gallaecio Jul 29, 2022
9370310
Merge pull request #26 from Gallaecio/0.2.1-release-notes
kmike Jul 29, 2022
8c8ffd8
Set a release date for 0.2.1
Gallaecio Jul 29, 2022
f16b99c
Bump version: 0.2.0 → 0.2.1
Gallaecio Jul 29, 2022
4a0cd62
clean up AggStats
kmike Jul 29, 2022
aaa49ea
Merge pull request #27 from zytedata/stats-cleanup
kmike Jul 29, 2022
4383067
Changelog for 0.3.0 (#28)
kmike Jul 29, 2022
cb77806
Set a release date for 0.3.0
kmike Jul 29, 2022
095e47e
Bump version: 0.2.1 → 0.3.0
kmike Jul 29, 2022
6350223
Zyte Data API → Zyte API
Gallaecio Sep 16, 2022
dd5e9f8
Merge pull request #30 from Gallaecio/zyte-api
kmike Sep 16, 2022
5b943cb
add brotli as a dependency
BurnzZ Sep 19, 2022
eaf9208
simplify brotli declaration in headers
BurnzZ Sep 20, 2022
3966563
Merge pull request #31 from zytedata/add-brotli
BurnzZ Sep 20, 2022
15a3e40
update CHANGES.rst with 0.4.0 changes
BurnzZ Sep 20, 2022
37a126f
Bump version: 0.3.0 → 0.4.0
BurnzZ Sep 20, 2022
520b9ce
Merge pull request #32 from zytedata/new-version
kmike Sep 20, 2022
d72f7af
Network error retry time: 5 minutes → 15 minutes
Gallaecio Oct 13, 2022
c759db2
Merge pull request #33 from zytedata/Gallaecio-patch-2
kmike Oct 13, 2022
f404b89
0.4.1 release notes (#34)
Gallaecio Oct 16, 2022
1ac47c6
Bump version: 0.4.0 → 0.4.1
Gallaecio Oct 16, 2022
92c5943
Update the minimum aiohttp version (#36)
kmike Oct 28, 2022
119d710
declare Python 3.11 support; bump mypy version (just in case)
kmike Oct 28, 2022
e893a6d
changelog for 0.4.2
kmike Oct 28, 2022
b2edeac
Bump version: 0.4.1 → 0.4.2
kmike Oct 28, 2022
e4f95a8
RFC-3986-encode URLs
Gallaecio Nov 1, 2022
17b7dad
Union[bytes, str] → AnyStr
Gallaecio Nov 1, 2022
a3703b4
URL character escaping: RFC-3986 → RFC-2396
Gallaecio Nov 3, 2022
58086f5
Don't reuse the connections.
kmike Nov 9, 2022
4beeb8b
Merge pull request #39 from zytedata/dont-reuse-connections
kmike Nov 10, 2022
84f0937
set release date
kmike Nov 10, 2022
d15915c
Bump version: 0.4.2 → 0.4.3
kmike Nov 10, 2022
954181d
Use w3lib.url.safe_url_string to make URLs safe
Gallaecio Nov 28, 2022
f058d26
Remove unused imports
Gallaecio Nov 28, 2022
a895bc7
Make sure that URL processing does not remove fragments
Gallaecio Nov 29, 2022
1dc76fa
Merge pull request #37 from Gallaecio/ensure-valid-urls
kmike Nov 29, 2022
0caeedd
allow to set custom retrying for the AsyncClient
kmike Nov 30, 2022
4206785
CLI: allow to disable retrying of network and request errors
kmike Nov 30, 2022
6b93625
fix the logic
kmike Nov 30, 2022
066d1dc
Merge pull request #40 from zytedata/more-flexible-error-handling
kmike Dec 1, 2022
405948b
changelog for 0.4.4
kmike Dec 1, 2022
6168c8e
Merge pull request #41 from zytedata/0.4.4-changelog
kmike Dec 1, 2022
c18582d
set release date
kmike Dec 1, 2022
6f13566
Bump version: 0.4.3 → 0.4.4
kmike Dec 1, 2022
08fd0c2
fix tox4 support
kmike Dec 23, 2022
16e9df5
remove unused "requests" from install_requires
kmike Dec 23, 2022
ccef2ea
require w3lib 2.1.1, which is needed to escape URLs properly
kmike Dec 23, 2022
b26470d
changelog
kmike Dec 23, 2022
806319e
Merge pull request #42 from zytedata/cleanup
kmike Jan 2, 2023
641649e
fixed tox.ini
kmike Jan 3, 2023
845e001
Merge pull request #43 from zytedata/fix-tox-again
BurnzZ Jan 3, 2023
0eacc23
set release date
kmike Jan 3, 2023
5d561c4
Bump version: 0.4.4 → 0.4.5
kmike Jan 3, 2023
158b2eb
Cover the api_key parameter in the asyncio API page
Gallaecio Aug 25, 2023
a09748c
Merge pull request #46 from Gallaecio/aio-api-key
kmike Aug 25, 2023
1407682
API_TIMEOUT: 60s →240s
Gallaecio Sep 21, 2023
09da463
Merge pull request #48 from zytedata/update-server-timeout
kmike Sep 21, 2023
a902a39
Cover 0.4.6 in the changelog
Gallaecio Sep 21, 2023
d7b9ad8
Update API_TIMEOUT
Gallaecio Sep 21, 2023
6c91694
Merge pull request #49 from Gallaecio/release-notes
kmike Sep 25, 2023
c905219
Update the release date of 0.4.6
Gallaecio Sep 26, 2023
67f7365
Bump version: 0.4.5 → 0.4.6
Gallaecio Sep 26, 2023
ad7e3a8
Allow overriding the user agent (#50)
PyExplorer Sep 26, 2023
51a61ea
changelog for 0.4.7
PyExplorer Sep 26, 2023
3af5db6
make the description shorter
PyExplorer Sep 26, 2023
40b9bc1
fix formatting
PyExplorer Sep 26, 2023
e608c91
Merge pull request #51 from zytedata/0.4.7-changelog
PyExplorer Sep 26, 2023
8547d64
Update the release date of 0.4.7
PyExplorer Sep 26, 2023
91b141e
Bump version: 0.4.6 → 0.4.7
PyExplorer Sep 26, 2023
0e684a3
add the ZAPI request id on RequestError message
BurnzZ Oct 24, 2023
c71fc95
RequestError: add request_id as an attribute and str representation
BurnzZ Oct 24, 2023
e288c75
revert previous code changes for a simpler approach
BurnzZ Oct 24, 2023
07cc970
Merge pull request #52 from zytedata/request-id-error
kmike Oct 24, 2023
3d82528
remove undefined request_id variable
BurnzZ Oct 25, 2023
550c451
Merge pull request #54 from zytedata/request-id-fix
kmike Oct 25, 2023
09d1366
Bump version: 0.4.7 → 0.4.8
BurnzZ Nov 2, 2023
094632f
update release date for 0.4.8
BurnzZ Nov 2, 2023
764f745
Merge pull request #55 from zytedata/v0.4.8
BurnzZ Nov 2, 2023
2b6c779
avoid Sphinx format to prevent PyPI RST error
BurnzZ Nov 2, 2023
d13737f
Add .readthedocs.yml (#56)
Gallaecio Dec 19, 2023
c7b05ea
remove Python 3.7 support
BurnzZ Jan 23, 2024
35349d6
Merge pull request #53 from zytedata/remove-3.7
kmike Feb 1, 2024
931a810
Provide an option to store error responses in CLI (#47)
adnan-awan Feb 1, 2024
5b94c90
Replace AsyncClient with AsyncZyteAPI
Gallaecio Mar 13, 2024
794083c
Fix typing
Gallaecio Mar 13, 2024
eafd59f
Clarify that iter does not yield in the original order
Gallaecio Mar 14, 2024
95b85e7
Reuse code
Gallaecio Mar 14, 2024
2968343
Revert iter to an iterator
Gallaecio Mar 14, 2024
4304637
Restore Python 3.8 support
Gallaecio Mar 14, 2024
5163ee6
Clarify the reason for the use of TYPE_CHECKING
Gallaecio Mar 14, 2024
01a5a2b
Merge pull request #60 from Gallaecio/clean-async-api
kmike Mar 15, 2024
d31e2d9
Use pre-commit (#64)
Gallaecio Mar 19, 2024
bf6e901
Include basic usage examples in the README (#61)
Gallaecio Mar 19, 2024
1b5a61b
Cleanups (#65)
Gallaecio Mar 19, 2024
533a417
Add a sync API (#58)
Gallaecio Mar 20, 2024
e14d556
Simplify the iter example, provide a session-specific example later (…
Gallaecio Mar 20, 2024
55766c7
Use a client semaphore (#63)
Gallaecio Mar 20, 2024
b0d1ee7
Implement AsyncZyteAPI.session (#62)
Gallaecio Mar 20, 2024
f36cab9
Complete test coverage (#66)
Gallaecio Mar 22, 2024
914d5be
Refactor docs after recent changes (#67)
Gallaecio Apr 5, 2024
3d5d0fd
Release notes (#68)
Gallaecio Apr 5, 2024
f7ba12e
Set the release date for 0.5.0
Gallaecio Apr 5, 2024
85e9ab5
Bump version: 0.4.8 → 0.5.0
Gallaecio Apr 5, 2024
396f260
Remove the changelog from the PyPI description
Gallaecio Apr 5, 2024
9968ac4
ReadTheDocs: do not fail on warnings
Gallaecio Apr 5, 2024
1baecba
Add session.close(), remove internal _context (#69)
Gallaecio Apr 16, 2024
ecfbb5b
Release notes for 0.5.1 (#70)
Gallaecio Apr 16, 2024
7bba2ea
Set the release date of 0.5.1
Gallaecio Apr 16, 2024
22d0444
Bump version: 0.5.0 → 0.5.1
Gallaecio Apr 16, 2024
e91d3a4
Add conservative_retrying
Gallaecio Apr 26, 2024
413dd4b
Keep mypy happy
Gallaecio Apr 26, 2024
48acc00
Conservative → Aggresive
Gallaecio Apr 30, 2024
985827e
is_maybe_temporary_error → _maybe_temporary_error
Gallaecio Apr 30, 2024
aca5fc6
Fix the description of the default retry policy
Gallaecio Apr 30, 2024
2a7d19e
Add tests for the attempt-based limits of the default retry policy
Gallaecio Apr 30, 2024
9ed623d
Reconfigure Codecov
Gallaecio Apr 30, 2024
b7ff419
Lower aggresive retrying of temporary download errors from 16 to 8 at…
Gallaecio May 7, 2024
41a8d69
Ignore rate-limiting errors when counting max temporary download errors
Gallaecio May 7, 2024
439cac9
Only stop retries for network errors after 15 uninterrupted minutes o…
Gallaecio May 8, 2024
9878e10
Implement a custom download error handling for the aggressive retry p…
Gallaecio May 8, 2024
4d4688d
Retry undocumented 5xx errors up to 4 times, not counting rate-limiti…
Gallaecio May 9, 2024
b71edb0
Ignore mypy complaints about RetryCallState custom attributes added a…
Gallaecio May 9, 2024
bba313d
Clean up APIs for easier subclassing, and update retry docs
Gallaecio May 9, 2024
5f47702
Implement RequestError.query
Gallaecio May 10, 2024
9e5bb74
Merge pull request #73 from Gallaecio/exception-request
kmike May 10, 2024
7cf8423
Improve the wording around custom retry policy usage
Gallaecio May 10, 2024
d5d1740
Raise ValueError if retrying does not get an instance of AsyncRetrying
Gallaecio May 10, 2024
adf4f10
Fix typo: uninterrumpted → uninterrupted
Gallaecio May 10, 2024
f51c561
Avoid using id(self)
Gallaecio May 10, 2024
ba71081
Merge remote-tracking branch 'zytedata/main' into conservative-retrying
Gallaecio May 10, 2024
2f533a7
Add missing parameter
Gallaecio May 10, 2024
99f8523
Release notes for 0.5.2 (#74)
Gallaecio May 10, 2024
452f711
Set the release date for 0.5.2
Gallaecio May 10, 2024
51e59ea
Bump version: 0.5.1 → 0.5.2
Gallaecio May 10, 2024
f9a8c26
Concentrate retry tests and complete coverage
Gallaecio May 10, 2024
b5b55b1
undocumented_error_stop: stop_on_uninterrupted_status → stop_on_count
Gallaecio May 15, 2024
1e791d5
Update the docs
Gallaecio May 15, 2024
5638ce9
Update test expectations
Gallaecio May 15, 2024
d87c0e6
Remove leftovers
Gallaecio May 28, 2024
221eb5c
Merge pull request #71 from zytedata/conservative-retrying
kmike May 29, 2024
9ca2fc5
Release notes for 0.6.0 (#75)
Gallaecio May 29, 2024
b25fb2b
Bump version: 0.5.2 → 0.6.0
Gallaecio May 29, 2024
011215d
Update Python versions.
wRAR Oct 14, 2024
c69fe9e
Bump tool versions.
wRAR Oct 14, 2024
4132ced
Roll back RTD Python
wRAR Oct 15, 2024
b95d703
Update docs referebcesto docs.zyte.com.
wRAR Oct 16, 2024
1dcb8e0
Merge pull request #76 from zytedata/modernize
wRAR Oct 16, 2024
9d60a9a
Make the default retry policy the aggressive one with half the attempts
Gallaecio Dec 30, 2024
d052ed1
Initial stab at circuit break for undocumented erors
Gallaecio Jan 3, 2025
2d2c56f
Remove unnecessary comments
Gallaecio Jan 3, 2025
6394d94
unquote type hints
Gallaecio Jan 10, 2025
2 changes: 1 addition & 1 deletion .bumpversion.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[bumpversion]
-current_version = 0.2.0
+current_version = 0.6.0
commit = True
tag = True
tag_name = {new_version}
6 changes: 6 additions & 0 deletions .coveragerc
@@ -1,2 +1,8 @@
[run]
branch = true

[report]
# https://github.com/nedbat/coveragepy/issues/831#issuecomment-517778185
exclude_lines =
pragma: no cover
if TYPE_CHECKING:
6 changes: 3 additions & 3 deletions .github/workflows/publish.yml
@@ -13,11 +13,11 @@ jobs:
runs-on: ubuntu-latest

steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v4
- name: Set up Python
-uses: actions/setup-python@v2
+uses: actions/setup-python@v5
with:
-python-version: '3.x'
+python-version: '3.13'
- name: Install dependencies
run: |
python -m pip install --upgrade pip
16 changes: 9 additions & 7 deletions .github/workflows/test.yml
@@ -16,12 +16,12 @@ jobs:
strategy:
fail-fast: false
matrix:
-python-version: ['3.7', '3.8', '3.9', '3.10']
+python-version: ['3.9', '3.10', '3.11', '3.12', '3.13']

steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
-uses: actions/setup-python@v2
+uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
@@ -33,20 +33,22 @@ jobs:
tox -e py
- name: coverage
if: ${{ success() }}
-run: bash <(curl -s https://codecov.io/bash)
+uses: codecov/[email protected]
+with:
+token: ${{ secrets.CODECOV_TOKEN }}

check:
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
-python-version: ['3.10']
+python-version: ['3.12'] # Keep in sync with .readthedocs.yml
tox-job: ["mypy", "docs"]

steps:
-- uses: actions/checkout@v2
+- uses: actions/checkout@v4
- name: Set up Python ${{ matrix.python-version }}
-uses: actions/setup-python@v2
+uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python-version }}
- name: Install dependencies
19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,19 @@
repos:
- repo: https://github.com/PyCQA/isort
rev: 5.13.2
hooks:
- id: isort
- repo: https://github.com/psf/black
rev: 24.10.0
hooks:
- id: black
- repo: https://github.com/pycqa/flake8
rev: 7.1.1
hooks:
- id: flake8
- repo: https://github.com/adamchainz/blacken-docs
rev: 1.19.0
hooks:
- id: blacken-docs
additional_dependencies:
- black==24.10.0
14 changes: 14 additions & 0 deletions .readthedocs.yml
@@ -0,0 +1,14 @@
version: 2
formats: all
sphinx:
configuration: docs/conf.py
build:
os: ubuntu-22.04
tools:
# For available versions, see:
# https://docs.readthedocs.io/en/stable/config-file/v2.html#build-tools-python
python: "3.12" # Keep in sync with .github/workflows/test.yml
python:
install:
- requirements: docs/requirements.txt
- path: .
136 changes: 136 additions & 0 deletions CHANGES.rst
@@ -1,6 +1,142 @@
Changes
=======

0.6.0 (2024-05-29)
------------------

* Improved how the :ref:`default retry policy <default-retry-policy>` handles
:ref:`temporary download errors <zapi-temporary-download-errors>`.
Before, 3 HTTP 429 responses followed by a single HTTP 520 response would
have prevented a retry. Now, unrelated responses and errors do not count
towards the HTTP 520 retry limit.

* Improved how the :ref:`default retry policy <default-retry-policy>` handles
network errors. Before, after 15 minutes of unsuccessful responses (e.g. HTTP
429), any network error would prevent a retry. Now, network errors must happen
15 minutes in a row, without different errors in between, to stop retries.

* Implemented an optional :ref:`aggressive retry policy
<aggressive-retry-policy>`, which retries more errors more often, and could
be useful for long crawls or websites with a low success rate.

* Improved the exception that is raised when passing an invalid retrying policy
object to a :ref:`Python client <api>`.
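
As a rough illustration of the first item above (this is not the library's actual implementation), the sketch below contrasts a rule that counts every attempt with a rule that counts only HTTP 520 responses. The limit of 4 is an assumption picked so that the changelog's example (3 + 1 responses) reaches it, and both function names are hypothetical.

```python
from typing import Sequence

# Hypothetical limit, chosen only so that the "3 x 429 + 1 x 520" example hits it.
LIMIT = 4


def stop_on_total_attempts(statuses: Sequence[int], limit: int = LIMIT) -> bool:
    """Old-style rule: stop once the total number of attempts reaches the limit."""
    return len(statuses) >= limit


def stop_on_matching_errors(
    statuses: Sequence[int], status: int = 520, limit: int = LIMIT
) -> bool:
    """New-style rule: stop only once `status` itself was seen `limit` times."""
    return sum(1 for seen in statuses if seen == status) >= limit


history = [429, 429, 429, 520]
print(stop_on_total_attempts(history))  # True: unrelated 429s used up the budget
print(stop_on_matching_errors(history))  # False: only one 520 so far, keep retrying
```

With the second rule, rate-limiting responses no longer consume the retry budget of an unrelated error type.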

0.5.2 (2024-05-10)
------------------

* :class:`~zyte_api.RequestError` now has a :data:`~zyte_api.RequestError.query`
attribute with the Zyte API request parameters that caused the error.

0.5.1 (2024-04-16)
------------------

* :class:`~zyte_api.ZyteAPI` and :class:`~zyte_api.AsyncZyteAPI` sessions no
longer need to be used as context managers, and can instead be closed with a
``close()`` method.

0.5.0 (2024-04-05)
------------------

* Removed Python 3.7 support.

* Added :class:`~zyte_api.ZyteAPI` and :class:`~zyte_api.AsyncZyteAPI` to
provide both sync and async Python interfaces with a cleaner API.

* Deprecated ``zyte_api.aio``:

* Replace ``zyte_api.aio.client.AsyncClient`` with the new
:class:`~zyte_api.AsyncZyteAPI` class.

* Replace ``zyte_api.aio.client.create_session`` with the new
:meth:`AsyncZyteAPI.session <zyte_api.AsyncZyteAPI.session>` method.

* Import ``zyte_api.aio.errors.RequestError``,
``zyte_api.aio.retry.RetryFactory`` and
``zyte_api.aio.retry.zyte_api_retrying`` directly from ``zyte_api`` now.

* When using the command-line interface, you can now use ``--store-errors`` to
have error responses be stored alongside successful responses.

* Improved the documentation.

0.4.8 (2023-11-02)
------------------

* Include the Zyte API request ID value in a new ``.request_id`` attribute
in ``zyte_api.aio.errors.RequestError``.

0.4.7 (2023-09-26)
------------------

* ``AsyncClient`` now lets you set a custom user agent to send to Zyte API.

0.4.6 (2023-09-26)
------------------

* Increased the client timeout to match the server’s.
* Mentioned the ``api_key`` parameter of ``AsyncClient`` in the docs example.

0.4.5 (2023-01-03)
------------------

* w3lib >= 2.1.1 is required in install_requires, to ensure that URLs
are escaped properly.
* unnecessary ``requests`` library is removed from install_requires
* fixed tox 4 support

0.4.4 (2022-12-01)
------------------

* Fixed an issue with submitting URLs which contain unescaped symbols
* New "retrying" argument for AsyncClient.__init__, which allows to set
custom retrying policy for the client
* ``--dont-retry-errors`` argument in the CLI tool

0.4.3 (2022-11-10)
------------------

* Connections are no longer reused between requests.
This reduces the amount of ``ServerDisconnectedError`` exceptions.

0.4.2 (2022-10-28)
------------------
* Bump minimum ``aiohttp`` version to 3.8.0, as earlier versions don't support
brotli decompression of responses
* Declared Python 3.11 support

0.4.1 (2022-10-16)
------------------

* Network errors, like server timeouts or disconnections, are now retried for
up to 15 minutes, instead of 5 minutes.

0.4.0 (2022-09-20)
------------------

* Require to install ``Brotli`` as a dependency. This changes the requests to
have ``Accept-Encoding: br`` and automatically decompress brotli responses.

0.3.0 (2022-07-29)
------------------

Internal AggStats class is cleaned up:

* ``AggStats.n_extracted_queries`` attribute is removed, as it was a duplicate
of ``AggStats.n_results``
* ``AggStats.n_results`` is renamed to ``AggStats.n_success``
* ``AggStats.n_input_queries`` is removed as redundant and misleading;
AggStats got a new ``AggStats.n_processed`` property instead.

This change is backwards incompatible if you used stats directly.

0.2.1 (2022-07-29)
------------------

* ``aiohttp.client_exceptions.ClientConnectorError`` is now treated as a
network error and retried accordingly.
* Removed the unused ``zyte_api.sync`` module.

0.2.0 (2022-07-14)
------------------

95 changes: 82 additions & 13 deletions README.rst
@@ -18,32 +18,101 @@ python-zyte-api
:target: https://codecov.io/gh/zytedata/zyte-api
:alt: Coverage report

Python client libraries for `Zyte Data API`_.
.. description-start

Command-line utility and asyncio-based library are provided by this package.
Command-line client and Python client library for `Zyte API`_.

.. _Zyte API: https://docs.zyte.com/zyte-api/get-started.html

.. description-end

Installation
============

::
.. install-start

.. code-block:: shell

pip install zyte-api

``zyte-api`` requires Python 3.7+.
.. note:: Python 3.9+ is required.

.. install-end

Basic usage
===========

.. basic-start

Set your API key
----------------

.. key-get-start

After you `sign up for a Zyte API account
<https://app.zyte.com/account/signup/zyteapi>`_, copy `your API key
<https://app.zyte.com/o/zyte-api/api-access>`_.

.. key-get-end


Use the command-line client
---------------------------

Then you can use the zyte-api command-line client to send Zyte API requests.
First create a text file with a list of URLs:

.. code-block:: none

https://books.toscrape.com
https://quotes.toscrape.com

And then call ``zyte-api`` from your shell:

.. code-block:: shell

API key
=======
zyte-api url-list.txt --api-key YOUR_API_KEY --output results.jsonl

Make sure you have an API key for the `Zyte Data API`_ service.
You can set ``ZYTE_API_KEY`` environment
variable with the key to avoid passing it around explicitly.

Read the `documentation <https://python-zyte-api.readthedocs.io>`_ for more information.
Use the Python sync API
-----------------------

License is BSD 3-clause.
For very basic Python scripts, use the sync API:

.. code-block:: python

from zyte_api import ZyteAPI

client = ZyteAPI(api_key="YOUR_API_KEY")
response = client.get({"url": "https://toscrape.com", "httpResponseBody": True})


Use the Python async API
------------------------

For asyncio code, use the async API:

.. code-block:: python

import asyncio

from zyte_api import AsyncZyteAPI


async def main():
client = AsyncZyteAPI(api_key="YOUR_API_KEY")
response = await client.get(
{"url": "https://toscrape.com", "httpResponseBody": True}
)


asyncio.run(main())

.. basic-end

Read the `documentation <https://python-zyte-api.readthedocs.io>`_ for more
information.

* Documentation: https://python-zyte-api.readthedocs.io
* Source code: https://github.com/zytedata/python-zyte-api
* Issue tracker: https://github.com/zytedata/python-zyte-api/issues

.. _Zyte Data API: https://docs.zyte.com/zyte-api/get-started.html