Learn how to get scrapy-zyte-api installed and configured on an existing :doc:`Scrapy <scrapy:index>` project.

.. tip:: :ref:`Zyte’s web scraping tutorial <zyte:tutorial>` covers
    scrapy-zyte-api setup as well.

You need at least:
- A :ref:`Zyte API <zyte-api>` subscription (there’s a :ref:`free trial <zapi-trial>`).
- Python 3.10+
- Scrapy 2.0.1+
:doc:`scrapy-poet <scrapy-poet:index>` integration requires Scrapy 2.6+.
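
The version thresholds above can be checked programmatically before relying on a feature. The following is a minimal sketch, not part of scrapy-zyte-api: ``version_tuple`` and ``available_features`` are hypothetical helper names, and the add-on threshold (Scrapy 2.10) is the one this page gives in its configuration instructions.

```python
import re
import sys


def version_tuple(version):
    """Turn "2.11.0" or "2.11.0rc1" into (2, 11, 0), ignoring any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def available_features(scrapy_version, python_version=sys.version_info[:2]):
    """Report which setup options the given versions allow.

    Hypothetical helper: the thresholds are the ones listed on this page.
    """
    scrapy = version_tuple(scrapy_version)
    base = tuple(python_version) >= (3, 10) and scrapy >= (2, 0, 1)
    return {
        "base": base,                                # basic installation
        "scrapy_poet": base and scrapy >= (2, 6),    # scrapy-poet integration
        "addon": base and scrapy >= (2, 10),         # add-on based configuration
    }
```

To check a real environment, the installed version string can be obtained with ``importlib.metadata.version("scrapy")`` from the standard library and passed to ``available_features``.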
For a basic installation:

.. code-block:: shell

    pip install scrapy-zyte-api

For :ref:`scrapy-poet integration <scrapy-poet>`, install the ``provider`` extra:

.. code-block:: shell

    pip install scrapy-zyte-api[provider]

For :ref:`x402 support <x402>`, install the ``x402`` extra:

.. code-block:: shell

    pip install scrapy-zyte-api[x402]

Note that you can install multiple extras:

.. code-block:: shell

    pip install scrapy-zyte-api[provider,x402]

To configure scrapy-zyte-api, :ref:`set up authentication <auth>` and either :ref:`enable the add-on <config-addon>` (Scrapy ≥ 2.10) or :ref:`configure all components separately <config-components>`.
Warning
Sign up for a Zyte API account, copy your API key and do either of the following:

- Define an environment variable named ``ZYTE_API_KEY`` with your API key:

  - On Windows’ CMD:

    .. code-block:: shell

        > set ZYTE_API_KEY=YOUR_API_KEY

  - On macOS and Linux:

    .. code-block:: shell

        $ export ZYTE_API_KEY=YOUR_API_KEY

- Add your API key to your settings module:

  .. code-block:: python

      ZYTE_API_KEY = "YOUR_API_KEY"

To use x402 instead, see :ref:`x402`.
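
If you rely on the environment variable, it can help to check for it explicitly, for example in a launcher script that starts your crawl. The following is a minimal sketch under that assumption. ``require_api_key`` and ``MissingAPIKeyError`` are hypothetical names, not part of scrapy-zyte-api, and the check does not apply if you define the key in your settings module instead.

```python
import os


class MissingAPIKeyError(RuntimeError):
    """Raised when ZYTE_API_KEY is not defined in the environment."""


def require_api_key(environ=os.environ):
    """Return the Zyte API key from the environment, or fail with a clear message.

    Hypothetical helper; scrapy-zyte-api reads the variable on its own.
    """
    key = environ.get("ZYTE_API_KEY", "").strip()
    if not key:
        raise MissingAPIKeyError(
            "ZYTE_API_KEY is not set. Define it as an environment variable "
            "or add it to your settings module."
        )
    return key
```

Calling ``require_api_key()`` before starting a crawl turns a missing key into an immediate, readable error. Passing a plain dictionary instead of ``os.environ`` makes the helper easy to test.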
If you are using Scrapy 2.10 or higher, you can set up scrapy-zyte-api integration using the following :ref:`add-on <topics-addons>` with any priority:

.. code-block:: python

    ADDONS = {
        "scrapy_zyte_api.Addon": 500,
    }

.. note:: The add-on enables :ref:`transparent mode <transparent>` by default.

If :ref:`enabling the add-on <config-addon>` is not an option, you can set up scrapy-zyte-api integration as follows:

.. code-block:: python

    DOWNLOAD_HANDLERS = {
        "http": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
        "https": "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler",
    }
    DOWNLOADER_MIDDLEWARES = {
        "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
    }
    SPIDER_MIDDLEWARES = {
        "scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
        "scrapy_zyte_api.ScrapyZyteAPIRefererSpiderMiddleware": 1000,
    }
    REQUEST_FINGERPRINTER_CLASS = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"
    TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

By default, scrapy-zyte-api doesn't change the spider behavior. To switch your spider to use Zyte API for all requests, set the following setting as well:

.. code-block:: python

    ZYTE_API_TRANSPARENT_MODE = True

For :ref:`scrapy-poet integration <scrapy-poet>`, :ref:`configure scrapy-poet
<scrapy-poet:setup>` first, and then add the following provider to the
``SCRAPY_POET_PROVIDERS`` setting:

.. code-block:: python

    SCRAPY_POET_PROVIDERS = {
        "scrapy_zyte_api.providers.ZyteApiProvider": 1100,
    }

If you already had a custom value for :setting:`REQUEST_FINGERPRINTER_CLASS <scrapy:REQUEST_FINGERPRINTER_CLASS>`, set that value on :setting:`ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS` instead.

.. code-block:: python

    ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS = "myproject.CustomRequestFingerprinter"

For :ref:`session management support <session>`, add the following downloader middleware to the :setting:`DOWNLOADER_MIDDLEWARES <scrapy:DOWNLOADER_MIDDLEWARES>` setting:

.. code-block:: python

    DOWNLOADER_MIDDLEWARES = {
        "scrapy_zyte_api.ScrapyZyteAPISessionDownloaderMiddleware": 667,
    }

If your :setting:`TWISTED_REACTOR <scrapy:TWISTED_REACTOR>` setting was not
set to ``"twisted.internet.asyncioreactor.AsyncioSelectorReactor"`` before,
you will be changing the Twisted reactor that your Scrapy project uses, and
your existing code may need changes, such as:

- :ref:`asyncio-preinstalled-reactor`.

  Some Twisted imports install the default, non-asyncio Twisted reactor as a
  side effect. Once a reactor is installed, it cannot be changed for the
  whole run time.

- Note that you might be using Deferreds without realizing it through some
  Scrapy functions and methods. For example, when you yield the return value
  of ``self.crawler.engine.download()`` from a spider callback, you are
  yielding a Deferred.

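
When configuring components separately in a project that already defines some of these settings, the new entries need to be merged into the existing dictionaries rather than assigned over them. The sketch below illustrates that merge, the fingerprinter fallback and the reactor check described above, using plain dictionaries. ``apply_manual_config`` is a hypothetical helper, not part of scrapy-zyte-api, and it treats an unset ``TWISTED_REACTOR`` as a change, which may be over-cautious if your Scrapy version already defaults to the asyncio reactor.

```python
ASYNCIO_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
HANDLER = "scrapy_zyte_api.ScrapyZyteAPIDownloadHandler"
FINGERPRINTER = "scrapy_zyte_api.ScrapyZyteAPIRequestFingerprinter"
SESSION_MIDDLEWARE = "scrapy_zyte_api.ScrapyZyteAPISessionDownloaderMiddleware"


def apply_manual_config(existing, sessions=False):
    """Return (settings, reactor_changed) for a dict of existing settings.

    Hypothetical helper: it copies the values listed on this page into a new
    dictionary and leaves the input untouched.
    """
    settings = dict(existing)

    # Merge into existing dictionaries so earlier entries are kept.
    settings["DOWNLOAD_HANDLERS"] = {
        **existing.get("DOWNLOAD_HANDLERS", {}),
        "http": HANDLER,
        "https": HANDLER,
    }
    downloader_middlewares = {
        **existing.get("DOWNLOADER_MIDDLEWARES", {}),
        "scrapy_zyte_api.ScrapyZyteAPIDownloaderMiddleware": 633,
    }
    if sessions:  # only needed for session management support
        downloader_middlewares[SESSION_MIDDLEWARE] = 667
    settings["DOWNLOADER_MIDDLEWARES"] = downloader_middlewares
    settings["SPIDER_MIDDLEWARES"] = {
        **existing.get("SPIDER_MIDDLEWARES", {}),
        "scrapy_zyte_api.ScrapyZyteAPISpiderMiddleware": 100,
        "scrapy_zyte_api.ScrapyZyteAPIRefererSpiderMiddleware": 1000,
    }

    # A custom fingerprinter moves to the fallback setting.
    previous = existing.get("REQUEST_FINGERPRINTER_CLASS")
    if previous and previous != FINGERPRINTER:
        settings["ZYTE_API_FALLBACK_REQUEST_FINGERPRINTER_CLASS"] = previous
    settings["REQUEST_FINGERPRINTER_CLASS"] = FINGERPRINTER

    # Flag a reactor change so existing code can be reviewed.
    reactor_changed = existing.get("TWISTED_REACTOR") != ASYNCIO_REACTOR
    settings["TWISTED_REACTOR"] = ASYNCIO_REACTOR
    return settings, reactor_changed
```

If ``reactor_changed`` is true for your project, review the points listed above before deploying the new configuration.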
It is possible to use :ref:`Zyte API <zyte-api>` without a Zyte API account by using the x402 protocol to handle payments:
- Read the Zyte Terms of Service. By using Zyte API, you are accepting them.
- During :ref:`installation <install>`, make sure to install the ``x402`` extra.
- :ref:`Configure <eth-key>` the private key of your Ethereum account to authorize payments.
It is recommended to configure your Ethereum private key through an environment variable, so that it also works when you use :doc:`python-zyte-api <python-zyte-api:index>`:

- On Windows’ CMD:

  .. code-block:: shell

      > set ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY

- On macOS and Linux:

  .. code-block:: shell

      $ export ZYTE_API_ETH_KEY=YOUR_ETH_PRIVATE_KEY

Alternatively, you can add your Ethereum private key to the settings module:

.. code-block:: python

    ZYTE_API_ETH_KEY = "YOUR_ETH_PRIVATE_KEY"
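
Because a mistyped private key can be hard to diagnose, a format check before the crawl starts may be useful. The following is a minimal sketch, assuming the key is written as 64 hexadecimal characters (32 bytes), optionally prefixed with ``0x``. This page does not state which forms scrapy-zyte-api accepts, so treat that as an assumption. ``eth_key_status`` and ``looks_like_eth_private_key`` are hypothetical names, and the check covers the format only, not whether the key is valid or funded.

```python
import os
import re

# 32 bytes written as hexadecimal, with an optional "0x" prefix (assumed format).
_HEX_KEY = re.compile(r"(0x)?[0-9a-fA-F]{64}")


def looks_like_eth_private_key(value):
    """Return True if the value has the shape of a hex-encoded 32-byte key."""
    return bool(_HEX_KEY.fullmatch(value.strip()))


def eth_key_status(environ=os.environ):
    """Classify ZYTE_API_ETH_KEY as "missing", "malformed" or "ok".

    Hypothetical helper; it never returns or prints the key itself.
    """
    value = environ.get("ZYTE_API_ETH_KEY")
    if not value:
        return "missing"
    return "ok" if looks_like_eth_private_key(value) else "malformed"
```

The helper returns only a status string, so the result can be logged without exposing the private key.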