Commit 14a35ad

feat: set HTTPCACHE_STORAGE in apply_apify_settings, document usage
1 parent 3909758 commit 14a35ad

3 files changed: +6 -0 lines changed

docs/02_guides/05_scrapy.mdx

Lines changed: 1 addition & 0 deletions
@@ -40,6 +40,7 @@ The Apify SDK provides several custom components to support integration with the
 - [`apify.scrapy.ApifyScheduler`](https://docs.apify.com/sdk/python/reference/class/ApifyScheduler) - Replaces Scrapy's default [scheduler](https://docs.scrapy.org/en/latest/topics/scheduler.html) with one that uses Apify's [request queue](https://docs.apify.com/platform/storage/request-queue) for storing requests. It manages enqueuing, dequeuing, and maintaining the state and priority of requests.
 - [`apify.scrapy.ActorDatasetPushPipeline`](https://docs.apify.com/sdk/python/reference/class/ActorDatasetPushPipeline) - A Scrapy [item pipeline](https://docs.scrapy.org/en/latest/topics/item-pipeline.html) that pushes scraped items to Apify's [dataset](https://docs.apify.com/platform/storage/dataset). When enabled, every item produced by the spider is sent to the dataset.
 - [`apify.scrapy.ApifyHttpProxyMiddleware`](https://docs.apify.com/sdk/python/reference/class/ApifyHttpProxyMiddleware) - A Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html) that manages proxy configurations. This middleware replaces Scrapy's default `HttpProxyMiddleware` to facilitate the use of Apify's proxy service.
+- [`apify.scrapy.extensions.httpcache.ApifyCacheStorage`](https://docs.apify.com/sdk/python/reference/class/ApifyCacheStorage) - A storage backend for the built-in Scrapy [middleware](https://docs.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.downloadermiddlewares.httpcache) that manages caching responses to HTTP requests. This backend uses Apify's [key-value store](https://docs.apify.com/platform/storage/key-value-store). Don't forget to set `HTTPCACHE_ENABLED` and `HTTPCACHE_EXPIRATION_SECS` in your settings, otherwise no caching takes place.

 Additional helper functions in the [`apify.scrapy`](https://github.com/apify/apify-sdk-python/tree/master/src/apify/scrapy) subpackage include:

docs/02_guides/code/scrapy_project/src/settings.py

Lines changed: 2 additions & 0 deletions
@@ -7,3 +7,5 @@
 TELNETCONSOLE_ENABLED = False
 # Do not change the Twisted reactor unless you really know what you are doing.
 TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
+HTTPCACHE_ENABLED = True
+HTTPCACHE_EXPIRATION_SECS = 7200

src/apify/scrapy/utils.py

Lines changed: 3 additions & 0 deletions
@@ -44,6 +44,9 @@ def apply_apify_settings(*, settings: Settings | None = None, proxy_config: dict
     settings['DOWNLOADER_MIDDLEWARES']['scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware'] = None
     settings['DOWNLOADER_MIDDLEWARES']['apify.scrapy.middlewares.ApifyHttpProxyMiddleware'] = 750

+    # Set the default HTTPCache middleware storage backend to ApifyCacheStorage
+    settings['HTTPCACHE_STORAGE'] = 'apify.scrapy.extensions.httpcache.ApifyCacheStorage'
+
     # Store the proxy configuration
     settings['APIFY_PROXY_SETTINGS'] = proxy_config