ZYTE_API_MAX_REQUESTS
=====================

Default: ``None``

When set to an integer value > 0, the spider will close when the number of Zyte
API requests reaches it, with ``closespider_max_zapi_requests`` as the close
reason.

Note that requests with error responses that cannot be retried or exceed their
retry limit also count here.
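
For example, the following minimal ``settings.py`` sketch closes the spider
once the number of Zyte API requests reaches 1000. The value is arbitrary and
only for illustration:

.. code-block:: python
    :caption: settings.py

    # Close the spider, with closespider_max_zapi_requests as the close
    # reason, once the number of Zyte API requests reaches 1000.
    ZYTE_API_MAX_REQUESTS = 1000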

See :ref:`retry`.


.. setting:: ZYTE_API_SESSION_CHECKER

ZYTE_API_SESSION_CHECKER
========================

Default: ``None``

A :ref:`Scrapy component <topics-components>` (or its import path as a string)
that defines a ``check`` method.

If ``check`` returns ``True``, the response session is considered valid. If
``check`` returns ``False``, the response session is considered invalid and is
discarded. ``check`` can also raise a
:exc:`~scrapy.exceptions.CloseSpider` exception to close the spider.

If this setting is defined, its ``check`` method is called on every response
that uses a :ref:`session managed by scrapy-zyte-api <session>`. If it is not
defined, the default implementation checks the outcome of the ``setLocation``
action if session initialization was location-based, as described in
:ref:`session-check`.

Example:

.. code-block:: python
    :caption: settings.py

    from scrapy import Request
    from scrapy.http.response import Response


    class MySessionChecker:

        def check(self, request: Request, response: Response) -> bool:
            # The session is valid only if the page has a .is_valid element.
            return bool(response.css(".is_valid"))


    ZYTE_API_SESSION_CHECKER = MySessionChecker

Because the session checker is a Scrapy component, you can access the crawler
object, for example to read settings:

.. code-block:: python
    :caption: settings.py

    from scrapy import Request
    from scrapy.http.response import Response


    class MySessionChecker:

        @classmethod
        def from_crawler(cls, crawler):
            return cls(crawler)

        def __init__(self, crawler):
            # Read the expected postal code from the session location setting.
            location = crawler.settings["ZYTE_API_SESSION_LOCATION"]
            self.postal_code = location["postalCode"]

        def check(self, request: Request, response: Response) -> bool:
            # The session is valid if the page shows the expected postal code.
            return response.css(".postal_code::text").get() == self.postal_code


    ZYTE_API_SESSION_CHECKER = MySessionChecker
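
The setting also accepts the import path of a session checker class as a
string, so the class can live outside ``settings.py``. In this minimal sketch,
``myproject.session_checkers`` is a hypothetical module of your project that
defines ``MySessionChecker``:

.. code-block:: python
    :caption: settings.py

    # Hypothetical import path: point it at the module that defines your
    # session checker class.
    ZYTE_API_SESSION_CHECKER = "myproject.session_checkers.MySessionChecker"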


.. setting:: ZYTE_API_SESSION_ENABLED

ZYTE_API_SESSION_ENABLED
========================

Default: ``False``

Enables :ref:`scrapy-zyte-api session management <session>`.
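
The following minimal ``settings.py`` sketch enables session management and
leaves the other session settings at their defaults. Sessions are then
initialized with the default :setting:`ZYTE_API_SESSION_PARAMS`,
``{"browserHtml": True}``:

.. code-block:: python
    :caption: settings.py

    # Enable scrapy-zyte-api session management.
    ZYTE_API_SESSION_ENABLED = True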


.. setting:: ZYTE_API_SESSION_LOCATION

ZYTE_API_SESSION_LOCATION
=========================

Default: ``{}``

If not empty, sessions are initialized using the ``setLocation``
:http:`action <request:actions>`, and the value of this setting must be the
target address :class:`dict`. For example:

.. code-block:: python
    :caption: settings.py

    ZYTE_API_SESSION_LOCATION = {"postalCode": "10001"}

If the :setting:`ZYTE_API_SESSION_PARAMS` setting or the
:reqmeta:`zyte_api_session_params` request metadata key sets a ``"url"``, that
URL is also used for session initialization. Otherwise, the URL of the
request for which the session is being initialized is used instead.

This setting, if not empty, takes precedence over the
:setting:`ZYTE_API_SESSION_PARAMS` setting and the
:reqmeta:`zyte_api_session_params` request metadata key. It can, however, be
overridden by the :reqmeta:`zyte_api_session_location` request metadata key.

To disable the :setting:`ZYTE_API_SESSION_LOCATION` setting on a specific
request, e.g. to use the :setting:`ZYTE_API_SESSION_PARAMS` setting or the
:reqmeta:`zyte_api_session_params` request metadata key instead, set
the :reqmeta:`zyte_api_session_location` request metadata key to ``{}``.
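
The precedence rules above, combined with those of
:setting:`ZYTE_API_SESSION_PARAMS`, can be summarized in plain Python. This is
a simplified, hypothetical sketch, not scrapy-zyte-api's actual
implementation. ``resolve_session_init_params`` is an illustrative name, plain
dicts stand in for Scrapy settings and request metadata, and a location is
turned into the equivalent ``setLocation`` parameters from the
:setting:`ZYTE_API_SESSION_PARAMS` example.

.. code-block:: python

    DEFAULT_SESSION_PARAMS = {"browserHtml": True}


    def resolve_session_init_params(request_url, meta, settings):
        """Return the Zyte API parameters used to initialize a session.

        Hypothetical helper for illustration; ``meta`` and ``settings`` are
        plain dicts.
        """
        # The request metadata key overrides the setting. An empty dict in
        # the metadata disables location-based initialization.
        if "zyte_api_session_location" in meta:
            location = meta["zyte_api_session_location"]
        else:
            location = settings.get("ZYTE_API_SESSION_LOCATION", {})

        # Likewise, the request metadata key overrides the params setting.
        if "zyte_api_session_params" in meta:
            params = meta["zyte_api_session_params"]
        else:
            params = settings.get("ZYTE_API_SESSION_PARAMS", DEFAULT_SESSION_PARAMS)

        # A "url" in the params wins; otherwise use the URL of the request.
        url = params.get("url", request_url)

        if location:
            # A non-empty location takes precedence over the params.
            return {
                "url": url,
                "browserHtml": True,
                "actions": [{"action": "setLocation", "address": location}],
            }
        return {**params, "url": url}

Suppose ``ZYTE_API_SESSION_LOCATION = {"postalCode": "10001"}``. A request
whose metadata sets ``zyte_api_session_location`` to ``{}`` then falls back to
the session parameters, which are ``{"browserHtml": True}`` by default.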


.. setting:: ZYTE_API_SESSION_MAX_BAD_INITS

ZYTE_API_SESSION_MAX_BAD_INITS
==============================

Default: ``8``

The maximum number of consecutive :ref:`scrapy-zyte-api sessions <session>`
per :ref:`pool <session-pools>` that are allowed to fail their session check
right after creation. If the maximum is reached, the spider closes with
``bad_session_inits`` as the close reason.

To override this value for specific pools, use
:setting:`ZYTE_API_SESSION_MAX_BAD_INITS_PER_POOL`.
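
Only consecutive failures count, and each pool is counted separately. The
following hypothetical sketch (``BadInitTracker`` is not part of
scrapy-zyte-api) models that counting for a single limit. It assumes that a
new session passing its check resets the count of its pool to zero.

.. code-block:: python

    class BadInitTracker:
        """Hypothetical per-pool counter of consecutive bad session inits."""

        def __init__(self, max_bad_inits=8):
            # Mirrors ZYTE_API_SESSION_MAX_BAD_INITS.
            self.max_bad_inits = max_bad_inits
            self.streaks = {}  # pool ID -> consecutive failed first checks

        def record(self, pool, check_passed):
            """Record the first check of a new session.

            Return the close reason if the spider should close, else None.
            """
            if check_passed:
                # Assumption: a good new session breaks the streak.
                self.streaks[pool] = 0
                return None
            self.streaks[pool] = self.streaks.get(pool, 0) + 1
            if self.streaks[pool] >= self.max_bad_inits:
                return "bad_session_inits"
            return None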


.. setting:: ZYTE_API_SESSION_MAX_BAD_INITS_PER_POOL

ZYTE_API_SESSION_MAX_BAD_INITS_PER_POOL
=======================================

Default: ``{}``

:class:`dict` where keys are :ref:`pool <session-pools>` IDs and values are
overrides of :setting:`ZYTE_API_SESSION_MAX_BAD_INITS` for those pools.
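
For example, the following sketch keeps the default limit for most pools but
closes the spider sooner for one pool. ``"example.com"`` is a hypothetical
pool ID; see :ref:`session-pools` for how pool IDs are determined.

.. code-block:: python
    :caption: settings.py

    ZYTE_API_SESSION_MAX_BAD_INITS = 8
    ZYTE_API_SESSION_MAX_BAD_INITS_PER_POOL = {
        # Hypothetical pool ID: allow only 2 bad session inits in a row.
        "example.com": 2,
    }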


.. setting:: ZYTE_API_SESSION_MAX_ERRORS

ZYTE_API_SESSION_MAX_ERRORS
===========================

Default: ``1``

Maximum number of :ref:`unsuccessful responses
<zyte-api-unsuccessful-responses>` allowed for any given session before
discarding the session.

Consider increasing this number if sessions on your target website keep
working after an unsuccessful response. See :ref:`optimize-sessions`.

.. note:: This setting does not affect session checks
    (:setting:`ZYTE_API_SESSION_CHECKER`). A session is always discarded the
    first time it fails its session check.
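
The relationship between this setting and session checks can be sketched as
follows. ``SessionErrorTracker`` is a hypothetical class for illustration and
is not part of scrapy-zyte-api. It assumes that a session is discarded as soon
as its count of unsuccessful responses reaches the setting value, so the
default of ``1`` discards a session on its first unsuccessful response.

.. code-block:: python

    class SessionErrorTracker:
        """Hypothetical per-session bookkeeping, for illustration only."""

        def __init__(self, max_errors=1):
            # Mirrors ZYTE_API_SESSION_MAX_ERRORS.
            self.max_errors = max_errors
            self.errors = {}  # session ID -> unsuccessful responses so far

        def on_unsuccessful_response(self, session_id):
            """Count an unsuccessful response; return True to discard."""
            self.errors[session_id] = self.errors.get(session_id, 0) + 1
            return self.errors[session_id] >= self.max_errors

        def on_failed_check(self, session_id):
            """A failed session check always discards, whatever max_errors is."""
            self.errors.pop(session_id, None)  # forget the discarded session
            return True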


.. setting:: ZYTE_API_SESSION_PARAMS

ZYTE_API_SESSION_PARAMS
=======================

Default: ``{"browserHtml": True}``

Parameters to use for session initialization.

It works similarly to :http:`request:sessionContextParams` from
:ref:`server-managed sessions <zyte-api-session-contexts>`, but it supports
arbitrary Zyte API parameters instead of a specific subset.

If it does not define a ``"url"``, the URL of the request for which the session
is being initialized is used.

This setting can be overridden by the :setting:`ZYTE_API_SESSION_LOCATION`
setting, the :reqmeta:`zyte_api_session_location` request metadata key, or the
:reqmeta:`zyte_api_session_params` request metadata key.

Example:

.. code-block:: python
    :caption: settings.py

    ZYTE_API_SESSION_PARAMS = {
        "browserHtml": True,
        "actions": [
            {
                "action": "setLocation",
                "address": {"postalCode": "10001"},
            }
        ],
    }

.. tip:: The example above is equivalent to setting
    :setting:`ZYTE_API_SESSION_LOCATION` to ``{"postalCode": "10001"}``.
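
To initialize every session on the same page, rather than on the URL of the
request for which the session is being initialized, include a ``"url"``. The
URL below is a hypothetical placeholder:

.. code-block:: python
    :caption: settings.py

    ZYTE_API_SESSION_PARAMS = {
        # Hypothetical page on which every session is initialized.
        "url": "https://example.com/",
        "browserHtml": True,
    }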


.. setting:: ZYTE_API_SESSION_POOL_SIZE

ZYTE_API_SESSION_POOL_SIZE
==========================

Default: ``8``

The maximum number of active :ref:`scrapy-zyte-api sessions <session>` to keep
per :ref:`pool <session-pools>`.

To override this value for specific pools, use
:setting:`ZYTE_API_SESSION_POOL_SIZES`.

Increasing this number spreads requests across more sessions, so each session
is used less often. On some websites, this may increase the lifetime of each
session. See :ref:`optimize-sessions`.


.. setting:: ZYTE_API_SESSION_POOL_SIZES

ZYTE_API_SESSION_POOL_SIZES
===========================

Default: ``{}``

:class:`dict` where keys are :ref:`pool <session-pools>` IDs and values are
overrides of :setting:`ZYTE_API_SESSION_POOL_SIZE` for those pools.
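
The override works like a dictionary lookup with a fallback. In this minimal
sketch, ``pool_size`` is a hypothetical helper that illustrates the documented
behavior and is not part of scrapy-zyte-api. ``"example.com"`` is a
hypothetical pool ID.

.. code-block:: python

    ZYTE_API_SESSION_POOL_SIZE = 8
    ZYTE_API_SESSION_POOL_SIZES = {
        # Hypothetical pool ID that keeps more active sessions than the default.
        "example.com": 16,
    }


    def pool_size(pool_id):
        """Return the maximum number of active sessions for a pool."""
        # A per-pool override wins; otherwise use the global default.
        return ZYTE_API_SESSION_POOL_SIZES.get(pool_id, ZYTE_API_SESSION_POOL_SIZE)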


.. setting:: ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS

ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS
===================================

Default: ``60``

scrapy-zyte-api maintains a rotation queue of ready-to-use sessions per
:ref:`pool <session-pools>`. At times, the queue of a given pool might be empty
because all its sessions are being initialized or refreshed.

If the queue is empty when trying to assign a session to a request,
scrapy-zyte-api waits for :setting:`ZYTE_API_SESSION_QUEUE_WAIT_TIME` seconds
and then tries to get a session from the queue again.

Use this setting to configure the maximum number of attempts before giving up
and raising a :exc:`RuntimeError` exception.
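
The wait-and-retry loop described above can be sketched with :mod:`asyncio`.
This is a simplified, hypothetical model, not scrapy-zyte-api's actual
implementation. ``get_session`` is an illustrative name, and a
:class:`collections.deque` stands in for the rotation queue of one pool.
Moving the chosen session to the back of the queue is an assumption about how
rotation works.

.. code-block:: python

    import asyncio
    from collections import deque


    async def get_session(queue: deque, max_attempts: int = 60, wait_time: float = 1.0):
        """Return a session from a pool's rotation queue, waiting while it is empty.

        Hypothetical helper: the defaults mirror
        ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS and ZYTE_API_SESSION_QUEUE_WAIT_TIME.
        """
        for attempt in range(1, max_attempts + 1):
            if queue:
                session = queue.popleft()
                # Rotate: put the session back at the end of the queue
                # (an assumption of this sketch).
                queue.append(session)
                return session
            if attempt < max_attempts:
                # All sessions are being initialized or refreshed; wait, then retry.
                await asyncio.sleep(wait_time)
        raise RuntimeError(f"No session available after {max_attempts} attempts")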


.. setting:: ZYTE_API_SESSION_QUEUE_WAIT_TIME

ZYTE_API_SESSION_QUEUE_WAIT_TIME
================================

Default: ``1.0``

Number of seconds to wait between attempts to get a session from a rotation
queue.

See :setting:`ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS` for details.
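
Together, the two queue settings bound how long a request can wait for a
session. The bound is roughly the number of attempts times the wait time,
about 60 × 1.0 = 60 seconds with the defaults, ignoring the time each attempt
takes. For example, the following sketch checks the queue twice as often with
the same number of attempts, for a total of about 30 seconds:

.. code-block:: python
    :caption: settings.py

    # Check the queue every 0.5 seconds, giving up after 60 attempts
    # (roughly 30 seconds in total).
    ZYTE_API_SESSION_QUEUE_WAIT_TIME = 0.5
    ZYTE_API_SESSION_QUEUE_MAX_ATTEMPTS = 60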


.. setting:: ZYTE_API_SKIP_HEADERS

ZYTE_API_SKIP_HEADERS