feat: Add deduplication to add_batch_of_requests (#534)
### Description
- Ensure that already known requests are excluded from
`api_client.batch_add_requests` calls to avoid expensive and pointless
API calls.
- Add all new requests to the cache when calling `batch_add_requests`.
- Add a test that measures real API usage.
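
As an illustration of the idea above, here is a minimal sketch, not the code from this PR. `FakeApiClient` and `DedupQueue` are hypothetical names, requests are reduced to plain string keys, and the cache is assumed to be a simple in-memory set. It filters already known requests out of a batch before the API is called and records the new ones in the cache.

```python
from dataclasses import dataclass, field


@dataclass
class FakeApiClient:
    """Hypothetical stand-in for the real API client; it only counts calls."""

    calls: int = 0
    sent: list = field(default_factory=list)

    def batch_add_requests(self, unique_keys):
        self.calls += 1
        self.sent.extend(unique_keys)


@dataclass
class DedupQueue:
    """Hypothetical queue wrapper that keeps a local cache of known request keys."""

    api_client: FakeApiClient
    _seen: set = field(default_factory=set)

    def add_batch_of_requests(self, unique_keys):
        # Keep only keys that are not in the cache yet; this also drops
        # duplicates that appear more than once inside the same batch.
        new_keys = []
        for key in unique_keys:
            if key not in self._seen:
                self._seen.add(key)  # assumption: cache is updated before the API call
                new_keys.append(key)

        # Skip the API call entirely when nothing new is left.
        if new_keys:
            self.api_client.batch_add_requests(new_keys)
        return len(new_keys)


client = FakeApiClient()
queue = DedupQueue(client)
first = queue.add_batch_of_requests(["a", "b", "a"])
second = queue.add_batch_of_requests(["b", "c"])
third = queue.add_batch_of_requests(["a", "c"])
```

In this sketch the third batch contains only known keys, so no API call is made for it. A real implementation would also have to decide what to do with the cache when the API call fails, which this example does not cover.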
### Issues
- Closes: #514
### Testing
- Added new integration tests to verify reduced API usage.
- Compared a benchmark actor built from master against one built from
this PR. The actor is a simple ParselCrawler that crawls the whole of
crawlee.dev, which contains many duplicate links because the
documentation is thoroughly cross-linked. Results:
  - Massive reduction in request queue cost.
  - Significant overall speedup due to fewer API calls.
---------
Co-authored-by: Vlada Dusek <[email protected]>