Investigate caching options in ApifyRequestQueueClient #550

@Pijukatel

Description

A few points were raised during the implementation of ApifyRequestQueueClient that were not addressed immediately, as they were optimization concerns rather than blockers for releasing the client. They are collected here:

  • Local cache size 1_000_000:
    • This could potentially consume all available resources. We could add dynamic resizing based on currently available resources: once a certain threshold is reached, migrate a portion of the cache to a smaller one and drop the rest.
  • Deduplication can be based on a different cache:
    • Reusing the existing request cache for deduplication is convenient, as it consumes no additional resources. However, a full request cache is overkill for deduplication, which requires only the set of unique_keys (essentially just the keys of the request cache). If the two are independent, the size of the request cache does not limit deduplication; the trade-off is that, in some scenarios, the second cache just duplicates information.
  • Better utilization of already fully hydrated requests, to avoid calling await self.get_request(request.id) on every fetch_next_request call. This might not be possible, but it is worth investigating whether there is room for improvement.

Metadata

    Labels

    t-tooling: issues with this label are in the ownership of the tooling team.
