CT log fetcher download bottleneck #58

@cyrill-k

The CT log fetcher currently uses a single thread, so it fetches from only one CT log server at a time and downloads all certificates sequentially (typically ~32 certificates per request).
The resulting download rates are much lower than the achievable ingestion rate of the map server and, in some cases, almost as low as the actual growth rate of the log.
For the Google Xenon log, located on the same continent as the map server (Europe), the rates are 360 certs/s (fetching), 5363 certs/s (ingesting), and 89 certs/s (log growth). This means we could keep up with the Xenon log's current growth only if we fetched from that log alone (fetch rate ~ 4x growth).
However, for the Google Argon log, located in the US, the rates are 93 certs/s (fetching), 8953 certs/s (ingesting), and 77 certs/s (log growth). This means we could barely keep up with the Argon log's current growth even if we fetched from that log alone (fetch rate ~ 1.2x growth).
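To make the headroom figures above concrete, here is a small sketch of the arithmetic. The function names and the idea of a backlog are illustrative assumptions and are not part of the fetcher. Only the rates come from the measurements above.

```python
def headroom(fetch_rate: float, growth_rate: float) -> float:
    """Ratio of fetch rate to log growth rate (certs/s over certs/s)."""
    return fetch_rate / growth_rate


def catch_up_seconds(backlog: int, fetch_rate: float, growth_rate: float) -> float:
    """Time needed to clear a backlog while the log keeps growing.

    Only the surplus (fetch_rate - growth_rate) reduces the backlog.
    Returns infinity when the fetcher is not faster than the log grows.
    """
    surplus = fetch_rate - growth_rate
    if surplus <= 0:
        return float("inf")
    return backlog / surplus
```

With the measured rates, `headroom(360, 89)` is about 4.0 for Xenon and `headroom(93, 77)` is about 1.2 for Argon. For a hypothetical backlog of 1,000,000 certificates on Argon, `catch_up_seconds(1_000_000, 93, 77)` gives 62,500 s, roughly 17.4 hours, during which no other log would be fetched.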

Solutions:

  • Continuously fetch from all CT log servers so that no large backlog of certificates accumulates
  • Fetch from a single log server using N parallel threads: distribute batches across the threads, cache the results, and serve them to the ingestion module in the correct order. The fetcher must handle rate-limit responses from the log server gracefully and back off for a limited amount of time (e.g., a 1 min back-off per thread)
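The second option could look roughly like the sketch below. It is not the existing fetcher's code: `get_entries`, `RateLimited`, `fetch_range`, and `fetch_in_order` are hypothetical names, and `get_entries(start, end)` is assumed to return the entries starting at `start` with an exclusive `end` (the real CT `get-entries` endpoint uses an inclusive end index). The sketch assumes a log may return fewer entries than requested, so each worker loops until its batch is complete.

```python
import time
from collections import deque
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Hypothetical signal that the log server asked us to slow down."""


def fetch_range(get_entries, start, end, backoff_s=60.0, max_retries=5):
    """Fetch entries [start, end) in one worker thread.

    Loops because a log may return fewer entries than requested, and
    sleeps for backoff_s (per thread) whenever the server rate-limits us.
    """
    out = []
    retries = 0
    while start + len(out) < end:
        try:
            chunk = get_entries(start + len(out), end)
        except RateLimited:
            retries += 1
            if retries > max_retries:
                raise
            time.sleep(backoff_s)
            continue
        if not chunk:
            raise RuntimeError("log returned no entries for a valid range")
        out.extend(chunk)
    return out[: end - start]


def fetch_in_order(get_entries, start, end, batch_size=32, workers=8,
                   backoff_s=60.0):
    """Yield entries [start, end) in log order while fetching in parallel.

    At most 2 * workers batches are in flight or cached at any time, so
    memory stays bounded even if the ingestion side is slow.
    """
    starts = iter(range(start, end, batch_size))
    pending = deque()  # futures in submission order, i.e. log order
    with ThreadPoolExecutor(max_workers=workers) as pool:

        def submit_next():
            lo = next(starts, None)
            if lo is not None:
                hi = min(lo + batch_size, end)
                pending.append(
                    pool.submit(fetch_range, get_entries, lo, hi, backoff_s)
                )

        for _ in range(2 * workers):
            submit_next()
        while pending:
            # Waiting on the oldest future keeps the output ordered even
            # when later batches finish first.
            entries = pending.popleft().result()
            submit_next()
            yield from entries
```

The ingestion module would simply iterate over `fetch_in_order(...)`. Running one such generator per log server, each in its own thread, would also cover the first option of fetching from all logs continuously. The window of `2 * workers` batches is an arbitrary choice for this sketch.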

logfetcher
