CT log fetcher download bottleneck #58

The CT log fetcher currently uses a single thread: it fetches from only one CT log server at a time, and it fetches certificates sequentially (typically ~32 certificates per request).
As a result, download rates are much lower than the ingestion rate the map server can achieve, and in some cases almost as low as the growth rate of the log itself.
For the Google Xenon log, located on the same continent as the map server (Europe), the rates are 360 certs/s (fetching), 5363 certs/s (ingesting), and 89 certs/s (log growth). At its current growth we could therefore keep up with the Xenon log, but **only** if we fetch from no other log (fetching rate ~ 4x growth).
However, for the Google Argon log, located in the US, the rates are 93 certs/s (fetching), 8953 certs/s (ingesting), and 77 certs/s (log growth). Here we could **barely** keep up even when fetching **only** from the Argon log (fetching rate ~ 1.2x growth).
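
The ratios quoted above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the rates are copied from the measurements in this issue, and the `headroom` helper is a name introduced here for illustration.

```python
# Rates in certs/s, copied from the measurements in this issue.
rates = {
    "Xenon (Europe)": {"fetch": 360, "ingest": 5363, "growth": 89},
    "Argon (US)": {"fetch": 93, "ingest": 8953, "growth": 77},
}


def headroom(fetch, growth):
    """Return how many times faster we fetch than the log grows."""
    return fetch / growth


for name, r in rates.items():
    print(
        f"{name}: fetch/growth = {headroom(r['fetch'], r['growth']):.1f}x, "
        f"ingest/fetch = {r['ingest'] / r['fetch']:.0f}x"
    )
# Xenon (Europe): fetch/growth = 4.0x, ingest/fetch = 15x
# Argon (US): fetch/growth = 1.2x, ingest/fetch = 96x
```

The second ratio shows how far the fetcher lags behind the ingestion module: roughly 15x for Xenon and 96x for Argon.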

Possible solutions:
- Continuously fetch from **all** CT log servers so that no large backlog of certificates accumulates for any single log.
- Fetch from a single log server using N parallel threads: distribute batches to different threads, cache the results, and serve them to the ingestion module in the correct order. Rate-limit responses from the log server must be handled gracefully by backing off for a limited amount of time (e.g., a 1 min back-off per thread).
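
The second option could look roughly like the following Python sketch. It is not the fetcher's actual code: `fetch_batch`, `RateLimited`, `fetch_with_backoff`, and `fetch_in_order` are hypothetical names, and `fetch_batch(start, end)` is assumed to return every entry in the inclusive range (a real CT log may return fewer entries than requested, so a real implementation would have to loop). Ordering relies on `Executor.map`, which yields results in submission order regardless of which thread finishes first.

```python
import time
from concurrent.futures import ThreadPoolExecutor

BATCH_SIZE = 32          # roughly the per-request size mentioned in this issue
BACKOFF_SECONDS = 60.0   # the 1 min per-thread back-off suggested above


class RateLimited(Exception):
    """Hypothetical signal that the log server asked us to slow down."""


def fetch_with_backoff(fetch_batch, start, end, backoff=BACKOFF_SECONDS, max_retries=5):
    """Fetch entries start..end, pausing this worker thread when rate limited."""
    for _ in range(max_retries):
        try:
            return fetch_batch(start, end)
        except RateLimited:
            time.sleep(backoff)  # only this worker sleeps; the others keep fetching
    raise RuntimeError(f"giving up on batch {start}-{end}")


def fetch_in_order(fetch_batch, first, last, workers=8,
                   batch_size=BATCH_SIZE, backoff=BACKOFF_SECONDS):
    """Yield entries first..last in log order while fetching batches in parallel."""
    starts = range(first, last + 1, batch_size)
    ends = [min(s + batch_size - 1, last) for s in starts]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so a finished batch is
        # held back until all earlier batches have been handed to the caller.
        batches = pool.map(
            lambda s, e: fetch_with_backoff(fetch_batch, s, e, backoff),
            starts, ends,
        )
        for batch in batches:
            yield from batch


def fake_fetch(start, end):
    """Stand-in for a real get-entries request; returns the indices themselves."""
    return list(range(start, end + 1))


entries = list(fetch_in_order(fake_fetch, 0, 99, workers=4))
print(entries == list(range(100)))  # True
```

Note that `Executor.map` submits every batch up front, so the number of buffered results is unbounded. For a log with hundreds of millions of entries, a real implementation would likely limit how far ahead of the ingestion module the workers are allowed to run.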

![logfetcher](https://github.com/netsec-ethz/fpki/assets/7011170/95dd0f45-8e6d-47fc-8fd4-daef7bbfa77a)

