@@ -51,12 +51,26 @@ will be routed through CLP's [API server](./guides-using-the-api-server.md) in a
 
 ### Fault tolerance
 
-:::{warning}
-**The current version of `log-ingestor` does not provide fault tolerance.**
+`log-ingestor` is designed to tolerate unexpected crashes or restarts without losing information
+about ingestion jobs or the files that have been submitted for compression. Note that this does not
+include fault tolerance of the components external to `log-ingestor`. Specifically, `log-ingestor`
+guarantees the following, even in the presence of crashes or restarts of `log-ingestor`:
 
-If `log-ingestor` crashes or is restarted, all in-progress ingestion jobs and their associated state
-will be lost, and must be restored manually. Robust fault tolerance for the ingestion pipeline is
-planned for a future release.
+* Any ingestion job successfully submitted to `log-ingestor` will run continuously.
+* Within an ingestion job, any files that have been found on S3 or received as messages from SQS
+  queue will eventually be submitted for compression.
+
+:::{note}
+`log-ingestor` **DOES NOT** guarantee the following after a crash or restart:
+
+* Any file submitted for compression (that can be compressed successfully) will eventually be
+  compressed successfully.
+  * This is because failures of the compression cluster are external to `log-ingestor`. Future
+    versions of CLP will address this limitation.
+* Any file submitted for compression will *only* be compressed once.
+  * This is because for [SQS listener](#sqs-listener) ingestion jobs, the processes for deleting
+    messages from the SQS queue and recording the files for ingestion are not synchronized. As a
+    result, a failure during this process may cause the same file to be ingested multiple times.
 :::
 
 ---
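
The duplicate-ingestion caveat in the added note can be illustrated with a small, self-contained simulation. This is only a sketch under stated assumptions, not `log-ingestor`'s actual implementation: `ToyQueue`, `drain`, `run_with_one_crash`, and the record-then-delete ordering are all hypothetical, chosen because that ordering is consistent with the guarantee that every received file is eventually submitted. The `submitted` list stands in for state that survives a restart.

```python
from collections import deque


class SimulatedCrash(Exception):
    """Stands in for the ingestor dying between two steps (hypothetical)."""


class ToyQueue:
    """In-memory stand-in for a queue: a message stays until it is deleted."""

    def __init__(self, keys):
        self._messages = deque(keys)

    def peek(self):
        # Return the oldest undeleted message, or None if the queue is empty.
        return self._messages[0] if self._messages else None

    def delete(self, key):
        self._messages.remove(key)


def drain(queue, submitted, crash_after_recording=None):
    """Record each file, then delete its message (assumed ordering)."""
    while (key := queue.peek()) is not None:
        submitted.append(key)          # step 1: record the file for ingestion
        if key == crash_after_recording:
            raise SimulatedCrash(key)  # crash before the message is deleted
        queue.delete(key)              # step 2: delete the message


def run_with_one_crash(keys, crash_key):
    queue, submitted = ToyQueue(keys), []
    try:
        drain(queue, submitted, crash_after_recording=crash_key)
    except SimulatedCrash:
        pass  # "restart": the undeleted message is seen again
    drain(queue, submitted)
    return submitted


result = run_with_one_crash(["a.log", "b.log", "c.log"], crash_key="b.log")
print(result)
```

In this toy model no file is lost, but the file whose message was recorded and not yet deleted at crash time appears twice in `submitted`. That is the at-least-once behavior the note describes for SQS listener ingestion jobs.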