Commit 1c7ffd4

Merge branch 'main' into cancel-none-fix
2 parents f3a1ee0 + 80757cd commit 1c7ffd4

3 files changed: +134, -0 lines changed

docs/src/user-docs/guides-overview.md

Lines changed: 7 additions & 0 deletions
@@ -40,6 +40,13 @@ Using the API server
 How to use the API server to interact with CLP.
 :::
 
+:::{grid-item-card}
+:link: guides-using-log-ingestor
+Using `log-ingestor`
+^^^
+How to use `log-ingestor` to continuously ingest logs.
+:::
+
 :::{grid-item-card}
 :link: guides-using-presto
 Using Presto with CLP
Lines changed: 126 additions & 0 deletions
@@ -0,0 +1,126 @@
# Using `log-ingestor`

`log-ingestor` is a CLP component that facilitates continuous log ingestion from a given log source.

:::{note}
Currently, `log-ingestor` can only be used by [`clp-json`](./quick-start/clp-json.md) deployments
that are configured for S3 object storage. To set up this configuration, check out the
[object storage guide](./guides-using-object-storage/index.md).

Support for ingestion from local filesystems, or for using `clp-text`, is planned for a future
release.
:::

---

## Starting `log-ingestor`

`clp-json` will spin up `log-ingestor` on startup as long as the `logs_input` field in the CLP
package's config file (`clp-package/etc/clp-config.yaml`) is
[configured for object storage][clp-s3-logs-input-config].

You can specify a custom configuration for `log-ingestor` by modifying the `log_ingestor` field in
the same file.

---

## Ingestion jobs

`log-ingestor` facilitates continuous log ingestion with **ingestion jobs**. An ingestion job
continuously monitors a configured log source, buffers incoming log data, and groups it into
compression jobs. This buffering and batching strategy improves compression efficiency and reduces
overall storage overhead.

:::{note}
Support for one-time ingestion jobs (similar to the current CLP compression CLI workflows) is
planned for a future release.
:::
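
The buffering and batching idea behind ingestion jobs can be pictured with a small sketch. This is
not `log-ingestor`'s implementation: the class name `BatchingBuffer`, the byte-size threshold, and
the flush rule are all assumptions made for illustration, since this guide does not specify how
batches are formed.

```python
from typing import List, Optional, Tuple


class BatchingBuffer:
    """Groups incoming objects into batches once a size threshold is reached.

    Hypothetical model: the threshold and flush rule are illustrative assumptions,
    not log-ingestor's actual configuration or behaviour.
    """

    def __init__(self, max_batch_bytes: int) -> None:
        self.max_batch_bytes = max_batch_bytes
        self._pending: List[Tuple[str, int]] = []
        self._pending_bytes = 0

    def add(self, key: str, size: int) -> Optional[List[str]]:
        """Buffer one object; return a batch of keys if the threshold is reached."""
        self._pending.append((key, size))
        self._pending_bytes += size
        if self._pending_bytes >= self.max_batch_bytes:
            return self.flush()
        return None

    def flush(self) -> List[str]:
        """Return everything buffered so far as one batch and reset the buffer."""
        batch = [key for key, _ in self._pending]
        self._pending = []
        self._pending_bytes = 0
        return batch


buffer = BatchingBuffer(max_batch_bytes=100)
first = buffer.add("a.log", 40)   # below the threshold, stays buffered
second = buffer.add("b.log", 40)  # still below the threshold
third = buffer.add("c.log", 40)   # crosses the threshold, so a batch is emitted
```

In this model, each emitted batch would correspond to one compression job, so many small log files
are compressed together rather than one at a time.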

### Interacting with `log-ingestor`

`log-ingestor` exposes **RESTful APIs** that allow you to submit ingestion jobs, manage ingestion
jobs, and check `log-ingestor`'s health.

You can explore all available endpoints and their schemas at the
[Swagger UI `log-ingestor` page][swagger-ui-all].

:::{note}
Currently, requests to `log-ingestor` must be sent directly to the `log-ingestor` service. Requests
will be routed through CLP's [API server](./guides-using-the-api-server.md) in a future release.
:::
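
As a rough illustration of sending a request directly to the service, the sketch below builds a
JSON `POST` request with Python's standard library. The base URL, the path, and the body fields are
placeholders invented for this example and are not taken from the API reference; look up the real
endpoint paths and schemas in the Swagger UI page before using it.

```python
import json
import urllib.request

# Placeholder values (assumptions): substitute the address of your log-ingestor
# service and the endpoint path documented in the Swagger UI page.
BASE_URL = "http://log-ingestor.example:8080"
CREATE_JOB_PATH = "/example/ingestion-jobs"


def build_create_job_request(body: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for creating an ingestion job."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url=BASE_URL + CREATE_JOB_PATH,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# The body fields below are hypothetical; the real schema is in the API reference.
request = build_create_job_request({"bucket_name": "my-logs", "key_prefix": "app/"})

# To send the request against a running service:
# with urllib.request.urlopen(request) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the URL and payload
before pointing the code at a real deployment.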

### Fault tolerance

:::{warning}
**The current version of `log-ingestor` does not provide fault tolerance.**

If `log-ingestor` crashes or is restarted, all in-progress ingestion jobs and their associated state
will be lost, and must be restored manually. Robust fault tolerance for the ingestion pipeline is
planned for a future release.
:::

---

## Continuous ingestion from S3

`log-ingestor` supports **continuous ingestion jobs** for ingesting logs from S3-compatible object
storage. Currently, two types of ingestion jobs are available:

* [**S3 scanner**](#s3-scanner): Periodically scans an S3 bucket and prefix for new log files to
  ingest.
* [**SQS listener**](#sqs-listener): Listens to an [SQS queue][sqs] for notifications about newly
  created log files in S3.

### S3 scanner

An S3 scanner ingestion job periodically scans a specified S3 bucket and key prefix for new log
files to ingest. The scan interval and other parameters can be configured when creating the job.

For configuration details and the request body, see the
[API reference for creating S3 scanner ingestion jobs][s3-scanner-api].

:::{important}
To ensure correct and efficient ingestion, the scanner relies on the following assumptions:

* **Lexicographical order**: Every new object added to the S3 bucket has a key that is
  lexicographically greater than the previously added object. For example, objects with keys `log1`
  and `log2` will be ingested sequentially. If a new object with key `log0` is added after `log2`,
  it will be ignored because it is not lexicographically greater than the last ingested key.
* **Immutability**: Objects under the specified prefix are immutable. Once an object is created, it
  is not modified or overwritten.
:::
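
The lexicographical-order assumption can be modelled in a few lines. The function below is a
simplified sketch of the selection rule described above, not the scanner's actual code; the name
`select_new_keys` and the idea of tracking a single "last ingested key" per job are assumptions
made for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def select_new_keys(
    listed_keys: Iterable[str], last_ingested_key: Optional[str]
) -> Tuple[List[str], Optional[str]]:
    """Return the keys to ingest in order, plus the updated last ingested key.

    Only keys that compare strictly greater than the last ingested key are selected,
    which mirrors the lexicographical-order assumption.
    """
    candidates = sorted(
        key
        for key in listed_keys
        if last_ingested_key is None or key > last_ingested_key
    )
    new_last = candidates[-1] if candidates else last_ingested_key
    return candidates, new_last


# First scan: both objects are new.
first_batch, last_key = select_new_keys(["log1", "log2"], None)

# Second scan: `log0` arrived late and is skipped; only `log3` is picked up.
second_batch, last_key = select_new_keys(["log0", "log1", "log2", "log3"], last_key)
```

String comparison is character by character, so `log10` sorts before `log2`. Under this model, key
schemes based on zero-padded counters or sortable timestamps (for example `log0002`, `log0010`) keep
new objects ahead of the last ingested key.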

### SQS listener

An SQS listener ingestion job listens to a specified AWS SQS queue and ingests S3 objects referenced
by incoming notifications. For details on configuring S3 event notifications for SQS, see the
[AWS documentation][aws-s3-event-notifications].

For configuration details and the request body, see the
[API reference for creating SQS listener ingestion jobs][sqs-listener-api].

:::{important}
To ensure correct and efficient ingestion, the listener relies on the following assumptions:

* **Dedicated queue**: The given SQS queue must be dedicated to this ingestion job. No other
  consumers should read from or delete messages in the queue. The ingestion job must have permission
  to delete messages after they are successfully processed.
* **Immutability**: Objects under the specified prefix are immutable. Once an object is created, it
  is not modified or overwritten.
:::

:::{note}
SQS listener ingestion jobs carry the following limitations:

* An SQS listener ingestion job can only ingest objects from a single S3 bucket and prefix. Support
  for multiple buckets or prefixes is planned for a future release.
* SQS listener ingestion jobs do not support custom S3 endpoint configurations. Support for custom
  endpoints is planned for a future release.
:::
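
To make the single-bucket, single-prefix limitation concrete, the sketch below parses an S3 event
notification as it would appear in an SQS message body and keeps only newly created objects under
one bucket and prefix. It illustrates the notification format rather than `log-ingestor`'s
internals; the function name `extract_object_keys` and the sample bucket and keys are made up for
this example.

```python
import json
from typing import List
from urllib.parse import unquote_plus


def extract_object_keys(message_body: str, bucket: str, prefix: str) -> List[str]:
    """Return keys of newly created objects that match the given bucket and prefix."""
    event = json.loads(message_body)
    keys = []
    # Messages without "Records" (such as S3 test events) yield no keys.
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3_info = record["s3"]
        if s3_info["bucket"]["name"] != bucket:
            continue
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(s3_info["object"]["key"])
        if key.startswith(prefix):
            keys.append(key)
    return keys


# Sample notification with hypothetical bucket and key names.
sample_body = json.dumps({
    "Records": [
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "my-logs"}, "object": {"key": "app/2024/log+1.json"}}},
        {"eventName": "ObjectCreated:Put",
         "s3": {"bucket": {"name": "my-logs"}, "object": {"key": "other/log2.json"}}},
        {"eventName": "ObjectRemoved:Delete",
         "s3": {"bucket": {"name": "my-logs"}, "object": {"key": "app/old.json"}}},
    ]
})

matching_keys = extract_object_keys(sample_body, bucket="my-logs", prefix="app/")
```

Only the first record matches: the second is outside the prefix and the third is not an
object-creation event.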

[aws-s3-event-notifications]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html#step2-enable-notification
[clp-s3-logs-input-config]: ./guides-using-object-storage/clp-config.md#configuration-for-input-logs
[s3-scanner-api]: https://petstore.swagger.io/?url=https://docs.yscope.com/clp/DOCS_VAR_CLP_GIT_REF/_static/generated/log-ingestor-openapi.json#/IngestionJob/create_s3_scanner_job
[sqs]: https://docs.aws.amazon.com/sqs/
[sqs-listener-api]: https://petstore.swagger.io/?url=https://docs.yscope.com/clp/DOCS_VAR_CLP_GIT_REF/_static/generated/log-ingestor-openapi.json#/IngestionJob/create_sqs_listener_job
[swagger-ui-all]: https://petstore.swagger.io/?url=https://docs.yscope.com/clp/DOCS_VAR_CLP_GIT_REF/_static/generated/log-ingestor-openapi.json

docs/src/user-docs/index.md

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ guides-overview
 guides-mcp-server/index
 guides-using-object-storage/index
 guides-using-the-api-server
+guides-using-log-ingestor
 guides-external-database
 guides-multi-host
 guides-retention
