# Using `log-ingestor`

`log-ingestor` is a CLP component that continuously ingests logs from a configured log source.

:::{note}
Currently, `log-ingestor` can only be used by [`clp-json`](./quick-start/clp-json.md) deployments
that are configured for S3 object storage. To set up this configuration, see the
[object storage guide](./guides-using-object-storage/index.md).

Support for ingesting from local filesystems, and for `clp-text` deployments, is planned for a
future release.
:::

---

## Starting `log-ingestor`

`clp-json` automatically starts `log-ingestor` on startup, provided that the `logs_input` field in
the CLP package's config file (`clp-package/etc/clp-config.yaml`) is
[configured for object storage][clp-s3-logs-input-config].

To customize `log-ingestor`'s configuration, modify the `log_ingestor` field in the same file.

---

## Ingestion jobs

`log-ingestor` performs continuous log ingestion through **ingestion jobs**. An ingestion job
continuously monitors a configured log source, buffers incoming log data, and groups the buffered
data into compression jobs. Batching data this way improves compression efficiency and reduces
overall storage overhead compared with compressing each new log file as soon as it arrives.
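
The buffering and batching behaviour can be illustrated with a small sketch. This is not
`log-ingestor`'s implementation: the `BatchingBuffer` class, its size and time thresholds, and its
flush policy are hypothetical, chosen only to show how incoming objects might be grouped into
compression jobs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BatchingBuffer:
    """Hypothetical buffer that groups incoming log objects into compression jobs.

    A batch is flushed once its total size reaches `max_batch_bytes`, or once
    `max_wait_seconds` have passed since the first object entered the buffer.
    Both thresholds are illustrative, not log-ingestor's actual settings.
    """

    max_batch_bytes: int
    max_wait_seconds: float
    _keys: list[str] = field(default_factory=list, init=False)
    _size: int = field(default=0, init=False)
    _first_arrival: float | None = field(default=None, init=False)

    def add(self, key: str, size: int, now: float) -> list[str] | None:
        """Buffers one object; returns a batch of keys if the size threshold is reached."""
        if self._first_arrival is None:
            self._first_arrival = now
        self._keys.append(key)
        self._size += size
        if self._size >= self.max_batch_bytes:
            return self._flush()
        return None

    def poll(self, now: float) -> list[str] | None:
        """Returns a (possibly undersized) batch if the oldest buffered object waited too long."""
        if self._first_arrival is not None and now - self._first_arrival >= self.max_wait_seconds:
            return self._flush()
        return None

    def _flush(self) -> list[str]:
        # Hand off the buffered keys as one batch and reset the buffer.
        batch, self._keys, self._size, self._first_arrival = self._keys, [], 0, None
        return batch


# Illustrative thresholds: flush at 100 bytes or after 30 seconds.
buffer = BatchingBuffer(max_batch_bytes=100, max_wait_seconds=30.0)
print(buffer.add("logs/a.jsonl", 60, now=0.0))  # None (60 bytes buffered)
print(buffer.add("logs/b.jsonl", 50, now=1.0))  # ['logs/a.jsonl', 'logs/b.jsonl']
```

The time-based flush in `poll` keeps a slow trickle of data from waiting indefinitely for a batch
to fill up.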

:::{note}
Support for one-time ingestion jobs (similar to the current CLP compression CLI workflows) is
planned for a future release.
:::

### Interacting with `log-ingestor`

`log-ingestor` exposes **RESTful APIs** that allow you to submit and manage ingestion jobs, and to
check `log-ingestor`'s health.

You can explore all available endpoints and their schemas in the
[Swagger UI page for `log-ingestor`][swagger-ui-all].
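
As a rough illustration of calling these APIs, the sketch below builds a JSON `POST` request using
only Python's standard library. The base URL, endpoint path, and request fields are hypothetical
placeholders; the actual paths and request schemas are documented in the Swagger UI.

```python
import json
import urllib.request


def build_create_job_request(base_url: str, path: str, job_config: dict) -> urllib.request.Request:
    """Builds (but does not send) a JSON POST request for creating an ingestion job.

    `base_url`, `path`, and the keys in `job_config` are placeholders: look up the
    real endpoint path and request schema in the Swagger UI before using this.
    """
    body = json.dumps(job_config).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/" + path.lstrip("/"),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, port, path, and fields, for illustration only.
request = build_create_job_request(
    "http://localhost:8080",
    "/hypothetical/s3-scanner-jobs",
    {"bucket": "my-log-bucket", "key_prefix": "app-logs/"},
)
# To actually submit the request: urllib.request.urlopen(request)
```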

:::{note}
Currently, requests must be sent directly to the `log-ingestor` service. In a future release,
requests will be routed through CLP's [API server](./guides-using-the-api-server.md).
:::

### Fault tolerance

:::{warning}
**The current version of `log-ingestor` does not provide fault tolerance.**

If `log-ingestor` crashes or is restarted, all in-progress ingestion jobs and their associated state
are lost and must be restored manually. Robust fault tolerance for the ingestion pipeline is
planned for a future release.
:::

---

## Continuous ingestion from S3

`log-ingestor` supports **continuous ingestion jobs** for ingesting logs from S3-compatible object
storage. Currently, two types of ingestion jobs are available:

* [**S3 scanner**](#s3-scanner): Periodically scans an S3 bucket and key prefix for new log files
  to ingest.
* [**SQS listener**](#sqs-listener): Listens to an [SQS queue][sqs] for notifications about newly
  created log files in S3.

### S3 scanner

An S3 scanner ingestion job periodically scans a specified S3 bucket and key prefix for new log
files to ingest. The scan interval and other parameters are configured when the job is created.

For configuration details and the request body, see the
[API reference for creating S3 scanner ingestion jobs][s3-scanner-api].

:::{important}
For correct and efficient ingestion, the scanner relies on the following assumptions:

* **Lexicographical order**: Every new object added under the specified prefix has a key that is
  lexicographically greater than the key of the previously added object. For example, objects with
  keys `log1` and `log2` are ingested in that order. If an object with key `log0` is added after
  `log2`, it is ignored because its key is not lexicographically greater than the last ingested
  key.
* **Immutability**: Objects under the specified prefix are immutable. Once an object is created, it
  is not modified or overwritten.
:::
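
A minimal sketch of the lexicographical-order assumption, written as a hypothetical
`select_new_keys` helper (not part of `log-ingestor`), shows why out-of-order keys are skipped. It
compares keys as Python strings, which matches S3's key listing order for ASCII keys.

```python
from __future__ import annotations


def select_new_keys(listed_keys: list[str], last_ingested_key: str | None) -> list[str]:
    """Returns the keys to ingest in this scan, in lexicographical order.

    Hypothetical sketch: only keys strictly greater than the last ingested key
    are treated as new; everything else is assumed to be already ingested.
    """
    candidates = sorted(listed_keys)
    if last_ingested_key is None:
        return candidates
    return [key for key in candidates if key > last_ingested_key]


# First scan: both objects are new.
print(select_new_keys(["log2", "log1"], None))  # ['log1', 'log2']
# Later scan: `log0` appeared after `log2` was ingested, so it is skipped.
print(select_new_keys(["log0", "log1", "log2", "log3"], "log2"))  # ['log3']
```

Lexicographical order is not numeric order: `log10` sorts before `log9`, so an object named `log10`
would be skipped after `log9` was ingested. Zero-padded counters (`log09`, `log10`) or fixed-width
timestamps in key names keep lexicographical order consistent with creation order.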

### SQS listener

An SQS listener ingestion job listens to a specified AWS SQS queue and ingests the S3 objects
referenced by incoming notifications. For details on configuring S3 event notifications to be sent
to SQS, see the [AWS documentation][aws-s3-event-notifications].
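
For reference, an S3 event notification delivered to SQS is a JSON document with a `Records` array,
where each record names the bucket and the URL-encoded object key. The sketch below, a
hypothetical `extract_new_objects` helper rather than `log-ingestor`'s code, extracts newly created
objects and keeps only those in one bucket and prefix. It assumes notifications are sent directly
from S3 to SQS rather than through SNS, which would wrap the event in another envelope.

```python
import json
from urllib.parse import unquote_plus


def extract_new_objects(message_body: str, bucket: str, prefix: str) -> list:
    """Returns (bucket, key) pairs for newly created objects under `bucket`/`prefix`.

    `bucket` and `prefix` stand in for a job's configured values (hypothetical).
    """
    event = json.loads(message_body)
    objects = []
    # s3:TestEvent messages, sent when notifications are first configured, have no "Records".
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        record_bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces become "+").
        key = unquote_plus(record["s3"]["object"]["key"])
        if record_bucket == bucket and key.startswith(prefix):
            objects.append((record_bucket, key))
    return objects


sample_body = json.dumps({
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-log-bucket"},
                "object": {"key": "app-logs/2024-01-01+app.jsonl"},
            },
        }
    ]
})
print(extract_new_objects(sample_body, "my-log-bucket", "app-logs/"))
# [('my-log-bucket', 'app-logs/2024-01-01 app.jsonl')]
```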

For configuration details and the request body, see the
[API reference for creating SQS listener ingestion jobs][sqs-listener-api].

:::{important}
For correct and efficient ingestion, the listener relies on the following assumptions:

* **Dedicated queue**: The SQS queue must be dedicated to this ingestion job. No other consumers
  should read messages from or delete messages in the queue. The ingestion job must have
  permission to delete messages after it successfully processes them.
* **Immutability**: Objects under the specified prefix are immutable. Once an object is created, it
  is not modified or overwritten.
:::
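
The dedicated-queue requirement matters because a message received and deleted by another consumer
never reaches the ingestion job. The hypothetical sketch below uses an in-memory stand-in for an
SQS client (`Queue`, `InMemoryQueue`, and `process_batch` are illustrative names, not
`log-ingestor` or AWS SDK APIs) to show a delete-after-success pattern: a message is deleted only
after it is handled, and a failed message stays in the queue so SQS can redeliver it once its
visibility timeout expires.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Message:
    receipt_handle: str
    body: str


class Queue(Protocol):
    """Minimal, hypothetical queue interface standing in for an SQS client."""

    def receive(self, max_messages: int) -> list[Message]: ...

    def delete(self, receipt_handle: str) -> None: ...


def process_batch(queue: Queue, handle: Callable[[str], None], max_messages: int = 10) -> int:
    """Receives messages and deletes each one only after `handle` succeeds.

    SQS returns at most 10 messages per receive call, hence the default.
    Returns the number of messages deleted.
    """
    deleted = 0
    for message in queue.receive(max_messages):
        try:
            handle(message.body)
        except Exception:
            continue  # Not deleted: the message becomes visible again and is retried later.
        queue.delete(message.receipt_handle)
        deleted += 1
    return deleted


class InMemoryQueue:
    """Test double that keeps messages until they are explicitly deleted."""

    def __init__(self, messages: list[Message]) -> None:
        self.messages = {m.receipt_handle: m for m in messages}

    def receive(self, max_messages: int) -> list[Message]:
        return list(self.messages.values())[:max_messages]

    def delete(self, receipt_handle: str) -> None:
        del self.messages[receipt_handle]


def handle(body: str) -> None:
    # Hypothetical handler: pretend messages with body "bad" cannot be processed.
    if body == "bad":
        raise ValueError("cannot process message")


queue = InMemoryQueue([Message("r1", "ok"), Message("r2", "bad")])
print(process_batch(queue, handle))  # 1
print(list(queue.messages))  # ['r2']
```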

:::{note}
SQS listener ingestion jobs have the following limitations:

* An SQS listener ingestion job can only ingest objects from a single S3 bucket and prefix. Support
  for multiple buckets or prefixes is planned for a future release.
* SQS listener ingestion jobs do not support custom S3 endpoint configurations. Support for custom
  endpoints is planned for a future release.
:::

[aws-s3-event-notifications]: https://docs.aws.amazon.com/AmazonS3/latest/userguide/ways-to-add-notification-config-to-bucket.html#step2-enable-notification
[clp-s3-logs-input-config]: ./guides-using-object-storage/clp-config.md#configuration-for-input-logs
[s3-scanner-api]: https://petstore.swagger.io/?url=https://docs.yscope.com/clp/DOCS_VAR_CLP_GIT_REF/_static/generated/log-ingestor-openapi.json#/IngestionJob/create_s3_scanner_job
[sqs]: https://docs.aws.amazon.com/sqs/
[sqs-listener-api]: https://petstore.swagger.io/?url=https://docs.yscope.com/clp/DOCS_VAR_CLP_GIT_REF/_static/generated/log-ingestor-openapi.json#/IngestionJob/create_sqs_listener_job
[swagger-ui-all]: https://petstore.swagger.io/?url=https://docs.yscope.com/clp/DOCS_VAR_CLP_GIT_REF/_static/generated/log-ingestor-openapi.json