Changes from 4 commits
59 changes: 57 additions & 2 deletions pipeline/outputs/s3.md
@@ -6,7 +6,7 @@

![AWS logo](<../../.gitbook/assets/image (9).png>)

- The _Amazon S3_ output plugin lets you ingest records into the [S3](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/WhatIsCloudWatchLogs.html) cloud object store.
+ The _Amazon S3_ output plugin lets you ingest records into the [S3](https://docs.aws.amazon.com/AmazonS3/latest/userguide/Welcome.html) cloud object store.

The plugin can upload data to S3 using the [multipart upload API](https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html) or [`PutObject`](https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObject.html). Multipart is the default and is recommended. Fluent Bit will stream data in a series of _parts_. This limits the amount of data buffered on disk at any point in time. By default, every time 5&nbsp;MiB of data have been received, a new part will be uploaded. The plugin can create files up to gigabytes in size from many small chunks or parts using the multipart API. All aspects of the upload process are configurable.
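
The sizing behavior described above can be sketched as a configuration (a hedged example: the bucket name, region, and values are placeholders; `total_file_size`, `upload_chunk_size`, and `upload_timeout` are the plugin options that govern part and object sizing):

```yaml
pipeline:
  outputs:
    - name: s3
      match: '*'
      bucket: my-bucket            # placeholder bucket name
      region: us-east-1
      total_file_size: 50M         # target size of each completed S3 object
      upload_chunk_size: 5M        # size of each multipart part uploaded as data streams in
      upload_timeout: 10m          # complete the upload even if total_file_size is not reached
```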

@@ -45,7 +45,8 @@
| `sts_endpoint` | Custom endpoint for the STS API. | _none_ |
| `profile` | Option to specify an AWS Profile for credentials. | `default` |
| `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
- | `compression` | Compression type for S3 objects. `gzip` is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can use `arrow`. For gzip compression, the Content-Encoding HTTP Header will be set to `gzip`. Gzip compression can be enabled when `use_put_object` is `on` or `off` (`PutObject` and Multipart). Arrow compression can only be enabled with `use_put_object On`. | _none_ |
+ | `compression` | Compression/format for S3 objects. Supported: `gzip` (always available) and `parquet` (requires Arrow build). For `gzip`, the `Content-Encoding` header is set to `gzip`. `parquet` is available _only when Fluent Bit is built with `-DFLB_ARROW=On`_ and Arrow GLib/Parquet GLib are installed. Parquet is typically used with `use_put_object On`. | _none_ |
| `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
| `send_content_md5` | Send the Content-MD5 header with `PutObject` and UploadPart requests, as is required when Object Lock is enabled. | `false` |
| `auto_retry_requests` | Immediately retry failed requests to AWS services once. This option doesn't affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which can help improve throughput during transient network issues. | `true` |
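
For instance, a minimal sketch enabling gzip compression (bucket name and region are placeholders), which works with both multipart and `PutObject` uploads:

```yaml
pipeline:
  outputs:
    - name: s3
      match: '*'
      bucket: my-log-bucket        # placeholder bucket name
      region: us-east-1
      compression: gzip            # objects are gzipped; Content-Encoding is set to gzip
```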
@@ -649,3 +650,57 @@
3 2021-04-27T09:33:56.539430Z 0.0 0.0 0.0 0.0 0.0 0.0
4 2021-04-27T09:33:57.539803Z 0.0 0.0 0.0 0.0 0.0 0.0
```

## Enable Parquet support

### Build requirements for Parquet

To enable Parquet, build Fluent Bit with Apache Arrow support and install Arrow GLib/Parquet GLib:

```bash
# Ubuntu/Debian example
sudo apt-get update
sudo apt-get install -y -V ca-certificates lsb-release wget
wget https://packages.apache.org/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt-get install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
sudo apt-get update
sudo apt-get install -y -V libarrow-glib-dev libparquet-glib-dev

# Build Fluent Bit with Arrow (run from the root of the Fluent Bit source tree):
cd build/
cmake -DFLB_ARROW=On ..
cmake --build .
```

For other Linux distributions, refer to the [Apache Arrow installation instructions](https://arrow.apache.org/install/).
Apache Parquet GLib is part of the Apache Arrow project.

### Testing Parquet compression

Example configuration:

```yaml
service:
  flush: 5
  daemon: Off
  log_level: debug
  http_server: Off

pipeline:
  inputs:
    - name: dummy
      tag: dummy.local
      dummy: '{"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}'

  outputs:
    - name: s3
      match: 'dummy*'
      region: us-east-2
      bucket: <your_testing_bucket>
      use_put_object: On
      compression: parquet
      # other parameters
```