
Commit 90b4f3d

out_s3: Add an instruction for enabling parquet compression

Signed-off-by: Hiroshi Hatake <[email protected]>

Parent: e8f4a93

1 file changed: +53 −1


pipeline/outputs/s3.md

@@ -45,7 +45,8 @@ The [Prometheus success/retry/error metrics values](../../administration/monitor
 | `sts_endpoint` | Custom endpoint for the STS API. | _none_ |
 | `profile` | Option to specify an AWS Profile for credentials. | `default` |
 | `canned_acl` | [Predefined Canned ACL policy](https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl) for S3 objects. | _none_ |
-| `compression` | Compression type for S3 objects. `gzip` is currently the only supported value by default. If Apache Arrow support was enabled at compile time, you can use `arrow`. For gzip compression, the Content-Encoding HTTP Header will be set to `gzip`. Gzip compression can be enabled when `use_put_object` is `on` or `off` (`PutObject` and Multipart). Arrow compression can only be enabled with `use_put_object On`. | _none_ |
+| `compression` | Compression/format for S3 objects. Supported values: `gzip` (always available) and `parquet` (requires an Arrow-enabled build). For `gzip`, the `Content-Encoding` HTTP header is set to `gzip`. `parquet` is available **only when Fluent Bit is built with `-DFLB_ARROW=On`** and Arrow GLib/Parquet GLib are installed. Parquet is typically used with `use_put_object On`. | _none_ |
+
 | `content_type` | A standard MIME type for the S3 object, set as the Content-Type HTTP header. | _none_ |
 | `send_content_md5` | Send the Content-MD5 header with `PutObject` and UploadPart requests, as is required when Object Lock is enabled. | `false` |
 | `auto_retry_requests` | Immediately retry failed requests to AWS services once. This option doesn't affect the normal Fluent Bit retry mechanism with backoff. Instead, it enables an immediate retry with no delay for networking errors, which can help improve throughput during transient network issues. | `true` |
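
For context, a minimal sketch of an `s3` output section using gzip compression, which works with `use_put_object` either `on` or `off`. This snippet is not part of the commit; the bucket name and region are placeholders:

```yaml
# Minimal sketch (not from this commit): gzip compression for out_s3.
# Bucket and region below are placeholders.
pipeline:
  outputs:
    - name: s3
      match: '*'
      region: us-east-1
      bucket: my-example-bucket
      use_put_object: on
      compression: gzip
```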
@@ -649,3 +650,54 @@ The following example uses `pyarrow` to analyze the uploaded data:
 3 2021-04-27T09:33:56.539430Z 0.0 0.0 0.0 0.0 0.0 0.0
 4 2021-04-27T09:33:57.539803Z 0.0 0.0 0.0 0.0 0.0 0.0
 ```
+
+## Enable Parquet support
+
+### Build requirements for Parquet
+
+To enable Parquet, build Fluent Bit with Apache Arrow support and install Arrow GLib/Parquet GLib:
+
+```bash
+# Ubuntu/Debian example
+sudo apt-get update
+sudo apt-get install -y -V ca-certificates lsb-release wget
+wget https://packages.apache.org/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+sudo apt-get install -y -V ./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb
+sudo apt-get update
+sudo apt-get install -y -V libarrow-glib-dev libparquet-glib-dev
+
+# Build Fluent Bit with Arrow:
+cd build/
+cmake -DFLB_ARROW=On ..
+cmake --build .
+```
+
+### Testing Parquet compression
+
+Example configuration:
+
+```yaml
+service:
+  flush: 5
+  daemon: Off
+  log_level: debug
+  http_server: Off
+
+pipeline:
+  inputs:
+    - name: dummy
+      tag: dummy.local
+      dummy: '{"boolean": false, "int": 1, "long": 1, "float": 1.1, "double": 1.1, "bytes": "foo", "string": "foo"}'
+
+  outputs:
+    - name: s3
+      match: 'dummy*'
+      region: us-east-2
+      bucket: <your_testing_bucket>
+      use_put_object: On
+      compression: parquet
+      # other parameters
+```
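
As with the Arrow example earlier in this document, the uploaded object can be inspected with `pyarrow`. A minimal sketch, assuming the S3 object has been downloaded locally as `fluent-bit.parquet` (an illustrative filename, not from the commit):

```python
# Sketch: read back a Parquet object written by out_s3.
# Assumes the object was downloaded locally as fluent-bit.parquet.
import pyarrow.parquet as pq

table = pq.read_table("fluent-bit.parquet")
print(table.schema)       # column names/types from the dummy records
print(table.to_pandas())  # requires pandas to be installed
```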
