From 300d609664dc041243d9f500ec8e200513a987c4 Mon Sep 17 00:00:00 2001
From: Anton Rubin
Date: Tue, 11 Nov 2025 12:07:13 +0000
Subject: [PATCH] adding review comments to codec page

Signed-off-by: Anton Rubin
---
 .../common-use-cases/codec-processor-combinations.md | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/_data-prepper/common-use-cases/codec-processor-combinations.md b/_data-prepper/common-use-cases/codec-processor-combinations.md
index c362f90277a..48578088bb6 100644
--- a/_data-prepper/common-use-cases/codec-processor-combinations.md
+++ b/_data-prepper/common-use-cases/codec-processor-combinations.md
@@ -39,11 +39,14 @@ The [`newline` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/config

 ## Parquet

-[Apache Parquet](https://parquet.apache.org/docs/overview/) is a columnar storage format built for Hadoop. It is most efficient without the use of a codec. Positive results, however, can be achieved when it's configured with [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source).
+[Apache Parquet](https://parquet.apache.org/docs/overview/) is a columnar storage format built for Hadoop. Pipeline authors can use the parquet codec to read Parquet data directly from the S3 object. This will retrieve all data from Parquet. An alternative is to use [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source) instead of the codec. In this case, [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source) will parse the Parquet file directly. This can be more efficient if you are filtering or loading a subset of data.
+
+Additional S3 charges apply when using [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source).
+{: .note}

 ## Avro

-[Apache Avro] helps streamline streaming data pipelines. It is most efficient when used with the [`avro` codec]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sinks/s3#avro-codec) inside an `s3` sink.
+[Apache Avro](https://avro.apache.org/docs) is a columnar storage format built for Hadoop. It is most efficient without the use of a codec, however, great results can be achieved when it is configured with [S3 Select]({{site.url}}{{site.baseurl}}/data-prepper/pipelines/configuration/sources/s3#using-s3_select-with-the-s3-source).

 ## `event_json`