4 changes: 3 additions & 1 deletion ingestion/formats-and-encoding-options.mdx
@@ -265,7 +265,9 @@ When using a file source to read Parquet files, the schema must be defined accor
| `timestamp(_, None)` | timestamp |

<Note>
Parquet sources require case-sensitive column names. However, PostgreSQL converts unquoted column names to lowercase by default. To preserve case sensitivity when defining the schema, use double quotes around column names.
By default, Parquet sources require case-sensitive column name matching, while PostgreSQL folds unquoted column names to lowercase. To preserve case sensitivity when defining the schema, use double quotes around column names.

Alternatively, you can set `parquet.case_insensitive = 'true'` in the source connector options to enable case-insensitive column matching. This allows you to use lowercase column names in your schema (e.g., `id`, `name`) even when the Parquet file has mixed-case column names (e.g., `ID`, `Name`). If multiple columns match case-insensitively, the match is ambiguous and ignored.
</Note>
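
For illustration, the sketch below quotes the column names so PostgreSQL does not fold them to lowercase; the connector and placeholder credentials are assumed, mirroring the S3 example elsewhere in this PR:

```sql Example with quoted column names
CREATE TABLE t(
    "ID" int,
    "Name" varchar
)
WITH (
    connector = 's3_v2',
    match_pattern = '*.parquet',
    s3.region_name = 'ap-southeast-2',
    s3.bucket_name = 'example-s3-source',
    s3.credentials.access = 'xxxxx',
    s3.credentials.secret = 'xxxxx'
) FORMAT PLAIN ENCODE PARQUET;
```

Without the double quotes, `ID` and `Name` would be folded to `id` and `name` and would not match the mixed-case column names in the Parquet file under the default case-sensitive matching.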

## Parameter reference
20 changes: 20 additions & 0 deletions integrations/sources/azure-blob.mdx
@@ -41,6 +41,7 @@ FORMAT data_format ENCODE data_encode (
| azblob.endpoint\_url | **Required**. The URL of the Azure Blob Storage service endpoint. |
| match\_pattern | **Conditional**. Set to find object keys in `azblob.container_name` that match the given pattern. Standard Unix-style [glob](https://en.wikipedia.org/wiki/Glob%5F%28programming%29) syntax is supported. A typical usage follows the `prefix/*.suffix` pattern. For example, `your_directory/*.parquet` matches all Parquet files under `your_directory/`. If `match_pattern` does not contain `/`, the scan runs from the container root. |
| compression\_format | **Optional**. Specifies the compression format of the file being read. When set to `gzip` or `gz`, the file reader reads all files with the `.gz` suffix; when set to `None` or not defined, the file reader automatically reads and decompresses `.gz` and `.gzip` files. |
| parquet.case\_insensitive | **Optional**. For Parquet files only. When set to `true`, enables case-insensitive column name matching. This is useful when the Parquet file has column names with different casing than your table schema (e.g., `ID` in the file, `id` in the table). If multiple columns match case-insensitively, the match is ambiguous and ignored. Default is `false`. |

### Other parameters

@@ -145,6 +146,25 @@ WITH (
azblob.endpoint_url = 'xxx',
match_pattern = '*.parquet',
) FORMAT PLAIN ENCODE PARQUET;
```

To handle Parquet files with mixed-case column names (e.g., `ID`, `Name`, `Age`), use the `parquet.case_insensitive` option:

```sql Example with case-insensitive matching
CREATE SOURCE s2(
id int,
name varchar,
age int
)
WITH (
connector = 'azblob',
azblob.container_name = 'xxx',
azblob.credentials.account_name = 'xxx',
azblob.credentials.account_key = 'xxx',
azblob.endpoint_url = 'xxx',
match_pattern = '*.parquet',
parquet.case_insensitive = 'true'
) FORMAT PLAIN ENCODE PARQUET;
```
</Tab>
</Tabs>
20 changes: 19 additions & 1 deletion integrations/sources/google-cloud-storage.mdx
@@ -39,6 +39,7 @@ FORMAT data_format ENCODE data_encode (
| match\_pattern | **Conditional**. This field is used to find object keys in the bucket that match the given pattern. Standard Unix-style [glob](https://en.wikipedia.org/wiki/Glob%5F%28programming%29) syntax is supported. A typical usage follows the `prefix/*.suffix` pattern. For example, `your_directory/*.parquet` matches all Parquet files under `your_directory/`. If `match_pattern` does not contain `/`, the scan runs from the container root. |
| compression\_format | **Optional**. Specifies the compression format of the file being read. You can define `compression_format` in the CREATE TABLE statement. When set to `gzip` or `gz`, the file reader reads all files with the `.gz` suffix. When set to `None` or not defined, the file reader automatically reads and decompresses `.gz` and `.gzip` files. |
| refresh.interval.sec | **Optional**. Configures the interval between file-listing operations, which determines the delay in discovering new files. Defaults to 60 seconds. |
| parquet.case\_insensitive | **Optional**. For Parquet files only. When set to `true`, enables case-insensitive column name matching. This is useful when the Parquet file has column names with different casing than your table schema (e.g., `ID` in the file, `id` in the table). If multiple columns match case-insensitively, the match is ambiguous and ignored. Default is `false`. |

### Other parameters

@@ -141,8 +142,25 @@ CREATE TABLE t(
WITH (
connector = 'gcs',
gcs.bucket_name = 'example-bucket',
gcs.credential = 'xxxxx'
gcs.credential = 'xxxxx',
match_pattern = '*.parquet'
) FORMAT PLAIN ENCODE PARQUET;
```

To handle Parquet files with mixed-case column names (e.g., `ID`, `Name`, `Age`), use the `parquet.case_insensitive` option:

```sql Example with case-insensitive matching
CREATE TABLE t(
id int,
name varchar,
age int
)
WITH (
connector = 'gcs',
gcs.bucket_name = 'example-bucket',
gcs.credential = 'xxxxx',
match_pattern = '*.parquet',
parquet.case_insensitive = 'true'
) FORMAT PLAIN ENCODE PARQUET;
```
</Tab>
20 changes: 20 additions & 0 deletions integrations/sources/s3.mdx
@@ -52,6 +52,7 @@ For CSV data, specify the delimiter in the `delimiter` option in `ENCODE propert
| `match_pattern` | **Conditional**. This field is used to find object keys in `s3.bucket_name` that match the given pattern. Standard Unix-style [glob](https://en.wikipedia.org/wiki/Glob%5F%28programming%29) syntax is supported. A typical usage follows the `prefix/*.suffix` pattern. For example, `your_directory/*.parquet` matches all Parquet files under `your_directory/`. If `match_pattern` does not contain `/`, the scan runs from the container root. |
| `s3.assume_role` | **Optional**. Specifies the ARN of an IAM role to assume when accessing S3\. It allows temporary, secure access to S3 resources without sharing long-term credentials. |
| `refresh.interval.sec` | **Optional**. Configures the interval between file-listing operations, which determines the delay in discovering new files. Defaults to 60 seconds. |
| `parquet.case_insensitive` | **Optional**. For Parquet files only. When set to `true`, enables case-insensitive column name matching. This is useful when the Parquet file has column names with different casing than your table schema (e.g., `ID` in the file, `id` in the table). If multiple columns match case-insensitively, the match is ambiguous and ignored. Default is `false`. |

<Note>
In RisingWave Cloud, the default AWS credential provider chain is disabled. Provide `s3.credentials.access` and `s3.credentials.secret` (or use a supported assume-role setup). These credentials cannot be omitted. The `enable_config_load` option is supported only in self-hosted deployments.
@@ -153,6 +154,25 @@ WITH (
s3.credentials.secret = 'xxxxx'
) FORMAT PLAIN ENCODE PARQUET;

```

To handle Parquet files with mixed-case column names (e.g., `ID`, `Name`, `Age`), use the `parquet.case_insensitive` option:

```sql Example with case-insensitive matching
CREATE TABLE s(
id int,
name varchar,
age int
)
WITH (
connector = 's3_v2',
match_pattern = '*.parquet',
s3.region_name = 'ap-southeast-2',
s3.bucket_name = 'example-s3-source',
s3.credentials.access = 'xxxxx',
s3.credentials.secret = 'xxxxx',
parquet.case_insensitive = 'true'
) FORMAT PLAIN ENCODE PARQUET;
```
</Tab>
</Tabs>