
Commit ac7dbbc

Improve SQL transform documentation for calcite_connection_properties in YAML

- Add comprehensive SQL transform examples showing calcite_connection_properties usage
- Create dedicated sql/ directory under yaml/examples/transforms with 5 examples:
  - sql_basic_example.yaml: Basic SQL without special configuration
  - sql_postgresql_functions.yaml: PostgreSQL functions like SPLIT_PART
  - sql_bigquery_functions.yaml: BigQuery syntax and functions
  - sql_mysql_functions.yaml: MySQL date/string functions
  - sql_advanced_configuration.yaml: Multiple configuration options
- Add detailed README.md explaining calcite_connection_properties options
- Update yaml/tests/sql.yaml with calcite_connection_properties test cases
- Update examples/README.md to reference new SQL documentation

This addresses the issue where calcite_connection_properties configuration was 'tricky to get right' by providing clear examples and documentation for different SQL dialects and use cases.

Fixes: SQL options in YAML/xlang pipelines need better documentation

1 parent 16062d3

8 files changed, +540 -0 lines changed

sdks/python/apache_beam/yaml/examples/README.md

Lines changed: 12 additions & 0 deletions

```diff
@@ -100,6 +100,18 @@ These examples leverage the built-in mapping transforms including `MapToFields`,
 `Filter` and `Explode`. More information can be found about mapping transforms
 [here](https://beam.apache.org/documentation/sdks/yaml-udf/).
 
+### SQL
+
+Examples that demonstrate SQL transforms with various database dialect configurations:
+
+- [Basic SQL Transform](transforms/sql/sql_basic_example.yaml) - Simple SQL queries without special configuration
+- [PostgreSQL Functions](transforms/sql/sql_postgresql_functions.yaml) - Using PostgreSQL-specific functions like SPLIT_PART
+- [BigQuery Functions](transforms/sql/sql_bigquery_functions.yaml) - BigQuery syntax and functions with proper calcite_connection_properties
+- [MySQL Functions](transforms/sql/sql_mysql_functions.yaml) - MySQL-specific date and string functions
+- [Advanced Configuration](transforms/sql/sql_advanced_configuration.yaml) - Multiple calcite_connection_properties options
+
+These examples show how to use the `calcite_connection_properties` pipeline option to configure SQL transforms for different database dialects and enable dialect-specific functions and syntax.
+
 ### IO
 
 #### Spanner
```

sdks/python/apache_beam/yaml/examples/transforms/sql/README.md

Lines changed: 104 additions & 0 deletions

# SQL Transform calcite_connection_properties Configuration Guide

This directory contains examples demonstrating how to use `calcite_connection_properties` in Beam YAML pipelines to configure SQL transforms for different database dialects and use cases.

## Overview

The `calcite_connection_properties` pipeline option passes connection properties to Apache Calcite, the SQL parser and planner behind Beam SQL. Setting them lets you enable database-specific SQL syntax and function libraries, which is useful when a query relies on functions or syntax specific to PostgreSQL, BigQuery, MySQL, or Oracle.

## Configuration Options

The most commonly used `calcite_connection_properties` include:

### Function Libraries (`fun`)
- `"standard"` - Standard SQL functions (default)
- `"postgresql"` - PostgreSQL-specific functions (e.g., SPLIT_PART, STRING_AGG)
- `"bigquery"` - BigQuery-specific functions (e.g., FORMAT_TIMESTAMP, ARRAY_TO_STRING)
- `"mysql"` - MySQL-specific functions (e.g., DATEDIFF, SUBSTRING_INDEX)
- `"oracle"` - Oracle-specific functions (e.g., NVL, SUBSTR)

Calcite also accepts several libraries as a comma-separated list, for example `fun: "postgresql,bigquery"`.

### Lexical Rules (`lex`)
- `"big_query"` - BigQuery lexical rules and syntax, such as backtick-quoted identifiers
- `"mysql"` - MySQL lexical rules
- `"oracle"` - Oracle lexical rules
- `"java"` - Java-style lexical rules

Unlike `fun`, `lex` has no `"standard"` value. Calcite's `Lex` settings also include `mysql_ansi` and `sql_server`; if `lex` is omitted, Beam SQL's default lexical rules apply.

### Other Properties
- `conformance` - SQL conformance level (e.g., "DEFAULT", "LENIENT", "STRICT_92", "BIG_QUERY")
- `caseSensitive` - Whether identifiers are matched case-sensitively ("true"/"false")
- `quotedCasing` - How quoted identifiers are cased ("UNCHANGED", "TO_UPPER", "TO_LOWER")
- `unquotedCasing` - How unquoted identifiers are cased (same values as `quotedCasing`)

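A misspelled value such as `lex: "standard"` or `conformance: "STRICT"` is easy to miss in YAML. The Python sketch below shows one way to catch such values before submitting a pipeline. `check_calcite_properties` and the `KNOWN_*` sets are hypothetical helpers, not part of Beam; the sets are an assumed subset of Calcite's library names and `Lex`, `Casing`, and `SqlConformanceEnum` values, and may differ for the Calcite version bundled with your Beam release.

```python
# Hypothetical pre-flight check for calcite_connection_properties values.
# Assumption: these sets are a subset of Apache Calcite's accepted values;
# the exact set depends on the bundled Calcite version.
KNOWN_FUN = {"standard", "spatial", "oracle", "mysql", "postgresql", "bigquery"}
KNOWN_LEX = {"oracle", "mysql", "mysql_ansi", "sql_server", "java", "big_query"}
KNOWN_CASING = {"UNCHANGED", "TO_UPPER", "TO_LOWER"}
KNOWN_CONFORMANCE = {
    "DEFAULT", "LENIENT", "BABEL", "STRICT_92", "STRICT_99", "PRAGMATIC_99",
    "STRICT_2003", "PRAGMATIC_2003", "BIG_QUERY", "MYSQL_5", "ORACLE_10",
    "ORACLE_12", "SQL_SERVER_2008",
}


def check_calcite_properties(props):
    """Return a list of problems found in a calcite_connection_properties dict."""
    problems = []
    for key, value in props.items():
        # The examples in this README quote every value, so an unquoted YAML
        # boolean or number (loaded as bool/int) is flagged as a hint.
        if not isinstance(value, str):
            problems.append(f"{key}: expected a quoted string, got {value!r}")
            continue
        if key == "fun":
            # Several libraries may be combined with commas.
            for lib in value.split(","):
                if lib.strip().lower() not in KNOWN_FUN:
                    problems.append(f"fun: unknown library {lib.strip()!r}")
        elif key == "lex":
            if value.lower() not in KNOWN_LEX:
                problems.append(f"lex: unknown value {value!r}")
        elif key in ("quotedCasing", "unquotedCasing"):
            if value.upper() not in KNOWN_CASING:
                problems.append(f"{key}: unknown value {value!r}")
        elif key == "conformance":
            if value.upper() not in KNOWN_CONFORMANCE:
                problems.append(f"conformance: unknown value {value!r}")
        elif key == "caseSensitive":
            if value.lower() not in ("true", "false"):
                problems.append(f"caseSensitive: expected 'true' or 'false', got {value!r}")
        # Other keys are not checked by this sketch.
    return problems


print(check_calcite_properties({"lex": "standard", "conformance": "STRICT"}))
# -> ["lex: unknown value 'standard'", "conformance: unknown value 'STRICT'"]
```

An empty list only means the sketch found nothing; Calcite remains the authority on what it accepts.
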
## Usage Patterns

### Basic Configuration
```yaml
options:
  calcite_connection_properties:
    fun: "postgresql"
```

### Advanced Configuration
```yaml
options:
  calcite_connection_properties:
    fun: "bigquery"
    lex: "big_query"
    conformance: "LENIENT"
    caseSensitive: "false"
```
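
The casing properties interact with `caseSensitive`. The simplified Python model below (hypothetical helpers `normalize` and `resolve`, not Calcite's implementation) shows why the advanced example file pairs `unquotedCasing: "TO_UPPER"` with `caseSensitive: "false"`: upper-casing the unquoted identifier `name` yields `NAME`, which only matches the lowercase field `name` when matching is case-insensitive. The model assumes UNCHANGED casing and case-sensitive matching when a key is missing; in Calcite, the real defaults come from the `lex` setting.

```python
# Simplified model of Calcite-style identifier casing and matching.
# Assumption: missing keys default to UNCHANGED and case-sensitive matching.


def normalize(identifier, quoted, props):
    """Apply quotedCasing or unquotedCasing to one identifier."""
    casing = props.get("quotedCasing" if quoted else "unquotedCasing", "UNCHANGED")
    if casing == "TO_UPPER":
        return identifier.upper()
    if casing == "TO_LOWER":
        return identifier.lower()
    return identifier


def resolve(identifier, quoted, field_names, props):
    """Return the field the identifier refers to, or None if nothing matches."""
    name = normalize(identifier, quoted, props)
    case_sensitive = props.get("caseSensitive", "true").lower() == "true"
    for field in field_names:
        if field == name or (not case_sensitive and field.lower() == name.lower()):
            return field
    return None


fields = ["id", "name", "price"]  # lowercase field names, as in the Create examples
print(resolve("name", False, fields, {"unquotedCasing": "TO_UPPER", "caseSensitive": "true"}))   # prints None
print(resolve("name", False, fields, {"unquotedCasing": "TO_UPPER", "caseSensitive": "false"}))  # prints name
```
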

## Examples in this Directory

1. **sql_basic_example.yaml** - Basic SQL transform without special configuration
2. **sql_postgresql_functions.yaml** - Using PostgreSQL functions like SPLIT_PART
3. **sql_bigquery_functions.yaml** - BigQuery syntax and functions
4. **sql_mysql_functions.yaml** - MySQL-specific date and string functions
5. **sql_advanced_configuration.yaml** - Multiple configuration options

## Common Use Cases

### PostgreSQL Functions
Useful for PostgreSQL string and aggregate functions such as SPLIT_PART and STRING_AGG:
```yaml
options:
  calcite_connection_properties:
    fun: "postgresql"
```

### BigQuery Compatibility
For BigQuery-style syntax and functions:
```yaml
options:
  calcite_connection_properties:
    fun: "bigquery"
    lex: "big_query"
```

### Lenient SQL Parsing
To relax some of Calcite's validation rules (for example, allowing `GROUP BY` to refer to a `SELECT` alias):
```yaml
options:
  calcite_connection_properties:
    conformance: "LENIENT"
```

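## Important Notes

- These properties control how Calcite parses and validates a query and which functions it resolves. Casing and conformance settings can also change how identifiers and clauses are interpreted (for example, whether `GROUP BY` may refer to a `SELECT` alias), so the same query can behave differently under different settings
- Some database-specific functions may not be available, depending on the Calcite version bundled with your Beam release
- Always test your SQL queries with the intended configuration before deploying to production
- The `calcite_connection_properties` must be specified in the pipeline `options` section, not in individual transform configurations, so they apply to every `Sql` transform in the pipeline
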
## Troubleshooting

If you encounter SQL parsing or validation errors:

1. Check that the function you're using is supported by the specified function library (`fun`)
2. Verify that the lexical rules (`lex`) match your SQL syntax style, for example BigQuery-style backtick quoting
3. Try using `conformance: "LENIENT"` for more flexible parsing
4. Refer to the Apache Calcite SQL reference, which lists the library that provides each dialect-specific function
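
Step 1 can be partly automated. The Python sketch below scans a query for function calls and reports any whose library, according to the examples named in the Function Libraries list of this README, is not enabled by `fun`. `missing_libraries` and `FUNCTION_LIBRARY` are hypothetical. The regex detection is deliberately naive (it also matches names inside string literals), so treat its output as a hint rather than a verdict.

```python
import re

# Hypothetical hint table taken only from the example functions named in this
# README's Function Libraries list. It is not exhaustive; functions that exist
# in several Calcite libraries (such as SUBSTR or STRING_AGG) are left out.
FUNCTION_LIBRARY = {
    "SPLIT_PART": "postgresql",
    "FORMAT_TIMESTAMP": "bigquery",
    "ARRAY_TO_STRING": "bigquery",
    "DATEDIFF": "mysql",
    "SUBSTRING_INDEX": "mysql",
    "NVL": "oracle",
}


def missing_libraries(query, props):
    """Return sorted (function, library) pairs whose library is not enabled by `fun`."""
    enabled = {lib.strip().lower() for lib in props.get("fun", "standard").split(",")}
    # Any identifier immediately followed by "(" is treated as a function call.
    called = {name.upper() for name in re.findall(r"\b([A-Za-z_]\w*)\s*\(", query)}
    return sorted(
        (fn, lib) for fn, lib in FUNCTION_LIBRARY.items()
        if fn in called and lib not in enabled
    )


query = "SELECT SPLIT_PART(email, '@', 2) AS domain FROM PCOLLECTION"
print(missing_libraries(query, {}))                    # [('SPLIT_PART', 'postgresql')]
print(missing_libraries(query, {"fun": "postgresql"}))  # []
```
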

For more information about Beam SQL and supported functions, see:
- [Beam SQL Documentation](https://beam.apache.org/documentation/dsls/sql/overview/)
- [Apache Calcite SQL Reference](https://calcite.apache.org/docs/reference.html)

sdks/python/apache_beam/yaml/examples/transforms/sql/sql_advanced_configuration.yaml

Lines changed: 87 additions & 0 deletions

```yaml
# coding=utf-8
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Advanced SQL Transform Configuration Examples
# This example demonstrates multiple calcite_connection_properties and their effects.

pipeline:
  transforms:
    - type: Create
      name: CreateComplexData
      config:
        elements:
          - {id: 1, name: "Product A", price: 29.99, tags: ["electronics", "gadget"], metadata: '{"brand": "TechCorp", "warranty": 12}'}
          - {id: 2, name: "Product B", price: 15.50, tags: ["books", "fiction"], metadata: '{"author": "John Doe", "pages": 320}'}
          - {id: 3, name: "Product C", price: 199.99, tags: ["electronics", "computer"], metadata: '{"brand": "CompuTech", "warranty": 24}'}

    # Example 1: Standard SQL with no dialect-specific functions. The
    # calcite_connection_properties below still apply here, because they are
    # pipeline-level options shared by every Sql transform.
    - type: Sql
      name: StandardSQL
      input: CreateComplexData
      config:
        query: |
          SELECT
            id,
            name,
            price,
            CASE
              WHEN price < 20 THEN 'Budget'
              WHEN price < 100 THEN 'Mid-range'
              ELSE 'Premium'
            END as price_category
          FROM PCOLLECTION
          WHERE price > 10
          ORDER BY price

    # Example 2: Using Oracle-style functions (NVL requires fun: "oracle")
    - type: Sql
      name: OracleStyleSQL
      input: CreateComplexData
      config:
        query: |
          SELECT
            id,
            name,
            price,
            -- Oracle-style string functions
            SUBSTR(name, 1, 10) as short_name,
            LENGTH(name) as name_length,
            NVL(name, 'Unknown') as safe_name
          FROM PCOLLECTION

    - type: LogForTesting
      input: StandardSQL

    - type: LogForTesting
      input: OracleStyleSQL

# Multiple calcite_connection_properties can be configured:
# - conformance: Controls SQL conformance level (e.g., DEFAULT, LENIENT, STRICT_92)
# - caseSensitive: Whether identifiers are matched case-sensitively
# - quotedCasing: How to handle quoted identifiers (UNCHANGED, TO_UPPER, TO_LOWER)
# - unquotedCasing: How to handle unquoted identifiers (same values as quotedCasing)
# - fun: SQL function library (standard, oracle, mysql, postgresql, bigquery, etc.)
# - lex: Lexical analysis rules (oracle, mysql, big_query, java, etc.)
options:
  calcite_connection_properties:
    conformance: "LENIENT"
    fun: "oracle"
    lex: "oracle"
    caseSensitive: "false"
    quotedCasing: "UNCHANGED"
    unquotedCasing: "TO_UPPER"
  streaming: false
```
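
To check the `LogForTesting` output against expectations, the Python sketch below restates what the expressions in this file compute on the sample rows. In the `CASE`, the first matching `WHEN` wins; `SUBSTR` counts positions from 1; and `NVL` substitutes its second argument only for NULL. The helper names are illustrative, and `substr` only models start positions of 1 or more.

```python
def price_category(price):
    """Mirror of the CASE expression: the first matching WHEN wins."""
    if price < 20:
        return "Budget"
    if price < 100:
        return "Mid-range"
    return "Premium"


def substr(s, start, length):
    """SQL-style SUBSTR with 1-based positions (modelled only for start >= 1)."""
    return s[start - 1:start - 1 + length]


def nvl(value, default):
    """NVL(value, default): returns default only when value is NULL (None here)."""
    return default if value is None else value


products = [(1, "Product A", 29.99), (2, "Product B", 15.50), (3, "Product C", 199.99)]

# StandardSQL: WHERE price > 10, ORDER BY price
standard_rows = sorted(
    ((pid, name, price, price_category(price)) for pid, name, price in products if price > 10),
    key=lambda row: row[2],
)
# OracleStyleSQL: every row, no filter
oracle_rows = [
    (pid, name, price, substr(name, 1, 10), len(name), nvl(name, "Unknown"))
    for pid, name, price in products
]

for row in standard_rows + oracle_rows:
    print(row)
```
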

sdks/python/apache_beam/yaml/examples/transforms/sql/sql_basic_example.yaml

Lines changed: 50 additions & 0 deletions

```yaml
# coding=utf-8
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Basic SQL Transform Example
# This example demonstrates basic SQL transform usage with default Calcite configuration.

pipeline:
  transforms:
    - type: Create
      name: CreateData
      config:
        elements:
          - {id: 1, name: "Alice", age: 30, city: "Seattle"}
          - {id: 2, name: "Bob", age: 25, city: "Portland"}
          - {id: 3, name: "Charlie", age: 35, city: "San Francisco"}
          - {id: 4, name: "Diana", age: 28, city: "Seattle"}

    - type: Sql
      name: FilterAndGroup
      input: CreateData
      config:
        query: |
          SELECT
            city,
            COUNT(*) as person_count,
            AVG(age) as avg_age
          FROM PCOLLECTION
          WHERE age >= 25
          GROUP BY city
          ORDER BY city

    - type: LogForTesting
      input: FilterAndGroup

options:
  streaming: false
```
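
As a cross-check, the same filter, grouping, and ordering can be written in plain Python over the sample elements. This is an illustrative model, not how Beam executes the query. Calcite may give `AVG` over an integer column an integer type, so compare values rather than types; in this data every average happens to be a whole number.

```python
from collections import defaultdict

# Sample elements copied from the Create transform above.
people = [
    {"id": 1, "name": "Alice", "age": 30, "city": "Seattle"},
    {"id": 2, "name": "Bob", "age": 25, "city": "Portland"},
    {"id": 3, "name": "Charlie", "age": 35, "city": "San Francisco"},
    {"id": 4, "name": "Diana", "age": 28, "city": "Seattle"},
]

ages_by_city = defaultdict(list)
for person in people:
    if person["age"] >= 25:                                # WHERE age >= 25
        ages_by_city[person["city"]].append(person["age"])  # GROUP BY city

expected = [
    {"city": city, "person_count": len(ages), "avg_age": sum(ages) / len(ages)}
    for city, ages in sorted(ages_by_city.items())         # ORDER BY city
]
for row in expected:
    print(row)
```
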

sdks/python/apache_beam/yaml/examples/transforms/sql/sql_bigquery_functions.yaml

Lines changed: 65 additions & 0 deletions

```yaml
# coding=utf-8
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# SQL Transform with BigQuery Functions and Syntax
# This example demonstrates using BigQuery-specific SQL syntax and functions.
# The calcite_connection_properties enable BigQuery function library and lexical rules.

pipeline:
  transforms:
    - type: Create
      name: CreateSalesData
      config:
        elements:
          - {transaction_id: "txn_001", customer_id: 101, amount: 250.75, timestamp: "2024-01-15T10:30:00Z", product_categories: ["electronics", "accessories"]}
          - {transaction_id: "txn_002", customer_id: 102, amount: 89.99, timestamp: "2024-01-15T11:45:00Z", product_categories: ["books", "education"]}
          - {transaction_id: "txn_003", customer_id: 103, amount: 1250.00, timestamp: "2024-01-15T14:20:00Z", product_categories: ["electronics", "computers"]}
          - {transaction_id: "txn_004", customer_id: 101, amount: 45.50, timestamp: "2024-01-16T09:15:00Z", product_categories: ["food", "groceries"]}

    # Per-customer, per-day aggregates. The date is derived in a subquery so
    # that each group covers all of a customer's transactions on that day.
    - type: Sql
      name: AnalyzeSalesData
      input: CreateSalesData
      config:
        query: |
          SELECT
            customer_id,
            transaction_date,
            COUNT(*) as transaction_count,
            SUM(amount) as total_spent,
            ROUND(AVG(amount), 2) as avg_transaction_amount,
            -- Conditional aggregation using BigQuery syntax
            COUNTIF(amount > 100) as high_value_transactions
          FROM (
            SELECT
              customer_id,
              amount,
              -- BigQuery-style date/time functions; `timestamp` is a reserved
              -- word, so it is quoted with BigQuery-style backticks
              FORMAT_TIMESTAMP('%Y-%m-%d', PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ', `timestamp`)) as transaction_date
            FROM PCOLLECTION
          ) AS t
          GROUP BY customer_id, transaction_date
          ORDER BY customer_id, total_spent DESC

    # Per-transaction formatting of the array column
    - type: Sql
      name: FormatCategories
      input: CreateSalesData
      config:
        query: |
          SELECT
            transaction_id,
            -- BigQuery array functions (when available)
            ARRAY_TO_STRING(product_categories, ', ') as categories_str
          FROM PCOLLECTION

    - type: LogForTesting
      input: AnalyzeSalesData

    - type: LogForTesting
      input: FormatCategories

# Configure Calcite to use BigQuery function library and syntax
# 'fun': 'bigquery' enables BigQuery-specific functions
# 'lex': 'big_query' enables BigQuery lexical rules and syntax
options:
  calcite_connection_properties:
    fun: "bigquery"
    lex: "big_query"
  streaming: false
```
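
The format strings passed to `PARSE_TIMESTAMP` and `FORMAT_TIMESTAMP` use `%Y`, `%m`, `%d`, `%H`, `%M`, and `%S` with the same meanings as Python's `strptime`/`strftime`, so they can be checked against the sample timestamp strings in plain Python. The sketch below assumes that correspondence holds for these directives; it does not cover every BigQuery format element. It also tallies per-customer totals and a `COUNTIF(amount > 100)` equivalent over the sample data.

```python
from collections import defaultdict
from datetime import datetime

# (customer_id, amount, timestamp string) copied from the sample elements above.
sales = [
    (101, 250.75, "2024-01-15T10:30:00Z"),
    (102, 89.99, "2024-01-15T11:45:00Z"),
    (103, 1250.00, "2024-01-15T14:20:00Z"),
    (101, 45.50, "2024-01-16T09:15:00Z"),
]

summary = defaultdict(lambda: {"total_spent": 0.0, "high_value_transactions": 0, "dates": set()})
for customer_id, amount, ts in sales:
    # Same directives as PARSE_TIMESTAMP('%Y-%m-%dT%H:%M:%SZ', ...); 'T' and 'Z' match literally.
    parsed = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")
    stats = summary[customer_id]
    stats["total_spent"] += amount
    stats["high_value_transactions"] += int(amount > 100)  # like COUNTIF(amount > 100)
    # Same directives as FORMAT_TIMESTAMP('%Y-%m-%d', ...)
    stats["dates"].add(parsed.strftime("%Y-%m-%d"))

for customer_id in sorted(summary):
    print(customer_id, summary[customer_id])
```
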
