Commit f4bf977

DOC-1059 Add column mapping section (#192)
1 parent 2968860 commit f4bf977

1 file changed: +116 -1 lines changed

modules/components/pages/outputs/snowflake_streaming.adoc

Lines changed: 116 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -72,6 +72,7 @@ output:
7272
CREATE TABLE IF NOT EXISTS mytable (amount NUMBER);
7373
schema_evolution:
7474
enabled: false # No default (required)
75+
ignore_nulls: true
7576
processors: [] # No default (optional)
7677
build_options:
7778
parallelism: 1
@@ -92,6 +93,107 @@ output:
9293
--
9394
======
9495

96+
== Conversion of message data into Snowflake table rows
97+
98+
How message data is converted into Snowflake table rows depends on the:
99+
100+
- Output message contents.
101+
- <<schema_evolution, Schema evolution settings>>.
102+
- Schema of the <<table, target Snowflake table>>.
103+
104+
The following scenarios highlight how these three factors affect data written to the target table.
105+
106+
NOTE: For reduced complexity, consider <<schema_evolution, turning on schema evolution>>, which automatically creates and updates the Snowflake table schema based on message contents.
107+
108+
=== Scenario: Data and table schema match (schema evolution turned on or off)
109+
110+
An output message matches the existing table schema, and the `schema_evolution.enabled` field is set to `true` or `false`.
111+
112+
The target Snowflake table has two columns:
113+
114+
- `product_id` (NUMBER)
115+
- `product_code` (STRING)
116+
117+
A pipeline generates the following message:
118+
119+
```json
120+
{"product_id": 521, "product_code": "EST-PR"}
121+
```
122+
123+
In this scenario:
124+
125+
- The JSON keys in the message (`"product_id"` and `"product_code"`) match column names in the target Snowflake table.
126+
- The message values match the column data types. (If a value did not match its column's data type, the message would be rejected.)
127+
- Redpanda Connect inserts the message values into a new row in the target Snowflake table.
128+
+
129+
|===
130+
| product_id | product_code
131+
132+
^| 521
133+
^| EST-PR
134+
|===
135+
136+
=== Scenario: Data and table schema mismatch (schema evolution turned on)
137+
138+
An output message contains fields that do not exist in the table schema, and the `schema_evolution.enabled` field is set to `true`.
139+
140+
The target Snowflake table has the same two columns as the <<scenario-data-and-table-schema-match-schema-evolution-turned-on-or-off, previous scenario>>:
141+
142+
- `product_id` (NUMBER)
143+
- `product_code` (STRING)
144+
145+
This time, the pipeline generates the following message:
146+
147+
```json
148+
{"product_batch": 11111, "product_color": "yellow"}
149+
```
150+
151+
In this scenario:
152+
153+
- The JSON keys (`"product_batch"` and `"product_color"`) do not match column names in the target Snowflake table.
154+
- Because schema evolution is enabled, Redpanda Connect adds two new columns to the target table, with data types derived from the output message values. For more information about the mapping of data types, see <<supported-data-formats-for-snowflake-columns, Supported data formats for Snowflake columns>>.
155+
- Redpanda Connect inserts the message values into a new table row.
156+
+
157+
|===
158+
| product_id | product_code | product_batch | product_color
159+
160+
^| (null)
161+
^| (null)
162+
^| 11111
163+
^| yellow
164+
165+
|===
166+
+
167+
NOTE: You can <<schema_evolution-processors,configure processors>> to override the schema updates derived from the message values.
168+
169+
=== Scenario: Data and table schema mismatch (schema evolution turned off)
170+
171+
An output message contains fields that do not exist in the table schema, and the `schema_evolution.enabled` field is set to `false`.
172+
173+
The target Snowflake table has the same two columns:
174+
175+
- `product_id` (NUMBER)
176+
- `product_code` (STRING)
177+
178+
The pipeline generates the same message as the <<scenario-data-and-table-schema-mismatch-schema-evolution-turned-on,previous scenario>>:
179+
180+
```json
181+
{"product_batch": 11111, "product_color": "yellow"}
182+
```
183+
184+
In this scenario:
185+
186+
- The JSON keys (`"product_batch"` and `"product_color"`) do not match any existing column names.
187+
- Because schema evolution is turned off, Redpanda Connect ignores the unmatched keys and their values, and inserts a row that contains only null values.
188+
+
189+
|===
190+
| product_id | product_code
191+
192+
^| (null)
193+
^| (null)
194+
195+
|===
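
The three scenarios above can be modeled with a short Python sketch. It is only an illustration of the behavior this section describes, not how Redpanda Connect is implemented: the `convert` function is hypothetical, and Python types stand in for Snowflake column types. It also does not model the `schema_evolution.ignore_nulls` setting.

```python
def convert(message, table_schema, schema_evolution_enabled):
    """Model how one output message becomes one table row.

    table_schema maps a column name to a Python type, used here as a
    stand-in for the Snowflake column type (an assumption for illustration).
    Returns (row, schema); the returned schema may contain new columns.
    """
    schema = dict(table_schema)  # copy, so the caller's schema is untouched
    for key, value in message.items():
        if key in schema:
            # Matching key: the value must fit the column type,
            # otherwise the message is rejected.
            if value is not None and not isinstance(value, schema[key]):
                raise ValueError(f"type mismatch for column {key!r}")
        elif schema_evolution_enabled:
            # Unknown key with schema evolution on: add a column whose
            # type is derived from the message value.
            schema[key] = type(value)
        # Unknown key with schema evolution off: the key is ignored.
    # Columns that the message does not provide are null in the new row.
    row = {column: message.get(column) for column in schema}
    return row, schema


schema = {"product_id": int, "product_code": str}
message = {"product_batch": 11111, "product_color": "yellow"}

row_on, schema_on = convert(message, schema, schema_evolution_enabled=True)
row_off, schema_off = convert(message, schema, schema_evolution_enabled=False)
```

With schema evolution on, `row_on` has four columns and the two original ones are null. With it off, `row_off` has only the two original columns, both null.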
196+
95197
== Supported data formats for Snowflake columns
96198

97199
The message data from your output must match the columns in the Snowflake table that you want to write data to. The following table shows you the https://docs.snowflake.com/en/sql-reference/intro-summary-data-types[column data types supported by Snowflake^] and how they correspond to the xref:guides:bloblang/methods.adoc#type[Bloblang data types] in Redpanda Connect.
@@ -329,9 +431,22 @@ Options to control schema updates when messages are written to the Snowflake tab
329431

330432
=== `schema_evolution.enabled`
331433

332-
Whether schema evolution is enabled. When set to `true`, the Snowflake table is automatically created based on the schema of the first message written to it, if the table does not already exist. As new fields are added to subsequent messages in the pipeline, existing columns are created in the Snowflake table. Any required columns are marked as `nullable` if new messages do not include data for them.
434+
Whether schema evolution is enabled. When set to `true`, the Snowflake table is automatically created based on the schema of the first message written to it, if the table does not already exist. As new fields are added to subsequent messages in the pipeline, new columns are created in the Snowflake table. Any required columns are marked as `nullable` if new messages do not include data for them.
435+
436+
*Type*: `bool`
437+
438+
=== `schema_evolution.ignore_nulls`
439+
440+
When set to `true` and schema evolution is enabled, new columns that have `null` values _are not_ added to the Snowflake table. This behavior:
441+
442+
- Prevents unnecessary schema changes caused by placeholder or incomplete data.
443+
- Avoids creating table columns with incorrect data types.
444+
445+
NOTE: Redpanda does not recommend changing the default setting unless you know in advance the data type of any column that arrives with `null` values.
333446

334447
*Type*: `bool`
448+
449+
*Default*: `true`
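
As a rough illustration of the rule described for `schema_evolution.ignore_nulls`, the following Python sketch selects which message keys would become new columns. The `columns_to_add` function and the `product_size` key are hypothetical, and the behavior with the setting turned off is inferred from the description above rather than taken from the connector's code.

```python
def columns_to_add(message, existing_columns, ignore_nulls=True):
    """Return the message keys that would be added as new table columns.

    Assumes schema evolution is enabled. This models the documented rule
    only; it is not Redpanda Connect's implementation.
    """
    new_columns = []
    for key, value in message.items():
        if key in existing_columns:
            continue  # the column already exists, nothing to add
        if value is None and ignore_nulls:
            continue  # a null value gives no data type to derive a column from
        new_columns.append(key)
    return new_columns


existing = {"product_id", "product_code"}
message = {"product_id": 521, "product_size": None, "product_color": "yellow"}

default_result = columns_to_add(message, existing)
without_ignore = columns_to_add(message, existing, ignore_nulls=False)
```

With the default setting, only `product_color` is added. With `ignore_nulls` set to `false`, `product_size` is also added, even though its `null` value says nothing about the intended data type.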
335450

336451
=== `schema_evolution.processors`
337452
