- Schema of the <<table, target Snowflake table>>.

The following scenarios highlight how these three factors affect data written to the target table.

NOTE: For reduced complexity, consider <<schema_evolution, turning on schema evolution>>, which automatically creates and updates the Snowflake table schema based on message contents.

=== Scenario: Data and table schema match (schema evolution turned on or off)

An output message matches the existing table schema, and the `schema_evolution.enabled` field is set to `true` or `false`.

The target Snowflake table has two columns:

- `product_id` (NUMBER)
- `product_code` (STRING)

A pipeline generates the following message:

```json
{"product_id": 521, "product_code": "EST-PR"}
```

In this scenario:

- The JSON keys in the message (`"product_id"` and `"product_code"`) match column names in the target Snowflake table.
- The message values match the column data types. (If there were a data type mismatch, the message would be rejected.)
- Redpanda Connect inserts the message values into a new row in the target Snowflake table.
+
|===
| product_id | product_code

^| 521
^| EST-PR
|===
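
The checks in this scenario can be sketched as a small Python model. It is an illustration only, not Redpanda Connect's implementation: the `build_row` helper, the `PYTHON_TYPES` mapping, and the use of `ValueError` to represent a rejected message are hypothetical choices made for this example.

```python
# Illustrative model only; not Redpanda Connect code.
# Hypothetical mapping from the two column types in this scenario to Python types.
PYTHON_TYPES = {"NUMBER": (int, float), "STRING": (str,)}


def build_row(table_schema, message):
    """Return the row to insert when every value matches its column type.

    This scenario assumes every message key is an existing column name.
    """
    row = {}
    for key, value in message.items():
        expected = PYTHON_TYPES[table_schema[key]]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"{key!r} does not match {table_schema[key]}")
        row[key] = value
    return row


table_schema = {"product_id": "NUMBER", "product_code": "STRING"}
row = build_row(table_schema, {"product_id": 521, "product_code": "EST-PR"})
print(row)  # {'product_id': 521, 'product_code': 'EST-PR'}
```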

=== Scenario: Data and table schema mismatch (schema evolution turned on)

An output message includes schema updates, and the `schema_evolution.enabled` field is set to `true`.

The target Snowflake table has the same two columns as the <<scenario-data-and-table-schema-match-schema-evolution-turned-on-or-off, previous scenario>>:

- `product_id` (NUMBER)
- `product_code` (STRING)

This time, the pipeline generates the following message:

- The JSON keys (`"product_batch"` and `"product_color"`) do not match column names in the target Snowflake table.
- Because schema evolution is enabled, Redpanda Connect adds two new columns to the target table with data types derived from the output message values. For more information about the mapping of data types, see <<supported-data-formats-for-snowflake-columns, Supported data formats for Snowflake columns>>.
- Redpanda Connect inserts the message values into a new table row.
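
As a rough sketch of the behavior with schema evolution turned on, the following Python model adds a column for each unknown key before building the row. It is not Redpanda Connect's implementation. The helper names, the two-type inference rule, the message values, and the choice to fill unmentioned columns with `None` are all assumptions made for illustration; the real type mapping is the one described in <<supported-data-formats-for-snowflake-columns, Supported data formats for Snowflake columns>>.

```python
# Illustrative model only; not Redpanda Connect code.
def infer_column_type(value):
    """Hypothetical, simplified inference that only knows two column types."""
    if isinstance(value, (int, float)):
        return "NUMBER"
    return "STRING"


def evolve_and_build_row(table_schema, message):
    """Add a column for each unknown key, then build the row to insert."""
    for key, value in message.items():
        if key not in table_schema:
            table_schema[key] = infer_column_type(value)
    # Assumption: columns that the message does not mention are left as None.
    return {column: message.get(column) for column in table_schema}


table_schema = {"product_id": "NUMBER", "product_code": "STRING"}
# Placeholder values: the keys come from the scenario, the values are invented.
message = {"product_batch": 1, "product_color": "red"}
row = evolve_and_build_row(table_schema, message)

print(list(table_schema))
print(row)
```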
- The JSON keys (`"product_batch"` and `"product_color"`) do not match any existing column names.
- Because schema evolution is turned off, Redpanda Connect ignores the extra column names and values and inserts a row of null values.
+
|===
| product_id | product_code

^| (null)
^| (null)

|===
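
With schema evolution turned off, the outcome above amounts to projecting the message onto the existing columns. The following Python sketch models that idea only. It is not Redpanda Connect's implementation, and the helper name and the message values are hypothetical.

```python
# Illustrative model only; not Redpanda Connect code.
def build_row_without_evolution(table_columns, message):
    """Keep only values whose keys are existing columns; drop everything else."""
    return {column: message.get(column) for column in table_columns}


table_columns = ["product_id", "product_code"]
# Placeholder values: the keys come from the scenario, the values are invented.
message = {"product_batch": 1, "product_color": "red"}
row = build_row_without_evolution(table_columns, message)
print(row)  # {'product_id': None, 'product_code': None}
```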

== Supported data formats for Snowflake columns

The message data from your output must match the columns in the Snowflake table that you want to write data to. The following table shows the https://docs.snowflake.com/en/sql-reference/intro-summary-data-types[column data types supported by Snowflake^] and how they correspond to the xref:guides:bloblang/methods.adoc#type[Bloblang data types] in Redpanda Connect.

=== `schema_evolution.enabled`

Whether schema evolution is enabled. When set to `true`, the Snowflake table is automatically created based on the schema of the first message written to it, if the table does not already exist. As new fields are added to subsequent messages in the pipeline, new columns are created in the Snowflake table. Any required columns are marked as `nullable` if new messages do not include data for them.

*Type*: `bool`

=== `schema_evolution.ignore_nulls`

When set to `true` and schema evolution is enabled, new columns that have `null` values _are not_ added to the Snowflake table. This behavior:

- Prevents unnecessary schema changes caused by placeholder or incomplete data.
- Avoids creating table columns with incorrect data types.

NOTE: Redpanda does not recommend updating the default setting unless you are confident about the data type of `null` columns in advance.
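
The effect of `schema_evolution.ignore_nulls` can be modeled with a short Python sketch. It is not Redpanda Connect's implementation: the `new_columns` helper and the example message are hypothetical, and the sketch only shows which message keys would become new columns when schema evolution is enabled.

```python
# Illustrative model only; not Redpanda Connect code.
def new_columns(table_columns, message, ignore_nulls):
    """Return the message keys that would become new table columns."""
    added = []
    for key, value in message.items():
        if key in table_columns:
            continue
        if ignore_nulls and value is None:
            # A null carries no data type information, so skip the column.
            continue
        added.append(key)
    return added


table_columns = ["product_id", "product_code"]
# Hypothetical message: product_color holds a placeholder null.
message = {"product_id": 521, "product_batch": 1, "product_color": None}

print(new_columns(table_columns, message, ignore_nulls=True))   # ['product_batch']
print(new_columns(table_columns, message, ignore_nulls=False))  # ['product_batch', 'product_color']
```

In this model, a column skipped because of a `null` value can still be added later, once a message supplies a non-null value for it.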