Commit 3adaa58

add details about data type mismatch
1 parent 1c7d529 commit 3adaa58

1 file changed

+10
-14
lines changed

src/connections/storage/warehouses/schema.md

Lines changed: 10 additions & 14 deletions
@@ -401,32 +401,28 @@ ORDER BY day
401401
| 2014-07-20 | $1,595 |
402402
| 2014-07-21 | $2,350 |
403403

404+
## Schema Evolution and Compatibility
405+
404406
### New Columns
405407

406408
New event properties and traits create columns. Segment processes the incoming data in batches, based on either data size or an interval of time. If the table doesn't exist, we lock and create the table. If the table exists but new columns need to be created, we perform a diff and alter the table to append new columns.
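
The create-or-alter decision described above can be sketched roughly as follows. This is an illustration only, not Segment's implementation: the function name `plan_schema_changes`, the dictionary-based schema representation, and the generated SQL strings are all assumptions made for the example, and table locking and identifier quoting are left out.

```python
from typing import Dict, List, Optional


def plan_schema_changes(
    table: str,
    existing_columns: Optional[Dict[str, str]],
    incoming_columns: Dict[str, str],
) -> List[str]:
    """Return the SQL statements needed to fit a batch into a table.

    existing_columns is None when the table does not exist yet.
    Both dictionaries map column name -> data type (hypothetical shape).
    """
    if existing_columns is None:
        # Table is missing: create it with every column seen in the batch.
        cols = ", ".join(f"{name} {dtype}" for name, dtype in incoming_columns.items())
        return [f"CREATE TABLE {table} ({cols})"]

    # Table exists: diff the batch against it and append only the new columns.
    new_columns = {
        name: dtype
        for name, dtype in incoming_columns.items()
        if name not in existing_columns
    }
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {dtype}"
        for name, dtype in new_columns.items()
    ]
```

In this sketch, existing columns are never altered or dropped; a batch that introduces no new properties produces an empty list of statements.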
407409

408410
When Segment processes a new batch and discovers a new column to add, we take the most recent occurrence of a column and choose its datatype.
409411

412+
### Data Types
410413

411-
### Supported Data Types
412-
Data types are set up in your warehouse based on the first value that comes in from a source. For example, if the first value that came in from a source was a string, Segment would set the data type in the warehouse to `string`.
413-
414-
The data types that Segment currently supports include:
415-
416-
#### `timestamp`
417-
418-
#### `integer`
414+
The data types that Segment currently supports include `timestamp`, `integer`, `float`, `boolean`, and `varchar`.
419415

420-
#### `float`
416+
Data types are set up in your warehouse based on the first value that comes in from a source. For example, if the first value that came in from a source was a string, Segment would set the data type in the warehouse to `string`.
421417

422-
#### `boolean`
418+
In cases where a data type is determined incorrectly, the support team can help you update the data type. As an example, if a field can include float values as well as integers, but the first value we received was an integer, we will set the data type of the field to integer, resulting in a loss of precision.
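
To make the integer/float case concrete, here is a small sketch of first-value type inference. The helper names `infer_type` and `coerce` are hypothetical, the sketch covers only a subset of the supported types (no `timestamp` detection), and truncating later floats to integers is an assumption about how the precision loss shows up, not a description of any particular warehouse's behavior.

```python
def infer_type(value):
    """Pick a column type from a single sample value (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    return "varchar"


def coerce(value, dtype):
    """Force a value into an already-decided column type (assumed behavior)."""
    if dtype == "integer":
        return int(value)  # drops the fractional part
    if dtype == "float":
        return float(value)
    if dtype == "boolean":
        return bool(value)
    return str(value)


# The first value seen is an integer, so the column is typed as integer...
values = [3, 3.75, 4.2]
column_type = infer_type(values[0])

# ...and the later float values lose their fractional part.
stored = [coerce(v, column_type) for v in values]
```

Had `3.75` arrived first, the column would have been typed as `float` and all three values would have been stored without loss.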
423419

424-
#### `varchar`
420+
To update the data type, the support team will update the internal schema that Segment uses to infer your warehouse schema. We will start syncing the data with the correct data type after the change is made. However, if you want to backfill all historical data correctly, you must drop the impacted tables on your end so Segment can recreate them with the correct data type, and then backfill those tables.
425421

426-
> note " "
427-
> To change data types after they've been determined, please reach out to [Segment Support](https://segment.com/help/contact) for assistance.
422+
To request data type changes, reach out to [Segment Support](https://segment.com/help/contact) for assistance, and provide these details for each affected column in the following format:
423+
`<schema_name>.<table_name>.<column_name>.<current_datatype>.<new_datatype>`
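
If you have many columns to report, a small helper along these lines can build and sanity-check the request strings. It is only a sketch: `TypeChangeRequest`, `format_request`, and `parse_request` are hypothetical names, and the schema, table, and column values used to call them would be your own.

```python
from typing import NamedTuple


class TypeChangeRequest(NamedTuple):
    """One affected column, with fields in the order the format above expects."""
    schema_name: str
    table_name: str
    column_name: str
    current_datatype: str
    new_datatype: str


def format_request(req: TypeChangeRequest) -> str:
    """Join the five fields with dots."""
    return ".".join(req)


def parse_request(line: str) -> TypeChangeRequest:
    """Split a request string back into fields, rejecting malformed input."""
    parts = line.strip().split(".")
    if len(parts) != 5 or not all(parts):
        raise ValueError(f"expected 5 dot-separated fields, got: {line!r}")
    return TypeChangeRequest(*parts)
```

For example, `format_request(TypeChangeRequest("prod", "order_completed", "revenue", "integer", "float"))` yields `prod.order_completed.revenue.integer.float`. Because the format is dot-delimited, this sketch cannot represent names that themselves contain a dot.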
428424

429-
## Column Sizing
425+
### Column Sizing
430426

431427
After analyzing the data from dozens of customers, we set the string column length limit at 512 characters. Longer strings are truncated. We found this was the sweet spot for good performance and ignoring non-useful data.
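
As a rough illustration of the limit, the truncation amounts to the following. The constant and function names are hypothetical, and the sketch counts Python characters; whether a given warehouse measures the limit in characters or bytes is not covered here.

```python
MAX_STRING_LENGTH = 512  # string column length limit described above


def truncate_string(value: str, limit: int = MAX_STRING_LENGTH) -> str:
    """Keep at most `limit` characters; shorter strings pass through unchanged."""
    return value[:limit]
```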
432428
