Commit 1d7face

DOC-361 First pass of HTML table in the doc
1 parent 49b2dda commit 1d7face

1 file changed

+109
-4
lines changed

src/connections/storage/warehouses/schema.md

Lines changed: 109 additions & 4 deletions
@@ -229,10 +229,115 @@ AND table_name = '<event>'
229229
ORDER by column_name
230230
```
231231

232-
> info "Note"
233-
> If you send us an array, we stringify it in Redshift. That way you don't end up having to pollute your events. It won't work if you have a lot of array elements but should work decently to store and query those. We also flatten nested objects. 
232+
### How event tables handle nested objects and arrays
233+
234+
In order to preserve the quality of your events data, Segment uses the following methods to store objects and arrays in the event tables:
235+
236+
<table border="1" cellspacing="0" cellpadding="0" width="100%">
237+
<tr>
238+
<th> Value Type </th>
239+
<th> Field Type </th>
240+
<th> Transformation </th>
241+
<th> Schema (Example) </th>
242+
<th> Code (Example) </th>
243+
</tr>
244+
245+
<tr>
246+
<td><b>Object</b></td>
247+
<td> Context </td>
248+
<td> Flatten </td>
249+
<td>
250+
251+
``` json
252+
context: {
253+
app: {
254+
version: "1.0.0"
255+
}
256+
}
257+
```
258+
259+
</td>
260+
<td>
261+
<b>Column Name:</b><br/>
262+
context_app_version
263+
<br/><br/>
264+
<b>Value:</b><br/>
265+
"1.0.0"
266+
</td>
267+
</tr>
268+
269+
<tr>
270+
<td></td>
271+
<td> Traits </td>
272+
<td> Flatten </td>
273+
<td>
274+
275+
```json
276+
traits: {
277+
address: {
278+
street: "6th Street"
279+
}
280+
}
281+
```
234282

283+
</td>
284+
<td>
285+
<b>Column Name:</b><br/>
286+
address_street<br/>
287+
<br/>
288+
<b>Value:</b><br/>
289+
"6th Street"
290+
</td>
291+
</tr>
292+
293+
<tr>
294+
<td></td>
295+
<td>Properties </td>
296+
<td>Stringify</td>
297+
<td>
298+
299+
```json
300+
properties: {
301+
product_id: {
302+
sku: "G-32"
303+
}
304+
}
305+
```
306+
307+
</td>
308+
<td>
309+
<b>Column Name:</b><br/>
310+
product_id<br/><br/>
311+
<b>Value:</b><br/>
312+
"{sku.'G-32'}"
313+
</td>
314+
</tr>
315+
316+
<tr>
317+
<td>
318+
<b>Array</b>
319+
</td>
320+
<td>Any</td>
321+
<td>Stringify</td>
322+
<td>
323+
324+
```json
325+
products: {
326+
product_id: [
327+
"507f1f77bcf86cd799439011", "505bd76785ebb509fc183733"
328+
]
329+
}
330+
```
235331

332+
</td>
333+
<td>
334+
<b>Column Name:</b> <br/>
335+
product_id <br/><br/>
336+
<b>Value:</b>
337+
"[507f1f77bcf86cd799439011, 505bd76785ebb509fc183733]"
338+
</td>
339+
</tr>
340+
</table>
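
The rules in the table above can be sketched in a few lines of Python. This is only an illustration, not Segment's actual pipeline: the function name `to_columns` is hypothetical, and `json.dumps` stands in for whatever serialization the warehouse loader really uses, so the exact stringified form may differ from the values shown in the table.

```python
import json


def to_columns(event):
    """Map one event dict to {column_name: value}, following the table's rules.

    Hypothetical helper for illustration only.
    """
    columns = {}

    def flatten(prefix, obj):
        # Nested objects become one column per leaf, joined with underscores.
        for key, value in obj.items():
            name = f"{prefix}_{key}" if prefix else key
            if isinstance(value, dict):
                flatten(name, value)
            elif isinstance(value, list):
                # Arrays are stringified regardless of field type.
                columns[name] = json.dumps(value)
            else:
                columns[name] = value

    # Context objects are flattened and keep the "context" prefix.
    flatten("context", event.get("context", {}))
    # Trait objects are flattened without a prefix.
    flatten("", event.get("traits", {}))
    # Property objects and arrays are stringified into a single column.
    for key, value in event.get("properties", {}).items():
        if isinstance(value, (dict, list)):
            columns[key] = json.dumps(value)  # assumed serialization
        else:
            columns[key] = value
    return columns


event = {
    "context": {"app": {"version": "1.0.0"}},
    "traits": {"address": {"street": "6th Street"}},
    "properties": {"product_id": {"sku": "G-32"}},
}
print(to_columns(event))
```

With the sample event, the result contains the columns `context_app_version`, `address_street`, and `product_id`, matching the column names in the table.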
236341

237342
## Tracks vs. Events Tables
238343

@@ -303,7 +408,7 @@ New event properties and traits create columns. Segment processes the incoming d
303408
304409
When Segment processes a new batch and discovers a new column to add, it uses the most recent occurrence of that column to choose its data type.
305410

306-
The datatypes that we support right now are
411+
The data types that we currently support include
307412

308413
- `timestamp`
309414
- `integer` 
@@ -325,7 +430,7 @@ All four timestamps pass through to your Warehouse for every ETL'd event. In mos
325430

326431
`timestamp` is the UTC-converted timestamp which is set by the Segment library. If you are importing historical events using a server-side library, this is the timestamp you'll want to reference in your queries.
327432

328-
`original_timestamp` is the original timestamp set by the Segment library at the time the event is created. Keep in mind, this timestamp can be affected by device clock skew. You can override this value by manually passing in a value for `timestamp` which will then be relabed as `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column.
433+
`original_timestamp` is the original timestamp set by the Segment library at the time the event is created. Keep in mind, this timestamp can be affected by device clock skew. You can override this value by manually passing in a value for `timestamp` which will then be relabeled as `original_timestamp`. Generally, this timestamp should be ignored in favor of the `timestamp` column.
329434

330435
`sent_at` is the UTC timestamp set by the library when the Segment API call was sent. This timestamp can also be affected by device clock skew.
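
To make the clock-skew point concrete, here is a minimal sketch of how a skew-corrected `timestamp` could be derived from the other three values. It assumes the correction `received_at - (sent_at - original_timestamp)`, where `received_at` is the server-side receive time; the function name `corrected_timestamp` is hypothetical and the sample values are invented for illustration.

```python
from datetime import datetime, timezone


def corrected_timestamp(original_timestamp, sent_at, received_at):
    """Return a skew-corrected event time (assumed formula, see lead-in).

    sent_at - original_timestamp is the delay between creating and sending
    the event, measured entirely on the device clock, so any constant skew
    cancels out. Subtracting that delay from the server-side received_at
    gives an event time on the server clock.
    """
    return received_at - (sent_at - original_timestamp)


# Invented sample values: the device clock runs 10 minutes behind the server.
original = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
sent = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)
received = datetime(2024, 1, 1, 12, 10, 5, tzinfo=timezone.utc)

print(corrected_timestamp(original, sent, received).isoformat())
```

Because both device-side values share the same skew, only their difference is used, which is why `timestamp` is generally the safer column to query.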
331436
