Commit 3625bb4

Warehouse doc improvements
1 parent 0e82b99 commit 3625bb4

3 files changed

+120
-110
lines changed

src/connections/storage/warehouses/faq.md

Lines changed: 11 additions & 8 deletions
Original file line number | Diff line number | Diff line change
@@ -58,7 +58,7 @@ The only restriction when loading your own data into your connected warehouse is
5858

5959
If you want to insert custom data into your warehouse, create new schemas that are not associated with an existing source, since these may be deleted upon a reload of the Segment data in the cluster.
6060

61-
We highly recommend scripting any sort of additions of data you might have to warehouse, so that you aren't doing one-off tasks that can be hard to recover from in the future in the case of hardware failure.
61+
Segment highly recommends scripting any sort of additions of data you might have to warehouse, so that you aren't doing one-off tasks that can be hard to recover from in the future in the case of hardware failure.
6262

6363
## Which IPs should I whitelist?
6464

@@ -69,12 +69,12 @@ If you're in the EU region, use CIDR `3.251.148.96/29`.
6969
> info ""
7070
> EU workspace regions are currently in beta. If you would like to learn more about the beta, please contact your account manager.
7171
72-
BigQuery does not require whitelisting an IP address. To learn how to set up BigQuery, check out our [set up guide](/docs/connections/storage/catalog/bigquery/#getting-started)
72+
BigQuery does not require whitelisting an IP address. To learn how to set up BigQuery, check out Segment's BigQuery [set up guide](/docs/connections/storage/catalog/bigquery/#getting-started)
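As a rough illustration of what the EU CIDR block `3.251.148.96/29` covers, the following sketch uses Python's standard `ipaddress` module to list the range and test whether an address falls inside it. The sample addresses and the `is_allowed` helper are hypothetical and exist only for this example; actual allowlisting is configured in your warehouse or firewall, not in code like this.

```python
import ipaddress

# CIDR block quoted in the FAQ for the EU region.
EU_NETWORK = ipaddress.ip_network("3.251.148.96/29")


def is_allowed(address: str) -> bool:
    """Return True if the address falls inside the EU CIDR block (hypothetical helper)."""
    return ipaddress.ip_address(address) in EU_NETWORK


# A /29 block spans eight consecutive addresses.
first_address = str(EU_NETWORK.network_address)
last_address = str(EU_NETWORK.broadcast_address)
address_count = EU_NETWORK.num_addresses

inside = is_allowed("3.251.148.100")   # sample address inside the block
outside = is_allowed("3.251.148.104")  # sample address just past the block
```

Here the block runs from `3.251.148.96` through `3.251.148.103`, so a firewall rule written against the CIDR admits all eight addresses.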
7373

7474

7575
## Will Segment sync my historical data?
7676

77-
We will automatically load up to 2 months of your historical data when you connect a warehouse.
77+
Segment will automatically load up to two months of your historical data when you connect a warehouse.
7878

7979
For full historical backfills you'll need to be a Segment Business plan customer. If you'd like to learn more about our Business plan and all the features that come with it, [check out our pricing page](https://segment.com/pricing).
8080

@@ -93,12 +93,15 @@ When you create a new source, the source syncs to all warehouse(s) in the worksp
9393

9494
After a source is created, you can enable or disable a warehouse sync within the Warehouse Settings page.
9595

96-
## Can I be notified on warehouse sync failures?
96+
## Can I be notified when warehouse syncs fail?
9797

98-
You will recieve notifications in the Segment app for warehouse sync failiures.
98+
If you enabled activity notifications for your storage destination, you will receive notifications in the Segment app when your warehouse syncs fail.
9999

100-
To view the notifications:
101-
1.
100+
To sign up for warehouse sync notifications:
101+
1. Open the Segment app.
102+
2. Go to **Settings** > **User Preferences**.
103+
3. In the Activity Notifications section, select **Storage Destinations**.
104+
4. Enable **Storage Destination Sync Failed**.
102105

103106
## How is my data formatted in my warehouse?
104107

@@ -109,7 +112,7 @@ Data in your warehouse is formatted into **schemas**, which involve a detailed d
109112
If your syncs fail, you will need to reach out to [Segment Support](https://segment.com/help/) to ask for a backfill. Be sure to include the following information in your request:
110113
- The warehouse that requires the backfill
111114
- What sources you need information from
112-
- The timeframe of data that requires a backfill
115+
- The date range of data that requires a backfill
113116

114117
## Can I change my schema names once they've been created?
115118

src/connections/storage/warehouses/schema.md

Lines changed: 108 additions & 101 deletions
@@ -5,11 +5,116 @@ title: Warehouse Schemas
55
A **schema** describes the way that the data in a warehouse is organized. Schemas include a detailed description of database elements (tables, views, indexes, synonyms, etc.) and the relationships that exist between elements.
66

77
Schemas of warehouse data are organized into the following template: <br/>
8-
`<source>.<collection>.<property>` eg. `segment-engineering.tracks.userId`, where Source refers to the source or project name (segment-engineering), collection refers to the event (tracks), and the property refers to the data being collected (userId).
8+
`<source>.<collection>.<property>` for example `segment-engineering.tracks.userId`, where Source refers to the source or project name (segment-engineering), collection refers to the event (tracks), and the property refers to the data being collected (userId).
99

10-
> note "Data warehouse column creation"
10+
> note "Warehouse column creation"
1111
> **Note:** Segment creates tables for each of your custom events in your warehouse, with columns for each event's custom properties. Segment does not allow unbounded `event` or `property` spaces in your data. Instead of recording events like "Ordered Product 15", use a single property of "Product Number" or similar.
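The note above can be illustrated with a small sketch. It assumes, as the note states, one warehouse table per custom event name, and compares dynamic event names against a single event with a property. The payload shape and the `product_number` key are hypothetical, chosen only for this example.

```python
# Hypothetical product numbers seen in three orders.
product_numbers = [15, 16, 17]

# Unbounded event space: every product number produces a new event name,
# and therefore (per the note above) a new table.
dynamic_events = [f"Ordered Product {n}" for n in product_numbers]

# Bounded event space: one event name, with the number carried as a property.
bounded_events = [
    {"event": "Ordered Product", "properties": {"product_number": n}}
    for n in product_numbers
]

tables_from_dynamic = set(dynamic_events)
tables_from_bounded = {e["event"] for e in bounded_events}
```

With dynamic names the number of tables grows with the catalog (three here), while the bounded form keeps a single table with a `product_number` column.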
1212
13+
### How warehouse tables handle nested objects and arrays
14+
15+
Segment's libraries pass nested objects and arrays into tracking calls as **properties**, **traits**, and **tracking calls**. To preserve the quality of your events data, Segment uses the following methods to store properties and traits in database tables:
16+
17+
- The warehouse connector stringifies all **properties** that contain a nested **array/object**
18+
- The warehouse connector stringifies all **context fields** that contain a nested **array**
19+
- The warehouse connector stringifies all **traits** that contain a nested **array**
20+
- The warehouse connector "flattens" all **traits** that contain a nested **object**
21+
- The warehouse connector optionally stringifies **arrays** when they follow our [Ecommerce spec](/docs/connections/spec/ecommerce/v2/)
22+
- The warehouse connector "flattens" all **context fields** that contain a nested **object** (for example, context.field.nestedA.nestedB becomes a column called context_field_nestedA_nestedB)
23+
24+
<table>
25+
<thead>
26+
<tr>
27+
<th> Field </th>
28+
<th> Code (Example) </th>
29+
<th> Schema (Example) </th>
30+
</tr>
31+
</thead>
32+
33+
<tr>
34+
<td><b>Object (Context):</b> Flatten </td>
35+
<td markdown="1">
36+
37+
``` json
38+
context: {
39+
app: {
40+
version: "1.0.0"
41+
}
42+
}
43+
```
44+
</td>
45+
<td>
46+
<b>Column Name:</b><br/>
47+
context_app_version
48+
<br/><br/>
49+
<b>Value:</b><br/>
50+
"1.0.0"
51+
</td>
52+
</tr>
53+
54+
<tr>
55+
<td> <b>Object (Traits):</b> Flatten </td>
56+
<td markdown= "1">
57+
58+
```json
59+
traits: {
60+
address: {
61+
street: "6th Street"
62+
}
63+
}
64+
```
65+
66+
</td>
67+
<td>
68+
<b>Column Name:</b><br/>
69+
address_street<br/>
70+
<br/>
71+
<b>Value:</b><br/>
72+
"6th Street"
73+
</td>
74+
</tr>
75+
76+
<tr>
77+
<td><b>Object (Properties):</b> Stringify</td>
78+
<td markdown="1">
79+
80+
```json
81+
properties: {
82+
product_id: {
83+
sku: "G-32"
84+
}
85+
}
86+
```
87+
</td>
88+
<td>
89+
<b>Column Name:</b><br/>
90+
product_id<br/><br/>
91+
<b>Value:</b><br/>
92+
"{sku.'G-32'}"
93+
</td>
94+
</tr>
95+
96+
<tr>
97+
<td><b>Array (Any):</b> Stringify</td>
98+
<td markdown="1">
99+
100+
```json
101+
products: {
102+
product_id: [
103+
"507f1", "505bd"
104+
]
105+
}
106+
```
107+
108+
</td>
109+
<td>
110+
<b>Column Name:</b> <br/>
111+
product_id <br/><br/>
112+
<b>Value:</b>
113+
"[507f1, 505bd]"
114+
</td>
115+
</tr>
116+
</table>
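The flatten and stringify rules in the table above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Segment's implementation: the `flatten` and `stringify_nested` helpers are hypothetical, and `json.dumps` stands in for whatever serialization the warehouse connector applies, so the stored string may be formatted differently from what this produces.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested objects into underscore-joined column names.

    Arrays are stringified, matching the rule for context fields and traits.
    """
    columns = {}
    for key, value in obj.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            columns.update(flatten(value, name))
        elif isinstance(value, list):
            columns[name] = json.dumps(value)  # assumed serialization
        else:
            columns[name] = value
    return columns


def stringify_nested(obj):
    """Keep one column per top-level key; stringify nested objects and arrays."""
    return {
        key: json.dumps(value) if isinstance(value, (dict, list)) else value
        for key, value in obj.items()
    }


# Context objects flatten with a "context" prefix; traits flatten without one.
context_columns = flatten({"app": {"version": "1.0.0"}}, prefix="context")
trait_columns = flatten({"address": {"street": "6th Street"}})

# Properties keep a single column whose value is the stringified object or array.
property_columns = stringify_nested({"product_id": {"sku": "G-32"}})
array_columns = stringify_nested({"product_id": ["507f1", "505bd"]})
```

The column names produced here (`context_app_version`, `address_street`, `product_id`) line up with the Schema (Example) column of the table; only the string encoding of the stringified values is an assumption.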
117+
13118
## Warehouse tables
14119

15120
The table below describes the schema in Segment Warehouses:
@@ -237,104 +342,6 @@ AND table_name = '<event>'
237342
ORDER by column_name
238343
```
239344

240-
### How event tables handle nested objects and arrays
241-
242-
To preserve the quality of your events data, Segment uses the following methods to store objects and arrays in the event tables:
243-
244-
<table>
245-
<thead>
246-
<tr>
247-
<th> Field </th>
248-
<th> Code (Example) </th>
249-
<th> Schema (Example) </th>
250-
</tr>
251-
</thead>
252-
253-
<tr>
254-
<td><b>Object (Context):</b> Flatten </td>
255-
<td markdown="1">
256-
257-
``` json
258-
context: {
259-
app: {
260-
version: "1.0.0"
261-
}
262-
}
263-
```
264-
</td>
265-
<td>
266-
<b>Column Name:</b><br/>
267-
context_app_version
268-
<br/><br/>
269-
<b>Value:</b><br/>
270-
"1.0.0"
271-
</td>
272-
</tr>
273-
274-
<tr>
275-
<td> <b>Object (Traits):</b> Flatten </td>
276-
<td markdown= "1">
277-
278-
```json
279-
traits: {
280-
address: {
281-
street: "6th Street"
282-
}
283-
}
284-
```
285-
286-
</td>
287-
<td>
288-
<b>Column Name:</b><br/>
289-
address_street<br/>
290-
<br/>
291-
<b>Value:</b><br/>
292-
"6th Street"
293-
</td>
294-
</tr>
295-
296-
<tr>
297-
<td><b>Object (Properties):</b> Stringify</td>
298-
<td markdown="1">
299-
300-
```json
301-
properties: {
302-
product_id: {
303-
sku: "G-32"
304-
}
305-
}
306-
```
307-
</td>
308-
<td>
309-
<b>Column Name:</b><br/>
310-
product_id<br/><br/>
311-
<b>Value:</b><br/>
312-
"{sku.'G-32'}"
313-
</td>
314-
</tr>
315-
316-
<tr>
317-
<td><b>Array (Any):</b> Stringify</td>
318-
<td markdown="1">
319-
320-
```json
321-
products: {
322-
product_id: [
323-
"507f1", "505bd"
324-
]
325-
}
326-
```
327-
328-
</td>
329-
<td>
330-
<b>Column Name:</b> <br/>
331-
product_id <br/><br/>
332-
<b>Value:</b>
333-
"[507f1, 505bd]"
334-
</td>
335-
</tr>
336-
</table>
337-
338345
## Tracks vs. Events Tables
339346

340347
To see the tables for your organization, you can run this query:
@@ -418,7 +425,7 @@ Float values
418425
Boolean data types
419426

420427
### `varchar`
421-
Varchar, or variable character data types,
428+
Varchar, or the variable character data type,
422429

423430
## Column Sizing
424431

src/connections/storage/warehouses/warehouse-syncs.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ Your plan determines how frequently data is synced to your warehouse.
2323
| Team | Twice a day |
2424
| Business | Up to 24 times a day. Generally, these syncs are fixed to the top of the hour (:00), but these times can vary. |
2525

26-
If you're a Business plan member and would like to manage the data you send to your warehouse, use [Warehouse Selective Sync](#warehouse-selective-sync).
26+
If you're a Business plan member and would like to adjust your sync frequency, you can do so using the Sync Schedule feature. To enable Sync Schedule, please go to **Warehouse** > **Settings** > **Sync Schedule**.
2727

2828
## Sync History
2929
You can use the Sync History page to see the status and history of data updates in your warehouse. The Sync History page is available for every source connected to each warehouse. This page helps you answer questions like, “Has the data from a specific source been updated recently?” “Did a sync completely fail, or only partially fail?” and “Why wasn’t this sync successful?”
