Commit 8ac97b2

Small content edits
1 parent 33a3103 commit 8ac97b2

5 files changed

+11
-13
lines changed

src/_includes/content/how-a-sync-works.md

Lines changed: 1 addition & 1 deletion
@@ -3,4 +3,4 @@ When Segment loads data into your warehouse, each sync goes through the followin
 2. **Scan:** Segment finds new events in AWS S3 and updated objects in Dynamo.
 3. **Download:** Segment pulls the events and objects into a staging area.
 4. **Process:** The raw Segment event and object archive files are transformed into database-specific formats. The [warehouse schema](/docs/connections/storage/warehouses/schema/) is also defined in this step.
-5. **Load:** Segment de-duplicates the transformed data and loads it into your warehouse. If you have queries set up in your warehouse, they run after the data is loaded into your warehouse. <br/>***This is the only step that connects to your warehouse: all other steps are internal to Segment.***
+5. **Load:** Segment de-duplicates the transformed data and loads it into your warehouse. If you have queries set up in your warehouse, they run after the data is loaded into your warehouse. ***This is the only step that connects to your warehouse: all other steps are internal to Segment.***
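
The Load step in the diff above says that data is de-duplicated before it is loaded, but it does not say how. The following is only a minimal sketch of one common approach, keeping the first record seen per key. The `id` field and the `dedupe_by_id` helper are assumptions made for illustration, not Segment's actual implementation.

```python
from typing import Iterable


def dedupe_by_id(records: Iterable[dict], key: str = "id") -> list[dict]:
    """Keep the first record seen for each key value, preserving input order.

    The `id` key is a hypothetical choice; the source does not state which
    field Segment uses to detect duplicates.
    """
    seen = set()
    unique = []
    for record in records:
        value = record[key]
        if value in seen:
            continue  # a record with this key was already kept
        seen.add(value)
        unique.append(record)
    return unique


batch = [
    {"id": "a", "event": "Signed Up"},
    {"id": "b", "event": "Ordered Product"},
    {"id": "a", "event": "Signed Up"},  # duplicate delivery of the first record
]
deduped = dedupe_by_id(batch)
```

Keeping the first occurrence is an arbitrary choice here; a real loader might instead prefer the most recently received copy.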

src/connections/storage/warehouses/faq.md

Lines changed: 2 additions & 4 deletions
@@ -46,9 +46,7 @@ Your warehouse id appears in the URL when you look at the [warehouse destination

 Data is available in Warehouses within 24-48 hours, depending on your tier's sync frequency. For more information about sync frequency by tier, see [Sync Frequency](/docs/connections/storage/warehouses/warehouse-syncs/#sync-frequency).

-Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time because of the way Redshift uses large batches to optimize and compress columns. To optimize for your query speed, reliability, and robustness, Segment guarantees that your data will be available in Redshift within 24 hours. The underlying Redshift datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience, Segment needs to balance all three of these.
-
-As Segment improves and updates the ETL processes and optimizes for SQL query performance downstream, the actual load time will vary, but Segment ensures it's always within 24 hours.
+Real-time loading of the data into Segment Warehouses would cause significant performance degradation at query time. To optimize for your query speed, reliability, and robustness, Segment guarantees that your data will be available in your warehouse within 24 hours. The underlying datastore has a subtle tradeoff between data freshness, robustness, and query speed. For the best experience, Segment needs to balance all three of these.

 ## What if I want to add custom data to my warehouse?

@@ -109,7 +107,7 @@ Data in your warehouse is formatted into **schemas**, which involve a detailed d

 ## If my syncs fail and get fixed, will I need to ask for a backfill?

-If your syncs fail, you will need to reach out to [Segment Support](https://segment.com/help/) to ask for a backfill. Be sure to include the following information in your request:
+Yes, if your syncs fail, you will need to reach out to [Segment Support](https://segment.com/help/) to ask for a backfill. Be sure to include the following information in your request:
 - The warehouse that requires the backfill
 - What sources you need information from
 - The date range of data that requires a backfill

src/connections/storage/warehouses/index.md

Lines changed: 1 addition & 1 deletion
@@ -34,7 +34,7 @@ Examples of data warehouses include Amazon Redshift, Google BigQuery, and Postgr

 [How do I give users permissions to my warehouse?](/docs/connections/storage/warehouses/add-warehouse-users/)

-Check out our [Frequently Asked Questions about Warehouses](/docs/connections/storage/warehouses/faq/) and [a list of helpful Redshift queries to get you started](/docs/connections/storage/warehouses/redshift-useful-sql).
+Check out our [Frequently Asked Questions about Warehouses](/docs/connections/storage/warehouses/faq/) and [a list of helpful SQL queries to get you started with Redshift ](/docs/connections/storage/warehouses/redshift-useful-sql).

 ## FAQs

src/connections/storage/warehouses/schema.md

Lines changed: 6 additions & 6 deletions
@@ -2,10 +2,8 @@
 title: Warehouse Schemas
 ---

-A **schema** describes the way that the data in a warehouse is organized. Schemas include a detailed description of database elements (tables, views, indexes, synonyms, etc.) and the relationships that exist between elements.
-
-Schemas of warehouse data are organized into the following template: <br/>
-`<source>.<collection>.<property>` for example `segment-engineering.tracks.userId`, where Source refers to the source or project name (segment-engineering), collection refers to the event (tracks), and the property refers to the data being collected (userId).
+A **schema** describes the way that the data in a warehouse is organized. Schemas of warehouse data are organized into the following template:
+`<source>.<collection>.<property>`, for example `segment-engineering.tracks.userId`, where source refers to the source or project name (segment-engineering), collection refers to the event (tracks), and the property refers to the data being collected (userId).

 > note "Warehouse column creation"
 > **Note:** Segment creates tables for each of your custom events in your warehouse, with columns for each event's custom properties. Segment does not allow unbounded `event` or `property` spaces in your data. Instead of recording events like "Ordered Product 15", use a single property of "Product Number" or similar.
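
To make the `<source>.<collection>.<property>` template in the hunk above concrete, here is a small sketch that splits such a path into its three parts. The `SchemaPath` type and `parse_schema_path` function are hypothetical names introduced for illustration; the sketch also assumes that none of the three parts contains a dot.

```python
from typing import NamedTuple


class SchemaPath(NamedTuple):
    """One `<source>.<collection>.<property>` reference (hypothetical type)."""
    source: str         # source or project name, e.g. "segment-engineering"
    collection: str     # event collection, e.g. "tracks"
    property_name: str  # the data being collected, e.g. "userId"


def parse_schema_path(path: str) -> SchemaPath:
    """Split a dotted path into exactly three non-empty parts."""
    parts = path.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected <source>.<collection>.<property>, got {path!r}")
    return SchemaPath(*parts)


example = parse_schema_path("segment-engineering.tracks.userId")
```

Here `example.source` is `segment-engineering`, `example.collection` is `tracks`, and `example.property_name` is `userId`, matching the example given in the changed line.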
@@ -137,7 +135,7 @@ The table below describes the schema in Segment Warehouses:

 ## Identifies table

-The `identifies` table stores the `.identify()` method calls =. Query it to find out user-level information. It has the following columns:
+The `identifies` table stores the `.identify()` method calls. Query it to find out user-level information. It has the following columns:

 | method | property |
 | --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
@@ -428,7 +426,7 @@ The data types that Segment currently supports include:

 After analyzing the data from dozens of customers, we set the string column length limit at 512 characters. Longer strings are truncated. We found this was the sweet spot for good performance and ignoring non-useful data.

-We special-case compression for some known columns, like event names and timestamps. The others default to LZO. We may add look-ahead sampling down the road, but from inspecting the datasets today this would be unnecessary complexity.
+We special-case compression for some known columns, like event names and timestamps. The others default to LZO. We may add look-ahead sampling down the road, but from inspecting the datasets today this would be unnecessary complex.

 ## Timestamps

@@ -476,4 +474,6 @@ All tables use `received_at` for the sort key. Amazon Redshift stores your data

 [How do I give users permissions to my warehouse?](/docs/connections/storage/warehouses/add-warehouse-users/)

+[How frequently does data sync to my warehouse?](/docs/connections/storage/warehouses/warehouse-syncs/#sync-frequency)
+
 Check out our [Frequently Asked Questions about Warehouses](/docs/connections/storage/warehouses/faq/) and [a list of helpful Redshift queries to get you started](/docs/connections/storage/warehouses/redshift-useful-sql).
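
The schema.md context lines above mention that string columns are limited to 512 characters and that longer strings are truncated. As a rough sketch of what that means for a single value, the snippet below cuts a string at that limit. The `truncate_string` helper and the `MAX_STRING_LENGTH` constant are hypothetical, and the sketch assumes the limit counts characters rather than bytes, which the source does not specify.

```python
MAX_STRING_LENGTH = 512  # limit quoted in the schema docs; unit assumed to be characters


def truncate_string(value: str, limit: int = MAX_STRING_LENGTH) -> str:
    """Return the value unchanged if it fits, otherwise its first `limit` characters."""
    return value[:limit]


short_value = truncate_string("Ordered Product")
long_value = truncate_string("x" * 600)
```

With this sketch, `short_value` is left untouched, while `long_value` is cut down to 512 characters.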

src/connections/storage/warehouses/warehouse-syncs.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ Instead of constantly streaming data to the warehouse destination, Segment loads

 {% include content/how-a-sync-works.md %}

-Warehouses sync with all data coming from your source.
+Warehouses sync with all data coming from your source. However, Business plan members can manage the data that is sent to your warehouses using [Selective Sync](#warehouse-selective-sync).

 ## Sync Frequency
