Commit ae3fbdb
add reverse etl method
1 parent a176d60 commit ae3fbdb

1 file changed: +20 -4 lines changed

src/guides/how-to-guides/import-historical-data.md

Lines changed: 20 additions & 4 deletions
@@ -6,6 +6,9 @@ When transitioning over to Segment, customers commonly want to import historical
 
 **Note:** Historical imports can only be done into destinations that can accept historical timestamped data. Most analytics tools like Mixpanel, Amplitude, or Kissmetrics can handle that type of data just fine. One common destination that doesn't accept historical data is Google Analytics, since their API cannot accept historical data.
 
+## Method 1: Using a Custom Solution
+
+### General Instructions
 Use any [server-side library](https://segment.com/docs/connections/sources/#server), which sends requests in batches to improve performance. Once you have data to import, follow the steps below:
 
 1. Export or collect the data to be imported.
@@ -14,7 +17,7 @@ Use any [server-side library](https://segment.com/docs/connections/sources/#ser
 
 2. Decide which destinations need to receive the data.
 
-**By default, data coming into Segment will be forwarded to all destinations linked to a given source.** To limit data to specific destinations, the `integrations` object must be modified. With historical data, you often only want to send the data to a specific destination or into your data warehouse. For example, in [Node.js](https://segment.com/docs/connections/sources/catalog/libraries/server/node/#integrations) set the `integrations` object as follows.
+**By default, data coming into Segment will be forwarded to all destinations connected to a given source.** To limit data to specific destinations, the `integrations` object must be modified. With historical data, you often only want to send the data to a specific destination or into your data warehouse. For example, in [Node.js](https://segment.com/docs/connections/sources/catalog/libraries/server/node/#integrations) set the `integrations` object as follows.
 ```js
 analytics.track({
   event: 'Upgraded Membership',
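
The hunk above truncates the `integrations` example. A minimal complete sketch of such a call, assuming the classic `analytics-node` client (the write key, user ID, and destination name are illustrative placeholders):

```js
const Analytics = require('analytics-node');
const analytics = new Analytics('YOUR_WRITE_KEY'); // placeholder write key

analytics.track({
  userId: '019mr8mf4r', // placeholder user ID
  event: 'Upgraded Membership',
  properties: { plan: 'premium' },
  // For historical imports, set the original event time explicitly.
  timestamp: new Date('2020-01-15T09:30:00.000Z'),
  // Suppress all destinations, then opt in only the ones that should
  // receive the historical data.
  integrations: {
    'All': false,
    'Mixpanel': true
  }
});
```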
@@ -25,7 +28,13 @@ Use any [server-side library](https://segment.com/docs/connections/sources/#ser
 
 3. Once you've done that, you'll need to write an application or worker to send the data to Segment.
 
-You will need to cycle through each set of data and map it to a Segment server-side library method or build an array matching the [HTTP Import API format](https://segment.com/docs/connections/sources/catalog/libraries/server/http/#import). **Note**, we recommend using a Segment library for this process, as they will set contextual message fields like `message_id` (used for deduping) and `sent_at` (used for correctly client clock skew) that our API will use to ensure correct behavior upon ingestion. The server-side libraries will automatically batch requests to optimize for performance and prevent linear request volume. This batching behavior is modifiable. Some of the libraries implement a configurable max queue size that may discard messages if you enqueue requests much faster than the client can flush them. We recommend overriding the max queue size parameter for the library to a high value you're comfortable you can remain under in your batch job.
+You will need to cycle through each set of data and map it to a Segment server-side library method or build an array matching the [HTTP Import API format](https://segment.com/docs/connections/sources/catalog/libraries/server/http/#import).
+
+**Tip**: We recommend using a Segment library for this process, as it will set contextual message fields like `message_id` (used for deduping) and `sent_at` (used for correcting client clock skew) that our API uses to ensure correct behavior upon ingestion.
+
+**Tip**: The server-side libraries automatically batch requests to optimize for performance and prevent linear request volume. This batching behavior is modifiable, and some of the underlying libraries implement a configurable max queue size that may discard messages if you enqueue requests much faster than the client can flush them. We recommend overriding the library's max queue size parameter to a high value you're comfortable you can stay under in your batch job.
+
+### Demo projects
 
 One of Segment's Success Engineers wrote an alpha prototype Node.js app for importing data utilizing the HTTP API, which we've included below:
 
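
To make the batching tips above concrete, here is a hedged sketch, again assuming the classic `analytics-node` client; option names and the availability of a max queue size setting vary by library and version, and `historicalRecords` stands in for your exported data set:

```js
const Analytics = require('analytics-node');

// Batching is tuned at construction time. `flushAt` and `flushInterval`
// are the classic analytics-node options; other libraries name these
// (and any max queue size cap) differently.
const analytics = new Analytics('YOUR_WRITE_KEY', {
  flushAt: 100,         // flush after 100 queued messages...
  flushInterval: 10000  // ...or after 10 seconds, whichever comes first
});

for (const record of historicalRecords) { // assumed shape of your export
  analytics.track({
    userId: record.userId,
    event: record.event,
    properties: record.properties,
    timestamp: new Date(record.occurredAt) // original event time
  });
}

// Drain any remaining queued messages before the batch job exits.
analytics.flush();
```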
@@ -35,6 +44,13 @@ Additionally, one of Segment's Software Engineers developed a React App with mor
 
 [Desktop React CSV uploader](https://github.com/segmentio/desktop-csv-uploader){:target="_blank"}
 
-If a server-side library doesn't meet your needs, use the Segment [bulk import HTTP API](https://segment.com/docs/connections/sources/catalog/libraries/server/http/#import). **Note**, if you're using the HTTP API directly to replay data you've exported from Segment, we recommend removing the original `sent_at`, `message_id`, and `project_id` fields from the archived message before forwarding them to Segment.
-
 [MarketLytics](http://marketlytics.com/){:target="_blank"} has documented their experience using the alpha prototype importer and offer some [helpful visuals and tips](http://marketlytics.com/blog/import-historic-data-to-segment){:target="_blank"}.
+
+### Alternative solution
+If a server-side library doesn't meet your needs, you can use the Segment [bulk import HTTP API](https://segment.com/docs/connections/sources/catalog/libraries/server/http/#import) directly.
+
+**Note:** If the historical data was originally exported from Segment, we recommend removing the original `sent_at`, `message_id`, and `project_id` fields from the archived messages before forwarding them to Segment.
+
+## Method 2: Using Reverse ETL
+
+Please refer to the [Reverse ETL guide](https://segment.com/docs/connections/reverse-etl/){:target="_blank"} for more details.
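
As a sketch of the alternative solution above, the following replays archived messages through the bulk import endpoint while stripping the fields called out in the note. The `archive.json` file name and message shape are assumptions; the `/v1/batch` endpoint and write-key Basic auth come from the HTTP API docs:

```js
const fs = require('fs');

// Assumed: archive.json holds an array of previously exported Segment messages.
const messages = JSON.parse(fs.readFileSync('archive.json', 'utf8'));

// Drop the fields Segment originally stamped on each message so the API
// re-stamps them on ingestion.
const batch = messages.map(({ sent_at, message_id, project_id, ...msg }) => msg);

// The HTTP API uses the write key as the Basic-auth username (empty password).
// Requires Node 18+ for the global fetch.
fetch('https://api.segment.io/v1/batch', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': 'Basic ' + Buffer.from('YOUR_WRITE_KEY:').toString('base64')
  },
  body: JSON.stringify({ batch })
}).then(res => console.log('Import request status:', res.status));
```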
