Segment's [BigQuery](https://cloud.google.com/bigquery/){:target="_blank"} connector makes it easy to load web, mobile, and third-party source data like Salesforce, Zendesk, and Google AdWords into a BigQuery data warehouse. When you integrate BigQuery with Segment, you get a fully managed data pipeline loaded into a powerful and cost-effective data warehouse.

The Segment warehouse connector runs a periodic ETL (Extract - Transform - Load) process to pull raw events and objects from your sources and load them into your BigQuery cluster. For more information about the ETL process, including how it works and common ETL use cases, refer to [Google Cloud's ETL documentation](https://cloud.google.com/learn/what-is-etl){:target="_blank"}.

## Getting Started
To store your Segment data in BigQuery, complete the following steps:
1. [Create a project and enable BigQuery](#create-a-project-and-enable-bigquery)
2. [Create a service account for Segment](#create-a-service-account-for-segment)
3. [Create the warehouse in Segment](#create-the-warehouse-in-segment)

### Create a Project and Enable BigQuery
1. Navigate to the [Google Developers Console](https://console.developers.google.com/){:target="_blank"}.
2. Configure the [Google Cloud Platform](https://console.cloud.google.com/){:target="_blank"}:
- If you don't have a project already, [create one](https://support.google.com/cloud/answer/6251787?hl=en&ref_topic=6158848){:target="_blank"}.
   - If you have an existing project, [enable the BigQuery API](https://cloud.google.com/bigquery/quickstart-web-ui){:target="_blank"}. Once you've done so, you should see BigQuery in the "Resources" section of Cloud Platform.
3. Copy the project ID. You will need it when you create a warehouse source in the Segment app.

> note "Enable billing"
> When creating your project, you must [enable billing](https://support.google.com/cloud/answer/6293499#enable-billing){:target="_blank"} so Segment can write into the cluster.

### Create a service account for Segment

1. From the Navigation panel on the left, select **IAM & admin** > **Service accounts**.
2. Click **Create Service Account**.
3. Enter a name for the service account (for example, `segment-warehouses`) and click **Create**.
4. When assigning permissions, make sure to grant the service account the following roles:
   - `BigQuery Data Owner`
   - `BigQuery Job User`
5. [Create a JSON key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys){:target="_blank"}. The downloaded file will be used to create your warehouse in the Segment app.

If you have trouble creating a new service account, refer to [Google Cloud's documentation about service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts){:target="_blank"} for more information.

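If you prefer to script these steps, they can be sketched with the gcloud CLI. This is a sketch rather than part of Segment's instructions: it assumes the gcloud CLI is installed and authenticated, and uses a placeholder project ID (`my-project`) along with the example service account name from step 3:

```shell
# Create the service account (name from step 3 above)
gcloud iam service-accounts create segment-warehouses

# Grant the two required roles (replace my-project with your project ID)
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:segment-warehouses@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataOwner"

gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:segment-warehouses@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Create and download the JSON key used in the Segment app
gcloud iam service-accounts keys create segment-key.json \
  --iam-account="segment-warehouses@my-project.iam.gserviceaccount.com"
```
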
### Create the Warehouse in Segment
1. From the homepage of the Segment app, select **Connections**, click **Add Destination**, and search for "BigQuery".
2. Click **BigQuery**.
3. Select the source(s) you'd like to sync with the BigQuery destination, and click **Next**.
4. Enter a name for your destination in the **Name your destination** field.
5. Enter your Project ID in the **Project ID** field. <br/>**Optional:** Enter a [region code](https://cloud.google.com/compute/docs/regions-zones/){:target="_blank"} in the **Location** field (the default is "US").
6. Copy the contents of the JSON key that you created for the Segment service account into the **Credentials** field.
7. Click **Connect**.
8. If Segment is able to connect to your project, a warehouse will be created and your first sync will begin shortly.

## Schema
BigQuery datasets are broken down into [**tables**](#partitioned-tables) and [**views**](#views). **Tables** contain duplicate data; **views** do _not_ contain duplicate data.

### Partitioned Tables
The Segment connector takes advantage of partitioned tables. Partitioned tables allow you to query a subset of data, thus increasing query performance and decreasing costs.

To query a full table, use the following command:

```sql
select *
from <project-id>.<source-name>.<collection-name>
```

To query a specific partitioned table, use the following command:

```sql
select *
from <project-id>.<source-name>.<collection-name>$20160809
```

### Views

A [view](https://cloud.google.com/bigquery/querying-data#views){:target="_blank"} is a virtual table defined by a SQL query. Segment uses views in the de-duplication process to ensure that the events that you are querying are unique events and contain the latest objects from third-party data. All Segment views are set up to show information from the last 60 days. [Segment recommends querying from these views when possible](#use-views) to avoid duplicate events and historical objects.

Views are appended with `_view`, which you can query using this format:

```sql
select *
from <project-id>.<source-name>.<collection-name>_view
```

### Use views

BigQuery charges based on the amount of data scanned by your queries. Views are a derived view over your tables that Segment uses for de-duplication of events. Therefore, Segment recommends you query a specific view whenever possible to avoid duplicate events and historical objects. It's important to note that BigQuery views are not cached.

> note "Understanding BigQuery views"
> BigQuery's views are logical views, not materialized views, which means that the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query.

To save money, you can query the view and set a [destination table](https://cloud.google.com/bigquery/docs/tables){:target="_blank"}, and then query the destination table.

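As a sketch of this cost-saving pattern (not from Segment's docs; it assumes the `bq` CLI is installed and authenticated, and uses hypothetical project, dataset, and table names):

```shell
# Materialize the de-duplicated view once into a destination table
bq query \
  --use_legacy_sql=false \
  --destination_table=my_dataset.pages_deduped \
  'SELECT * FROM `my-project.my_source.pages_view`'

# Subsequent queries scan the smaller destination table instead of the view
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) FROM `my_dataset.pages_deduped`'
```
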
### Query structure

If you typically start exploratory data analysis with `SELECT *`, consider specifying the fields to reduce costs.

Refer to the section on [partitioned tables](#partitioned-tables) for details on querying sub-sets of tables.

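For example, a narrower query scans only the referenced columns (the column and table names here are illustrative and may not match your schema):

```sql
-- Scans only the listed columns instead of every field in the table
select anonymous_id, context_page_path, timestamp
from <project-id>.<source-name>.pages_view
```
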
### I need more than 60 days of data in my views. Can I change the view definition?

Yes! You just need to modify one of the references to `60` in the view definition to the number of days of your choosing. You can update the definition of the view as long as the name stays the same.

Here is the base query Segment uses when first setting up your views. Included in the base query are the placeholders (`%s.%s.%s`) that you would want to include the project, source, and collection names. The query assigns a row number to each copy of an event and keeps only the latest one with `WHERE ROW_NUMBER = 1`.

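The shape of that de-duplication query can be sketched as follows. This is an illustration of the pattern rather than Segment's exact view definition: the `id` and `loaded_at` columns and the simplified partition filter are assumptions.

```sql
SELECT * EXCEPT (row_number)
FROM (
  SELECT *,
         ROW_NUMBER() OVER (PARTITION BY id ORDER BY loaded_at DESC) AS row_number
  FROM `<project-id>.<source-name>.<collection-name>`
  -- keep the last 60 days; change 60 here to widen the view
  WHERE _PARTITIONTIME > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 60 DAY)
)
WHERE row_number = 1
```
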
BigQuery offers both a scalable, pay-as-you-go pricing plan based on the amount of data scanned and a flat-rate monthly plan. You can learn more about BigQuery pricing [on Google Cloud's BigQuery pricing page](https://cloud.google.com/bigquery/pricing){:target="_blank"}.

BigQuery allows you to set up [Cost Controls and Alerts](https://cloud.google.com/bigquery/cost-controls){:target="_blank"} to help control and monitor costs. If you want to learn more about the costs associated with BigQuery, Google Cloud has provided [a calculator](https://cloud.google.com/products/calculator/){:target="_blank"} to estimate your costs.

### How do I query my data in BigQuery?

You can connect a BI tool like Mode or Looker to BigQuery, or query directly from the BigQuery console.

BigQuery now supports standard SQL, which you can enable [using Google Cloud's query UI](https://cloud.google.com/bigquery/docs/reference/standard-sql/introduction#changing_from_the_default_dialect){:target="_blank"}. This does not work with views, or with a query that uses table range functions.

If you have a need for streaming data into BigQuery, contact Segment support.

{% include content/warehouse-sync-sched.md %}
## Troubleshooting

### I see duplicates in my tables.

This behavior is expected. Segment only de-duplicates data in your views. Refer to the [schema section](#schema) for more details.
0 commit comments