src/connections/storage/catalog/bigquery/index.md
+77 -83 (77 additions & 83 deletions)
Original file line number
Diff line number
Diff line change
@@ -5,95 +5,96 @@ redirect_from:
5
5
- '/connections/warehouses/catalog/bigquery/'
6
6
---
7
7
8
-
Segment's [BigQuery](https://cloud.google.com/bigquery/) connector makes it easy
8
+
Segment's [BigQuery](https://cloud.google.com/bigquery/){:target="_blank"} connector makes it easy
9
9
to load web, mobile, and third-party source data like Salesforce, Zendesk, and
10
-
Google AdWords into a BigQuery data warehouse. This guide will explain how to
11
-
set up BigQuery and start loading data into it.
10
+
Google AdWords into a BigQuery data warehouse. When you integrate BigQuery with Segment, you get a fully managed data pipeline that loads into a powerful and cost-effective data warehouse.
12
11
13
-
The Segment warehouse connector runs a periodic ETL (Extract - Transform - Load)
14
-
process to pull raw events and objects and load them into your BigQuery cluster.
15
-
16
-
Using BigQuery through Segment means you'll get a fully managed data pipeline
17
-
loaded into one of the most powerful and cost-effective data warehouses today.
12
+
The Segment warehouse connector runs a periodic ETL (Extract - Transform - Load) process to pull raw events and objects from your sources and load them into your BigQuery cluster.
13
+
For more information about the ETL process, including how it works and common ETL use cases, refer to [Google Cloud's ETL documentation](https://cloud.google.com/learn/what-is-etl){:target="_blank"}.
18
14
19
15
## Getting Started
20
16
21
17
To store your Segment data in BigQuery, complete the following steps:
22
-
- [Enable BigQuery for your Google Cloud project](#create-a-project-and-enable-bigquery)
23
-
- [Create a GCP service account for Segment to assume](#create-a-service-account-for-segment)
24
-
- [Create a warehouse in the Segment app](#create-the-warehouse-in-segment)
18
+
1. [Create a project and enable BigQuery](#create-a-project-and-enable-bigquery)
19
+
2. [Create a service account for Segment](#create-a-service-account-for-segment)
20
+
3. [Create the warehouse in Segment](#create-the-warehouse-in-segment)
25
21
26
22
### Create a Project and Enable BigQuery
27
23
28
-
1. Navigate to the [Google Developers Console](https://console.developers.google.com/)
- If you don't have a project already, [create one](https://support.google.com/cloud/answer/6251787?hl=en&ref_topic=6158848).
31
-
- If you have an existing project, you will need to [enable the BigQuery API](https://cloud.google.com/bigquery/quickstart-web-ui).
32
-
Once you've done so, you should see BigQuery in the "Resources" section of Cloud Platform.
33
-
- **Note:** make sure [billing is enabled](https://support.google.com/cloud/answer/6293499#enable-billing) on your project, or Segment will not be able to write into the cluster.
34
-
3. Copy the project ID. You will need it when you create a warehouse source in the Segment app.
24
+
To create a project and enable BigQuery:
25
+
1. Navigate to the [Google Developers Console](https://console.developers.google.com/){:target="_blank"}.
26
+
2. Configure the [Google Cloud Platform](https://console.cloud.google.com/){:target="_blank"}:
27
+
- If you don't have a project already, [create one](https://support.google.com/cloud/answer/6251787?hl=en&ref_topic=6158848){:target="_blank"}.
28
+
- If you have an existing project, [enable the BigQuery API](https://cloud.google.com/bigquery/quickstart-web-ui){:target="_blank"}. Once you've done so, you should see BigQuery in the "Resources" section of Cloud Platform.
29
+
3. Copy the project ID. You'll need it when you create a warehouse source in the Segment app.
35
30
36
-
### Create a Service Account for Segment
31
+
> note "Enable billing"
32
+
> When you create your project, you must [enable billing](https://support.google.com/cloud/answer/6293499#enable-billing){:target="_blank"} so Segment can write into the cluster.
37
33
38
-
Refer to [Google Cloud's documentation about service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts)
39
-
for more information.
34
+
### Create a service account for Segment
40
35
41
-
1. From the Navigation panel on the left, go to **IAM & admin** > **Service accounts**
42
-
2. Click **Create Service Account** along the top
43
-
3. Enter a name for the service account (for example: "segment-warehouses") and click **Create**
44
-
4. When assigning permissions, make sure to grant the following roles:
36
+
To create a service account for Segment:
37
+
1. From the Navigation panel on the left, select **IAM & admin** > **Service accounts**.
38
+
2. Click **Create Service Account**.
39
+
3. Enter a name for the service account (for example, `segment-warehouses`) and click **Create**.
40
+
4. Assign the service account the following roles:
45
41
- `BigQuery Data Owner`
46
42
- `BigQuery Job User`
47
-
5. [Create a JSON key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys).
48
-
The downloaded file will be used to create your warehouse in the next section.
43
+
5. [Create a JSON key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys){:target="_blank"}.
44
+
The downloaded file will be used to create your warehouse in the Segment app.
45
+
46
+
If you have trouble creating a new service account, refer to [Google Cloud's documentation about service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts){:target="_blank"} for more information.
49
47
50
48
### Create the Warehouse in Segment
51
49
52
-
1. In Segment, go to **Workspace** > **Add destination** > Search for "BigQuery"
53
-
2. Select **BigQuery**
54
-
3. Add a name for the destination to the **Name your destination** field
55
-
4. Enter your project ID in the **Project** field
56
-
5. Copy the contents of the credentials (the JSON key) into the **Credentials** field <br/>
57
-
**Optional:** Enter a [region code](https://cloud.google.com/compute/docs/regions-zones/) in the **Location** field (the default will be "US")
58
-
6. Click **Connect**
59
-
7. If Segment can connect with the provided **Project ID** and **Credentials**, a warehouse will be created and your first sync should begin shortly
50
+
To create the warehouse in Segment:
51
+
1. From the homepage of the Segment app, select **Connections > Add Destination** and search for **BigQuery**.
52
+
2. Click **BigQuery**.
53
+
3. Select the source(s) you'd like to sync with the BigQuery destination, and click **Next**.
54
+
4. Enter a name for your destination in the **Name your destination** field.
55
+
5. Enter your Project ID in the **Project ID** field.
56
+
<br/>*Optional:* Enter a [region code](https://cloud.google.com/compute/docs/regions-zones/){:target="_blank"} in the **Location** field (the default is *US*).
57
+
6. Copy the contents of the JSON key that you created for the Segment service account into the **Credentials** field.
58
+
7. Click **Connect**.
59
+
60
+
If Segment is able to connect to your project, a warehouse will be created and your first sync will begin shortly.
60
61
61
-
###Schema
62
+
## Schema
62
63
63
-
BigQuery datasets are broken down into **tables** and **views**. **Tables**
64
-
contain duplicate data, **views** do _not_.
64
+
BigQuery datasets are broken down into [**tables**](#partitioned-tables) and [**views**](#views). **Tables**
65
+
contain duplicate data; **views** do _not_ contain duplicate data.
65
66
66
-
####Partitioned Tables
67
+
### Partitioned Tables
67
68
68
-
The Segment connector takes advantage of [partitioned
1. Create a [new Service Account for Segment](#create-a-service-account-for-segment) using the linked instructions.
127
129
2. Verify that the data is loading into your warehouse.
128
-
3. Sign in to the [Google Developers Console](https://console.developers.google.com).
130
+
3. Sign in to the [Google Developers Console](https://console.developers.google.com){:target="_blank"}.
129
131
4. Open the IAM & Admin product, and select **IAM**.
130
132
5. From the list of projects, select the project that has BigQuery enabled.
131
133
6. On the project's page, select the **Permissions** tab, and then click **view by PRINCIPALS**.
132
134
7. Select the checkbox for the `[email protected]` account and then click **Remove** to remove access to this shared Service Account.
133
135
134
-
For more information about managing IAM access, see Google's documentation, [Manage access to projects, folders, and organization](https://cloud.google.com/iam/docs/granting-changing-revoking-access).
136
+
For more information about managing IAM access, refer to Google's documentation, [Manage access to projects, folders, and organization](https://cloud.google.com/iam/docs/granting-changing-revoking-access){:target="_blank"}.
135
137
136
138
137
139
## Best Practices
@@ -142,35 +144,30 @@ BigQuery charges based on the amount of data scanned by your queries. Views are
142
144
a derived view over your tables that Segment uses for de-duplication of events.
143
145
Therefore, Segment recommends you query a specific view whenever possible to avoid
144
146
duplicate events and historical objects. It's important to note that BigQuery
145
-
views are not cached.
147
+
views aren't cached.
148
+
149
+
> note "Understanding BigQuery views"
150
+
> BigQuery's views are logical views, not materialized views, which means that the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query.
146
151
147
-
> BigQuery's views are logical views, not materialized views, which means that
148
-
> the query that defines the view is re-executed every time the view is queried.
149
-
> Queries are billed according to the total amount of data in all table fields
150
-
> referenced directly or indirectly by the top-level query.
151
-
152
-
To save more money, you can query the view and set a [destination
153
-
table](https://cloud.google.com/bigquery/docs/tables), and then query the
152
+
To save money, you can query the view and set a [destination
153
+
table](https://cloud.google.com/bigquery/docs/tables){:target="_blank"}, and then query the
154
154
destination table.
155
155
156
156
### Query structure
157
157
158
-
If you typically start exploratory data analysis with `SELECT *` consider
158
+
If you start exploratory data analysis with `SELECT *`, consider
159
159
specifying the fields to reduce costs.
160
160
161
-
See the section on [partitioned tables](#partitioned-tables) for details on
161
+
Refer to the section on [partitioned tables](#partitioned-tables) for details on
162
162
querying sub-sets of tables.
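
As a rough illustration of why naming fields reduces cost, the following Python sketch compares the bytes a query would reference under `SELECT *` with a narrower field list. The column names and per-column sizes are hypothetical, and the model (referenced bytes equal the sum of the selected columns' sizes) is a simplification of the billing rule described above, not BigQuery's actual accounting.

```python
# Hypothetical per-column storage sizes, in bytes, for one table.
# These names and numbers are illustrative assumptions, not Segment's schema.
COLUMN_BYTES = {
    "id": 2_000_000,
    "received_at": 1_000_000,
    "event": 3_000_000,
    "context_user_agent": 40_000_000,
}


def referenced_bytes(columns=None):
    """Return the bytes referenced by a query over the given columns.

    Passing None models SELECT *, which references every column.
    """
    if columns is None:
        columns = list(COLUMN_BYTES)
    return sum(COLUMN_BYTES[name] for name in columns)


select_star = referenced_bytes()
narrow = referenced_bytes(["id", "event"])
print(f"SELECT * references {select_star:,} bytes")
print(f"SELECT id, event references {narrow:,} bytes")
```

In this toy table, one wide column dominates the total, so leaving it out of the field list removes most of the referenced data.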
163
163
164
164
165
165
## FAQs
166
166
167
167
### I need more than 60 days of data in my views. Can I change the view definition?
168
168
169
-
Absolutely! You will just need to modify one of the references to 60 in the view
170
-
definition to the number of days of your choosing.
171
-
172
-
Segment chose 60 days as it suits the needs of most customers. However,
173
-
you're welcome to update the definition of the view as long as the name stays
169
+
Yes! You just need to modify one of the references to `60` in the view
170
+
definition to the number of days of your choosing. You can update the definition of the view as long as the name stays
174
171
the same.
175
172
176
173
Here is the base query Segment uses when first setting up your views. Included in the base query are the placeholders (`%s.%s.%s`) that you would want to include the project,
@@ -191,38 +188,35 @@ WHERE ROW_NUMBER = 1
191
188
192
189
BigQuery offers either a scalable, pay-as-you-go pricing plan based on the amount
193
190
of data scanned, or a flat-rate monthly cost. You can learn more about BigQuery
pricing [on Google Cloud's BigQuery pricing page](https://cloud.google.com/bigquery/pricing){:target="_blank"}.
195
192
196
193
BigQuery allows you to set up [Cost Controls and
197
-
Alerts](https://cloud.google.com/bigquery/cost-controls) to help control and
198
-
monitor costs. If you want to learn more about what BigQuery will cost you,
199
-
they've provided [this
200
-
calculator](https://cloud.google.com/products/calculator/) to estimate your
194
+
Alerts](https://cloud.google.com/bigquery/cost-controls){:target="_blank"} to help control and
195
+
monitor costs. If you want to learn more about the costs associated with BigQuery,
196
+
Google Cloud provides [a
197
+
calculator](https://cloud.google.com/products/calculator/){:target="_blank"} to estimate your
201
198
costs.
202
199
203
200
### How do I query my data in BigQuery?
204
201
205
-
You can connect to BigQuery using a BI tool like Mode or Looker, or query
202
+
You can connect a BI tool like Mode or Looker to BigQuery, or query
206
203
directly from the BigQuery console.
207
204
208
-
BigQuery now supports standard SQL, which you can enable using their query UI.
209
-
This does not work with views, or with a query that uses table range
205
+
BigQuery supports standard SQL, which you can enable [using Google Cloud's query UI](https://cloud.google.com/bigquery/docs/reference/standard-sql/introduction#changing_from_the_default_dialect){:target="_blank"}.
206
+
This doesn't work with views, or with a query that uses table range
210
207
functions.
211
208
212
209
### Does Segment support streaming inserts?
213
210
214
-
Segment's connector does not support streaming inserts at this time. If you have
215
-
a need for streaming data into BigQuery, [contact Segment support](https://segment.com/requests/integrations/).
211
+
Segment's connector doesn't support streaming inserts at this time. If you have
212
+
a need for streaming data into BigQuery, [contact Segment support](https://segment.com/requests/integrations/){:target="_blank"}.
216
213
217
214
### Can I customize my sync schedule?
218
215
219
216
{% include content/warehouse-sync-sched.md %}
220
217
221
-

222
-
223
218
## Troubleshooting
224
219
225
-
### I'm seeing duplicates in my tables.
220
+
### I see duplicates in my tables.
226
221
227
-
This behavior is expected. Segment only de-duplicates data in your views. See the
228
-
section on [views](#views) for more details.
222
+
This behavior is expected. Segment only de-duplicates data in your views. Refer to the [schema section](#schema) for more details.
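
To illustrate what de-duplication in a view means, here is a minimal Python sketch of the `ROW_NUMBER = 1` idea: keep one row per identifier and discard the rest. The field names `id` and `loaded_at`, and the choice to keep the most recently loaded row, are assumptions made for illustration; the actual view definition may differ.

```python
def deduplicate(rows):
    """Keep one row per id: the one with the latest loaded_at value."""
    latest = {}
    for row in rows:
        current = latest.get(row["id"])
        # ISO 8601 timestamps in the same format compare correctly as strings.
        if current is None or row["loaded_at"] > current["loaded_at"]:
            latest[row["id"]] = row
    return list(latest.values())


# Hypothetical table contents: the event with id "a" was loaded twice.
table_rows = [
    {"id": "a", "loaded_at": "2023-01-01T00:00:00Z", "event": "Signed Up"},
    {"id": "a", "loaded_at": "2023-01-02T00:00:00Z", "event": "Signed Up"},
    {"id": "b", "loaded_at": "2023-01-01T00:00:00Z", "event": "Order Completed"},
]

view_rows = deduplicate(table_rows)
print(len(table_rows), "rows in the table,", len(view_rows), "rows in the view")
```

The table keeps every loaded copy, while the view-like result holds a single row per `id`, which is why counts taken from tables can exceed counts taken from views.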