
Commit ea63f01

Editing pass
1 parent 285450a commit ea63f01

File tree

1 file changed (+42, -51 lines)
  • src/connections/storage/catalog/bigquery


src/connections/storage/catalog/bigquery/index.md

Lines changed: 42 additions & 51 deletions
@@ -7,77 +7,75 @@ redirect_from:

 Segment's [BigQuery](https://cloud.google.com/bigquery/){:target="_blank"} connector makes it easy
 to load web, mobile, and third-party source data like Salesforce, Zendesk, and
-Google AdWords into a BigQuery data warehouse. This guide will explain how to
-set up BigQuery and start loading data into it.
+Google AdWords into a BigQuery data warehouse. When you integrate BigQuery with Segment you get a fully managed data pipeline
+loaded into a powerful and cost-effective data warehouse.

-The Segment warehouse connector runs a periodic ETL (Extract - Transform - Load)
-process to pull raw events and objects and load them into your BigQuery cluster.
-
-Using BigQuery with Segment means you'll get a fully managed data pipeline
-loaded into one of the most powerful and cost-effective data warehouses today.
+The Segment warehouse connector runs a periodic ETL (Extract - Transform - Load) process to pull raw events and objects from your sources and load them into your BigQuery cluster.
+For more information about the ETL process, including how it works and common ETL use cases, refer to [Google Cloud's ETL documentation](https://cloud.google.com/learn/what-is-etl){:target="_blank"}.

 ## Getting Started

 To store your Segment data in BigQuery, complete the following steps:
-1. [Enable BigQuery for your Google Cloud project](#create-a-project-and-enable-bigquery)
-2. [Create a GCP service account for Segment to assume](#create-a-service-account-for-segment)
-3. [Create a warehouse in the Segment app](#create-the-warehouse-in-segment)
+1. [Create a project and enable BigQuery](#create-a-project-and-enable-bigquery)
+2. [Create a service account for Segment](#create-a-service-account-for-segment)
+3. [Create the warehouse in Segment](#create-the-warehouse-in-segment)

 ### Create a Project and Enable BigQuery

 1. Navigate to the [Google Developers Console](https://console.developers.google.com/){:target="_blank"}.
-2. Configure [Cloud Platform](https://console.cloud.google.com/){:target="_blank"}:
+2. Configure the [Google Cloud Platform](https://console.cloud.google.com/){:target="_blank"}:
 - If you don't have a project already, [create one](https://support.google.com/cloud/answer/6251787?hl=en&ref_topic=6158848){:target="_blank"}.
-- If you have an existing project, you will need to [enable the BigQuery API](https://cloud.google.com/bigquery/quickstart-web-ui){:target="_blank"}.
-Once you've done so, you should see BigQuery in the "Resources" section of Cloud Platform.
-- **Note:** make sure [billing is enabled](https://support.google.com/cloud/answer/6293499#enable-billing){:target="_blank"} on your project, or Segment will not be able to write into the cluster.
+- If you have an existing project, [enable the BigQuery API](https://cloud.google.com/bigquery/quickstart-web-ui){:target="_blank"}. Once you've done so, you should see BigQuery in the "Resources" section of Cloud Platform.
 3. Copy the project ID. You will need it when you create a warehouse source in the Segment app.

-### Create a Service Account for Segment
+> note "Enable billing"
+> When creating your project, you must [enable billing](https://support.google.com/cloud/answer/6293499#enable-billing){:target="_blank"} so Segment can write into the cluster.
+
+### Create a service account for Segment

 1. From the Navigation panel on the left, select **IAM & admin** > **Service accounts**.
-2. Click **Create Service Account** along the top.
-3. Enter a name for the service account (for example: "segment-warehouses") and click **Create**.
-4. When assigning permissions, make sure to grant the following roles:
+2. Click **Create Service Account**.
+3. Enter a name for the service account (for example `segment-warehouses`) and click **Create**.
+4. When assigning permissions, make sure to grant the service account the following roles:
 - `BigQuery Data Owner`
 - `BigQuery Job User`
 5. [Create a JSON key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys){:target="_blank"}.
 The downloaded file will be used to create your warehouse in the Segment app.

-Refer to [Google Cloud's documentation about service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts){:target="_blank"} for more information.
+If you have trouble creating a new service account, refer to [Google Cloud's documentation about service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts){:target="_blank"} for more information.

 ### Create the Warehouse in Segment

-1. In Segment, go to **Workspace** > **Add Destination** > Search for "BigQuery"
+1. From the homepage of the Segment app, select **Connections**, click **Add Destination** and search for "BigQuery".
 2. Click **BigQuery**.
 3. Select the source(s) you'd like to sync with the BigQuery destination, and click **Next**.
 3. Enter a name for your destination in the **Name your destination** field.
-4. Enter your project ID in the **Project ID** field.
-<br/>**Optional:** Enter a [region code](https://cloud.google.com/compute/docs/regions-zones/){:target="_blank"} in the **Location** field (the default will be "US".)
-5. Copy the contents of the JSON key into the **Credentials** field.
+4. Enter your Project ID in the **Project ID** field.
+<br/>**Optional:** Enter a [region code](https://cloud.google.com/compute/docs/regions-zones/){:target="_blank"} in the **Location** field (the default is "US".)
+5. Copy the contents of the JSON key that you created for the Segment service account into the **Credentials** field.
 6. Click **Connect**.
-7. If Segment can connect with the provided project ID and credentials, a warehouse will be created and your first sync should begin shortly.
+7. If Segment is able to connect to your project, a warehouse will be created and your first sync will begin shortly.

 ## Schema

 BigQuery datasets are broken down into [**tables**](#partitioned-tables) and [**views**](#views). **Tables**
-contain duplicate data, **views** do _not_.
+contain duplicate data, **views** do _not_ contain duplicate data.

 ### Partitioned Tables

-The Segment connector takes advantage of [partitioned
+The Segment connector uses [partitioned
 tables](https://cloud.google.com/bigquery/docs/partitioned-tables){:target="_blank"}. Partitioned
 tables allow you to query a subset of data, thus increasing query performance
 and decreasing costs.

-To search a full table, use the following query:
+To query a full table, use the following command:

 ```sql
 select *
 from <project-id>.<source-name>.<collection-name>
 ```

-To search a specific partitioned table, use the following query:
+To query a specific partitioned table, use the following command:


 ```sql
@@ -91,9 +89,9 @@ A [view](https://cloud.google.com/bigquery/querying-data#views){:target="_blank"
 table defined by a SQL query. Segment uses views in the de-duplication process to
 ensure that events that you are querying are unique events and contain the latest objects
 from third-party data. All Segment views are set up to show information from the last
-60 days. Whenever possible, query from these views.
+60 days. [Segment recommends querying from these views when possible](#use-views) to avoid duplicate events and historical objects.

-Views are appended with `_view` , which you can query like this:
+Views are appended with `_view` , which you can query using this format:

 ```sql
 select *
@@ -142,20 +140,18 @@ BigQuery charges based on the amount of data scanned by your queries. Views are
 a derived view over your tables that Segment uses for de-duplication of events.
 Therefore, Segment recommends you query a specific view whenever possible to avoid
 duplicate events and historical objects. It's important to note that BigQuery
-views are not cached.
-
-> BigQuery's views are logical views, not materialized views, which means that
-> the query that defines the view is re-executed every time the view is queried.
-> Queries are billed according to the total amount of data in all table fields
-> referenced directly or indirectly by the top-level query.
+views are not cached.
+
+> note "Understanding BigQuery views"
+> BigQuery's views are logical views, not materialized views, which means that the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query.

-To save more money, you can query the view and set a [destination
+To save money, you can query the view and set a [destination
 table](https://cloud.google.com/bigquery/docs/tables){:target="_blank"}, and then query the
 destination table.
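To illustrate the destination-table tip above, here is a minimal sketch. It assumes a hypothetical project `my-project` with a `website` source dataset, a Segment-style `pages_view` view, and placeholder column names; all of these names are illustrative only.

```sql
-- Sketch only: project, dataset, table, and column names are placeholders.
-- Materialize the de-duplicated view into a destination table once...
create or replace table `my-project.website.pages_rollup` as
select anonymous_id, context_page_path, received_at
from `my-project.website.pages_view`;

-- ...then point later queries at the smaller destination table.
select context_page_path, count(*) as pageviews
from `my-project.website.pages_rollup`
group by context_page_path;
```

You can get a similar effect without DDL by setting a destination table in the BigQuery console's query settings.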

 ### Query structure

-If you typically start exploratory data analysis with `SELECT *` consider
+If you typically start exploratory data analysis with `SELECT *`, consider
 specifying the fields to reduce costs.
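For example, a sketch with a placeholder `pages` table and columns: selecting only the columns you need reduces the bytes scanned, which is what BigQuery bills for.

```sql
-- Scans every column in the table:
select *
from `my-project.website.pages`;

-- Scans only the referenced columns, which is usually much cheaper:
select anonymous_id, context_page_path, received_at
from `my-project.website.pages`;
```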

 Refer to the section on [partitioned tables](#partitioned-tables) for details on
@@ -166,11 +162,8 @@ querying sub-sets of tables.

 ### I need more than 60 days of data in my views. Can I change the view definition?

-Absolutely! You will just need to modify one of the references to 60 in the view
-definition to the number of days of your choosing.
-
-Segment chose 60 days as it suits the needs of most customers. However,
-you're welcome to update the definition of the view as long as the name stays
+Yes! You just need to modify one of the references to `60` in the view
+definition to the number of days of your choosing. You can update the definition of the view as long as the name stays
 the same.
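As a rough illustration only, widening the window to 90 days might look like the sketch below. The table, view, and `loaded_at` column are placeholders, and this is not Segment's generated view definition; in practice you would edit the definition built from the base query described next.

```sql
-- Illustrative only: keep 90 days of data instead of 60, without renaming the view.
create or replace view `my-project.website.pages_view` as
select *
from `my-project.website.pages`
where loaded_at > timestamp_sub(current_timestamp(), interval 90 day);
```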

 Here is the base query Segment uses when first setting up your views. Included in the base query are the placeholders (`%s.%s.%s`) that you would want to include the project,
@@ -191,21 +184,21 @@ WHERE ROW_NUMBER = 1

 BigQuery offers both a scalable, pay-as-you-go pricing plan based on the amount
 of data scanned, or a flat-rate monthly cost. You can learn more about BigQuery
-pricing [here](https://cloud.google.com/bigquery/pricing){:target="_blank"}.
+pricing [on Google Cloud's BigQuery pricing page](https://cloud.google.com/bigquery/pricing){:target="_blank"}.

 BigQuery allows you to set up [Cost Controls and
 Alerts](https://cloud.google.com/bigquery/cost-controls){:target="_blank"} to help control and
-monitor costs. If you want to learn more about what BigQuery will cost you,
-they've provided [this
+monitor costs. If you want to learn more about the costs associated with BigQuery,
+Google Cloud has provided [a
 calculator](https://cloud.google.com/products/calculator/){:target="_blank"} to estimate your
 costs.

 ### How do I query my data in BigQuery?

-You can connect to BigQuery using a BI tool like Mode or Looker, or query
+You can connect a BI tool like Mode or Looker to BigQuery, or query
 directly from the BigQuery console.

-BigQuery now supports standard SQL, which you can enable using their query UI.
+BigQuery now supports standard SQL, which you can enable [using Google Cloud's query UI](https://cloud.google.com/bigquery/docs/reference/standard-sql/introduction#changing_from_the_default_dialect){:target="_blank"}.
 This does not work with views, or with a query that uses table range
 functions.
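For reference, a standard SQL query in the console references tables by their fully qualified, backtick-quoted names; the project, dataset, and table below are placeholders.

```sql
-- Standard SQL uses `project.dataset.table` names in backticks.
select event, count(*) as event_count
from `my-project.website.tracks`
group by event
order by event_count desc;
```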
 
@@ -218,10 +211,8 @@ a need for streaming data into BigQuery, [contact Segment support](https://segme

 {% include content/warehouse-sync-sched.md %}

-![sync schedule image](images/syncsched.png)
-
 ## Troubleshooting

-### I'm seeing duplicates in my tables.
+### I see duplicates in my tables.

 This behavior is expected. Segment only de-duplicates data in your views. Refer to the [schema section](#schema) for more details.
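If you do need to de-duplicate while querying a table directly, one possible pattern is sketched below. It assumes the table has `id` and `loaded_at` columns, uses placeholder names, and is not Segment's actual view definition.

```sql
-- Keep only the most recently loaded row per id (placeholder table and columns).
select * except(row_num)
from (
  select
    *,
    row_number() over (partition by id order by loaded_at desc) as row_num
  from `my-project.website.pages`
) t
where row_num = 1;
```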
