
Commit fde83e9
Merge pull request #395 from segmentio/repo-sync
repo sync
2 parents 56f76ec + 0957ad3

File tree: src/connections/storage/catalog/bigquery
1 file changed: +77 −83 lines

src/connections/storage/catalog/bigquery/index.md

@@ -5,95 +5,96 @@ redirect_from:
- '/connections/warehouses/catalog/bigquery/'
---

Segment's [BigQuery](https://cloud.google.com/bigquery/){:target="_blank"} connector makes it easy to load web, mobile, and third-party source data like Salesforce, Zendesk, and Google AdWords into a BigQuery data warehouse. When you integrate BigQuery with Segment, you get a fully managed data pipeline loaded into a powerful and cost-effective data warehouse.

The Segment warehouse connector runs a periodic ETL (Extract, Transform, Load) process to pull raw events and objects from your sources and load them into your BigQuery cluster.
For more information about the ETL process, including how it works and common ETL use cases, refer to [Google Cloud's ETL documentation](https://cloud.google.com/learn/what-is-etl){:target="_blank"}.

## Getting Started

To store your Segment data in BigQuery, complete the following steps:

1. [Create a project and enable BigQuery](#create-a-project-and-enable-bigquery)
2. [Create a service account for Segment](#create-a-service-account-for-segment)
3. [Create the warehouse in Segment](#create-the-warehouse-in-segment)

### Create a Project and Enable BigQuery

To create a project and enable BigQuery:

1. Navigate to the [Google Developers Console](https://console.developers.google.com/){:target="_blank"}.
2. Configure the [Google Cloud Platform](https://console.cloud.google.com/){:target="_blank"}:
   - If you don't have a project already, [create one](https://support.google.com/cloud/answer/6251787?hl=en&ref_topic=6158848){:target="_blank"}.
   - If you have an existing project, [enable the BigQuery API](https://cloud.google.com/bigquery/quickstart-web-ui){:target="_blank"}. Once you've done so, you should see BigQuery in the "Resources" section of Cloud Platform.
3. Copy the project ID. You'll need it when you create a warehouse source in the Segment app.

> note "Enable billing"
> When you create your project, you must [enable billing](https://support.google.com/cloud/answer/6293499#enable-billing){:target="_blank"} so Segment can write into the cluster.

### Create a service account for Segment

To create a service account for Segment:

1. From the Navigation panel on the left, select **IAM & admin** > **Service accounts**.
2. Click **Create Service Account**.
3. Enter a name for the service account (for example, `segment-warehouses`) and click **Create**.
4. Assign the service account the following roles:
   - `BigQuery Data Owner`
   - `BigQuery Job User`
5. [Create a JSON key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys){:target="_blank"}. The downloaded file will be used to create your warehouse in the Segment app.

If you have trouble creating a new service account, refer to [Google Cloud's documentation about service accounts](https://cloud.google.com/iam/docs/creating-managing-service-accounts){:target="_blank"} for more information.

### Create the Warehouse in Segment

To create the warehouse in Segment:

1. From the homepage of the Segment app, select **Connections > Add Destination** and search for **BigQuery**.
2. Click **BigQuery**.
3. Select the source(s) you'd like to sync with the BigQuery destination, and click **Next**.
4. Enter a name for your destination in the **Name your destination** field.
5. Enter your Project ID in the **Project ID** field.
   <br/>*Optional:* Enter a [region code](https://cloud.google.com/compute/docs/regions-zones/){:target="_blank"} in the **Location** field (the default is *US*).
6. Copy the contents of the JSON key that you created for the Segment service account into the **Credentials** field.
7. Click **Connect**.

If Segment can connect to your project, a warehouse is created and your first sync begins shortly.

## Schema

BigQuery datasets are broken down into [**tables**](#partitioned-tables) and [**views**](#views). **Tables** contain duplicate data; **views** do not.
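
To see which objects in a Segment dataset are tables and which are views, you can list them through BigQuery's `INFORMATION_SCHEMA`. This is a quick sketch, where `<project-id>` and `<source-name>` are placeholders for your own project and source dataset:

```sql
-- List tables and views in a Segment source dataset.
-- <project-id> and <source-name> are placeholders; substitute your own values.
select table_name, table_type   -- table_type is 'BASE TABLE' or 'VIEW'
from `<project-id>.<source-name>.INFORMATION_SCHEMA.TABLES`
order by table_name
```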

### Partitioned Tables

The Segment connector uses [partitioned tables](https://cloud.google.com/bigquery/docs/partitioned-tables){:target="_blank"}. Partitioned tables allow you to query a subset of data, which increases query performance and decreases costs.

To query a full table, use the following query:

```sql
select *
from <project-id>.<source-name>.<collection-name>
```

To query a specific partitioned table, use the following query:

```sql
select *
from <project-id>.<source-name>.<collection-name>$20160809
```
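
If you prefer standard SQL over the `$20160809` partition decorator, you can usually restrict the scan to a single day by filtering on the `_PARTITIONTIME` pseudo-column instead. This is a sketch that assumes the table is partitioned by ingestion time (day), with the same placeholder names as above:

```sql
-- Restrict the scan to one day's partition using the _PARTITIONTIME
-- pseudo-column (available on ingestion-time partitioned tables).
-- <project-id>, <source-name>, and <collection-name> are placeholders.
select *
from `<project-id>.<source-name>.<collection-name>`
where _PARTITIONTIME = timestamp("2016-08-09")
```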

### Views

A [view](https://cloud.google.com/bigquery/querying-data#views){:target="_blank"} is a virtual table defined by a SQL query. Segment uses views in the de-duplication process to ensure that the events you query are unique and contain the latest objects from third-party data. All Segment views are set up to show information from the last 60 days. [Segment recommends querying from these views when possible](#use-views) to avoid duplicate events and historical objects.

Views are appended with `_view`, which you can query using this format:

```sql
select *
from <project-id>.<source-name>.<collection-name>_view
```
@@ -116,22 +117,23 @@ Account.

Migrate your warehouse from a shared Service Account to a dedicated Service Account by creating a new Service Account, as described in the [Create a service account for Segment](#create-a-service-account-for-segment) section. Then go to your warehouse's connection settings and update them with the credentials you created. Once you've verified that data is loading properly to your warehouse, [remove access to the shared Service Account](#remove-access-to-the-shared-service-account).

### Remove access to the shared Service Account

To remove access to the shared Service Account (`[email protected]`):

1. Create a [new Service Account for Segment](#create-a-service-account-for-segment) using the linked instructions.
2. Verify that the data is loading into your warehouse.
3. Sign in to the [Google Developers Console](https://console.developers.google.com){:target="_blank"}.
4. Open the IAM & Admin product, and select **IAM**.
5. From the list of projects, select the project that has BigQuery enabled.
6. On the project's page, select the **Permissions** tab, and then click **view by PRINCIPALS**.
7. Select the checkbox for the `[email protected]` account and then click **Remove** to remove access to this shared Service Account.

For more information about managing IAM access, refer to Google's documentation, [Manage access to projects, folders, and organizations](https://cloud.google.com/iam/docs/granting-changing-revoking-access){:target="_blank"}.

## Best Practices
@@ -142,35 +144,30 @@ BigQuery charges based on the amount of data scanned by your queries. Views are
a derived view over your tables that Segment uses for de-duplication of events. Therefore, Segment recommends you query a specific view whenever possible to avoid duplicate events and historical objects. It's important to note that BigQuery views aren't cached.

> note "Understanding BigQuery views"
> BigQuery's views are logical views, not materialized views, which means that the query that defines the view is re-executed every time the view is queried. Queries are billed according to the total amount of data in all table fields referenced directly or indirectly by the top-level query.

To save money, you can query the view and set a [destination table](https://cloud.google.com/bigquery/docs/tables){:target="_blank"}, and then query the destination table.
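
For example, you could materialize a view into a table you control once, and point repeated queries at that copy. This is a rough sketch: the `analytics` dataset, the `pages_materialized` table, and the `pages_view` view name are illustrative placeholders, not part of Segment's setup:

```sql
-- Materialize a Segment view into a destination table you control.
-- Dataset, table, and view names below are illustrative placeholders.
create or replace table `<project-id>.analytics.pages_materialized` as
select *
from `<project-id>.<source-name>.pages_view`;

-- Subsequent queries read the destination table instead of
-- re-running the view over the raw tables.
select count(*)
from `<project-id>.analytics.pages_materialized`;
```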

### Query structure

If you start exploratory data analysis with `SELECT *`, consider specifying the fields to reduce costs.
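
For example, selecting only the columns you need scans much less data than `SELECT *`. The column names below are illustrative; substitute columns that exist in your table:

```sql
-- Scans only the referenced columns instead of every column in the table.
-- Column names are illustrative; use columns that exist in your table.
select received_at, anonymous_id, event
from <project-id>.<source-name>.<collection-name>
```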

Refer to the section on [partitioned tables](#partitioned-tables) for details on querying subsets of tables.

## FAQs

### I need more than 60 days of data in my views. Can I change the view definition?

Yes! You just need to modify one of the references to `60` in the view definition to the number of days of your choosing. You can update the definition of the view as long as the name stays the same.

Here is the base query Segment uses when first setting up your views. Included in the base query are the placeholders (`%s.%s.%s`) where you would include the project,
@@ -191,38 +188,35 @@ WHERE ROW_NUMBER = 1

BigQuery offers either a scalable, pay-as-you-go pricing plan based on the amount of data scanned or a flat-rate monthly cost. You can learn more about BigQuery pricing [on Google Cloud's BigQuery pricing page](https://cloud.google.com/bigquery/pricing){:target="_blank"}.

BigQuery allows you to set up [Cost Controls and Alerts](https://cloud.google.com/bigquery/cost-controls){:target="_blank"} to help control and monitor costs. If you want to learn more about the costs associated with BigQuery, Google Cloud provides [a calculator](https://cloud.google.com/products/calculator/){:target="_blank"} to estimate your costs.

### How do I query my data in BigQuery?

You can connect a BI tool like Mode or Looker to BigQuery, or query directly from the BigQuery console.

BigQuery supports standard SQL, which you can enable [using Google Cloud's query UI](https://cloud.google.com/bigquery/docs/reference/standard-sql/introduction#changing_from_the_default_dialect){:target="_blank"}. This doesn't work with views, or with a query that uses table range functions.
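
As a rough example, a standard SQL query wraps the fully qualified table name in backticks. The `tracks` table and `event` column are typical of Segment event data, but substitute whatever exists in your dataset:

```sql
-- Standard SQL references the fully qualified table name in backticks.
-- <project-id> and <source-name> are placeholders; `tracks` and `event`
-- are typical Segment names, shown here for illustration.
select event, count(*) as event_count
from `<project-id>.<source-name>.tracks`
group by event
order by event_count desc
```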

### Does Segment support streaming inserts?

Segment's connector doesn't support streaming inserts at this time. If you need to stream data into BigQuery, [contact Segment support](https://segment.com/requests/integrations/){:target="_blank"}.

### Can I customize my sync schedule?

{% include content/warehouse-sync-sched.md %}

## Troubleshooting

### I see duplicates in my tables.

This behavior is expected. Segment only de-duplicates data in your views. Refer to the [schema section](#schema) for more details.
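
If you want to confirm that what you're seeing is the expected table-level duplication, you can count repeated message IDs. This is a rough check that assumes your table has the standard Segment `id` column:

```sql
-- Count rows in a raw table that share the same Segment message id.
-- Assumes the standard Segment `id` column; the table path is a placeholder.
select id, count(*) as copies
from <project-id>.<source-name>.<collection-name>
group by id
having count(*) > 1
```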
