Commit ea93e5a

Merge pull request #268073 from Clare-Zheng82/0305-Add_GBQ_V2
[New feature] - Update Google BigQuery doc to V2 and add the legacy doc
2 parents 25087df + 5771113 commit ea93e5a

7 files changed

+280
-43
lines changed

articles/data-factory/TOC.yml

Lines changed: 5 additions & 1 deletion
@@ -474,7 +474,11 @@ items:
474474
- name: Google Ads
475475
href: connector-google-adwords.md
476476
- name: Google BigQuery
477-
href: connector-google-bigquery.md
477+
items:
478+
- name: Google BigQuery
479+
href: connector-google-bigquery.md
480+
- name: Google BigQuery (Legacy)
481+
href: connector-google-bigquery-legacy.md
478482
- name: Google Cloud Storage
479483
href: connector-google-cloud-storage.md
480484
- name: Google Sheets
Lines changed: 233 additions & 0 deletions
@@ -0,0 +1,233 @@
1+
---
2+
title: Copy data from Google BigQuery (legacy)
3+
titleSuffix: Azure Data Factory & Azure Synapse
4+
description: Learn how to copy data from Google BigQuery to supported sink data stores by using the legacy Google BigQuery connector with a copy activity in an Azure Data Factory or Synapse Analytics pipeline.
5+
ms.author: jianleishen
6+
author: jianleishen
7+
ms.service: data-factory
8+
ms.subservice: data-movement
9+
ms.topic: conceptual
10+
ms.custom: synapse
11+
ms.date: 03/05/2024
12+
---
13+
14+
# Copy data from Google BigQuery using Azure Data Factory or Synapse Analytics (legacy)
15+
[!INCLUDE[appliesto-adf-asa-md](includes/appliesto-adf-asa-md.md)]
16+
17+
This article outlines how to use Copy Activity in Azure Data Factory and Synapse Analytics pipelines to copy data from Google BigQuery. It builds on the [Copy Activity overview](copy-activity-overview.md) article that presents a general overview of the copy activity.
18+
19+
>[!Important]
20+
>The service has released a new Google BigQuery connector that provides better native Google BigQuery support than this ODBC-based implementation. For details, see the [Google BigQuery](connector-google-bigquery.md) article.
21+
22+
## Supported capabilities
23+
24+
This Google BigQuery connector is supported for the following capabilities:
25+
26+
| Supported capabilities|IR |
27+
|---------| --------|
28+
|[Copy activity](copy-activity-overview.md) (source/-)|① ②|
29+
|[Lookup activity](control-flow-lookup-activity.md)|① ②|
30+
31+
*① Azure integration runtime ② Self-hosted integration runtime*
32+
33+
For a list of data stores that are supported as sources or sinks by the copy activity, see the [Supported data stores](copy-activity-overview.md#supported-data-stores-and-formats) table.
34+
35+
The service provides a built-in driver to enable connectivity. Therefore, you don't need to manually install a driver to use this connector.
36+
37+
>[!NOTE]
38+
>This Google BigQuery connector is built on top of the BigQuery APIs. BigQuery limits the maximum rate of incoming requests and enforces quotas on a per-project basis. For details, see [Quotas & Limits - API requests](https://cloud.google.com/bigquery/quotas#api_requests). Make sure you don't trigger too many concurrent requests to the account.
39+
40+
## Get started
41+
42+
[!INCLUDE [data-factory-v2-connector-get-started](includes/data-factory-v2-connector-get-started.md)]
43+
44+
## Create a linked service to Google BigQuery using UI
45+
46+
Use the following steps to create a linked service to Google BigQuery in the Azure portal UI.
47+
48+
1. Browse to the Manage tab in your Azure Data Factory or Synapse workspace and select Linked Services, then click New:
49+
50+
# [Azure Data Factory](#tab/data-factory)
51+
52+
:::image type="content" source="media/doc-common-process/new-linked-service.png" alt-text="Screenshot of creating a new linked service with Azure Data Factory UI.":::
53+
54+
# [Azure Synapse](#tab/synapse-analytics)
55+
56+
:::image type="content" source="media/doc-common-process/new-linked-service-synapse.png" alt-text="Screenshot of creating a new linked service with Azure Synapse UI.":::
57+
58+
2. Search for Google and select the Google BigQuery connector.
59+
60+
:::image type="content" source="media/connector-google-bigquery-legacy/google-bigquery-legacy-connector.png" alt-text="Screenshot of the Google BigQuery connector.":::
61+
62+
3. Configure the service details, test the connection, and create the new linked service.
63+
64+
:::image type="content" source="media/connector-google-bigquery-legacy/configure-google-bigquery-legacy-linked-service.png" alt-text="Screenshot of linked service configuration for Google BigQuery.":::
65+
66+
## Connector configuration details
67+
68+
The following sections provide details about properties that are used to define entities specific to the Google BigQuery connector.
69+
70+
## Linked service properties
71+
72+
The following properties are supported for the Google BigQuery linked service.
73+
74+
| Property | Description | Required |
75+
|:--- |:--- |:--- |
76+
| type | The type property must be set to **GoogleBigQuery**. | Yes |
77+
| project | The project ID of the default BigQuery project to query against. | Yes |
78+
| additionalProjects | A comma-separated list of project IDs of public BigQuery projects to access. | No |
79+
| requestGoogleDriveScope | Whether to request access to Google Drive. Allowing Google Drive access enables support for federated tables that combine BigQuery data with data from Google Drive. The default value is **false**. | No |
80+
| authenticationType | The OAuth 2.0 authentication mechanism used for authentication. ServiceAuthentication can be used only on Self-hosted Integration Runtime. <br/>Allowed values are **UserAuthentication** and **ServiceAuthentication**. See the sections below this table for more properties and JSON samples for each authentication type. | Yes |
81+
82+
### Using user authentication
83+
84+
Set the "authenticationType" property to **UserAuthentication**, and specify the following properties along with the generic properties described in the previous section:
85+
86+
| Property | Description | Required |
87+
|:--- |:--- |:--- |
88+
| clientId | ID of the application used to generate the refresh token. | Yes |
89+
| clientSecret | Secret of the application used to generate the refresh token. Mark this field as a SecureString to store it securely, or [reference a secret stored in Azure Key Vault](store-credentials-in-key-vault.md). | Yes |
90+
| refreshToken | The refresh token obtained from Google used to authorize access to BigQuery. Learn how to get one from [Obtaining OAuth 2.0 access tokens](https://developers.google.com/identity/protocols/OAuth2WebServer#obtainingaccesstokens) and [this community blog](https://jpd.ms/getting-your-bigquery-refresh-token-for-azure-datafactory-f884ff815a59). Mark this field as a SecureString to store it securely, or [reference a secret stored in Azure Key Vault](store-credentials-in-key-vault.md). | Yes |
91+
92+
The minimum scope required to obtain an OAuth 2.0 refresh token is `https://www.googleapis.com/auth/bigquery.readonly`. If you plan to run a query that might return large results, other scopes might be required. For more information, see this [article](https://cloud.google.com/bigquery/docs/writing-results#large-results).
93+
94+
**Example:**
95+
96+
```json
97+
{
98+
"name": "GoogleBigQueryLinkedService",
99+
"properties": {
100+
"type": "GoogleBigQuery",
101+
"typeProperties": {
102+
"project" : "<project ID>",
103+
"additionalProjects" : "<additional project IDs>",
104+
"requestGoogleDriveScope" : true,
105+
"authenticationType" : "UserAuthentication",
106+
"clientId": "<id of the application used to generate the refresh token>",
107+
"clientSecret": {
108+
"type": "SecureString",
109+
"value":"<secret of the application used to generate the refresh token>"
110+
},
111+
"refreshToken": {
112+
"type": "SecureString",
113+
"value": "<refresh token>"
114+
}
115+
}
116+
}
117+
}
118+
```
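
The property table above notes that `clientSecret` and `refreshToken` can reference a secret stored in Azure Key Vault instead of carrying an inline SecureString. A minimal sketch of that variant for `clientSecret` follows, assuming the `AzureKeyVaultSecret` reference shape described in the linked Key Vault article. The Key Vault linked service name and the secret name are placeholders and do not come from this article.

```json
"clientSecret": {
    "type": "AzureKeyVaultSecret",
    "store": {
        "referenceName": "<Azure Key Vault linked service name>",
        "type": "LinkedServiceReference"
    },
    "secretName": "<name of the secret that holds the client secret>"
}
```

The same shape could presumably be used for `refreshToken`, with a different `secretName`.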
119+
120+
### Using service authentication
121+
122+
Set the "authenticationType" property to **ServiceAuthentication**, and specify the following properties along with the generic properties described in the previous section. This authentication type can be used only on Self-hosted Integration Runtime.
123+
124+
| Property | Description | Required |
125+
|:--- |:--- |:--- |
126+
| email | The service account email ID that is used for ServiceAuthentication. It can be used only on Self-hosted Integration Runtime. | No |
127+
| keyFilePath | The full path to the `.p12` or `.json` key file that is used to authenticate the service account email address. | Yes |
128+
| trustedCertPath | The full path of the .pem file that contains trusted CA certificates used to verify the server when you connect over TLS. This property can be set only when you use TLS on Self-hosted Integration Runtime. The default value is the cacerts.pem file installed with the integration runtime. | No |
129+
| useSystemTrustStore | Specifies whether to use a CA certificate from the system trust store or from a specified .pem file. The default value is **false**. | No |
130+
131+
**Example:**
132+
133+
```json
134+
{
135+
"name": "GoogleBigQueryLinkedService",
136+
"properties": {
137+
"type": "GoogleBigQuery",
138+
"typeProperties": {
139+
"project" : "<project id>",
140+
"requestGoogleDriveScope" : true,
141+
"authenticationType" : "ServiceAuthentication",
142+
"email": "<email>",
143+
"keyFilePath": "<.p12 or .json key path on the IR machine>"
144+
},
145+
"connectVia": {
146+
"referenceName": "<name of Self-hosted Integration Runtime>",
147+
"type": "IntegrationRuntimeReference"
148+
}
149+
}
150+
}
151+
```
152+
153+
## Dataset properties
154+
155+
For a full list of sections and properties available for defining datasets, see the [Datasets](concepts-datasets-linked-services.md) article. This section provides a list of properties supported by the Google BigQuery dataset.
156+
157+
To copy data from Google BigQuery, set the type property of the dataset to **GoogleBigQueryObject**. The following properties are supported:
158+
159+
| Property | Description | Required |
160+
|:--- |:--- |:--- |
161+
| type | The type property of the dataset must be set to: **GoogleBigQueryObject** | Yes |
162+
| dataset | Name of the Google BigQuery dataset. |No (if "query" in activity source is specified) |
163+
| table | Name of the table. |No (if "query" in activity source is specified) |
164+
| tableName | Name of the table. This property is supported for backward compatibility. For new workloads, use `dataset` and `table`. | No (if "query" in activity source is specified) |
165+
166+
**Example:**
167+
168+
```json
169+
{
170+
"name": "GoogleBigQueryDataset",
171+
"properties": {
172+
"type": "GoogleBigQueryObject",
173+
"typeProperties": {},
174+
"schema": [],
175+
"linkedServiceName": {
176+
"referenceName": "<GoogleBigQuery linked service name>",
177+
"type": "LinkedServiceReference"
178+
}
179+
}
180+
}
181+
```
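
The example above leaves `typeProperties` empty, which fits the case where the copy activity source supplies a `query`. As a sketch of the alternative, the following variant fills in the `dataset` and `table` properties from the table above so that the dataset itself identifies what to read. The values are placeholders.

```json
{
    "name": "GoogleBigQueryDataset",
    "properties": {
        "type": "GoogleBigQueryObject",
        "typeProperties": {
            "dataset": "<Google BigQuery dataset name>",
            "table": "<table name>"
        },
        "schema": [],
        "linkedServiceName": {
            "referenceName": "<GoogleBigQuery linked service name>",
            "type": "LinkedServiceReference"
        }
    }
}
```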
182+
183+
## Copy activity properties
184+
185+
For a full list of sections and properties available for defining activities, see the [Pipelines](concepts-pipelines-activities.md) article. This section provides a list of properties supported by the Google BigQuery source type.
186+
187+
### GoogleBigQuerySource as a source type
188+
189+
To copy data from Google BigQuery, set the source type in the copy activity to **GoogleBigQuerySource**. The following properties are supported in the copy activity **source** section.
190+
191+
| Property | Description | Required |
192+
|:--- |:--- |:--- |
193+
| type | The type property of the copy activity source must be set to **GoogleBigQuerySource**. | Yes |
194+
| query | Use the custom SQL query to read data. An example is `"SELECT * FROM MyTable"`. | No (if "tableName" in dataset is specified) |
195+
196+
**Example:**
197+
198+
```json
199+
"activities":[
200+
{
201+
"name": "CopyFromGoogleBigQuery",
202+
"type": "Copy",
203+
"inputs": [
204+
{
205+
"referenceName": "<GoogleBigQuery input dataset name>",
206+
"type": "DatasetReference"
207+
}
208+
],
209+
"outputs": [
210+
{
211+
"referenceName": "<output dataset name>",
212+
"type": "DatasetReference"
213+
}
214+
],
215+
"typeProperties": {
216+
"source": {
217+
"type": "GoogleBigQuerySource",
218+
"query": "SELECT * FROM MyTable"
219+
},
220+
"sink": {
221+
"type": "<sink type>"
222+
}
223+
}
224+
}
225+
]
226+
```
227+
228+
## Lookup activity properties
229+
230+
To learn details about the properties, check [Lookup activity](control-flow-lookup-activity.md).
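
As an illustration only, a Lookup activity that reads from this connector might look like the following sketch. It assumes the general Lookup activity shape (`source`, `dataset`, `firstRowOnly`) from the linked Lookup activity article and reuses the **GoogleBigQuerySource** type from the copy activity section. The activity name, dataset reference, and query are hypothetical.

```json
{
    "name": "LookupFromGoogleBigQuery",
    "type": "Lookup",
    "typeProperties": {
        "source": {
            "type": "GoogleBigQuerySource",
            "query": "SELECT * FROM MyTable LIMIT 1"
        },
        "dataset": {
            "referenceName": "<GoogleBigQuery dataset name>",
            "type": "DatasetReference"
        },
        "firstRowOnly": true
    }
}
```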
231+
232+
## Related content
233+
For a list of data stores supported as sources and sinks by the copy activity, see [Supported data stores](copy-activity-overview.md#supported-data-stores-and-formats).
