### Specifying the Spark BigQuery connector version in a Dataproc cluster
To use a version other than the built-in one, do one of the following:
* For Dataproc clusters using image 2.1 and above, add the following flag on cluster creation to upgrade the version: `--metadata SPARK_BQ_CONNECTOR_VERSION=0.43.0`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.43.0.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.43.0`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.43.0.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
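As a concrete sketch, cluster creation with a pinned connector version might look like the following. The cluster name, region, and image version are placeholder values, not from the original text:

```shell
# Create a Dataproc 2.1 cluster pinned to connector version 0.43.0.
# Cluster name, region, and image version are example values.
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --image-version=2.1-debian11 \
    --metadata SPARK_BQ_CONNECTOR_VERSION=0.43.0
```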
## Hello World Example
```
gcloud dataproc jobs submit pyspark --cluster "$MY_CLUSTER" \
```

```
val df = spark.read.bigquery("bigquery-public-data.samples.shakespeare")
```
The connector supports reading from tables that contain spaces in their names.
**Note on ambiguous table names**: If a table name contains both spaces and a SQL keyword (e.g., "from", "where", "join"), it may be misinterpreted as a SQL query. To resolve this ambiguity, quote the table identifier with backticks \`. For example:
```
df = spark.read \
.format("bigquery") \
.load("`my_project.my_dataset.orders from 2023`")
```
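The quoting rule above can be sketched as a small, hypothetical helper (not part of the connector API) that backtick-quotes a table identifier only when it is ambiguous, i.e. contains whitespace along with a SQL keyword:

```python
# Hypothetical helper illustrating the ambiguity rule described above.
# The keyword list is a small illustrative subset, not the connector's
# actual parser logic.
SQL_KEYWORDS = {"from", "where", "join", "select", "group", "order"}

def quote_if_ambiguous(table: str) -> str:
    """Backtick-quote a table name that could be mistaken for a SQL query."""
    words = table.lower().split()
    if len(words) > 1 and any(w in SQL_KEYWORDS for w in words):
        return f"`{table}`"
    return table
```

The quoted result can then be passed to `load()` as in the example above.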
For more information, see additional code samples in
**Important:** The connector does not configure the GCS connector, in order to avoid conflicting with another GCS connector if one already exists. To use the write capabilities of the connector, configure the GCS connector on your cluster as explained [here](https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs).
### Running SQL on BigQuery
The connector supports Spark's [SparkSession#executeCommand](https://archive.apache.org/dist/spark/docs/3.0.0/api/java/org/apache/spark/sql/SparkSession.html#executeCommand-java.lang.String-java.lang.String-scala.collection.immutable.Map-)
with the Spark-X.Y-bigquery connectors. It can be used to run arbitrary DDL/DML StandardSQL statements on BigQuery as a query job. `SELECT` statements are not supported; to run those, read from a query as shown above. It can be used as follows:
```
spark.executeCommand("bigquery", sql, options)
```
Notice the following:
* Apart from the authentication options, no other options are supported by this functionality.
* This API is available only in the Scala/Java API; PySpark does not provide it.
### Properties
The API supports a number of options to configure the read
</td>
<td>Read/Write</td>
</tr>
<tr>
<td><code>credentialsScopes</code>
</td>
<td>Replaces the scopes of the Google Credentials if the credentials type supports that.
If scope replacement is not supported, it does nothing.
<br/>The value should be a comma-separated list of valid scopes.
<br/> (Optional)
</td>
<td>Read/Write</td>
</tr>
</table>
Options can also be set outside of the code, using the `--conf` parameter of `spark-submit` or `--properties` parameter
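For example, a connector option such as `credentialsScopes` could be supplied at submit time instead of in code. Note that the `spark.datasource.bigquery.` prefix and the scope value below are assumptions for illustration, since the full prefix rule is not stated in this excerpt:

```shell
# Sketch: supplying a connector option via spark-submit --conf.
# The spark.datasource.bigquery. prefix and scope value are assumptions.
spark-submit \
    --conf spark.datasource.bigquery.credentialsScopes=https://www.googleapis.com/auth/bigquery \
    my_job.py
```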
**Note:** To use the metrics in the Spark UI page, make sure the `spark-bigquery-metrics-0.43.0.jar` is in the class path before starting the history server, and that the connector version is `spark-3.2` or above.