
Commit fa1da36
kbuilder user committed
Release 0.44.0.
1 parent ffaf6a3 commit fa1da36

File tree

2 files changed: +71 -26 lines changed


CHANGES.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Release Notes
 
-## Next
+## 0.44.0 - 2026-02-11
 * Added new connector, `spark-4.1-bigquery` aimed to be used in Spark 4.1. Like Spark 4.1, this connector requires at
 least Java 17 runtime. It is currently in preview mode.
 * `spark-4.0-bigquery` is generally available!

README.md

Lines changed: 70 additions & 25 deletions
@@ -20,7 +20,7 @@ The new API allows column and predicate filtering to only read the data you are
 
 #### Column Filtering
 
-Since BigQuery is [backed by a columnar datastore](https://cloud.google.com/blog/products/bigquery/inside-capacitor-bigquerys-next-generation-columnar-storage-format), it can efficiently stream data without reading all columns.
+Since BigQuery is [backed by a columnar datastore](https://cloud.google.com/blog/big-data/2016/04/inside-capacitor-bigquerys-next-generation-columnar-storage-format), it can efficiently stream data without reading all columns.
 
 #### Predicate Filtering

@@ -57,14 +57,16 @@ The latest version of the connector is publicly available in the following links
 
 | version | Link |
 |------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-0.43.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-0.43.1.jar)) |
-| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.43.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.43.1.jar)) |
-| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.43.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.43.1.jar)) |
-| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.43.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.43.1.jar)) |
-| Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.43.1.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.43.1.jar)) |
+| Spark 4.1 | `gs://spark-lib/bigquery/spark-4.1-bigquery-0.44.0-preview.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-4.1-bigquery-0.44.0-preview.jar)) |
+| Spark 4.0 | `gs://spark-lib/bigquery/spark-4.0-bigquery-0.44.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-4.0-bigquery-0.44.0.jar)) |
+| Spark 3.5 | `gs://spark-lib/bigquery/spark-3.5-bigquery-0.44.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.5-bigquery-0.44.0.jar)) |
+| Spark 3.4 | `gs://spark-lib/bigquery/spark-3.4-bigquery-0.44.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.4-bigquery-0.44.0.jar)) |
+| Spark 3.3 | `gs://spark-lib/bigquery/spark-3.3-bigquery-0.44.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.3-bigquery-0.44.0.jar)) |
+| Spark 3.2 | `gs://spark-lib/bigquery/spark-3.2-bigquery-0.44.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.2-bigquery-0.44.0.jar)) |
+| Spark 3.1 | `gs://spark-lib/bigquery/spark-3.1-bigquery-0.44.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-3.1-bigquery-0.44.0.jar)) |
 | Spark 2.4 | `gs://spark-lib/bigquery/spark-2.4-bigquery-0.37.0.jar`([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-2.4-bigquery-0.37.0.jar)) |
-| Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.43.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.43.1.jar)) |
-| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.43.1.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.43.1.jar)) |
+| Scala 2.13 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.44.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.13-0.44.0.jar)) |
+| Scala 2.12 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.44.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.44.0.jar)) |
 | Scala 2.11 | `gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar` ([HTTP link](https://storage.googleapis.com/spark-lib/bigquery/spark-bigquery-with-dependencies_2.11-0.29.0.jar)) |
 
 The first six versions are Java based connectors targeting Spark 2.4/3.1/3.2/3.3/3.4/3.5 of all Scala versions built on the new
@@ -107,14 +109,16 @@ repository. It can be used using the `--packages` option or the
 
 | version | Connector Artifact |
 |------------|------------------------------------------------------------------------------------|
-| Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:0.43.1` |
-| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.43.1` |
-| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.43.1` |
-| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.43.1` |
-| Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.43.1` |
+| Spark 4.1 | `com.google.cloud.spark:spark-4.1-bigquery:0.44.0-preview` |
+| Spark 4.0 | `com.google.cloud.spark:spark-4.0-bigquery:0.44.0` |
+| Spark 3.5 | `com.google.cloud.spark:spark-3.5-bigquery:0.44.0` |
+| Spark 3.4 | `com.google.cloud.spark:spark-3.4-bigquery:0.44.0` |
+| Spark 3.3 | `com.google.cloud.spark:spark-3.3-bigquery:0.44.0` |
+| Spark 3.2 | `com.google.cloud.spark:spark-3.2-bigquery:0.44.0` |
+| Spark 3.1 | `com.google.cloud.spark:spark-3.1-bigquery:0.44.0` |
 | Spark 2.4 | `com.google.cloud.spark:spark-2.4-bigquery:0.37.0` |
-| Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.43.1` |
-| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.43.1` |
+| Scala 2.13 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.13:0.44.0` |
+| Scala 2.12 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.44.0` |
 | Scala 2.11 | `com.google.cloud.spark:spark-bigquery-with-dependencies_2.11:0.29.0` |
 
 ### Specifying the Spark BigQuery connector version in a Dataproc cluster
@@ -124,8 +128,8 @@ Using the standard `--jars` or `--packages` (or alternatively, the `spark.jars`/
 
 To use another version than the built-in one, please do one of the following:
 
-* For Dataproc clusters, using image 2.1 and above, add the following flag on cluster creation to upgrade the version `--metadata SPARK_BQ_CONNECTOR_VERSION=0.43.1`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.43.1.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
-* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.43.1`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.43.1.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
+* For Dataproc clusters, using image 2.1 and above, add the following flag on cluster creation to upgrade the version `--metadata SPARK_BQ_CONNECTOR_VERSION=0.44.0`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.44.0.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
+* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.44.0`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.44.0.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
 
 ## Hello World Example
 
@@ -135,7 +139,7 @@ You can run a simple PySpark wordcount against the API without compilation by ru
 
 ```
 gcloud dataproc jobs submit pyspark --cluster "$MY_CLUSTER" \
-  --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.43.1.jar \
+  --jars gs://spark-lib/bigquery/spark-bigquery-with-dependencies_2.12-0.44.0.jar \
   examples/python/shakespeare.py
 ```
 
@@ -367,6 +371,31 @@ df.writeStream \
 
 **Important:** The connector does not configure the GCS connector, in order to avoid conflict with another GCS connector, if exists. In order to use the write capabilities of the connector, please configure the GCS connector on your cluster as explained [here](https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs).
 
+#### Schema Behavior on Overwrite
+
+When using `SaveMode.Overwrite` (`.mode("overwrite")`), the connector **preserves the existing table's schema**.
+The data is truncated, but column types, descriptions, and policy tags are retained.
+
+```
+df.write \
+  .format("bigquery") \
+  .mode("overwrite") \
+  .option("temporaryGcsBucket","some-bucket") \
+  .save("dataset.table")
+```
+
+**Important:** If your DataFrame has a different schema than the existing table (e.g., changing a column from
+`INTEGER` to `DOUBLE`), the write will fail with a type mismatch error. To change the schema, either:
+- Drop the table before overwriting
+- Use BigQuery DDL to alter the table schema first
+
+For some schema differences, the following options can work with overwrite:
+Programmatic Relaxation: Set `.option("allowFieldRelaxation", "true")` for nullability changes and `.option("allowFieldAddition", "true")` for new columns.
+
+This behavior was introduced between versions 0.22.0 and 0.41.0 to prevent accidental schema drift.
+
+**Note:** This behavior applies to both the `indirect` (default) and `direct` write methods.
+
 ### Running SQL on BigQuery
 
 The connector supports Spark's [SparkSession#executeCommand](https://archive.apache.org/dist/spark/docs/3.0.0/api/java/org/apache/spark/sql/SparkSession.html#executeCommand-java.lang.String-java.lang.String-scala.collection.immutable.Map-)
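As a minimal sketch of the overwrite-with-relaxation options described in the hunk above (the bucket and table names are placeholders, and the commented-out write call assumes a live SparkSession and an existing DataFrame `df`):

```python
# Sketch only: option names are taken from the section above;
# "some-bucket" and "dataset.table" are placeholders, not real resources.
overwrite_options = {
    "temporaryGcsBucket": "some-bucket",   # staging bucket for the indirect write method
    "allowFieldRelaxation": "true",        # tolerate REQUIRED -> NULLABLE changes on overwrite
    "allowFieldAddition": "true",          # tolerate new columns in the DataFrame
}

# With a live SparkSession this would be applied as:
# df.write.format("bigquery").mode("overwrite") \
#     .options(**overwrite_options).save("dataset.table")
```

Other schema changes (such as the `INTEGER` to `DOUBLE` case above) still require dropping or altering the table first.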
@@ -426,14 +455,30 @@ word-break:break-word
 </td>
 <td>Read/Write</td>
 </tr>
+<tr valign="top">
+<td><code>billingProject</code>
+</td>
+<td>The Google Cloud Project ID to use for <strong>billing</strong> (API calls, query execution).
+<br/>(Optional. Defaults to the project of the Service Account being used)
+</td>
+<td>Read/Write</td>
+</tr>
 <tr valign="top">
 <td><code>parentProject</code>
 </td>
-<td>The Google Cloud Project ID of the table to bill for the export.
+<td><strong>(Deprecated)</strong> Alias for <code>billingProject</code>.
 <br/>(Optional. Defaults to the project of the Service Account being used)
 </td>
 <td>Read/Write</td>
 </tr>
+<tr valign="top">
+<td><code>location</code>
+</td>
+<td>The BigQuery location where the data resides (e.g. US, EU, asia-northeast1).
+<br/>(Optional. Defaults to BigQuery default)
+</td>
+<td>Read/Write</td>
+</tr>
 <tr valign="top">
 <td><code>maxParallelism</code>
 </td>
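A sketch of how the `billingProject` and `location` options added in the hunk above might be supplied on read (the project ID and location values are placeholders; the commented-out call assumes a live SparkSession `spark`):

```python
# Sketch only: option keys are from the table above;
# "my-billing-project" is a placeholder project ID.
read_options = {
    "billingProject": "my-billing-project",  # project billed for API calls and query execution
    "location": "EU",                        # BigQuery location where the data resides
}

# With a live SparkSession:
# df = spark.read.format("bigquery").options(**read_options).load("dataset.table")
```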
@@ -1229,7 +1274,7 @@ using the following code:
 ```python
 from pyspark.sql import SparkSession
 spark = SparkSession.builder \
-  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.43.1") \
+  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.44.0") \
   .getOrCreate()
 df = spark.read.format("bigquery") \
   .load("dataset.table")
@@ -1238,15 +1283,15 @@ df = spark.read.format("bigquery") \
 **Scala:**
 ```scala
 val spark = SparkSession.builder
-  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.43.1")
+  .config("spark.jars.packages", "com.google.cloud.spark:spark-bigquery-with-dependencies_2.12:0.44.0")
   .getOrCreate()
 val df = spark.read.format("bigquery")
   .load("dataset.table")
 ```
 
 In case Spark cluster is using Scala 2.12 (it's optional for Spark 2.4.x,
 mandatory in 3.0.x), then the relevant package is
-com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.43.1. In
+com.google.cloud.spark:spark-bigquery-with-dependencies_**2.12**:0.44.0. In
 order to know which Scala version is used, please run the following code:
 
 **Python:**
@@ -1270,14 +1315,14 @@ To include the connector in your project:
 <dependency>
   <groupId>com.google.cloud.spark</groupId>
   <artifactId>spark-bigquery-with-dependencies_${scala.version}</artifactId>
-  <version>0.43.1</version>
+  <version>0.44.0</version>
 </dependency>
 ```
 
 ### SBT
 
 ```sbt
-libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.43.1"
+libraryDependencies += "com.google.cloud.spark" %% "spark-bigquery-with-dependencies" % "0.44.0"
 ```
 
 ### Connector metrics and how to view them
@@ -1322,7 +1367,7 @@ word-break:break-word
 </table>
 
 
-**Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.43.1.jar` is the class path before starting the history-server and the connector version is `spark-3.2` or above.
+**Note:** To use the metrics in the Spark UI page, you need to make sure the `spark-bigquery-metrics-0.44.0.jar` is in the class path before starting the history-server and the connector version is `spark-3.2` or above.
 
 ## FAQ
 