The new API allows column and predicate filtering to only read the data you are interested in.
#### Column Filtering
Since BigQuery is [backed by a columnar datastore](https://cloud.google.com/blog/big-data/2016/04/inside-capacitor-bigquerys-next-generation-columnar-storage-format), it can efficiently stream data without reading all columns.
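As a hedged sketch of what this looks like from the read side (an existing `spark` session is assumed, and the public `bigquery-public-data.samples.shakespeare` table is used purely for illustration), column filtering is driven by the columns the query actually selects:

```
# Only the two selected columns are streamed from BigQuery's
# columnar storage; the other columns are never read.
df = spark.read.format("bigquery") \
  .load("bigquery-public-data.samples.shakespeare") \
  .select("word", "word_count")
```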
#### Predicate Filtering
### Specifying the Spark BigQuery connector version in a Dataproc cluster
To use a version other than the built-in one, do one of the following:
* For Dataproc clusters using image 2.1 and above, add the following flag on cluster creation to upgrade the version: `--metadata SPARK_BQ_CONNECTOR_VERSION=0.44.0`, or `--metadata SPARK_BQ_CONNECTOR_URL=gs://spark-lib/bigquery/spark-3.3-bigquery-0.44.0.jar` to create the cluster with a different jar. The URL can point to any valid connector JAR for the cluster's Spark version.
* For Dataproc serverless batches, add the following property on batch creation to upgrade the version: `--properties dataproc.sparkBqConnector.version=0.44.0`, or `--properties dataproc.sparkBqConnector.uri=gs://spark-lib/bigquery/spark-3.3-bigquery-0.44.0.jar` to create the batch with a different jar. The URL can point to any valid connector JAR for the runtime's Spark version.
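As an illustrative sketch of the cluster-creation option above (the cluster name, region, and image version here are placeholders, not prescriptions):

```
# Create a Dataproc 2.1 cluster pinned to connector version 0.44.0
gcloud dataproc clusters create my-cluster \
    --region=us-central1 \
    --image-version=2.1-debian11 \
    --metadata SPARK_BQ_CONNECTOR_VERSION=0.44.0
```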
## Hello World Example
You can run a simple PySpark wordcount against the API without compilation by running:
```
gcloud dataproc jobs submit pyspark --cluster "$MY_CLUSTER" \
```
**Important:** The connector does not configure the GCS connector, to avoid conflicts with another GCS connector that may already be present. To use the connector's write capabilities, configure the GCS connector on your cluster as explained [here](https://github.com/GoogleCloudPlatform/bigdata-interop/tree/master/gcs).
#### Schema Behavior on Overwrite
When using `SaveMode.Overwrite` (`.mode("overwrite")`), the connector **preserves the existing table's schema**. The data is truncated, but column types, descriptions, and policy tags are retained.

```
df.write \
  .format("bigquery") \
  .mode("overwrite") \
  .option("temporaryGcsBucket", "some-bucket") \
  .save("dataset.table")
```

**Important:** If your DataFrame has a different schema than the existing table (e.g., changing a column from `INTEGER` to `DOUBLE`), the write will fail with a type mismatch error. To change the schema, either:
- Drop the table before overwriting
- Use BigQuery DDL to alter the table schema first

For some schema differences, the following options work with overwrite:
Programmatic relaxation: set `.option("allowFieldRelaxation", "true")` for nullability changes and `.option("allowFieldAddition", "true")` for new columns.

This behavior was introduced between versions 0.22.0 and 0.41.0 to prevent accidental schema drift.

**Note:** This behavior applies to both the `indirect` (default) and `direct` write methods.
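Combining the relaxation options above with the overwrite pattern gives a sketch like the following (the bucket and table names are placeholders):

```
# Overwrite while permitting new columns and relaxed nullability
df.write \
  .format("bigquery") \
  .mode("overwrite") \
  .option("allowFieldAddition", "true") \
  .option("allowFieldRelaxation", "true") \
  .option("temporaryGcsBucket", "some-bucket") \
  .save("dataset.table")
```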
### Running SQL on BigQuery
The connector supports Spark's [SparkSession#executeCommand](https://archive.apache.org/dist/spark/docs/3.0.0/api/java/org/apache/spark/sql/SparkSession.html#executeCommand-java.lang.String-java.lang.String-scala.collection.immutable.Map-)
</td>
<td>Read/Write</td>
</tr>
<tr valign="top">
<td><code>billingProject</code>
</td>
<td>The Google Cloud Project ID to use for <strong>billing</strong> (API calls, query execution).
<br/>(Optional. Defaults to the project of the Service Account being used)
</td>
<td>Read/Write</td>
</tr>
<tr valign="top">
<td><code>parentProject</code>
</td>
<td><strong>(Deprecated)</strong> Alias for <code>billingProject</code>.
<br/>(Optional. Defaults to the project of the Service Account being used)
</td>
<td>Read/Write</td>
</tr>
<tr valign="top">
<td><code>location</code>
</td>
<td>The BigQuery location where the data resides (e.g. US, EU, asia-northeast1).
**Note:** To use the metrics in the Spark UI page, make sure `spark-bigquery-metrics-0.44.0.jar` is on the classpath before starting the history server, and that the connector version is `spark-3.2` or above.