Commit a88649d

committed
Release 0.5.0.
1 parent 6b36f90 commit a88649d

File tree

7 files changed: +55, -17 lines


CHANGES.md

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 # Release Notes

-## Next
+## 0.5.0 - 2025-05-16

 * PR #72: feat: Dynamic column qualifier support for reading from Bigtable

README.md

Lines changed: 45 additions & 7 deletions

@@ -19,7 +19,7 @@ In Java and Scala applications, you can use different dependency management
 tools (e.g., Maven, sbt, or Gradle) to access the
 connector `com.google.cloud.spark.bigtable:spark-bigtable_2.13:<version>` or
 `com.google.cloud.spark.bigtable:spark-bigtable_2.12:<version>` (current
-`<version>` is `0.4.0`) and package it inside your application JAR
+`<version>` is `0.5.0`) and package it inside your application JAR
 using libraries such as Maven Shade Plugin. For PySpark applications, you can
 use the `--jars` flag to pass the GCS address of the connector when submitting
 it.

@@ -31,7 +31,7 @@ For Maven, you can add the following snippet to your `pom.xml` file:
 <dependency>
   <groupId>com.google.cloud.spark.bigtable</groupId>
   <artifactId>spark-bigtable_2.13</artifactId>
-  <version>0.4.0</version>
+  <version>0.5.0</version>
 </dependency>
 ```

@@ -40,20 +40,20 @@ For Maven, you can add the following snippet to your `pom.xml` file:
 <dependency>
   <groupId>com.google.cloud.spark.bigtable</groupId>
   <artifactId>spark-bigtable_2.12</artifactId>
-  <version>0.4.0</version>
+  <version>0.5.0</version>
 </dependency>
 ```

 For sbt, you can add the following to your `build.sbt` file:

 ```
 // for scala 2.13
-libraryDependencies += "com.google.cloud.spark.bigtable" % "spark-bigtable_2.13" % "0.4.0"
+libraryDependencies += "com.google.cloud.spark.bigtable" % "spark-bigtable_2.13" % "0.5.0"
 ```

 ```
 // for scala 2.12
-libraryDependencies += "com.google.cloud.spark.bigtable" % "spark-bigtable_2.12" % "0.4.0"
+libraryDependencies += "com.google.cloud.spark.bigtable" % "spark-bigtable_2.12" % "0.5.0"
 ```

 Finally, you can add the following to your `build.gradle` file when using
@@ -62,14 +62,14 @@ Gradle:
 ```
 // for scala 2.13
 dependencies {
-  implementation group: 'com.google.cloud.bigtable', name: 'spark-bigtable_2.13', version: '0.4.0'
+  implementation group: 'com.google.cloud.bigtable', name: 'spark-bigtable_2.13', version: '0.5.0'
 }
 ```

 ```
 // for scala 2.12
 dependencies {
-  implementation group: 'com.google.cloud.bigtable', name: 'spark-bigtable_2.12', version: '0.4.0'
+  implementation group: 'com.google.cloud.bigtable', name: 'spark-bigtable_2.12', version: '0.5.0'
 }
 ```

@@ -157,6 +157,44 @@ columns and the `id` column is used as the row key. Note that you could also
 specify *compound* row keys,
 which are created by concatenating multiple DataFrame columns together.

+#### Catalog with variable column definitions
+
+You can also use `regexColumns` to match multiple columns in the same column
+family to a single DataFrame column. This can be useful in scenarios where
+you don't know the exact column qualifiers for your data ahead of time, e.g.,
+when a column qualifier is partially composed of other pieces of data.
+
+For example, this catalog:
+```
+{
+  "table": {"name": "t1"},
+  "rowkey": "id_rowkey",
+  "columns": {
+    "id": {"cf": "rowkey", "col": "id_rowkey", "type": "string"}
+  },
+  "regexColumns": {
+    "metadata": {"cf": "info", "pattern": "\C*", "type": "long"}
+  }
+}
+```
+
+would match all columns in the column family "info", and the result would be a
+DataFrame column named "metadata" whose contents are a Map of String to Long,
+with the keys being the column qualifiers and the values being the contents of
+those columns in Bigtable.
+
+A few caveats:
+
+- The values of all matching columns must be deserializable to the type defined
+  in the catalog. If you expect to need more complex deserialization, you can
+  also define the type as `bytes` and run custom deserialization logic.
+- A catalog with regex columns cannot be used for writes.
+- Bigtable uses [RE2](https://github.com/google/re2/wiki/Syntax) for its regex
+  implementation, which has slight differences from other implementations.
+- Because column qualifiers may contain arbitrary characters, including
+  newlines, it is advisable to use `\C` as the wildcard expression, since `.`
+  will not match those.
+
 ### Writing to Bigtable

 You can use the `bigtable` format along with specifying the Bigtable
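The newline caveat in the added README section can be sketched with Python's `re` module, which, like RE2, does not let `.` match a newline by default. RE2's `\C` (match any byte) has no direct Python equivalent, so the inline DOTALL flag `(?s)` stands in for it in this illustration; the qualifier value is made up:

```python
import re

# A hypothetical column qualifier that happens to contain a newline.
qualifier = "sensor\n42"

# '.' does not match the newline, so '.*' cannot cover the whole qualifier.
assert re.fullmatch(r".*", qualifier) is None

# With dot-matches-newline enabled (the role '\C*' plays in RE2),
# the pattern matches the entire qualifier.
assert re.fullmatch(r"(?s).*", qualifier) is not None
```

This is why the README recommends `\C*` rather than `.*` as the catch-all pattern for `regexColumns`.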

pom.xml

Lines changed: 1 addition & 1 deletion

@@ -21,7 +21,7 @@
 <groupId>com.google.cloud.spark.bigtable</groupId>
 <artifactId>spark-bigtable-connector</artifactId>
 <packaging>pom</packaging>
-<version>0.4.0</version> <!-- ${NEXT_VERSION_FLAG} -->
+<version>0.5.0</version> <!-- ${NEXT_VERSION_FLAG} -->
 <name>Spark Bigtable Connector Build Parent</name>
 <description>Parent project for all the Spark Bigtable Connector artifacts</description>
 <url>https://github.com/GoogleCloudDataproc/spark-bigtable-connector</url>

spark-bigtable-core-it/pom.xml

Lines changed: 3 additions & 3 deletions

@@ -21,14 +21,14 @@
 <parent>
   <groupId>com.google.cloud.spark.bigtable</groupId>
   <artifactId>spark-bigtable-connector</artifactId>
-  <version>0.4.0</version> <!-- ${NEXT_VERSION_FLAG} -->
+  <version>0.5.0</version> <!-- ${NEXT_VERSION_FLAG} -->
   <relativePath>../</relativePath>
 </parent>

 <groupId>com.google.cloud.spark.bigtable</groupId>
 <artifactId>spark-bigtable-core-it</artifactId>
 <name>Google Bigtable - Spark Connector Integration Tests</name>
-<version>0.4.0</version> <!-- ${NEXT_VERSION_FLAG} -->
+<version>0.5.0</version> <!-- ${NEXT_VERSION_FLAG} -->

 <dependencies>
   <dependency>
@@ -52,7 +52,7 @@
 <dependency>
   <groupId>com.google.cloud.spark.bigtable</groupId>
   <artifactId>${connector.artifact.id}</artifactId>
-  <version>0.4.0</version> <!-- ${NEXT_VERSION_FLAG} -->
+  <version>0.5.0</version> <!-- ${NEXT_VERSION_FLAG} -->
 </dependency>

 <dependency>

spark-bigtable-core/src/main/scala/com/google/cloud/spark/bigtable/BigtableDefaultSource.scala

Lines changed: 1 addition & 1 deletion

@@ -31,7 +31,7 @@ import org.apache.spark.sql.{DataFrame, SQLContext, SaveMode, Row => SparkRow}
 import org.apache.yetus.audience.InterfaceAudience

 object UserAgentInformation {
-  val CONNECTOR_VERSION = "0.4.0" // ${NEXT_VERSION_FLAG}
+  val CONNECTOR_VERSION = "0.5.0" // ${NEXT_VERSION_FLAG}
   val DATA_SOURCE_VERSION = "V1"
   val DATAFRAME_TEXT = "DF/" + DATA_SOURCE_VERSION
   val RDD_TEXT = "RDD/"

spark-bigtable_2.12/pom.xml

Lines changed: 2 additions & 2 deletions

@@ -21,14 +21,14 @@
 <parent>
   <groupId>com.google.cloud.spark.bigtable</groupId>
   <artifactId>spark-bigtable-connector</artifactId>
-  <version>0.4.0</version> <!-- ${NEXT_VERSION_FLAG} -->
+  <version>0.5.0</version> <!-- ${NEXT_VERSION_FLAG} -->
   <relativePath>../pom.xml</relativePath>
 </parent>

 <groupId>com.google.cloud.spark.bigtable</groupId>
 <artifactId>spark-bigtable_2.12</artifactId>
 <name>Google Bigtable - Apache Spark Connector</name>
-<version>0.4.0</version> <!-- ${NEXT_VERSION_FLAG} -->
+<version>0.5.0</version> <!-- ${NEXT_VERSION_FLAG} -->

 <properties>
   <scala.version>2.12.18</scala.version>

spark-bigtable_2.13/pom.xml

Lines changed: 2 additions & 2 deletions

@@ -21,14 +21,14 @@
 <parent>
   <groupId>com.google.cloud.spark.bigtable</groupId>
   <artifactId>spark-bigtable-connector</artifactId>
-  <version>0.4.0</version> <!-- ${NEXT_VERSION_FLAG} -->
+  <version>0.5.0</version> <!-- ${NEXT_VERSION_FLAG} -->
   <relativePath>../pom.xml</relativePath>
 </parent>

 <groupId>com.google.cloud.spark.bigtable</groupId>
 <artifactId>spark-bigtable_2.13</artifactId>
 <name>Google Bigtable - Apache Spark Connector for Scala 2.13</name>
-<version>0.4.0</version> <!-- ${NEXT_VERSION_FLAG} -->
+<version>0.5.0</version> <!-- ${NEXT_VERSION_FLAG} -->

 <properties>
   <scala.version>2.13.14</scala.version>
