Bigquery 2.9.16: Using this connector in Spark is resulting in all values in the spark dataframe being the column names #1245

@dannnnthemannnn

Description

Thanks for stopping by to let us know something could be better!

PLEASE READ: If you have a support contract with Google, please create an issue in the support console instead of filing on GitHub. This will ensure a timely response.

Please run down the following list and make sure you've tried the usual "quick fixes":

If you are still having issues, please include as much information as possible:

Environment details

  1. Specify the API at the beginning of the title (for example, "BigQuery: ...").
    General, Core, and Other are also allowed as types.
  2. OS type and version: Mac 12.0
  3. Java version: 20.0.1
  4. version(s):

Steps to reproduce

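  1. Hook this connector up in a Spark job with the following code:
    import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

    // Note: sqlQuery is not used in the body; the read targets the table named in "dbtable".
    def querySpanner[T](sqlQuery: String)(implicit spark: SparkSession, enc: Encoder[T]): Dataset[T] = {
      // Placeholder path (not in the original report); point this at a real key file.
      val jsonKeyFilePath = "/path/to/key.json"
      // The s interpolator is required for $jsonKeyFilePath to be substituted.
      val url = s"jdbc:cloudspanner:/projects/your-project-id/instances/your-instance-id/databases/your-database-id?credentials=$jsonKeyFilePath"

      // Read data using Spark
      val df = spark.read
        .format("jdbc")
        .option("url", url)
        .option("dbtable", "myTable")
        .option("driver", "com.google.cloud.spanner.jdbc.JdbcDriver")
        .load()

      // Convert DataFrame to Dataset
      df.as[T]
    }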
  2. Observe that every returned row contains the column names instead of the column values:
    +----+---+
    |name| id|
    +----+---+
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    |name| id|
    +----+---+
    only showing top 20 rows

My query is: "SELECT name, id FROM OrgInfoV2"

Any additional information below

It seems similar to this issue:
https://stackoverflow.com/questions/66983401/spark-mariadb-jdbc-sql-query-returns-column-names-instead-of-column-values
or this one:
https://stackoverflow.com/questions/63177736/spark-read-as-jdbc-return-all-rows-as-columns-name

In both cases, the problem appears to lie with the driver.
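
A commonly reported cause of this symptom is identifier quoting: when Spark has no dialect registered for a JDBC URL, it wraps column names in double quotes, and a database that reads double-quoted text as a string literal then returns the literal "name" and "id" for every row. The sketch below assumes that is what happens here and that the database uses the GoogleSQL dialect, where backticks quote identifiers. It registers a custom Spark JdbcDialect for Spanner URLs. The names SpannerBacktickDialect and registerSpannerDialect are hypothetical, and this is not confirmed as the fix for this issue.

```scala
import java.util.Locale

import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

// Hypothetical dialect (not part of Spark or the Spanner JDBC driver).
// Assumption: the target database uses GoogleSQL, where backticks quote identifiers.
object SpannerBacktickDialect extends JdbcDialect {

  // Apply this dialect only to Cloud Spanner JDBC URLs.
  override def canHandle(url: String): Boolean =
    url.toLowerCase(Locale.ROOT).startsWith("jdbc:cloudspanner:")

  // Quote column names with backticks instead of Spark's default double quotes.
  override def quoteIdentifier(colName: String): String =
    s"`$colName`"
}

// Call once, before the first spark.read.format("jdbc") against Spanner.
def registerSpannerDialect(): Unit =
  JdbcDialects.registerDialect(SpannerBacktickDialect)
```

With the dialect registered, Spark would generate a query of the form SELECT `name`,`id` FROM ... rather than SELECT "name","id" FROM ..., which is the difference the linked questions point to.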

Following these steps guarantees the quickest resolution possible.

Thanks!

Metadata

Labels

api: spanner (Issues related to the googleapis/java-spanner-jdbc API.)
