[BUG]: Reconciliation fails with AnalysisException when using jdbc_options: partition_column not found in JDBC relation #2247

@mikemillerr

Description

Is there an existing issue for this?

  • I have searched the existing issues

Category of Bug / Issue

Reconcile bug

Current Behavior

When running a reconciliation using an Oracle source with jdbc_options configured (specifically partition_column, lower_bound, upper_bound, and number_partitions), the process fails with an org.apache.spark.sql.AnalysisException.

Even though the column exists in the source table, Spark reports that the user-defined partition column is not found in the JDBC relation. This appears to be caused by how the partition_column string is handled or quoted before being passed to the Spark JDBC reader.

Error Snippet:

AnalysisException: User-defined partition column `id_col_name` not found in the JDBC relation: struct<HASH_VALUE_RECON: string, AMOUNT: decimal(10,0), ..., ID_COL_NAME: decimal(10,0), ...>

I found that the issue is related to backticks/quoting. I applied a local workaround in lakebridge/reconcile/connectors/jdbc_reader.py at line 18 by stripping the wrapping characters, which resolved the issue:

# Temporary fix/workaround
option_dict["partitionColumn"] = options.partition_column[1:-1]
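A slightly more defensive variant of that workaround (a hypothetical helper, not lakebridge code) would strip the wrapping backticks or double quotes only when they are actually present, so an unquoted column name is left intact instead of losing its first and last characters:

```python
def normalize_partition_column(partition_column: str) -> str:
    """Strip one pair of wrapping backticks or double quotes, if present.

    Unlike a blind ``[1:-1]`` slice, this leaves unquoted names untouched.
    """
    for quote in ("`", '"'):
        if (
            len(partition_column) >= 2
            and partition_column.startswith(quote)
            and partition_column.endswith(quote)
        ):
            return partition_column[1:-1]
    return partition_column
```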

Expected Behavior

The JDBC reader should correctly accept the partition_column defined in the YAML configuration, match it against the JDBC schema, and successfully split the data into the requested number of partitions without throwing a partition column not found error.

Steps To Reproduce

  1. Install databricks-labs-lakebridge==0.11.3.
  2. Configure a reconciliation with an Oracle source.
  3. Add jdbc_options to the source configuration:
jdbc_options:
    number_partitions: "8"
    partition_column: "ID_COL_NAME"
    lower_bound: "1"
    upper_bound: "10000000"
  4. Run the reconciliation.
  5. The process fails during the data fetching phase with a ReconciliationException wrapping a Spark AnalysisException.
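For reference, the jdbc_options keys above presumably get translated into Spark's JDBC reader options roughly as sketched below. The mapping function is hypothetical (not lakebridge's actual code); the Spark option names on the right, however, are the real ones documented for the JDBC data source:

```python
# Hypothetical translation of the YAML jdbc_options keys to the
# corresponding Spark JDBC data source options.
YAML_TO_SPARK = {
    "number_partitions": "numPartitions",
    "partition_column": "partitionColumn",
    "lower_bound": "lowerBound",
    "upper_bound": "upperBound",
}


def to_spark_options(jdbc_options: dict[str, str]) -> dict[str, str]:
    """Rename YAML-style keys to the option names Spark's JDBC reader expects."""
    return {YAML_TO_SPARK[key]: value for key, value in jdbc_options.items()}
```

If the partition_column value still carries quoting characters at this point, they are passed straight through to Spark, which then fails to match the column against the relation's schema.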

Relevant log output or Exception details

14:57:29  WARNING [d.l.l.r.connectors.data_source] Runtime exception occurred while fetching data using SELECT ... FROM DP_EXCHANGE.dataset_010 : User-defined partition column `id_col_name` not found in the JDBC relation: struct<HASH_VALUE_RECON: string, AMOUNT: decimal(10,0), BIRTHDAY: timestamp, CREATED_AT: timestamp, ID_COL_NAME: decimal(10,0), IS_ACTIVE: decimal(1,0), PRICE: decimal(19,4), PRODUCT_ID: string, TRANSACTION_ID: string, TRANSACTION_TIME: timestamp, USER_CATEGORY: string ... 9 more fields>.

JVM stacktrace:
org.apache.spark.sql.AnalysisException
        at org.apache.spark.sql.errors.QueryCompilationErrors$.userDefinedPartitionNotFoundInJDBCRelationError(QueryCompilationErrors.scala:4420)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$verifyAndGetNormalizedPartitionColumn$2(JDBCRelation.scala:198)
        ...
        at org.apache.spark.sql.classic.DataFrameReader.load(DataFrameReader.scala:170)

Logs Confirmation

  • I ran the command line with --debug
  • I have attached the lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log

Sample Query

SELECT LOWER(DBMS_CRYPTO.HASH(RAWTOHEX(COALESCE(TRIM(amount), '_null_recon_') || ... )), 2)) AS hash_value_recon, amount, birthday, created_at, id_col_name, is_active, price, product_id, transaction_id, transaction_time, user_category FROM ORCL_SCHEMA.TBL

Operating System

macOS

Version

latest via Databricks CLI
