Description
Is there an existing issue for this?
- I have searched the existing issues
Category of Bug / Issue
Reconcile bug
Current Behavior
When running a reconciliation against an Oracle source with jdbc_options configured (specifically partition_column, lower_bound, upper_bound, and number_partitions), the process fails with an org.apache.spark.sql.AnalysisException.
Even though the column exists in the source table, Spark reports that the user-defined partition column is not found in the JDBC relation. This appears to be caused by how the partition_column string is quoted (wrapped in backticks) before being passed to the Spark JDBC reader.
Error Snippet:
AnalysisException: User-defined partition column `id_col_name` not found in the JDBC relation: struct<HASH_VALUE_RECON: string, AMOUNT: decimal(10,0), ..., ID_COL_NAME: decimal(10,0), ...>
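For illustration, a minimal PySpark sketch of what I believe the failure mode is (the URL and session are placeholders, not taken from the actual run): if the partitionColumn option arrives wrapped in backticks, Spark cannot match it against the schema field names and raises the same error.

# Hypothetical repro sketch -- assumes an active SparkSession named `spark`;
# the JDBC URL below is a placeholder.
df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//host:1521/service")  # placeholder
    .option("dbtable", "DP_EXCHANGE.dataset_010")
    .option("partitionColumn", "`id_col_name`")  # backtick-wrapped -> no schema match
    .option("numPartitions", "8")
    .option("lowerBound", "1")
    .option("upperBound", "10000000")
    .load()  # raises: User-defined partition column `id_col_name` not found ...
)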
I found that the issue is related to backticks/quoting. I applied a local workaround in lakebridge/reconcile/connectors/jdbc_reader.py at line 18 by stripping the wrapping characters, which resolved the issue:
# Temporary fix/workaround
option_dict["partitionColumn"] = options.partition_column[1:-1]Expected Behavior
Expected Behavior
The JDBC reader should correctly accept the partition_column defined in the YAML configuration, match it against the JDBC schema, and split the data into the requested number of partitions without raising a "partition column not found" error.
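For context, here is roughly how Spark turns those bounds into per-partition WHERE clauses once the column resolves (a simplified sketch assuming Spark's documented stride-based JDBC partitioning; the values match the jdbc_options in the steps below):

# Simplified illustration only -- not the actual Spark implementation
lower, upper, parts = 1, 10_000_000, 8
stride = (upper - lower) // parts
bounds = [lower + i * stride for i in range(1, parts)]
clauses = (
    [f"ID_COL_NAME < {bounds[0]} OR ID_COL_NAME IS NULL"]
    + [f"ID_COL_NAME >= {lo} AND ID_COL_NAME < {hi}" for lo, hi in zip(bounds, bounds[1:])]
    + [f"ID_COL_NAME >= {bounds[-1]}"]
)
for c in clauses:
    print(c)  # one WHERE clause per concurrent JDBC read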
Steps To Reproduce
- Install databricks-labs-lakebridge==0.11.3.
- Configure a reconciliation with an Oracle source.
- Add jdbc_options to the source configuration:
jdbc_options:
  number_partitions: "8"
  partition_column: "ID_COL_NAME"
  lower_bound: "1"
  upper_bound: "10000000"
- Run the reconciliation.
- The process fails during the data fetching phase with a ReconciliationException wrapping a Spark AnalysisException.
Relevant log output or Exception details
14:57:29 WARNING [d.l.l.r.connectors.data_source] Runtime exception occurred while fetching data using SELECT ... FROM DP_EXCHANGE.dataset_010 : User-defined partition column `id_col_name` not found in the JDBC relation: struct<HASH_VALUE_RECON: string, AMOUNT: decimal(10,0), BIRTHDAY: timestamp, CREATED_AT: timestamp, ID_COL_NAME: decimal(10,0), IS_ACTIVE: decimal(1,0), PRICE: decimal(19,4), PRODUCT_ID: string, TRANSACTION_ID: string, TRANSACTION_TIME: timestamp, USER_CATEGORY: string ... 9 more fields>.
JVM stacktrace:
org.apache.spark.sql.AnalysisException
at org.apache.spark.sql.errors.QueryCompilationErrors$.userDefinedPartitionNotFoundInJDBCRelationError(QueryCompilationErrors.scala:4420)
at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.$anonfun$verifyAndGetNormalizedPartitionColumn$2(JDBCRelation.scala:198)
...
at org.apache.spark.sql.classic.DataFrameReader.load(DataFrameReader.scala:170)
Logs Confirmation
- I ran the command line with --debug
- I have attached the lsp-server.log under USER_HOME/.databricks/labs/remorph-transpilers/<converter_name>/lib/lsp-server.log
Sample Query
SELECT LOWER(DBMS_CRYPTO.HASH(RAWTOHEX(COALESCE(TRIM(amount), '_null_recon_') || ... )), 2)) AS hash_value_recon, amount, birthday, created_at, id_col_name, is_active, price, product_id, transaction_id, transaction_time, user_category FROM ORCL_SCHEMA.TBL
Operating System
macOS
Version
latest via Databricks CLI