Summary
The Databricks OSS JDBC driver, as of v3.0.1, limits prepared statements to a maximum of 256 parameters. This limitation prevents the driver from efficiently executing large batched inserts, since each statement must be sent individually.
This issue proposes either:
1. Lifting or expanding the 256-parameter limit, or
2. Providing a connection/property flag to disable parameterized queries — allowing client-side interpolation and enabling true multi-row batch inserts.
⸻
Background
In its current form, the OSS driver always relies on parameterized execution for batches. However, this approach:
• Caps the total number of bound parameters to 256.
• Forces each statement in a batch to be executed individually.
• Severely limits throughput for large datasets.
For example, inserting a large DataFrame results in multiple round-trips, each limited by the 256-parameter constraint:
```sql
INSERT INTO table (col1, col2, col3) VALUES (?, ?, ?)
```
repeated many times, instead of a single optimized multi-row insert like:
```sql
INSERT INTO table (col1, col2, col3)
VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9)
```
⸻
Problem
• The 256-parameter cap prevents scaling inserts efficiently.
• The driver does not provide a configuration option to disable parameterized queries or to switch to interpolated mode.
• As a result, large batch writes perform poorly, especially in data-intensive workloads (e.g., Spark-to-JDBC writes).
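For concreteness, this is a minimal sketch of the batching pattern that runs into the cap today, using the standard JDBC batch API (the connection URL, table, and column names below are placeholders):
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsertSketch {
    // Conventional JDBC batch insert. Under the current driver behavior described
    // above, each batched row is still sent as its own parameterized statement.
    static void insertRows(String jdbcUrl, List<int[]> rows) throws Exception {
        String sql = "INSERT INTO my_table (col1, col2, col3) VALUES (?, ?, ?)"; // placeholder table/columns
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int[] row : rows) {
                ps.setInt(1, row[0]);
                ps.setInt(2, row[1]);
                ps.setInt(3, row[2]);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}
```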
⸻
Proposed Enhancement
Introduce one or both of the following:
1. Configurable parameter limit — allow increasing the number of parameters supported per prepared statement.
2. Property to disable query parameters (e.g., DisableParameterizedQueries=true) — enabling client-side interpolation and multi-row value construction for batched inserts (a usage sketch follows below).
This would provide parity with how other JDBC drivers (e.g., Snowflake, PostgreSQL, MySQL) handle large batches efficiently.
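If the property in (2) were adopted under the proposed name (DisableParameterizedQueries is a suggested flag, not an existing driver option), usage could look roughly like this; the connection URL and table are placeholders:
```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

public class InterpolatedInsertSketch {
    static void insertMultiRow(String jdbcUrl) throws Exception {
        Properties props = new Properties();
        // Proposed flag from this issue; not an option the driver supports today.
        props.setProperty("DisableParameterizedQueries", "true");

        try (Connection conn = DriverManager.getConnection(jdbcUrl, props);
             Statement stmt = conn.createStatement()) {
            // With parameterization disabled, the client can send one multi-row
            // VALUES statement instead of one parameterized statement per row.
            stmt.executeUpdate(
                "INSERT INTO my_table (col1, col2, col3) " +  // placeholder table/columns
                "VALUES (1, 2, 3), (4, 5, 6), (7, 8, 9)");
        }
    }
}
```
The usual trade-off of an interpolated mode is that value quoting and escaping become the client's responsibility.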
⸻
Benefits
• Significantly improved batch insert performance.
• Reduced network round-trips.
• Enables scalability for large data ingestion workloads.
⸻
Next Steps
• Evaluate feasibility of increasing or removing the 256-parameter cap.
• Discuss adding a driver-level flag to control parameterized query behavior.
• Test performance improvements with Spark’s JDBC write path once implemented.
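For that test, something along the following lines could exercise Spark's JDBC write path (Java API; the source path, connection URL, and target table are placeholders, and batchsize is Spark's standard JDBC write option):
```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SparkJdbcWriteTest {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("databricks-jdbc-batch-write-test")
                .getOrCreate();

        // Placeholder source; any sufficiently large DataFrame works for the benchmark.
        Dataset<Row> df = spark.read().parquet("/path/to/large/dataset");

        df.write()
          .format("jdbc")
          .option("url", "jdbc:databricks://<host>:443;httpPath=<http-path>") // placeholder connection URL
          .option("dbtable", "target_table")                                  // placeholder target table
          .option("batchsize", 10000)                                         // rows per JDBC batch on the Spark side
          .mode(SaveMode.Append)
          .save();

        spark.stop();
    }
}
```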
This is a follow-up to the improvements made in #867.