-
Notifications
You must be signed in to change notification settings - Fork 12
Fix timezone handling: Honor querySetting.timezone for interpretting timestamp with timezone #78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix timezone handling: Honor querySetting.timezone for interpretting timestamp with timezone #78
Conversation
…timestamp with timezone
Datacloud jdbc driver was not handling timezone with timestamp properly.
example :
Tested with a sql query (below)
String query = "SELECT " + "'2024-06-15 12:00:00'::timestamp AS ts_naive, "
+ "'2024-06-15 12:00:00+00:00'::timestamptz AS ts_utc, "
+ "'2024-06-15 12:00:00+05:00'::timestamptz AS ts_plus5, "
+ "'2024-06-15 12:00:00-08:00'::timestamptz AS ts_minus8";
revealed that irrespective of the timezone, identical values were returned by the driver.
// All these connections returned identical values:
// UTC: 2024-01-01 17:30:00.0
// America/New_York: 2024-01-01 17:30:00.0 // Should be 12:30:00.0
// Asia/Tokyo: 2024-01-01 17:30:00.0 // Should be 02:30:00.0
Couple of issues were in the codebase
1. StreamingResultSet was using TimeZone.getDefault() (system timezone) instead of session timezone.
2. TimeStampVectorAccessor was defaulting to UTC when Arrow schema did not have timezone metadata.
Solution:
Enhanced ConnectionQuerySettings
1. Added TIMEZONE_SETTING constant for the timezone property
2. Added getSessionTimeZone() method with proper fallback to system default
3. Used existing querySetting.* prefix handling pattern
Updated StreamingResultSet and TimeStampVectorAccessor to correctly handle timezone info from the session and a fallback to system default.
Testing details(Same timestamps mentioned above returned the following)
UTC: 2024-01-01 12:00:00.0
America/New_York: 2024-01-01 07:00:00.0 ✓ (5 hours behind UTC)
Europe/London: 2024-01-01 12:00:00.0 ✓ (same as UTC in winter)
Asia/Tokyo: 2024-01-01 21:00:00.0 ✓ (9 hours ahead of UTC)
|
Thanks for the contribution! Unfortunately we can't verify the commit author(s): Praveen Kumar <p***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request. |
| public TimeStampVectorAccessor( | ||
| TimeStampVector vector, | ||
| IntSupplier currentRowSupplier, | ||
| QueryJDBCAccessorFactory.WasNullConsumer wasNullConsumer, | ||
| Properties connectionProperties) | ||
| throws SQLException { | ||
| super(currentRowSupplier, wasNullConsumer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is completely missing test coverage. Please add some explicit unit test coverage and also extend the jdbc-reference/src/main/java/com/salesforce/datacloud/reference/PostgresReferenceGenerator.java infrastructure to create a baseline for the scenarios from your commit message and ensure that Hyper is consistent with PostgreSQL JDBC driver
In preparation for cleaning up the various timeout handlings in the JDBC driver this change first cleans up the property handling. All properties will now be interpreted when the connection is created instead of the previous approach where it was more distributed at the places where the respective properties are used. For queryTimeout this for example had the benefit that it's much easier to understand how it is computed and the internal code doesn't need to check for invalid user inputs again and again. The property handling is now centralized into `...Properties` classes (like `ConnectionProperties` and `StatementProperties`) that coarsely match the area for which they are relevant. This structuring helped to for example identify that `resultset.bytelimit` was used at the wrong levels (see notes further down). For now the new approach only covers properties relevant for the Connection & Statement objects. _Future_ refactorings should also make use of it for the Authorization related properties. With the transition of properties to this new framework we also removed the "silent" ignoring of malformed property values. Just ignoring property values for valid keys can lead to surprising behavior like for example timeouts getting ignored, thus explicitly failing is beneficial. Removed Properties: - The `resultset.bytelimit` property was removed. It didn't work for usage in DBeaver or other generic JDBC clients anyway as it would only have effect when the max rows are limited. As currently it is sufficient that these settings can be configured through our API extensions instead of fixing and adding more properties for maxRows this change removes the property. Related Changes: - Double enforcement of query timeout on stub and in the executor: The query timeout was used both for individual generic RPC requests at the connection level as well as governing the whole end to end query execution flow encompassing multiple RPC requests. To clean this up the individual RPC call timeout is now detached from the queryTimeout and instead associated with the networkTimeout (with the standardized get / set support). - Don't keep `byteLimit` as a member in `HyperGprcClientExecutor`. The `byteLimit` is only relevant when there is a row limit. In the existing code the byteLimit was set as a member during construction for all use cases while the row limit was only passed as argument in specific `executeQuery` overloads. This split was highly confusing. The `byteLimit` is now explicitly passed in on the methods that need it alongside the row limit. - This change fixes the behavior of the `client info` related connection methods. Our driver currently doesn't support client info properties and thus we just return an empty properties set / null when returning those and ignore set values. This behavior mirrors the documented behavior of the MSSQL driver. - Removed support for legacy `serverSetting.` prefix The actual proper timeout handling alignment and cleanup will be done separately in a follow up.
This change adjusts the query timeout handling from being locally + gRPC deadline enforced to now primarily being enforced on Hyper server side. - This allows to return the server side error to the caller. Previously it would either raise an internal error by the JDBC driver or obscure gRPC timeout errors depending on which layer detected the problem first. - It now also guarantees that async queries don't exceed their timeout even if the client crashes / or the user flow forgets to explicitly invoke `cancel`. To ensure robustness of the query timeout handling in face of network problems or even server side bugs this change doesn't only rely on the server side timeouts. As a safety net there is also local enforcement (which happens with a configurable delay to allow the server to terminate first) As a related change the driver now also internally tracks timeouts more accurately (both for executeQuery as well as for the custom `wait*` methods). - The old approach was in some places relying on non monotonic clocks (`Instant`) and thus would have resulted in failures around system clock updates / summer time changes - The old approach sometimes waited for a multiple of the configured query timeout To remove this class of issues usages a new Deadline class is introduced that tracks the remaining time for a timeout.
- Fix TimeStampVectorAccessor to properly handle timezone-aware timestamps and naive timestamps with calendar support - Add session timezone extraction from Properties for consistent timezone conversion - Implement Avatica detection logic to distinguish between framework and user-provided calendars - Add overloaded QueryJDBCAccessorFactory.createAccessor method with Properties parameter - Fix PostgreSQL reference generator to use consistent JVM timezone setup - Add comprehensive unit tests for all new timezone handling functionality - Fix JSON field ordering in ColumnMetadata to prevent unnecessary diff noise in reference.json Key behavioral changes: - timestamptz values now correctly convert from vector timezone to session timezone - Naive timestamps respect user-provided calendars when different from session timezone - Avatica-provided calendars are ignored for naive timestamps to maintain literal values - Session timezone defaults to JVM timezone when not specified in Properties Tests: Added 8 new test methods covering timezone conversion, calendar handling, and session timezone extraction
…timestamp with timezone
Datacloud jdbc driver was not handling timezone with timestamp properly.
example :
Tested with a sql query (below)
String query = "SELECT " + "'2024-06-15 12:00:00'::timestamp AS ts_naive, "
+ "'2024-06-15 12:00:00+00:00'::timestamptz AS ts_utc, "
+ "'2024-06-15 12:00:00+05:00'::timestamptz AS ts_plus5, "
+ "'2024-06-15 12:00:00-08:00'::timestamptz AS ts_minus8";
revealed that irrespective of the timezone, identical values were returned by the driver.
// All these connections returned identical values:
// UTC: 2024-01-01 17:30:00.0
// America/New_York: 2024-01-01 17:30:00.0 // Should be 12:30:00.0
// Asia/Tokyo: 2024-01-01 17:30:00.0 // Should be 02:30:00.0
Couple of issues were in the codebase
1. StreamingResultSet was using TimeZone.getDefault() (system timezone) instead of session timezone.
2. TimeStampVectorAccessor was defaulting to UTC when Arrow schema did not have timezone metadata.
Solution:
Enhanced ConnectionQuerySettings
1. Added TIMEZONE_SETTING constant for the timezone property
2. Added getSessionTimeZone() method with proper fallback to system default
3. Used existing querySetting.* prefix handling pattern
Updated StreamingResultSet and TimeStampVectorAccessor to correctly handle timezone info from the session and a fallback to system default.
Testing details(Same timestamps mentioned above returned the following)
UTC: 2024-01-01 12:00:00.0
America/New_York: 2024-01-01 07:00:00.0 ✓ (5 hours behind UTC)
Europe/London: 2024-01-01 12:00:00.0 ✓ (same as UTC in winter)
Asia/Tokyo: 2024-01-01 21:00:00.0 ✓ (9 hours ahead of UTC)
- Fix TimeStampVectorAccessor to properly handle timezone-aware timestamps and naive timestamps with calendar support - Add session timezone extraction from Properties for consistent timezone conversion - Implement Avatica detection logic to distinguish between framework and user-provided calendars - Add overloaded QueryJDBCAccessorFactory.createAccessor method with Properties parameter - Fix PostgreSQL reference generator to use consistent JVM timezone setup - Add comprehensive unit tests for all new timezone handling functionality - Fix JSON field ordering in ColumnMetadata to prevent unnecessary diff noise in reference.json Key behavioral changes: - timestamptz values now correctly convert from vector timezone to session timezone - Naive timestamps respect user-provided calendars when different from session timezone - Avatica-provided calendars are ignored for naive timestamps to maintain literal values - Session timezone defaults to JVM timezone when not specified in Properties Tests: Added 8 new test methods covering timezone conversion, calendar handling, and session timezone extraction
…z-fix' into praveen2450-timestamp-timestamptz-fix # Conflicts: # jdbc-core/src/main/java/com/salesforce/datacloud/jdbc/core/ConnectionQuerySettings.java # jdbc-core/src/main/java/com/salesforce/datacloud/jdbc/core/HyperGrpcClientExecutor.java # jdbc-core/src/test/java/com/salesforce/datacloud/jdbc/core/ConnectionQuerySettingsTest.java
…r rebase - Remove ConnectionQuerySettings class deleted during rebase - Replace with direct "querySetting.timezone" property access - Update StreamingResultSet.of() to accept Properties parameter - Update listener classes to accept connection properties - Fix test method signatures to match updated APIs - Maintain all timestamp and timezone functionality
|
Closing as I raised another PR with #83, this PR lacked test cases and with latest changes having lots of changes. |
Fix timezone handling: Honor querySetting.timezone for interpretting timestamp with timezone
Problem
The Datacloud JDBC driver was not properly handling timezone settings for timestamp interpretation.
Regardless of the
querySetting.timezoneconnection property, all sessions returned identical timestampvalues, breaking timezone-aware applications.
Example Issue:
All timezone configurations returned identical values:
2024-01-01 17:30:00.02024-01-01 17:30:00.0(should be12:30:00.0)2024-01-01 17:30:00.0(should be02:30:00.0)Root Cause
StreamingResultSetusedTimeZone.getDefault()(system timezone) instead of session timezoneTimeStampVectorAccessordefaulted to UTC when Arrow schema lacked timezone metadata,ignoring the session timezone setting
Solution
Enhanced
ConnectionQuerySettings:TIMEZONE_SETTINGconstant for the timezone propertygetSessionTimeZone()method with proper fallback to system defaultquerySetting.*prefix patternUpdated timezone handling in:
TimeStampVectorAccessor: Now uses session timezone for interpretation when Arrow schema lacks timezone infoStreamingResultSet: Replaced system default with session timezoneQueryJDBCAccessorFactory,ArrowStreamReaderCursor,HyperGrpcClientExecutor: Added connection properties supportVerification
After the fix, the same query returns correct timezone-adjusted values:
2024-01-01 12:00:00.02024-01-01 07:00:00.0✓ (5 hours behind UTC)2024-01-01 12:00:00.0✓ (same as UTC in winter)2024-01-01 21:00:00.0✓ (9 hours ahead of UTC)