Merged
(cherry picked from commit 1fb08c6)
… Types, fixes apache#5947 (apache#6186)

This commit improves the Parquet Input and Output transforms by adding support for Parquet's Logical Types. Previously, the transforms relied primarily on primitive types, which led to conversion issues and data errors when handling complex types such as Timestamps.

Key Changes & Features:

1. Parquet Input:
   * Logical Type Mapping: Refactors field discovery to use `LogicalTypeAnnotation` (instead of only the primitive type), enabling correct mapping for semantic types.
   * Timestamp/Date Precision: Implements a conversion mechanism that maps Parquet's timestamp units (MILLIS, MICROS, ...) to Hop's `TYPE_TIMESTAMP` and `TYPE_DATE`, preserving precision and handling UTC adjustments.
   * JSON Support: Adds explicit support for the JSON Logical Type, converting the Parquet binary/string data into Hop's `TYPE_JSON` object.
   * Decimal Handling: Uses precision and scale from `DecimalLogicalTypeAnnotation` to correctly convert binary/long Parquet decimals into Hop's `TYPE_BIGNUMBER`.

2. Parquet Output:
   * Date/Timestamp Consistency: Ensures that Hop's `TYPE_DATE` and `TYPE_TIMESTAMP` are consistently converted to a `LONG` representation with the Parquet `timestampMillis` logical annotation, which is the most compatible format.
   * Schema Mapping: Maps Hop's `TYPE_JSON` and `TYPE_UUID` to Parquet `STRING` types in the schema definition.

Testing and Validation:

* Test Data Enrichment: The test dataset (`golden-parquet-input.json`) was extended with new fields: `isActive` (Boolean), `registrationTimestamp` (Timestamp), and `metadataJson` (JSON), so the new types are covered end-to-end.
* Unit Test Update: The unit test configuration (`0029-parquet-input UNIT.json`) was updated to map and validate the new fields, confirming that the transform works correctly.

This resolves a major limitation regarding data fidelity when dealing with common modern Parquet schemas.
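To make the timestamp-unit and decimal conversions described in this commit message concrete, here is a minimal, standard-library-only sketch. It is not the code from the commit: the class `ParquetValueSketch`, the `Unit` enum (a stand-in for Parquet's time units), and all method names are hypothetical, and the handling of the UTC-adjustment flag is deliberately left out. It assumes the raw timestamp is a count of units since the Unix epoch and that a binary decimal is stored as a big-endian two's-complement unscaled value, as in the Parquet format specification.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.time.Instant;

// Hypothetical helper for illustration only; not part of Hop or Parquet.
final class ParquetValueSketch {

  // Stand-in for the Parquet timestamp units named in the commit message.
  enum Unit { MILLIS, MICROS, NANOS }

  // Converts a raw epoch-based value to an Instant without dropping
  // sub-millisecond digits. floorDiv/floorMod keep pre-1970 (negative)
  // values correct, since the nano adjustment must be non-negative.
  static Instant toInstant(long raw, Unit unit) {
    switch (unit) {
      case MILLIS:
        return Instant.ofEpochMilli(raw);
      case MICROS:
        return Instant.ofEpochSecond(
            Math.floorDiv(raw, 1_000_000L),
            Math.floorMod(raw, 1_000_000L) * 1_000L);
      case NANOS:
        return Instant.ofEpochSecond(
            Math.floorDiv(raw, 1_000_000_000L),
            Math.floorMod(raw, 1_000_000_000L));
      default:
        throw new IllegalArgumentException("Unknown unit: " + unit);
    }
  }

  // Decimal stored as bytes: big-endian two's-complement unscaled value.
  static BigDecimal decimalFromBytes(byte[] unscaled, int scale) {
    return new BigDecimal(new BigInteger(unscaled), scale);
  }

  // Decimal stored as a long: the long is the unscaled value.
  static BigDecimal decimalFromLong(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale);
  }
}
```

For example, under these assumptions `decimalFromLong(12345L, 2)` yields `123.45`, and a MICROS value of `1_000_001` becomes an instant one second and one microsecond after the epoch. A real implementation would read the unit, precision, and scale from the column's logical type annotation rather than take them as arguments.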
(cherry picked from commit 6583915)
* Fix apache#5164 - value type in "Formula" transform injection
* Fix apache#5225 - Reverted previous changes. Applied usage of InjectionTypeConverter to convert from type id to datatype name. Added integration tests
* Fix apache#5225 - Fixed missing license header

(cherry picked from commit a54ea4b)
…#6060 (apache#6065)

* fix show filenames button throws an error in Get Data From XML

Signed-off-by: lance <leehaut@gmail.com>

(cherry picked from commit ccf230a)
* Fix apache#5225 - Cannot wire Workflow Executor results rows hop to next transform
* Fix apache#5225 - Added integration test to check for execution results' rows

(cherry picked from commit 8ff5bd2)
…it mapping' function (apache#6223) (cherry picked from commit 9002cdc)