Rahil/spark converter#12

Closed
rahil-c wants to merge 57 commits into master from
rahil/spark-converter
Conversation

@rahil-c
Owner

@rahil-c rahil-c commented Dec 3, 2025

Describe the issue this Pull Request addresses

Summary and Changelog

Impact

Risk Level

Documentation Update

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

voonhous and others added 30 commits November 5, 2025 15:20
Signed-off-by: TheR1sing3un <chaoyang@apache.org>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Signed-off-by: TheR1sing3un <chaoyang@apache.org>
* perf: reduce unnecessary row group metadata loading

---------

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…nt deadlocks (apache#14225)

* Move hudi split loaders to resumable tasks architecture to prevent deadlocks.

* Address review comments: Add javadoc and debug logging

* Add testcases to verify the fix
…ed and add unpersist (apache#14069)

Co-authored-by: Lokesh Jain <ljain@Lokeshs-MacBook-Pro.local>
…e should the meta fields be eliminated (apache#14230)

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
1. Claim RFC-81: Introduce Primary Key Sorted Table

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…4161)

Co-authored-by: Jonathan Vexler <=>
Co-authored-by: sivabalan <n.siva.b@gmail.com>
Co-authored-by: Vamsi <vamsi@onehouse.ai>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: Lin Liu <linliu.code@gmail.com>
1. introduce pk filter to log file

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…ark test to avoid flaky tests (apache#14198)

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…dd new APIs based on current usage of Avro schema (apache#14265)

Co-authored-by: Balaji Varadarajan <balaji@Balajis-Laptop.local>
Co-authored-by: Balaji Varadarajan <balaji@Balajis-Laptop.attlocal.net>
Co-authored-by: Timothy Brown <tim@onehouse.ai>
…ache#14287)

---------

Co-authored-by: Pavithran Ravichandiran <pavithran@Pavithrans-MacBook-Pro.local>
jonvex and others added 27 commits November 24, 2025 21:27
…pache#14060)

Co-authored-by: Jonathan Vexler <=>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
…rPushDown` (apache#14332)

1. Push down pk filters to the log file when Spark enables `parquetFilterPushDown`

The previous condition was wrong because of a typo.
Whether to push filters down depends on `parquetFilterPushDown`,
while whether to apply a record filter at the Parquet level depends on `parquetRecordFilterEnabled`.
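
The distinction between the two flags can be sketched as follows. This is a minimal illustration in Python, not the code from the commit: `FilterDecision` and `decide_filters` are hypothetical names, and the sketch assumes each decision reads only the flag the commit message names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterDecision:
    """Hypothetical container for the two independent decisions."""
    push_down_pk_filters: bool   # hand pk filters to the log file reader
    apply_record_filter: bool    # filter individual records at the Parquet level


def decide_filters(parquet_filter_push_down: bool,
                   parquet_record_filter_enabled: bool) -> FilterDecision:
    # Assumption: each decision depends only on its own flag, as the
    # commit message describes. The typo is described as having mixed
    # up this judgment logic.
    return FilterDecision(
        push_down_pk_filters=parquet_filter_push_down,
        apply_record_filter=parquet_record_filter_enabled,
    )
```

With push-down enabled and the record filter disabled, `decide_filters(True, False)` pushes the pk filters down but skips record-level filtering.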

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…ion in the compaction plan (apache#14362)

fix the metrics for file slice with filtered log files.

---------

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
Co-authored-by: danny0405 <yuzhao.cyz@gmail.com>
Handle the corner case where the load instant range is contained within the instant range of one of the archived files.

---------

Signed-off-by: TheR1sing3un <chaoyang@apache.org>
Co-authored-by: danny0405 <yuzhao.cyz@gmail.com>
Signed-off-by: TheR1sing3un <chaoyang@apache.org>
…14309)

* feat: Support reading virtual metadata columns for Flink reader
…pache#17456)

* refactor: Add helper to get HoodieSchema in TableSchemaResolver

* address tim comment
…e start and end time for both active and archive timelines (apache#14261)

This PR introduces a new comprehensive show_timeline procedure for Hudi Spark SQL that provides detailed timeline information for all table operations. The procedure displays timeline instants including commits, deltacommits, compactions, clustering, cleaning, and rollback operations with support for both active and archived timelines and completed/pending state instants.

Features added:

Comprehensive timeline view:
Shows all timeline instants with detailed metadata including state transitions (REQUESTED, INFLIGHT, COMPLETED)

Time-based filtering:
Support for startTime and endTime parameters to filter results within specific time ranges

Archive timeline support:
showArchived parameter to include archived timeline data for complete historical view

Generic SQL filtering:
filter parameter supporting SQL expressions for flexible result filtering

Rich metadata output:
Includes formatted timestamps, rollback information, and table type details

---------

Co-authored-by: vamshikrishnakyatham <vamshikrishna.kyatham.22@gmail.com>
Co-authored-by: Pavithran Ravichandiran <pavithran@Pavithrans-MBP.attlocal.net>
Co-authored-by: Pavithran Ravichandiran <pavithran@Pavithrans-MacBook-Pro.local>
…pache#14061)

Co-authored-by: Jonathan Vexler <=>
Co-authored-by: Y Ethan Guo <ethan.guoyihua@gmail.com>
Co-authored-by: Timothy Brown <tim@onehouse.ai>
…pache#14311)

* apache#14267 - phase 2: Perform Column Statistics Schema Migration

* Change method parameters in HoodieTableMetadataUtil to HoodieSchema

* Fix type erasure issue due to collector + stream usage

* Address comments

* Account for decimal being a bytes type

* Remove formatting to reduce delta 1

* Remove formatting to reduce delta 2

* Remove formatting to reduce delta 3

* Remove formatting to reduce delta 4

* Remove formatting to reduce delta 5

* Address comments in TestHoodieTableMetadataUtil

* Address comments in TestHoodieTableMetadataUtil (hudi-common)

* Address comments again

* Fix tests

* Address comments

* Fix checkstyle errors

* Use getTableSchema instead of getTableAvroSchema
@rahil-c rahil-c force-pushed the rahil/spark-converter branch from 7717a17 to 57b4de1 December 3, 2025 23:36
@rahil-c rahil-c closed this Dec 3, 2025