Contributor

@vinishjail97 vinishjail97 commented Jan 5, 2026

Describe the issue this Pull Request addresses

This PR adds support for external base files with file group prefixes, allowing external files to be organized in subdirectories within partitions. (Paimon, for example, organizes Parquet files into partitions, and each partition has multiple buckets.)
The implementation consolidates all external file handling logic in ExternalFilePathUtil for better maintainability.

Summary and Changelog

  • Added support for external files with the file group prefix format: <fileName>_<commitTime>_fg%3D<prefix>_hudiext
  • Consolidated external file parsing logic in ExternalFilePathUtil.parseFileIdAndCommitTimeFromExternalFile()
  • Added a comprehensive test suite for ExternalFilePathUtil
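
For readers unfamiliar with the naming scheme, here is a minimal sketch of how a name in the format <fileName>_<commitTime>_fg%3D<prefix>_hudiext could be split into its parts. It is not the code in ExternalFilePathUtil: the class ExternalFileNameSketch, its Parsed holder, and the method names are hypothetical and exist only for illustration. It assumes that names without a prefix look like <fileName>_<commitTime>_hudiext, that the commit time contains no underscore, and that the prefix is returned still URL-encoded.

```java
import java.util.Optional;

// Hypothetical illustration only; the real logic lives in ExternalFilePathUtil.
final class ExternalFileNameSketch {
  // Suffix and marker taken from the format string in the changelog.
  private static final String EXTERNAL_SUFFIX = "_hudiext";
  private static final String FILE_GROUP_MARKER = "_fg%3D"; // "%3D" is a URL-encoded '='

  // Simple value holder for the parsed parts (hypothetical type).
  static final class Parsed {
    final String fileName;
    final String commitTime;
    final Optional<String> fileGroupPrefix; // empty for names without a prefix

    Parsed(String fileName, String commitTime, Optional<String> fileGroupPrefix) {
      this.fileName = fileName;
      this.commitTime = commitTime;
      this.fileGroupPrefix = fileGroupPrefix;
    }
  }

  static boolean isExternalFile(String name) {
    return name.endsWith(EXTERNAL_SUFFIX);
  }

  static Parsed parse(String name) {
    if (!isExternalFile(name)) {
      throw new IllegalArgumentException("Not an external file name: " + name);
    }
    // Drop the trailing "_hudiext" marker.
    String body = name.substring(0, name.length() - EXTERNAL_SUFFIX.length());

    // Optional "_fg%3D<prefix>" segment; absent in the assumed prefix-less format.
    Optional<String> prefix = Optional.empty();
    int markerIdx = body.lastIndexOf(FILE_GROUP_MARKER);
    if (markerIdx >= 0) {
      prefix = Optional.of(body.substring(markerIdx + FILE_GROUP_MARKER.length()));
      body = body.substring(0, markerIdx);
    }

    // Split on the last underscore so that file names containing '_' still work.
    int sep = body.lastIndexOf('_');
    if (sep <= 0 || sep == body.length() - 1) {
      throw new IllegalArgumentException("Missing file name or commit time: " + name);
    }
    return new Parsed(body.substring(0, sep), body.substring(sep + 1), prefix);
  }
}
```

Parsing from the right-hand side keeps the sketch tolerant of underscores in the original file name. A prefix that itself contains the marker text would need extra handling that is not attempted here.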

Impact

Enables Hudi metadata generation for external files that are organized in subdirectories within partitions, which is useful for generating Hudi metadata for Paimon tables.

Risk Level

Low. The changes maintain backward compatibility with existing external files (without prefix) while adding support for the new format.

Documentation Update

None. Updated the Javadoc for ExternalFilePathUtil.

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@vinishjail97
Contributor Author

Error:  Plugin org.jacoco:jacoco-maven-plugin:0.8.12 or one of its dependencies could not be resolved:
Error:  	The following artifacts could not be resolved: org.jacoco:jacoco-maven-plugin:pom:0.8.12 (absent): Could not transfer artifact org.jacoco:jacoco-maven-plugin:pom:0.8.12 from/to central (https://repo.maven.apache.org/maven2): repo.maven.apache.org: Temporary failure in name resolution
Error:  -> [Help 1]
org.apache.maven.plugin.PluginResolutionException: Plugin org.jacoco:jacoco-maven-plugin:0.8.12 or one of its dependencies could not be resolved:
	The following artifacts could not be resolved: org.jacoco:jacoco-maven-plugin:pom:0.8.12 (absent): Could not transfer artifact org.jacoco:jacoco-maven-plugin:pom:0.8.12 from/to central (https://repo.maven.apache.org/maven2): repo.maven.apache.org: Temporary failure in name resolution

These are temporary dependency resolution errors from Maven: name resolution for repo.maven.apache.org failed while fetching the JaCoCo plugin.

@github-actions github-actions bot added the size:L label (PR with lines of changes in (300, 1000]) and removed the size:M label (PR with lines of changes in (100, 300]) on Jan 16, 2026
@vinishjail97 vinishjail97 force-pushed the Hudi-ExternalFileGroup branch from 9328ff3 to 0682a50 on January 21, 2026 01:20
@hudi-bot
Collaborator

CI report:


@nsivabalan nsivabalan merged commit d79360c into apache:master Jan 26, 2026
72 checks passed