Conversation

@DmitriyLewen
Contributor

@DmitriyLewen DmitriyLewen commented Dec 4, 2025

Description

This PR refactors Maven POM dependency parsing to use a composite package ID that combines the GAV (GroupId:ArtifactId:Version) coordinates with a short hash suffix. The new format is groupId:artifactId:version::hash8, where hash8 is an 8-character hash derived from both the GAV and the path of the root or module POM file.

This change addresses issues with duplicate package identification in multi-module Maven projects where the same dependency can appear in different modules with potentially different transitive dependency trees.
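
As a rough illustration of the ID format described above, here is a minimal sketch. It is not the code from this PR: compositeID is a hypothetical name, and FNV-1a from the standard library stands in for hashstructure so that the example has no external dependencies. The module paths are made up.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"path/filepath"
)

// compositeID is a hypothetical stand-in for the PR's packageID().
// Assumption: FNV-1a replaces hashstructure, so the hash values differ
// from what Trivy would actually produce.
func compositeID(gav, pomFilePath string) string {
	h := fnv.New64a()
	h.Write([]byte(gav))
	h.Write([]byte{0}) // separator between the two hashed fields
	// Normalize OS-specific separators so the same path hashes the same everywhere.
	h.Write([]byte(filepath.ToSlash(pomFilePath)))
	// Zero-pad to 16 hex digits so the 8-character prefix always exists.
	return fmt.Sprintf("%s::%s", gav, fmt.Sprintf("%016x", h.Sum64())[:8])
}

func main() {
	gav := "com.example:log4shell:1.0-SNAPSHOT"
	// Same GAV, two hypothetical module POMs: the suffixes should differ.
	fmt.Println(compositeID(gav, "module-a/pom.xml"))
	fmt.Println(compositeID(gav, "module-b/pom.xml"))
}
```

The GAV prefix keeps the ID readable, while the suffix only has to distinguish occurrences of the same GAV under different POM files.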

Changes

  • Modified package ID generation (pkg/dependency/parser/java/pom/parse.go:897-911): Added packageID() function that generates IDs in the format GAV::hash8 using hashstructure to hash both the GAV and file path
  • Added RootFilePath tracking (pkg/dependency/parser/java/pom/artifact.go:35-38): Added RootFilePath field to artifact struct to store the root or module POM file path for hash calculation
  • Updated parsing logic (pkg/dependency/parser/java/pom/parse.go:129-131,141,175-176,259,268): Threaded RootFilePath through artifact creation and analysis to ensure proper package ID generation
  • Enhanced vulnerability sorting (pkg/types/vulnerability.go:45-56): Updated BySeverity.Less() to include PkgID comparison for disambiguating packages with identical names but different IDs
  • Updated test fixtures: Modified integration test golden files and added comprehensive test cases for multi-module scenarios
  • Added filter sorting (pkg/result/filter.go): Sort vulnerabilities by PkgID to ensure deterministic filtering behavior
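
The sorting changes in the list above can be sketched as follows. This is an assumption-laden illustration, not the actual BySeverity.Less() from pkg/types/vulnerability.go: the finding type, its fields, the comparison order, and the placeholder IDs are all hypothetical. The point is only that adding PkgID as a tie-breaker makes the order independent of the input order when two packages share a name.

```go
package main

import (
	"fmt"
	"sort"
)

// finding is a hypothetical, simplified vulnerability record.
type finding struct {
	PkgName  string
	PkgID    string
	VulnID   string
	Severity int // higher means more severe
}

// less orders by package name, then package ID, then severity (descending),
// then vulnerability ID. The PkgID step separates packages that share a name.
func less(a, b finding) bool {
	if a.PkgName != b.PkgName {
		return a.PkgName < b.PkgName
	}
	if a.PkgID != b.PkgID {
		return a.PkgID < b.PkgID
	}
	if a.Severity != b.Severity {
		return a.Severity > b.Severity
	}
	return a.VulnID < b.VulnID
}

// sortFindings sorts in place using less.
func sortFindings(fs []finding) {
	sort.Slice(fs, func(i, j int) bool { return less(fs[i], fs[j]) })
}

func main() {
	fs := []finding{
		{PkgName: "lib", PkgID: "lib::module-b", VulnID: "V-1", Severity: 3},
		{PkgName: "lib", PkgID: "lib::module-a", VulnID: "V-1", Severity: 3},
	}
	sortFindings(fs)
	for _, f := range fs {
		fmt.Println(f.PkgID, f.VulnID)
	}
}
```

Without the PkgID comparison, the two entries above would compare as equal and their relative order would depend on how they happened to be collected.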

Benefits

  1. Unique identification: Prevents package ID collisions in multi-module Maven projects where the same dependency appears in different modules
  2. Accurate dependency tracking: Each module's dependencies are tracked independently with their own dependency trees
  3. Improved vulnerability reporting: Vulnerabilities can be correctly associated with specific module contexts
  4. Deterministic results: Consistent package ID generation across different runs and platforms (using filepath.ToSlash for cross-platform compatibility)
  5. Better filtering and sorting: Enhanced vulnerability filtering and sorting with PkgID-based disambiguation

Reasons

Problem: In multi-module Maven projects, the same dependency (same GAV coordinates) can appear in multiple modules. Previously, Trivy used only the GAV as the package ID, which caused:

  • Loss of module context for dependencies
  • Inability to track different transitive dependency trees for the same package in different modules
  • Potential issues with vulnerability deduplication and filtering
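
To make the problem concrete, here is a small sketch of what goes wrong when dependency trees are indexed by GAV alone. All names (moduleDep, key, the GAVs, and the paths) are hypothetical, and a (GAV, path) struct key stands in for the hashed ID, since the hash only needs to encode that pair.

```go
package main

import "fmt"

// moduleDep is a hypothetical record: one dependency as seen from one module's pom.xml.
type moduleDep struct {
	GAV       string
	PomPath   string
	DependsOn []string
}

// key pairs the GAV with the POM path, which is what the hash suffix encodes.
type key struct{ gav, pomPath string }

// indexByGAV mimics the old behavior: entries with the same GAV overwrite each other.
func indexByGAV(deps []moduleDep) map[string][]string {
	out := make(map[string][]string)
	for _, d := range deps {
		out[d.GAV] = d.DependsOn
	}
	return out
}

// indexByGAVAndPath keeps one entry per (GAV, POM path) pair.
func indexByGAVAndPath(deps []moduleDep) map[key][]string {
	out := make(map[key][]string)
	for _, d := range deps {
		out[key{d.GAV, d.PomPath}] = d.DependsOn
	}
	return out
}

func main() {
	deps := []moduleDep{
		{GAV: "org.example:lib:1.0", PomPath: "module-a/pom.xml", DependsOn: []string{"org.example:dep-x:1.0"}},
		{GAV: "org.example:lib:1.0", PomPath: "module-b/pom.xml", DependsOn: []string{"org.example:dep-y:2.0"}},
	}
	fmt.Println(len(indexByGAV(deps)))        // the tree from module-a is lost
	fmt.Println(len(indexByGAVAndPath(deps))) // both trees are kept
}
```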

Related issues

Related PRs

Checklist

  • I've read the guidelines for contributing to this repository.
  • I've followed the conventions in the PR title.
  • I've added tests that prove my fix is effective or that my feature works.
  • I've updated the documentation with the relevant information (if needed).
  • I've added usage information (if the PR introduces new options)
  • I've included a "before" and "after" example to the description (if the PR is a user interface change).

@DmitriyLewen DmitriyLewen changed the title from "test: add module tests from #7879" to "fix(java): use hash of GAV+root pom file path for pkgID for packages from pom.xml files" Dec 4, 2025
@DmitriyLewen DmitriyLewen self-assigned this Dec 4, 2025
@DmitriyLewen
Contributor Author

@knqyf263 I created a PR with your idea (#7879 (comment)).
Please take a look when you have time.

@DmitriyLewen DmitriyLewen added the autoready label (Automatically mark PR as ready for review when all checks pass) Dec 4, 2025
@DmitriyLewen DmitriyLewen force-pushed the refactor/pom/hash-as-id branch from d5108b3 to c111673 December 8, 2025 10:24
@github-actions github-actions bot marked this pull request as ready for review December 8, 2025 10:47
@github-actions github-actions bot requested a review from knqyf263 as a code owner December 8, 2025 10:47
@github-actions github-actions bot removed the autoready label (Automatically mark PR as ready for review when all checks pass) Dec 8, 2025
Collaborator

@knqyf263 knqyf263 left a comment

I left one comment with my idea, but overall it looks good to me.

Comment on lines 896 to 908
func packageID(name, version, pomFilePath string) string {
	v := map[string]any{
		"gav":  dependency.ID(ftypes.Pom, name, version),
		"path": filepath.ToSlash(pomFilePath),
	}
	h, err := hashstructure.Hash(v, hashstructure.FormatV2, &hashstructure.HashOptions{
		ZeroNil:         true,
		IgnoreZeroValue: true,
	})
	if err != nil {
		log.Warn("Failed to calculate the pom.xml hash", log.String("name", name), log.String("version", version), log.Err(err))
	}
	return strconv.FormatUint(h, 16)

Collaborator

What do you think about including GAV in the ID for readability? Something like com.example:log4shell:1.0-SNAPSHOT::a302c021 (GAV + 8-char hash suffix).

Suggested change
-func packageID(name, version, pomFilePath string) string {
-	v := map[string]any{
-		"gav":  dependency.ID(ftypes.Pom, name, version),
-		"path": filepath.ToSlash(pomFilePath),
-	}
-	h, err := hashstructure.Hash(v, hashstructure.FormatV2, &hashstructure.HashOptions{
-		ZeroNil:         true,
-		IgnoreZeroValue: true,
-	})
-	if err != nil {
-		log.Warn("Failed to calculate the pom.xml hash", log.String("name", name), log.String("version", version), log.Err(err))
-	}
-	return strconv.FormatUint(h, 16)
+func packageID(name, version, pomFilePath string) string {
+	gav := dependency.ID(ftypes.Pom, name, version)
+	v := map[string]any{
+		"gav":  gav,
+		"path": filepath.ToSlash(pomFilePath),
+	}
+	h, err := hashstructure.Hash(v, hashstructure.FormatV2, &hashstructure.HashOptions{
+		ZeroNil:         true,
+		IgnoreZeroValue: true,
+	})
+	if err != nil {
+		log.Warn("Failed to calculate hash", log.Err(err))
+		return gav // fallback to GAV only
+	}
+	// Append 8-character hash suffix
+	return fmt.Sprintf("%s::%s", gav, strconv.FormatUint(h, 16)[:8])
+}
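
One edge case in the suggested snippet is worth noting: strconv.FormatUint(h, 16) does not zero-pad, so for a hash value below 0x10000000 the string has fewer than 8 characters and the [:8] slice would panic. That is very unlikely for a 64-bit hash, and this conversation does not show whether the merged code guards against it. A minimal sketch of one way to make the truncation always safe, using a hypothetical helper named shortHash:

```go
package main

import "fmt"

// shortHash is a hypothetical helper, not part of the PR.
// It formats h as exactly 16 zero-padded hex digits and keeps the first 8,
// so the slice can never go out of range.
func shortHash(h uint64) string {
	return fmt.Sprintf("%016x", h)[:8]
}

func main() {
	fmt.Println(shortHash(0xa302c021deadbeef)) // prints "a302c021"
	fmt.Println(shortHash(0xff))               // prints "00000000"
}
```

Note that zero-padding is not a drop-in replacement: for any hash with fewer than 16 hex digits it yields a different 8-character prefix than the unpadded form, so it would change some existing IDs.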

Contributor Author

This is a good idea!
Updated in 1be9285

Comment on lines 386 to 394
// vulnID builds human-readable vulnerability package ID for POM target type (override hash to groupId:artifactId:version)
// and returns package ID for other target types.
func vulnID(t ftypes.TargetType, pkg ftypes.Package) string {
	if t == ftypes.Pom {
		pomIDInfoOnce()
		return dependency.ID(t, pkg.Name, pkg.Version)
	}
	return pkg.ID
}

Collaborator

With IDs like com.example:log4shell:1.0-SNAPSHOT::a302c021, we don't need this function. Or we can just cut the ID at ::.

Contributor Author

Yeah. Let's use the new IDs.
If users ask to remove the hash suffix, we will trim it in another PR.
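
For reference, trimming the suffix later would be a small change. The sketch below assumes the GAV::hash8 format discussed above and uses a hypothetical helper name, trimHashSuffix; it is not code from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// trimHashSuffix is a hypothetical helper that drops a trailing "::<hash>"
// from a package ID and returns IDs without the separator unchanged.
func trimHashSuffix(id string) string {
	if i := strings.LastIndex(id, "::"); i >= 0 {
		return id[:i]
	}
	return id
}

func main() {
	fmt.Println(trimHashSuffix("com.example:log4shell:1.0-SNAPSHOT::a302c021"))
	fmt.Println(trimHashSuffix("com.example:log4shell:1.0-SNAPSHOT"))
}
```

Cutting at the last "::" rather than the first keeps the helper correct even if a coordinate were ever to contain the separator itself.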

@DmitriyLewen DmitriyLewen changed the title from "fix(java): use hash of GAV+root pom file path for pkgID for packages from pom.xml files" to "fix(java): add hash of GAV+root pom file path for pkgID for packages from pom.xml files" Jan 14, 2026
@DmitriyLewen DmitriyLewen added this pull request to the merge queue Jan 15, 2026
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Jan 15, 2026
@DmitriyLewen DmitriyLewen added this pull request to the merge queue Jan 15, 2026
Merged via the queue into aquasecurity:main with commit 809db46 Jan 15, 2026
17 checks passed
@DmitriyLewen DmitriyLewen deleted the refactor/pom/hash-as-id branch January 15, 2026 07:52
Development

Successfully merging this pull request may close these issues.

bug(sbom): Duplicate SBOM packages for multi-module pom.xml files

2 participants