@NicoHinderling NicoHinderling commented Dec 10, 2025

In conjunction with getsentry/sentry-cli#3024, this addresses https://linear.app/getsentry/issue/EME-550/ios-insights-duplicate-files

This PR improves duplicate file detection for iOS assets by flattening nested children. Previously, assets inside .car files were invisible to duplicate detection.

  • Recursively flatten nested children from .car files into duplicate detection
  • Skip synthetic /Other paths representing residual bytes (corner case I found)
  • Add PDF/SVG to asset extraction file type list
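For readers following along, the flattening step in the first bullet can be sketched roughly as below. This is a minimal illustration, not the code from this PR: `FileNode`, its fields, and `flatten` are hypothetical names standing in for whatever the analyzer's file model actually looks like.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class FileNode:
    # Hypothetical stand-in for the analyzer's file model (an assumption).
    path: str
    size: int
    content_hash: str
    children: List["FileNode"] = field(default_factory=list)


def flatten(nodes: List[FileNode]) -> Iterator[FileNode]:
    """Yield every node, then recurse into its nested children."""
    for node in nodes:
        yield node
        yield from flatten(node.children)


car = FileNode(
    "Assets.car",
    300,
    "c1",
    children=[
        FileNode("Assets.car/icon.png", 100, "h1"),
        FileNode("Assets.car/Other", 200, "h2"),
    ],
)
all_files = list(flatten([car, FileNode("logo.png", 100, "h1")]))
paths = [f.path for f in all_files]
```

With a flat `all_files` list, entries nested inside a `.car` file can be compared against top-level files such as `logo.png`, which is what makes them visible to duplicate detection.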

The primary app I tested went from 120 reported duplicate files to 193. The 73 new files are exclusively PNG, PDF, and SVG files.

- for f in files:
+ for f in all_files:
      # Skip synthetic "/Other" nodes (residual bytes from .car files)
      if f.path.endswith("/Other"):
Contributor Author

@NicoHinderling NicoHinderling Dec 10, 2025

This is a very noisy case I found after flattening the files: without this check, every Assets.car and its Assets.car/Other node are reported as duplicates of each other.
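A rough sketch of how such a skip might sit inside hash-based grouping is shown below. It assumes files are already flattened into `(path, content_hash)` pairs and that the synthetic `/Other` node ends up with the same hash as its parent; the thread does not confirm either detail, and `find_duplicate_groups` is a hypothetical name.

```python
from collections import defaultdict


def find_duplicate_groups(all_files):
    """Group paths by content hash, ignoring synthetic "/Other" nodes."""
    by_hash = defaultdict(list)
    for path, content_hash in all_files:
        # Skip synthetic "/Other" nodes (residual bytes from .car files).
        if path.endswith("/Other"):
            continue
        by_hash[content_hash].append(path)
    # Only hashes shared by two or more paths count as duplicates.
    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]


files = [
    ("A.bundle/Assets.car", "aaa"),
    ("A.bundle/Assets.car/Other", "aaa"),  # assumed to collide with its parent
    ("A.bundle/Assets.car/icon.png", "bbb"),
    ("B.bundle/icon.png", "bbb"),
]
groups = find_duplicate_groups(files)
```

Here the only surviving group is the pair of `icon.png` files; the catalog and its `/Other` node no longer form a group of their own.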

Member

Can you confirm that if the entire Assets.car is duplicated, we only show the asset catalog as duplicated and not all of the image entries inside it?

Contributor Author

Good catch. I've updated the code to handle this better.

        privacy_group = next((g for g in result.groups if "xcprivacy" in g.name), None)
        assert privacy_group is None

    def test_nested_children_are_flattened_for_duplicate_detection(self):
Member

Can you add a test case for what I mentioned above? Have two duplicate Assets.car files with duplicate children, so we can confirm that only the Assets.car gets flagged as the duplicate.

Contributor Author

Added.
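The requested scenario can be illustrated with a self-contained sketch like the one below. It is not the test added in this PR: `duplicate_groups` is a simplified, hypothetical detector over `(path, content_hash)` pairs, and treating any duplicated path ending in `.car` as a container is an assumption.

```python
from collections import defaultdict


def duplicate_groups(files, container_suffix=".car"):
    """files: flattened list of (path, content_hash) pairs."""
    by_hash = defaultdict(list)
    for path, content_hash in files:
        by_hash[content_hash].append(path)
    groups = [sorted(p) for p in by_hash.values() if len(p) > 1]

    # Containers that are themselves duplicated (assumed: paths ending in .car).
    containers = {p for g in groups for p in g if p.endswith(container_suffix)}

    def covered(path):
        # True when any ancestor of `path` is a duplicated container.
        parts = path.split("/")
        return any("/".join(parts[:d]) in containers for d in range(1, len(parts)))

    kept = []
    for group in groups:
        remaining = [p for p in group if not covered(p)]
        if len(remaining) > 1:
            kept.append(remaining)
    return kept


def test_duplicate_car_hides_its_children():
    files = [
        ("A.bundle/Assets.car", "car"),
        ("A.bundle/Assets.car/icon.png", "png"),
        ("B.bundle/Assets.car", "car"),
        ("B.bundle/Assets.car/icon.png", "png"),
    ]
    # Only the two catalogs are reported, not the images inside them.
    assert duplicate_groups(files) == [["A.bundle/Assets.car", "B.bundle/Assets.car"]]
```

One simplification to be aware of: in this sketch a child of a duplicated catalog that also matches a file outside any catalog is dropped along with its group, which may or may not match the behavior chosen in the real implementation.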

parts = path.split("/")
for depth in range(1, len(parts)):
    parent_path = "/".join(parts[:depth])
    if parent_path in containers:
Contributor Author

I also made this method generally more efficient.
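The loop in the snippet above walks the ancestors of a path and checks each one against `containers`. A minimal runnable sketch of that idea follows; the function name `has_duplicated_ancestor`, the `return True` / `return False` bodies, and `containers` being a set are assumptions, since the thread only shows the first four lines.

```python
def has_duplicated_ancestor(path, containers):
    """Return True if any proper ancestor of `path` is in `containers`."""
    parts = path.split("/")
    # depth stops before len(parts), so the path itself is never tested.
    for depth in range(1, len(parts)):
        parent_path = "/".join(parts[:depth])
        if parent_path in containers:
            return True  # assumed: the original body is not shown in the thread
    return False


containers = {"A.bundle/Assets.car", "B.bundle/Assets.car"}
inside = has_duplicated_ancestor("A.bundle/Assets.car/icon.png", containers)
itself = has_duplicated_ancestor("A.bundle/Assets.car", containers)
outside = has_duplicated_ancestor("C.bundle/icon.png", containers)
```

If `containers` is a set, each membership check is a hash lookup, so the cost per file grows with path depth rather than with the number of containers. That is one plausible reading of the efficiency gain mentioned here.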

@NicoHinderling NicoHinderling merged commit e825dae into main Dec 11, 2025
20 checks passed
@NicoHinderling NicoHinderling deleted the improve-duplicate-file-support branch December 11, 2025 19:49
3 participants