-
-
Notifications
You must be signed in to change notification settings - Fork 1
Improve Duplicate File Insight Coverage #519
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| for f in files: | ||
| for f in all_files: | ||
| # Skip synthetic "/Other" nodes (residual bytes from .car files) | ||
| if f.path.endswith("/Other"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super noisy case i found after flattening the files. It will report every Assets.car and Assets.car/Other as duplicates of each other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm that if the entire Assets.car is duplicated, we only show the asset catalog as duplicated and not all of the image entries inside it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, i've updated the code to handle this better
| for f in files: | ||
| for f in all_files: | ||
| # Skip synthetic "/Other" nodes (residual bytes from .car files) | ||
| if f.path.endswith("/Other"): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm that if the entire Assets.car is duplicated, we only show the asset catalog as duplicated and not all of the image entries inside it?
| privacy_group = next((g for g in result.groups if "xcprivacy" in g.name), None) | ||
| assert privacy_group is None | ||
|
|
||
| def test_nested_children_are_flattened_for_duplicate_detection(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test case for what I mentioned above: have two duplicate Assets.car's with duplicate children so we can confirm only the Assets.car gets flagged as the duplicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
| parts = path.split("/") | ||
| for depth in range(1, len(parts)): | ||
| parent_path = "/".join(parts[:depth]) | ||
| if parent_path in containers: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made this method just generally more efficient
In conjunction with getsentry/sentry-cli#3024 , addresses https://linear.app/getsentry/issue/EME-550/ios-insights-duplicate-files
This PR improves duplicate file detection for iOS assets by flattening nested children. Previously, assets inside .car files were invisible to duplicate detection.
The primary app I was testing on went from 120 duplicate files to 193. Those 73 new files are exclusively png, pdf and svg files