-
-
Notifications
You must be signed in to change notification settings - Fork 373
fix: improve EmbededImagesRepairToolBase.GetImageFormat to support additional image header variants #3049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughRefactors image-format detection to use length-validated, sequential byte-signature checks (BMP, GIF, PNG, TIFF LE/BE, JPEG via SOI, SVG via "<svg"), adds SVG to ImageFormat, introduces unit tests and test assets, updates csproj to copy assets, and adds Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Caller as CallGetImageFormat
participant Detector as GetImageFormat
Note over Detector: Validate null / minimum length
Caller->>Detector: byte[] bytes
alt null or too short
Detector-->>Caller: ImageFormat.unknown / early return
else
Note over Detector: Sequential signature checks
Detector->>Detector: check BMP (0x42 0x4D)
alt matches
Detector-->>Caller: ImageFormat.bmp
else
Detector->>Detector: check GIF ("GIF87a"/"GIF89a")
alt matches
Detector-->>Caller: ImageFormat.gif
else
Detector->>Detector: check PNG (8-byte sig)
alt matches
Detector-->>Caller: ImageFormat.png
else
Detector->>Detector: check TIFF (II*/MM*)
alt matches
Detector-->>Caller: ImageFormat.tiff
else
Detector->>Detector: check JPEG (SOI 0xFF 0xD8)
alt matches
Detector-->>Caller: ImageFormat.jpeg
else
Detector->>Detector: check SVG ("<svg" or XML with svg)
alt matches
Detector-->>Caller: ImageFormat.svg
else
Detector-->>Caller: ImageFormat.unknown
end
end
end
end
end
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (4 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro ⛔ Files ignored due to path filters (6)
📒 Files selected for processing (1)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.gitignore (1)
120-121: Simplify the.devcontainerpatterns.The two patterns are redundant. Using
.devcontaineris sufficient to ignore the directory and its contents;.devcontainer/*adds nothing. Better still, use the explicit directory pattern for clarity and consistency with git conventions.-.devcontainer -.devcontainer/* +/.devcontainer/The trailing slash signals "directory only" and anchors to root, which is clearer and idiomatic.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (6)
src/MigrationTools.Tests/Tools/Infrastructure/Assets/bmpsample.bmpis excluded by!**/*.bmpsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/gifsample.gifis excluded by!**/*.gifsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/jpgsample.jpgis excluded by!**/*.jpgsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/pngsample.pngis excluded by!**/*.pngsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/svgsample.svgis excluded by!**/*.svgsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/tiffsample.tiffis excluded by!**/*.tiff
📒 Files selected for processing (4)
.gitignore(1 hunks)src/MigrationTools.Tests/MigrationTools.Tests.csproj(1 hunks)src/MigrationTools.Tests/Tools/Infrastructure/EmbededImagesRepairToolBaseTests.cs(1 hunks)src/MigrationTools/Tools/Infrastructure/EmbededImagesRepairEnricherBase.cs(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/MigrationTools.Tests/Tools/Infrastructure/EmbededImagesRepairToolBaseTests.cs (2)
src/MigrationTools/Tools/Infrastructure/EmbededImagesRepairEnricherBase.cs (3)
EmbededImagesRepairToolBase(14-143)EmbededImagesRepairToolBase(19-22)FixEmbededImages(28-28)src/MigrationTools/Tools/Infrastructure/ITool.cs (1)
ITool(7-9)
src/MigrationTools/Tools/Infrastructure/EmbededImagesRepairEnricherBase.cs
Show resolved
Hide resolved
| var svgTag = System.Text.Encoding.ASCII.GetBytes("<svg"); | ||
|
|
||
| if (jpeg3.SequenceEqual(bytes.Take(jpeg3.Length))) | ||
| return ImageFormat.jpeg; | ||
| // Check GIF | ||
| if (gif87a.SequenceEqual(bytes.Take(gif87a.Length)) || | ||
| gif89a.SequenceEqual(bytes.Take(gif89a.Length))) | ||
| return ImageFormat.gif; | ||
|
|
||
| if (jpeg4.SequenceEqual(bytes.Take(jpeg4.Length))) | ||
| return ImageFormat.jpeg; | ||
| // Check PNG | ||
| if (png.SequenceEqual(bytes.Take(png.Length))) | ||
| return ImageFormat.png; | ||
|
|
||
| // Check TIFF | ||
| if (tiffLE.SequenceEqual(bytes.Take(tiffLE.Length)) || | ||
| tiffBE.SequenceEqual(bytes.Take(tiffBE.Length))) | ||
| return ImageFormat.tiff; | ||
|
|
||
| if (jpeg5.SequenceEqual(bytes.Take(jpeg5.Length))) | ||
| // Check JPEG | ||
| if (jpegSOI.SequenceEqual(bytes.Take(jpegSOI.Length))) | ||
| return ImageFormat.jpeg; | ||
|
|
||
| // Check SVG (text-based) | ||
| if (bytes.Length >= 4 && svgTag.SequenceEqual(bytes.Take(svgTag.Length))) | ||
| return ImageFormat.svg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SVG detection must tolerate prologs and BOMs
Most real SVG assets start with a UTF‑8 BOM and/or an XML declaration (<?xml …?>) before the <svg> element. Requiring the very first four bytes to be <svg means we will continue to misclassify those files—the exact scenario this fix set out to solve. Strip the BOM, peel off the XML declaration, and perform a case-insensitive <svg check within the first chunk so that common SVG variants are recognised. Please add a regression test covering an SVG with an XML declaration once you wire this up.
- // SVG: starts with "<svg" (text-based)
- var svgTag = System.Text.Encoding.ASCII.GetBytes("<svg");
+ // SVG: starts with "<svg" (text-based), but real files often include a BOM or XML declaration first
@@
- // Check SVG (text-based)
- if (bytes.Length >= 4 && svgTag.SequenceEqual(bytes.Take(svgTag.Length)))
- return ImageFormat.svg;
+ // Check SVG (text-based)
+ if (bytes.Length > 0)
+ {
+ var prefix = Encoding.UTF8.GetString(bytes, 0, Math.Min(bytes.Length, 256))
+ .TrimStart('\uFEFF', ' ', '\t', '\r', '\n');
+
+ if (prefix.StartsWith("<?xml", StringComparison.OrdinalIgnoreCase))
+ {
+ var terminator = prefix.IndexOf("?>", StringComparison.Ordinal);
+ if (terminator >= 0 && terminator + 2 < prefix.Length)
+ {
+ prefix = prefix.Substring(terminator + 2).TrimStart(' ', '\t', '\r', '\n');
+ }
+ }
+
+ if (prefix.StartsWith("<svg", StringComparison.OrdinalIgnoreCase))
+ return ImageFormat.svg;
+ }Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/MigrationTools/Tools/Infrastructure/EmbededImagesRepairEnricherBase.cs
around lines 88 to 110, the SVG detection only checks the first four bytes for
"<svg" and therefore misses files that start with a UTF‑8 BOM and/or an XML
declaration; change the logic to read a small initial slice (e.g., first 512
bytes), decode it tolerant of invalid sequences while stripping a UTF‑8 BOM if
present, remove any leading XML declaration (skip up through the first "?>" if
it appears), then perform a case‑insensitive search for "<svg" within that
cleaned head to determine SVG; make sure the code is null/exception safe for
non‑text bytes and retains existing binary checks order; add a regression unit
test that supplies an SVG file starting with a BOM and an XML prolog and asserts
the method returns ImageFormat.svg.
|
@johannedin can you sort out the user issue that CLAassistant is complaining of? Other than that Im inclined to approve. |
Will do! I try to sort out some of the other review comments as well. |
46c4649 to
a7b5b22
Compare
f81f161 to
941cfde
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/MigrationTools/Tools/Infrastructure/EmbededImagesRepairEnricherBase.cs (2)
70-71: Stop throwing—this breaks the graceful-degradation contract.This guard detonates on short buffers instead of returning
unknown, which aborts the entire repair pass for truncated downloads or legitimately small payloads. Your own BMP check at line 62 tolerates 2-byte buffers, and JPEG SOI is also 2 bytes, so this 4-byte floor is arbitrary and wrong. The established contract is to fall back tounknownwhen detection fails, not to throw.Apply this diff:
- if (bytes == null || bytes.Length < 4) - throw new ArgumentException("Byte array too short to determine image format."); + if (bytes == null) + return ImageFormat.unknown;
108-110: SVG detection is still broken—real SVGs have BOMs and XML prologs.You're checking the first four bytes for
<svg, which fails on any SVG that starts with a UTF-8 BOM (\uFEFF) or an XML declaration (<?xml ...?>). That's the majority of production SVG files. Decode the first ~256 bytes as UTF-8, strip the BOM, skip past the XML prolog if present, trim whitespace, then look for<svgcase-insensitively.Refer to the previous review comment for a concrete implementation that handles these cases. Add a test asset with an XML declaration to verify the fix.
🧹 Nitpick comments (3)
.gitignore (1)
120-121: Drop the redundant wildcard pattern on line 121.In
.gitignore, ignoring a directory automatically ignores all its contents. The/.devcontainer/*entry on line 121 is redundant..devcontainer - .devcontainer/*src/MigrationTools.Tests/MigrationTools.Tests.csproj (1)
27-46: Simplify these paths—lose the redundant navigation.The
..navigation is pointless here: you're going up fromMigrationTools.Teststosrc, then diving straight back intoMigrationTools.Tests. Just reference the files directly relative to the project root:Tools\Infrastructure\Assets\bmpsample.bmp(or use forward slashes for better cross-platform hygiene).Apply this diff to clean up the paths:
<ItemGroup> - <Content Include="..\MigrationTools.Tests\Tools\Infrastructure\Assets\bmpsample.bmp"> + <Content Include="Tools/Infrastructure/Assets/bmpsample.bmp"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> </Content> - <Content Include="..\MigrationTools.Tests\Tools\Infrastructure\Assets\gifsample.gif"> + <Content Include="Tools/Infrastructure/Assets/gifsample.gif"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> </Content> - <Content Include="..\MigrationTools.Tests\Tools\Infrastructure\Assets\jpgsample.jpg"> + <Content Include="Tools/Infrastructure/Assets/jpgsample.jpg"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> </Content> - <Content Include="..\MigrationTools.Tests\Tools\Infrastructure\Assets\tiffsample.tiff"> + <Content Include="Tools/Infrastructure/Assets/tiffsample.tiff"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> </Content> - <Content Include="..\MigrationTools.Tests\Tools\Infrastructure\Assets\pngsample.png"> + <Content Include="Tools/Infrastructure/Assets/pngsample.png"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> </Content> - <Content Include="..\MigrationTools.Tests\Tools\Infrastructure\Assets\svgsample.svg"> + <Content Include="Tools/Infrastructure/Assets/svgsample.svg"> <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory> </Content> </ItemGroup>src/MigrationTools/Tools/Infrastructure/EmbededImagesRepairEnricherBase.cs (1)
73-93: Consider direct byte-index checks instead ofTake+SequenceEqual.The
Take().SequenceEqual()pattern works but allocates enumerables unnecessarily. For fixed-length signatures, a manual loop orSpan<byte>comparison is cleaner and faster, especially since you're already doing length validation (or should be, once line 70 is fixed).Example alternative approach:
// Check GIF if (bytes.Length >= 6) { if ((bytes[0] == 0x47 && bytes[1] == 0x49 && bytes[2] == 0x46 && bytes[3] == 0x38 && bytes[5] == 0x61 && (bytes[4] == 0x37 || bytes[4] == 0x39))) return ImageFormat.gif; }Or use
Span<byte>slicing if targeting a modern runtime.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (6)
src/MigrationTools.Tests/Tools/Infrastructure/Assets/bmpsample.bmpis excluded by!**/*.bmpsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/gifsample.gifis excluded by!**/*.gifsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/jpgsample.jpgis excluded by!**/*.jpgsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/pngsample.pngis excluded by!**/*.pngsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/svgsample.svgis excluded by!**/*.svgsrc/MigrationTools.Tests/Tools/Infrastructure/Assets/tiffsample.tiffis excluded by!**/*.tiff
📒 Files selected for processing (4)
.gitignore(1 hunks)src/MigrationTools.Tests/MigrationTools.Tests.csproj(1 hunks)src/MigrationTools.Tests/Tools/Infrastructure/EmbededImagesRepairToolBaseTests.cs(1 hunks)src/MigrationTools/Tools/Infrastructure/EmbededImagesRepairEnricherBase.cs(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/MigrationTools.Tests/Tools/Infrastructure/EmbededImagesRepairToolBaseTests.cs
🔇 Additional comments (2)
src/MigrationTools/Tools/Infrastructure/EmbededImagesRepairEnricherBase.cs (2)
104-106: JPEG detection fixed—SOI is the correct marker.Checking the mandatory SOI marker (FF D8) instead of specific 4-byte JFIF/Exif headers solves the original bug. This will catch all JPEG variants regardless of APP segment order.
140-141: Enum extension looks good.Adding
svgto theImageFormatenum is the right move to support SVG detection, once the detection logic itself is fixed.
Summary
This PR fixes an issue where
EmbededImagesRepairToolBase.GetImageFormatcould not correctly identify certain image formats due to incomplete header checks.Changes
Why
Previously, some valid images were misclassified as Authentication errors or not detected at all. This update ensures broader and more accurate format detection. (Might still have gaps).
Testing
Closes #3048
Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores