feat: handle very old Excel BIFF formats gracefully with no-op executor #559
delei
left a comment
LGTM
Pull Request Overview
Adds graceful handling for very old Excel BIFF formats (like BIFF2) that are not supported by Apache POI's HSSF library by implementing a no-op executor fallback mechanism instead of throwing exceptions.
- Catches `OldExcelFormatException` during POI filesystem initialization and falls back to a no-op executor
- Adds error handling in `XlsSaxAnalyser` to detect and gracefully handle old BIFF format exceptions during processing
- Introduces comprehensive test coverage for the new fallback behavior
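The fallback described in the list above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ReadExecutor`, `OldFormatException`, `NoOpExecutor`, and `chooseExecutor` are hypothetical stand-ins for the project's executor interface and POI's `OldExcelFormatException`.

```java
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

final class NoOpFallbackSketch {
    /** Hypothetical stand-in for the library's read executor; not the project's real interface. */
    interface ReadExecutor {
        List<String> sheetNames();
        void execute();
    }

    /** Hypothetical stand-in for POI's OldExcelFormatException. */
    static final class OldFormatException extends RuntimeException {
        OldFormatException(String message) {
            super(message);
        }
    }

    /** Executor that reports no sheets and reads nothing. */
    static final class NoOpExecutor implements ReadExecutor {
        @Override
        public List<String> sheetNames() {
            return Collections.emptyList();
        }

        @Override
        public void execute() {
            // Intentionally empty: there is nothing to read for an unsupported format.
        }
    }

    /**
     * Builds the real executor, but falls back to a no-op executor when the
     * input turns out to be an old, unsupported format. Any other exception
     * is left to propagate unchanged.
     */
    static ReadExecutor chooseExecutor(Supplier<ReadExecutor> realFactory) {
        try {
            return realFactory.get();
        } catch (OldFormatException e) {
            return new NoOpExecutor();
        }
    }
}
```

Catching only the dedicated exception type keeps unrelated failures (corrupt files, I/O errors) visible to the caller instead of silently turning them into empty reads.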
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ExcelAnalyserImpl.java | Implements no-op executor fallback for old BIFF formats during POI filesystem initialization |
| XlsSaxAnalyser.java | Adds exception handling for old Excel formats during workbook event processing |
| ExcelAnalyserOldBiffTest.java | Unit tests verifying no-op executor behavior for old BIFF format inputs |
| XlsReadFuzzTest.java | Fuzz testing for XLS parsing robustness with arbitrary byte inputs |
    protected boolean isOldExcelFormat(Throwable t) {
        for (int i = 0; i < 6 && t != null; i++, t = t.getCause()) {
            if (t instanceof OldExcelFormatException) {
                return true;
            }
            String msg = t.getMessage();
            if (msg != null) {
                String m = msg.toLowerCase();
                if (m.contains("biff2") || m.contains("oldexcelformatexception") || m.contains("biff")) {
                    return true;
                }
            }
        }
        return false;
    }
Copilot
AI
Sep 6, 2025
The string contains check for 'biff' is too broad and could match unrelated error messages. Consider making it more specific, such as 'old biff' or 'unsupported biff', to avoid false positives.
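One way to narrow the message check along the lines of this comment is sketched below. The marker phrases are assumptions: `old biff` and `unsupported biff` come from the suggestion above, and none of them have been verified against the messages POI actually produces. The class and method names are hypothetical.

```java
import java.util.Locale;

final class OldBiffMessageCheck {
    // Assumed marker phrases; adjust to the messages actually observed in practice.
    private static final String[] OLD_BIFF_MARKERS = {
        "biff2", "oldexcelformatexception", "old biff", "unsupported biff"
    };

    /** Returns true only when the message contains one of the specific markers. */
    static boolean looksLikeOldBiffMessage(String msg) {
        if (msg == null) {
            return false;
        }
        // Locale.ROOT avoids locale-dependent lowercasing surprises.
        String m = msg.toLowerCase(Locale.ROOT);
        for (String marker : OLD_BIFF_MARKERS) {
            if (m.contains(marker)) {
                return true;
            }
        }
        return false;
    }
}
```

With this list, a message that merely mentions BIFF8 no longer matches, while a BIFF2 message still does.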
        xlsReadContext.analysisEventProcessor().endSheet(xlsReadContext);
    }

    protected boolean isOldExcelFormat(Throwable t) {
Copilot
AI
Sep 6, 2025
This magic number '6' for the loop depth should be extracted as a named constant to improve code readability and maintainability.
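A minimal sketch of that extraction, assuming a constant named `MAX_CAUSE_DEPTH` (the name and the helper `hasCauseOfType` are hypothetical and not part of this PR):

```java
final class CauseChainScan {
    /** Maximum number of throwables in the cause chain to inspect; 6 mirrors the literal in the reviewed code. */
    static final int MAX_CAUSE_DEPTH = 6;

    /**
     * Walks the cause chain, starting at t itself, and reports whether any of
     * the first MAX_CAUSE_DEPTH throwables is an instance of the given type.
     */
    static boolean hasCauseOfType(Throwable t, Class<? extends Throwable> type) {
        for (int depth = 0; depth < MAX_CAUSE_DEPTH && t != null; depth++, t = t.getCause()) {
            if (type.isInstance(t)) {
                return true;
            }
        }
        return false;
    }
}
```

The bound also guards against cyclic cause chains, which is a reason to keep a limit at all rather than looping until `getCause()` returns `null`.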
psxjoy
left a comment
LGTM
Purpose of the pull request
Handle very old Excel BIFF formats gracefully with a no-op executor.
Related: #558 #521
What's changed?
Before

After

The error message suggests using "OldExcelExtractor", but that does not fulfill the basic requirement. Because BIFF2 belongs to Excel 2.0 (1987), this change simply skips the hint about an alternative solution.
reference:
https://www.ibm.com/docs/en/personal-communications/13.0.0?topic=types-biff-files
Checklist