Skip to content

Conversation

@alaahong
Copy link
Member

@alaahong alaahong commented Aug 30, 2025

Purpose of the pull request

handle very old Excel BIFF formats gracefully with no-op executor

Related: #558 #521

What's changed?

Before
image

After
image

Even the error message mentioned the usage of "OldExcelExtractor", but it won't fulfill the basic requirement. As BIFF2 for Excel 2.0 (1987), so just skip the hint with an alternative solution.

reference:
https://www.ibm.com/docs/en/personal-communications/13.0.0?topic=types-biff-files

Checklist

  • I have read the Contributor Guide.
  • I have written the necessary doc or comment.
  • I have added the necessary unit tests and all cases have passed.

@alaahong alaahong requested a review from delei August 30, 2025 07:49
@alaahong alaahong marked this pull request as ready for review August 30, 2025 07:50
Copy link
Member

@delei delei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alaahong alaahong requested a review from Copilot September 6, 2025 10:16
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Adds graceful handling for very old Excel BIFF formats (like BIFF2) that are not supported by Apache POI's HSSF library by implementing a no-op executor fallback mechanism instead of throwing exceptions.

  • Catches OldExcelFormatException during POI filesystem initialization and falls back to a no-op executor
  • Adds error handling in XlsSaxAnalyser to detect and gracefully handle old BIFF format exceptions during processing
  • Introduces comprehensive test coverage for the new fallback behavior

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
ExcelAnalyserImpl.java Implements no-op executor fallback for old BIFF formats during POI filesystem initialization
XlsSaxAnalyser.java Adds exception handling for old Excel formats during workbook event processing
ExcelAnalyserOldBiffTest.java Unit tests verifying no-op executor behavior for old BIFF format inputs
XlsReadFuzzTest.java Fuzz testing for XLS parsing robustness with arbitrary byte inputs

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +188 to +202
protected boolean isOldExcelFormat(Throwable t) {
for (int i = 0; i < 6 && t != null; i++, t = t.getCause()) {
if (t instanceof OldExcelFormatException) {
return true;
}
String msg = t.getMessage();
if (msg != null) {
String m = msg.toLowerCase();
if (m.contains("biff2") || m.contains("oldexcelformatexception") || m.contains("biff")) {
return true;
}
}
}
return false;
}
Copy link

Copilot AI Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The string contains check for 'biff' is too broad and could match unrelated error messages. Consider making it more specific, such as 'old biff' or 'unsupported biff', to avoid false positives.

Copilot uses AI. Check for mistakes.
xlsReadContext.analysisEventProcessor().endSheet(xlsReadContext);
}

protected boolean isOldExcelFormat(Throwable t) {
Copy link

Copilot AI Sep 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This magic number '6' for the loop depth should be extracted as a named constant to improve code readability and maintainability.

Copilot uses AI. Check for mistakes.
Copy link
Member

@psxjoy psxjoy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@psxjoy psxjoy merged commit 697f554 into apache:main Sep 10, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants