fix(clp-s): Skip empty files during compression to avoid false errors (fixes #1993). #2067
junhaoliao wants to merge 1 commit into y-scope:main from
Walkthrough
The PR adds checks to skip empty input files during ingestion and parsing. It detects empty files by querying filesystem size before processing in both JsonParser and log_converter components. Empty files are skipped with informational logging when the input source is the filesystem.
Description
When compressing a directory containing empty files (e.g., empty syslog rotation files), `clp-s` fails with errors because `try_deduce_reader_type()` returns `FileType::Unknown` for empty files: there are no bytes to inspect for type detection.

This PR adds an empty-file check in both clp-s compression code paths, the `log-converter` (`--unstructured` mode in CLP-JSON `compress.sh`) and `JsonParser::ingest()` (structured JSON mode), so that empty filesystem files are skipped with an informational log message instead of causing task failures.

This brings clp-s behaviour in line with CLP-TEXT (`clp` binary), which already handles empty files gracefully by iterating over zero messages without error.
Checklist
breaking change.
Validation performed
1. Unstructured compression with `~/samples/hive-24hr` (clp-s, `--unstructured`)
Task: Verify that compressing `~/samples/hive-24hr`, which contains empty syslog rotation files, with `--unstructured` no longer fails with "Received input that was not unstructured logtext".
Command:
Output:
Explanation: The compression job completes successfully with no task errors. Previously, the
empty syslog rotation files would cause task failures with "Received input that was not unstructured
logtext". Now they are skipped gracefully and all 2.44 GB of data compresses into 54.57 MB.
2. Structured JSON compression with empty files (clp-s)
Task: Verify that compressing a directory containing an empty file in structured JSON mode also
succeeds.
Command:
Output:
Explanation: The structured JSON compression also handles the empty file gracefully — no
"Could not deduce content type" error.
3. CLP-TEXT consistency check
Task: Verify that CLP-TEXT (`clp` binary) already handles empty files gracefully, confirming this fix makes clp-s consistent.
Command:
Output:
Explanation: CLP-TEXT successfully compresses the directory with the empty file — no errors. Both
engines now handle empty files gracefully.
4. Regression check: normal JSON compression
Task: Verify that normal JSON compression (without empty files) is not affected.
Command:
Output:
Explanation: Normal JSON compression continues to work correctly with no regressions.