@SharafMohamed SharafMohamed commented Feb 11, 2026

Reference

Description

  • Because the timestamp capture from a header is not stored in the variable dictionary, the header variable is handled specially, depending on its captures:
    • 0 captures: it is treated as a normal variable in both compression and search.
    • 1 timestamp capture: compression extracts the timestamp and the static text, so the rule is not needed in the search lexer, since timestamps aren't considered during log matching.
    • 1+ non-timestamp captures or 2+ timestamp captures: disabled, as supporting these cases requires TNFA subquery decomposition.
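
The three cases above can be sketched as a small classification function. This is only an illustration of the rule, not the PR's code: `HeaderHandling` and `classify_header` are hypothetical names, and captures are represented simply by their names.

```cpp
#include <string>
#include <vector>

// Hypothetical enum naming the three cases from the description.
enum class HeaderHandling {
    NormalVariable,  // 0 captures: handled like any other variable
    TimestampOnly,   // exactly 1 capture named "timestamp": skipped by the search lexer
    Unsupported      // anything else: would need TNFA subquery decomposition
};

// Hypothetical helper: classifies a header rule by the names of its captures.
HeaderHandling classify_header(std::vector<std::string> const& capture_names) {
    if (capture_names.empty()) {
        return HeaderHandling::NormalVariable;
    }
    if (1 == capture_names.size() && "timestamp" == capture_names.front()) {
        return HeaderHandling::TimestampOnly;
    }
    // 1+ non-timestamp captures or 2+ timestamp captures.
    return HeaderHandling::Unsupported;
}
```

Note that a single capture with any other name, or two captures both named `timestamp`, falls into the unsupported case.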

Validation Performed

  • Unit tests still pass.

Summary by CodeRabbit

  • New Features

    • Accepts header rules that capture a single timestamp without raising errors.
  • Refactor

    • Simplified lexer rule and token initialization and registration.
    • Streamlined construction and handling of newline and header rules.
    • Reduced and simplified delimiter-related error handling and control flow.

@SharafMohamed SharafMohamed requested a review from a team as a code owner February 11, 2026 10:47
coderabbitai bot commented Feb 11, 2026

Walkthrough

Wrapped lexer utilities in namespace clp, introduced type aliases and reduced includes; simplified token initialization and newline regex construction; removed verbose delimiter-error handling and added wildcard delimiter removal; added symbol-id registration for rules. At runtime, capture validation now uses optional_captures and skips header rules with a single timestamp capture.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Lexer utils & token init: components/core/src/clp/Utils.cpp | Moved code into clp namespace, added type aliases, removed several includes, added cTokenHeader to token handling, simplified newline regex creation, ensured rule names are registered into symbol tables, removed verbose delimiter error-reporting, and applied remove_delimiters_from_wildcard where appropriate. |
| Runtime capture validation: components/core/src/clp/clp/run.cpp | Replaced unconditional throw on detected regex captures with optional_captures handling; if rule name is "header" with exactly one capture named "timestamp", skip validation, otherwise preserve existing error behavior for capture groups. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically identifies the main changes: allowing header variables with timestamp captures and removing an outdated delimiter check. |

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@components/core/src/clp/clp/run.cpp`:
- Around line 67-83: The code treats any present optional_captures as indicating
a capture group and throws, but get_captures_from_rule_id() can return an empty
vector; update the logic in the block handling optional_captures (the variables
optional_captures, captures, rule_name, rule_id and schema_file_path) to check
captures.empty() (or captures.size() == 0) and skip/continue when there are zero
captures before the existing special-case for the "header" rule and before
throwing the runtime_error so that only rules with one or more capture groups
trigger the error.
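
A minimal sketch of the suggested check order, assuming captures can be modelled as a vector of names. `Captures` and `validate_rule_captures` are hypothetical stand-ins; the actual return type of get_captures_from_rule_id() and the surrounding loop in run.cpp are not shown in this review.

```cpp
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the capture list returned for a rule.
using Captures = std::vector<std::string>;

// Hypothetical helper: throws only when a rule has one or more unsupported captures.
void validate_rule_captures(
        std::string const& rule_name,
        std::optional<Captures> const& optional_captures
) {
    // A missing value and an empty vector both mean "no capture groups".
    if (false == optional_captures.has_value() || optional_captures->empty()) {
        return;
    }
    auto const& captures = *optional_captures;
    // Special case from the PR: a header rule with exactly one "timestamp" capture.
    if ("header" == rule_name && 1 == captures.size() && "timestamp" == captures.front()) {
        return;
    }
    throw std::runtime_error("Rule \"" + rule_name + "\" contains unsupported capture groups.");
}
```

With this ordering, a rule whose lookup yields an empty vector is accepted just like one with no value at all, which is the behaviour the comment asks for.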

In `@components/core/src/clp/Utils.cpp`:
- Around line 169-171: The call currently constructs a temporary
RegexASTLiteral<ByteNfaState> and then passes it to make_unique; replace that by
calling make_unique<RegexASTLiteral<ByteNfaState>> with the constructor argument
directly (e.g., pass '\n' directly) so the object is constructed in-place;
update the expression that currently wraps RegexASTLiteral<ByteNfaState>('\n')
inside make_unique to a direct make_unique<RegexASTLiteral<ByteNfaState>>('\n').
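
To illustrate the difference, here is a small sketch using a hypothetical `Literal` type in place of RegexASTLiteral<ByteNfaState>, with a counter (also hypothetical) that records copy constructions.

```cpp
#include <memory>

// Counts copy constructions of Literal, for illustration only.
int g_literal_copies = 0;

// Hypothetical stand-in for RegexASTLiteral<ByteNfaState>.
struct Literal {
    explicit Literal(char character) : m_character{character} {}
    Literal(Literal const& other) : m_character{other.m_character} { ++g_literal_copies; }
    char m_character;
};

// Pattern flagged by the review: a temporary is built, then copied into the heap object.
std::unique_ptr<Literal> make_via_temporary() {
    return std::make_unique<Literal>(Literal('\n'));
}

// Suggested pattern: the constructor argument is forwarded and the object is built in place.
std::unique_ptr<Literal> make_in_place() {
    return std::make_unique<Literal>('\n');
}
```

Both functions produce an equivalent object; the in-place form just avoids the extra construction from a temporary.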

@SharafMohamed SharafMohamed changed the title Feb 11, 2026
from: fix(log-surgeon): Allow header variables to contain a timestamp capture timestamp is unused in subquery decomposition; Remove outdated delimiter check in search lexer.
to: fix(log-surgeon): Allow header variables to contain a timestamp capture as timestamps are unused in subquery decomposition; Remove outdated delimiter check in search lexer.