
fix(source): validate descending ranges and guard timestamp counters#66

Merged
oritwoen merged 2 commits into main from fix/source-range-validation
Mar 8, 2026

@oritwoen
Owner

@oritwoen oritwoen commented Mar 8, 2026

Summary

  • reject descending and overflowing source ranges before processing to avoid underflow/overflow-driven incorrect workloads
  • fix timestamp microseconds-mode workload accounting to match actual processing (1 second-level input + 1000 millisecond variants, i.e. 1001 inputs per second)
  • add focused unit tests for descending range/date rejection, overflow guards, pre-epoch date rejection, and microseconds-mode counting

Why

RangeSource and TimestampSource used inclusive-count arithmetic (end - start + 1) that could wrap on edge inputs, and timestamp microseconds mode reported an incorrect total. This change makes source processing fail fast on invalid input and keeps progress/statistics consistent with the work actually performed.
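
A minimal sketch of the checked inclusive count described above, assuming `u64` bounds. The helper name `inclusive_count` and the `String` error type are illustrative assumptions, not the PR's actual code.

```rust
/// Number of values in the inclusive range `start..=end`.
/// Hypothetical helper: the name and the String error type are assumptions.
fn inclusive_count(start: u64, end: u64) -> Result<u64, String> {
    // Reject descending ranges before doing any arithmetic.
    if end < start {
        return Err(format!("descending range: end ({end}) < start ({start})"));
    }
    // end - start + 1, with every step checked so it cannot wrap.
    end.checked_sub(start)
        .and_then(|span| span.checked_add(1))
        .ok_or_else(|| format!("range {start}..={end} is too large to count in u64"))
}
```

With unchecked arithmetic, the full range `0..=u64::MAX` would wrap to a count of 0 in a release build (and panic in a debug build), which is the kind of edge input the checked version turns into an error.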

Verification

  • lsp_diagnostics clean for modified files
  • cargo test
  • cargo build

@coderabbitai

coderabbitai bot commented Mar 8, 2026

Walkthrough

Both RangeSource and timestamp handling now validate ranges (reject descending and pre-1970 dates) and use checked arithmetic to prevent size/overflow errors; new tests cover descending ranges, pre-epoch dates, and overflow scenarios.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Range validation: src/source/range.rs | Added validation to reject descending ranges and used checked_sub/checked_add for overflow-safe count calculation; returns descriptive errors on overflow. Added tests for descending and overflowed ranges. |
| Timestamp validation: src/source/timestamps.rs | Normalized and validated dates in from_dates (reject pre-1970, enforce end ≥ start). Strengthened process with range ordering checks, checked arithmetic for counts, and overflow detection for microseconds-mode multiplication. Added tests for pre-epoch, descending, and overflow cases. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I bounded through ranges, careful and spry,
Counting each hop beneath a safe sky.
No descending leaps, no overflowed cheer,
Checked math keeps the path crystal clear.
Hop on, dear code — the bounds are near! 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 63.64%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main changes: validation of descending ranges and guarding of timestamp counters for overflow protection. |
| Description check | ✅ Passed | The description clearly explains the changes, their rationale, and verification steps, directly relating to the modifications in range.rs and timestamps.rs. |

@cubic-dev-ai cubic-dev-ai bot left a comment

No issues found across 2 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant R as Runner
    participant TS as TimestampSource
    participant RS as RangeSource
    participant PB as ProgressBar
    participant O as Output/Stats

    Note over R,TS: Timestamp Initialization
    R->>TS: from_dates(start, end, micros)
    alt NEW: Date Validation
        TS->>TS: Check if dates < 1970-01-01
        TS->>TS: Check if end < start
        TS-->>R: Result::Err (Bail)
    else Valid Range
        TS-->>R: Result::Ok(TimestampSource)
    end

    Note over R,RS: Processing Flow (Range or Timestamp)
    R->>RS: process(transforms, deriver, matcher, output)
    
    RS->>RS: NEW: Guard descending range (end < start)
    
    RS->>RS: NEW: Calculate count via checked_sub().checked_add(1)
    alt Overflow detected
        RS-->>R: Result::Err (Overflow)
    end

    opt CHANGED: If TimestampSource in Microseconds mode
        TS->>TS: total = count * 1001 (Sec + MS variants)
    end

    RS->>PB: new(total)
    
    loop For each batch (BATCH_SIZE)
        RS->>RS: Generate u64 inputs
        RS->>O: Push processed results
        RS->>PB: inc(batch_size)
    end

    RS-->>R: ProcessStats

@codspeed-hq
Contributor

codspeed-hq bot commented Mar 8, 2026

Merging this PR will not alter performance

✅ 7 untouched benchmarks


Comparing fix/source-range-validation (4a151fc) with main (dde3dd4)

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/source/timestamps.rs`:
- Around line 85-91: The microseconds overflow guard is incomplete: even though
you use checked_mul(1001) for total, individual timestamp math (TimestampSource
with microseconds=true) can still overflow when computing ts * 1000 + ms; update
the code to either reject unsafe bounds up front (e.g. when start or end are too
large for millisecond conversion) or compute ts_ms using checked arithmetic (use
checked_mul(1000) and checked_add(ms) and propagate an error) before any further
math. Apply the same checked-math or upfront-bound-rejection fix around the ts,
ms computation referenced in the block around the current ts * 1000 + ms (lines
~107-109) so both total and per-timestamp calculations are safe.
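
One of the two options named in this comment, per-timestamp checked arithmetic, could look roughly like the sketch below. The helper name `ts_to_ms` and the `Option` return type are assumptions for illustration; the real code would propagate an error through its own `Result` type.

```rust
/// Millisecond variant for second `ts` and offset `ms` (expected 0..=999).
/// Hypothetical helper: returns None instead of wrapping on overflow.
fn ts_to_ms(ts: u64, ms: u64) -> Option<u64> {
    // ts * 1000 + ms, with both steps checked.
    ts.checked_mul(1000)?.checked_add(ms)
}
```

The caller can then map `None` to a descriptive error, so an oversized `ts` fails the run instead of silently producing a wrapped input value.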

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 90a0f852-aba4-43b8-8bc2-c9c5a765cb33

📥 Commits

Reviewing files that changed from the base of the PR and between dde3dd4 and a87d224.

📒 Files selected for processing (2)
  • src/source/range.rs
  • src/source/timestamps.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: cubic · AI code reviewer
  • GitHub Check: benchmarks
🧰 Additional context used
📓 Path-based instructions (6)
src/{transform,analyze,source,output,storage}/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

All core trait implementations must satisfy Send + Sync bounds

Files:

  • src/source/timestamps.rs
  • src/source/range.rs
src/{analyze,source}/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Use AtomicBool for early termination flags across multi-threaded contexts

Files:

  • src/source/timestamps.rs
  • src/source/range.rs
src/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.rs: Prefer ? operator over .unwrap() for error handling in Rust code
Avoid panic!() macro; use Result types for error handling instead
Never use type suppression or type-unsafe casting patterns (equivalent to as any or @ts-ignore)

Files:

  • src/source/timestamps.rs
  • src/source/range.rs
src/source/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Implement the Source trait for new input sources in dedicated src/source/{name}.rs files

Files:

  • src/source/timestamps.rs
  • src/source/range.rs
src/source/*.rs

📄 CodeRabbit inference engine (src/source/AGENTS.md)

Create new source in src/source/{name}.rs file

Files:

  • src/source/timestamps.rs
  • src/source/range.rs
src/source/!(mod).rs

📄 CodeRabbit inference engine (src/source/AGENTS.md)

src/source/!(mod).rs: Implement Source trait with process() method accepting transforms, deriver, matcher, and output parameters
Use Rayon par_chunks() for batch processing and parallelism in source implementations
Report progress via optional ProgressBar using indicatif::ProgressBar in process() method
All sources must implement Send + Sync traits for thread safety

Files:

  • src/source/timestamps.rs
  • src/source/range.rs
🧠 Learnings (10)
📚 Learning: 2026-03-05T12:48:44.245Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/source/AGENTS.md:0-0
Timestamp: 2026-03-05T12:48:44.245Z
Learning: Applies to src/source/!(mod).rs : Implement `Source` trait with `process()` method accepting `transforms`, `deriver`, `matcher`, and `output` parameters

Applied to files:

  • src/source/timestamps.rs
  • src/source/range.rs
📚 Learning: 2026-03-02T14:18:44.316Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/transform/AGENTS.md:0-0
Timestamp: 2026-03-02T14:18:44.316Z
Learning: Applies to src/transform/*.rs : Process inputs as batch operations using `&[Input]` as input and `&mut Vec<(String, Key)>` as output, where the first tuple element is a human-readable source description

Applied to files:

  • src/source/timestamps.rs
  • src/source/range.rs
📚 Learning: 2026-03-05T12:48:44.245Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/source/AGENTS.md:0-0
Timestamp: 2026-03-05T12:48:44.245Z
Learning: Applies to src/source/src/main.rs : Update `create_source()` function in `src/main.rs` to handle new source variants

Applied to files:

  • src/source/range.rs
📚 Learning: 2026-03-05T12:48:44.245Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/source/AGENTS.md:0-0
Timestamp: 2026-03-05T12:48:44.245Z
Learning: Applies to src/source/src/main.rs : Update `SourceCommand` enum in `src/main.rs` when adding a new source type

Applied to files:

  • src/source/range.rs
📚 Learning: 2026-03-05T12:48:44.245Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/source/AGENTS.md:0-0
Timestamp: 2026-03-05T12:48:44.245Z
Learning: Applies to src/source/!(mod).rs : Report progress via optional `ProgressBar` using `indicatif::ProgressBar` in `process()` method

Applied to files:

  • src/source/range.rs
📚 Learning: 2026-03-02T14:18:00.377Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-02T14:18:00.377Z
Learning: Applies to src/{transform,analyze}/**/*.rs : Always process inputs and outputs as batches: `&[Input]` → `&mut Vec<(String, Key)>`

Applied to files:

  • src/source/range.rs
📚 Learning: 2026-03-02T14:18:44.316Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/transform/AGENTS.md:0-0
Timestamp: 2026-03-02T14:18:44.316Z
Learning: Applies to src/transform/*.rs : Every transform must implement the `Transform` trait with `name()` and `apply_batch()` methods, optionally implementing `supports_gpu()` and `apply_batch_gpu()` for GPU acceleration

Applied to files:

  • src/source/range.rs
📚 Learning: 2026-03-02T14:18:44.316Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/transform/AGENTS.md:0-0
Timestamp: 2026-03-02T14:18:44.316Z
Learning: Applies to src/transform/*.rs : The Input struct provides multiple representations (string, u64, big-endian bytes, little-endian bytes) - use the appropriate representation for each transform algorithm

Applied to files:

  • src/source/range.rs
📚 Learning: 2026-03-02T14:18:00.377Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: AGENTS.md:0-0
Timestamp: 2026-03-02T14:18:00.377Z
Learning: Applies to src/main.rs : GPU implementations for generate/scan commands require Source trait redesign to support batch GPU processing

Applied to files:

  • src/source/range.rs
📚 Learning: 2026-03-05T12:40:29.177Z
Learnt from: oritwoen
Repo: oritwoen/vuke PR: 64
File: src/source/mod.rs:26-32
Timestamp: 2026-03-05T12:40:29.177Z
Learning: In `oritwoen/vuke`, `AGENTS.md` and `src/source/AGENTS.md` describe an **aspirational/future** `Source::process` trait signature (generic `T: Transform`, `O: Output`, with `no_gpu: bool` and `progress: Option<&ProgressBar>` parameters). The **current** implementation in `src/source/mod.rs` uses dynamic dispatch: `fn process(&self, transforms: &[Box<dyn Transform>], deriver: &KeyDeriver, matcher: Option<&Matcher>, output: &dyn Output) -> Result<ProcessStats>`. Migrating to the generic form is blocked by object-safety requirements (`Box<dyn Source>` is used in `src/main.rs`) and is a tracked future refactor. Do not flag the current dynamic-dispatch signature as non-conformant with AGENTS.md.

Applied to files:

  • src/source/range.rs

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/source/timestamps.rs (1)

23-31: Consider replacing .unwrap() with explicit error handling.

While and_hms_opt(0, 0, 0) and and_hms_opt(23, 59, 59) will never return None for these literal values, using .unwrap() violates the coding guideline preferring ? operator for error handling.

Suggested improvement
         let start = NaiveDate::parse_from_str(start_date, "%Y-%m-%d")?
             .and_hms_opt(0, 0, 0)
-            .unwrap()
+            .ok_or_else(|| anyhow::anyhow!("Invalid start time"))?
             .and_utc()
             .timestamp();

         let end = NaiveDate::parse_from_str(end_date, "%Y-%m-%d")?
             .and_hms_opt(23, 59, 59)
-            .unwrap()
+            .ok_or_else(|| anyhow::anyhow!("Invalid end time"))?
             .and_utc()
             .timestamp();

As per coding guidelines: "Prefer ? operator over .unwrap() for error handling in Rust code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/source/timestamps.rs` around lines 23 - 31, Replace the .unwrap() calls
on the Option returned by and_hms_opt for the start and end timestamps with
explicit error propagation using .ok_or_else(...)? so failures map to the
function's Result error type; specifically change the start expression using
NaiveDate::parse_from_str(...).and_hms_opt(0,0,0).unwrap() to
.and_hms_opt(0,0,0).ok_or_else(|| /* descriptive error like format!("invalid
time for start_date {}", start_date) */ )? and do the same for the end
expression that calls .and_hms_opt(23,59,59); keep the surrounding
parse_from_str? propagation as-is so the function (timestamps creation code)
uses ? instead of panicking.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 16437501-5d4e-423b-898e-2c80644037dd

📥 Commits

Reviewing files that changed from the base of the PR and between a87d224 and 4a151fc.

📒 Files selected for processing (1)
  • src/source/timestamps.rs
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: benchmarks
🧰 Additional context used
📓 Path-based instructions (6)
src/{transform,analyze,source,output,storage}/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

All core trait implementations must satisfy Send + Sync bounds

Files:

  • src/source/timestamps.rs
src/{analyze,source}/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Use AtomicBool for early termination flags across multi-threaded contexts

Files:

  • src/source/timestamps.rs
src/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

src/**/*.rs: Prefer ? operator over .unwrap() for error handling in Rust code
Avoid panic!() macro; use Result types for error handling instead
Never use type suppression or type-unsafe casting patterns (equivalent to as any or @ts-ignore)

Files:

  • src/source/timestamps.rs
src/source/**/*.rs

📄 CodeRabbit inference engine (AGENTS.md)

Implement the Source trait for new input sources in dedicated src/source/{name}.rs files

Files:

  • src/source/timestamps.rs
src/source/*.rs

📄 CodeRabbit inference engine (src/source/AGENTS.md)

Create new source in src/source/{name}.rs file

Files:

  • src/source/timestamps.rs
src/source/!(mod).rs

📄 CodeRabbit inference engine (src/source/AGENTS.md)

src/source/!(mod).rs: Implement Source trait with process() method accepting transforms, deriver, matcher, and output parameters
Use Rayon par_chunks() for batch processing and parallelism in source implementations
Report progress via optional ProgressBar using indicatif::ProgressBar in process() method
All sources must implement Send + Sync traits for thread safety

Files:

  • src/source/timestamps.rs
🧠 Learnings (2)
📚 Learning: 2026-03-05T12:48:44.245Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/source/AGENTS.md:0-0
Timestamp: 2026-03-05T12:48:44.245Z
Learning: Applies to src/source/!(mod).rs : Implement `Source` trait with `process()` method accepting `transforms`, `deriver`, `matcher`, and `output` parameters

Applied to files:

  • src/source/timestamps.rs
📚 Learning: 2026-03-02T14:18:44.316Z
Learnt from: CR
Repo: oritwoen/vuke PR: 0
File: src/transform/AGENTS.md:0-0
Timestamp: 2026-03-02T14:18:44.316Z
Learning: Applies to src/transform/*.rs : Process inputs as batch operations using `&[Input]` as input and `&mut Vec<(String, Key)>` as output, where the first tuple element is a human-readable source description

Applied to files:

  • src/source/timestamps.rs
🔇 Additional comments (4)
src/source/timestamps.rs (4)

35-48: LGTM!

The validation logic correctly rejects pre-epoch dates and descending ranges with clear, user-friendly error messages. The use of u64::try_from provides defense-in-depth after the negativity check.
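
As a rough illustration of the checks praised here, the sketch below validates signed epoch seconds and converts them to `u64`. The function name `validate_bounds`, the error wording, and the `String` error type are assumptions; the actual `from_dates` works on parsed dates and its own error type.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: validate signed epoch seconds and convert to u64.
fn validate_bounds(start: i64, end: i64) -> Result<(u64, u64), String> {
    // Pre-epoch timestamps are negative and cannot be represented as u64.
    if start < 0 || end < 0 {
        return Err("dates before 1970-01-01 are not supported".to_string());
    }
    // A descending range would make end - start underflow later on.
    if end < start {
        return Err("end date must not be earlier than start date".to_string());
    }
    // try_from cannot fail after the negativity check; it is kept as
    // defense in depth instead of an `as` cast.
    let start = u64::try_from(start).map_err(|e| e.to_string())?;
    let end = u64::try_from(end).map_err(|e| e.to_string())?;
    Ok((start, end))
}
```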


74-79: Overflow guard is now complete.

The bound check self.end > (u64::MAX - 999) / 1000 ensures that for all ts in the range, the computation ts * 1000 + ms at line 116 cannot overflow. This addresses the previous review concern about incomplete microseconds overflow protection.
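
Why this bound is sufficient: if `end <= (u64::MAX - 999) / 1000`, then `end * 1000 <= u64::MAX - 999`, so `end * 1000 + 999 <= u64::MAX`, and every smaller `ts` and `ms` is covered as well. A small sketch of the upfront rejection, with the constant and function names being assumptions:

```rust
/// Largest second value for which `ts * 1000 + 999` still fits in u64.
/// The constant name is hypothetical; the expression is the one quoted above.
const MAX_SAFE_SECONDS: u64 = (u64::MAX - 999) / 1000;

/// Hypothetical upfront check for microseconds mode.
fn check_micro_bound(end: u64) -> Result<(), String> {
    if end > MAX_SAFE_SECONDS {
        Err(format!("end timestamp {end} is too large for millisecond variants"))
    } else {
        Ok(())
    }
}
```

The bound is slightly conservative (it floors the division), which keeps the check simple at the cost of rejecting one borderline second value that would only partly fit.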


109-125: LGTM!

The parallel iteration correctly processes 1001 items per timestamp in microseconds mode (1 base + 1000 variants), with progress bar increments matching the workload calculation. The use of Rayon's into_par_iter() satisfies the coding guideline for parallelism.
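
The 1001-per-timestamp accounting can be sketched with the standard library alone. Both function names are hypothetical, the sequential iterator stands in for the Rayon-based loop in the real code, and `ts` is assumed to have passed the overflow bound check already.

```rust
/// All inputs generated for one second `ts` in microseconds mode:
/// the second itself plus 1000 millisecond variants.
/// Assumes `ts` was already bounded so `ts * 1000 + 999` cannot overflow.
fn inputs_for_second(ts: u64) -> Vec<u64> {
    let mut out = Vec::with_capacity(1001);
    out.push(ts);
    out.extend((0..1000u64).map(|ms| ts * 1000 + ms));
    out
}

/// Progress-bar total for `count` seconds; None signals overflow.
fn total_workload(count: u64, microseconds: bool) -> Option<u64> {
    if microseconds {
        count.checked_mul(1001)
    } else {
        Some(count)
    }
}
```

Keeping the total and the per-second generation tied to the same factor of 1001 is what makes the progress bar end exactly when the work does.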


137-202: LGTM! Comprehensive test coverage.

The tests effectively cover:

  • Descending date range rejection via from_dates
  • Pre-epoch date rejection
  • Microseconds mode workload accounting (1001 inputs per timestamp)
  • Descending range rejection in process() for direct struct construction
  • Overflow detection for extreme ranges

@oritwoen oritwoen self-assigned this Mar 8, 2026
@oritwoen oritwoen merged commit 2ab2a71 into main Mar 8, 2026
4 checks passed
@oritwoen oritwoen deleted the fix/source-range-validation branch March 8, 2026 19:24