Skip to content

Conversation

@trosenblatt
Copy link
Contributor

@trosenblatt trosenblatt commented Dec 10, 2025

This PR builds on the previous CPU duration histogram metric PR to expose timing information to callers through the async scan method APIs.

Changes

1. New Public Types

ScanMetrics - Contains timing information:

  • total_duration: Total time for the entire scan
  • io_duration: Time spent in async I/O operations
  • num_async_rules: Number of async rules that were executed

ScanResult - Returned by async scan methods:

  • matches: Vec - The list of rule matches
  • metrics: ScanMetrics - Timing metrics for the scan

2. Updated Async Scan Methods

  • scan_async() now returns Result<ScanResult, ScannerError>
  • scan_async_with_options() now returns Result<ScanResult, ScannerError>

Breaking Change: Callers need to access matches via result.matches instead of using the result directly.

3. Synchronous Methods Unchanged

  • scan() and scan_with_options() still return Result<Vec<RuleMatch>, ScannerError>
  • No breaking changes for synchronous callers

4. Internal Changes

  • internal_scan now returns (Vec<RuleMatch>, Duration, usize) tuple
  • internal_scan_with_metrics constructs and returns ScanResult
  • Updated async tests to use new result.matches API

Migration Guide

Before:

let matches = scanner.scan_async(&mut event).await?;
for match in matches { ... }

After:

let result = scanner.scan_async(&mut event).await?;
for match in result.matches { ... }

// Access metrics
println!("Total: {:?}", result.metrics.total_duration);
println!("I/O: {:?}", result.metrics.io_duration);

Testing

  • All 293 tests passing
  • Updated async tests to use new API
  • Backward compatibility maintained for sync methods

🤖 Generated with Claude Code

trosenblatt and others added 2 commits December 10, 2025 14:31
This commit adds a CPU duration histogram metric to track scanning time
excluding I/O wait operations. This is the first PR in a series to improve
timing observability.

## Changes

1. **I/O Duration Tracking**:
   - Added `io_duration` field to `AsyncRuleInfo` to track async I/O time
   - Modified `process_async` to measure duration using `Instant::now()`
   - `internal_scan` aggregates I/O duration from all async jobs

2. **CPU Duration Histogram Metric**:
   - Added `cpu_duration` histogram field to `ScannerMetrics` using highcard labels
   - Metric name: `scanning.cpu_duration` (in nanoseconds)
   - Calculated as: `total_duration - io_duration` to exclude I/O wait time
   - Recorded in `internal_scan_with_metrics` after each scan

3. **Highcard Labels Support**:
   - Added `highcard_labels` field to `Scanner` and `ScannerBuilder`
   - Added `highcard_labels()` builder method for configuration
   - `ScannerMetrics::new()` now accepts both regular and highcard labels

4. **Internal Changes**:
   - `internal_scan` now returns `(Vec<RuleMatch>, Duration)` tuple
   - Scan method signatures remain unchanged for backward compatibility

## Why This Approach

This PR is intentionally minimal and doesn't change public scan method
signatures. This ensures that bumping the version in sds_shared_library
will be a no-op with no breaking changes. The histogram metric provides
visibility into CPU scanning time while maintaining API stability.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
This PR builds on the previous CPU duration histogram metric PR to expose
timing information to callers through the async scan method APIs.

## Changes

### 1. New Public Types

**ScanMetrics** - Contains timing information:
- `total_duration`: Total time for the entire scan
- `io_duration`: Time spent in async I/O operations
- `num_async_rules`: Number of async rules that were executed

**ScanResult** - Returned by async scan methods:
- `matches`: Vec<RuleMatch> - The list of rule matches
- `metrics`: ScanMetrics - Timing metrics for the scan

### 2. Updated Async Scan Methods

- `scan_async()` now returns `Result<ScanResult, ScannerError>`
- `scan_async_with_options()` now returns `Result<ScanResult, ScannerError>`

**Breaking Change**: Callers need to access matches via `result.matches` instead of using the result directly.

### 3. Synchronous Methods Unchanged

- `scan()` and `scan_with_options()` still return `Result<Vec<RuleMatch>, ScannerError>`
- No breaking changes for synchronous callers

### 4. Internal Changes

- `internal_scan` now returns `(Vec<RuleMatch>, Duration, usize)` tuple
- `internal_scan_with_metrics` constructs and returns `ScanResult`
- Updated async tests to use new `result.matches` API

## Migration Guide

### Before:
```rust
let matches = scanner.scan_async(&mut event).await?;
for match in matches { ... }
```

### After:
```rust
let result = scanner.scan_async(&mut event).await?;
for match in result.matches { ... }

// Access metrics
println!("Total: {:?}", result.metrics.total_duration);
println!("I/O: {:?}", result.metrics.io_duration);
```

## Testing

- All 293 tests passing
- Updated async tests to use new API
- Backward compatibility maintained for sync methods

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
@trosenblatt trosenblatt requested a review from a team as a code owner December 10, 2025 13:48
trosenblatt and others added 4 commits December 10, 2025 14:55
Added two tests to verify the cpu_duration histogram metric is properly populated:

1. **should_submit_cpu_duration_metric_non_async**: Tests non-async rules
   - Verifies that CPU duration is recorded and greater than 0
   - Ensures the histogram metric is emitted for regular synchronous scanning

2. **should_submit_cpu_duration_metric_with_async_rule**: Tests async rule with I/O
   - Creates a custom async rule that sleeps for 100ms to simulate I/O
   - Verifies that CPU duration is less than 10ms
   - Confirms that I/O wait time is correctly excluded from CPU duration

Both tests validate that the cpu_duration histogram metric correctly tracks
CPU-only processing time by excluding I/O wait operations.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
Base automatically changed from trosenblatt/report_sds_duration to main December 11, 2025 08:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants