Commit a651355

feat: raglogs compare
1 parent eb2548b commit a651355

13 files changed

+1733
-2526
lines changed

DESIGN_DOC.md

Lines changed: 0 additions & 2114 deletions
This file was deleted.

README.md

Lines changed: 92 additions & 8 deletions
@@ -88,7 +88,27 @@ raglogs timeline --since 2h
8888
25 events · api · 44 min span
8989
```
9090

91-
```sh
91+
```bash
92+
raglogs compare --since 30m --baseline 24h
93+
```
94+
95+
```
96+
Incident comparison
97+
98+
Window A (now): 2026-03-16 15:17:42 UTC → 2026-03-16 15:47:42 UTC
99+
Window B (baseline): 2026-03-15 15:17:42 UTC → 2026-03-15 15:47:42 UTC
100+
101+
New error clusters
102+
+ Stripe signature verification failed for endpoint /webhooks/stripe 86 events
103+
+ POST /api/checkout 500 Internal Server Error — upstream billing error 20 events
104+
+ Webhook retries (24 distinct events, 24 total) 24 events
105+
+ Webhook queue growing 13 events
106+
107+
Triggers in A not seen in B
108+
+⚡ Deploy completed for billing-worker version v2.4.1 · deployment-controller
109+
```
110+
111+
```bash
92112
raglogs ask 'why did stripe fail?'
93113
```
94114

@@ -107,12 +127,11 @@ raglogs ask 'why did stripe fail?'
107127

108128
`explain` answers **what happened**.
109129
`timeline` shows **how it unfolded**.
130+
`compare` shows **what changed**.
110131

111-
Together they work like `git log` and `git blame` — but for incidents.
132+
Together they work like `git log`, `git blame`, and `git diff`, but for incidents.
112133

113-
Both outputs are fully deterministic. No LLM required.
114-
115-
`ask` answers **questions you didn’t think to ask ahead of time**.
134+
All three outputs are fully deterministic. No LLM required.
116135

117136
---
118137

@@ -158,6 +177,8 @@ raglogs init
158177
raglogs ingest ./sample_data/sample_incident
159178
raglogs explain --since 1h
160179
raglogs timeline --since 2h
180+
raglogs compare --since 30m --baseline 24h
181+
raglogs ask 'why did stripe fail?'
161182
```
162183

163184
Or with Make:
@@ -361,6 +382,65 @@ No LLM required. The timeline is assembled entirely from cluster timestamps and
361382

362383
---
363384

385+
### `raglogs compare`
386+
387+
Diffs two time windows by their cluster sets. Shows exactly which error patterns appeared, disappeared, intensified, or subsided between a current window (A) and a baseline window (B).
388+
389+
```bash
390+
raglogs compare --since 30m --baseline 24h
391+
raglogs compare --since 1h --baseline 7d
392+
raglogs compare --since 2h --baseline 24h --service billing-worker
393+
raglogs compare \
394+
--window-a-from 2026-03-16T14:00:00Z --window-a-to 2026-03-16T14:30:00Z \
395+
--window-b-from 2026-03-15T14:00:00Z --window-b-to 2026-03-15T14:30:00Z
396+
raglogs compare --since 30m --baseline 24h --format json
397+
```
398+
399+
`--since 30m --baseline 24h` compares the last 30 minutes against the equivalent 30-minute window from 24 hours ago — the most useful form during an active incident.
400+
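The window arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the project's actual code: `parse_duration` and `resolve_windows` are hypothetical names, and the sketch assumes window A ends at the current time and window B is the same span shifted back by the baseline offset, which matches the sample output in this README.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical helper: maps the suffixes used in the examples (m, h, d)
# to timedelta keyword arguments.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '30m', '24h' or '7d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def resolve_windows(since: str, baseline: str, now: datetime):
    """Return ((a_from, a_to), (b_from, b_to)) for the two windows.

    Assumption: window A is the last `since` of time ending at `now`,
    and window B is the same span shifted back by `baseline`.
    """
    size = parse_duration(since)
    offset = parse_duration(baseline)
    window_a = (now - size, now)
    window_b = (window_a[0] - offset, window_a[1] - offset)
    return window_a, window_b


now = datetime(2026, 3, 16, 15, 47, 42, tzinfo=timezone.utc)
window_a, window_b = resolve_windows("30m", "24h", now)
print("A:", window_a[0].isoformat(), "->", window_a[1].isoformat())
print("B:", window_b[0].isoformat(), "->", window_b[1].isoformat())
```

With `now` set to the end of window A from the sample output, the sketch reproduces the two windows shown there.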
401+
| Flag | Description |
402+
|---|---|
403+
| `--since` | Incident window size, e.g. `30m`, `1h` |
404+
| `--baseline` | Offset to baseline window, e.g. `24h`, `7d` |
405+
| `--window-a-from/to` | Explicit start/end for window A (ISO 8601) |
406+
| `--window-b-from/to` | Explicit start/end for window B (ISO 8601) |
407+
| `--service` | Filter both windows to one service |
408+
| `--env` | Filter both windows to one environment |
409+
| `--format` | `text` or `json` |
410+
411+
**Output sections**
412+
413+
| Symbol | Meaning |
414+
|---|---|
415+
| `+` | New cluster — present in A, absent in B |
416+
| `-` | Disappeared — present in B, gone in A |
417+
| `` | Increased — in both, count grew by more than 50% |
418+
| `` | Decreased — in both, count shrank by more than 50% |
419+
| `+⚡` | New trigger — deploy or restart only seen in A |
420+
| `-⚡` | Dropped trigger — deploy or restart only seen in B |
421+
422+
**Output**
423+
424+
```
425+
Incident comparison
426+
427+
Window A (now): 2026-03-16 15:17:42 UTC → 2026-03-16 15:47:42 UTC
428+
Window B (baseline): 2026-03-15 15:17:42 UTC → 2026-03-15 15:47:42 UTC
429+
430+
New error clusters
431+
+ Stripe signature verification failed for endpoint /webhooks/stripe 86 events
432+
+ POST /api/checkout 500 Internal Server Error — upstream billing error 20 events
433+
+ Webhook retries (24 distinct events, 24 total) 24 events
434+
+ Webhook queue growing 13 events
435+
436+
Triggers in A not seen in B
437+
+⚡ Deploy completed for billing-worker version v2.4.1 · deployment-controller
438+
```
439+
440+
Individual webhook retry events (`evt_XXXXXX`) and queue-depth lines are deduplicated into single entries before diffing. No LLM required.
441+
442+
---
443+
364444
### `raglogs clusters`
365445

366446
Lists the top log clusters in a time window ranked by importance score. Useful for exploration and understanding dominant event families without running a full explain.
@@ -597,7 +677,7 @@ Evidence Assembly
597677
LLM (optional) or Deterministic Templates
598678
599679
600-
Incident Summary + Timeline
680+
Incident Summary · Timeline · Diff
601681
```
602682

603683
### Normalization
@@ -651,6 +731,10 @@ A trigger candidate is promoted to "likely trigger" when it precedes the primary
651731

652732
Secondary clusters are classified by message content: queue/backlog growth becomes `symptom`, 500 errors and latency spikes become `effect`. Repeated webhook retry events (individual `evt_XXXXXX` lines) are deduplicated into a single count. Effects that appear to have started before the primary error — due to data noise — are floored to the primary's first occurrence to preserve causal ordering.
653733

734+
### Window diffing
735+
736+
`raglogs compare` runs clustering independently on both windows, then diffs the resulting fingerprint sets. Before diffing, each cluster set is collapsed: all `evt_XXXXXX` retry clusters merge into a single entry, and all queue-depth lines merge into one. The collapsed maps are then diffed by fingerprint, with counts compared to determine direction (new, disappeared, increased, decreased). Trigger candidates are normalized by message prefix to handle version strings, so `v2.4.1` and `v2.3.9` both resolve as "deploy" without creating spurious diffs.
737+
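The collapse-then-diff step described above could look roughly like the sketch below. It is not the code in `raglogs/core/compare/`. The function names, the merged keys `webhook-retries` and `queue-depth`, the regular expressions, the fingerprints, and the sample messages are all assumptions made for illustration.

```python
import re

# Assumed patterns for the two families that get merged before diffing.
RETRY_PATTERN = re.compile(r"\bevt_\w+")
QUEUE_PATTERN = re.compile(r"\bqueue\b", re.IGNORECASE)


def collapse(clusters: dict[str, tuple[str, int]]) -> dict[str, int]:
    """Collapse a {fingerprint: (message, count)} map into {key: count}.

    Retry clusters share one key and queue-depth clusters share another;
    every other cluster keeps its own fingerprint.
    """
    collapsed: dict[str, int] = {}
    for fingerprint, (message, count) in clusters.items():
        if RETRY_PATTERN.search(message):
            key = "webhook-retries"
        elif QUEUE_PATTERN.search(message):
            key = "queue-depth"
        else:
            key = fingerprint
        collapsed[key] = collapsed.get(key, 0) + count
    return collapsed


def diff_keys(a: dict[str, int], b: dict[str, int]):
    """Return (new, disappeared, shared) keys between windows A and B."""
    return (
        sorted(a.keys() - b.keys()),
        sorted(b.keys() - a.keys()),
        sorted(a.keys() & b.keys()),
    )


def normalize_trigger(message: str) -> str:
    """Keep only the prefix before ' version ' so version bumps compare equal."""
    return message.split(" version ", 1)[0]


window_a = collapse({
    "fp1": ("Stripe signature verification failed for endpoint /webhooks/stripe", 86),
    "fp2": ("Retrying webhook evt_000001", 1),
    "fp3": ("Retrying webhook evt_000002", 1),
    "fp4": ("Webhook queue growing depth=120", 7),
    "fp5": ("Webhook queue growing depth=340", 6),
})
window_b = collapse({"fp9": ("Cache miss ratio high", 4)})

new, disappeared, shared = diff_keys(window_a, window_b)
print(new, disappeared, shared)
print(normalize_trigger("Deploy completed for billing-worker version v2.4.1"))
```

Keys that land in `shared` would then have their counts compared to decide between increased and decreased.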
654738
### Confidence scoring
655739

656740
Confidence is derived from measurable signals, not from LLM output:
@@ -748,6 +832,7 @@ raglogs/
748832
│ ├── config/ Pydantic settings
749833
│ ├── core/
750834
│ │ ├── clustering/ Fingerprint grouping, importance scoring, baseline
835+
│ │ ├── compare/ Window diffing — new, disappeared, increased, decreased
751836
│ │ ├── explain/ Evidence assembly, templates, confidence, summarizer
752837
│ │ ├── ingestion/ Ingestion orchestration and batch persistence
753838
│ │ ├── llm/ Provider abstraction (OpenAI, Ollama, noop)
@@ -776,9 +861,8 @@ New source adapters go in `raglogs/adapters/`. Each adapter yields `ParsedLogLin
776861
- Loki adapter
777862
- Kubernetes log export ingestion
778863
- Semantic cluster merging via pgvector
779-
- `raglogs compare` — diff two time windows
780864
- Markdown incident report export (`raglogs explain --format markdown > postmortem.md`)
781-
- `POST /query/timeline` API endpoint
865+
- `POST /query/timeline` and `POST /query/compare` API endpoints
782866
- Web UI
783867

784868
---
