Skip to content

feat(cymbal): add typed frame resolution metrics#52057

Merged
hpouillot merged 6 commits intomasterfrom
feat/cymbal-frame-resolution-metrics
Mar 24, 2026
Merged

feat(cymbal): add typed frame resolution metrics#52057
hpouillot merged 6 commits intomasterfrom
feat/cymbal-frame-resolution-metrics

Conversation

@hpouillot
Copy link
Contributor

Problem

Cymbal lacked centralized metrics for frame resolution outcomes, making it hard to monitor resolution success rates and diagnose failure patterns (missing symbol sets, network errors, invalid data, etc.).

Changes

  • Store Option<FrameError> directly on Frame instead of separate Option<String> + Option<&'static str> fields — the typed error provides both the human-readable message (via Display) and a metric-friendly reason (via metric_reason())
  • Add metric_reason() methods to all resolution error enums (JsResolveErr, HermesError, ProguardError, AppleError, FrameError) returning categorized labels: no_reference, no_symbol_set, symbol_not_found, network_error, invalid_data
  • Emit cymbal_frame_resolved and cymbal_frame_not_resolved counters with lang and reason labels in the centralized RawFrame::resolve() method
  • Add warn! log on frame resolution failures with structured lang, reason, and error fields
  • Add Clone, PartialEq, Eq derives to FrameError and all child error enums (+ SymbolDataError) so they can live on the Frame struct
  • Custom serde for resolve_failure field: serializes FrameError as its Display string for backward-compatible JSON output

How did you test this code?

  • cargo check -p cymbal --tests passes
  • All non-DB unit tests pass (57 passed, DB-dependent tests skip without local Postgres)
  • Snapshot format unchanged (custom serializer preserves string representation)

Publish to changelog?

no

…ation

Store FrameError directly on Frame instead of a string, enabling typed
metric reasons (no_reference, no_symbol_set, symbol_not_found,
network_error, invalid_data) without fragile string matching. Adds
cymbal_frame_resolved and cymbal_frame_not_resolved counters with lang
and reason labels, plus a warn log on resolution failures.
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 24, 2026

Comments Outside Diff (1)

  1. rust/cymbal/src/frames/mod.rs, line 434-442 (link)

    P2 Lossy deserialization silently drops resolve_failure data

    The comment says "production never deserializes Frame", but the FrameError enum itself is documented to be serialized/deserialized for PG caching. If that assumption ever changes, or if a test round-trips a resolved Frame and checks resolve_failure, the silent None return could mask bugs.

    At minimum, the assumption should be verified at compile time (e.g. by removing Deserialize from Frame entirely) or the comment should be strengthened to explain why Frame will never be deserialized in production (e.g., only RawFrame is deserialized from Kafka and Frame is only written to ClickHouse/PG for querying, never read back as a Frame).

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: rust/cymbal/src/frames/mod.rs
    Line: 434-442
    
    Comment:
    **Lossy deserialization silently drops `resolve_failure` data**
    
    The comment says "production never deserializes `Frame`", but the `FrameError` enum itself is [documented to be serialized/deserialized for PG caching](rust/cymbal/src/error.rs#L64-L66). If that assumption ever changes, or if a test round-trips a resolved `Frame` and checks `resolve_failure`, the silent `None` return could mask bugs.
    
    At minimum, the assumption should be verified at compile time (e.g. by removing `Deserialize` from `Frame` entirely) or the comment should be strengthened to explain *why* `Frame` will never be deserialized in production (e.g., only `RawFrame` is deserialized from Kafka and `Frame` is only written to ClickHouse/PG for querying, never read back as a `Frame`).
    
    How can I resolve this? If you propose a fix, please make it concise.
Prompt To Fix All With AI
This is a comment left during a code review.
Path: rust/cymbal/src/frames/mod.rs
Line: 116-118

Comment:
**`warn!` log level for expected failures will be very noisy**

`metric_reason()` is called twice on the same value — once for the log and once for the counter. Since it returns `&'static str` and is pure, this is a minor inefficiency. Worth binding to a local variable to make the OnceAndOnlyOnce principle explicit.

Additionally, the `warn!` level fires for _all_ resolution failures, including very common expected cases like `no_symbol_set` (user never uploaded symbols) and `no_reference` (frame has no source URL). In a high-traffic system this will produce a flood of `WARN` log entries that obscure genuinely unexpected failures (e.g. `network_error` or `invalid_data`). Consider logging at `info` for expected cases and reserving `warn` for genuinely unexpected categories:

```suggestion
                } else if let Some(err) = &frame.resolve_failure {
                    let reason = err.metric_reason();
                    match reason {
                        "network_error" | "invalid_data" => {
                            tracing::warn!(lang = lang_tag, reason = reason, error = %err, "frame resolution failed");
                        }
                        _ => {
                            tracing::debug!(lang = lang_tag, reason = reason, error = %err, "frame resolution failed");
                        }
                    }
                    metrics::counter!(FRAME_NOT_RESOLVED, "lang" => lang_tag, "reason" => reason)
                        .increment(1);
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: rust/cymbal/src/frames/mod.rs
Line: 424-431

Comment:
**`serialize` has an unreachable `None` branch**

Because the field is annotated with `skip_serializing_if = "Option::is_none"`, serde will only invoke `serialize_with` when the value is `Some(...)`. The `None => serializer.serialize_none()` arm is therefore dead code and can mislead future readers into thinking the function could be called with `None`.

```suggestion
    pub fn serialize<S>(value: &Option<FrameError>, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        // `skip_serializing_if = "Option::is_none"` guarantees value is Some here,
        // but we match exhaustively to satisfy the type checker.
        match value {
            Some(err) => serializer.serialize_str(&err.to_string()),
            None => unreachable!("skip_serializing_if = Option::is_none prevents None reaching here"),
        }
    }
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: rust/cymbal/src/frames/mod.rs
Line: 434-442

Comment:
**Lossy deserialization silently drops `resolve_failure` data**

The comment says "production never deserializes `Frame`", but the `FrameError` enum itself is [documented to be serialized/deserialized for PG caching](rust/cymbal/src/error.rs#L64-L66). If that assumption ever changes, or if a test round-trips a resolved `Frame` and checks `resolve_failure`, the silent `None` return could mask bugs.

At minimum, the assumption should be verified at compile time (e.g. by removing `Deserialize` from `Frame` entirely) or the comment should be strengthened to explain *why* `Frame` will never be deserialized in production (e.g., only `RawFrame` is deserialized from Kafka and `Frame` is only written to ClickHouse/PG for querying, never read back as a `Frame`).

How can I resolve this? If you propose a fix, please make it concise.

Reviews (1): Last reviewed commit: "fix(cymbal): fix rustfmt and remove unus..." | Re-trigger Greptile

@hpouillot hpouillot requested review from a team, ablaszkiewicz and cat-ph March 24, 2026 10:46
Use raw JSON extraction in handles_missing_sourcemap test to avoid
losing resolve_failure during typed deserialization round-trip. Also
fix rustfmt formatting in frame_error_serde.
Copy link
Contributor

@cat-ph cat-ph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a test is still failing but the approach looks great!

@hpouillot hpouillot merged commit 9124117 into master Mar 24, 2026
134 checks passed
@hpouillot hpouillot deleted the feat/cymbal-frame-resolution-metrics branch March 24, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants