

Tighten API latency budget to 300 ms #532

Merged
shayancoin merged 1 commit into main from codex/update-api-latency-thresholds-and-documentation on Oct 21, 2025



@shayancoin shayancoin commented Oct 21, 2025

Summary

  • lower the API p95 latency SLO in the observability budget to 300 ms and set the Prometheus alert threshold to match
  • update the backend golden-signals dashboards to display latency in milliseconds with a 300 ms target marker
  • document the new API latency objective in the release checklist so runbooks stay aligned
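The three settings use different units (milliseconds in the budget and dashboards, seconds in the Prometheus alert), so they can drift apart silently. A minimal Python sketch of a consistency check, assuming the threshold values quoted in this PR; the constant names and the `thresholds_aligned` helper are hypothetical and are not read from the actual config files:

```python
import math

# Values quoted in this PR; the constant names are hypothetical labels,
# not keys taken from the real configuration files.
BUDGET_P95_MS = 300        # observability-budgets.yml, api_p95_latency
ALERT_P95_SECONDS = 0.3    # ops/prometheus/alerts.yml, ApiP95LatencyHigh
DASHBOARD_ORANGE_MS = 250  # Grafana warning step
DASHBOARD_RED_MS = 300     # Grafana red step / target marker


def thresholds_aligned(budget_ms: float, alert_s: float,
                       orange_ms: float, red_ms: float) -> bool:
    """Return True when budget, alert and dashboard describe the same limit."""
    alert_ms = alert_s * 1000.0  # Prometheus latency values are in seconds
    return (
        math.isclose(alert_ms, budget_ms)
        and math.isclose(red_ms, budget_ms)
        and orange_ms < red_ms  # the warning step must sit below the limit
    )


if __name__ == "__main__":
    print(thresholds_aligned(BUDGET_P95_MS, ALERT_P95_SECONDS,
                             DASHBOARD_ORANGE_MS, DASHBOARD_RED_MS))
```

A check like this would have flagged the previous mismatch between a 750 ms alert and a tighter budget.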

Testing

  • not run (configuration changes only)

https://chatgpt.com/codex/tasks/task_e_68f8088ad1a0833081edce6705515464

Summary by CodeRabbit

  • Documentation

    • Updated API latency SLO specification to 300ms at P95, establishing a hard threshold for release gating.
  • Chores

    • Aligned the observability budget, Grafana dashboards, and Prometheus alert on the same 300ms P95 threshold.

vercel bot commented Oct 21, 2025

The latest updates on your projects.

| Project | Deployment | Preview | Comments | Updated (UTC) |
| --- | --- | --- | --- | --- |
| paform | Ready | Preview | Comment | Oct 21, 2025 11:47pm |


coderabbitai bot commented Oct 21, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This PR consolidates P95 latency SLO enforcement across the observability stack: it switches the Grafana P95 latency panels from seconds to milliseconds, lowers the observability budget and tightens the Prometheus alert threshold from 750ms to 300ms, and documents the new 300ms hard SLO limit.
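The PR does not show the query behind the P95 value. If, as is common for Prometheus latency alerts, it is computed with histogram_quantile over a classic histogram, the result is a linear interpolation inside the bucket that contains the 95th-percentile rank, expressed in seconds, which is why the alert threshold reads 0.3 rather than 300. A simplified Python sketch of that interpolation, assuming cumulative buckets with positive upper bounds and a final +Inf bucket; the bucket counts below are made up for illustration:

```python
import math
from typing import List, Tuple

# Hypothetical cumulative buckets: (upper bound in seconds, observations <= bound).
EXAMPLE_BUCKETS: List[Tuple[float, float]] = [
    (0.1, 50), (0.25, 80), (0.5, 98), (1.0, 100), (math.inf, 100),
]


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate quantile q by linear interpolation within cumulative buckets.

    Simplified relative to Prometheus: assumes sorted, positive upper bounds
    and a final +Inf bucket.
    """
    if not 0 < q < 1:
        raise ValueError("q must be strictly between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank lands in the +Inf bucket: report the last finite bound.
                return prev_bound
            in_bucket = count - prev_count  # > 0 because prev_count < rank
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


if __name__ == "__main__":
    p95_s = histogram_quantile(0.95, EXAMPLE_BUCKETS)
    print(f"p95 = {p95_s * 1000:.1f} ms, alert firing: {p95_s > 0.3}")
    # prints "p95 = 458.3 ms, alert firing: True"
```

With these made-up buckets the 95th observation falls in the 0.25 to 0.5 s bucket, so the estimate is interpolated to about 458 ms, above the new 300 ms limit.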

Changes

| Cohort | File(s) | Summary |
| --- | --- | --- |
| Documentation & Configuration | docs/release-checklist.md, observability-budgets.yml | Updated the release checklist to document an API latency SLO of ≤300ms at P95 as a blocking condition. Reduced the api_p95_latency threshold in observability-budgets.yml from 3000 to 300. |
| Grafana Dashboards | ops/grafana/provisioning/dashboards/backend-golden-signals.json, ops/grafana/provisioning/dashboards/backend_golden_signals.json | Converted the P95 latency display unit from seconds to milliseconds. Added/updated thresholds: orange at 250ms, red at 300ms. Multiplied the metric expressions by 1000 to match the millisecond unit. |
| Prometheus Alerts | ops/prometheus/alerts.yml | Updated the ApiP95LatencyHigh alert threshold from 0.75 (750ms) to 0.3 (300ms) and updated its description text to match. |
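The dashboard change amounts to scaling the seconds-based query result by 1000 and coloring it with step thresholds. A rough Python sketch of that display logic, assuming Grafana's usual step semantics (a value takes the color of the highest step it is greater than or equal to) and a green base color, which the PR does not mention; `to_ms` and `threshold_color` are hypothetical names:

```python
from typing import List, Tuple

# Steps mirroring the dashboard thresholds described in this PR.
# The green base step is an assumption; only orange and red are documented.
THRESHOLD_STEPS: List[Tuple[float, str]] = [
    (float("-inf"), "green"),
    (250.0, "orange"),
    (300.0, "red"),
]


def to_ms(seconds: float) -> float:
    """Mirror the dashboards' `* 1000` applied to the seconds-based expression."""
    return seconds * 1000.0


def threshold_color(value_ms: float) -> str:
    """Return the color of the highest step whose value is <= value_ms."""
    color = THRESHOLD_STEPS[0][1]
    for step_value, step_color in THRESHOLD_STEPS:
        if value_ms >= step_value:
            color = step_color
    return color


if __name__ == "__main__":
    for p95_seconds in (0.2, 0.26, 0.45):
        ms = to_ms(p95_seconds)
        print(f"{ms:.0f} ms -> {threshold_color(ms)}")
```

Under these assumed semantics a P95 of exactly 300 ms already renders red, which matches the 300 ms target marker acting as the limit.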

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Release as Release Process
    participant Check as observability-budgets Gate
    participant Metric as P95 Latency Metric
    participant Dashboard as Grafana Dashboard
    participant Alert as Prometheus Alert

    Release->>Check: Evaluate SLO compliance
    Check->>Metric: Query metric (now in ms)
    Metric-->>Check: Return latency value
    Check->>Check: Compare vs 300ms threshold

    par Dashboard Update
        Dashboard->>Dashboard: Display metric in milliseconds
        Dashboard->>Dashboard: Highlight thresholds (250=orange, 300=red)
    and Alert Update
        Alert->>Alert: Trigger if latency > 300ms
    end

    alt SLO Met
        Check-->>Release: Proceed with release
    else SLO Violated
        Check-->>Release: Block release
    end
```
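The gate in the diagram and the Prometheus alert express the same limit in different units and with opposite comparisons: the release checklist passes at ≤300ms, while the alert fires above 0.3 s. A hypothetical Python sketch of both checks side by side, assuming the gate receives the P95 in seconds; `evaluate` and `GateResult` are illustrative names, not part of the repository:

```python
from dataclasses import dataclass

SLO_P95_MS = 300.0           # release checklist: API P95 must be <= 300 ms
ALERT_THRESHOLD_S = 0.3      # ApiP95LatencyHigh threshold, in seconds


@dataclass
class GateResult:
    release_allowed: bool
    alert_firing: bool


def evaluate(p95_seconds: float) -> GateResult:
    """Hypothetical gate mirroring the 'SLO Met' / 'SLO Violated' branches."""
    p95_ms = p95_seconds * 1000.0                    # gate compares in milliseconds
    release_allowed = p95_ms <= SLO_P95_MS           # blocking condition
    alert_firing = p95_seconds > ALERT_THRESHOLD_S   # alert compares in seconds
    return GateResult(release_allowed, alert_firing)


if __name__ == "__main__":
    for p95 in (0.25, 0.3, 0.45):
        print(p95, evaluate(p95))
```

Because one check uses <= and the other >, a P95 of exactly 300 ms lets the release proceed and keeps the alert quiet, so the gate and the alert agree at the boundary.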

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Multiple configuration files with consistent unit conversions (seconds→milliseconds) and threshold updates require validation across dashboards and alerts. While the pattern is repetitive, verification of numerical accuracy and dashboard JSON structure is needed.


Poem

🐰 A latency of three-hundred calls,
No longer measured in seconds' long falls,
Milliseconds now mark our SLO line,
Thresholds in orange and red shine so fine,
The observability stack, now aligned!


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories


📥 Commits

Reviewing files that changed from the base of the PR and between 4a25a30 and 92479a5.

📒 Files selected for processing (5)
  • docs/release-checklist.md (1 hunks)
  • observability-budgets.yml (1 hunks)
  • ops/grafana/provisioning/dashboards/backend-golden-signals.json (2 hunks)
  • ops/grafana/provisioning/dashboards/backend_golden_signals.json (2 hunks)
  • ops/prometheus/alerts.yml (1 hunks)


@shayancoin shayancoin merged commit 873a4db into main Oct 21, 2025
6 of 13 checks passed