
- dashboard fixes (more data exported/shown) #5

Merged
HardMax71 merged 1 commit into main from dev on Oct 19, 2025
Conversation


@HardMax71 HardMax71 commented Oct 19, 2025

  • moved rag-specific metrics calculation from service to views

Summary by CodeRabbit

  • Bug Fixes

    • Resolved division-by-zero errors in Grafana dashboard metric calculations to prevent invalid displays
  • Monitoring

    • Enhanced Qdrant dashboard with new metric panels and improved visualization layouts
    • Expanded Redis dashboard to display per-database key counts and aggregated totals
    • Improved metrics instrumentation for RAG operations

- moved rag-specific metrics calculation from service to views
coderabbitai bot commented Oct 19, 2025

Walkthrough

This PR refactors RAG operation metrics collection by moving timing instrumentation from the service layer to the view layer, adds defensive zero-fallback guards to Grafana dashboard expressions to prevent NaN results, and reworks dashboard layouts and targets to enhance monitoring granularity and visualization.

Changes

RAG metrics refactoring — backend/rag/services/rag_service.py, backend/rag/views.py
Removed timing instrumentation and metric increments from the rag_service methods; migrated the feature-scoped metrics (RAG_GENERATION_COUNT, RAG_GENERATION_DURATION) and their timing logic to the corresponding views, adding exception-aware status labels.

Grafana dashboard expression safety — monitoring/grafana/dashboards/rag.json
Wrapped division expressions in parentheses and appended an "or vector(0)" fallback to prevent NaN results in the Cache Hit Rate, Error Rate, Lifetime Cache Hit Rate, and related rate/percentile calculations.

Grafana dashboard content rework — monitoring/grafana/dashboards/qdrant.json
Expanded the single-panel layout into a multi-panel dashboard with new stat and timeseries visualizations; restructured panels to display granular multi-metric data (Qdrant Info, Cluster Mode, memory breakdown, REST API metrics); adjusted grid positioning and changed visualization modes from single-series to table-based legends.

Grafana dashboard panel update — monitoring/grafana/dashboards/redis.json
Updated the "Keys per Database" panel from a single target to dual targets (per-database keys plus a total aggregate), changing the legend format from DB{{db}} to {{db}} and adding a Total series.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes span heterogeneous concerns: RAG service-to-view metric migration (straightforward but requires tracing control flow), repetitive defensive expression patterns in dashboard configs, and substantial dashboard layout restructuring. The Qdrant dashboard overhaul introduces complexity in visualization structure, while RAG and Redis changes follow consistent patterns.

Poem

🐰 Metrics hop from service to view with grace,
Dashboards guard against division's empty space—
Zero fallbacks catch the NaN before it falls,
Qdrant's panels dance on monitoring walls,
Redis counts shine bright with totals in tow! 📊

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The PR title "dashboard fixes (more data exported/shown)" is partially related to the changeset. It accurately describes one aspect of the changes—the updates to three Grafana dashboard JSON files (qdrant.json, rag.json, redis.json)—which do represent fixes and expanded metrics visualization. However, the title does not capture the main architectural change described in the PR objectives: "Moved RAG-specific metrics calculation from service to views," which involves the substantive backend refactoring in rag_service.py and views.py. The title focuses on a real but secondary aspect of the changeset rather than the primary driver of these changes.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9b5835 and e23925b.

📒 Files selected for processing (5)
  • backend/rag/services/rag_service.py (3 hunks)
  • backend/rag/views.py (4 hunks)
  • monitoring/grafana/dashboards/qdrant.json (2 hunks)
  • monitoring/grafana/dashboards/rag.json (4 hunks)
  • monitoring/grafana/dashboards/redis.json (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/rag/views.py (2)
backend/rag/serializers.py (3)
  • ExplainMatchRequestSerializer (4-6)
  • CompareCandidatesRequestSerializer (9-14)
  • InterviewQuestionsRequestSerializer (17-23)
backend/rag/services/rag_service.py (3)
  • explain_match (33-62)
  • compare_candidates (64-106)
  • generate_interview_questions (108-158)
🔇 Additional comments (19)
monitoring/grafana/dashboards/redis.json (1)

85-85: LGTM! Enhanced visibility with total aggregate.

The addition of a total keys metric alongside per-database breakdowns improves dashboard visibility. The legend format simplification from "DB{{db}}" to "{{db}}" is also cleaner.
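For reference, the dual-target setup described above presumably looks something like the following fragment (redis_db_keys is the standard redis_exporter gauge; the exact expressions in the dashboard are an assumption):

```json
{
  "targets": [
    { "expr": "redis_db_keys", "legendFormat": "{{db}}" },
    { "expr": "sum(redis_db_keys)", "legendFormat": "Total" }
  ]
}
```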

backend/rag/views.py (4)

9-9: LGTM! Metrics imported for view-layer instrumentation.

The import aligns with the PR objective of moving RAG-specific metrics calculation from service to views.


32-47: LGTM! Consistent timing and counting instrumentation.

The feature-scoped timing and counting pattern is well-structured:

  • Context manager for automatic timing
  • Success/error status tracking
  • Exception re-raise preserves upstream error handling
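The pattern can be sketched in stdlib-only Python as follows. Plain dicts stand in for the Prometheus counter/histogram objects, and the helper name and shape are illustrative, not the repository's actual code:

```python
import time
from collections import defaultdict

# Stand-ins for RAG_GENERATION_COUNT / RAG_GENERATION_DURATION
RAG_GENERATION_COUNT = defaultdict(int)      # (feature, status) -> count
RAG_GENERATION_DURATION = defaultdict(list)  # feature -> [seconds, ...]

def instrumented_call(feature, fn, *args, **kwargs):
    """Time a service call and count its outcome at the view layer."""
    status = "success"
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    except Exception:
        status = "error"
        raise  # re-raise so upstream error handling still applies
    finally:
        # Runs on both success and error paths
        RAG_GENERATION_DURATION[feature].append(time.perf_counter() - start)
        RAG_GENERATION_COUNT[(feature, status)] += 1
```

The `finally` block is what makes the instrumentation exception-aware: duration and count are recorded whether the service call succeeds or raises.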

63-79: LGTM! Instrumentation pattern consistently applied.

The same timing and counting pattern is correctly applied to the compare candidates endpoint.


94-111: LGTM! Complete view-layer instrumentation.

All three RAG endpoints now have consistent feature-scoped metrics collection at the view layer.

monitoring/grafana/dashboards/rag.json (4)

65-65: LGTM! Defensive guard against division-by-zero.

The or vector(0) fallback ensures the cache hit rate displays as 0 instead of NaN when there are no cache operations yet.
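The guarded expression presumably follows the standard PromQL pattern; the metric names below are illustrative, not the dashboard's actual series:

```promql
# Before: an empty or zero denominator makes the panel show NaN / no data
sum(rate(rag_cache_hits_total[5m])) / sum(rate(rag_cache_requests_total[5m]))

# After: parenthesize the division and fall back to a literal zero vector
(
  sum(rate(rag_cache_hits_total[5m]))
  / sum(rate(rag_cache_requests_total[5m]))
) or vector(0)
```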


93-93: LGTM! Consistent defensive fallback.

The error rate calculation is properly guarded against division-by-zero.


133-133: LGTM! Per-feature cache metrics protected.

The per-feature cache hit rate calculation includes the same defensive fallback.


327-327: LGTM! Lifetime metrics safeguarded.

The lifetime cache hit rate stat panel is protected against division-by-zero.

backend/rag/services/rag_service.py (3)

60-60: LGTM! Simplified logging after moving timing to views.

The log line no longer includes duration since timing instrumentation has been moved to the view layer, maintaining a cleaner separation of concerns.


104-104: LGTM! Consistent simplification across methods.

Candidate comparison logging is simplified in line with the architectural change.


156-156: LGTM! Complete separation of view and service metrics.

All three RAG methods now have simplified service-layer logging, with timing instrumentation delegated to the view layer.

monitoring/grafana/dashboards/qdrant.json (7)

7-7: LGTM! Shared crosshair enabled.

Setting graphTooltip to 1 enables a shared crosshair across panels, improving dashboard usability when correlating metrics across time.


11-88: LGTM! Enhanced monitoring with new panels.

The addition of Qdrant Info, Cluster Mode, Total Corrupted Points, Total Vectors, and Recovery Mode Status panels provides comprehensive visibility into the vector database's health and configuration.


89-107: LGTM! Detailed memory breakdown.

Expanding from a single memory metric to a breakdown of Active, Allocated, Resident, Metadata, and Retained memory provides better visibility into memory usage patterns.


108-120: LGTM! Enhanced REST API observability.

The updated REST API Request Rate panel with detailed legend formatting ({{method}} {{endpoint}} [{{status}}]) improves visibility into API usage patterns.


121-137: LGTM! Latency percentiles added.

The new REST Response Latency panel with p50/p95/p99 percentiles provides essential performance insights.
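Percentile panels like this are typically driven by histogram_quantile over the histogram's bucket series; the metric name here is an assumption about Qdrant's exported histogram:

```promql
# Illustrative p95 query (swap 0.95 for 0.50 / 0.99 for the other panels)
histogram_quantile(
  0.95,
  sum by (le) (rate(rest_responses_duration_seconds_bucket[5m]))
)
```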


138-150: LGTM! Failure rate tracking with defensive guard.

The REST API Failure Rate panel includes the or vector(0) fallback to handle cases with no failures gracefully.


151-180: LGTM! Data corruption monitoring with alerting.

The Data Corruption Rate panel includes both defensive fallback and alert configuration to notify when corruption is detected. The alert triggers when the average rate exceeds 0 over a 5-minute window, which is appropriate for data integrity monitoring.



HardMax71 merged commit 00b965a into main on Oct 19, 2025
4 checks passed
