
- dashboard fixes (more data exported/shown) #5

Merged
HardMax71 merged 1 commit into main from dev on Oct 19, 2025
Conversation


@HardMax71 HardMax71 commented Oct 19, 2025

  • moved rag-specific metrics calculation from service to views

Summary by CodeRabbit

  • Bug Fixes

    • Resolved division-by-zero errors in Grafana dashboard metric calculations to prevent invalid displays
  • Monitoring

    • Enhanced Qdrant dashboard with new metric panels and improved visualization layouts
    • Expanded Redis dashboard to display per-database key counts and aggregated totals
    • Improved metrics instrumentation for RAG operations

- moved rag-specific metrics calculation from service to views
coderabbitai bot commented Oct 19, 2025

Walkthrough

This PR refactors RAG operation metrics collection by moving timing instrumentation from the service layer to the view layer, adds defensive zero-fallback guards to Grafana dashboard expressions to prevent NaN results, and reworks dashboard layouts and targets to enhance monitoring granularity and visualization.

Changes

RAG metrics refactoring — backend/rag/services/rag_service.py, backend/rag/views.py
Removed timing instrumentation and metric increments from the rag_service methods; migrated the feature-scoped metrics (RAG_GENERATION_COUNT, RAG_GENERATION_DURATION) and their timing logic to the corresponding views, adding exception-aware status labels.

Grafana dashboard expression safety — monitoring/grafana/dashboards/rag.json
Wrapped division expressions in parentheses and appended an "or vector(0)" fallback to prevent NaN results in the Cache Hit Rate, Error Rate, Lifetime Cache Hit Rate, and related rate/percentile calculations.

Grafana dashboard content rework — monitoring/grafana/dashboards/qdrant.json
Expanded the single-panel layout into a multi-panel dashboard with new stat and timeseries visualizations; restructured panels to display granular multi-metric data (Qdrant Info, Cluster Mode, memory breakdown, REST API metrics); adjusted grid positioning and changed visualization modes from single-series to table-based legends.

Grafana dashboard panel update — monitoring/grafana/dashboards/redis.json
Updated the "Keys per Database" panel from a single target to dual targets (per-database keys plus a total aggregate), changing the legend format from DB{{db}} to {{db}} and adding a Total series.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

The changes span heterogeneous concerns: RAG service-to-view metric migration (straightforward but requires tracing control flow), repetitive defensive expression patterns in dashboard configs, and substantial dashboard layout restructuring. The Qdrant dashboard overhaul introduces complexity in visualization structure, while RAG and Redis changes follow consistent patterns.

Poem

🐰 Metrics hop from service to view with grace,
Dashboards guard against division's empty space—
Zero fallbacks catch the NaN before it falls,
Qdrant's panels dance on monitoring walls,
Redis counts shine bright with totals in tow! 📊

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — ✅ Passed: The PR title "dashboard fixes (more data exported/shown)" is partially related to the changeset. It accurately describes one aspect of the changes—the updates to three Grafana dashboard JSON files (qdrant.json, rag.json, redis.json)—which do represent fixes and expanded metrics visualization. However, the title does not capture the main architectural change described in the PR objectives: "Moved RAG-specific metrics calculation from service to views," which involves the substantive backend refactoring in rag_service.py and views.py. The title focuses on a real but secondary aspect of the changeset rather than the primary driver of these changes.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c9b5835 and e23925b.

📒 Files selected for processing (5)
  • backend/rag/services/rag_service.py (3 hunks)
  • backend/rag/views.py (4 hunks)
  • monitoring/grafana/dashboards/qdrant.json (2 hunks)
  • monitoring/grafana/dashboards/rag.json (4 hunks)
  • monitoring/grafana/dashboards/redis.json (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
backend/rag/views.py (2)
backend/rag/serializers.py (3)
  • ExplainMatchRequestSerializer (4-6)
  • CompareCandidatesRequestSerializer (9-14)
  • InterviewQuestionsRequestSerializer (17-23)
backend/rag/services/rag_service.py (3)
  • explain_match (33-62)
  • compare_candidates (64-106)
  • generate_interview_questions (108-158)
🔇 Additional comments (19)
monitoring/grafana/dashboards/redis.json (1)

85-85: LGTM! Enhanced visibility with total aggregate.

The addition of a total keys metric alongside per-database breakdowns improves dashboard visibility. The legend format simplification from "DB{{db}}" to "{{db}}" is also cleaner.
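For reference, the dual-target setup described above presumably looks something like the following fragment (redis_db_keys is the standard redis_exporter gauge; the exact expressions in the dashboard are an assumption):

```json
{
  "targets": [
    { "expr": "redis_db_keys", "legendFormat": "{{db}}" },
    { "expr": "sum(redis_db_keys)", "legendFormat": "Total" }
  ]
}
```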

backend/rag/views.py (4)

9-9: LGTM! Metrics imported for view-layer instrumentation.

The import aligns with the PR objective of moving RAG-specific metrics calculation from service to views.


32-47: LGTM! Consistent timing and counting instrumentation.

The feature-scoped timing and counting pattern is well-structured:

  • Context manager for automatic timing
  • Success/error status tracking
  • Exception re-raise preserves upstream error handling
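The pattern can be sketched in stdlib-only Python as follows. Plain dicts stand in for the Prometheus counter/histogram objects, and the helper name and shape are illustrative, not the repository's actual code:

```python
import time
from collections import defaultdict

# Stand-ins for RAG_GENERATION_COUNT / RAG_GENERATION_DURATION
RAG_GENERATION_COUNT = defaultdict(int)      # (feature, status) -> count
RAG_GENERATION_DURATION = defaultdict(list)  # feature -> [seconds, ...]

def instrumented_call(feature, fn, *args, **kwargs):
    """Time a service call and count its outcome at the view layer."""
    status = "success"
    start = time.perf_counter()
    try:
        return fn(*args, **kwargs)
    except Exception:
        status = "error"
        raise  # re-raise so upstream error handling still applies
    finally:
        # Runs on both success and error paths
        RAG_GENERATION_DURATION[feature].append(time.perf_counter() - start)
        RAG_GENERATION_COUNT[(feature, status)] += 1
```

The `finally` block is what makes the instrumentation exception-aware: duration and count are recorded whether the service call succeeds or raises.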

63-79: LGTM! Instrumentation pattern consistently applied.

The same timing and counting pattern is correctly applied to the compare candidates endpoint.


94-111: LGTM! Complete view-layer instrumentation.

All three RAG endpoints now have consistent feature-scoped metrics collection at the view layer.

monitoring/grafana/dashboards/rag.json (4)

65-65: LGTM! Defensive guard against division-by-zero.

The or vector(0) fallback ensures the cache hit rate displays as 0 instead of NaN when there are no cache operations yet.
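The guarded expression presumably follows the standard PromQL pattern; the metric names below are illustrative, not the dashboard's actual series:

```promql
# Before: an empty or zero denominator makes the panel show NaN / no data
sum(rate(rag_cache_hits_total[5m])) / sum(rate(rag_cache_requests_total[5m]))

# After: parenthesize the division and fall back to a literal zero vector
(
  sum(rate(rag_cache_hits_total[5m]))
  / sum(rate(rag_cache_requests_total[5m]))
) or vector(0)
```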


93-93: LGTM! Consistent defensive fallback.

The error rate calculation is properly guarded against division-by-zero.


133-133: LGTM! Per-feature cache metrics protected.

The per-feature cache hit rate calculation includes the same defensive fallback.


327-327: LGTM! Lifetime metrics safeguarded.

The lifetime cache hit rate stat panel is protected against division-by-zero.

backend/rag/services/rag_service.py (3)

60-60: LGTM! Simplified logging after moving timing to views.

The log line no longer includes duration since timing instrumentation has been moved to the view layer, maintaining a cleaner separation of concerns.


104-104: LGTM! Consistent simplification across methods.

Candidate comparison logging is simplified in line with the architectural change.


156-156: LGTM! Complete separation of view and service metrics.

All three RAG methods now have simplified service-layer logging, with timing instrumentation delegated to the view layer.

monitoring/grafana/dashboards/qdrant.json (7)

7-7: LGTM! Shared crosshair enabled.

Setting graphTooltip to 1 enables a shared crosshair across panels, improving dashboard usability when correlating metrics across time.


11-88: LGTM! Enhanced monitoring with new panels.

The addition of Qdrant Info, Cluster Mode, Total Corrupted Points, Total Vectors, and Recovery Mode Status panels provides comprehensive visibility into the vector database's health and configuration.


89-107: LGTM! Detailed memory breakdown.

Expanding from a single memory metric to a breakdown of Active, Allocated, Resident, Metadata, and Retained memory provides better visibility into memory usage patterns.


108-120: LGTM! Enhanced REST API observability.

The updated REST API Request Rate panel with detailed legend formatting ({{method}} {{endpoint}} [{{status}}]) improves visibility into API usage patterns.


121-137: LGTM! Latency percentiles added.

The new REST Response Latency panel with p50/p95/p99 percentiles provides essential performance insights.
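Percentile panels like this are typically driven by histogram_quantile over the histogram's bucket series; the metric name here is an assumption about Qdrant's exported histogram:

```promql
# Illustrative p95 query (swap 0.95 for 0.50 / 0.99 for the other panels)
histogram_quantile(
  0.95,
  sum by (le) (rate(rest_responses_duration_seconds_bucket[5m]))
)
```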


138-150: LGTM! Failure rate tracking with defensive guard.

The REST API Failure Rate panel includes the or vector(0) fallback to handle cases with no failures gracefully.


151-180: LGTM! Data corruption monitoring with alerting.

The Data Corruption Rate panel includes both defensive fallback and alert configuration to notify when corruption is detected. The alert triggers when the average rate exceeds 0 over a 5-minute window, which is appropriate for data integrity monitoring.



HardMax71 merged commit 00b965a into main on Oct 19, 2025
4 checks passed
