@@ -112,6 +112,41 @@ A full Prometheus & Grafana stack is included in the Docker Compose file.

 * **GPU Metrics:** This project utilizes CPU for training and inference, so GPU-specific metrics are not applicable.

+### LLM Evaluation
+
+We monitor the RAG pipeline using a dedicated Grafana dashboard powered by Prometheus metrics.
+
+* **Token Usage:** Tracks `llm_token_usage_total` (Input vs Output) to monitor usage volume.
+* **Cost Estimation:** Tracks `llm_cost_total` based on a calculated rate per 1k tokens.
+* **RAG Latency:** A Histogram (`rag_request_latency_seconds`) visualizing the response time distribution.
+* **Safety Violations:** Tracks `guardrail_events_total` to see how often PII or Injection attacks are attempted.
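To illustrate what the latency histogram stores, here is a minimal sketch of cumulative bucket counting, which is how a Prometheus histogram records observations (each `le` bucket counts every value less than or equal to its bound). The bucket bounds and sample latencies are assumptions for illustration; the project's actual bucket configuration is not shown in this README.

```python
import math

# Hypothetical bucket upper bounds in seconds (not taken from the project).
BUCKETS = [0.5, 1.0, 2.5, 5.0, math.inf]


def cumulative_bucket_counts(latencies, buckets=BUCKETS):
    """Count observations per bucket, cumulatively.

    Each bucket holds every observation less than or equal to its bound,
    so the last (+Inf) bucket always equals the total number of samples.
    """
    return {le: sum(1 for value in latencies if value <= le) for le in buckets}


# Hypothetical request latencies in seconds.
samples = [0.3, 0.7, 0.9, 1.8, 4.2, 6.0]
counts = cumulative_bucket_counts(samples)

for le, count in counts.items():
    label = "+Inf" if math.isinf(le) else str(le)
    print(f'rag_request_latency_seconds_bucket{{le="{label}"}} {count}')
```

Because the counts are cumulative, a dashboard can derive the share of requests under any bound by dividing that bucket by the `+Inf` bucket.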
+
+*To view this dashboard:*
+1. Run `docker-compose up`
+2. Go to `http://localhost:3000`
+3. Import the JSON dashboard located in `config/grafana_dashboard.json` (if provided) or build a panel using the metrics above.
+
+## LLM Monitoring
+
+We employ a dual-stack monitoring approach to ensure the reliability of both the Generative (LLM) and Predictive (ML) components.
+
+### 1. Real-time Metrics (Grafana + Prometheus)
+We track operational metrics for the RAG pipeline using a Grafana dashboard.
+* **Token Usage & Cost:** Tracks `llm_token_usage_total` to estimate API costs ($0.50/1M input, $1.50/1M output).
+* **RAG Latency:** Monitors the P95 and P99 latency of the `/ask` endpoint to ensure responsiveness.
+* **Safety Violations:** Logs `guardrail_events_total` to track attempted attacks (Injection/PII).
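The cost estimate in the list above comes down to simple arithmetic on token counts. The sketch below applies the quoted rates ($0.50 per 1M input tokens, $1.50 per 1M output tokens); the function name and the example token counts are hypothetical, and the project's own calculation may differ.

```python
# Rates quoted in the list above, in USD per one million tokens.
INPUT_RATE_PER_M = 0.50
OUTPUT_RATE_PER_M = 1.50


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate API cost in USD from input and output token counts."""
    total = input_tokens * INPUT_RATE_PER_M + output_tokens * OUTPUT_RATE_PER_M
    return total / 1_000_000


# Hypothetical usage: 200k input tokens and 50k output tokens.
cost = estimate_cost_usd(input_tokens=200_000, output_tokens=50_000)
print(f"Estimated cost: ${cost:.4f}")
```

Keeping input and output counts separate matters here because output tokens are billed at three times the input rate.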
+<img src="assets/D4 S1.png" alt="http request total" width="500">
+<img src="assets/D4 S2.png" alt="llm token usage total" width="500">
+<img src="assets/D4 S3.png" alt="guardrail events total" width="500">
+<img src="assets/D4 S4.png" alt="Grafana Dashboard" width="500">
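As a rough illustration of what the P95 and P99 latency figures mean, the sketch below computes nearest-rank percentiles from raw samples. This is a swapped-in simplification: a Prometheus-backed dashboard would normally estimate quantiles from histogram buckets (for example with PromQL's `histogram_quantile`), not from raw values. The helper name and the sample data are hypothetical.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the nearest-rank percentile: the smallest sample such that
    at least `pct` percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in seconds: 0.01, 0.02, ..., 1.00.
latencies = [i / 100 for i in range(1, 101)]

p95 = percentile_nearest_rank(latencies, 95)
p99 = percentile_nearest_rank(latencies, 99)
print(f"P95 = {p95:.2f}s, P99 = {p99:.2f}s")
```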
+
+### 2. Data Drift Monitoring (Evidently)
+We monitor the integrity of our retrieval corpus and tabular data using **Evidently AI**.
+* **Retrieval Corpus Drift:** Detects semantic shifts in the product descriptions that could degrade RAG performance.
+* **Feature Drift:** Runs targeted checks on key features such as `Original_Price` and `Ratings`.
+
+<img src="assets/D4 S5.png" alt="Evidently Drift Report" width="500">
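To give a feel for what a numeric feature drift check measures, here is a standalone sketch of the two-sample Kolmogorov-Smirnov statistic, the largest gap between the empirical distributions of a reference and a current sample. This is not Evidently's API, and Evidently selects its own statistical tests; the function name and the `Original_Price` values below are made up for illustration.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the two empirical
    CDFs: 0.0 means identical distributions, 1.0 means no overlap.
    """
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    ref = sorted(reference)
    cur = sorted(current)
    max_gap = 0.0
    # The largest gap always occurs at one of the observed values.
    for x in set(ref) | set(cur):
        ref_cdf = bisect_right(ref, x) / len(ref)
        cur_cdf = bisect_right(cur, x) / len(cur)
        max_gap = max(max_gap, abs(ref_cdf - cur_cdf))
    return max_gap


# Hypothetical Original_Price samples: training-time vs. recent data.
reference_prices = [100, 150, 200, 250, 300]
shifted_prices = [400, 450, 500, 550, 600]

print(ks_statistic(reference_prices, reference_prices))
print(ks_statistic(reference_prices, shifted_prices))
```

A drift report would typically compare such a statistic (or its p-value) against a threshold to flag the feature as drifted.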
+
 ## Cloud Deployment

 This project is deployed and hosted on **Amazon Web Services (AWS)** using three distinct services: **EC2**, **S3**, and **CloudWatch**, fulfilling the D9 requirement.
@@ -192,7 +227,7 @@ Answer 'Y' if prompted.
 **Q: Pre-commit hook fails?**
 **A:** Run pre-commit run --all-files locally. This will show you the errors and automatically fix many of them. Commit the changes made by the hooks.

-# D2 RAG Pipeline — Daraz Insight Copilot
+# RAG Pipeline — Daraz Insight Copilot

 **Status**: Complete | **Vector Store**: FAISS | **Embedding**: all-MiniLM-L6-v2 | **LLM**: Groq Llama-3.1-8B