Commit 22369e7

feat: LLM Evaluation & Monitoring
1 parent a3e437e commit 22369e7

1 file changed: 36 additions and 1 deletion

README.md

Lines changed: 36 additions & 1 deletion
@@ -112,6 +112,41 @@ A full Prometheus & Grafana stack is included in the Docker Compose file.
112112

113113
* **GPU Metrics:** This project utilizes CPU for training and inference, so GPU-specific metrics are not applicable.
114114

115+
### LLM Evaluation
116+
117+
We monitor the RAG pipeline using a dedicated Grafana dashboard powered by Prometheus metrics.
118+
119+
* **Token Usage:** Tracks `llm_token_usage_total`, split into input and output tokens, to monitor usage volume.
120+
* **Cost Estimation:** Tracks `llm_cost_total` based on a calculated rate per 1k tokens.
121+
* **RAG Latency:** A histogram (`rag_request_latency_seconds`) showing the distribution of response times.
122+
* **Safety Violations:** Tracks `guardrail_events_total` to show how often the guardrails flag PII or injection attempts.
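
As a rough illustration of the cost estimate, the sketch below converts token counts into dollars at a per-1k-token rate. The function name `estimate_cost_usd` and the constants are assumptions for illustration, not code from this project. The example rates are the $0.50/1M input and $1.50/1M output figures quoted under LLM Monitoring, expressed per 1k tokens.

```python
# Hypothetical rates in USD per 1k tokens (assumed; equal to $0.50 and
# $1.50 per 1M tokens respectively).
INPUT_RATE_PER_1K = 0.0005
OUTPUT_RATE_PER_1K = 0.0015


def estimate_cost_usd(input_tokens, output_tokens,
                      input_rate=INPUT_RATE_PER_1K,
                      output_rate=OUTPUT_RATE_PER_1K):
    """Estimate the cost of a request from its token counts."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    # Rates are quoted per 1k tokens, so scale the counts down by 1000.
    return (input_tokens / 1000) * input_rate + (output_tokens / 1000) * output_rate


if __name__ == "__main__":
    # Example: 120k input tokens and 30k output tokens.
    print(f"${estimate_cost_usd(120_000, 30_000):.4f}")
```

Keeping the rates as parameters means a pricing change only touches the two constants, whichever way the real `llm_cost_total` counter is incremented.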
123+
124+
*To view this dashboard:*
125+
1. Run `docker-compose up`
126+
2. Go to `http://localhost:3000`
127+
3. Import the JSON dashboard located in `config/grafana_dashboard.json` (if provided) or build a panel using the metrics above.
128+
129+
## LLM Monitoring
130+
131+
We employ a dual-stack monitoring approach to ensure the reliability of both the Generative (LLM) and Predictive (ML) components.
132+
133+
### 1. Real-time Metrics (Grafana + Prometheus)
134+
We track operational metrics for the RAG pipeline using a Grafana dashboard.
135+
* **Token Usage & Cost:** Tracks `llm_token_usage_total` to estimate API costs ($0.50/1M input, $1.50/1M output).
136+
* **RAG Latency:** Monitors the P95 and P99 latency of the `/ask` endpoint to ensure responsiveness.
137+
* **Safety Violations:** Logs `guardrail_events_total` to track attempted attacks (Injection/PII).
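
To make the P95 and P99 figures concrete: a Prometheus histogram stores cumulative bucket counts, and a percentile is estimated by interpolating inside the bucket that contains the target rank. The sketch below is a simplified standard-library approximation of that idea, not this project's code. The function name `estimate_quantile` and the bucket boundaries are hypothetical. On a dashboard the same estimate would normally come from PromQL's `histogram_quantile` applied to the histogram's `_bucket` series.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound, cumulative_count) pairs sorted by
    upper bound, ending with (math.inf, total_count).
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations yet
    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == prev_count:
                # The overflow bucket has no upper edge to interpolate
                # toward, so fall back to the last finite bound.
                return prev_bound
            # Linear interpolation inside the bucket holding the rank.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (upper - prev_bound) * fraction
        prev_bound, prev_count = upper, count
    return prev_bound


if __name__ == "__main__":
    # Hypothetical latency buckets in seconds: (upper bound, cumulative count).
    latency_buckets = [(0.5, 50), (1.0, 90), (2.0, 99), (math.inf, 100)]
    print("P95:", estimate_quantile(0.95, latency_buckets))
    print("P99:", estimate_quantile(0.99, latency_buckets))
```

Because the result is interpolated, its accuracy depends on how finely the bucket boundaries cover the latencies of interest.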
138+
<img src="assets/D4 S1.png" alt="http request total" width="500">
139+
<img src="assets/D4 S2.png" alt="llm token usage total" width="500">
140+
<img src="assets/D4 S3.png" alt="guardrail events total" width="500">
141+
<img src="assets/D4 S4.png" alt="Grafana Dashboard" width="500">
142+
143+
### 2. Data Drift Monitoring (Evidently)
144+
We monitor the integrity of our retrieval corpus and tabular data using **Evidently AI**.
145+
* **Retrieval Corpus Drift:** Detects semantic shifts in the product descriptions that could degrade RAG performance.
146+
* **Feature Drift:** Runs targeted checks on key features such as `Original_Price` and `Ratings`.
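
For intuition about what a feature drift check measures, the sketch below computes a Population Stability Index (PSI) between a reference sample and a current sample using only the standard library. It is a stand-in for illustration. It is not Evidently's API, and this README does not state which drift test the project's reports use. The function name and the sample values are hypothetical.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """Compare two numeric samples; 0.0 means identical binned distributions."""
    if not reference or not current:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # degenerate case: all reference values are identical

    def bin_shares(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) // width)
            # Values outside the reference range go into the edge bins.
            counts[min(max(idx, 0), bins - 1)] += 1
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


if __name__ == "__main__":
    # Hypothetical `Original_Price` samples: the current batch is shifted up.
    reference_prices = [float(p) for p in range(100, 200)]
    current_prices = [p + 50.0 for p in reference_prices]
    print("PSI:", population_stability_index(reference_prices, current_prices))
```

A larger value indicates a larger shift between the two samples; any alerting threshold would need to be chosen for the feature in question.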
147+
148+
<img src="assets/D4 S5.png" alt="Evidently Drift Report" width="500">
149+
115150
## Cloud Deployment
116151

117152
This project is deployed and hosted on **Amazon Web Services (AWS)** using three distinct services: **EC2**, **S3**, and **CloudWatch**, fulfilling the D9 requirement.
@@ -192,7 +227,7 @@ Answer 'Y' if prompted.
192227
**Q: Pre-commit hook fails?**
193228
**A:** Run `pre-commit run --all-files` locally. This will show you the errors and automatically fix many of them. Commit the changes made by the hooks.
194229

195-
# D2 RAG Pipeline — Daraz Insight Copilot
230+
# RAG Pipeline — Daraz Insight Copilot
196231

197232
**Status**: Complete | **Vector Store**: FAISS | **Embedding**: all-MiniLM-L6-v2 | **LLM**: Groq Llama-3.1-8B
198233

0 commit comments
