Commit 492a007

advanced version history, metrics, and monitoring
1 parent 906a515 commit 492a007

26 files changed: +913 -22 lines changed

.coverage

0 Bytes
Binary file not shown.

CHANGELOG.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+## [Unreleased]
+### Added
+- Test suite now fully mocks OpenAI embedding and Qdrant vector DB for all integration tests. No real API keys or running Qdrant instance are required for tests. This is achieved by:
+  - Patching `get_openai_embedding` to return a fixed vector in all test files.
+  - Patching `QdrantClient.upsert` and `QdrantClient.search` in tests that require Qdrant operations, returning controlled results.
+  - Patching `to_uuid` to return a string for test compatibility with Qdrant's expected ID type.
+- This approach ensures tests are fast, reliable, and isolated from external dependencies.
+- Version history is now tracked for each record in Qdrant. Every upsert appends the current model/embedding version and timestamp to a `version_history` list in metadata.
+- New API endpoint `/records/{id}/version-history` returns the full version history for a given record ID.
+- Minimal web UI at `/ui/version_history.html` to view current model versions and version history for any record.
+- Prometheus metrics integration: `/metrics` endpoint exposes API metrics for Prometheus scraping.
+- Sentry error monitoring: If `SENTRY_DSN` is set, errors and traces are sent to Sentry.
+- Docs updated to describe metrics and monitoring setup.
+
+## [1.2.2] - 2025-07-14
+### Added
+- New `docs/TESTING.md` with a comprehensive guide to running, extending, and mocking tests (OpenAI, Qdrant) for fast, reliable integration tests.
+- All documentation files now cross-link to each other, including the new Testing Guide, ensuring no missing or broken links.
+- README, USAGE, ARCHITECTURE, CONTRIBUTING, SECURITY, CI_CACHING, and ENVIRONMENT_VARIABLES docs updated to reference the Testing Guide and clarify the mocking/testing approach.
+
+### Changed
+- Documentation reviewed and improved for clarity, completeness, and cross-referencing.
+
+## [1.2.3] - 2025-07-14
+### Added
+- Version history tracking for model/embedding versions per record in Qdrant.
+- `/records/{id}/version-history` API endpoint.
+- Simple web UI for version history display.
+
+## [1.2.4] - 2025-07-14
+### Added
+- Prometheus metrics and Sentry error monitoring integration.
+
+## [1.3.0] - 2025-07-14
+### Added
+- **Advanced Version History UI:**
+  - Web UI at `/ui/version_history.html` now lists records with search/filter, shows metadata, and allows clicking to view version history.
+- **/records API Endpoint:**
+  - List records with filtering and pagination. Supports filtering by type and note content.
+- **Prometheus Metrics Integration:**
+  - `/metrics` endpoint exposes API metrics (request count, latency, errors) for Prometheus scraping.
+- **Sentry Error Monitoring:**
+  - If `SENTRY_DSN` is set, errors and traces are sent to Sentry for monitoring.
+- **Testing & Mocking:**
+  - All new endpoints and integrations are covered by tests.
+  - Mocking patterns for Qdrant, OpenAI, Prometheus, and Sentry are documented in `docs/TESTING.md`.
+- **Documentation:**
+  - All new features and integrations are fully documented in the README, ARCHITECTURE.md, and TESTING.md.
+
+### Changed
+- Improved UI/UX for version history and record management.
+- Enhanced documentation and cross-linking for all features.
+
+### Fixed
+- Ensured all endpoints and integrations are robustly tested and isolated from external dependencies in CI.

README.md

Lines changed: 8 additions & 0 deletions
@@ -151,6 +151,7 @@ See the [full Deployment Instructions](./docs/DEPLOYMENT.md) for detailed setup
 ```bash
 make test
 ```
+- See [Testing Guide](./docs/TESTING.md) for our approach to mocking OpenAI and Qdrant in integration tests.

 ## 🧹 Formatting
 ```bash
@@ -185,6 +186,7 @@ make lint
 - [**Architecture Overview**](./docs/ARCHITECTURE.md) — System design and architecture
 - [**Usage Examples**](./docs/USAGE.md) — Example API requests and usage patterns
 - [**Contributing Guidelines**](./docs/CONTRIBUTING.md) — How to contribute to this project
+- [**Testing Guide**](./docs/TESTING.md) — How to run and extend tests, and our mocking approach

 ## 📋 Resources

@@ -194,6 +196,7 @@ make lint
 - [**Usage Examples**](./docs/USAGE.md) - API usage patterns and examples
 - [**CI Caching Strategy**](./docs/CI_CACHING.md) - Performance optimization guide
 - [**Environment Variables**](./docs/ENVIRONMENT_VARIABLES.md) - Configuration management
+- [**Testing Guide**](./docs/TESTING.md) - Test running and mocking best practices

 ### 🛠️ Development
 - [**Contributing Guidelines**](./docs/CONTRIBUTING.md) - How to contribute to the project
@@ -218,6 +221,11 @@ make lint
 - **Qdrant Dashboard**: [http://localhost:6333/dashboard](http://localhost:6333/dashboard) - Vector database management
 - **API Documentation**: [http://localhost:8000/docs](http://localhost:8000/docs) - Interactive API docs

+## 📊 Metrics & Monitoring
+- **Prometheus metrics** available at `/metrics` for all API endpoints (request count, latency, errors).
+- **Sentry error monitoring** enabled if `SENTRY_DSN` is set in the environment.
+- See [Architecture Overview](./docs/ARCHITECTURE.md#metrics--monitoring) for details.
+
 ## 🛡️ License
 [**AGPLv3**](./docs/LICENSE) — Free for use with source-sharing required for derivatives.
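
For orientation, a Prometheus scrape of `/metrics` returns plain text in the Prometheus exposition format. The sketch below renders a request counter in that format using only the standard library. The class name, the metric name `http_requests_total`, and the label set are illustrative assumptions; the real endpoint in this commit is produced by `prometheus_fastapi_instrumentator`, whose exact metric names are not shown in the diff.

```python
from collections import Counter


class RequestCounter:
    """Toy counter that renders itself in Prometheus text exposition format."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._counts = Counter()

    def inc(self, method, path, status):
        # One time series per unique label combination.
        self._counts[(method, path, str(status))] += 1

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for (method, path, status), value in sorted(self._counts.items()):
            labels = f'method="{method}",path="{path}",status="{status}"'
            lines.append(f"{self.name}{{{labels}}} {value}")
        return "\n".join(lines) + "\n"


counter = RequestCounter("http_requests_total", "Total HTTP requests")
counter.inc("GET", "/records", 200)
counter.inc("GET", "/records", 200)
counter.inc("POST", "/records", 500)
rendered = counter.render()
```

Error rates fall out of the same series: filtering on the `status` label separates failures from successes without a second metric.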

api/api_v1.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+from . import ws
+# NOTE: `api_router` is not defined in this file; it must be created or imported before the call below.
+api_router.include_router(ws.router)

api/ws.py

Lines changed: 103 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,103 @@
1+
from fastapi import APIRouter, WebSocket, WebSocketDisconnect
2+
from pydantic import BaseModel
3+
from datetime import datetime
4+
import asyncio
5+
import logging
6+
import json
7+
import openai
8+
import os
9+
10+
logger = logging.getLogger(__name__)
11+
12+
router = APIRouter()
13+
14+
openai.api_key = os.getenv("OPENAI_API_KEY")
15+
16+
class PromptRequest(BaseModel):
17+
prompt: str
18+
model: str = "gpt-4o"
19+
temperature: float = 0.7
20+
max_tokens: int = 100
21+
22+
23+
@router.websocket("/ws/generate")
24+
async def websocket_generate(websocket: WebSocket):
25+
await websocket.accept()
26+
logger.info("WebSocket connection accepted")
27+
28+
stop_generation = False
29+
30+
async def heartbeat():
31+
"""Send periodic ping to client to keep connection alive."""
32+
while True:
33+
await asyncio.sleep(10)
34+
try:
35+
await websocket.send_text(json.dumps({"event": "ping", "timestamp": datetime.utcnow().isoformat()}))
36+
except Exception:
37+
logger.info("Heartbeat failed, connection might be closed")
38+
break
39+
40+
heartbeat_task = asyncio.create_task(heartbeat())
41+
42+
try:
43+
while True:
44+
data = await websocket.receive_text()
45+
logger.info(f"Received input: {data}")
46+
47+
try:
48+
message = json.loads(data)
49+
except json.JSONDecodeError:
50+
await websocket.send_text(json.dumps({"error": "Invalid JSON input."}))
51+
continue
52+
53+
if isinstance(message, dict) and message.get("command") == "stop":
54+
logger.info("Stop command received.")
55+
stop_generation = True
56+
continue
57+
58+
try:
59+
request_data = PromptRequest.parse_obj(message)
60+
except Exception as e:
61+
logger.error(f"Invalid prompt input: {e}")
62+
await websocket.send_text(json.dumps({
63+
"error": "Invalid input format. Expected JSON with prompt, model, temperature, max_tokens."
64+
}))
65+
continue
66+
67+
stop_generation = False
68+
69+
async for token_data in generate_openai_stream(request_data):
70+
if stop_generation:
71+
logger.info("Generation stopped by client.")
72+
break
73+
await websocket.send_text(json.dumps(token_data))
74+
75+
await websocket.send_text(json.dumps({"event": "end_of_stream"}))
76+
77+
except WebSocketDisconnect:
78+
logger.info("WebSocket connection disconnected")
79+
finally:
80+
heartbeat_task.cancel()
81+
82+
83+
async def generate_openai_stream(request: PromptRequest):
84+
try:
85+
response = await openai.ChatCompletion.acreate(
86+
model=request.model,
87+
messages=[{"role": "user", "content": request.prompt}],
88+
temperature=request.temperature,
89+
max_tokens=request.max_tokens,
90+
stream=True
91+
)
92+
93+
async for chunk in response:
94+
content = chunk["choices"][0].get("delta", {}).get("content")
95+
if content:
96+
yield {
97+
"token": content,
98+
"timestamp": datetime.utcnow().isoformat(),
99+
"model": request.model
100+
}
101+
except Exception as e:
102+
logger.error(f"OpenAI streaming error: {e}")
103+
yield {"error": str(e), "timestamp": datetime.utcnow().isoformat()}
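
In `websocket_generate`, the `stop` command is only read between generations, because the receive loop is suspended while tokens are being streamed. One possible way to make `stop` take effect mid-stream is to run the receiver as its own task and share an `asyncio.Event`. The sketch below is a hedged illustration of that pattern, not the commit's code: in-memory queues stand in for the WebSocket, a fake generator stands in for the OpenAI stream, and every function name is hypothetical.

```python
import asyncio
import json


async def fake_token_stream(n):
    """Stand-in for the OpenAI stream: yields n tokens with a small delay."""
    for i in range(n):
        await asyncio.sleep(0.01)
        yield {"token": f"t{i}"}


async def receiver(inbox, stop_event):
    """Read client messages concurrently; set the event on a stop command."""
    while True:
        message = json.loads(await inbox.get())
        if isinstance(message, dict) and message.get("command") == "stop":
            stop_event.set()


async def stream_until_stopped(inbox, outbox, n_tokens):
    stop_event = asyncio.Event()
    recv_task = asyncio.create_task(receiver(inbox, stop_event))
    try:
        async for token in fake_token_stream(n_tokens):
            if stop_event.is_set():
                break  # the client asked to stop while tokens were flowing
            await outbox.put(json.dumps(token))
        await outbox.put(json.dumps({"event": "end_of_stream"}))
    finally:
        recv_task.cancel()


async def demo():
    inbox, outbox = asyncio.Queue(), asyncio.Queue()

    async def client():
        await asyncio.sleep(0.035)  # let a few tokens through, then stop
        await inbox.put(json.dumps({"command": "stop"}))

    await asyncio.gather(stream_until_stopped(inbox, outbox, 50), client())
    return [json.loads(outbox.get_nowait()) for _ in range(outbox.qsize())]


messages = asyncio.run(demo())
```

The same shape should carry over to a real socket by replacing `inbox.get()` with the socket's receive call, although disconnect handling would then need to live in the receiver task as well.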

app/api/websocket.py

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
from fastapi import APIRouter, WebSocket, WebSocketDisconnect, status
2+
from fastapi.responses import JSONResponse
3+
from app.auth import verify_token_str
4+
from app.utils.openai_client import get_openai_stream
5+
import asyncio
6+
import json
7+
8+
router = APIRouter()
9+
10+
async def websocket_auth(websocket: WebSocket):
11+
token = websocket.query_params.get("token")
12+
if not token or not verify_token_str(token):
13+
await websocket.close(code=status.WS_1008_POLICY_VIOLATION)
14+
raise WebSocketDisconnect(code=status.WS_1008_POLICY_VIOLATION)
15+
16+
@router.websocket("/ws/generate")
17+
async def ws_generate(websocket: WebSocket):
18+
await websocket.accept()
19+
try:
20+
await websocket_auth(websocket)
21+
# Receive initial payload from client (e.g., prompt)
22+
data = await websocket.receive_json()
23+
prompt = data.get("prompt")
24+
stream_json = data.get("json", False)
25+
if not prompt:
26+
await websocket.send_json({"error": "Missing prompt"})
27+
await websocket.close()
28+
return
29+
# Simulate OpenAI streaming (replace with real OpenAI stream=True logic)
30+
async for chunk in get_openai_stream(prompt):
31+
if stream_json:
32+
await websocket.send_json({"text": chunk, "meta": {"length": len(chunk)}})
33+
else:
34+
await websocket.send_text(chunk)
35+
# Heartbeat ping (optional)
36+
await asyncio.sleep(0.01)
37+
except WebSocketDisconnect:
38+
# Handle disconnects gracefully
39+
pass
40+
except Exception as e:
41+
await websocket.send_json({"error": str(e)})
42+
await websocket.close()
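
The per-message logic in `ws_generate` (check the payload, then frame each chunk as a JSON envelope or raw text) can be pulled into pure functions so it is testable without a socket. A minimal sketch follows; `parse_payload` and `frame_chunk` are hypothetical helper names that do not appear in the commit, and the extra dict check is an added assumption.

```python
import json


def parse_payload(data):
    """Return (prompt, stream_json) or raise ValueError, mirroring the handler's checks."""
    if not isinstance(data, dict):
        raise ValueError("Payload must be a JSON object")
    prompt = data.get("prompt")
    if not prompt:
        raise ValueError("Missing prompt")
    return prompt, bool(data.get("json", False))


def frame_chunk(chunk, stream_json):
    """Frame one chunk the way the handler sends it: JSON envelope or raw text."""
    if stream_json:
        return json.dumps({"text": chunk, "meta": {"length": len(chunk)}})
    return chunk


prompt, as_json = parse_payload({"prompt": "hi", "json": True})
framed = frame_chunk("hello", as_json)
```

Keeping these steps free of I/O means the socket handler only has to wire them to `receive_json` and `send_text`.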

app/auth.py

Lines changed: 11 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -9,34 +9,37 @@
99
def verify_token(request: Request) -> None:
1010
"""
1111
Verify the Bearer token from the request headers.
12-
1312
Args:
1413
request: FastAPI request object
15-
1614
Raises:
1715
HTTPException: If token is invalid or missing
1816
"""
1917
authorization: str = request.headers.get("Authorization")
20-
2118
if not authorization:
2219
logger.warning("Missing Authorization header")
2320
raise HTTPException(
2421
status_code=status.HTTP_401_UNAUTHORIZED,
2522
detail="Missing Authorization header"
2623
)
27-
2824
scheme, token = get_authorization_scheme_param(authorization)
29-
3025
if not token or scheme.lower() != "bearer":
3126
logger.warning(f"Invalid authorization scheme: {scheme}")
3227
raise HTTPException(
3328
status_code=status.HTTP_401_UNAUTHORIZED,
3429
detail="Invalid authorization scheme"
3530
)
36-
37-
if token.strip() not in Config.API_TOKENS:
31+
if not verify_token_str(token):
3832
logger.warning(f"Unauthorized access attempt with token: {token[:8]}...")
3933
raise HTTPException(
4034
status_code=status.HTTP_401_UNAUTHORIZED,
4135
detail="Invalid token"
42-
)
36+
)
37+
38+
def verify_token_str(token: str) -> bool:
39+
"""
40+
Directly verify a token string (for WebSocket or internal use).
41+
Returns True if valid, False otherwise.
42+
"""
43+
if not token:
44+
return False
45+
return token.strip() in Config.API_TOKENS
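
`verify_token_str` relies on a plain membership test. Where timing side channels are a concern, a constant-time comparison is a common alternative. The sketch below uses `hmac.compare_digest` from the standard library; `API_TOKENS` is a hypothetical stand-in for `Config.API_TOKENS` (assumed here to be a collection of strings), and the function name is not part of the commit.

```python
import hmac

# Hypothetical stand-in for Config.API_TOKENS.
API_TOKENS = {"token-a", "token-b"}


def verify_token_constant_time(token, valid_tokens=API_TOKENS):
    """Return True if the stripped token matches any configured token."""
    if not token:
        return False
    candidate = token.strip().encode("utf-8")
    matched = False
    for valid in valid_tokens:
        # Check every configured token instead of returning on the first hit,
        # so the loop length does not depend on which token matched.
        matched |= hmac.compare_digest(candidate, valid.encode("utf-8"))
    return matched


accepted = verify_token_constant_time("  token-a ")
rejected = verify_token_constant_time("not-a-token")
```

The return contract matches the committed helper (a boolean, with empty input rejected), so it could be swapped in without touching callers.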

app/config.py

Lines changed: 6 additions & 0 deletions
@@ -30,6 +30,12 @@ class Config:
     OPENAI_RETRY_MIN_WAIT = int(os.getenv("OPENAI_RETRY_MIN_WAIT", 2))
     OPENAI_RETRY_MAX_WAIT = int(os.getenv("OPENAI_RETRY_MAX_WAIT", 10))

+    # === Model Version Tracking ===
+    MODEL_VERSIONS = {
+        "llm": os.getenv("MODEL_VERSION_LLM", "gpt-4o"),
+        "embedding": os.getenv("MODEL_VERSION_EMBEDDING", "text-embedding-3-small")
+    }
+
     @classmethod
     def summary(cls):
         return {
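
The changelog in this commit states that every upsert appends the current model/embedding version and a timestamp to a `version_history` list in the record metadata. A minimal sketch of that append step is shown below, using the `MODEL_VERSIONS` defaults from the hunk above. The helper name, the entry's field names, and the injectable clock are assumptions, since the commit's own implementation is not visible in this part of the diff.

```python
from datetime import datetime, timezone

# Defaults copied from the MODEL_VERSIONS block in app/config.py.
MODEL_VERSIONS = {"llm": "gpt-4o", "embedding": "text-embedding-3-small"}


def append_version_history(metadata, versions=MODEL_VERSIONS, now=None):
    """Return a copy of metadata with one new version_history entry appended."""
    now = now or datetime.now(timezone.utc)
    entry = {
        "llm": versions["llm"],
        "embedding": versions["embedding"],
        "timestamp": now.isoformat(),
    }
    updated = dict(metadata)
    # Build a new list so the caller's metadata is never mutated in place.
    updated["version_history"] = list(metadata.get("version_history", [])) + [entry]
    return updated


fixed_time = datetime(2025, 7, 14, tzinfo=timezone.utc)
first = append_version_history({"type": "note"}, now=fixed_time)
second = append_version_history(first, now=fixed_time)
```

Returning a copy rather than mutating the input keeps the step easy to test and avoids surprising callers that reuse the original payload.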

app/main.py

Lines changed: 21 additions & 0 deletions
@@ -2,9 +2,13 @@

 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
+from fastapi.staticfiles import StaticFiles
+from prometheus_fastapi_instrumentator import Instrumentator
+import sentry_sdk

 from app.config import Config
 from app.router import router
+from app.api.websocket import router as ws_router
 from app.utils.logger import logger

 app = FastAPI(
@@ -21,6 +25,23 @@
 )

 app.include_router(router)
+app.include_router(ws_router)
+
+# Serve static UI files at /ui
+import os
+static_dir = os.path.join(os.path.dirname(__file__), "static")
+if not os.path.exists(static_dir):
+    os.makedirs(static_dir)
+app.mount("/ui", StaticFiles(directory=static_dir), name="ui")
+
+# Prometheus metrics
+Instrumentator().instrument(app).expose(app, include_in_schema=False, should_gzip=True)
+
+# Sentry error monitoring (optional, set SENTRY_DSN env var)
+import os
+SENTRY_DSN = os.getenv("SENTRY_DSN")
+if SENTRY_DSN:
+    sentry_sdk.init(dsn=SENTRY_DSN, traces_sample_rate=1.0)

 logger.info("🚀 LLM Output Processor ready")
 logger.info(f"Loaded Config: {Config.summary()}")
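
The static-directory setup in `app/main.py` checks whether the directory exists and then creates it. `os.makedirs(..., exist_ok=True)` handles both cases in a single call and avoids the gap between the check and the creation. A small sketch using a temporary directory; the `ensure_static_dir` helper is hypothetical and not part of the commit.

```python
import os
import tempfile


def ensure_static_dir(base_dir, name="static"):
    """Create base_dir/name if needed and return its path; safe to call repeatedly."""
    static_dir = os.path.join(base_dir, name)
    os.makedirs(static_dir, exist_ok=True)
    return static_dir


with tempfile.TemporaryDirectory() as tmp:
    first_path = ensure_static_dir(tmp)
    second_path = ensure_static_dir(tmp)  # no error on the second call
    created = os.path.isdir(first_path)
```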
