raold
diff --git a/.coverage b/.coverage
0 Bytes
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 55 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 8 additions & 0 deletions
diff --git a/api/api_v1.py b/api/api_v1.py
Lines changed: 3 additions & 0 deletions
diff --git a/api/ws.py b/api/ws.py
Lines changed: 103 additions & 0 deletions
diff --git a/app/api/websocket.py b/app/api/websocket.py
Lines changed: 42 additions & 0 deletions
diff --git a/app/auth.py b/app/auth.py
Lines changed: 11 additions & 8 deletions
diff --git a/app/config.py b/app/config.py
Lines changed: 6 additions & 0 deletions
diff --git a/app/main.py b/app/main.py
Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,55 @@
+## [Unreleased]
+### Added
+- Test suite now fully mocks OpenAI embedding and Qdrant vector DB for all integration tests. No real API keys or running Qdrant instance are required for tests. This is achieved by:
+  - Patching `get_openai_embedding` to return a fixed vector in all test files.
+  - Patching `QdrantClient.upsert` and `QdrantClient.search` in tests that require Qdrant operations, returning controlled results.
+  - Patching `to_uuid` to return a string for test compatibility with Qdrant's expected ID type.
+- This approach ensures tests are fast, reliable, and isolated from external dependencies.
+- Version history is now tracked for each record in Qdrant. Every upsert appends the current model/embedding version and timestamp to a `version_history` list in metadata.
+- New API endpoint `/records/{id}/version-history` returns the full version history for a given record ID.
+- Minimal web UI at `/ui/version_history.html` to view current model versions and version history for any record.
+- Prometheus metrics integration: `/metrics` endpoint exposes API metrics for Prometheus scraping.
+- Sentry error monitoring: If `SENTRY_DSN` is set, errors and traces are sent to Sentry.
+- Docs updated to describe metrics and monitoring setup.
+
+## [1.2.2] - 2025-07-14
+### Added
+- New `docs/TESTING.md` with a comprehensive guide to running, extending, and mocking tests (OpenAI, Qdrant) for fast, reliable integration tests.
+- All documentation files now cross-link to each other, including the new Testing Guide, ensuring no missing or broken links.
+- README, USAGE, ARCHITECTURE, CONTRIBUTING, SECURITY, CI_CACHING, and ENVIRONMENT_VARIABLES docs updated to reference the Testing Guide and clarify the mocking/testing approach.
+
+### Changed
+- Documentation reviewed and improved for clarity, completeness, and cross-referencing. 
+
+## [1.2.3] - 2025-07-14
+### Added
+- Version history tracking for model/embedding versions per record in Qdrant.
+- `/records/{id}/version-history` API endpoint.
+- Simple web UI for version history display. 
+
+## [1.2.4] - 2025-07-14
+### Added
+- Prometheus metrics and Sentry error monitoring integration. 
+
+## [1.3.0] - 2025-07-14
+### Added
+- **Advanced Version History UI:**
+  - Web UI at `/ui/version_history.html` now lists records with search/filter, shows metadata, and allows clicking to view version history.
+- **/records API Endpoint:**
+  - List records with filtering and pagination. Supports filtering by type and note content.
+- **Prometheus Metrics Integration:**
+  - `/metrics` endpoint exposes API metrics (request count, latency, errors) for Prometheus scraping.
+- **Sentry Error Monitoring:**
+  - If `SENTRY_DSN` is set, errors and traces are sent to Sentry for monitoring.
+- **Testing & Mocking:**
+  - All new endpoints and integrations are covered by tests.
+  - Mocking patterns for Qdrant, OpenAI, Prometheus, and Sentry are documented in `docs/TESTING.md`.
+- **Documentation:**
+  - All new features and integrations are fully documented in the README, ARCHITECTURE.md, and TESTING.md.
+
+### Changed
+- Improved UI/UX for version history and record management.
+- Enhanced documentation and cross-linking for all features.
+
+### Fixed
+- Ensured all endpoints and integrations are robustly tested and isolated from external dependencies in CI. 
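Note on the `version_history` entries above: the changelog says every upsert appends the current model/embedding version and a timestamp to a `version_history` list in the record metadata. The code that does this is not part of this diff, so the following is only a minimal sketch of that bookkeeping, assuming the metadata is a plain dict. `append_version_entry` is a hypothetical helper name, not a function from this PR.

```python
from __future__ import annotations

from datetime import datetime, timezone
from typing import Any, Optional


def append_version_entry(
    metadata: dict[str, Any],
    model_versions: dict[str, str],
    now: Optional[datetime] = None,
) -> dict[str, Any]:
    """Return a copy of `metadata` with one more `version_history` entry.

    `model_versions` is assumed to look like Config.MODEL_VERSIONS,
    e.g. {"llm": "...", "embedding": "..."}.
    """
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    entry = {**model_versions, "timestamp": timestamp}
    # Copy the existing list so the caller's metadata is not mutated.
    history = list(metadata.get("version_history", []))
    history.append(entry)
    return {**metadata, "version_history": history}
```

Returning a new dict rather than mutating in place keeps the helper easy to test; an in-place update would work just as well if the payload is built fresh for each upsert.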
@@ -151,6 +151,7 @@ See the [full Deployment Instructions](./docs/DEPLOYMENT.md) for detailed setup
 ```bash
 make test
 ```
+- See [Testing Guide](./docs/TESTING.md) for our approach to mocking OpenAI and Qdrant in integration tests.
 
 ## 🧹 Formatting
 ```bash
@@ -185,6 +186,7 @@ make lint
 - [**Architecture Overview**](./docs/ARCHITECTURE.md) — System design and architecture
 - [**Usage Examples**](./docs/USAGE.md) — Example API requests and usage patterns
 - [**Contributing Guidelines**](./docs/CONTRIBUTING.md) — How to contribute to this project
+- [**Testing Guide**](./docs/TESTING.md) — How to run and extend tests, and our mocking approach
 
 ## 📋 Resources
 
@@ -194,6 +196,7 @@ make lint
 - [**Usage Examples**](./docs/USAGE.md) - API usage patterns and examples
 - [**CI Caching Strategy**](./docs/CI_CACHING.md) - Performance optimization guide
 - [**Environment Variables**](./docs/ENVIRONMENT_VARIABLES.md) - Configuration management
+- [**Testing Guide**](./docs/TESTING.md) - Test running and mocking best practices
 
 ### 🛠️ Development
 - [**Contributing Guidelines**](./docs/CONTRIBUTING.md) - How to contribute to the project
@@ -218,6 +221,11 @@ make lint
 - **Qdrant Dashboard**: [http://localhost:6333/dashboard](http://localhost:6333/dashboard) - Vector database management
 - **API Documentation**: [http://localhost:8000/docs](http://localhost:8000/docs) - Interactive API docs
 
+## 📊 Metrics & Monitoring
+- **Prometheus metrics** available at `/metrics` for all API endpoints (request count, latency, errors).
+- **Sentry error monitoring** enabled if `SENTRY_DSN` is set in the environment.
+- See [Architecture Overview](./docs/ARCHITECTURE.md#metrics--monitoring) for details.
+
 ## 🛡️ License
 [**AGPLv3**](./docs/LICENSE) — Free for use with source-sharing required for derivatives.
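Note on the Testing Guide links added in this README hunk: the changelog describes patching `get_openai_embedding` to return a fixed vector so tests need no API key. The test files are not shown in this diff, so the sketch below only illustrates the general `unittest.mock` pattern. The `openai_client` namespace, `embed_note`, and `run_with_mocked_embedding` are hypothetical stand-ins, not names from the project.

```python
from types import SimpleNamespace
from unittest.mock import patch


def _real_get_openai_embedding(text):
    # Stand-in for the real function, which would need an API key and network.
    raise RuntimeError("would call the OpenAI API")


# Hypothetical stand-in for the module that owns get_openai_embedding.
openai_client = SimpleNamespace(get_openai_embedding=_real_get_openai_embedding)

# Short fixed vector for the sketch; real embeddings are much longer.
FAKE_VECTOR = [0.1] * 8


def embed_note(text):
    """Code under test: resolves the embedding function at call time."""
    return {"note": text, "vector": openai_client.get_openai_embedding(text)}


def run_with_mocked_embedding(text):
    """Call embed_note with the embedding function patched to a fixed vector."""
    with patch.object(
        openai_client, "get_openai_embedding", return_value=FAKE_VECTOR
    ) as fake:
        result = embed_note(text)
        fake.assert_called_once_with(text)
    # Leaving the `with` block restores the original function.
    return result
```

The patch has to target the name where the code under test looks it up. If a module does `from x import get_openai_embedding`, the patch target would be that importing module rather than `x`.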
 
 
@@ -0,0 +1,3 @@
+from . import ws
+
+api_router.include_router(ws.router)
@@ -0,0 +1,103 @@
+from fastapi import APIRouter, WebSocket, WebSocketDisconnect
+from pydantic import BaseModel
+from datetime import datetime
+import asyncio
+import logging
+import json
+import openai
+import os
+
+logger = logging.getLogger(__name__)
+
+router = APIRouter()
+
+openai.api_key = os.getenv("OPENAI_API_KEY")
+
+class PromptRequest(BaseModel):
+    prompt: str
+    model: str = "gpt-4o"
+    temperature: float = 0.7
+    max_tokens: int = 100
+
+
+@router.websocket("/ws/generate")
+async def websocket_generate(websocket: WebSocket):
+    await websocket.accept()
+    logger.info("WebSocket connection accepted")
+
+    stop_generation = False
+
+    async def heartbeat():
+        """Send periodic ping to client to keep connection alive."""
+        while True:
+            await asyncio.sleep(10)
+            try:
+                await websocket.send_text(json.dumps({"event": "ping", "timestamp": datetime.utcnow().isoformat()}))
+            except Exception:
+                logger.info("Heartbeat failed, connection might be closed")
+                break
+
+    heartbeat_task = asyncio.create_task(heartbeat())
+
+    try:
+        while True:
+            data = await websocket.receive_text()
+            logger.info(f"Received input: {data}")
+
+            try:
+                message = json.loads(data)
+            except json.JSONDecodeError:
+                await websocket.send_text(json.dumps({"error": "Invalid JSON input."}))
+                continue
+
+            if isinstance(message, dict) and message.get("command") == "stop":
+                logger.info("Stop command received.")
+                stop_generation = True
+                continue
+
+            try:
+                request_data = PromptRequest.parse_obj(message)
+            except Exception as e:
+                logger.error(f"Invalid prompt input: {e}")
+                await websocket.send_text(json.dumps({
+                    "error": "Invalid input format. Expected JSON with prompt, model, temperature, max_tokens."
+                }))
+                continue
+
+            stop_generation = False
+
+            async for token_data in generate_openai_stream(request_data):
+                if stop_generation:
+                    logger.info("Generation stopped by client.")
+                    break
+                await websocket.send_text(json.dumps(token_data))
+
+            await websocket.send_text(json.dumps({"event": "end_of_stream"}))
+
+    except WebSocketDisconnect:
+        logger.info("WebSocket connection disconnected")
+    finally:
+        heartbeat_task.cancel()
+
+
+async def generate_openai_stream(request: PromptRequest):
+    try:
+        response = await openai.ChatCompletion.acreate(
+            model=request.model,
+            messages=[{"role": "user", "content": request.prompt}],
+            temperature=request.temperature,
+            max_tokens=request.max_tokens,
+            stream=True
+        )
+
+        async for chunk in response:
+            content = chunk["choices"][0].get("delta", {}).get("content")
+            if content:
+                yield {
+                    "token": content,
+                    "timestamp": datetime.utcnow().isoformat(),
+                    "model": request.model
+                }
+    except Exception as e:
+        logger.error(f"OpenAI streaming error: {e}")
+        yield {"error": str(e), "timestamp": datetime.utcnow().isoformat()}
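Review note on the stop command in `websocket_generate`: the handler awaits the whole `async for` over `generate_openai_stream` before it calls `receive_text()` again, so a `{"command": "stop"}` message sent mid-generation would most likely be read only after the stream has already finished. One possible approach is to run the streaming as a separate task and share an `asyncio.Event` with the receive loop. The sketch below shows only that pattern, with a fake token stream and a plain list standing in for the WebSocket; all names in it are hypothetical.

```python
import asyncio
from typing import AsyncIterator, List


async def fake_token_stream(n: int) -> AsyncIterator[str]:
    """Stand-in for generate_openai_stream: yields n tokens with short pauses."""
    for i in range(n):
        await asyncio.sleep(0.01)
        yield f"tok{i}"


async def stream_until_stopped(stop: asyncio.Event, sent: List[str], n: int) -> None:
    """Forward tokens into `sent` until the stream ends or `stop` is set."""
    async for token in fake_token_stream(n):
        if stop.is_set():
            break
        sent.append(token)  # stands in for websocket.send_text(...)


async def demo(stop_after: float, n: int = 50) -> List[str]:
    """Stream in a background task while the 'receive loop' stays responsive."""
    stop = asyncio.Event()
    sent: List[str] = []
    streamer = asyncio.create_task(stream_until_stopped(stop, sent, n))
    await asyncio.sleep(stop_after)  # stands in for awaiting receive_text()
    stop.set()                       # a {"command": "stop"} message arrived
    await streamer
    return sent
```

In a real handler the receive loop would keep running and set the event whenever a stop message arrives, and the streaming task would also need to be cancelled in the `finally` block alongside the heartbeat task.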
@@ -0,0 +1,42 @@
+from fastapi import APIRouter, WebSocket, WebSocketDisconnect, status
+from fastapi.responses import JSONResponse
+from app.auth import verify_token_str
+from app.utils.openai_client import get_openai_stream
+import asyncio
+import json
+
+router = APIRouter()
+
+async def websocket_auth(websocket: WebSocket):
+    token = websocket.query_params.get("token")
+    if not token or not verify_token_str(token):
+        await websocket.close(code=status.WS_1008_POLICY_VIOLATION)
+        raise WebSocketDisconnect(code=status.WS_1008_POLICY_VIOLATION)
+
+@router.websocket("/ws/generate")
+async def ws_generate(websocket: WebSocket):
+    await websocket.accept()
+    try:
+        await websocket_auth(websocket)
+        # Receive initial payload from client (e.g., prompt)
+        data = await websocket.receive_json()
+        prompt = data.get("prompt")
+        stream_json = data.get("json", False)
+        if not prompt:
+            await websocket.send_json({"error": "Missing prompt"})
+            await websocket.close()
+            return
+        # Simulate OpenAI streaming (replace with real OpenAI stream=True logic)
+        async for chunk in get_openai_stream(prompt):
+            if stream_json:
+                await websocket.send_json({"text": chunk, "meta": {"length": len(chunk)}})
+            else:
+                await websocket.send_text(chunk)
+            # Brief pause so the event loop can service other tasks (this is not a heartbeat ping)
+            await asyncio.sleep(0.01)
+    except WebSocketDisconnect:
+        # Handle disconnects gracefully
+        pass
+    except Exception as e:
+        await websocket.send_json({"error": str(e)})
+        await websocket.close() 
@@ -9,34 +9,37 @@
 def verify_token(request: Request) -> None:
     """
     Verify the Bearer token from the request headers.
-    
     Args:
         request: FastAPI request object
-        
     Raises:
         HTTPException: If token is invalid or missing
     """
     authorization: str = request.headers.get("Authorization")
-    
     if not authorization:
         logger.warning("Missing Authorization header")
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Missing Authorization header"
         )
-    
     scheme, token = get_authorization_scheme_param(authorization)
-    
     if not token or scheme.lower() != "bearer":
         logger.warning(f"Invalid authorization scheme: {scheme}")
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Invalid authorization scheme"
         )
-    
-    if token.strip() not in Config.API_TOKENS:
+    if not verify_token_str(token):
         logger.warning(f"Unauthorized access attempt with token: {token[:8]}...")
         raise HTTPException(
             status_code=status.HTTP_401_UNAUTHORIZED,
             detail="Invalid token"
-        )
+        )
+
+def verify_token_str(token: str) -> bool:
+    """
+    Directly verify a token string (for WebSocket or internal use).
+    Returns True if valid, False otherwise.
+    """
+    if not token:
+        return False
+    return token.strip() in Config.API_TOKENS
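Review note on `verify_token_str`: the membership test `token.strip() in Config.API_TOKENS` compares strings with ordinary equality. If timing side channels are a concern for this deployment, one possible hardening is to compare with `hmac.compare_digest` from the standard library. The sketch below is an illustration under that assumption, not the PR's implementation. `verify_token_constant_time` is a hypothetical name and `valid_tokens` stands in for `Config.API_TOKENS`.

```python
import hmac
from typing import Iterable, Optional


def verify_token_constant_time(token: Optional[str], valid_tokens: Iterable[str]) -> bool:
    """Return True if `token` (stripped) equals any entry in `valid_tokens`."""
    if not token:
        return False
    candidate = token.strip().encode("utf-8")
    matched = False
    for valid in valid_tokens:
        # Check every configured token instead of returning on the first hit,
        # so the loop length does not depend on which token matched.
        if hmac.compare_digest(candidate, valid.encode("utf-8")):
            matched = True
    return matched
```

Encoding to bytes lets `compare_digest` accept non-ASCII tokens as well. The comparison can still reveal token length, which is usually acceptable for fixed-length API tokens.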
@@ -30,6 +30,12 @@ class Config:
     OPENAI_RETRY_MIN_WAIT = int(os.getenv("OPENAI_RETRY_MIN_WAIT", 2))
     OPENAI_RETRY_MAX_WAIT = int(os.getenv("OPENAI_RETRY_MAX_WAIT", 10))
 
+    # === Model Version Tracking ===
+    MODEL_VERSIONS = {
+        "llm": os.getenv("MODEL_VERSION_LLM", "gpt-4o"),
+        "embedding": os.getenv("MODEL_VERSION_EMBEDDING", "text-embedding-3-small")
+    }
+
     @classmethod
     def summary(cls):
         return {
 
@@ -2,9 +2,13 @@
 
 from fastapi import FastAPI
 from fastapi.middleware.cors import CORSMiddleware
+from fastapi.staticfiles import StaticFiles
+from prometheus_fastapi_instrumentator import Instrumentator
+import sentry_sdk
 
 from app.config import Config
 from app.router import router
+from app.api.websocket import router as ws_router
 from app.utils.logger import logger
 
 app = FastAPI(
@@ -21,6 +25,23 @@
 )
 
 app.include_router(router)
+app.include_router(ws_router)
+
+# Serve static UI files at /ui
+import os
+static_dir = os.path.join(os.path.dirname(__file__), "static")
+if not os.path.exists(static_dir):
+    os.makedirs(static_dir)
+app.mount("/ui", StaticFiles(directory=static_dir), name="ui")
+
+# Prometheus metrics
+Instrumentator().instrument(app).expose(app, include_in_schema=False, should_gzip=True)
+
+# Sentry error monitoring (optional, set SENTRY_DSN env var).
+# `os` is already imported above for the static directory setup.
+SENTRY_DSN = os.getenv("SENTRY_DSN")
+if SENTRY_DSN:
+    sentry_sdk.init(dsn=SENTRY_DSN, traces_sample_rate=1.0)
 
 logger.info("🚀 LLM Output Processor ready")
 logger.info(f"Loaded Config: {Config.summary()}")