add files

pmutua · pmutua · commit 029dc6a423c8 · 2025-06-24T00:39:11.000+03:00
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -0,0 +1,162 @@
+# ChaosPilot System Architecture
+
+## Overview
+ChaosPilot is a full-stack AI platform for log analysis, incident detection, and automated remediation. It leverages Google Agent Development Kit (ADK), Google Cloud services (BigQuery, Logging, AI Platform, Gemini), and a modular multi-agent architecture. The system is designed for security, extensibility, and modern DevOps.
+
+---
+
+## 1. High-Level Architecture
+
+- **Frontend:** Angular SPA (TypeScript, TailwindCSS, RxJS)
+- **Backend:** Python (FastAPI, async/await, Google ADK)
+- **Agents:** Main agent manager orchestrates multiple ADK-compliant sub-agents (detector, planner, fixer, notifier, action recommender)
+- **Data/AI:** Google BigQuery, Cloud Logging, Gemini LLM (Google AI Platform)
+- **Authentication:** Supabase (user/session management)
+- **DevOps:** Docker, `uv`, `hatch`, GCP deployment scripts
+
+---
+
+## 2. Google Agent Development Kit (ADK) Usage
+
+- **Core Orchestration:**
+  - All agent logic is built using ADK's async runtime and event-driven patterns.
+  - The main agent manager (`/agent_manager/agent.py`) coordinates sub-agents, each inheriting from ADK base classes.
+  - Sub-agents (in `/agent_manager/sub_agents/`) handle specialized tasks (detection, planning, fixing, notification, recommendations).
+- **Toolbox Integration:**
+  - `/mcp-toolbox/tools.yaml` defines tools and toolsets in ADK schema, enabling dynamic tool invocation and chaining.
+- **Schema Compliance:**
+  - All tool and agent definitions are kept in sync with ADK's open source schema, ensuring compatibility and reliability.
+- **Open Source Contribution:**
+  - Refactors and schema corrections to `tools.yaml` and agent code are suitable for upstream contribution to the ADK open source project.
+
+---
+
+## 3. Multi-Agent Orchestration
+
+- **Agent Manager:**
+  - Receives user requests and delegates to specialized sub-agents.
+- **Agent Handoffs:**
+  - Workflows are designed for agent handoff (e.g., detector → planner → fixer/notifier).
+- **Dynamic Toolsets:**
+  - Each agent can invoke tools from the ADK toolbox, with toolsets defined per agent type.
+- **Frontend Visualization:**
+  - The Angular frontend visualizes multi-agent workflows, showing handoffs, function calls, and responses in the chat UI.
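
The handoff pattern described above (detector → planner → fixer) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's ADK code: the `Handoff` dataclass, the three stub agents, and `run_workflow` are hypothetical names, and the stubs return canned strings where real sub-agents would call tools and an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Handoff:
    """Output of one sub-agent step plus the name of the next agent, if any."""
    output: str
    next_agent: Optional[str] = None


# Hypothetical stub sub-agents; each one names the agent it hands off to.
def detector(request: str) -> Handoff:
    return Handoff(output=f"detected incident in: {request}", next_agent="planner")


def planner(finding: str) -> Handoff:
    return Handoff(output=f"plan for ({finding})", next_agent="fixer")


def fixer(plan: str) -> Handoff:
    return Handoff(output=f"applied {plan}", next_agent=None)


SUB_AGENTS: Dict[str, Callable[[str], Handoff]] = {
    "detector": detector,
    "planner": planner,
    "fixer": fixer,
}


def run_workflow(request: str, start: str = "detector",
                 max_steps: int = 10) -> List[Tuple[str, str]]:
    """Run sub-agents in sequence, following each handoff, and record a trace."""
    trace: List[Tuple[str, str]] = []
    current: Optional[str] = start
    payload = request
    for _ in range(max_steps):  # guard against handoff cycles
        if current is None:
            break
        result = SUB_AGENTS[current](payload)
        trace.append((current, result.output))
        current, payload = result.next_agent, result.output
    return trace
```

The returned trace is the kind of record a chat UI could render as a sequence of handoffs; the `max_steps` cap is one simple way to keep a misconfigured handoff chain from looping forever.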
+
+---
+
+## 4. Google Cloud & AI Services (including Gemini)
+
+- **BigQuery:**
+  - Stores logs and incident data. Agents query BigQuery for analytics and context retrieval.
+- **Cloud Logging:**
+  - Ingests and manages raw logs. Scripts in `/scripts/` support log injection and management.
+- **Gemini LLM (Google AI Platform):**
+  - Backend calls Gemini for advanced log analysis, incident classification, and remediation planning.
+  - All LLM calls are backend-only. There is currently no retrieval-augmented generation (RAG) pipeline, embedding generation, or vector similarity search implemented in the codebase. If RAG is implemented in the future, it will follow strict security and privacy guidelines.
+- **ADK Toolbox:**
+  - All tools and toolsets are defined for use by agents, ensuring schema compliance and dynamic extensibility.
+
+---
+
+## 5. Security & Best Practices
+
+- All API keys and secrets are stored in environment variables.
+- No direct client access to LLM APIs.
+- All communication is over HTTPS.
+- Supabase authentication for all sensitive routes.
+- Input/output sanitization, rate limiting, and audit logging at every step.
+
+---
+
+## 6. DevOps & Deployment
+
+- **Local Development:** Use `uv` or `hatch` for environment management; run the backend with `uvicorn` and the frontend with the Angular CLI.
+- **Production:** Build a Docker image, deploy it to a cloud provider (GCP, Azure, etc.), and use managed databases and securely stored secrets.
+- **Scripts:** `/scripts` for GCP setup, IAM, log injection, etc.
+
+---
+
+## 7. Example End-to-End Flow
+
+1. User logs in via Supabase (Angular frontend).
+2. User triggers an action (e.g., "Analyze Error Logs").
+3. Frontend sends authenticated request to FastAPI backend.
+4. Backend authenticates and invokes the main ADK agent.
+5. Agent manager delegates to the appropriate sub-agent.
+6. Sub-agent queries BigQuery, retrieves relevant logs, and may send those logs or summaries to the LLM (Gemini or Azure OpenAI) for analysis.
+7. Agent manager may hand off to other agents as needed.
+8. Backend streams response to frontend, which visualizes the multi-agent workflow.
+
+---
+
+## 8. Google Tech, Open Source, and Published Content
+
+- **Google Tech:**
+  - Deep integration with Google Cloud (BigQuery, Logging, AI Platform, Gemini).
+  - Full adoption of Google ADK for agent orchestration and tool management.
+- **Open Source:**
+  - Refactored and schema-corrected `tools.yaml` and agent code are suitable for contribution to the ADK open source project.
+- **Published Content:**
+  - The project journal (`xREADME.md`) and documentation provide a transparent record of technical decisions, suitable for publication as a case study or blog post.
+
+---
+
+## 9. Summary Table
+
+| Layer      | Tech/Service         | Key Files/Dirs                | Google/ADK Usage                |
+|------------|----------------------|-------------------------------|----------------------------------|
+| Frontend   | Angular, Tailwind    | `/frontend/src/app/`          | Visualizes multi-agent ADK flows |
+| Backend    | FastAPI, ADK, Python | `/main.py`, `/agent_manager/` | ADK async agents, tool orchestration |
+| Data/AI    | BigQuery, Gemini     | `/mcp-toolbox/tools.yaml`     | BigQuery queries, Gemini LLM, ADK toolbox |
+| Auth       | Supabase             | `/frontend`, `/main.py`       | -                                |
+| DevOps     | Docker, uv, hatch    | `/Dockerfile`, `/scripts/`    | GCP deployment scripts           |
+
+---
+
+## 10. Visual Diagram
+
+```mermaid
+graph TD
+  subgraph Frontend["Frontend (Angular)"]
+    A1["User<br/>Browser"]
+    A2["Angular App<br/>(SPA)"]
+  end
+  subgraph Backend["Backend (Python/FastAPI)"]
+    B1["API Gateway<br/>(FastAPI/Uvicorn)"]
+    B2["Agent Manager"]
+    B3["Sub-Agents<br/>(Detector, Planner, Fixer, etc.)"]
+    B4["Session & Auth Service"]
+    B6["BigQuery/Logging Service"]
+  end
+  subgraph Cloud & Data
+    C1["Google BigQuery"]
+    C2["Google Cloud Logging"]
+    C3["Google AI Platform (Gemini)"]
+    C4["Supabase<br/>(Auth, DB)"]
+  end
+  subgraph DevOps
+    D1["Docker"]
+    D2["CI/CD"]
+  end
+
+  A1-->|HTTPS|A2
+  A2-->|REST/WebSocket|B1
+  B1-->|Auth|B4
+  B1-->|Agent Requests|B2
+  B2-->|Delegate|B3
+  B3-->|Data|B6
+  B6-->|Query|C1
+  B6-->|Logs|C2
+  B2-->|LLM Analysis|C3
+  B4-->|User/Session|C4
+  B1-->|Streamed Response|A2
+  D1-->|Containerize|B1
+  D2-->|Deploy|D1
+```
+
+---
+
+**Note:**
+- There is currently no RAG/embedding/vector similarity service implemented. If this is a future goal, it will be added in a later version and clearly documented as such.
+
+For more details, see the project journal (`xREADME.md`) and codebase documentation. 
diff --git a/HOW_IT_WORKS.md b/HOW_IT_WORKS.md
@@ -0,0 +1,104 @@
+# How ChaosPilot Works: Step-by-Step Script
+
+This document provides a clear, step-by-step walkthrough of how the ChaosPilot application operates, from user interaction in the frontend to agent orchestration and AI analysis in the backend. Use this as a guide for onboarding, demos, or understanding the system flow.
+
+---
+
+## 1. User Login & Authentication
+
+- The user navigates to the ChaosPilot web app (Angular frontend).
+- The app prompts the user to log in using Supabase authentication.
+- Upon successful login, the user is granted access to the dashboard, chat, history, and settings.
+
+---
+
+## 2. Initiating an AI Workflow (Example: Analyze Error Logs)
+
+- The user sees a set of quick action buttons (e.g., "Analyze Error Logs", "Classify Incident", "Generate Fix Plan").
+- The user clicks "Analyze Error Logs".
+- The frontend sends an authenticated request to the backend (FastAPI server) to start the analysis workflow.
+
+---
+
+## 3. Backend Agent Orchestration
+
+- The backend receives the request and verifies the user's authentication (via Supabase).
+- The main agent manager (using Google ADK) is invoked.
+- The agent manager delegates the task to the appropriate sub-agent (e.g., the detector agent for log analysis).
+
+---
+
+## 4. Data Query & AI Analysis
+
+- The sub-agent queries Google BigQuery for recent error logs.
+- The relevant logs or summaries are prepared for analysis.
+- The backend sends the prepared data to the selected LLM (Gemini or Azure OpenAI) for advanced analysis and insights.
+- The LLM returns its analysis (e.g., detected patterns, incident classification, recommendations).
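
The "prepare logs or summaries" step can be pictured as collapsing repeated log rows into a few lines before they go into a prompt. The sketch below is an assumption about what such a step might look like, not the project's code: `summarize_logs` and the `severity` / `message` row fields are hypothetical, and the rows are plain dictionaries rather than BigQuery results.

```python
from collections import Counter
from typing import Dict, List


def summarize_logs(rows: List[Dict[str, str]], top_n: int = 3) -> str:
    """Collapse raw log rows into a short text summary suitable for an LLM prompt."""
    # Count identical (severity, message) pairs so repeated errors appear once.
    counts = Counter((row["severity"], row["message"]) for row in rows)
    lines = [
        f"{count}x [{severity}] {message}"
        for (severity, message), count in counts.most_common(top_n)
    ]
    return "\n".join(lines)
```

Sending a compact summary instead of every raw row keeps the prompt small and limits how much log content leaves the backend.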
+
+---
+
+## 5. Multi-Agent Workflow (if needed)
+
+- If the workflow requires further steps (e.g., generating a fix plan, recommending fixes), the agent manager hands off the task to other sub-agents (planner, fixer, etc.).
+- Each sub-agent may query data, invoke tools, or call the LLM as needed.
+- The results from each agent are collected and organized.
+
+---
+
+## 6. Streaming Results to the Frontend
+
+- The backend streams the results of the agent workflow back to the frontend.
+- The Angular app dynamically updates the chat UI, displaying:
+  - Markdown-formatted analysis and reports
+  - Structured data (tables, JSON)
+  - Agent handoffs and function calls
+  - Status updates and loading indicators
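
One way to picture the streaming step is an async generator that emits one event per agent step. The sketch below assumes newline-delimited JSON as the wire format, which the source does not specify; `stream_events`, `collect`, and the event fields are hypothetical.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List


async def stream_events(steps: List[Dict[str, str]]) -> AsyncIterator[str]:
    """Yield each workflow step as one JSON line, then a final 'done' marker."""
    for step in steps:
        await asyncio.sleep(0)  # give other tasks a chance to run between events
        yield json.dumps(step) + "\n"
    yield json.dumps({"type": "done"}) + "\n"


async def collect(steps: List[Dict[str, str]]) -> List[str]:
    """Helper for demos and tests: drain the stream into a list."""
    return [line async for line in stream_events(steps)]
```

A frontend consuming such a stream can update the chat incrementally: handoff events update the workflow view, and the final marker clears the loading indicator.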
+
+---
+
+## 7. User Experience & Further Actions
+
+- The user reviews the AI-generated analysis and recommendations in the chat interface.
+- The user can trigger additional actions (e.g., request a fix plan, escalate an incident, review history).
+- All sensitive actions and data remain protected by authentication and backend-only processing.
+
+---
+
+## 8. Security & Best Practices
+
+- All LLM/API calls are made from the backend only; the client never interacts directly with AI services.
+- All communication is over HTTPS.
+- User sessions and permissions are managed by Supabase.
+- API keys and other secrets are never exposed to the client; log data reaches the LLM provider only through the backend.
+
+---
+
+## 9. Summary Flow Diagram
+
+```
+User (Browser)
+   │
+   ▼
+Angular Frontend (UI, Auth, Chat)
+   │  (REST API call)
+   ▼
+FastAPI Backend (Python, ADK)
+   │
+   ├─► Agent Manager (Orchestrates sub-agents)
+   │      │
+   │      ├─► Detector Agent (queries BigQuery)
+   │      ├─► Planner Agent (generates plans)
+   │      └─► Fixer/Notifier Agents (as needed)
+   │
+   └─► LLM (Gemini/Azure) for analysis
+   │
+   ▼
+Backend streams results
+   │
+   ▼
+Angular Frontend (renders chat, tables, reports)
+```
+
+---
+
+For more details, see `ARCHITECTURE.md` and the project journal (`xREADME.md`).
diff --git a/agent_manager/sub_agents/action_recommender/agent.py b/agent_manager/sub_agents/action_recommender/agent.py
@@ -7,12 +7,14 @@
 from typing import Dict, List, Any
 from enum import Enum
 from toolbox_core import ToolboxSyncClient
+from agent_manager.config import TOOLBOX_URL
+
 from dotenv import load_dotenv
 
 
 load_dotenv()
 
-toolbox = ToolboxSyncClient("http://127.0.0.1:5000")
+toolbox = ToolboxSyncClient(TOOLBOX_URL)
 tools = toolbox.load_toolset("action_recommender_toolset")
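
This commit does not include the config module that supplies `TOOLBOX_URL`. A minimal sketch of what it might contain, assuming the value comes from the `TOOLBOX_URL` environment variable and falls back to the previously hard-coded local address; `get_toolbox_url` and `DEFAULT_TOOLBOX_URL` are hypothetical names.

```python
import os
from typing import Mapping, Optional

# Assumed fallback: the address that was hard-coded before this change.
DEFAULT_TOOLBOX_URL = "http://127.0.0.1:5000"


def get_toolbox_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the toolbox URL from the environment, or the local default."""
    env = os.environ if env is None else env
    # Strip whitespace so a padded value in .env does not yield a broken URL.
    url = env.get("TOOLBOX_URL", "").strip()
    return url or DEFAULT_TOOLBOX_URL


TOOLBOX_URL = get_toolbox_url()
```

Reading the URL once at import time matches how the agent module uses it: the same code then runs locally against the default and on Cloud Run against the deployed toolbox.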
 
 
diff --git a/assets/deploy-agent-success-on-gcp.PNG b/assets/deploy-agent-success-on-gcp.PNG
diff --git a/makefile b/makefile
@@ -0,0 +1,14 @@
+# Load .env file
+include .env
+export
+
+deploy:
+	gcloud run deploy $(AGENT_SERVICE_NAME) \
+		--source . \
+		--region $(GOOGLE_CLOUD_LOCATION) \
+		--allow-unauthenticated \
+		--port=8000 \
+		--set-env-vars "GOOGLE_CLOUD_PROJECT=$(GOOGLE_CLOUD_PROJECT),GOOGLE_CLOUD_LOCATION=$(GOOGLE_CLOUD_LOCATION),GOOGLE_GENAI_USE_VERTEXAI=$(GOOGLE_GENAI_USE_VERTEXAI),MODEL=$(MODEL),TOOLBOX_URL=$(TOOLBOX_URL),GOOGLE_API_KEY=$(GOOGLE_API_KEY)"
+
+delete:
+	gcloud run services delete $(AGENT_SERVICE_NAME) --region $(GOOGLE_CLOUD_LOCATION)
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,8 @@
+google-adk==1.3.0
+google-cloud-logging==3.12.1
+google-cloud-bigquery==3.34.0
+google-cloud-aiplatform==1.95.1
+google-generativeai==0.4.1
+litellm==1.72.7
+toolbox-core==0.2.1
+python-dotenv==1.1.0
diff --git a/set b/set