| 1 | +# Project Story: ChaosPilot – AI-Powered Log Analysis |
| 2 | + |
| 3 | +## Inspiration |
| 4 | + |
| 5 | +As someone who has spent years working in software engineering, I've seen firsthand how overwhelming and time-consuming it can be to sift through endless logs during incidents. I've been on-call and faced the daunting task of finding the root cause of a production outage—often buried in thousands of lines of logs. The frustration of manual log analysis, the pressure to restore service quickly, and the risk of missing critical patterns inspired me to imagine a better way. |
| 6 | + |
| 7 | +The real spark came when I saw a post on Reddit: someone asked if there was an AI tool that could analyze logs and surface actionable insights. That question resonated deeply with my own experience and the pain points I've seen across teams. I realized that with the rise of LLMs and cloud-native architectures, it was finally possible to build a tool that could automate the chaos of log analysis and incident response. |
| 8 | + |
| 9 | +## About the Project |
| 10 | + |
| 11 | +ChaosPilot is my answer to the modern log analysis problem. It's a full-stack, AI-powered platform that: |
| 12 | +- Ingests and analyzes logs in real time. It currently connects to BigQuery, and other data sources can be connected via the [MCP Toolbox for Databases](https://googleapis.github.io/genai-toolbox/getting-started/introduction/). |
| 13 | + > The toolbox was originally named “Gen AI Toolbox for Databases” during its initial development. |
| 14 | +- Detects patterns, anomalies, and root causes |
| 15 | +- Classifies incidents by severity and impact |
| 16 | +- Generates actionable response plans and fix recommendations |
| 17 | +- Provides a beautiful, interactive dashboard for teams |
| 18 | + |
| 19 | +I wanted to build something that not only saves engineers time, but also empowers them to respond faster and with more confidence during high-stress incidents. |
| 20 | + |
| 21 | +## 🛠️ How I Built It |
| 22 | + |
| 23 | +- **Frontend:** Angular, TypeScript, TailwindCSS for a modern, responsive UI |
| 24 | +- **Backend:** Python, FastAPI, Google ADK for orchestrating AI agents and workflows |
| 25 | +- **Authentication:** Supabase for secure user management |
| 26 | +- **Data/AI:** Google ADK agents (detector, planner, fixer, etc.), BigQuery analytics |
| 27 | +- **DevOps:** Docker, GCP, and modern Python dependency management (`uv`, `hatch`) |
| 28 | + |
| 29 | +I focused on building a seamless integration between the UI and backend, ensuring that every agent response—whether plain text, markdown, or structured JSON—was rendered clearly and usefully for the end user. I also prioritized security, making sure only authenticated users could access sensitive dashboards and data. |
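To make the "plain text, markdown, or structured JSON" handling concrete, here is a minimal sketch of how a backend could tag each agent response before handing it to the UI. It is not ChaosPilot's actual code: the names `ParsedResponse` and `classify_agent_response`, and the markdown heuristics, are assumptions made up for illustration.

```python
import json
import re
from dataclasses import dataclass
from typing import Any

# Hypothetical heuristics: headings, bullets, bold text, code fences, links.
_MARKDOWN_HINTS = re.compile(
    r"(^#{1,6}\s)|(^\s*[-*]\s)|(\*\*.+?\*\*)|(`{3})|(\[.+?\]\(.+?\))",
    re.MULTILINE,
)


@dataclass
class ParsedResponse:
    kind: str     # "json", "markdown", or "text"
    payload: Any  # parsed object for JSON, the stripped string otherwise


def classify_agent_response(raw: str) -> ParsedResponse:
    """Tag a raw agent reply so the frontend can pick a renderer."""
    stripped = raw.strip()

    # Only objects and arrays are treated as structured JSON here.
    if stripped[:1] in ("{", "["):
        try:
            return ParsedResponse("json", json.loads(stripped))
        except json.JSONDecodeError:
            pass  # Looked like JSON but was not; fall through.

    if _MARKDOWN_HINTS.search(stripped):
        return ParsedResponse("markdown", stripped)

    return ParsedResponse("text", stripped)
```

With a tag like this, the UI can switch on `kind` and choose a JSON viewer, a markdown renderer, or a plain text block. Malformed JSON falls back to the text path instead of raising, which keeps the dashboard usable when an agent returns a partial reply.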
| 30 | + |
| 31 | +## 📚 What I Learned |
| 32 | + |
| 33 | +- How to design and implement a full-stack AI product from scratch |
| 34 | +- Advanced Angular and TypeScript patterns for dynamic, reactive UIs |
| 35 | +- Secure authentication and route protection with Supabase |
| 36 | +- Orchestrating multi-agent workflows and handling diverse response types (text, markdown, JSON) |
| 37 | +- Best practices in async Python, API design, and frontend-backend contract alignment |
| 38 | +- The importance of clear documentation and project journaling for future maintainers and employers |
| 39 | +- **Learning the Agent Development Kit (ADK):** I had to work through many resources, including documentation and community posts, to understand the ADK and build with it. It was a steep learning curve, but it opened my eyes to the power of building autonomous, agent-based systems. I realized that with the right tools, we can create systems that not only react, but also proactively manage and optimize complex environments. |
| 40 | + |
| 41 | +## ⚡ Challenges Faced |
| 42 | + |
| 43 | +- Handling the wide variety of log formats and agent response types (sometimes plain text, sometimes deeply nested JSON) |
| 44 | +- Ensuring the UI was both beautiful and functional, even as the backend evolved |
| 45 | +- Debugging authentication flows and making sure session management was robust |
| 46 | +- Building a system that could scale from a hackathon prototype to a production-ready tool |
| 47 | +- **Deployment Issues:** Deploying a multi-service, cloud-native stack (with ADK, FastAPI, Supabase, and the MCP Toolbox) was a real challenge. I faced CORS issues, service account permission errors, and the usual headaches of getting everything to work smoothly on GCP and Docker. Each deployment hurdle taught me more about cloud infrastructure and the importance of automation and clear documentation. |
| 48 | + |
| 49 | +## 💡 Final Thoughts |
| 50 | + |
| 51 | +ChaosPilot is more than just a hackathon project—it's a reflection of my passion for solving real engineering problems with modern technology. I built it for every engineer who's ever felt lost in a sea of logs, and for every team that wants to move faster and smarter in the face of chaos. |
| 52 | + |
| 53 | +If you're reading this as a future employer or collaborator, know that I bring not just technical skills, but also empathy for the user and a drive to build tools that make a real difference. |