Commit 046b128

chore: mode doc
1 parent 864e35f commit 046b128

1 file changed: +53 -0 lines changed

docs/STORY.md

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
# Project Story: ChaosPilot – AI-Powered Log Analysis

## Inspiration

As someone who has spent years working in software engineering, I've seen firsthand how overwhelming and time-consuming it can be to sift through endless logs during incidents. I've been on call and faced the daunting task of finding the root cause of a production outage, often buried in thousands of lines of logs. The frustration of manual log analysis, the pressure to restore service quickly, and the risk of missing critical patterns inspired me to imagine a better way.

The real spark came when I saw a post on Reddit: someone asked if there was an AI tool that could analyze logs and surface actionable insights. That question resonated deeply with my own experience and the pain points I've seen across teams. I realized that with the rise of LLMs and cloud-native architectures, it was finally possible to build a tool that could automate the chaos of log analysis and incident response.

## About the Project

ChaosPilot is my answer to the modern log analysis problem. It's a full-stack, AI-powered platform that:
- Ingests and analyzes logs in real time. It currently reads from BigQuery and can connect to other data sources via the [MCP Toolbox for Databases](https://googleapis.github.io/genai-toolbox/getting-started/introduction/).
> The toolbox was originally named "Gen AI Toolbox for Databases" during its initial development.
- Detects patterns, anomalies, and root causes
- Classifies incidents by severity and impact
- Generates actionable response plans and fix recommendations
- Provides a beautiful, interactive dashboard for teams

I wanted to build something that not only saves engineers time, but also empowers them to respond faster and with more confidence during high-stress incidents.

## 🛠️ How I Built It

- **Frontend:** Angular, TypeScript, and TailwindCSS for a modern, responsive UI
- **Backend:** Python, FastAPI, and Google ADK for orchestrating AI agents and workflows
- **Authentication:** Supabase for secure user management
- **Data/AI:** Google ADK agents (detector, planner, fixer, etc.) and BigQuery analytics
- **DevOps:** Docker, GCP, and modern Python dependency management (`uv`, `hatch`)

I focused on building a seamless integration between the UI and backend, ensuring that every agent response, whether plain text, Markdown, or structured JSON, was rendered clearly and usefully for the end user. I also prioritized security, making sure only authenticated users could access sensitive dashboards and data.
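
To make the three response shapes concrete, here is a minimal sketch of how a backend could tag each agent reply before handing it to the UI. It is an illustration rather than ChaosPilot's actual code: `RenderableResponse`, `classify_agent_response`, and the Markdown heuristic are all hypothetical names and assumptions.

```python
import json
import re
from dataclasses import dataclass
from typing import Any

# Assumed heuristic: a heading, list item, or code fence at the start of a
# line, or bold text anywhere, is taken as a sign of Markdown.
_MARKDOWN_HINT = re.compile(r"(^|\n)\s*(#{1,6} |[-*] |\d+\. |`{3})|\*\*[^*]+\*\*")


@dataclass
class RenderableResponse:
    kind: str     # "json", "markdown", or "text"
    payload: Any  # parsed object for "json", otherwise the stripped string


def classify_agent_response(raw: str) -> RenderableResponse:
    """Decide how an agent reply should be rendered (hypothetical helper)."""
    stripped = raw.strip()
    # Only objects and arrays count as structured JSON, so a bare number or
    # quoted string is still displayed as text.
    if stripped[:1] in ("{", "["):
        try:
            return RenderableResponse("json", json.loads(stripped))
        except json.JSONDecodeError:
            pass  # looked like JSON but did not parse; fall through
    if _MARKDOWN_HINT.search(stripped):
        return RenderableResponse("markdown", stripped)
    return RenderableResponse("text", stripped)
```

With a tag like `kind` on every reply, the frontend can pick a renderer (plain text, Markdown view, or JSON tree) without guessing from the content itself.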

## 📚 What I Learned

- How to design and implement a full-stack AI product from scratch
- Advanced Angular and TypeScript patterns for dynamic, reactive UIs
- Secure authentication and route protection with Supabase
- Orchestrating multi-agent workflows and handling diverse response types (text, Markdown, JSON)
- Best practices in async Python, API design, and frontend-backend contract alignment
- The importance of clear documentation and project journaling for future maintainers and employers
- **Learning the Agent Development Kit (ADK):** I had to work through a lot of documentation, community posts, and other resources to understand and use the ADK. It was a steep learning curve, but it opened my eyes to the power of building autonomous, agent-based systems. I realized that with the right tools, we can create systems that not only react, but also proactively manage and optimize complex environments.
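
The multi-agent idea can be pictured as a chain of async steps that pass shared state along. The sketch below uses plain `asyncio` with stub coroutines. It does not use the ADK API, and the step bodies, `State`, and `run_pipeline` are hypothetical stand-ins named after the detector, planner, and fixer agents mentioned earlier.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

State = Dict[str, Any]
AgentStep = Callable[[State], Awaitable[State]]


async def detector(state: State) -> State:
    # Stub: flag every log line that contains "ERROR".
    findings = [line for line in state["logs"] if "ERROR" in line]
    return {**state, "findings": findings}


async def planner(state: State) -> State:
    # Stub: turn each finding into one investigation step.
    plan = [f"Investigate: {finding}" for finding in state["findings"]]
    return {**state, "plan": plan}


async def fixer(state: State) -> State:
    # Stub: propose one placeholder fix per planned step.
    fixes = [f"Proposed fix for: {step}" for step in state["plan"]]
    return {**state, "fixes": fixes}


async def run_pipeline(logs: List[str], steps: List[AgentStep]) -> State:
    """Run the steps in order, feeding each one the previous step's state."""
    state: State = {"logs": logs}
    for step in steps:
        state = await step(state)
    return state


if __name__ == "__main__":
    result = asyncio.run(
        run_pipeline(["INFO ok", "ERROR db timeout"], [detector, planner, fixer])
    )
    print(result["fixes"])
```

Each step returns a new dictionary instead of mutating its input, which keeps the stages easy to test in isolation.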

## ⚡ Challenges Faced

- Handling the wide variety of log formats and agent response types (sometimes plain text, sometimes deeply nested JSON)
- Ensuring the UI was both beautiful and functional, even as the backend evolved
- Debugging authentication flows and making sure session management was robust
- Building a system that could scale from a hackathon prototype to a production-ready tool
- **Deployment Issues:** Deploying a multi-service, cloud-native stack (with ADK, FastAPI, Supabase, and the MCP Toolbox) was a real challenge. I faced CORS issues, service account permission errors, and the usual headaches of getting everything to work smoothly on GCP and Docker. Each deployment hurdle taught me more about cloud infrastructure and the importance of automation and clear documentation.
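
One common way to cope with mixed log formats is to try structured parsing first and fall back to a pattern for plain text. The sketch below shows that idea using only the standard library. The field names (`timestamp`, `level`, `message`) and the plain-text layout are assumptions for illustration, not the formats ChaosPilot actually ingests.

```python
import json
import re
from typing import Any, Dict

# Assumed plain-text layout: "<timestamp> <LEVEL> <message>".
_PLAIN_LINE = re.compile(
    r"^(?P<timestamp>\S+)\s+"
    r"(?P<level>DEBUG|INFO|WARN|WARNING|ERROR|CRITICAL)\s+"
    r"(?P<message>.*)$"
)


def parse_log_line(line: str) -> Dict[str, Any]:
    """Normalize one log line into timestamp/level/message fields."""
    text = line.strip()
    if text.startswith("{"):
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            record = None  # not valid JSON; try the plain-text pattern
        if isinstance(record, dict):
            return {
                "timestamp": record.get("timestamp"),
                "level": str(record.get("level", "UNKNOWN")).upper(),
                "message": str(record.get("message", "")),
            }
    match = _PLAIN_LINE.match(text)
    if match:
        return match.groupdict()
    # Unrecognized format: keep the raw text so nothing is silently dropped.
    return {"timestamp": None, "level": "UNKNOWN", "message": text}
```

Returning the same three keys for every input means downstream analysis only has to handle one shape, and the `UNKNOWN` fallback makes unparsed lines visible instead of dropping them.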

## 💡 Final Thoughts

ChaosPilot is more than just a hackathon project: it's a reflection of my passion for solving real engineering problems with modern technology. I built it for every engineer who's ever felt lost in a sea of logs, and for every team that wants to move faster and smarter in the face of chaos.

If you're reading this as a future employer or collaborator, know that I bring not just technical skills, but also empathy for the user and a drive to build tools that make a real difference.
