docs: Complete README.md overhaul with comprehensive setup guide

jeffdyoung · jeffdyoung · commit 3e9be1bd20fd · 2025-07-09T16:50:05.000-04:00
- Expand from 30 to 200+ lines with detailed documentation
- Add prerequisites section with system requirements and software versions
- Add step-by-step setup guide covering installation, configuration, and deployment
- Add configuration options for model switching and log analysis
- Add usage examples for supported input types and analysis workflow
- Add comprehensive troubleshooting section with common error solutions
- Add development section with project structure and contribution guidelines
- Add support information and license details

This provides a complete getting-started guide for new users cloning
the repository and running the CI Analysis Agent from scratch.
diff --git a/README.md b/README.md
@@ -2,20 +2,283 @@
 
 ## What is it?
 
-This tool is experimentation to find root cause analysis for multi-arch release test failures.
+This tool is an experiment in automated root cause analysis for multi-arch release test failures. It uses Google's Agent Development Kit (ADK) with local LLMs served by Ollama to analyze CI/CD pipeline failures and summarize likely causes.
 
-## How to use
+## Prerequisites
 
-1. [Install ADK](https://google.github.io/adk-docs/get-started/installation/)
-2. If you intend to use local models with LiteLLM, install ollama
-3. Build the Prow MCP server this agent uses: 
-```sh
+Before getting started, ensure you have the following installed:
+
+### Required Software
+- **Python 3.11+** (recommended 3.13)
+- **Git** for version control
+- **Ollama** for local LLM models
+- **Podman or Docker** to build the Prow MCP server container image
+
+### System Requirements
+- **RAM**: 8GB minimum, 16GB recommended (for running local LLM models)
+- **Storage**: 10GB free space (for models and dependencies)
+- **OS**: Linux (recommended), macOS, or Windows with WSL2
+
+## Getting Started
+
+### 1. Clone the Repository
+
+```bash
+# Clone the repository (substitute your own fork's URL if you have one)
+git clone git@github.com:jeffdyoung/ci_analysis_agent.git
+cd ci_analysis_agent
+
+# Add upstream remote (optional, for contributing)
+git remote add upstream git@github.com:sherine-k/ci_analysis_agent.git
+```
+
+### 2. Install Dependencies
+
+#### Install Python Dependencies
+```bash
+# Create virtual environment (recommended)
+python3 -m venv venv
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+
+# Install Python packages
+pip install -r requirements.txt
+
+# If requirements.txt doesn't exist, install core dependencies:
+pip install google-adk litellm drain3 google-cloud-storage python-dotenv
+```
+
+#### Install Ollama
+```bash
+# On Linux/macOS
+curl -fsSL https://ollama.com/install.sh | sh
+
+# On Windows (PowerShell)
+# Download from https://ollama.com/download/windows
+
+# Start the Ollama server in the foreground (skip this if the installer already started it as a background service)
+ollama serve
+```
+
+#### Install ADK (Agent Development Kit)
+```bash
+# ADK is a Python package; install it into the active virtual environment
+pip install google-adk
+
+# Verify that the adk CLI is on your PATH
+adk --help
+```
+
+### 3. Set Up the Local LLM Model
+
+```bash
+# Pull the qwen3:4b model (recommended)
+ollama pull qwen3:4b
+
+# Verify model is available
+ollama list
+
+# Test the model (optional)
+ollama run qwen3:4b "Hello, how are you?"
+```
+
+### 4. Environment Configuration
+
+Create a `.env` file in the project root:
+
+```bash
+# For local Ollama models (default)
+OLLAMA_API_BASE=http://localhost:11434
+
+# For Google Gemini (alternative)
+# GOOGLE_GENAI_USE_VERTEXAI=FALSE
+# GOOGLE_API_KEY=your_google_api_key_here
+
+# Optional: Logging level
+LOG_LEVEL=INFO
+```
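+
+As a rough illustration of how these values might be consumed, the sketch below reads them from the process environment and falls back to the defaults shown above. The `load_settings` helper is hypothetical and not part of this repository; in practice `python-dotenv`'s `load_dotenv()` would typically be called first so that `.env` entries reach `os.environ`.
+
+```python
+import os
+
+
+def load_settings(env=os.environ):
+    """Hypothetical helper: collect the settings documented in this README."""
+    return {
+        # Base URL of the local Ollama server
+        "ollama_api_base": env.get("OLLAMA_API_BASE", "http://localhost:11434"),
+        # Normalize the level so "info" and "INFO" behave the same
+        "log_level": env.get("LOG_LEVEL", "INFO").upper(),
+    }
+
+
+if __name__ == "__main__":
+    print(load_settings())
+```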
+
+### 5. Build the Prow MCP Server
+
+```bash
+# Navigate to the prow_mcp_server directory
+cd prow_mcp_server
+
+# Build the container image
 podman build -t mcp-server-template:latest .
+# Or with Docker:
+# docker build -t mcp-server-template:latest .
+
+# Return to project root
+cd ..
+```
+
+### 6. Run the Application
+
+#### Option A: Using ADK Web Interface (Recommended)
+```bash
+# Start the web interface from the parent folder of ci_analysis_agent
+adk web
+
+# Open your browser to http://localhost:8000 (the default `adk web` port)
+# Select the CI analysis agent from the list of available agents
+```
+
+#### Option B: Command Line Interface
+```bash
+# Run the agent in the terminal, from the parent folder of ci_analysis_agent
+adk run ci_analysis_agent
+
+# Or run specific sub-agents
+python sub_agents/installation_analyst/agent.py
+python sub_agents/mustgather_analyst/agent.py
 ```
-4. run `adk web` from the parent folder of ci_analysis_agent
 
-PS: If you're using Gemini (not a local model), create a .env file with content:
+#### Option C: Development Mode
+```bash
+# Run the web interface with auto-reload (see `adk web --help` for the supported flags)
+adk web --reload
+
+# Or expose the agent over a local REST API for scripted testing
+adk api_server
+```
+
+## Configuration Options
+
+### Model Configuration
+Edit `agent.py` to change the model:
+```python
+from google.adk.models.lite_llm import LiteLlm
+
+# Local Ollama model (default)
+MODEL = LiteLlm(model="ollama_chat/qwen3:4b")
+
+# Other Ollama models (uncomment one to use it instead)
+# MODEL = LiteLlm(model="ollama_chat/llama3:8b")
+# MODEL = LiteLlm(model="ollama_chat/codellama:7b")
+
+# Google Gemini
+# MODEL = LiteLlm(model="gemini/gemini-1.5-flash")
 ```
-GOOGLE_GENAI_USE_VERTEXAI=FALSE
-GOOGLE_API_KEY={{YOUR_TOKEN_HERE}}
-```
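+
+To switch models without editing `agent.py` each time, one option is to resolve the model string from an environment variable. This is only a sketch: the `CI_ANALYSIS_MODEL` variable and the `resolve_model_name` helper are assumptions introduced for illustration and do not exist in this repository.
+
+```python
+import os
+
+# Default taken from the model recommended in this README
+DEFAULT_MODEL = "ollama_chat/qwen3:4b"
+
+
+def resolve_model_name(env=os.environ):
+    """Return a LiteLLM model string, preferring the hypothetical CI_ANALYSIS_MODEL variable."""
+    name = env.get("CI_ANALYSIS_MODEL", "").strip()
+    # An unset or blank variable falls back to the default model
+    return name or DEFAULT_MODEL
+
+
+# In agent.py this could then be used as:
+# MODEL = LiteLlm(model=resolve_model_name())
+```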
+
+### Log Analysis Configuration
+The system uses Drain3 for log pattern detection. Configure in `drain3.ini`:
+```ini
+[DRAIN]
+sim_th = 0.4
+depth = 4
+max_children = 100
+max_clusters = 1000
+```
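+
+Before starting a long analysis, it can help to confirm that the file parses and that each value has the expected type. The following is a minimal sketch using only the standard library; `read_drain_settings` is a hypothetical helper, and Drain3 itself loads the file through its own configuration class.
+
+```python
+import configparser
+
+# Same values as the drain3.ini example above
+SAMPLE = """
+[DRAIN]
+sim_th = 0.4
+depth = 4
+max_children = 100
+max_clusters = 1000
+"""
+
+
+def read_drain_settings(ini_text):
+    """Parse the [DRAIN] section and convert each value to its numeric type."""
+    parser = configparser.ConfigParser()
+    parser.read_string(ini_text)
+    section = parser["DRAIN"]  # raises KeyError if the section is missing
+    return {
+        "sim_th": section.getfloat("sim_th"),
+        "depth": section.getint("depth"),
+        "max_children": section.getint("max_children"),
+        "max_clusters": section.getint("max_clusters"),
+    }
+
+
+if __name__ == "__main__":
+    print(read_drain_settings(SAMPLE))
+```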
+
+## Usage Examples
+
+### Analyzing CI Failures
+1. Upload your CI logs or must-gather files
+2. The agent will automatically:
+   - Parse and categorize logs
+   - Identify failure patterns
+   - Provide root cause analysis
+   - Suggest remediation steps
+
+### Supported Input Types
+- **Prow job logs**
+- **OpenShift must-gather archives**
+- **Installation logs**
+- **Test execution reports**
+
+## Troubleshooting
+
+### Common Issues
+
+#### "Model not found" Error
+```bash
+# List installed models (the command fails if the Ollama server is not running)
+ollama list
+
+# If model missing, pull it
+ollama pull qwen3:4b
+
+# Verify environment variable
+echo $OLLAMA_API_BASE
+```
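+
+The same check can be scripted against Ollama's REST API, which lists installed models at `GET /api/tags`. The sketch below is illustrative only; the helper names `model_names` and `is_model_installed` are hypothetical and not part of this repository.
+
+```python
+import json
+import urllib.error
+import urllib.request
+
+
+def model_names(tags_payload):
+    """Extract model names from the JSON body returned by GET /api/tags."""
+    return [entry["name"] for entry in tags_payload.get("models", [])]
+
+
+def is_model_installed(base_url, wanted):
+    """Return True or False, or None when the Ollama server cannot be reached."""
+    try:
+        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
+            payload = json.load(resp)
+    except (urllib.error.URLError, OSError):
+        return None
+    return wanted in model_names(payload)
+```
+
+For example, `is_model_installed("http://localhost:11434", "qwen3:4b")` returning `None` points to a connection problem rather than a missing model.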
+
+#### "Connection refused" Error
+```bash
+# Start Ollama service
+ollama serve
+
+# Confirm that Ollama is listening on port 11434 (use `ss -tlnp` if netstat is not installed)
+netstat -tlnp | grep 11434
+```
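+
+Where `netstat` is unavailable (for example on macOS or inside a minimal container), a short Python check can stand in for it. This is a sketch with a hypothetical helper name, `port_is_open`; it only tests whether a TCP connection succeeds, not whether the listener is actually Ollama.
+
+```python
+import socket
+
+
+def port_is_open(host, port, timeout=2.0):
+    """Return True if a TCP connection to host:port succeeds within the timeout."""
+    try:
+        with socket.create_connection((host, port), timeout=timeout):
+            return True
+    except OSError:
+        # Covers connection refused, timeouts, and unreachable hosts
+        return False
+
+
+# Example: port_is_open("localhost", 11434)
+```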
+
+#### ADK Web Interface Issues
+```bash
+# Check that the adk CLI resolves to the active virtual environment
+which adk
+
+# Reinstall ADK
+pip install --force-reinstall google-adk
+```
+
+#### Python Import Errors
+```bash
+# Activate virtual environment
+source venv/bin/activate
+
+# Reinstall dependencies
+pip install --force-reinstall -r requirements.txt
+```
+
+### Performance Tips
+- Use smaller models (qwen3:4b) for faster responses
+- Increase system RAM for better model performance
+- Use SSD storage for faster model loading
+- Monitor system resources during analysis
+
+## Development
+
+### Project Structure
+```
+ci_analysis_agent/
+├── agent.py                 # Main agent implementation
+├── prompt.py               # Agent prompts and instructions
+├── sub_agents/             # Specialized analysis agents
+│   ├── installation_analyst/
+│   └── mustgather_analyst/
+├── prow_mcp_server/        # MCP server for Prow integration
+├── deploy/                 # Deployment configurations
+└── requirements.txt        # Python dependencies
+```
+
+### Adding New Features
+1. Create a new sub-agent in `sub_agents/`
+2. Update the main agent to include the new functionality
+3. Add appropriate prompts and instructions
+4. Test with sample data
+
+### Contributing
+1. Fork the repository
+2. Create a feature branch: `git checkout -b feature-name`
+3. Make your changes and test thoroughly
+4. Submit a pull request to the upstream repository
+
+## Deployment
+
+For production deployment on Kubernetes or OpenShift clusters, see the [`deploy/`](deploy/) directory:
+
+- **Kubernetes**: `./deploy/deploy.sh`
+- **OpenShift 4.19+**: `./deploy/deploy-openshift.sh`
+
+Full documentation: [KUBERNETES.md](KUBERNETES.md)
+
+## Support
+
+For issues and questions:
+1. Check the troubleshooting section above
+2. Search existing issues in the repository
+3. Create a new issue with detailed information
+4. Include system information and error logs
+
+## License
+
+This project is licensed under the Apache License 2.0 - see the LICENSE file for details.