RHOAI implements a production-ready architecture with a LlamaDeploy Python backend and an @llamaindex/server TypeScript frontend, designed for enterprise-grade multi-agent analysis workflows.
- Production First: Built on LlamaDeploy for enterprise deployment and monitoring
- Native Python: Full LlamaIndex capabilities (LlamaIndex v0.12+)
- Modern Frontend: Professional chat UI with @llamaindex/server
- API-Driven: Complete REST API for programmatic access
- Workflow Orchestration: Event-driven asynchronous agent coordination
- Observability: Built-in monitoring, logging, and health checks
Purpose: Production workflow orchestration and multi-agent coordination
┌─────────────────────────────────────────┐
│ PYTHON BACKEND │
│ (LlamaDeploy) │
│ │
│ 🔄 Workflow Engine │
│ • LlamaDeploy orchestration │
│ • Multi-agent coordination │
│ • Event-driven state management │
│ │
│ 🤖 Agent System │
│ • RFEAgentManager │
│ • Persona-specific RAG retrieval │
│ • Analysis synthesis │
│ │
│ 📚 Knowledge Integration │
│ • Python RAG index loading │
│ • Vector similarity search │
│ • Context generation │
│ │
│ 🌐 API Services │
│ • REST API endpoints │
│ • Streaming responses │
│ • Health monitoring │
└─────────────────────────────────────────┘
Key Files:
- backend/src/rfe_builder_workflow.py - Main RFE Builder workflow definition
- backend/src/artifact_editor_workflow.py - Artifact editing workflow
- backend/src/agents.py - Multi-agent management
- backend/llama_deploy.yml - Deployment configuration
Services:
- LlamaDeploy API Server (port 8000)
- Multi-agent RFE workflow orchestration
- Vector index management and RAG retrieval
Purpose: Modern chat interface and user experience
┌─────────────────────────────────────────┐
│ TYPESCRIPT FRONTEND │
│ (@llamaindex/server) │
│ │
│ 💬 Chat Interface │
│ • Professional chat UI │
│ • Streaming response handling │
│ • Starter questions │
│ │
│ 🔗 API Integration │
│ • LlamaDeploy connection │
│ • Real-time workflow updates │
│ • Task management │
│ │
│ 🎨 User Experience │
│ • Responsive design │
│ • Progress indicators │
│ • Error handling │
└─────────────────────────────────────────┘
Key Files:
- frontend/index.ts - UI server configuration
- frontend/package.json - Dependencies and scripts
Services:
- Frontend Server (port 3001)
- Chat UI with LlamaDeploy integration
- Real-time workflow progress tracking
Purpose: Knowledge base preparation and indexing
┌─────────────────────────────────────────┐
│ RAG INGESTION PIPELINE │
│ │
│ 📥 Data Sources │
│ • GitHub repositories │
│ • Local documentation │
│ • Web pages │
│ │
│ 🔄 Processing │
│ • Document chunking │
│ • Embedding generation (OpenAI) │
│ • Metadata extraction │
│ │
│ 💾 Index Creation │
│ • FAISS vector stores │
│ • Persona-specific indices │
│ • Metadata and statistics │
└─────────────────────────────────────────┘
Key Files:
- python-rag-ingestion/rhoai_rag_ingestion/cli.py - Ingestion pipeline
- Agent configurations in src/agents/*.yaml
Output: Vector indexes saved to output/python-rag/{agent_name}/
- Agent Config Reading: Parse YAML configurations from src/agents/
- Source Processing: Clone GitHub repositories, read local directories
- Document Processing: Chunk text, generate embeddings via OpenAI
- Index Creation: Build FAISS vector stores with persona-specific metadata
- Persistence: Save indexes to output/python-rag/{agent_name}/
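The document-processing step can be sketched in plain Python. The chunk size and overlap values below are illustrative assumptions, not the pipeline's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping character chunks ready for embedding."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # advance, keeping `overlap` chars of context
    return chunks

# Example: a 1000-character document yields three overlapping chunks
doc = "".join(str(i % 10) for i in range(1000))
chunks = chunk_text(doc)
```

Overlap preserves context across chunk boundaries, which improves retrieval quality when a relevant passage straddles two chunks.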
- User Input: RFE submission via chat interface (port 3001)
- Workflow Trigger: LlamaDeploy receives task via API (port 8000)
- Agent Initialization: Load agent configs and RAG indices
- Multi-Agent Analysis: Parallel analysis by all 7 agent personas
- Context Retrieval: Vector similarity search for each agent's knowledge base
- Synthesis: Combine all agent analyses into comprehensive output
- Deliverable Generation: Create component teams, architecture, timeline
- Streaming Response: Real-time updates back to chat interface
- Task Creation: POST to /deployments/rhoai/tasks/create
- Event Streaming: GET /deployments/rhoai/tasks/{task_id}/events
- Result Retrieval: Complete analysis results in structured JSON
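A minimal client sketch for this task-creation flow, using only the standard library. The request body field name (`input`) is an assumption, since the exact payload schema isn't shown here:

```python
import json
from urllib import request

BASE = "http://localhost:8000"  # LlamaDeploy API server (assumed local host)
DEPLOYMENT = "rhoai"

def create_task_request(rfe_description: str) -> request.Request:
    """Build the POST request that triggers an RFE analysis task."""
    url = f"{BASE}/deployments/{DEPLOYMENT}/tasks/create"
    body = json.dumps({"input": rfe_description}).encode()  # field name assumed
    return request.Request(
        url, data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def events_url(task_id: str) -> str:
    """URL for streaming workflow events of a running task."""
    return f"{BASE}/deployments/{DEPLOYMENT}/tasks/{task_id}/events"
```

Sending the request with `urllib.request.urlopen` (or any HTTP client) and polling the events URL reproduces the three steps above.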
Frontend and backend communicate via LlamaDeploy API:
Frontend (port 3001) ←──HTTP API──→ LlamaDeploy (port 8000)
│ │
├── Chat interface ├── Workflow orchestration
├── Real-time updates ├── Task management
└── Progress tracking └── Agent coordination
Python ingestion and backend share filesystem storage:
output/python-rag/{agent_persona}/
├── docstore.json # Document content and metadata
├── default__vector_store.json # FAISS vector embeddings
├── index_store.json # LlamaIndex configuration
├── graph_store.json # Relationship data
└── metadata.json # Agent info and statistics
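Because both the ingestion pipeline and the backend rely on this directory contract, a small check like the following (a hypothetical helper, not part of the codebase) can verify an index is complete before loading it:

```python
from pathlib import Path

# File names taken from the storage layout above
REQUIRED_FILES = [
    "docstore.json",
    "default__vector_store.json",
    "index_store.json",
    "graph_store.json",
    "metadata.json",
]

def missing_index_files(agent_dir: Path) -> list[str]:
    """Return the names of required index files absent from agent_dir."""
    return [name for name in REQUIRED_FILES if not (agent_dir / name).exists()]
```

An empty return value means the persona's index directory is complete and safe to load.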
Agents are defined in YAML with JSON Schema validation:
```yaml
# yaml-language-server: $schema=./agent-schema.json
name: "Agent Display Name"
persona: "UNIQUE_IDENTIFIER"
role: "Role description"
dataSources:
  - "local-directory"
  - name: "github-source"
    type: "github"
    source: "org/repo"
    options:
      path: "docs/"
      fileTypes: [".md"]
analysisPrompt:
  template: "Analysis prompt with {rfe_description} variables"
  templateVars: ["rfe_description", "context"]
```

```mermaid
graph TD
    A[RFE Input] --> B[Start Analysis]
    B --> C[Multi-Agent Analysis]
    C --> D[Collect Results]
    D --> E[Synthesize Analysis]
    E --> F[Generate Deliverables]
    F --> G[Complete Workflow]
```
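Agent definitions like the YAML example above can be sanity-checked once parsed. This is a hypothetical validation sketch operating on the already-parsed dict, with field names taken from the example (the real pipeline uses JSON Schema):

```python
# Required top-level fields, per the example agent configuration
REQUIRED = ("name", "persona", "role", "dataSources", "analysisPrompt")

def validate_agent_config(cfg: dict) -> list[str]:
    """Return the names of required top-level fields missing from a config."""
    return [field for field in REQUIRED if field not in cfg]

cfg = {
    "name": "Agent Display Name",
    "persona": "UNIQUE_IDENTIFIER",
    "role": "Role description",
    "dataSources": ["local-directory"],
    "analysisPrompt": {"template": "Analyze {rfe_description}",
                       "templateVars": ["rfe_description"]},
}
```

Running this before ingestion fails fast on malformed agent files instead of surfacing errors mid-pipeline.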
The RFEWorkflow coordinates all agent personas:
```python
from llama_index.core.workflow import Workflow, step

class RFEWorkflow(Workflow):
    @step
    async def analyze_with_agents(self, ev: RFEAnalysisEvent):
        # Parallel execution of all 7 agents
        events = []
        for persona, config in agent_personas.items():
            analysis = await self.agent_manager.analyze_rfe(
                persona, ev.rfe_description, config
            )
            events.append(AgentAnalysisCompleteEvent(...))
        return events
```

┌─────────────────────────────────────────┐
│ LlamaDeploy Workflow │
│ │
│ ┌───────┐ ┌───────┐ ┌───────────┐ │
│ │ PM │ │ UXD │ │BACKEND_ENG│ │
│ └───────┘ └───────┘ └───────────┘ │
│ │ │ │ │
│ └──────────┼───────────┘ │
│ │ │
│ ┌───────────┐ │ ┌───────────────┐ │
│ │FRONTEND_ │ │ │ ARCHITECT │ │
│ │ ENG │ │ │ │ │
│ └───────────┘ │ └───────────────┘ │
│ │ │ │ │
│ └──────────┼───────────┘ │
│ │ │
│ ┌───────────────┐ │
│ │PRODUCT_OWNER │ │
│ │SME_RESEARCHER│ │
│ └───────────────┘ │
│ │
│ → Synthesis → Deliverables │
└─────────────────────────────────────────┘
- RFEAnalysisEvent: User input triggers workflow
- AgentAnalysisCompleteEvent: Each agent completes analysis
- AllAnalysesCompleteEvent: Synthesis phase begins
- SynthesisCompleteEvent: Generate deliverables
- StopEvent: Workflow completion with results
LlamaIndex + FAISS Integration:
- Native Python LlamaIndex v0.12+ vector stores
- FAISS backend for efficient similarity search
- Persona-specific indices for domain expertise
- Persistent storage for production deployment
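The similarity search FAISS performs can be illustrated with a minimal NumPy sketch: brute-force cosine similarity over a toy corpus. FAISS itself uses optimized index structures, so this only shows the semantics, not the implementation:

```python
import numpy as np

def top_k_similar(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k stored vectors most similar to the query."""
    # Normalize so that the dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    return np.argsort(scores)[::-1][:k]  # highest similarity first

# Toy corpus of 4 embeddings in a 3-dimensional space
vectors = np.array([[1.0, 0.0, 0.0],
                    [0.9, 0.1, 0.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
```

In production the embeddings come from text-embedding-3-small and the search runs inside the persona's persisted FAISS store.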
```python
from pathlib import Path

from llama_index.core import StorageContext, load_index_from_storage

class RFEAgentManager:
    async def get_agent_index(self, persona: str):
        # 1. Try Python RAG index (primary)
        storage_dir = Path(f"../output/python-rag/{persona.lower()}")
        if storage_dir.exists():
            storage_context = StorageContext.from_defaults(persist_dir=str(storage_dir))
            return load_index_from_storage(storage_context)

        # 2. Fall back to LlamaCloud index (if available)
        llamacloud_dir = Path(f"../output/llamacloud/{persona.lower()}")
        if llamacloud_dir.exists():
            storage_context = StorageContext.from_defaults(persist_dir=str(llamacloud_dir))
            return load_index_from_storage(storage_context)

        # 3. No index available
        return None
```

```python
from typing import Any, Dict, List

from llama_index.core.workflow import Event

class RFEAnalysisEvent(Event):
    rfe_description: str
    chat_history: List[Dict] = []

class AgentAnalysisCompleteEvent(Event):
    persona: str
    analysis: Dict[str, Any]

class AllAnalysesCompleteEvent(Event):
    analyses: List[Dict[str, Any]]
    rfe_description: str

class SynthesisCompleteEvent(Event):
    synthesis: Dict[str, Any]
    analyses: List[Dict[str, Any]]
```

- LlamaDeploy Orchestration: Built-in workflow state management
- Task Tracking: Each analysis gets unique task ID
- Progress Streaming: Real-time updates via API endpoints
- Error Recovery: Graceful handling of individual agent failures
- Observability: Built-in monitoring and logging
- LlamaDeploy Orchestration: Enterprise-grade workflow management
- Parallel Agent Execution: All agents analyze simultaneously via async/await
- Persistent Vector Stores: Indices cached across restarts
- API-First Design: RESTful endpoints for horizontal scaling
- Health Monitoring: Built-in observability and health checks
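The parallel-execution and error-recovery points above can be sketched with plain asyncio. The persona names and the simulated failure are illustrative, not the backend's actual behavior:

```python
import asyncio

async def analyze(persona: str, rfe: str) -> dict:
    """Stand-in for a single agent's RAG-backed analysis."""
    if persona == "UXD":
        raise RuntimeError("index unavailable")  # simulate one agent failing
    await asyncio.sleep(0)  # yield control, as a real LLM call would
    return {"persona": persona, "analysis": f"notes on {rfe}"}

async def analyze_all(rfe: str) -> list[dict]:
    personas = ["PM", "UXD", "BACKEND_ENG", "ARCHITECT"]
    # return_exceptions=True keeps one agent's failure from aborting the rest
    results = await asyncio.gather(
        *(analyze(p, rfe) for p in personas), return_exceptions=True
    )
    return [r for r in results if isinstance(r, dict)]

results = asyncio.run(analyze_all("GPU autoscaling"))
```

With `return_exceptions=True`, a failing agent surfaces as an exception object in the result list rather than cancelling the other analyses, which is the graceful-degradation behavior described above.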
```bash
# 1. Start LlamaDeploy API server
uv run -m llama_deploy.apiserver  # Port 8000

# 2. Deploy workflow
uv run llamactl deploy llama_deploy.yml

# 3. Start frontend
npm run dev  # Port 3001

# 4. Optional: Scheduled ingestion updates (crontab entry)
0 2 * * * cd /app/python-rag-ingestion && rhoai-rag ingest
```

```bash
# Check deployment status
uv run llamactl status

# View workflow logs
uv run llamactl logs rfe-builder-workflow

# Monitor tasks
uv run llamactl tasks

# Health checks
curl http://localhost:8000/health
curl http://localhost:3001/health
```

- Create YAML configuration in src/agents/
- Configure data sources (local directories or GitHub repositories)
- Run Python ingestion: cd python-rag-ingestion && rhoai-rag ingest
- Restart backend: LlamaDeploy automatically reloads agent configurations
- Local Sources: Update files in data/ directories, then re-run ingestion
- GitHub Sources: Re-run Python ingestion to pull latest commits
- Agent Config: Modify YAML files; LlamaDeploy hot-reloads configurations
```bash
cd backend

# Install development dependencies
uv sync --dev

# Run tests
uv run pytest

# Type checking
uv run mypy src/

# Restart workflow
uv run llamactl deploy llama_deploy.yml
```

```bash
cd frontend

# Install dependencies
npm install

# Development mode with hot reload
npm run dev

# Build for production
npm run build
```

- Task Management: /deployments/rhoai/tasks/*
- Event Streaming: /deployments/rhoai/tasks/{task_id}/events
- Health Checks: /health, /docs
- Workflow Status: /deployments/rhoai/status
- OpenAI API: GPT-4 language model and text-embedding-3-small
- GitHub API: Repository access and documentation retrieval (via python-rag-ingestion)
- LlamaDeploy: Production workflow orchestration and monitoring
- Custom Workflows: Extend RFEWorkflow with additional analysis steps
- Agent Specializations: Create domain-specific agent personas and prompts
- Data Source Integration: Add new readers to python-rag-ingestion pipeline
- UI Customization: Configure @llamaindex/server chat interface
- API Integration: Build external applications using REST endpoints
This production-ready architecture provides enterprise-grade multi-agent analysis with built-in scalability, monitoring, and extensibility for complex feature refinement workflows.