The Uderia Platform delivers enterprise-grade AI orchestration with unmatched flexibility. Whether you leverage hyperscaler intelligence for maximum capability, run private local models for absolute sovereignty, or blend both approaches, you get cloud-level reasoning with complete control over your data and costs.
Experience a fundamental transformation in how you work with enterprise data:
- From Intent to Autonomy - Your AI organization that senses, reasons, and delivers. Stop orchestrating. Start delegating. Specialized agents coordinate autonomously to gather data, reason across domains, and synthesize actionable intelligence.
- From Ideation to Operationalization - Revolutionary IFOC Methodology adapts to your needs. Four execution modes (Ideate, Focus, Optimize, Coordinate) in one conversation with zero friction. Switch between creative ideation, document-verified answers, sovereign efficiency, and cross-team orchestration with a simple @TAG.
- From Days to Seconds - Discover insights via conversation. Operationalize them via API. Your conversational discovery is your production-ready automation.
- From Hallucination to Ground Truth - Knowledge Graph maps your databases and RAG retrieves verified documents. Every answer grounded in proven sources with full citations. Zero fabrication, complete traceability.
- From Guesswork to Clarity - Full transparency eliminates the AI black box. See every strategic plan, tool execution, and self-correction in real-time through the Live Status Window.
- From Uncertainty to Accountability - Every action recorded. Every decision traceable. Enterprise-grade audit logging captures every interaction with full forensic context for compliance (GDPR, SOC2) and accountability at scale.
- From Prompt Hijacking to Prompt Integrity - Two-layer encryption with license-derived keys makes prompt extraction computationally infeasible. Your AI logic remains proprietary, auditable, and tamper-proof.
- From Data Exposure to Data Sovereignty - Your data, your rules, your environment. Execute with cloud intelligence while maintaining local privacy through decoupled planning and execution with Champion Cases.
- From Isolated Expertise to Collective Intelligence - Intelligence Marketplace transforms individual expertise into collective knowledge. Share and discover repositories, agent packs, skills, extensions, and knowledge graphs with one click.
- From Context Contamination to Context Optimization - Nine intelligent context modules with budget-aware orchestration. Dynamic adjustments, surplus redistribution, and intelligent condensation optimize every token.
- From $$$ to ¢¢¢ - Revolutionary Fusion Optimizer with strategic planning, proactive optimization, and autonomous self-correction for cost-effective execution.
- From Hidden Costs to Total Visibility - Complete financial governance with real-time tracking, comprehensive analytics, and fine-grained cost control. Track every token, understand every cost.
Whether on-premises or in the cloud, you get enterprise results with optimized speed and minimal token cost, built on the six core principles detailed below.
- Core Principles: A Superior Approach
- Key Features
- Core Components
- Profile Classes: The IFOC Workflow
- The Fusion Optimizer
- Retrieval-Augmented Generation (RAG)
- Vector Store Abstraction Layer
- Intelligence Marketplace: Collaborative Knowledge Sharing
- Skills: Pre-Processing Context Injection
- Extensions: Post-Processing Transformations
- Interactive Visual Components: Generative UI
- Security Architecture
- System Architecture & Deployment
- Installation and Setup Guide
- User Guide
- Docker Deployment
- License
- Author & Contributions
- Appendix: Feature Update List
The Uderia Platform transcends typical data chat applications by delivering a seamless and powerful experience based on six core principles:
Go from conversational discovery to a production-ready, automated workflow in seconds. The agent's unique two-in-one approach means your interactive queries can be immediately operationalized via a REST API, eliminating the friction and redundancy of traditional data operations. What once took data experts weeks is now at your fingertips.
Eliminate the "black box" of AI. The Uderia Platform is built on a foundation of absolute trust, with a Live Status Window that shows you every step of the agent's thought process. From the initial high-level plan to every tool execution and self-correction, you have a clear, real-time view, leaving no room for guesswork.
Powered by the intelligent Fusion Optimizer, the agent features a revolutionary multi-layered architecture for resilient and cost-effective task execution. Through strategic and tactical planning, proactive optimization, and autonomous self-correction, the agent ensures enterprise-grade performance and reliability.
Your data, your rules, your environment. The agent gives you the ultimate freedom to choose your data exposure strategy. Leverage the power of hyperscaler LLMs, or run fully private models on your own infrastructure with Ollama, keeping your data governed entirely by your rules. The agent connects to the models you trust.
Complete cost transparency and control over your LLM spending. The agent provides real-time cost tracking, comprehensive analytics, and detailed visibility into every token consumed. With accurate per-model pricing, cost attribution by provider, and powerful administrative tools, you maintain full financial oversight of your AI operations.
Transform isolated expertise into collective intelligence. The Intelligence Marketplace enables community-driven sharing of execution patterns, domain knowledge, agent teams, skills, extensions, and knowledge graphs—reducing costs through proven strategies and creating a collaborative ecosystem where collective intelligence amplifies individual capabilities.
The Uderia Platform's features are organized around the six core principles that define its value proposition. Each principle is realized through a comprehensive set of capabilities designed to deliver enterprise-grade AI orchestration.
Eliminate the friction between conversational exploration and production automation. The agent's unique architecture enables seamless operationalization of interactive queries.
- Comprehensive REST API: Full programmatic control with asynchronous task-based architecture for reliable, scalable automation (see the submit-and-poll sketch after this list):
- Session management (create, delete, list with conversation history)
- Query execution with async submit + poll pattern
- Task management (status polling, cancellation, result retrieval)
- Configuration management (profiles, LLM providers, MCP servers)
- RAG collection CRUD operations
- Analytics endpoints (session costs, token usage, efficiency metrics)
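A minimal Python sketch of the async submit-and-poll pattern described above. The base URL, endpoint paths, payload fields, and response shape are illustrative assumptions, not the documented API surface:

```python
# Hypothetical submit-and-poll client; adjust paths and fields to the real API.
import time
import requests

API = "https://uderia.example.com/api/v1"              # assumed base URL
HEADERS = {"Authorization": "Bearer <long-lived-token>"}

# Submit a query asynchronously (task-based architecture).
submit = requests.post(
    f"{API}/sessions/<session-id>/query",
    headers=HEADERS,
    json={"query": "Show Q4 revenue by region"},
    timeout=30,
)
task_id = submit.json()["task_id"]

# Poll until the task reaches a terminal state.
while True:
    status = requests.get(f"{API}/tasks/{task_id}", headers=HEADERS, timeout=30).json()
    if status["state"] in ("completed", "failed", "cancelled"):
        break
    time.sleep(5)

print(status.get("result"))
```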
- Long-Lived Access Tokens: Secure automation without session management (see the hashing sketch after this list):
- Configurable expiration (90 days default, or never)
- SHA256 hashed storage with audit trail
- Usage tracking (last used timestamp, use count, IP address)
- Soft-delete preservation for compliance
- One-time display at creation for enhanced security
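Why "SHA256 hashed storage" and "one-time display" matter, as a small illustrative sketch (not the platform's actual code): only a hash is persisted, so a database dump never yields a usable token.

```python
import hashlib
import secrets

# At creation: the plaintext token is shown to the user exactly once.
token = secrets.token_urlsafe(32)
stored_hash = hashlib.sha256(token.encode()).hexdigest()  # only this is stored

# At request time: hash the presented token and compare against storage.
def verify(presented: str) -> bool:
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_hash)
```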
- Apache Airflow Integration: Production-ready DAG examples for batch query automation (see the sketch after this list):
- Session reuse via `tda_session_id` variable
- Profile override via `tda_profile_id` for specialized workloads
- Bearer token authentication for secure API access
- Async polling pattern for reliable long-running executions
- Complete example DAG (`tda_00_execute_questions.py`) included
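A condensed sketch of what such a DAG can look like, assuming the Airflow TaskFlow API. The variable names follow the bullets above, while the base URL, endpoint paths, and payload fields are assumptions; the shipped `tda_00_execute_questions.py` is the authoritative example.

```python
import time
from datetime import datetime

import requests
from airflow.decorators import dag, task
from airflow.models import Variable

API = "https://uderia.example.com/api/v1"   # assumed base URL

@dag(schedule="@daily", start_date=datetime(2026, 1, 1), catchup=False)
def tda_batch_questions():
    @task
    def execute_question(question: str) -> dict:
        headers = {"Authorization": f"Bearer {Variable.get('tda_bearer_token')}"}
        payload = {
            "session_id": Variable.get("tda_session_id"),  # session reuse
            "profile_id": Variable.get("tda_profile_id"),  # profile override
            "query": question,
        }
        task_id = requests.post(f"{API}/tasks", json=payload,
                                headers=headers, timeout=30).json()["task_id"]
        while True:  # async polling for long-running executions
            status = requests.get(f"{API}/tasks/{task_id}",
                                  headers=headers, timeout=30).json()
            if status["state"] in ("completed", "failed"):
                return status
            time.sleep(10)

    execute_question("Show me all products with low inventory")

tda_batch_questions()
```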
- n8n Workflow Automation: Visual node-based workflow builder for enterprise automation:
- Three production-ready workflow templates (Simple Query, Scheduled Reports, Slack Integration)
- Profile override support via REST API `profile_id` parameter
- Event-driven triggers (webhooks, cron schedules, manual execution)
- Linear ultra-clean workflow pattern for reliability
- Business process routing (email, Slack, CRM, databases)
- Docker deployment with reverse proxy support
- Comprehensive documentation with troubleshooting guides (see docs/n8n)
- Flowise Integration: Low-code workflow automation and chatbot development:
- Pre-built agent flow for TDA Conversation handling
- Asynchronous submit & poll pattern implementation
- Session management with multi-turn conversation support
- Bearer token authentication for secure API access
- Profile override capability for specialized workflows
- TTS payload extraction for voice-enabled chatbots
- Visual workflow designer for complex orchestration
- Import-ready JSON template included (see docs/Flowise)
- IFOC Workflow - From Ideation to Operationalization: Revolutionary methodology that adapts to your needs—four execution modes in one conversation with zero friction:
- 🟢 IDEATE (Conversation): Brainstorm, explore, and draft solutions without touching live systems—creative ideation without constraints
- 🔵 FOCUS (Knowledge): Verified intelligence with zero-hallucination guarantee—every answer grounded in your documents for document-verified answers
- 🟠 OPTIMIZE (Efficiency): The powerhouse—Fusion Optimizer with full MCP Tools + Prompts support, strategic planning, and self-correction for sovereign efficiency
- 🟣 COORDINATE (Multi-Profile): Multi-level autonomous orchestration where coordinators manage specialist teams for cross-team orchestration
- Switch between modes instantly with a simple `@TAG` (e.g., `@CHAT`, `@POLICY`, `@OPTIMIZER`, `@EXECUTIVE`)
- Temporary overrides via `@TAG` syntax for single queries without changing defaults
- Nested coordination support: Build 3-level AI hierarchies (Master → Coordinators → Specialists)
- Complete safeguards: Circular dependency detection, depth limits, cost visibility at every level
- Stop force-fitting every problem into one AI—match your intent to the right intelligence phase
- Session Primer & Automatic Context: Transform generic LLMs into pre-educated specialists from the first message:
- Auto-initialize new sessions with domain-specific knowledge
- Inject business context, schemas, and common patterns at session creation
- Profiles can define default primer content for consistent onboarding
- Eliminates repetitive context-setting for every new conversation
- Specialists understand your environment without manual training
- Autonomous AI Organization (Genie Mode): From intent to autonomy—your AI organization that senses, reasons, and delivers:
- Multi-profile coordination where specialized agents work as a unified team
- Master coordinator intelligently routes queries to domain experts
- Automatic discovery and orchestration of specialist capabilities
- Cross-domain synthesis: agents gather data independently, then coordinate findings
- Real-time topology visualization showing agent activation and collaboration
- Stop orchestrating manually—start delegating to an AI organization that never sleeps
- Executive-level queries like "Improve Product Margin for Q4" automatically cascade to CFO, CMO, and Legal specialists
- Intelligent MCP Server Import: Seamless integration of community MCP servers with dual format support (see the import sketch after this list):
- Import from official MCP Registry format (io.example/server-name specifications)
- Import from Claude Desktop configuration files (direct migration)
- Automatic format detection with validation
- Bulk import multiple servers at once
- Three transport types: 🟠 STDIO (local), 🔵 HTTP (network), 🟢 SSE (streaming)
- STDIO servers: automatic subprocess lifecycle management (npx, uvx, python)
- Server-side ID generation ensures uniqueness
- Duplicate detection prevents configuration conflicts
- One-click access to MCP community servers
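As a sketch, bulk import from a Claude Desktop configuration might be driven programmatically like this. The import endpoint is an assumption; the config shape matches the Google Search example later in this document, and `@modelcontextprotocol/server-filesystem` is a publicly available STDIO server.

```python
import requests

claude_desktop_config = {
    "mcpServers": {
        "filesystem": {   # STDIO server managed as a subprocess via npx
            "command": "npx",
            "args": ["-y", "@modelcontextprotocol/server-filesystem", "/data"],
        }
    }
}

resp = requests.post(
    "https://uderia.example.com/api/v1/mcp/import",   # hypothetical endpoint
    headers={"Authorization": "Bearer <token>"},
    json=claude_desktop_config,
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # expect server-side IDs and duplicate-detection results
```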
- Docker Deployment Support: Production-ready containerization:
- Multi-user support in single shared container
- Environment variable overrides
- Volume mounts for sessions, logs, and keys
- Load balancer ready for horizontal scaling
Build trust through complete visibility into every decision, action, and data point the agent processes.
- Live Status Panel: Real-time window into the agent's reasoning process:
- Strategic plan visualization with phase-by-phase breakdown
- Tactical decision display showing tool selection rationale
- Raw data inspection for every tool response
- Self-correction events with recovery strategy visibility
- Streaming updates via Server-Sent Events (SSE)
- Dual-model cost breakdown for Fusion Optimizer showing strategic vs tactical costs with color-coded visualization (12-Feb-2026)
- Dynamic Capability Discovery: Instant overview of agent potential:
- Automatic loading of all MCP Tools from connected servers
- Prompt library display with categorization
- Resource enumeration for data source visibility
- Real-time capability updates on configuration changes
- Visual organization in tabbed Capabilities Panel
- Rich Data Rendering: Intelligently formats and displays various data types:
- Query results in interactive tables with sorting/filtering
- SQL DDL in syntax-highlighted code blocks
- Key metrics in summary cards
- Integrated charting engine for data visualization
- Real-time rendering as data streams in
- Comprehensive Token Tracking: Per-turn visibility into LLM consumption:
- Input token counts for every request
- Output token counts for every response
- Token-to-cost mapping with provider-specific pricing
- Historical token trends across sessions
- Optimization insights for cost-conscious users
- Theme-aware KPI displays adapt seamlessly to dark and light themes (11-Feb-2026)
- Anti-Hallucination by Architecture: Ground every answer in verified sources—zero fabrication:
- Strict retrieval-then-synthesize pattern where the LLM answers only from retrieved documents
- Knowledge Graph maps your databases (tables, relationships, business concepts) before query generation
- RAG system retrieves and scores documents from knowledge bases with full citations
- Source traceability with citations back to specific document chunks
- Transparent failure when no relevant sources exist (no guessing)
- Dual knowledge layers work in unison for comprehensive grounding
- Execution Monitoring Dashboard: Cross-source workload tracking:
- Real-time task list (running, completed, failed)
- Detailed execution logs with reasoning steps
- Tool invocation history with arguments and responses
- Error messages and stack traces for debugging
- Task control (cancel, retry) for operational flexibility
- Enterprise Audit Logging - From Uncertainty to Accountability: Every action recorded, every decision traceable:
- Complete forensic trail with user, IP, timestamp, and outcome for every interaction
- User authentication and authorization events (login attempts, OAuth flows, token generation)
- Configuration changes with before/after snapshots (LLM provider switches, profile updates)
- Prompt executions with full turn-level attribution and cost tracking
- API usage patterns and access history for security monitoring
- Admin actions on user accounts and system settings
- Progressive security lockouts and suspicious activity detection
- 20+ specialized logging functions for comprehensive coverage
- Configurable retention policies for GDPR and data sovereignty compliance
- REST API access for integration with compliance tools (SOC2, audit reports)
- From audit trail to compliance report in one click
- Intelligent Context Window Management: Budget-aware orchestration of every token sent to the LLM (see the allocation sketch after this list):
- Modular architecture with 9 pluggable context modules (system prompt, tools, history, RAG, knowledge, documents, and more)
- Five-pass assembly pipeline: resolve → dynamic adjustments → allocate & assemble → surplus reallocation → condense
- Per-module budget allocation with min/max constraints and priority-based condensation
- Dynamic adjustment rules that adapt context composition at runtime (first turn, long conversations, high-confidence RAG)
- Real-time observability via context window snapshot events with per-module utilization metrics
- 4 predefined context window types (Balanced, Knowledge-Heavy, Conversation-First, Token-Efficient) plus custom types
- Admin UI with live budget visualization, condensation order editor, and dynamic rule builder
- Per-session utilization analytics dashboard with trend charts and module breakdown
- tiktoken-based BPE token estimation for accurate budget planning
- Full architecture documentation →
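To make the allocation idea concrete, here is a toy sketch under assumed semantics; the module names, weights, and redistribution policy are illustrative, not the platform's actual algorithm. Each module receives a weighted share clamped to its min/max budget, and leftover tokens flow to higher-priority modules that still have room:

```python
def allocate(budget: int, modules: list[dict]) -> dict[str, int]:
    """Weighted allocation with min/max clamps and surplus redistribution."""
    total_weight = sum(m["weight"] for m in modules)
    alloc = {
        m["name"]: max(m["min"], min(m["max"], budget * m["weight"] // total_weight))
        for m in modules
    }
    surplus = budget - sum(alloc.values())
    # Hand surplus to modules in priority order (0 = highest) until exhausted.
    for m in sorted(modules, key=lambda m: m["priority"]):
        if surplus <= 0:
            break
        extra = min(m["max"] - alloc[m["name"]], surplus)
        alloc[m["name"]] += extra
        surplus -= extra
    return alloc

print(allocate(8000, [
    {"name": "system_prompt", "weight": 1, "min": 500,  "max": 1500, "priority": 0},
    {"name": "history",       "weight": 3, "min": 1000, "max": 6000, "priority": 1},
    {"name": "rag",           "weight": 2, "min": 0,    "max": 4000, "priority": 2},
]))
```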
- Interactive Visual Components: Modular, plugin-based UI component library:
- Canvas Component: Interactive code editor powered by CodeMirror 6 with syntax highlighting (SQL, Python, JavaScript), live database connectors, in-place query execution, split-panel and fullscreen modes, and result rendering directly in chat
- Chart Component: Data visualization via G2Plot with 16 chart types (bar, line, pie, scatter, heatmap, gauge, radar, treemap, and more), 5-stage mapping resolution pipeline with cardinality-aware column selection, deterministic fast-path execution, and LLM-assisted fallback
- Knowledge Graph: Entity-relationship visualization for context enrichment and document structure exploration
- Self-contained component architecture with manifest-driven discovery and hot-reload
- Profile-level intensity control and admin governance
- 3 render targets: inline (chat bubble), sub_window (persistent canvas panel), status_panel (Live Status area)
- Third-party extensibility: add custom components without modifying core files
- System Customization: Take control of agent behavior:
- System Prompt Editor for per-model instruction customization
- Save and reset capabilities for experimentation
- Direct Model Chat for baseline testing without tools
- Dynamic Capability Management (enable/disable tools/prompts)
- Phased rollouts without server restart
The Fusion Optimizer delivers enterprise-grade performance, cost efficiency, and reliability.
Real-World Cost Savings:
- Typical enterprise query: "Show me all products with low inventory and notify suppliers"
- Traditional LLM wrapper: 15,000 tokens (full schema + full history) = $0.45/query
- Fusion Optimizer: 6,000 tokens (plan hydration + tactical fast path) = $0.18/query
- 60% cost reduction on repeated similar queries
- Monthly workload (500 queries/day):
- Traditional: 500 × $0.45 × 30 = $6,750/month
- Fusion Optimizer: 500 × $0.18 × 30 = $2,700/month
- Savings: $4,050/month ($48,600/year)
- Self-correction efficiency: When errors occur, targeted replanning (2K tokens) vs full restart (15K tokens)
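The arithmetic behind these figures, for the skeptical reader:

```python
queries_per_day, days = 500, 30
traditional = queries_per_day * 0.45 * days   # $6,750/month
optimized   = queries_per_day * 0.18 * days   # $2,700/month
monthly_savings = traditional - optimized     # $4,050/month
print(monthly_savings, monthly_savings * 12)  # 4050.0  48600.0 per year
```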
See the dedicated section below (The Heart of the Application - The Engine & its Fusion Optimizer) for comprehensive architectural details on:
- Multi-layered strategic and tactical planning
- Proactive optimization (Plan Hydration, Tactical Fast Path, Specialized Orchestrators)
- Autonomous self-correction and healing
- Context-aware learning from execution history
- Deterministic plan validation and hallucination prevention
Key efficiency highlights:
- Self-Improving Learning System: Closed-loop learning from past successes:
- Automatic capture and archiving of all successful interactions
- Token-based efficiency analysis to identify "champion" strategies
- Few-shot learning through injection of best-in-class examples
- Asynchronous processing to eliminate user-facing latency
- Per-user cost savings attribution and tracking
- Planner Repository Constructors: Modular plugin system for domain-specific optimization:
- Self-contained templates with validation schemas
- SQL query templates with extensibility for document Q&A, API workflows
- LLM-assisted auto-generation from database schemas
- Dynamic runtime registration from the `rag_templates/` directory
- Programmatic population via REST API for CI/CD integration
- Knowledge Repositories: Domain context injection for better planning:
- PDF, TXT, DOCX, MD document support
- Configurable chunking strategies (fixed-size, paragraph, sentence, semantic)
- Automatic retrieval during strategic planning for context-aware decisions
- Semantic search for relevant background information
- Intelligence Marketplace integration for community knowledge sharing
Maintain complete control over your data exposure strategy with flexible deployment and provider options.
- Multi-Provider LLM Support: Freedom to choose your AI infrastructure:
- Cloud Hyperscalers: Google (Gemini), Anthropic (Claude), OpenAI (GPT-4o), Azure OpenAI
- AWS Bedrock: Foundation models and inference profiles for custom/provisioned models
- Friendli.AI: High-performance serverless and dedicated endpoint support
- Ollama: Fully local, offline LLM execution on your own infrastructure
- Dynamic provider switching without configuration restart
- Live model refresh to fetch latest available models
- Comparative LLM Testing: Validate model behavior across providers:
- Identical MCP tools and prompts across different LLMs
- Side-by-side performance comparison
- Model capability robustness validation
- Direct model chat for baseline reasoning assessment
- Profile-based A/B testing with `@TAG` overrides
- Encrypted Credential Storage: Enterprise-grade security:
- Fernet symmetric encryption for all API keys
- Per-user credential isolation in SQLite database
- Credentials never logged or exposed in UI/API responses
- Secure passthrough to LLM/MCP providers
- Admin oversight without credential access
- System Prompt Encryption: Defense against prompt extraction and hijacking:
- All system prompts encrypted at rest in database (never stored as plain text)
- Two-layer encryption: distribution protection + license-tier keys
- Runtime-only decryption minimizes attack surface
- Database dumps and prompt injection attacks cannot extract system instructions
- Segregation of duty: all tiers decrypt for runtime execution, but only licensed Prompt Engineer/Enterprise tiers can view or edit system prompts in the UI — preventing unauthorized prompt tampering
- Multi-User Isolation: Complete session and data segregation:
- JWT-based authentication with 24-hour expiry
- User-specific sessions in separate directories
- Database-level user UUID isolation
- Role-based access control (User, Developer, Admin)
- Simultaneous multi-user support with no cross-contamination
- Flexible Deployment Options: Adapt to your infrastructure:
- Single-user development (local Python process)
- Multi-user production (load-balanced containers or shared instance)
- HTTPS support via reverse proxy configuration
- Docker volume mounts for persistent data
- Voice Conversation Privacy: Optional Google Cloud TTS with user-provided credentials:
- User-controlled API key management
- No server-side credential storage for voice features
- Browser-based Speech Recognition (local processing)
- Hands-free operation with configurable voice modes
- Key observations handling (autoplay-off, autoplay-on, off)
- Document Upload & Multimodal Analysis: Attach documents and images directly in chat conversations (see the upload sketch after this list):
- Native multimodal delivery for capable providers (Google Gemini, Anthropic Claude, OpenAI GPT-4o, Azure, AWS Bedrock Claude)
- Automatic text extraction fallback for all other providers (Friendli, Ollama, Bedrock Nova)
- Supports PDF, DOCX, TXT, MD, and image formats (JPG, PNG, GIF, WebP)
- Drag-and-drop or click-to-attach with image thumbnail previews and Visual/Text processing badges
- Provider-aware routing: images sent natively to vision models, documents via base64 or text extraction as appropriate
- Up to 5 files per message, 50 MB per file
- Full REST API support for programmatic upload workflows
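A hedged sketch of a programmatic upload; the base URL, endpoint path, and multipart field name are assumptions, not the documented API:

```python
import requests

API = "https://uderia.example.com/api/v1"                 # assumed base URL
headers = {"Authorization": "Bearer <long-lived-token>"}

with open("q4_report.pdf", "rb") as fh:
    resp = requests.post(
        f"{API}/sessions/<session-id>/files",             # hypothetical endpoint
        headers=headers,
        files={"file": ("q4_report.pdf", fh, "application/pdf")},  # <= 50 MB
        timeout=60,
    )
resp.raise_for_status()
```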
- Decoupled Planning with Champion Cases - The Sovereignty Breakthrough: Uderia separates strategic intelligence from execution, enabling local models to perform like hyperscalers.
How It Works:
- Cloud Planning Phase: Hyperscaler LLM creates strategic plan using full reasoning capability
- Champion Case Injection: System retrieves proven execution patterns from organizational history
- Local Execution Phase: Private on-prem model (Ollama) executes plan with champion guidance
Result: Your data never leaves your infrastructure, yet you get cloud-level strategic thinking.
Example Workflow:
- Query: "Analyze Q4 customer churn by segment"
- Cloud Planner: Creates 3-phase strategy (retrieve data, segment analysis, visualize)
- Champion Cases: Injects 5 proven churn analysis patterns from past successes
- Local Executor: Runs analysis on your private database using proven patterns
- Zero cloud exposure, maximum intelligence
Business Impact:
- Regulatory compliance: PHI, PII, financial data stays local
- Cost optimization: Expensive planning calls (8K tokens) happen once; cheap execution (2K tokens) reuses patterns
- Best of both worlds: Hyperscaler reasoning + on-prem sovereignty
- Enterprise OAuth Authentication: Federated identity with five providers:
- Supported Providers: Google (OIDC), GitHub (OAuth2), Microsoft/Azure AD (OIDC), Discord (OAuth2), Okta (OIDC)
- CSRF protection via cryptographic state parameter validation
- Email verification with configurable enforcement
- Account merging and deduplication (link multiple providers to one account)
- Rate limiting with abuse detection and progressive lockout
- Throwaway email blocking for registration integrity
- Brute force detection on login attempts
- Comprehensive audit logging for all authentication events
- Provider popularity analytics and usage tracking
- Account linking/unlinking for existing users
- Full REST API: initiate, callback, link, disconnect, verification endpoints
- Three-Tier Role-Based Access Control: Hierarchical permission system with granular feature governance (see the sketch after this list):
- User Tier (19 features): Execute prompts, use MCP tools, manage own sessions and credentials, basic configuration
- Developer Tier (+25 features, 44 total): RAG collection management, template creation/testing, MCP diagnostics, import/export, advanced configuration
- Admin Tier (+24 features, 68 total): User management, credential oversight, system configuration, security settings, database administration, compliance reporting
- 68 distinct feature tags mapped to tiers with `@require_feature` decorators
- Hierarchical permission inheritance (Admin inherits all Developer features, Developer inherits all User features)
- 5 predefined feature groups for bulk permission checks (session_management, rag_management, template_management, user_management, system_admin)
- Tier-based UI adaptation: features appear/disappear based on user's tier
- REST API endpoint: `GET /api/v1/auth/me/features` returns the user's available features
- Admin self-protection: administrators cannot modify their own tier
- Backward compatible with legacy `is_admin` field
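The features endpoint above lends itself to a quick capability check. The path comes from the list; the host and response shape are assumptions:

```python
import requests

resp = requests.get(
    "https://uderia.example.com/api/v1/auth/me/features",  # documented path, assumed host
    headers={"Authorization": "Bearer <token>"},
    timeout=10,
)
resp.raise_for_status()
features = resp.json().get("features", [])   # response shape assumed
print(f"{len(features)} features available:", *sorted(features), sep="\n  ")
```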
Transparent, real-time cost tracking with fine-grained control over spending at every level of abstraction.
- Real-Time Cost Tracking: Per-interaction visibility:
- Automatic cost calculation using up-to-date provider pricing
- Per-turn breakdown (input tokens, output tokens, total cost)
- Session-level cumulative cost tracking
- User-level cost aggregation across all sessions
- Historical cost trends and analytics
- Provider-Specific Pricing Models: Accurate cost attribution:
- Google Gemini (1.5 Pro, 1.5 Flash, etc.) with context length tiers
- Anthropic Claude (Opus, Sonnet, Haiku) with standard/batch pricing
- OpenAI GPT-4o and GPT-4o-mini with tiered pricing
- Azure OpenAI (GPT-4, GPT-3.5-Turbo) with regional pricing
- AWS Bedrock (foundation models, inference profiles)
- Friendli.AI serverless and dedicated endpoints
- Ollama (local models, zero external cost)
- Database-Backed Cost Persistence: Complete financial audit trail:
- `llm_model_costs` table with versioned pricing
- `efficiency_metrics` table tracking token usage and learning system savings
- `user_sessions` table with per-session cost summaries
- `long_lived_access_tokens` with usage tracking
- Exportable cost reports for budgeting and forecasting
- Profile-Based Spending Controls: Optimize costs by workload:
- Tag profiles by cost characteristics (e.g., "COST" for Gemini Flash)
- Quick switching between expensive (Claude Opus) and economical (Gemini Flash) models
- Profile override via `@TAG` syntax for cost-conscious queries
- REST API profile selection for automated cost optimization
- Efficiency Attribution: Quantify learning system savings:
- Before/after token comparison for champion case-guided planning
- Estimated cost savings from few-shot learning
- Per-user attribution of efficiency gains
- Efficiency leaderboard for gamification
- Continuous improvement ROI visibility
- Cost Optimization Recommendations: Actionable insights:
- Model selection guidance based on task complexity
- Context pruning opportunities for token reduction
- Champion case population priorities for maximum savings
- Profile configuration suggestions for workload patterns
- Consumption Profile Enforcement: Granular usage controls and quotas (see the enforcement sketch after this list):
- Four predefined tiers: Free, Pro, Enterprise, Unlimited
- Per-user prompt rate limits (hourly and daily)
- Monthly token quotas (input and output tokens separately)
- Configuration change rate limits per hour
- Profile activation/deactivation for testing
- Global override mode for emergency rate limiting
- Admin bypass for unrestricted system access
- Real-time enforcement with clear error messages
- Database-backed consumption tracking and audit trail
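An illustrative enforcement check for this quota model. The tier names come from the list above, but the limit values and function shape are assumptions:

```python
LIMITS = {
    "Free":       {"prompts_per_hour": 10,   "monthly_input_tokens": 100_000},
    "Pro":        {"prompts_per_hour": 100,  "monthly_input_tokens": 5_000_000},
    "Enterprise": {"prompts_per_hour": 1000, "monthly_input_tokens": 50_000_000},
    "Unlimited":  {"prompts_per_hour": None, "monthly_input_tokens": None},
}

def check_quota(tier: str, prompts_this_hour: int, tokens_this_month: int) -> None:
    """Raise with a clear message when a rate limit or token quota is exceeded."""
    limits = LIMITS[tier]
    hourly, monthly = limits["prompts_per_hour"], limits["monthly_input_tokens"]
    if hourly is not None and prompts_this_hour >= hourly:
        raise PermissionError(f"{tier}: hourly prompt limit of {hourly} reached")
    if monthly is not None and tokens_this_month >= monthly:
        raise PermissionError(f"{tier}: monthly input-token quota of {monthly} exhausted")
```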
The Intelligence Marketplace transforms isolated expertise into collective intelligence. Share and discover execution patterns, domain knowledge, agent teams, skills, extensions, and knowledge graphs with one click. Leverage community-validated assets to reduce costs, accelerate onboarding, and benefit from battle-tested strategies.
See the full Intelligence Marketplace section in Core Components for:
- Product catalog (6 asset types)
- Smart discovery and search
- Reference-based subscriptions and forking
- Community ratings and quality assurance
- Publishing workflows and API integration
- Agent Packs for portable AI teams
Uderia ships with a ready-to-use MCP server for public internet search, located at mcp_servers/google_search.py. It uses Google's Gemini Grounded Search API to find current public information and return factual summaries with source citations.
How to activate:
1. Import MCP Server — Navigate to Setup → MCP Servers → Import and paste the following Claude Desktop configuration:

```json
{
  "mcpServers": {
    "Google Search": {
      "command": "python",
      "args": ["/app/mcp_servers/google_search.py"],
      "env": {"GEMINI_API_KEY": "your-gemini-api-key"}
    }
  }
}
```

Replace `/app/mcp_servers/...` with the actual path if running outside Docker.

2. Link to a Profile — Create or edit a profile (e.g., `tool_enabled` type) and select "Google Search" as its MCP Server.

3. Use — The `external_search` tool is now available. Queries routed to this profile will search the public internet via Gemini and return results with citations.

Each user provides their own Gemini API key through the `env` field, enabling per-user authentication without shared credentials.
The platform's capabilities are built on five core components that work together across all profile types — from execution methodology and optimization engine to knowledge retrieval, pre-processing skills, and post-processing extensions.
The Uderia Platform introduces the IFOC Workflow—four distinct execution modes that mirror how experts actually solve problems. From creative exploration to coordinated execution, these modes transform how organizations leverage AI.
┌─────────────────────────────────────────────────────────────────────────────────┐
│ THE IFOC WORKFLOW │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ 🟢 IDEATE 🔵 FOCUS 🟠 OPTIMIZE 🟣 COORDINATE │
│ ───────── ─────── ────────── ──────────── │
│ Brainstorm Research Execute Orchestrate │
│ Explore Verify Deliver Scale │
│ Draft Ground Operate Synthesize │
│ │
│ "What if...?" "What does "Do it." "Handle │
│ policy say?" everything." │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
Every profile belongs to one of four classes, each designed for a specific phase of intelligent work. Together, they create a composable AI architecture that adapts to any challenge.
Philosophy: Creative exploration without constraints
┌─────────────────────────────────────────┐
│ User Question │
│ "How do I optimize this query?" │
└──────────────┬──────────────────────────┘
│
▼
┌──────────┐
│ LLM │ ← No tools, no data access
│ Analysis │ ← Pure reasoning & guidance
└─────┬────┘
│
▼
┌──────────────────────────────────────────┐
│ Expert Advice + Code Examples │
│ Ready for review before execution │
└──────────────────────────────────────────┘
The Value: Transform your LLM into a trusted thought partner. Explore possibilities, brainstorm solutions, and draft approaches—all without touching live systems. The Ideate phase is where creativity flows freely.
When to Use IDEATE:
- Exploring new ideas: "What approaches could solve this problem?"
- Learning concepts: "Explain CTEs in SQL with examples"
- Drafting solutions: "Write a query to calculate customer lifetime value"
- Planning ahead: "What should I consider before migrating this database?"
Breakthrough Potential:
- Zero-Cost Exploration: Learn complex concepts without expensive tool invocations
- Rapid Prototyping: Draft SQL, APIs, and workflows before committing resources
- Risk-Free Testing: Validate approaches before touching production systems
- Training Ground: Onboard new team members without data exposure
Example Profiles:
- `@CHAT` - Your AI thought partner for any question
- `@ARCHITECT` - System design and architecture guidance
- `@MENTOR` - Code review and technical mentoring
Philosophy: Grounded answers from verified sources
┌─────────────────────────────────────────┐
│ User Question │
│ "What's our remote work policy?" │
└───────────────┬─────────────────────────┘
│
▼
┌───────────────────────┐
│ Semantic Search │
│ Your Document Store │ ← Policies, SOPs, Manuals
└───────┬───────────────┘
│
▼
┌──────────┐
│ LLM │
│ Synthesis│ ← ONLY uses retrieved docs
└─────┬────┘ NO general knowledge allowed
│
▼
┌───────────────────────────────────────────┐
│ Answer + Source Citations │
│ "Per HR Policy 3.2, page 7..." │
│ [View Source Document] │
└───────────────────────────────────────────┘
The Value: Eliminate hallucinations entirely. The Focus phase grounds every answer in your verified documents, policies, and institutional knowledge. When accuracy matters more than creativity, Focus delivers verified intelligence.
When to Use FOCUS:
- Compliance questions: "What does policy say about data retention?"
- Reference lookups: "What's the approved vendor list?"
- Verification: "Is this approach compliant with our security standards?"
- Institutional knowledge: "How did we handle this situation before?"
Breakthrough Potential:
- Zero Hallucination Guarantee: Answers only from your verified documents
- Institutional Memory: Never lose domain expertise when people leave
- Compliance Confidence: All responses traceable to source documents
- Instant Expertise: New hires access decades of knowledge immediately
Example Profiles:
- `@POLICY` - Corporate policies and procedures
- `@LEGAL` - Contracts, compliance, and regulations
- `@TECHNICAL` - Engineering documentation and runbooks
Philosophy: Strategic execution that learns and heals
┌──────────────────────────────────────────┐
│ User Request │
│ "Show Q4 revenue by region" │
└───────────────┬──────────────────────────┘
│
▼
┌─────────────────────┐
│ FUSION OPTIMIZER │
│ Strategic Planning │ ← Multi-phase meta-plan
└────────┬────────────┘
│
▼
┌─────────────────────┐
│ Tactical Execution │ ← Per-phase tool selection
│ + Self-Correction │ ← Autonomous error recovery
└────────┬────────────┘
│
▼
┌───────────────────────┐
│ Execute Operations │ ← Database queries, APIs, tools
│ via MCP Server │ ← MCP Tools + Prompts access
└───────┬───────────────┘
│
▼
┌───────────────────────────────────────────┐
│ Results + Visualizations │
│ Strategic plans + Self-healing execution │
│ Full transparency + audit trail │
└───────────────────────────────────────────┘
The Value: This is where ideas become reality. The Optimize phase is powered by the revolutionary Fusion Optimizer—a multi-layered AI architecture that doesn't just execute tasks, it thinks strategically, learns from experience, and heals itself.
When to Use OPTIMIZE:
- Live data operations: "Show me Q4 results by region"
- Complex workflows: "Calculate inventory turnover and flag anomalies"
- Automated tasks: "Export customer segments to CSV"
- Real-time monitoring: "Alert if error rate exceeds threshold"
Breakthrough Potential:
- Strategic Intelligence: Creates multi-phase plans, not just single-shot responses
- Autonomous Self-Correction: Detects and fixes errors without human intervention
- Full MCP Integration: The only profile class that supports both MCP Tools AND MCP Prompts—execute pre-built workflows and complex multi-step operations directly
- Cost Optimization: 40% token reduction through plan hydration and tactical fast-path
- Proactive Optimization: Learns from context to skip redundant operations
- Democratize Expertise: Non-technical users execute complex operations through conversation
- Complete Transparency: See every decision, every tool call, every self-correction in real-time
Real-World Transformation:
- Before: Write SQL → Debug errors → Retry → Export → Format → Email (30 minutes)
- After: "Analyze Q4 sales trends and email the exec team" (2 minutes, auto-corrects, learns)
Example Profiles:
- `@OPTIMIZER` - Full Fusion Optimizer with all features enabled
- `@PROD` - Production database operations with enterprise LLM
- `@ANALYTICS` - Business intelligence and self-service reporting
- `@DEVOPS` - Infrastructure monitoring and intelligent automation
Philosophy: Autonomous orchestration at scale
The Value: This is the breakthrough. The Coordinate phase creates autonomous AI organizations where specialized agents collaborate intelligently. One question triggers a cascade of expert consultations, data retrievals, and synthesis—all happening automatically.
When to Use COORDINATE:
- Multi-domain questions: "Analyze Q4, check compliance, and recommend strategy"
- Complex investigations: "Research this issue across all our systems"
- Executive summaries: "Prepare a board presentation on performance"
- Cross-functional work: "Coordinate finance, legal, and engineering review"
Breakthrough Potential:
- Multi-Level Intelligence: Coordinators can orchestrate other Coordinators, creating hierarchical AI organizations
- Compound Expertise: Combine database operations + knowledge retrieval + analysis in a single workflow
- Adaptive Problem Solving: The system decides which experts to consult based on the question
- Conversational State: Each expert maintains context across the entire conversation
- Scalable Architecture: Build AI "departments" with master coordinators managing specialized teams
The Game-Changer: Nested Coordination Unlike simple AI assistants, Coordinate profiles can orchestrate other Coordinate profiles, enabling unprecedented organizational depth:
┌─────────────────────────────────────┐
│ User: "@CEO, analyze Q4 and │
│ recommend strategy" │
└────────────────┬────────────────────┘
│
▼
╔═══════════════════════╗
║ @CEO (Level 0) ║ ← Master Coordinator
║ Strategic Genie ║
╚═══════════╤═══════════╝
│
┌─────────────────────┼─────────────────────┐
│ │ │
▼ ▼ ▼
┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ @CFO (Level 1) │ │ @CTO (Level 1) │ │ @LEGAL (Level 1) │
│ Financial Genie │ │ Technical Genie │ │ Policy Knowledge │
└────────┬─────────┘ └────────┬─────────┘ └──────────────────┘
│ │
┌────────┴────────┐ ┌───────┴────────┐
│ │ │ │
▼ ▼ ▼ ▼
┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐
│ @ACCT │ │ @AUDIT │ │ @DB_ADM │ │ @SECURE │
│ DB Ops │ │ Checks │ │ Schema │ │ Analysis│
│(Level 2)│ │(Level 2)│ │(Level 2)│ │(Level 2)│
└─────────┘ └─────────┘ └─────────┘ └─────────┘
How It Works:
1. `@CEO` receives the question and delegates to financial, technical, and legal experts
2. `@CFO` (itself a Genie) autonomously coordinates `@ACCT` and `@AUDIT`
3. `@CTO` (itself a Genie) autonomously coordinates `@DB_ADM` and `@SECURE`
4. Each specialist executes its task (queries, document retrieval, analysis)
5. Results cascade back up: specialists → coordinators → master
6. `@CEO` synthesizes a comprehensive strategic recommendation
All of this happens automatically from a single user question.
Real-World Transformation:
Before Genie Profiles:
- User manually runs 5 separate queries
- Copies results between tools
- Synthesizes insights manually
- 30 minutes of repetitive work
With Genie Profiles:
- User: "@CEO, analyze Q4 performance, check compliance, and recommend next quarter strategy"
- Genie autonomously: Queries financial database → Retrieves policy docs → Analyzes trends → Cross-checks regulations → Synthesizes strategic recommendations
- Result delivered in 2 minutes, fully documented with audit trail
Example Profiles:
- `@EXECUTIVE` - C-level strategic intelligence coordinator
- `@ANALYST` - Coordinates data retrieval, policy checks, and reporting
- `@AUDITOR` - Multi-source compliance verification
- `@RESEARCHER` - Deep-dive investigations across systems and knowledge bases
Safeguards & Control:
- Circular Dependency Prevention: Automatic detection prevents infinite loops
- Depth Limits: Configurable maximum nesting (default: 3 levels)
- Cost Visibility: Real-time token tracking across all coordination levels
- Transparent Execution: See exactly which experts are consulted and why
- Context Preservation: Each expert maintains conversation history across turns
Configuration Example:
{
"tag": "CEO",
"profile_type": "genie",
"genieConfig": {
"slaveProfiles": ["CFO_GENIE", "CTO_GENIE", "LEGAL_POLICY"],
"maxConcurrentSlaves": 3
}
}

Where `CFO_GENIE` and `CTO_GENIE` are themselves Genie profiles that coordinate their own specialist teams—creating true organizational intelligence.
┌──────────────────────────────────────────────────────────────────────────────────────────┐
│ IFOC SELECTION MATRIX │
├──────────────┬────────────────┬───────────────┬──────────────┬─────────────────────────┤
│ │🟢 IDEATE │🔵 FOCUS │🟠 OPTIMIZE │🟣 COORDINATE │
│ │ (Conversation) │ (Knowledge) │ (Efficiency) │ (Multi-Profile) │
│ │ │ │ │ │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ PHILOSOPHY │ Explore │ Verify │ Execute │ Orchestrate │
│ │ Brainstorm │ Ground │ Deliver │ Scale │
│ │ Draft │ Reference │ Operate │ Synthesize │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ DATA ACCESS │ Optional │ Documents │ Full (MCP │ All Sources (Adaptive) │
│ │ (MCP Tools/RAG)│ Only │ Tools+Prompts│ │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ SAFETY │ Exploratory │ Zero │ Governed │ Composite (Inherits) │
│ │ (Interactive) │ Hallucinate │ Audit Trail │ │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ COST │ Low per turn │ Low-Moderate │ Lowest for │ Variable (Scales with │
│ (complex) │ (many turns) │ (~5K tokens) │ complex tasks│ complexity & depth) │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ SPEED │ Fastest │ Fast │ Fast + Smart │ Comprehensive │
│ │ │ │ (Self-heals) │ (Auto-parallel) │
├──────────────┼────────────────┼───────────────┼──────────────┼─────────────────────────┤
│ USE WHEN │ "How do I │ "What does │ "Show me Q4 │ "Analyze Q4, check │
│ │ optimize?" │ policy say?" │ results" │ compliance, recommend" │
└──────────────┴────────────────┴───────────────┴──────────────┴─────────────────────────┘
Pattern 1: Ideate → Focus → Optimize
1. 🟢 IDEATE "Draft a query to find inactive customers"
→ Get SQL without execution (safe, cheap)
2. 🔵 FOCUS "What's our customer data retention policy?"
→ Verify compliance from documents
3. 🟠 OPTIMIZE "Execute the query I just drafted"
→ Run against live database (controlled)
Pattern 2: Coordinate for Strategic Work
🟣 COORDINATE "Prepare board presentation on Q4 performance"
Automatically triggers:
→ 🟠 @CFO (OPTIMIZE: Financial analysis + database queries)
→ 🔵 @LEGAL (FOCUS: Compliance checks from policies)
→ 🟠 @ANALYST (OPTIMIZE: Trend analysis + visualizations)
→ Synthesis (Coordinated strategic narrative)
Result: Complete board deck in minutes, not days
Pattern 3: Learn → Apply → Deploy
1. 🟢 @MENTOR "Explain CTEs in SQL" ← IDEATE: Learning
2. 🟢 @CHAT "Draft a CTE for X" ← IDEATE: Practice
3. 🟠 @DEV "Test this CTE" ← OPTIMIZE: Safe execution
4. 🟠 @PROD "Deploy to production" ← OPTIMIZE: Controlled rollout
Traditional AI Assistants:
- One-size-fits-all approach
- High token costs on every query
- No separation of concerns
- Limited to single LLM's capabilities
Uderia's IFOC Architecture:
- Right phase for the task: Match your intent to the appropriate mode
- Composable intelligence: Combine phases for compound expertise
- Governed execution: Clear boundaries for safety and compliance
- Organizational scale: Coordinate specialists like a real team
The Bottom Line: Stop treating AI as a single assistant. The IFOC workflow mirrors how experts actually work: Ideate possibilities, Focus on verified knowledge, Optimize execution, and Coordinate complex multi-domain work. Build an AI organization where specialized experts collaborate intelligently.
The strategic planner understands profile class context and adapts behavior:
Recent Enhancement (Jan 2026): The planner now correctly disambiguates SQL queries when switching between profile classes. It prioritizes:
- SQL mentioned in most recent llm_only conversation
- SQL from most recent tool execution
- Historical queries with explicit turn metadata
This prevents the planner from executing the wrong query when users switch from @CHAT to @GOGET.
Every turn in a session records:
{
"turn": 3,
"profile_id": "profile-uuid",
"profile_tag": "CHAT",
"profile_type": "llm_only",
"turn_metadata": {
"turn_number": 3,
"profile_tag": "CHAT",
"profile_type": "llm_only",
"is_most_recent": true,
"sql_mentioned_in_conversation": [
"SELECT UserName FROM DBC.SessionsV WHERE SessionID <> 0"
]
}
}

Key Fields:
- `profile_type` - "llm_only", "tool_enabled", "rag_focused", or "genie"
- `profile_tag` - Short identifier for quick switching
- `sql_mentioned_in_conversation` - Extracted SQL from llm_only responses
- `execution_trace` - Structured tool calls (only in tool_enabled)
- `knowledge_retrieval_event` - Document retrieval details (only in rag_focused)
- `genie_metadata` - Coordination details and child sessions (only in genie)
Profiles can be classified as:
Light Classification:
- Simple filter-based tool/prompt selection
- Fast, deterministic, no LLM call required
- Suitable for well-defined tool sets
Full Classification (LLM-Assisted):
- Dynamic categorization using LLM intelligence
- Adapts to ambiguous or complex tool selection
- Higher cost but more flexible
Note: Classification only applies to MCP-enabled profiles with multiple tools/prompts available.
The Value: Session Primer allows each profile to automatically execute an initialization question when a new session starts, pre-populating the context window with domain-specific knowledge. This transforms generic AI agents into instantly educated specialists.
Why This Matters: Instead of manually explaining your database schema, business rules, or domain terminology at the start of every conversation, the Session Primer does it automatically. The agent starts every session already understanding your context.
Configuration: In profile settings, enable "Session Primer" and provide an initialization question:
"Describe the database schema and explain the business meaning of each table""What KPIs are tracked in this system and how are they calculated?""Educate yourself on the API endpoints and their authentication requirements"
The Game-Changer: Specialized Expert Teams
Session Primer becomes transformational with Genie profiles. Build teams of pre-educated specialists:
┌────────────────────────────────────────────────────────────────────────────────┐
│ BUILDING AN AI EXPERT ORGANIZATION │
├────────────────────────────────────────────────────────────────────────────────┤
│ │
│ @ANALYST (Genie Coordinator) │
│ └─ Primer: "You coordinate business analysis. Understand the team below." │
│ │
│ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │
│ │ @KPI_EXPERT │ │ @SCHEMA_EXPERT │ │ @SQL_EXECUTOR │ │
│ │ │ │ │ │ │ │
│ │ Primer: │ │ Primer: │ │ Primer: │ │
│ │ "Learn all KPI │ │ "Learn the DB │ │ "Learn the │ │
│ │ definitions, │ │ schema and the │ │ available SQL │ │
│ │ formulas, and │ │ business context│ │ tools and │ │
│ │ business │ │ of each table │ │ execution │ │
│ │ thresholds" │ │ and column" │ │ patterns" │ │
│ │ │ │ │ │ │ │
│ │ → Knows: Revenue │ │ → Knows: Orders │ │ → Knows: How to │ │
│ │ targets, churn │ │ = transactions,│ │ write safe, │ │
│ │ definitions, │ │ Customers = │ │ optimized │ │
│ │ seasonality │ │ B2B accounts │ │ queries │ │
│ └───────────────────┘ └───────────────────┘ └───────────────────┘ │
│ │
└────────────────────────────────────────────────────────────────────────────────┘
Single Question, Compound Intelligence:
User: "@ANALYST, why did Q4 revenue drop?"
Execution Flow:
1. @KPI_EXPERT (already knows KPI definitions from primer)
→ "Revenue = sum(order_total) where status='completed'. Q4 target was $2M."
2. @SCHEMA_EXPERT (already knows table relationships from primer)
→ "Revenue data lives in orders table. Check order_status and created_at."
3. @SQL_EXECUTOR (already knows query patterns from primer)
→ Executes: SELECT month, SUM(order_total) FROM orders WHERE...
→ Returns: October $580K, November $420K, December $310K
4. @ANALYST synthesizes: "Q4 revenue was $1.31M vs $2M target (-34.5%).
December showed steepest decline. Recommend investigating..."
Without Session Primer: Each expert starts blank. User must explain schemas, KPIs, and context repeatedly.
With Session Primer: Each expert is pre-educated. They collaborate immediately with full domain understanding.
Best Practices:
- Efficiency Focused profiles: Prime with schema descriptions, API documentation
- Knowledge Focused profiles: Prime with "summarize the key topics in the knowledge base"
- Genie profiles: Prime with team structure and delegation guidelines
- Conversation profiles: Prime with domain terminology and business rules
@CHAT: "How do I calculate the average sale price by region?"
→ Agent provides SQL template and explanation (Conversation Focused)
@GOGET: "execute this query for the sales_data table"
→ Agent runs the query against live database (Efficiency Focused)
@CHAT: "Write a query to delete inactive customers"
→ Agent drafts DELETE query for review (Conversation Focused)
[User reviews, approves]
@PROD: "execute this query"
→ Agent executes against production database with audit trail (Efficiency Focused)
@RAG: "What are our approved customer retention strategies?"
→ Agent retrieves from strategy documents, synthesizes answer (Knowledge Focused)
@CHAT: "Help me design a retention campaign based on those strategies"
→ Agent provides implementation guidance (Conversation Focused)
@GOGET: "Execute a query to identify at-risk customers for the campaign"
→ Agent runs the query against live database (Efficiency Focused)
@RAG: "What does our security policy say about API key rotation?"
→ Agent retrieves exact policy language with citations (Knowledge Focused)
@GOGET: "Check which API keys in our system are older than 90 days"
→ Agent queries credential store via MCP tools (Efficiency Focused)
Profile Switching:
- Type `@` in chat input to see all available profiles
- Select with Tab/Enter or click
- Profile badge shows active override
- Session header displays both default (★) and override (⚡)
Execution Context:
- Conversation Focused (LLM): System prompt + conversation history
- MCP-Enabled Profiles: System prompt + conversation + tools + prompts + resources
- Knowledge Focused (RAG): RAG synthesis prompt + conversation + retrieved documents
Cost Implications:
- Conversation Focused: ~2,000 input tokens per turn
- Efficiency Focused: ~8,000+ input tokens per turn (includes planner context + full tool context)
- Conversation Focused (with MCP tools): ~3,000-4,000 input tokens per turn (LangChain agent with tool context)
- Knowledge Focused: ~3,000-5,000 input tokens per turn (depends on documents retrieved)
Historical Tracking:
- `profile_tags_used[]` - All profiles used in session
- `models_used[]` - All LLM models used in session
- `knowledge_retrieval_event` - Document sources and relevance scores (RAG profiles)
- Complete audit trail for cost attribution
- Start Conversational: Use Conversation Focused profiles to explore, learn, and draft queries
- Verify with Documents: Use Knowledge Focused profiles for policy, compliance, and reference lookups
- Execute When Needed: Switch to Efficiency Focused profiles only when live data operations are required
- Review Before Execution: Draft destructive queries in `@CHAT`, review, then execute in `@GOGET`
- Cost Attribution: Use profile tags to track which workloads drive costs
- Security: Restrict MCP-enabled profiles to authorized users via role-based access
- Knowledge Quality: Ensure Knowledge Focused profiles have well-curated knowledge collections
For Genie coordinator architecture and nested multi-level coordination, see: Nested Genie Upgrade Guide (docs/Architecture/NESTED_GENIE_UPGRADE_GUIDE.md)
The Uderia Platform is engineered to be far more than a simple LLM wrapper. Its revolutionary core is the Fusion Optimizer, a multi-layered engine designed for resilient, intelligent, and efficient task execution in complex enterprise environments. It transforms the agent from a mere tool into a reliable analytical partner.
The Optimizer deconstructs every user request into a sophisticated, hierarchical plan.
- Strategic Planner: For any non-trivial request, the agent first generates a high-level meta-plan. This strategic blueprint outlines the major phases required to fulfill the user's goal, such as "Phase 1: Gather table metadata" followed by "Phase 2: Analyze column statistics."
- Tactical Execution: Within each phase, the agent operates tactically, determining the single best next action (a tool or prompt call) to advance the plan.
- Recursive Delegation: The Planner is fully recursive. A single phase in a high-level plan can delegate its execution to a new, subordinate instance of the Planner. This allows the agent to solve complex problems by breaking them down into smaller, self-contained sub-tasks, executing them, and then returning the results to the parent process.
The Fusion Optimizer supports heterogeneous model assignment across planning layers, enabling sophisticated cost-performance trade-offs:
- Strategic Model: More capable model for high-level reasoning
- Handles complex meta-planning and multi-phase orchestration
- Examples: GPT-4o, Claude Opus 4.6, Gemini 2.0 Flash Thinking
- Runs once per query (low call frequency)
- Investment justified by quality of strategic decisions
- Tactical Model: Faster, cost-efficient model for execution
- Handles tool selection and argument generation
- Examples: GPT-4o-mini, Claude Haiku, Llama 3.3 70B
- Runs multiple times per phase (high call frequency)
- 80-90% cost reduction vs. using premium model throughout
- Real-Time Cost Visibility: Live Status panel displays color-coded cost breakdown
- Strategic cost (blue): Planning and orchestration overhead
- Tactical cost (green): Per-phase execution costs
- Enables data-driven model selection and optimization
Example Configuration:
Strategic: Claude Opus 4.6 ($15/$75 per 1M tokens)
Tactical: Claude Haiku 4.5 ($1/$5 per 1M tokens)
Result: 70% cost reduction with negligible quality impact
This architecture is particularly effective for:
- High-volume production workloads where tactical calls dominate
- Iterative refinement queries with multiple tactical cycles
- Multi-turn sessions with shared strategic context
- Budget-conscious deployments requiring predictable costs
Before and during execution, the Optimizer actively seeks to enhance performance and efficiency.
- Plan Hydration: The agent intelligently inspects a new plan to see if its initial steps require data that was already generated in the immediately preceding turn. If so, it "hydrates" the new plan by injecting the previous results, skipping redundant tool calls and delivering answers faster. This is particularly effective for follow-up clarifications and iterative refinements.
- Tactical Fast Path: For simple, single-tool phases where all required arguments are known, the Optimizer bypasses the tactical LLM call entirely and executes the tool directly, dramatically reducing latency. This eliminates unnecessary LLM calls for trivial interactions while maintaining conversational fluidity.
- Specialized Orchestrators: The agent is equipped with programmatic orchestrators to handle common complex patterns. For example, it can recognize a date range query (e.g., "last week") and automatically execute a single-day tool iteratively for each day in the range (see the sketch after this list). The Comparative Llama Invocation Orchestrator executes deterministic prompt sequences across multiple LLMs, collects responses, and generates analytical comparisons for model behavior analysis.
- Context Distillation: To prevent context window overflow with large datasets, the agent automatically distills large tool outputs into concise metadata summaries before passing them to the LLM for planning, ensuring robust performance even with enterprise-scale data.
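As a minimal illustration of the date-range pattern described under Specialized Orchestrators, a deterministic expansion might look like this (the function name and dates are illustrative, not the platform's actual API):

```python
from datetime import date, timedelta

def expand_date_range(start: date, end: date):
    """Yield each day in the inclusive range so a single-day tool
    can be invoked deterministically, once per day."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=1)

# "last week" might resolve to seven deterministic single-day invocations:
for day in expand_date_range(date(2025, 1, 6), date(2025, 1, 12)):
    print(f"single-day tool call for {day.isoformat()}")  # stand-in for the real call
```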
The agent learns from every successful interaction, building an ever-growing repository of "champion" strategies that guide future planning. This closed-loop learning system transforms individual successes into organizational knowledge.
- Automatic Case Capture: Every completed session is analyzed and archived:
  - Full conversation history with query-response pairs
  - Complete tool invocation sequences with arguments
  - Strategic plan and tactical execution details
  - Token usage and cost metrics
  - Success indicators (no errors, user satisfaction signals)
- Efficiency Analysis and Scoring: Each case is evaluated for optimization potential:
  - Token reduction opportunities (e.g., plan hydration candidates)
  - Fast-path opportunities (e.g., queries that didn't need tools)
  - Tool selection improvements (e.g., more direct paths to answers)
  - Context management efficiency (e.g., Turn Summaries vs. Full Context)
  - Before/after cost comparison for savings attribution
- Champion Strategy Selection: The learning system identifies best-in-class examples:
  - Lowest token count for similar query patterns
  - Fastest execution time for interactive workloads
  - Highest success rate for complex multi-step tasks
  - Most elegant tool orchestration sequences
  - User-endorsed solutions (via explicit feedback)
- Few-Shot Learning Injection: Planning-time retrieval enhances strategic decisions (see the sketch after this list):
  - `_retrieve_similar_plans()` searches the Planner Repository for analogous cases
  - Top-K similar cases injected into strategic planner context
  - LLM leverages past successes to guide current planning
  - Continuous improvement without model retraining
  - Per-user savings attribution for efficiency tracking
- Asynchronous Processing: Zero user-facing latency:
  - Case archiving happens in background threads
  - Champion case retrieval during planning overlaps with user response rendering
  - No blocking operations on critical path
  - Graceful degradation if learning system unavailable
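A minimal sketch of the planning-time injection step referenced above. Only `_retrieve_similar_plans()` is named in the documentation; the stub body, field names, and prompt layout here are assumptions:

```python
def _retrieve_similar_plans(query: str, top_k: int = 3) -> list:
    """Stand-in for the Planner Repository lookup named above; the real
    version performs a vector similarity search over archived champion cases."""
    return [{"query": "count rows per table",
             "plan": "Phase 1: list tables; Phase 2: count rows per table"}]

def build_strategic_prompt(query: str, base_prompt: str) -> str:
    champions = _retrieve_similar_plans(query)
    examples = "\n\n".join(
        f"Example query: {c['query']}\nProven plan: {c['plan']}" for c in champions
    )
    # Champion cases are prepended as few-shot examples ahead of the live query.
    return f"{base_prompt}\n\n{examples}\n\nUser query: {query}"
```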
The engine provides comprehensive observability and built-in safeguards against runaway execution.
Real-Time Performance Tracking:
- Token Consumption Monitoring: Per-turn and cumulative tracking:
  - Input tokens (prompt + context + few-shot examples)
  - Output tokens (strategic plan + tactical steps + tool arguments + final response)
  - Token-to-cost mapping with provider-specific pricing
  - Historical trends and anomaly detection
- Execution Time Profiling: Detailed timing breakdown:
  - Strategic planning latency
  - Tactical loop execution time per iteration
  - Tool invocation duration (network + processing)
  - Response generation time
  - End-to-end query latency with percentile metrics
- Resource Utilization: System-level metrics:
  - Active session count and concurrency
  - MCP server connection pool status
  - ChromaDB vector store query performance
  - SQLite database read/write latency
  - Memory footprint per session
Built-in Safeguards:
- Tactical Loop Iteration Limit: Maximum 15 cycles per query to prevent infinite loops
- Maximum Tool Invocations: Cap on tool calls per tactical iteration to contain runaway execution
- Context Window Management: Budget-aware five-pass assembly with automatic condensation when approaching model limits (architecture details)
- Timeout Enforcement: Configurable query timeout with graceful degradation
- Error Accumulation Threshold: Abort after N consecutive tool failures to prevent thrashing
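A minimal sketch of how the loop guards above compose. The iteration limit matches the documented value of 15; the error threshold N and all names are illustrative assumptions:

```python
MAX_ITERATIONS = 15         # documented tactical loop limit per query
MAX_CONSECUTIVE_ERRORS = 3  # illustrative value for the error threshold N

def run_tactical_loop(steps, execute_step):
    """Execute plan steps with the runaway-execution guards described above."""
    consecutive_errors = 0
    for iteration, step in enumerate(steps):
        if iteration >= MAX_ITERATIONS:
            return "aborted: iteration limit reached"   # prevents infinite loops
        try:
            execute_step(step)
            consecutive_errors = 0
        except Exception:
            consecutive_errors += 1
            if consecutive_errors >= MAX_CONSECUTIVE_ERRORS:
                return "aborted: consecutive failures"  # prevents thrashing
    return "complete"
```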
When errors occur, the Optimizer initiates a sophisticated, multi-tiered recovery process.
- Pattern-Based Correction: The agent first checks for known, recoverable errors (e.g., "table not found," "column not found").
- Targeted Recovery Prompts: For these specific errors, it uses highly targeted, specialized prompts that provide the LLM with the exact context of the failure and guide it toward a precise correction (e.g., "You tried to query table 'X', which does not exist. Here is a list of similar tables...").
- Generic Recovery & Replanning: If the error is novel, the agent falls back to a generic error-handling mechanism or, in the case of persistent failure, can escalate to generating an entirely new strategic plan to achieve the user's goal via an alternative route.
- Strategic Correction with Learning System: The integrated champion case learning system provides the highest level of self-healing. By retrieving proven strategies from past successes, the agent can discard a flawed or inefficient plan entirely and adopt a proven, optimal approach, learning from its own history to correct its course.
The Optimizer is built with enterprise-grade reliability in mind.
- Deterministic Plan Validation: Before execution begins, the agent deterministically validates the LLM-generated meta-plan for common structural errors (e.g., misclassifying a prompt as a tool) and corrects them, preventing entire classes of failures proactively.
- Hallucination Prevention: Specialized orchestrators detect and correct "hallucinated loops," where the LLM incorrectly plans to iterate over a list of strings instead of a valid data source. The agent semantically understands the intent and executes a deterministic, correct loop instead.
- Definitive Error Handling: The agent recognizes unrecoverable errors (e.g., database permission denied) and halts execution immediately, providing a clear explanation to the user instead of wasting resources on futile retry attempts.
For comprehensive details on the budget-aware context window orchestrator — including the five-pass assembly pipeline, 9 pluggable modules, dynamic adjustment rules, surplus reallocation, condensation strategies, and per-turn observability snapshots — see: Context Window Architecture (docs/Architecture/CONTEXT_WINDOW_ARCHITECTURE.md)
The Uderia Platform integrates a powerful Retrieval-Augmented Generation (RAG) system designed to create a self-improving agent. This closed-loop feedback mechanism allows the agent's Planner to learn from its own past successes, continuously enhancing its decision-making capabilities over time.
The core value of this RAG implementation is its ability to automatically identify and leverage the most efficient strategies for given tasks. It works by:
- Capturing and Archiving: Every successful agent interaction is captured and stored as a "case study."
- Analyzing Efficiency: The system analyzes each case based on token cost to determine its efficiency.
- Identifying Champions: It identifies the single "best-in-class" or "champion" strategy for any given user query.
- Augmenting Future Prompts: When a similar query is received in the future, the system retrieves the champion case and injects it into the Planner's prompt as a "few-shot" example.
This process guides the Planner to generate higher-quality, more efficient plans based on proven, successful strategies, reducing token consumption and improving response quality without manual intervention. The entire process runs asynchronously in the background to ensure no impact on user-facing performance.
The application supports two distinct types of repositories, each serving a different purpose in the AI agent ecosystem:
Purpose of Planner Repositories: Store execution strategies and planning patterns
- Capture successful agent interactions as few-shot learning examples
- Contain SQL query patterns, API workflows, and proven execution traces
- Retrieved by the RAG system to guide future planning decisions
- Built via Planner Repository Constructors - modular templates for domain-specific pattern generation
- Automatically populated from agent execution history or manually via REST API
- Enable the agent to learn from past successes and improve over time
- Available in Intelligence Marketplace for community sharing
Purpose of Knowledge Repositories: Provide reference documentation and domain knowledge
- Store general documents, technical manuals, and business context
- Support for PDF, TXT, DOCX, MD, and other document formats
- Configurable chunking strategies (fixed-size, paragraph, sentence, semantic)
- Seamlessly integrated with strategic planning for intelligent context injection
- Retrieved during planning to inject domain context into strategic decision-making
- Enable the agent to query relevant background information when making decisions
- Available in Intelligence Marketplace for community sharing
- Feature Status: ✅ Fully integrated (Phase 1 complete - Nov 2025)
Both repository types are available in the Intelligence Marketplace for community sharing, discovery, and collaboration. The marketplace enables reference-based subscriptions, forking for customization, ratings and reviews, and flexible publishing options. This separation ensures that execution patterns (how to accomplish tasks) remain distinct from domain knowledge (what the agent needs to know), while both can be leveraged through the unified RAG system.
The RAG system now features a modular template architecture that enables domain-specific customization and extensibility:
- Plugin-Based Design: Templates are self-contained plugins with their own schemas, validation logic, and population strategies
- Template Types: Support for SQL query templates, with extensibility for document Q&A, API workflows, and custom domains
- Manifest System: Each template declares its capabilities, required fields, and validation rules via a standardized manifest
- Dynamic Registration: Templates are automatically discovered and registered at runtime from the `rag_templates/` directory
- Programmatic & LLM-Assisted Population: Templates can be populated via REST API with structured examples or through LLM-assisted generation in the UI
- Auto-Generation: Built-in LLM workflows to automatically generate domain-specific examples from database schema or documentation
This modular approach allows organizations to extend the RAG system with custom templates tailored to their specific data patterns, query types, and business domains without modifying core agent code.
While Planner Repositories power the self-improving Optimizer, Knowledge Repositories serve an entirely different purpose: they deliver grounded, hallucination-free answers from verified documents. This is the engine behind the Focus profile class (🔵 rag_focused).
Traditional LLMs generate answers from training data — a black box of uncertain provenance. Knowledge Retrieval inverts this model:
- Zero Hallucination by Design: The LLM synthesizes answers exclusively from retrieved documents. No general knowledge is injected. If the knowledge base doesn't contain the answer, the system says so transparently rather than fabricating one.
- Institutional Memory at Scale: Corporate policies, engineering runbooks, product documentation, compliance frameworks — all searchable via natural language. When experts leave, their knowledge stays.
- Source Traceability: Every answer includes citations back to specific documents, chunks, and metadata. Auditors and compliance teams can verify any claim.
- Freshness-Aware Ranking: Documents are scored using a hybrid of semantic relevance and temporal freshness, ensuring recent updates rank appropriately against older but relevant content.
```
┌──────────────────────────────────────────────────────────────────────────┐
│ KNOWLEDGE RETRIEVAL PIPELINE │
├──────────────────────────────────────────────────────────────────────────┤
│ │
│ User Query │
│ "What is our data retention policy for EU customers?" │
│ │ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Configuration Resolution │ ← Three-tier: Global → Profile → Lock │
│ │ maxDocs, freshnessWeight, │ │
│ │ minRelevance, maxTokens │ │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Semantic Search (ChromaDB) │ ← Embedding: all-MiniLM-L6-v2 │
│ │ Query each knowledge │ │
│ │ collection assigned to │ │
│ │ the profile │ │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Hybrid Scoring │ │
│ │ adjusted = (1-fw) × sim │ fw = freshnessWeight │
│ │ + fw × freshness │ sim = 1 - cosine_distance │
│ │ │ freshness = e^(-decay × days_old) │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ Per-Document Deduplication │ ← maxChunksPerDocument limit │
│ │ + Minimum Relevance Filter │ ← minRelevanceScore threshold │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ ┌─────────────────────────────┐ │
│ │ LLM Synthesis │ ← Custom synthesis prompt override │
│ │ System prompt + retrieved │ available per profile │
│ │ documents + user query │ │
│ └──────────────┬──────────────┘ │
│ ▼ │
│ Answer with Source Citations │
│ "Per the EU Data Governance Policy (Section 4.2)..." │
│ │
└──────────────────────────────────────────────────────────────────────────┘
```
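The hybrid scoring stage in the pipeline above can be written out directly. This sketch mirrors the documented formulas and configuration defaults; the example numbers are illustrative:

```python
import math

def hybrid_score(cosine_distance: float, days_old: float,
                 freshness_weight: float = 0.0, decay: float = 0.005) -> float:
    sim = 1.0 - cosine_distance              # sim = 1 - cosine_distance
    freshness = math.exp(-decay * days_old)  # freshness = e^(-decay * days_old)
    return (1 - freshness_weight) * sim + freshness_weight * freshness

# Example: with freshnessWeight = 0.3, a fresher but slightly less similar
# chunk (sim 0.75, 10 days old) outranks a year-old one (sim 0.80):
stale = hybrid_score(0.20, days_old=400, freshness_weight=0.3)  # ~0.60
fresh = hybrid_score(0.25, days_old=10, freshness_weight=0.3)   # ~0.81
```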
| Aspect | Planner Repositories | Knowledge Repositories |
|---|---|---|
| Purpose | Self-improving execution strategies | Grounded document retrieval |
| Profile Class | 🟠 Optimize (tool_enabled) | 🔵 Focus (rag_focused) |
| Data Source | Auto-captured execution traces | Uploaded documents (PDF, DOCX, TXT, MD) |
| Scoring | Similarity with efficiency penalties | Hybrid similarity + freshness |
| Tool Execution | Yes — full MCP tool calling | None — pure retrieval + synthesis |
| Hallucination Risk | Mitigated via proven patterns | Eliminated by design |
Knowledge retrieval behavior is controlled through a three-tier configuration resolution:
- Admin-Locked (highest priority): Global settings locked by admin override all profile values
- Profile Override: Per-profile settings in the profile's `knowledgeConfig`
- Global Default (lowest priority): Platform-wide defaults
| Parameter | Description | Default |
|---|---|---|
| `maxDocs` | Maximum documents returned | 3 |
| `minRelevanceScore` | Minimum cosine similarity threshold | 0.30 |
| `maxTokens` | Token budget for knowledge context | 2,000 |
| `maxChunksPerDocument` | Limit chunks from same source | 0 (unlimited) |
| `freshnessWeight` | Blend ratio: 0.0 = pure relevance, 1.0 = pure freshness | 0.0 |
| `freshnessDecayRate` | Exponential decay rate for age penalty | 0.005 |
| `synthesisPromptOverride` | Custom system prompt for LLM synthesis | (none) |
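A minimal sketch of the three-tier resolution order, using the documented defaults; the function and dictionary shapes are assumptions:

```python
GLOBAL_DEFAULTS = {"maxDocs": 3, "minRelevanceScore": 0.30, "maxTokens": 2000,
                   "maxChunksPerDocument": 0, "freshnessWeight": 0.0,
                   "freshnessDecayRate": 0.005}

def resolve(param: str, admin_locked: dict, profile: dict):
    if param in admin_locked:                        # Admin-Locked: highest priority
        return admin_locked[param]
    overrides = profile.get("knowledgeConfig", {})   # Profile Override
    if param in overrides:
        return overrides[param]
    return GLOBAL_DEFAULTS[param]                    # Global Default: lowest priority

# A profile override wins unless the admin has locked the value:
profile = {"knowledgeConfig": {"maxDocs": 5}}
assert resolve("maxDocs", admin_locked={}, profile=profile) == 5
assert resolve("maxDocs", admin_locked={"maxDocs": 2}, profile=profile) == 2
```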
Knowledge Repositories support multiple document formats (PDF, DOCX, TXT, Markdown) with configurable chunking strategies:
- Paragraph-based (default): Respects natural document structure, combines small paragraphs, splits oversized ones
- Sentence-based: Fine-grained chunking for dense technical content
- Fixed-size: Character-count chunking with configurable overlap
- Semantic: Boundary-aware splitting that preserves meaning
Each chunk is embedded using all-MiniLM-L6-v2 and stored in ChromaDB with metadata (title, author, creation date, source filename, category, tags) enabling rich filtering and freshness scoring.
For the comprehensive architecture deep-dive including scoring algorithms, execution flow, and advanced features, see: Knowledge Retrieval Architecture (docs/Architecture/KNOWLEDGE_RETRIEVAL_ARCHITECTURE.md)
For a comprehensive overview of the RAG architecture, template development, and maintenance utilities, please see the detailed documentation: RAG System Documentation (docs/RAG/RAG.md) · RAG Template Plugin Development (rag_templates/README.md)
The platform's RAG and Knowledge capabilities are powered by a pluggable vector store layer that decouples all embedding, storage, and retrieval operations from any single database vendor. This means you can start with the built-in local store and scale to enterprise infrastructure — without changing a single profile, collection, or workflow.
Three production backends, one unified interface:
| | ChromaDB (Default) | Qdrant Cloud | Teradata Enterprise Vector Store |
|---|---|---|---|
| Best for | Local / single-user / getting started | Cloud-native / managed vector search | Enterprise / shared infrastructure / governed data |
| Deployment | Embedded, zero-config | Managed cloud (Qdrant Cloud) | Server-side, connects to existing Teradata environment |
| Embedding | Client-side (SentenceTransformer) | Client-side (SentenceTransformer) | Server-side (Amazon Bedrock or Azure AI) — no local GPU needed |
| Chunking | Client-side (platform-managed) | Client-side (platform-managed) | Client-side or server-side — pass raw files and let the database handle it |
| Search Modes | Semantic | Semantic, Hybrid (semantic + BM25 keyword) | Semantic |
| Scaling | Single-node | Cloud-managed, horizontal scaling | Massively parallel, leverages Teradata's query engine |
Why this matters:
- No vendor lock-in. Collections created on ChromaDB work identically on Qdrant or Teradata. Switch backends by changing a configuration — existing profiles, RAG templates, and marketplace packs continue to work unchanged.
- From laptop to cloud to enterprise. Start local with ChromaDB, move to Qdrant Cloud for managed scalability, or deploy on Teradata for governed enterprise infrastructure — all without touching your workflows. The Teradata backend adds server-side embedding (no local GPU costs), server-side chunking (upload raw PDFs and let the database handle splitting), and connection resilience with automatic stale-connection detection and serialized reconnect to survive transient network issues.
- Hybrid Search. Supporting backends (e.g., Qdrant) offer hybrid search mode combining semantic vector similarity with BM25 keyword matching via rank fusion. Configure the keyword weight (0.0–1.0) per collection to balance precision and recall for your domain.
- Teradata server-side chunking control. When using the Teradata EVS backend, you can choose between Optimized (structure-aware dynamic chunking that follows document layout) and Fixed Size (character-based splitting with configurable chunk size). Header and footer trimming is available for PDF documents to exclude page headers/footers before chunking. All parameters are persisted at the collection level and editable per-upload.
- Asynchronous counters. For Teradata server-side ingestion, document and chunk counts update asynchronously. After a document upload completes, the platform polls the EVS backend at 30s, 60s, and 90s intervals to retrieve the final chunk count. During this window, the Knowledge Repository card may temporarily show stale counts — a notification is emitted once the actual count is available.
- Capability-based negotiation. Each backend declares what it supports (e.g., `SERVER_SIDE_CHUNKING`, `GET_ALL`, `METADATA_FILTERING`). The platform adapts its behavior automatically — no feature flags or conditional code in your workflows (see the sketch after this list).
- Safe concurrent access. The factory uses config-fingerprinted singleton caching with per-key async locks, ensuring multiple sessions never race to initialize the same backend.
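A minimal sketch of capability-based negotiation as described above. The capability names are the documented ones; the classes and ingestion flow are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Backend:
    name: str
    capabilities: set = field(default_factory=set)

def ingest(backend: Backend, raw_document: bytes, chunker) -> list:
    # Feature detection replaces vendor-specific branches: ask the backend
    # what it supports, then adapt.
    if "SERVER_SIDE_CHUNKING" in backend.capabilities:
        return [raw_document]        # pass the raw file; the database chunks it
    return chunker(raw_document)     # platform-managed client-side chunking

teradata = Backend("teradata", {"SERVER_SIDE_CHUNKING", "METADATA_FILTERING"})
chroma = Backend("chromadb", {"GET_ALL", "METADATA_FILTERING"})
naive_chunker = lambda doc: [doc[i:i + 512] for i in range(0, len(doc), 512)]
print(len(ingest(chroma, b"x" * 2000, naive_chunker)))    # 4 client-side chunks
print(len(ingest(teradata, b"x" * 2000, naive_chunker)))  # 1 raw upload
```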
For the full architectural deep-dive — including ingestion paths, connection resilience patterns, and the EVS object ownership model — see: Vector Store Abstraction Architecture (docs/Architecture/VECTOR_STORE_ABSTRACTION_ARCHITECTURE.md)
The Intelligence Marketplace transforms individual expertise into collective intelligence. Share proven execution patterns, domain knowledge, complete agent teams, behavioral skills, processing extensions, and entity-relationship models with the community—turning isolated insights into a powerful collaborative ecosystem that reduces costs, accelerates onboarding, and amplifies capabilities.
The marketplace is Uderia's collaborative ecosystem for sharing, discovering, and deploying enterprise AI assets. It transforms the platform from a single-user tool into a network where community-validated strategies, domain knowledge, and agent configurations circulate freely—reducing token costs through proven patterns, accelerating onboarding with battle-tested solutions, and enabling continuous improvement through fork-and-improve workflows.
Product Catalog — Six interconnected asset types:
| Product Type | Description | Acquisition |
|---|---|---|
| Planner Repositories (📋) | Proven execution patterns and strategies for autonomous task completion | Subscribe, Fork |
| Knowledge Repositories (📄) | Reference documents and domain knowledge for planning context | Subscribe, Fork |
| Agent Packs | Bundled agent teams (coordinator, experts, knowledge collections) as portable `.agentpack` files | Install, Fork, Publish |
| Skills | Pre-processing prompt injections that modify LLM behavior (Claude Code compatible format) | Install |
| Extensions | Reusable processing modules with tiered complexity (convention, simple, standard, LLM) | Install |
| Knowledge Graphs | Entity-relationship models for database topology, business concepts, and domain ontologies | Install, Fork |
Smart Discovery & Search
Find exactly what you need through powerful search and filtering:
- Keyword search across collection names and descriptions
- Filter by repository type (Planner vs. Knowledge)
- Filter by visibility (Public, Unlisted), rating, and install counts
- Sort by popularity, rating, or recency
- Pagination for browsing large catalogs
- View metadata: owner, subscriber count, ratings, case/document counts
Reference-Based Subscriptions
Access shared collections without data duplication:
- Subscribe to expert-curated collections with one click
- Automatic integration into your RAG system
- Planner retrieves cases from subscribed collections seamlessly
- No storage overhead—references original collection
- Unsubscribe anytime to manage your collection portfolio
- Live updates when publishers improve collections
Fork for Customization
Create independent copies for your specific needs:
- Full copy including embeddings, files, and metadata
- Customize forked collections without affecting originals
- Perfect for adapting community patterns to your domain
- Iterative refinement through fork-and-improve workflow
- Build on proven strategies while maintaining independence
Community Quality Assurance
Trust community validation through ratings and reviews:
- 1-5 star rating system with optional text reviews
- Average ratings displayed on asset cards
- Cannot rate own assets (ensures objectivity)
- Browse top-rated assets for proven quality
- Community feedback guides discovery
Flexible Publishing Options
Share your expertise with granular visibility control:
- Public: Fully discoverable in marketplace browse
- Unlisted: Accessible via direct link only (share with specific teams)
- Private: Owner-only access (default)
- Update visibility anytime
- Must meet minimum requirements to publish (e.g., ≥1 RAG case/document)
- Maintain full ownership and control
Secure Access Control
Enterprise-grade authorization and privacy:
- JWT-authenticated API endpoints
- Ownership validation on all operations
- Cannot subscribe to own assets
- Must be owner to publish or modify
- Usernames visible for transparency and attribution
- Privacy-first design with granular visibility controls
REST API Integration
Programmatic marketplace operations for automation:
- Browse assets with search/filter parameters
- Subscribe/unsubscribe programmatically
- Fork assets via API for CI/CD workflows
- Publish assets as part of deployment pipelines
- Rate assets for automated quality tracking
- Full CRUD operations for marketplace management
Agent Packs bundle complete agent configurations—coordinator profiles, expert profiles, knowledge collections, and MCP server references—into a single .agentpack file that can be installed, exported, and shared across environments.
Why Agent Packs matter: Building a well-tuned multi-profile agent team (e.g., a Genie coordinator with specialized RAG experts) requires significant effort: creating profiles, assigning knowledge collections, configuring child relationships, and testing the whole ensemble. Agent Packs capture this investment as a portable, versioned artifact that can be deployed in seconds on any Uderia instance.
How to use Agent Packs:
- Install: Import an `.agentpack` file via Setup → Agent Packs → Import. All profiles, collections, and dependencies are created automatically with conflict resolution.
- Export: Select an installed pack and click Export to produce an `.agentpack` file containing the current live state of all profiles and collections.
- Harmonize LLM: After importing a pack, use the Harmonize LLM feature to switch all pack profiles to your preferred LLM provider in one operation.
- Live References: Packs store references to profiles and collections, not copies. Changes you make to pack-managed profiles (e.g., adding an expert to a coordinator) take effect immediately and are captured in subsequent exports.
Agent Packs turn the collaborative marketplace into a distribution channel for complete AI solutions—not just individual knowledge repositories, but fully operational agent teams ready for production use.
Cost Reduction Through Reuse
Leverage proven patterns to minimize token consumption:
- Reuse champion execution strategies instead of trial-and-error
- Access domain expertise without rebuilding from scratch
- Community-validated patterns reduce failed attempts
- Lower onboarding costs for new users and use cases
- Network effects: more users = more valuable patterns
Collaborative Intelligence Platform
The marketplace transforms the Uderia Platform from a single-user tool into a collaborative intelligence platform. By enabling pattern sharing, community validation, and knowledge reuse, it reduces costs, improves quality, and accelerates time-to-value for all users. Whether you're publishing your expertise or subscribing to community wisdom, the marketplace creates a powerful ecosystem where collective intelligence amplifies individual capabilities.
Discovering Assets:
- Navigate to the Intelligence pane (light bulb icon)
- Select the product type tab (Planner Repositories, Knowledge Repositories, Agent Packs, etc.)
- Use search and filters to find relevant assets
- View ratings, subscriber counts, and metadata before subscribing
Subscribing to Collections:
- Click Subscribe on a collection card for reference-based access
- Subscribed collections automatically integrate into your RAG system
- View your subscriptions in the My Collections tab
Forking for Customization:
- Click Fork to create an independent copy
- Customize the fork without affecting the original
- Perfect for adapting community patterns to your specific domain
Publishing Your Own Assets:
- Create a collection with at least 1 RAG case or document
- Navigate to My Collections → select the collection → click Publish
- Choose visibility (Public or Unlisted)
- Add descriptive metadata to help others discover your work
Rating and Reviewing:
- Rate assets you've used (1-5 stars with optional review)
- Help the community identify high-quality patterns
- Cannot rate your own assets
For detailed marketplace architecture, product type specifications, and API documentation, see: Intelligence Marketplace Architecture (docs/Architecture/MARKETPLACE_ARCHITECTURE.md)
Skills are reusable markdown instruction sets that shape how the agent reasons, responds, and formats — injected into LLM context before query execution. They provide transparent, auditable control over agent behavior without modifying system prompts permanently.
How it works: Type `#` in the chat input to trigger autocomplete, select a skill, and optionally add a parameter with `:param` syntax (e.g., `#sql-expert:strict`). Skills appear as emerald-green badges in the input area and on chat messages.
Built-in skills:
| Skill | Purpose | Parameters |
|---|---|---|
| `#sql-expert` | SQL best practices, optimization, and conventions | `:strict` (enforce ANSI compliance), `:lenient` (accept valid SQL) |
| `#table-format` | Format all data responses as clean Markdown tables | — |
| `#concise` | Brief, focused responses without preamble or filler | — |
| `#detailed` | Thorough analysis with reasoning, context, and alternatives | — |
| `#step-by-step` | Chain-of-thought reasoning with numbered steps | — |
Key characteristics:
- Fully transient — skill content is injected per-request into local LLM context variables, never stored in conversation history. Deactivating a skill means complete elimination from all future context
- Works across all profile types — Optimizer, Conversation, Knowledge, and Genie profiles all support skill injection at the appropriate execution point
- Parameterizable — skills support runtime parameters via `<!-- param:name -->` blocks for fine-grained behavior variants within a single skill
- Create your own — via the visual Skill Editor (three levels: Citizen, Intermediate, Expert) or drop a `.md` file into `~/.tda/skills/` for zero-friction authoring (a sample file follows this list)
- Portable — export/import as `.zip` for sharing across environments
- Admin-governed — administrators can disable specific skills, control user skill creation, and manage availability globally
- Transparent — `skills_applied` events in the Live Status Window show which skills were injected and their estimated token cost
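For a sense of what authoring looks like, here is a hypothetical skill file using the `<!-- param:name -->` blocks described above. The file path and exact schema are assumptions; skills are markdown instruction sets per the description above:

```markdown
<!-- ~/.tda/skills/sql-reviewer.md (hypothetical) -->
Review all SQL in your answers. Flag full-table scans and missing WHERE clauses.

<!-- param:strict -->
Reject any statement that is not ANSI-compliant.

<!-- param:lenient -->
Accept any syntactically valid SQL.
```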
Architecture details: Skill Architecture (docs/Architecture/SKILL_ARCHITECTURE.md)
Extensions transform LLM answers into structured, machine-parseable output for downstream automation. While Skills inject context before the query, Extensions run after the answer is received — converting non-deterministic LLM output into deterministic formats for workflow tools like n8n, Airflow, and Flowise.
User Query → LLM Answer → !Extension Post-Processing → Structured Output → n8n / Airflow / API
How it works: Type `!` in the chat input to trigger autocomplete, select an extension, and optionally add a parameter (e.g., `!decision:critical`). Multiple extensions can be chained in a single query — they execute serially, each receiving results from prior extensions.
Built-in extensions:
| Extension | Purpose | LLM Cost | Output |
|---|---|---|---|
| `!json` | Wraps answer + metadata into standardized JSON for APIs | No | Chat |
| `!decision` | Semantic analysis for workflow branching (binary or severity-based) | Yes | Silent |
| `!extract` | Regex-based extraction of numbers, percentages, entities | No | Silent |
| `!classify` | Semantic categorization (alert, performance, data quality, security) | Yes | Silent |
| `!summary` | Executive summary with key points and action items | Yes | Chat |
| `!pdf` | Downloadable PDF export with Markdown-aware formatting | No | Chat |
Key characteristics:
- Deterministic output — transforms natural-language answers into structured formats that workflow tools can reliably parse and branch on
- Four-tier extension framework — from zero-friction Convention (drop a `.py` file; a minimal sketch follows this list) through Simple and Standard tiers to LLM-powered extensions with automatic token tracking
- Serial chaining — compose extensions (`!extract !decision:critical`) where each extension accesses prior results for progressive data refinement
- Isolated error handling — extension failures never break the main LLM answer; each extension succeeds or fails independently
- Create your own — via the UI scaffold (generates tier-appropriate Python boilerplate with real-time validation), REST API, or manual file drop to `~/.tda/extensions/`
- Portable — export/import as `.extension` zip for sharing across environments
- Admin-governed — administrators control whether users can create custom extensions
- Automatic cost tracking — LLM-powered extensions (Tiers 2-3) automatically track input/output tokens and cost, integrated into session totals
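As an illustration of the zero-friction Convention tier mentioned above, a dropped-in extension might be a single Python file. The file name, entry-point name, and signature are assumptions, not the platform's actual contract:

```python
# ~/.tda/extensions/percent_extract.py (hypothetical file and entry point)
import re
from typing import Optional

def transform(answer: str, param: Optional[str] = None) -> dict:
    """Pull all percentage figures out of the LLM answer (regex-based, no LLM cost)."""
    values = [float(v) for v in re.findall(r"(\d+(?:\.\d+)?)\s*%", answer)]
    return {"percentages": values, "param": param}
```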
REST API example:
```json
{
  "prompt": "What is the CPU usage?",
  "extensions": [
    {"name": "decision", "param": "critical"}
  ]
}
```
The `!decision` extension produces `{result, severity, branch_key}` — exactly what n8n Switch nodes need for deterministic routing (e.g., `threshold_exceeded_critical` → PagerDuty, `nominal_ok` → log only).
Architecture details: Extension Architecture (docs/Architecture/EXTENSION_ARCHITECTURE.md)
The platform renders LLM output through a plugin-based component library where each content type — chart, canvas, knowledge graph — is a self-contained module with its own prompt instructions, backend handler, and frontend renderer. Components are toggled per-profile, injected into LLM prompts at runtime, and rendered via tool calls or data-driven detection.
Canvas — Interactive Code & Document Workspace
The Canvas component (TDA_Canvas) transforms LLM-generated code and documents into an editable, executable workspace. Built on CodeMirror 6 with a modular capability-plugin architecture (11 capabilities, 4 execution connectors):
- Editing: Syntax-highlighted code editor with SQL, Python, JavaScript, HTML, and Markdown support
- Live Preview: Responsive HTML preview with viewport simulation, Markdown rendering, SVG rendering
- Execution: Run SQL queries directly against connected databases (PostgreSQL, MySQL, SQLite, Teradata) via MCP server connectors — results render in-place
- Version Tracking: Turn-based version history with side-by-side diff view across conversation turns
- Inline AI: Select code and request targeted modifications without regenerating the entire canvas
- Split-Screen: Toggle between editor-only, preview-only, and split-panel modes; fullscreen expansion
- Template Gallery: 12 starter templates for common tasks (SQL queries, HTML pages, data analysis)
- RAG-Aware: Source attribution badges when canvas content is generated from knowledge base retrieval
Charts — Automated Data Visualization
The Chart component (TDA_Charting) renders data as interactive visualizations via G2Plot. Supports 16 chart types (column, line, pie, scatter, area, bar, heatmap, gauge, radar, treemap, donut, funnel, waterfall, histogram, box, dual-axis). Key to reliability is the 5-stage mapping validation pipeline — a deterministic repair chain that guarantees valid chart specs even when the LLM hallucinates column names or swaps axes:
- Sanitize — normalize LLM field names to canonical schema
- Validate — check column existence against actual query results
- Deterministic Repair — cardinality-aware column reassignment using data shape heuristics
- LLM Repair — (if needed) targeted LLM call to resolve ambiguous mappings
- Positional Fallback — last-resort mapping by column position
Charts render inline in chat bubbles by default, with optional sub-window mode for interactive exploration.
Knowledge Graph — Context-Enriching Entity-Relationship System
The Knowledge Graph component (TDA_KnowledgeGraph) maintains a living, queryable model of database topology, business concepts, metrics, and domain taxonomies as a typed entity-relationship graph. Scoped per (profile_id, user_uuid) for multi-user isolation.
- Planner Context Injection: Before every strategic planning call, the system extracts a relevant subgraph based on the user's query and injects it as context — guiding tool selection, SQL construction, and argument generation. Functions as a semantic guardrail that reduces hallucination
- Visualization: D3.js force-directed graph with three display modes, interactive node exploration, and theme compliance
- Progressive Enrichment: Graphs grow through LLM-inferred entities during conversation, manual JSON import, or (V2) MCP schema auto-discovery
- Intensity Control: At `medium` intensity, graph context is advisory; at `heavy` intensity, the LLM strictly validates against known relationships
Component Architecture:
- Self-contained modules — each component is a directory with `manifest.json`, `instructions.json`, `handler.py`, and `renderer.js`
- Two handler tiers — Action (LLM explicitly calls a `TDA_*` tool) and Structural (automatic rendering from data type)
- Three render targets — `inline` (chat bubble), `sub_window` (persistent panel), `status_panel` (Live Status area)
- Deterministic fast-path — components with predictable output bypass the tactical LLM entirely, saving ~3,000 tokens per invocation
- Profile-level intensity control — `componentConfig` on the profile JSON toggles components and sets instruction intensity (none/medium/heavy)
- Admin governance — platform-wide component control (all/selective mode, user imports, marketplace access)
- Third-party extensibility — add custom components without modifying core files; manifest-driven discovery with hot-reload
Architecture details: Component Architecture (docs/Architecture/COMPONENT_ARCHITECTURE.md) · Canvas Architecture (docs/Architecture/CANVAS_ARCHITECTURE.md) · Knowledge Graph Architecture (docs/Architecture/KNOWLEDGE_GRAPH_ARCHITECTURE.md)
Uderia implements two complementary cryptographic security systems that together provide end-to-end trust guarantees no other agentic AI platform offers:
- License-Based Prompt Encryption — Protects the intellectual property embedded in system prompts through a multi-layered encryption architecture with tier-based access control. Ensures that the strategic reasoning instructions powering the platform remain protected during distribution, at rest in the database, and at runtime.
- Execution Provenance Chain (EPC) — Creates an immutable, cryptographically signed audit trail from user query through every LLM decision, tool call, and response. Enables offline verification that no step was injected, tampered with, or replayed. Covers all five execution paths across the platform.
Together, these systems establish a zero-trust execution model: the prompts that drive the AI are cryptographically protected, and every action the AI takes is cryptographically recorded. This positions Uderia for enterprise compliance requirements including SOX audit trails, GDPR accountability, and EU AI Act transparency mandates.
System prompts — encoding strategic planning logic, tactical tool selection heuristics, error recovery strategies, and domain-specific reasoning patterns — are protected through a multi-layered encryption architecture tied to each customer's license.
How it works:
- At development time, plain-text prompts are encrypted with a key derived from the platform's RSA-4096 public key and distributed as an encrypted artifact (`default_prompts.dat`)
- At first startup, the bootstrap process decrypts each prompt and re-encrypts it with a key derived from the customer's unique license signature and tier — binding database content to the specific license
- At runtime, the `PromptLoader` decrypts prompts on demand using the license-derived key, with results cached in memory for performance
Tier-based access control ensures that while all license tiers can decrypt prompts for LLM conversations (runtime usage), only Prompt Engineer and Enterprise tiers can view and edit prompt content through the System Prompts UI editor.
| Capability | Standard | Prompt Engineer | Enterprise |
|---|---|---|---|
| Runtime LLM usage | Yes | Yes | Yes |
| View/edit in UI | — | Yes | Yes |
| Create prompt overrides | — | Profile-level | User + Profile |
Key properties:
- License-specific keys — different customers cannot decrypt each other's database content
- RSA-PSS (4096-bit) license signing — prevents license forgery
- Fernet authenticated encryption (AES-128-CBC + HMAC-SHA256) — provides both confidentiality and integrity
- Zero-downtime deployment — prompt updates to running installations via `update_prompt.py` with automatic cache invalidation
The EPC is implemented as a blockchain-like hash chain signed with Ed25519.
How it works:
- Each execution step (query intake, strategic plan, tool call, tool result, synthesis, etc.) is recorded as a provenance step
- Each step's content is SHA-256 hashed (content is hashed, never stored — no sensitive data in the provenance record)
- Steps are hash-chained: each step's chain hash incorporates its index, type, content hash, and the previous step's chain hash
- Every chain hash is signed with Ed25519, enabling offline verification with just the public key
- Across turns, the first step of each turn links to the last step of the previous turn, creating full session integrity
```
Turn 1: [query] -> [plan] -> [tool_call] -> [tool_result] -> [complete]
h0 -> h1 -> h2 -> h3 -> h4
|
Turn 2: [query] -> [llm_call] -> [response] -> [complete] |
h5 -> h6 -> h7 -> h8 |
^ |
previous_turn_tip = h4 (cross-turn cryptographic link) +
```
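A minimal sketch of the chaining and signing scheme just described, using the `cryptography` library the platform already depends on. The hash-payload layout and field names here are assumptions; the SHA-256 and Ed25519 roles are as documented:

```python
import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def add_step(index, step_type, content, prev_hash, key):
    content_hash = hashlib.sha256(content.encode()).hexdigest()  # content hashed, never stored
    # Chain hash binds index, type, content hash, and the previous step's chain hash.
    payload = f"{index}|{step_type}|{content_hash}|{prev_hash}".encode()
    chain_hash = hashlib.sha256(payload).hexdigest()
    return {"index": index, "type": step_type, "content_hash": content_hash,
            "prev": prev_hash, "chain_hash": chain_hash,
            "signature": key.sign(chain_hash.encode()).hex()}

def verify_step(step, public_key):
    try:  # offline check: only the public key is needed
        public_key.verify(bytes.fromhex(step["signature"]), step["chain_hash"].encode())
        return True
    except InvalidSignature:
        return False

key = Ed25519PrivateKey.generate()
h0 = add_step(0, "query", "What is our retention policy?", "GENESIS", key)
h1 = add_step(1, "plan", "Phase 1: search knowledge base", h0["chain_hash"], key)
assert verify_step(h1, key.public_key())
```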
22 step types across all five profile classes ensure comprehensive coverage:
| Profile | Example Steps Recorded |
|---|---|
| Optimize (tool_enabled) | Strategic plan, plan rewrites, tactical decisions, tool calls, tool results, self-corrections |
| Ideate (llm_only) | Knowledge retrieval, LLM calls, LLM responses |
| Focus (rag_focused) | RAG search, RAG results with doc IDs and scores, synthesis |
| Ideate + MCP (conversation_with_tools) | Agent tool calls, agent tool results, agent LLM steps |
| Coordinate (genie) | Child profile dispatch, cross-session chain references, coordinator synthesis |
Three verification levels:
- Chain Integrity (offline-capable) — verifies hash linking, hash computation, and Ed25519 signatures using only the public key
- Content Verification — confirms content hashes match actual session data
- Session Integrity — verifies cross-turn linking across all turns, including genie's cross-session Merkle-tree structure
REST API for auditors:
| Endpoint | Purpose |
|---|---|
| `GET /api/v1/sessions/{id}/provenance` | Full provenance for all turns |
| `POST /api/v1/sessions/{id}/provenance/verify` | Verify chain integrity |
| `GET /api/v1/sessions/{id}/provenance/export` | Download JSON for offline audit |
| `GET /api/v1/provenance/public-key` | Download public key for independent verification |
Key properties:
- Zero new dependencies — uses the `cryptography` library already present in the project
- Negligible overhead — ~100 microseconds per step (hashing + signing), invisible against LLM latency
- Degraded mode — if the signing key is unavailable, hashes are still recorded (unsigned); execution is never blocked
- Key rotation — `maintenance/rotate_provenance_key.py` generates new keys; old chains remain verifiable via stored key fingerprints
- Backward compatible — existing sessions without provenance data are handled gracefully
| Regulation | Requirement | How Uderia Addresses It |
|---|---|---|
| EU AI Act | Transparency and traceability for AI systems | EPC records every LLM call, tool selection, and response with cryptographic proof |
| GDPR Art. 22 | Right to explanation of automated decisions | Provenance chain traces from query to final answer |
| SOX | Audit trails for financial reporting | Tamper-evident record of every AI-assisted operation |
| ISO 27001 | Information security management | Encrypted prompts, signed execution logs, key management |
Full technical details: Security Architecture (docs/Architecture/SECURITY_ARCHITECTURE.md)
The Uderia Platform is built on a modern, asynchronous client-server architecture with four primary layers:
```
┌──────────┐      ┌─────────────┐      ┌──────────┐      ┌─────┐      ┌─────────┐
│ Browser │ SSE │ Backend │ HTTP │ LLM │ HTTP │ MCP │ SQL │ Data │
│ (UI) │◄────►│ (Quart) │─────►│ Provider │ │ Svr │─────►│ Source │
└──────────┘      └─────────────┘      └──────────┘      └─────┘      └─────────┘
```
Communication Flow:
- User sends query via browser → Backend receives via REST/SSE
- Backend orchestrates → LLM generates plan
- LLM requests tools → MCP Server executes against data source
- Results flow back → Backend formats → Browser renders in real-time
- Technology: Single-page app (Vanilla JS, Tailwind CSS, HTML)
- Communication: REST API for requests, Server-Sent Events (SSE) for real-time updates
- Key Features: Live status monitoring, session management, context controls
- State Management: Browser localStorage (respects server persistence settings)
- Technology: Quart (async Python web framework)
- Responsibilities:
- Session management and user isolation
- LLM orchestration with Fusion Optimizer engine
- Configuration management and credential handling
- RAG system integration
- Key Modules:
  - `api/` - REST endpoints and SSE handlers
  - `agent/` - Executor, Formatter, and planning logic
  - `llm/` - Multi-provider LLM connectors
  - `mcp/` - MCP protocol client
  - `core/` - Configuration, sessions, utilities
- Supported Providers: Google (Gemini), Anthropic (Claude), OpenAI, Azure OpenAI, AWS Bedrock, Friendli.AI, Ollama
- Authentication: Dynamic credential handling per session
- Protocol: REST API calls with structured prompts (system + user + tools)
- Protocol: Model Context Protocol - standardized tool/prompt/resource exposure
- Transport Types:
- 🟠 STDIO - Local servers via subprocess (npx, uvx, python)
- 🔵 HTTP - Remote servers via network REST API
- 🟢 SSE - Streaming servers via Server-Sent Events
- Import Formats: MCP Registry specification and Claude Desktop configuration
- Security: Credential passthrough, no credential storage in agent
- Lifecycle Management: Automatic process spawning and cleanup for STDIO servers
Authentication Flow:
- User logs in → Backend validates credentials
- JWT token issued (24-hour expiry) → Stored in browser localStorage as `tda_auth_token`
- All API requests include the `Authorization: Bearer <token>` header (see the example after this list)
- Token refreshed automatically or user re-authenticates
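A hedged sketch of what an authenticated client call looks like. The Bearer-token pattern and the provenance public-key endpoint are documented; the login path and response field name are assumptions:

```python
import requests

BASE = "http://localhost:5050"
# Hypothetical login path; consult the REST API docs for the exact endpoint.
resp = requests.post(f"{BASE}/api/v1/auth/login",
                     json={"username": "admin", "password": "admin"})
token = resp.json()["token"]  # response field name assumed

headers = {"Authorization": f"Bearer {token}"}
# A documented endpoint: fetch the provenance public key for offline audits.
public_key = requests.get(f"{BASE}/api/v1/provenance/public-key", headers=headers)
```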
Configuration Flow:
- User enters credentials (LLM + MCP) → Validated by backend
- Credentials encrypted using Fernet → Stored per-user in `tda_auth.db` (see the sketch after this list)
- MCP/LLM profiles created → Associated with user account
- Configuration persists across sessions (user-specific)
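The Fernet step can be illustrated directly with the `cryptography` library; key generation and storage details here are assumptions:

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()                  # per-installation key (management details assumed)
f = Fernet(key)
token = f.encrypt(b"sk-example-llm-api-key") # ciphertext safe to persist in tda_auth.db
assert f.decrypt(token) == b"sk-example-llm-api-key"
```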
Query Execution Flow:
- User query → Backend authenticates JWT → Creates/loads session
- Backend invokes Fusion Optimizer with context (conversation history + workflow summaries)
- Optimizer generates strategic plan → Executes via LLM + MCP tools
- Results streamed via SSE → UI updates in real-time
- Session persisted with turn history and summaries
Session Isolation:
- Each user identified by database user ID (from JWT token)
- Sessions stored in `tda_sessions/{session_id}/` with conversation and workflow history
- User credentials isolated in encrypted database storage
- Multi-user support: Multiple users can access simultaneously with separate sessions
Deployment Architectures:
Single-User (Development):
Local Machine → Python Process → localhost:5050
Multi-User (Production):
Option 1: Load Balancer → Multiple Container Instances (port 5050, 5051, 5052...)
Security Considerations:
- Credentials: LLM/MCP credentials never logged or exposed in UI
- Isolation: Session data segregated by user UUID
- Transport: HTTPS recommended for production (configure via reverse proxy)
- Python 3.8+ and `pip`.
- Access to a running MCP Server.
- An API Key from a supported LLM provider or a local Ollama installation. The initial validated providers are Google, Anthropic, Amazon Web Services (AWS), Friendli.AI, and Ollama.
  - You can obtain a Gemini API key from the Google AI Studio.
  - You can obtain a Claude API key from the Anthropic Console.
  - For Azure, you will need an Azure OpenAI Endpoint, API Key, API Version, and a Model Deployment Name.
  - For AWS, you will need an AWS Access Key ID, Secret Access Key, and the Region for your Bedrock service.
  - You can obtain a Friendli.AI API key from the Friendli Suite.
  - For Ollama, download and install it from ollama.com and pull a model (e.g., `ollama run llama2`).
- Clone the repository and enter the project directory:
  ```bash
  git clone https://github.com/rgeissen/uderia.git
  cd uderia
  ```
It is highly recommended to use a Python virtual environment.
Option A: Using Python venv
- Create and activate a virtual environment:
  ```bash
  # For macOS/Linux
  python3 -m venv venv
  source venv/bin/activate

  # For Windows
  python -m venv venv
  .\venv\Scripts\activate
  ```
- Install the required packages:
  ```bash
  pip install -r requirements.txt
  ```
Option B: Using Conda (Recommended for consistent environments)
- Create and activate a conda environment:
  ```bash
  conda create -n tda python=3.13
  conda activate tda
  ```
- Install the required packages:
  ```bash
  pip install -r requirements.txt
  ```
⚠️ CRITICAL SECURITY STEPS
The application requires two secret keys for secure operation:
The application uses SECRET_KEY for session management (cookies, CSRF protection). This must be set before starting the application.
```bash
# Generate a secure SECRET_KEY and create the .env file
python -c "import secrets; key=secrets.token_urlsafe(32); open('.env', 'w').write(f'SECRET_KEY={key}\n'); print('✓ Created .env with SECRET_KEY')"
```
Alternatively, manually create `.env` in the project root:
```
SECRET_KEY=your_random_secret_key_here_make_it_long_and_random
```
Note: Without this file, the application will not start and will show: `ValueError: SECRET_KEY is not set`
The application ships with a default JWT secret key for user authentication tokens. You must regenerate this key for your installation.
```bash
python maintenance/regenerate_jwt_secret.py
```
This will:
- Generate a new unique JWT secret key for your installation
- Save it to `tda_keys/jwt_secret.key` with secure permissions (600)
Note: If you skip this step, your installation will use the default key, which is a security risk.
In the project's root directory, create a new file named `pyproject.toml`. This file is essential for Python to recognize the project structure.
Copy and paste the following content into `pyproject.toml`:
```toml
[project]
name = "uderia"
version = "0.1.0"
requires-python = ">=3.8"

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[tool.setuptools.packages.find]
where = ["src"]
```
This crucial step links your source code to your Python environment, resolving all import paths. Run this command from the project's root directory.
```bash
pip install -e .
```
The `-e` flag stands for "editable": any changes you make to the source code take effect immediately without reinstalling.
The application uses a bootstrap configuration system with tda_config.json as a read-only template. This file provides default profiles, MCP servers, and LLM configurations that are copied to each user on their first login.
Understanding the Bootstrap System:
- tda_config.json - Read-only template containing default configurations
- First Login - Configuration is copied from template to user's database record
- Per-User Storage - All subsequent changes are stored in the user's database (isolated from other users)
- Future Users - New users automatically receive the current template configuration
- Admin Customization - Administrators can modify `tda_config.json` to customize defaults for future users
Default Bootstrap Configuration:
The template includes:
- 2 Profiles: Google "Reduced Stack" (GOGET, default) and Friendly AI "Reduced Stack" (FRGOT)
- 1 MCP Server: Teradata MCP (requires configuration of host/port)
- 6 LLM Configurations: Google, Anthropic, OpenAI, Azure, AWS Bedrock, Friendli.AI (require API keys)
- 30 Tools and 1 Prompt enabled by default in profiles
Customizing the Bootstrap (Optional):
Before starting the application for the first time, you can customize tda_config.json to pre-configure settings for all future users:
- Edit `tda_config.json` in the project root
- Modify MCP Servers - Add your production MCP server connection details
- Adjust Profiles - Change default profiles, tools, or prompts
- Set LLM Defaults - Pre-configure LLM provider settings (API keys should still be entered per-user)
- Changes to `tda_config.json` only affect new users created after the modification
- Existing users retain their database-stored configuration (not affected by template changes)
- Each user's configuration is completely isolated - changes by one user don't affect others
- The application never modifies `tda_config.json` - it remains a read-only template
The application uses a multi-user authentication system with JWT tokens and encrypted credential storage. Authentication is always required for all users.
Run the application:
```bash
python -m trusted_data_agent.main
```
The application will:
- Automatically create `tda_auth.db` (SQLite database with encrypted credentials)
- Initialize the default admin account: `admin`/`admin` (⚠️ change immediately!)
- Start the web server on `http://localhost:5050`
First steps after startup:
- Open your browser to `http://localhost:5050`
- Login with the default credentials (`admin`/`admin`). ⚠️ IMPORTANT: Immediately change the admin password in the Administration panel
- Bootstrap Applied - On first login, your account receives the template configuration from `tda_config.json`
- Configure Setup - Complete your setup in the Setup panel:
  - Add API keys for LLM providers
  - Configure MCP server connection details (host, port, path)
  - Enable/disable profiles as needed
- Create User Accounts - Admin users can create additional users (each receives the bootstrap configuration)
Authentication Features:
- ✅ JWT tokens (24-hour expiry) for web UI sessions
- ✅ Long-lived access tokens for REST API automation
- ✅ Per-user credential encryption using Fernet
- ✅ User tiers: `user`, `developer`, `admin`
- ✅ Soft-delete audit trail for revoked tokens
- ✅ Session management with persistent context
- ✅ Bootstrap configuration copied to each user on first login
- ✅ Consumption profile enforcement with granular usage quotas
- ℹ️ Rate limiting disabled by default (configurable in Administration → App Config)
The Uderia Platform includes a comprehensive consumption profile enforcement system that provides granular control over resource usage across different user tiers and deployment scenarios.
Consumption profiles enable administrators to:
- Set per-user rate limits on prompts (hourly and daily)
- Enforce monthly token quotas (input and output tokens tracked separately)
- Control configuration change frequency
- Test profiles before activation
- Override profiles with global emergency limits
- Track usage in real-time with detailed audit trails
Four consumption profiles are available out-of-the-box:
| Profile | Prompts/Hour | Prompts/Day | Input Tokens/Month | Output Tokens/Month | Config Changes/Hour |
|---|---|---|---|---|---|
| Free | 50 | 500 | 100,000 | 50,000 | 5 |
| Pro | 200 | 2,000 | 500,000 | 250,000 | 10 |
| Enterprise | 500 | 5,000 | 2,000,000 | 1,000,000 | 20 |
| Unlimited | 1,000 | 10,000 | ∞ | ∞ | 50 |
By default, new users receive the Unlimited profile. Administrators can change the default profile or assign specific profiles to individual users.
Administrators can:
- Create and configure custom consumption profiles
- Assign profiles to specific users
- Activate/deactivate profiles for testing without deleting them
- View real-time consumption statistics per user
- Set global override mode for emergency rate limiting
Profile Testing: Each profile includes a "Toggle Active for Consumption" button that:
- Temporarily activates the profile for testing
- Classifies and validates profile configuration
- Shows real-time enforcement without affecting other users
- Allows safe testing before production deployment
Access rate limiting controls through Administration → App Config → Security & Rate Limiting:
Enable Rate Limiting:
- Master switch for all consumption enforcement
- Must be enabled for profiles to work
- Disabled by default for single-user installations
Global Override Mode:
- Emergency toggle to override ALL user profiles
- Forces global limits on all users (including Enterprise/Unlimited)
- Per-user fallback limits applied when profiles aren't assigned
- Useful for system-wide capacity management
Per-User Limits (when Global Override is enabled):
- Prompts per Hour (default: 100)
- Prompts per Day (default: 1,000)
- Configuration Changes per Hour (default: 10)
Per-IP Limits (always enforced for anonymous traffic):
- Login attempts per minute
- Registrations per hour
- API calls per minute
For authenticated users:
- Admin users bypass all consumption limits
- Regular users with profiles assigned → profile limits enforced
- Users without profiles → default profile or global limits applied
- Global override enabled → overrides all profiles with global settings
Error handling:
- Clear error messages when limits are exceeded
- Retry-after information in responses
- Fail-open design: allows execution if enforcement check fails
- Full audit trail in logs and database
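A minimal sketch of the fail-open decision described above; the function name and the counter lookup are assumptions:

```python
from typing import Optional

def allow_prompt(used_last_hour: Optional[int], prompts_per_hour: int) -> bool:
    """If the usage counter cannot be read, allow execution rather than
    blocking the user (the documented fail-open design)."""
    if used_last_hour is None:      # enforcement check failed
        return True                 # fail-open by design
    return used_last_hour < prompts_per_hour

assert allow_prompt(None, 50)       # counter unavailable -> allowed
assert not allow_prompt(50, 50)     # at the Free-tier hourly limit -> blocked
```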
The system maintains detailed consumption records:
- Per-turn tracking - Individual prompt costs and token usage
- Session aggregation - Cumulative costs per conversation
- User summaries - Total consumption across all sessions
- Historical trends - Month-over-month usage analytics
- Audit trail - Complete record of all consumption events
View consumption details through:
- Dashboard - Real-time cost and usage overview
- REST API - Programmatic access to consumption data
- Database exports - Complete audit trail for compliance
For single-user installations:
- Leave rate limiting disabled (default)
- Use Unlimited profile for maximum flexibility
For team deployments:
- Enable rate limiting
- Assign profiles based on user roles
- Use Global Override for emergency capacity management
- Monitor consumption trends through Dashboard
For testing:
- Use "Toggle Active for Consumption" to test profiles
- Set low limits (1 prompt/hour) to verify enforcement
- Check logs for detailed enforcement flow
- Test with non-admin users for accurate results
For production:
- Enable rate limiting before go-live
- Set appropriate default profile for new users
- Monitor consumption patterns and adjust profiles
- Use token quotas to manage monthly costs
- Review audit logs periodically for anomalies
Supported LLM Providers:
- AWS Bedrock (requires: Access Key, Secret Key, Region)
- Anthropic Claude (requires: API Key)
- OpenAI (requires: API Key)
- Google Gemini (requires: API Key)
- Azure OpenAI (requires: Endpoint, API Key, Deployment Name)
- Friendli.AI (requires: API Key)
- Ollama (local - requires: Ollama installation)
Important: All commands must be run from the project's root directory.
For standard operation:
python -m trusted_data_agent.main
When adding or editing LLM configurations, you can switch between Recommended and All models using the toggle in the model selection UI:
- Recommended (default): Shows only tested and recommended models enabled for selection. Non-recommended models are visible but disabled.
- All: Enables all available models from the provider for selection.
Note: Ollama models are community-maintained. When connecting to Ollama, you can use the "All" toggle to select any locally installed model.
The Uderia Platform supports several command-line options for different deployment scenarios and operational modes:
python -m trusted_data_agent.main [OPTIONS]

| Option | Description | Default |
|---|---|---|
| --host | Host address to bind the server to. Use 0.0.0.0 for Docker deployments. | 127.0.0.1 |
| --port | Port to bind the server to. | 5050 |
| --nogitcall | Disable GitHub API calls to fetch repository star count. | Enabled |
| --offline | Use cached HuggingFace models only (skip remote version checks). Useful when internet is slow or unavailable. | Disabled |
Standard production deployment:
python -m trusted_data_agent.main
Offline mode (use cached models):
python -m trusted_data_agent.main --offline
Docker deployment:
python -m trusted_data_agent.main --host 0.0.0.0 --port 5050
Combined options:
python -m trusted_data_agent.main --offline --nogitcall
The --offline flag is particularly useful when:
- Internet connection is slow or unreliable
- HuggingFace model downloads are timing out
- Working in air-gapped or restricted network environments
- Models are already cached from previous installations
Note: The offline mode requires that HuggingFace models have been previously downloaded to the cache directory (~/.cache/huggingface/hub/). First-time installations will need internet access to download required models.
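If you are unsure whether the required models are already cached, the huggingface_hub package (if installed in your environment) can list the contents of the cache directory. A minimal sketch:
from huggingface_hub import scan_cache_dir

# Scans ~/.cache/huggingface/hub/ by default and lists the cached model repos
cache = scan_cache_dir()
for repo in cache.repos:
    print(f"{repo.repo_id}: {repo.size_on_disk / 1e6:.1f} MB on disk")
If a required model is missing, start the platform once with internet access (without --offline) so it can be downloaded and cached.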
This comprehensive guide covers everything you need to know to use the Uderia Platform effectively, from basic operations to advanced features and automation.
- MCP Server Connection - Where your data and tools live
- LLM Provider Configuration - The AI model that powers the agent
- Profile Creation - Combines MCP + LLM into a usable configuration
Without these configurations, the "Start Conversation" button will remain disabled.
The Uderia Platform uses a modern, modular configuration system that separates infrastructure (MCP Servers, LLM Providers) from usage patterns (Profiles). This architecture provides maximum flexibility for different use cases.
Configuration Flow: MCP Servers → LLM Providers → Profiles → Start Conversation
- Login: Navigate to http://localhost:5050 and log in with your credentials.
- Navigate to Setup: Click on the Setup panel in the left sidebar (person icon).
- MCP Servers Tab: Select the "MCP Servers" tab and configure one or more MCP Server connections:
  - Name: A friendly identifier for this server (e.g., "Production Database", "Dev Environment")
  - Host: The hostname or IP address of your MCP Server
  - Port: The port number (e.g., 8888)
  - Path: The endpoint path (e.g., /mcp)
- Save: Click "Add MCP Server" to save. You can configure multiple servers for different environments.
Instead of manually configuring servers, you can import MCP server definitions from standardized configuration files. Uderia supports two formats with automatic detection:
Supported Formats:
- MCP Registry Format - Official MCP server specification
- Claude Desktop Format - Import servers from your Claude Desktop configuration
How to Import:
- Navigate to Setup → MCP Servers
- Click the "📥 Import from server.json" button
- Paste your server configuration JSON
- Click "Import Server(s)"
The system automatically detects the format and imports all server configurations.
Format 1: MCP Registry Specification
The official MCP registry format supports both HTTP/SSE and stdio transports:
{
"name": "io.example/my-server",
"version": "1.0.0",
"description": "Example MCP server",
"packages": [
{
"transport": {
"type": "sse",
"url": "http://localhost:8000/sse"
}
}
]
}

stdio Transport Example:
{
"name": "time-server",
"version": "1.0.0",
"packages": [
{
"transport": {
"type": "stdio",
"command": "uvx",
"args": ["mcp-server-time", "--local-timezone=America/New_York"],
"env": {}
}
}
]
}

Format 2: Claude Desktop Configuration
Import servers directly from your claude_desktop_config.json:
{
"mcpServers": {
"time": {
"command": "uvx",
"args": ["mcp-server-time", "--local-timezone=America/New_York"],
"env": {}
},
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "/Users/username/Documents"],
"env": {}
}
}
}

MCP servers connect using three transport protocols, each designed for specific deployment scenarios:
┌─────────────────────────────────────────────────────────────────────────────────┐
│ MCP TRANSPORT TYPES │
├─────────────────────────────────────────────────────────────────────────────────┤
│ │
│ 🟠 STDIO 🔵 HTTP 🟢 SSE │
│ ───────── ────── ───── │
│ Local Network Streaming │
│ Development Production Real-time │
│ Subprocess REST API Events │
│ │
│ "Run locally" "Connect to "Stream live │
│ remote server" updates" │
│ │
└─────────────────────────────────────────────────────────────────────────────────┘
Best for: Development, local databases, file systems
{
"transport": {
"type": "stdio",
"command": "uvx",
"args": ["mcp-server-time", "--local-timezone=America/New_York"],
"env": {}
}
}

Characteristics:
- Runs as a local subprocess (npx, uvx, python, node)
- Automatic process lifecycle management (spawn/cleanup)
- No network required - communicates via stdin/stdout
- Ideal for development and local tooling
Common Use Cases:
- Local file system access (@modelcontextprotocol/server-filesystem)
- Development databases
- Local AI model servers (Ollama integrations)
- CLI tool wrappers
Best for: Cloud services, remote databases, microservices
{
"transport": {
"type": "http",
"url": "https://mcp.example.com/api"
}
}

Characteristics:
- Standard HTTP/HTTPS connections
- Firewall-friendly (uses standard web ports)
- Supports authentication headers
- Production-ready with load balancing support
Common Use Cases:
- Cloud-hosted databases (Teradata, Snowflake, BigQuery)
- Enterprise APIs with authentication
- Microservices architecture
- Production deployments behind reverse proxy
Best for: Live data feeds, real-time updates
{
"transport": {
"type": "sse",
"url": "http://localhost:8000/sse"
}
}

Characteristics:
- Unidirectional streaming from server to client
- Persistent connections for real-time updates
- Lower latency than polling
- Built-in reconnection handling
Common Use Cases:
- Real-time monitoring dashboards
- Live data feeds (stock prices, metrics)
- Event-driven architectures
- Long-running operations with progress updates
| Scenario | Recommended Transport |
|---|---|
| Local development | 🟠 STDIO |
| Testing MCP servers | 🟠 STDIO |
| Production cloud database | 🔵 HTTP |
| Enterprise deployment | 🔵 HTTP |
| Real-time monitoring | 🟢 SSE |
| Live data streaming | 🟢 SSE |
Import Benefits:
- Bulk import multiple servers at once (Claude Desktop format)
- Format auto-detection - no need to specify which format you're using (see the detection sketch below)
- Duplicate detection - warns if server already exists
- Validation - checks for required fields before import
- Server-side ID generation - ensures unique server identifiers
Tip: You can find community MCP servers at the MCP Registry and import them directly.
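The detection rule itself hinges on the top-level keys: the Claude Desktop format wraps everything in an mcpServers object, while the MCP Registry format carries top-level name and packages fields. The sketch below illustrates the idea; the platform's actual validation is more thorough:
import json

def detect_mcp_format(raw: str) -> str:
    """Distinguish the two supported import formats by their top-level keys."""
    data = json.loads(raw)
    if "mcpServers" in data:
        return "claude-desktop"        # {"mcpServers": {"<name>": {...}}}
    if "name" in data and "packages" in data:
        return "mcp-registry"          # {"name": ..., "packages": [{"transport": ...}]}
    raise ValueError("Unrecognized MCP server configuration format")

# The example configurations shown earlier classify as expected:
assert detect_mcp_format('{"name": "io.example/my-server", "packages": []}') == "mcp-registry"
assert detect_mcp_format('{"mcpServers": {"time": {"command": "uvx"}}}') == "claude-desktop"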
- LLM Providers Tab: Configure one or more LLM provider connections:
  - Name: A descriptive name (e.g., "Google Gemini 2.0", "Claude Sonnet")
  - Provider: Select from Google, Anthropic, OpenAI, Azure, AWS Bedrock, Friendli.AI, or Ollama
  - Model: Choose a specific model from the provider
  - Credentials: Enter required authentication details:
    - Cloud providers: API Key
    - Azure: Endpoint URL, API Key, API Version, Deployment Name
    - AWS: Access Key ID, Secret Access Key, Region
    - Ollama: Host URL (e.g., http://localhost:11434)
- Fetch Models: Click the refresh icon to retrieve available models from your provider.
- Save: Click "Add LLM Configuration" to save. You can configure multiple LLM providers to compare performance.
Profiles combine an MCP Server with an LLM Provider to create named configurations for different use cases.
- Profiles Tab: After configuring at least one MCP Server and one LLM Provider, create profiles:
  - Profile Name: Descriptive name (e.g., "Production Analysis", "Cost-Optimized Research")
  - Tag: Short identifier for quick selection (e.g., "PROD", "COST") - used for temporary overrides
  - MCP Server: Select which MCP Server this profile uses
  - LLM Provider: Select which LLM configuration this profile uses
  - Description: Optional details about when to use this profile
  - Set as Default: Mark one profile as the default for all queries
- Active for Consumption: Toggle which profiles are available for temporary override selection (see below).
- Save: Click "Add Profile" to create the profile.
- Navigate back to the Conversations panel using the left sidebar (chat icon).
- Click the "Start Conversation" button to activate your default profile.
- The application will:
  - Validate your MCP Server connection
  - Authenticate with your LLM provider
  - Load all available tools, prompts, and resources from the MCP Server
  - Display them in the Capabilities Panel at the top
  - Enable the chat input for you to start asking questions
✅ You're Ready! Once you see the capabilities loaded and the chat input is active, you can start interacting with your data through natural language queries.
Example First Query: "What databases are available?" or "Show me all tables in the DEMO_DB database"
The application provides a multi-panel interface accessible through the left sidebar navigation. Click the hamburger menu (☰) in the top-left to expand/collapse the sidebar.
Available Panels:
- Conversations - Main conversational interface with the agent
- Executions - Real-time dashboard for monitoring all agent tasks
- Intelligence - Manage knowledge base collections and Planner Repository Constructors
- Marketplace - Browse and install AI assets from the community
- Setup - Configure LLM providers, MCP Servers, and profiles
- Administration - User management and system settings (admin only)
Customize the application's visual appearance to suit your preferences or working environment. Access theme settings from the user dropdown menu in the top-right corner.
Available Themes:
- Legacy Theme: Classic dark gray color palette with warm tones. Best for users familiar with the original interface.
- Modern Theme: Contemporary dark blue/slate design with glass-panel effects and refined depth. The default theme for new installations.
- Light Theme: Clean, professional light mode with high contrast for well-lit environments or accessibility preferences.
Theme preferences are saved per user and persist across sessions.
When you select the Conversations panel, the interface is organized into several key areas:
- Session History (Left): Lists all your conversation sessions. Click to switch between sessions or start a new conversation with the "+" button.
- Capabilities Panel (Top): Your library of available actions from the connected MCP Server, organized into tabs:
  - Tools: Single-action functions the agent can call (e.g., base_tableList)
  - Prompts: Pre-defined, multi-step workflows the agent can execute (e.g., qlty_tableQualityReport)
  - Resources: Other available assets from the MCP Server
- Chat Window (Center): Where your conversation with the agent appears, showing both user queries and agent responses.
- Chat Input (Bottom): Type your questions in natural language here. Supports profile overrides and voice input.
- Live Status Panel (Right): The transparency window showing real-time logs of the agent's internal reasoning, tool executions, and raw data responses.
The Executions panel provides a comprehensive, real-time dashboard for monitoring all agent workloads across the application:
- Task List: View all running, completed, and failed tasks with their status, duration, and resource usage
- Real-time Updates: Tasks automatically update as they progress through stages (planning, execution, synthesis)
- Detailed Execution View: Click any task to see its full execution log, including:
- Agent reasoning and planning steps
- Tool invocations and responses
- Error messages and stack traces
- Token usage and cost estimates
- Task Control: Cancel running tasks or retry failed executions
- Cross-Source Monitoring: Track tasks initiated from the UI, REST API, or scheduled workflows
This panel is especially valuable for monitoring REST API-triggered workloads and debugging complex agent behaviors.
The Intelligence panel is your control center for managing both types of repositories in the system:
- Planner Repositories:
  - Execution strategies and planning patterns from successful agent interactions
  - View, create, update, or delete collections
  - Inspect individual execution traces and their embeddings
  - Bulk import/export collection data
  - Automatically populated from agent executions or via Planner Repository Constructors
- Knowledge Repositories:
  - General documents and reference materials for planning context
  - Upload PDF, TXT, DOCX, MD files with configurable chunking strategies
  - View document metadata, chunk counts, and storage details
  - Delete documents or entire collections
  - Search within Knowledge repositories using semantic similarity
  - Fully integrated with the planner for domain context retrieval
- Planner Repository Constructors:
  - Browse installed constructors (templates for building Planner Repositories)
  - Configure constructor parameters and populate collections
  - LLM-assisted auto-generation from database schemas or documentation
  - View usage statistics and manage constructor lifecycle
  - Enable/disable specific constructors
- Content Operations:
  - Generate contextual questions for documents
  - Populate collections with new content via manual or automated workflows
  - Provide feedback on RAG retrieval quality
  - Preview document chunking before committing
  - Clean orphaned or invalid entries
For detailed RAG workflows and maintenance procedures, see the RAG Maintenance Guide.
The Intelligence Marketplace enables discovery and sharing of AI assets (Planner Repositories, Knowledge Repositories, Agent Packs, Skills, Extensions, and Knowledge Graphs). Browse public collections, subscribe for reference-based access, fork for customization, rate and review, and publish your own expertise. The marketplace transforms isolated knowledge into a collaborative ecosystem that reduces costs and improves quality through community validation.
For detailed marketplace features, product types, and workflows, see the Intelligence Marketplace section in Core Components.
The Setup panel is where you configure all external connections and create profiles. This is typically the first panel you'll use when setting up the application:
- MCP Servers Tab:
  - Configure Model Context Protocol server connections
  - Test server connectivity and capability discovery
  - Manage server-specific settings and parameters
- LLM Providers Tab:
  - Add connections to Google, Anthropic, OpenAI, Azure, AWS Bedrock, Friendli.AI, or Ollama
  - Configure API keys, endpoints, and authentication
  - Test model availability and fetch model lists
  - Compare multiple providers side-by-side
- Profiles Tab:
  - Create named profiles combining MCP servers with LLM providers
  - Set default profiles and configure profile tags for quick switching
  - Enable/disable profiles for temporary override selection
  - Manage profile-specific settings and descriptions
- Advanced Settings Tab:
  - Access Token Management: Create, view, and revoke long-lived API tokens for REST API automation
  - Token Security: Tokens are shown only once at creation and stored as SHA256 hashes
  - Usage Tracking: Monitor token usage with last used timestamps, use counts, and IP addresses
  - Audit Trail: Revoked tokens remain visible with revocation dates for compliance and forensics
  - Charting Configuration: Toggle chart rendering on/off and configure charting intensity levels
All credential data is encrypted using Fernet encryption and stored securely in the user database.
The Administration panel provides system-level management capabilities (visible only to admin users):
- User Management:
  - Create, modify, and deactivate user accounts
  - Assign roles and permissions
  - Monitor user activity and session history
  - Reset passwords and manage authentication
- System Configuration:
  - Configure application-wide settings
  - Manage logging levels and retention
  - Set resource limits and quotas
  - Monitor system health and performance
- Audit Logs:
  - View all system activities and changes
  - Track API usage and access patterns
  - Export audit data for compliance reporting
- Sidebar Toggle: Click the hamburger menu (☰) or use keyboard shortcut to expand/collapse the navigation sidebar
- Panel Switching: Click any panel name in the sidebar to instantly switch views
- Multi-Panel Workflow: Open the Setup panel to configure connections, then switch to Conversations to start chatting
- Monitoring While Working: Keep the Executions panel open in a separate browser tab to monitor long-running tasks
- Keyboard Shortcuts: Use Ctrl/Cmd + Number to jump directly to panels (where supported)
Example Workflow:
- Setup → Configure LLM providers and MCP servers → Create profiles
- Marketplace → Browse and install AI assets for your domain
- Intelligence → Populate knowledge collections with your knowledge base
- Conversations → Start chatting with the agent using your enriched context
- Executions → Monitor task progress and review execution logs
Simply type your request into the chat input at the bottom of the Conversations panel and press Enter.
- Example:
"What tables are in the DEMO_DB database?"
The agent will analyze your request using your default profile, display its thought process in the Live Status panel, execute the necessary tool (e.g., base_tableList), and then present the final answer in the chat window.
The profile system allows you to temporarily switch to a different LLM provider for a single query without changing your default configuration. This is powerful for:
- Testing: Compare how different LLMs handle the same question
- Cost Optimization: Use cheaper models for simple queries, premium models for complex analysis
- Specialized Tasks: Route specific query types to models optimized for that domain
How to Use Profile Override:
- Type @ in the question box - A dropdown appears showing all profiles marked as "Active for Consumption"
- Select a profile - The default profile appears first (non-selectable), followed by available alternatives:
  - Use arrow keys to navigate
  - Press Tab or Enter to select
  - Or click on a profile
- Profile badge appears - A colored badge shows the active override profile with the provider color
- Type your question - The query will be executed using the selected profile's LLM provider
- Remove override - Click the × on the badge or press Backspace (when input is empty) to revert to the default profile
Visual Indicators:
- Question Box Badge: Shows the temporary override profile with provider-specific colors
- Session Header: Displays both the default profile (★ icon) and override profile (⚡ icon with subtle animation)
- Color Coding: Each LLM provider has a distinct color (Google=blue, Anthropic=purple, OpenAI=green, etc.)
Example Workflow:
1. Default: "Google Gemini 2.0" profile
2. Type: @CLAUDE <Tab>
3. Badge shows: @CLAUDE (purple)
4. Ask: "Analyze the performance metrics"
5. Query uses Claude instead of Gemini
6. Click × to return to default
You can directly trigger a multi-step workflow without typing a complex request.
- Go to the Capabilities Panel and click the "Prompts" tab.
- Browse the categories and find the prompt you want to run (e.g., base_tableBusinessDesc).
- Click on the prompt. A modal will appear asking for the required arguments (e.g., db_name, table_name).
- Fill in the arguments and click "Run Prompt".
The agent will execute the entire workflow and present a structured report.
You can change how the agent thinks and behaves by editing its core instructions (available in the Conversations panel).
- Click the "System Prompt" button in the conversation header.
- The editor modal will appear, showing the current set of instructions for the selected model.
- You can make any changes you want to the text.
- Click "Save" to apply your changes. The agent will use your new instructions for all subsequent requests in the session.
- Click "Reset to Default" to revert to the original, certified prompt for that model.
To test the raw intelligence of a model without the agent's tool-using logic, you can use the direct chat feature (available in the Conversations panel).
- Click the "Chat" button in the conversation header.
- A modal will appear, allowing you to have a direct, tool-less conversation with the currently configured LLM. This is useful for evaluating a model's baseline knowledge or creative capabilities.
The Uderia Platform provides a sophisticated, budget-aware Context Window Orchestrator that automatically manages every token sent to the LLM — plus manual controls for fine-grained user intervention. Together, these features help you optimize costs, maximize accuracy, and maintain full control over agent behavior.
The orchestrator runs a five-pass assembly pipeline on every LLM call, ensuring optimal token utilization without manual intervention (a simplified sketch follows the table):
| Pass | Name | What It Does |
|---|---|---|
| 1 | Resolve | Determines which of the 9 context modules are active based on your profile type and context window type |
| 2 | Dynamic Adjustments | Applies runtime rules — e.g., allocate full budget to tools on the first turn, transfer budget away from documents when none are attached |
| 3 | Allocate & Assemble | Distributes token budgets per module (with min/max constraints) and calls each module to contribute content |
| 3b | Surplus Reallocation | Redistributes unused budget from low-utilization modules to high-demand ones |
| 4 | Condense | If still over budget, compresses lowest-priority modules first (e.g., summarize conversation history, reduce tool definitions to names-only) |
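The simplified sketch below models passes 3, 3b, and 4: weighted allocation under min/max constraints, surplus redistribution, and priority-ordered condensation. Module names mirror the table that follows; the weights, minimums, and data layout are illustrative assumptions, not the orchestrator's real parameters:
def assemble_context(modules, total_budget):
    """modules: dicts with name, weight, min_tokens, demand (tokens), and
    priority (lower values are condensed first)."""
    total_weight = sum(m["weight"] for m in modules)

    # Pass 3: weighted allocation, respecting each module's minimum, capped at demand
    for m in modules:
        share = int(total_budget * m["weight"] / total_weight)
        m["granted"] = min(m["demand"], max(share, m["min_tokens"]))

    # Pass 3b: surplus reallocation - leftover budget flows to under-served modules
    surplus = total_budget - sum(m["granted"] for m in modules)
    for m in sorted(modules, key=lambda m: m["demand"] - m["granted"], reverse=True):
        if surplus <= 0:
            break
        extra = min(surplus, m["demand"] - m["granted"])
        m["granted"] += extra
        surplus -= extra

    # Pass 4: if the minimums pushed the total over budget, condense the
    # lowest-priority modules first (summarize history, names-only tool defs, ...)
    overrun = sum(m["granted"] for m in modules) - total_budget
    for m in sorted(modules, key=lambda m: m["priority"]):
        if overrun <= 0:
            break
        cut = min(overrun, m["granted"] - m["min_tokens"])
        m["granted"] -= cut
        overrun -= cut

    return {m["name"]: m["granted"] for m in modules}

print(assemble_context([
    {"name": "system_prompt", "weight": 1, "min_tokens": 800, "demand": 800, "priority": 9},
    {"name": "tool_definitions", "weight": 2, "min_tokens": 500, "demand": 4000, "priority": 1},
    {"name": "conversation_history", "weight": 3, "min_tokens": 0, "demand": 6000, "priority": 2},
], total_budget=6000))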
9 Pluggable Context Modules:
| Module | Purpose |
|---|---|
| System Prompt | LLM behavioral instructions |
| Tool Definitions | MCP tool schemas (auto-condensed after first turn) |
| Conversation History | Budget-aware sliding window of chat history |
| RAG Context | Retrieved champion cases for the planner |
| Knowledge Context | Knowledge repository documents |
| Plan Hydration | Previous turn results injection (skip redundant tool calls) |
| Document Context | User-uploaded document text |
| Component Instructions | Chart/canvas rendering instructions |
| Workflow History | Cross-turn execution traces |
Context Window Types — Choose a preset or create your own via Setup → Configuration → Context Window:
| Type | Strategy |
|---|---|
| Balanced (default) | Even distribution across all active modules |
| Knowledge-Heavy | Prioritizes knowledge retrieval for document-intensive workflows |
| Conversation-First | Maximizes conversation history for multi-turn dialogues |
| Token-Efficient | Minimal viable context for cost-sensitive workloads |
The admin UI provides a live budget visualization bar, condensation order editor (drag to reorder compression priorities), and a dynamic adjustment rule builder for creating custom runtime rules.
Utilization Analytics — Navigate to Setup → Configuration → Context Window → Utilization Analytics, select a session, and see:
- Summary metrics (average/peak utilization, condensation events, turns analyzed)
- Per-turn stacked bar chart showing module-level token consumption
- Module utilization table with condensation frequency
- Dynamic adjustment firing log
Every LLM call emits a context window snapshot visible in the Live Status panel as a color-coded stacked bar chart, providing real-time observability into exactly how your token budget is being spent.
For full technical details, see the Context Window Architecture documentation.
The agent's "memory" is composed of three synchronized histories managed by the orchestrator:
- LLM Conversation History (chat_object): The raw, turn-by-turn dialogue between you and the agent. The conversation_history module manages this within its allocated token budget, applying a sliding window when the conversation exceeds its allocation (see the sketch below).
- Chat History (session_history): Used exclusively for rendering the conversation in the user interface. It is not sent to the LLM for context.
- Turn Summaries (workflow_history): A structured summary of the agent's actions. For each turn, it includes the plan, tools executed, results, and the context window snapshot. The workflow_history module manages this within its budget allocation.
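To make the sliding window in item 1 concrete, the sketch below keeps the most recent active turns that fit a token budget, estimating sizes with tiktoken (which the platform also uses for BPE estimation); the message structure is an illustrative assumption:
import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")

def sliding_window(messages: list, budget: int) -> list:
    """Keep the most recent active turns that fit within the token budget."""
    kept, used = [], 0
    for msg in reversed(messages):                 # walk from the newest turn backwards
        if not msg.get("isValid", True):           # deactivated turns are excluded entirely
            continue
        cost = len(ENC.encode(msg["content"]))
        if used + cost > budget:
            break                                  # older turns fall out of the window
        kept.append(msg)
        used += cost
    return list(reversed(kept))                    # restore chronological order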
You have several ways to control the agent's context directly from the UI:
You can activate or deactivate the context of any individual turn by clicking on the small numbered badge that appears on the user ("U") and assistant ("A") avatars.
- Clicking a badge will toggle the isValid status for that entire turn.
- Inactive turns are visually dimmed and their conversational history is completely excluded from the context sent to the LLM. This is a powerful way to surgically remove parts of the conversation that might be confusing the agent.
You can deactivate all previous turns at once by clicking the Context Indicator dot in the main header (next to the "MCP" and "LLM" indicators).
- Clicking the dot will prompt you for confirmation.
- Clicking the dot will prompt you for confirmation.
- Upon confirmation, all past turns in the session will be marked as inactive (isValid = false), and the indicator will blink white three times. This effectively resets the agent's conversational and planning memory, forcing it to start fresh from your next query.
You can re-execute the original query for any turn by clicking and holding the assistant ("A") avatar for that turn.
- Press and hold for 1.5 seconds. A circular animation will appear to indicate the action.
- Upon completion, the agent will re-run the original user query for that turn, generating a brand new plan. This is useful for retrying a failed turn or exploring an alternative approach.
The agent provides two primary modes for handling conversational history, allowing you to control the context sent to the LLM for each query. You can see the current mode in the hint text below the chat input bar.
Important Note: In both modes, only turns that are currently active (isValid = true) are included in the context. Deactivated turns are completely ignored by the LLM and the planner.
In this mode, the agent maintains a complete conversational memory. It sends the LLM Conversation History (chat_object) from all active turns with each new request.
- Best for: Conversational queries, follow-up questions, and tasks that require the agent to remember the back-and-forth of the dialogue.
- Impact: Uses more tokens, as the conversation history from all active turns is included in the context.
When activated, this mode disables the LLM Conversation History. The agent becomes conversationally "stateless" but still operates with full knowledge of its past actions from active turns.
- What it sends:
  - The Current User Prompt.
  - The Turn Summaries (workflow_history) from all active turns.
  - The full System Prompt (including all available tools).
- Best for: "One-shot" commands, saving tokens, or preventing a long, complex conversation from confusing the planner.
- How to activate:
  - Hold Alt while sending a message to use it for a single query.
  - Press Shift+Alt to lock the mode on for subsequent queries.
The Uderia Platform includes a powerful, asynchronous REST API to enable programmatic control, automation, and integration into larger enterprise workflows.
This API exposes the core functionalities of the agent, allowing developers to build custom applications, automate complex analytical tasks, and manage the agent's configuration without using the web interface.
Authentication:
The REST API uses Bearer token authentication for all protected endpoints. You have two authentication options:
- JWT Tokens (Web UI): 24-hour tokens automatically issued when you log in to the web interface
- Access Tokens (REST API): Long-lived tokens created in the Advanced Settings panel for automation
Creating Access Tokens:
- Navigate to Setup → Advanced Settings
- Click "Create Token"
- Provide a descriptive name (e.g., "Production Server", "CI/CD Pipeline")
- Set expiration (default: 90 days, or never expires)
- Copy the token immediately - it's shown only once!
- Store securely (e.g., environment variables, secrets manager)
Using Access Tokens:
# Set your token as an environment variable
export TDA_ACCESS_TOKEN="tda_9DqZMBXh-OK4H4F7iI2t3EcGctldT-iX"
# Use with example scripts
./docs/RestAPI/scripts/rest_run_query.sh "$TDA_ACCESS_TOKEN" "What tables exist?"
# Or directly with curl
curl -X POST http://localhost:5050/api/v1/configure \
-H "Authorization: Bearer $TDA_ACCESS_TOKEN" \
-H "Content-Type: application/json" \
-d '{"provider": "Google", "model": "gemini-2.0-flash-exp"}'Token Management:
- View all tokens and usage statistics in Advanced Settings
- Revoke tokens immediately (soft-delete preserves audit trail)
- Monitor token usage: last used timestamp, use count, IP address
- Tokens are SHA256 hashed in the database for security (illustrated in the sketch below)
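The hash-only storage model is straightforward to illustrate. In this sketch (illustrative, not the platform's internal code), the plaintext token exists only at creation time; verification recomputes the digest from whatever the client presents:
import hashlib
import secrets

def create_token() -> tuple:
    token = "tda_" + secrets.token_urlsafe(24)               # shown to the user exactly once
    digest = hashlib.sha256(token.encode()).hexdigest()      # only the digest is persisted
    return token, digest

def verify(presented: str, stored_digest: str) -> bool:
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(candidate, stored_digest)  # constant-time comparison

token, digest = create_token()
assert verify(token, digest)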
Important Notes:
- Example scripts (rest_run_query.sh, rest_check_status.sh) now require an access token as the first argument
- The rest_run_query.sh script can optionally accept a --session-id to run a query in an existing session
- Example scripts support a --verbose flag - by default they output only JSON to stdout
- Asynchronous Architecture: The API is built on a robust, task-based pattern. Long-running queries are handled as background jobs, preventing timeouts and allowing clients to poll for status and retrieve results when ready (see the client sketch below).
- Programmatic Configuration: Automate the entire application setup process. The API provides an endpoint to configure LLM providers, credentials, and the MCP server connection, making it ideal for CI/CD pipelines and scripted deployments.
- Full Agent Functionality: Create sessions and submit natural language queries or execute pre-defined prompts programmatically, receiving the same rich, structured JSON output as the web UI.
For complete technical details, endpoint definitions, and cURL examples, please see the full documentation: REST API Documentation (docs/RestAPI/restAPI.md)
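As a sketch of the task-based pattern, the client below submits a query and polls for the result. It uses the session query endpoint shown later in this guide; the task-status endpoint path and the task_id/state response fields are assumptions made for the example - consult docs/RestAPI/restAPI.md for the authoritative contract:
import os
import time
import requests

BASE = "http://localhost:5050/api/v1"
HEADERS = {"Authorization": f"Bearer {os.environ['TDA_ACCESS_TOKEN']}"}

def run_query(session_id: str, prompt: str, timeout: float = 300.0) -> dict:
    # Submit the long-running query; it is handled as a background job
    resp = requests.post(f"{BASE}/sessions/{session_id}/query",
                         headers=HEADERS, json={"prompt": prompt})
    resp.raise_for_status()
    task_id = resp.json()["task_id"]               # hypothetical response field

    # Poll until the task completes (hypothetical status endpoint)
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = requests.get(f"{BASE}/tasks/{task_id}", headers=HEADERS).json()
        if status.get("state") in ("completed", "failed"):
            return status
        time.sleep(2)
    raise TimeoutError(f"Task {task_id} did not finish within {timeout}s")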
The Uderia Platform's UI serves as a powerful, real-time monitoring tool that provides full visibility into all agent workloads, regardless of whether they are initiated from the user interface or the REST API. This capability is particularly valuable for developers and administrators interacting with the agent programmatically.
When a task is triggered via a REST call, it is not a "black box" operation. Instead, the entire execution workflow is visualized in real-time within the UI's Live Status panel. This provides a granular, step-by-step view of the agent's process, including:
- Planner Activity: See the strategic plan the agent creates to address the request.
- Tool Execution: Watch as the agent executes specific tools and gathers data.
- Response Synthesis: Observe the final phase where the agent synthesizes the gathered information into a coherent answer.
This provides a level of transparency typically not available for REST API interactions, offering a "glass box" view into the agent's operations. The key benefit is that you can trigger a complex workflow through a single API call and then use the UI to visually monitor its progress, understand how it's being executed, and immediately diagnose any issues that may arise. This turns the UI into an essential tool for the development, debugging, and monitoring of any integration with the Uderia Platform.
The Uderia Platform is designed to facilitate a seamless transition from interactive development in the UI to automated, operational workflows via its REST API. This process allows you to build, test, and refine complex data interactions in an intuitive conversational interface and then deploy them as robust, repeatable tasks.
Step 1: Develop and Refine in the UI
The primary development environment is the web-based UI. Here, you can:
- Prototype Workflows: Engage in a dialogue with the agent to build out your desired sequence of actions.
- Test and Debug: Interactively test the agent's understanding and execution of your requests. The real-time feedback and detailed status updates in the UI are invaluable for debugging and refining your prompts.
- Validate Outcomes: Ensure the agent produces the correct results and handles edge cases appropriately before moving to automation.
Step 2: Isolate the Core Workflow Requests
Once you have a conversation that successfully executes your desired workflow, you can identify the key session turns that drive the process. These are the prompts you will use to build your REST API requests. The UI helps you distill a complex interaction into a series of precise, automatable commands.
Step 3: Automate via the REST API
With your workflow defined, you can transition to the REST API for operational use cases. This is done by sending your prompts to the appropriate API endpoint using an access token for authentication.
Create an Access Token:
- Navigate to Setup → Advanced Settings → Access Token Management
- Click "Create Token" and provide a name (e.g., "Production Automation")
- Copy the token immediately (shown only once!)
- Store securely in your environment or secrets manager
- Example curl command:
# Set your access token
export TDA_TOKEN="tda_9DqZMBXh-OK4H4F7iI2t3EcGctldT-iX"
# Execute query in a session
curl -X POST http://localhost:5050/api/v1/sessions/{session_id}/query \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $TDA_TOKEN" \
-d '{
"prompt": "Your refined prompt from the UI"
  }'

This allows you to integrate the Uderia Platform into larger automated systems, CI/CD pipelines, or other applications.
Production Workflow Automation:
For more advanced orchestration and scheduling, the Uderia Platform integrates with multiple workflow automation platforms:
- Apache Airflow: Enterprise batch processing with DAG-based orchestration. Detailed documentation and example DAGs in the Airflow Integration Guide (docs/Airflow/Airflow.md).
- n8n: Visual node-based workflow builder for event-driven automation. Three production-ready templates (Simple Query, Scheduled Reports, Slack Integration) with comprehensive deployment guides in the n8n Integration Guide (docs/n8n/README.md).
- Flowise: Low-code chatbot and workflow development. Pre-built agent flows with visual designer in the Flowise Integration Guide (docs/Flowise/Flowise.md).
Step 4: Real-Time Monitoring of REST-driven Workflows
A key feature of the platform is the ability to monitor REST-initiated tasks in real-time through the UI. When a workflow is triggered via the API, the UI (if viewing the corresponding session) will display:
- The incoming request, flagged with a "Rest Call" tag.
- The complete sequence of agent thoughts, plans, and tool executions as they happen.
- Live status updates, providing the same level of transparency as if you were interacting with the agent directly in the UI.
This hybrid approach gives you the best of both worlds: the automation and scalability of a REST API, combined with the rich, real-time monitoring and debugging capabilities of the interactive UI. It provides crucial visibility into your operationalized data workflows.
- ModuleNotFoundError: This error almost always means you are either (1) not in the project's root directory, or (2) you have not run pip install -e . successfully in your active virtual environment.
- Connection Errors: Double-check all host, port, path, and API key information. Ensure no firewalls are blocking the connection. If you receive an API key error, verify that the key is correct and has permissions for the model you selected.
- "Failed to fetch models": This usually indicates an invalid API key, an incorrect Ollama host, or a network issue preventing connection to the provider's API.
- AWS Bedrock Errors:
  - Ensure your AWS credentials have the necessary IAM permissions (bedrock:ListFoundationModels, bedrock:ListInferenceProfiles, bedrock-runtime:InvokeModel).
  - Verify that the selected model is enabled for access in the AWS Bedrock console for your specified region.
The Uderia Platform can be deployed in Docker containers for production use, testing, and multi-user scenarios. The application includes built-in support for credential isolation in shared deployments.
# Build the image
docker build -t uderia:latest .
# Generate a secure SECRET_KEY (required)
export SECRET_KEY=$(python -c "import secrets; print(secrets.token_urlsafe(32))")
# Run the container
docker run -d \
-p 5050:5050 \
-e SECRET_KEY=$SECRET_KEY \
-e CORS_ALLOWED_ORIGINS=https://your-domain.com \
  uderia:latest

Important: The SECRET_KEY environment variable is required. Without it, the container will fail to start with ValueError: SECRET_KEY is not set.
The application now supports true multi-user authentication with user isolation:
- Single shared container supports multiple simultaneous users
- Each user has their own account with encrypted credentials
- User tiers control access to features (user, developer, admin)
- Session data isolated per user with JWT authentication
- Best for: Production deployments, team collaboration
# Generate SECRET_KEY if not already set
export SECRET_KEY=$(python -c "import secrets; print(secrets.token_urlsafe(32))")
docker run -d \
-p 5050:5050 \
-v $(pwd)/tda_auth.db:/app/tda_auth.db \
-v $(pwd)/tda_sessions:/app/tda_sessions \
-e SECRET_KEY=$SECRET_KEY \
-e CORS_ALLOWED_ORIGINS=https://your-domain.com \
  uderia:latest

Important Security Steps:
- Set the SECRET_KEY environment variable (required) - used for session management; the app will not start without it
- Mount volumes for tda_auth.db (user database) and tda_sessions (session data)
- Change default admin password immediately after first login
- Create individual user accounts for team members
- Configure HTTPS reverse proxy (nginx, traefik) for production
- Set the TDA_ENCRYPTION_KEY environment variable for production encryption
Optional Security Configuration:
- Rate Limiting: Disabled by default, can be enabled and configured through the web UI:
- Navigate to Administration → App Config → Security & Rate Limiting
- Toggle "Enable Rate Limiting" and configure limits for your deployment
- Per-user limits: prompts per hour/day, configuration changes
- Per-IP limits: login attempts, registrations, API calls
- Changes take effect within 60 seconds (cache refresh)
You can bake MCP Server configuration into the Docker image:
- Before building, edit tda_config.json:
{
"mcp_servers": [
{
"name": "Your MCP Server",
"host": "your-host.com",
"port": "8888",
"path": "/mcp",
"id": "default-server-id"
}
],
"active_mcp_server_id": "default-server-id"
}

- Build the image - MCP configuration is included
- Users only need to configure LLM credentials - much simpler onboarding!
For comprehensive information on Docker deployment, credential isolation, security considerations, and troubleshooting, see: Docker Credential Isolation Guide (docs/Docker/DOCKER_CREDENTIAL_ISOLATION.md)
This project is licensed under the GNU Affero General Public License v3.0. The full license text is available in the LICENSE file in the root of this repository.
Under the AGPLv3, you are free to use, modify, and distribute this software. However, if you run a modified version of this software on a network server and allow other users to interact with it, you must also make the source code of your modified version available to those users. There are 4 License Modes available.
Tiers 1 through 3 are governed by the GNU Affero General Public License v3.0 (AGPLv3). This is a "strong copyleft" license, meaning any modifications made to the software must be shared back with the community if the software is run on a network. This model fosters open collaboration and ensures that improvements benefit all users.
- Software License: GNU Affero General Public License v3.0
- Intended User: Software developers integrating the standard, out-of-the-box agent into other AGPLv3-compatible projects.
- Description: This tier provides full programmatic access to the agent's general-purpose tools, code, and associated architecture. It is designed for developers who need to use the agent as a standard component in a larger system, with the understanding that their combined work will also be licensed under the AGPLv3. This is an application developer-focused license; access to prompt editing capabilities is not included.
- Software License: GNU Affero General Public License v3.0
- Intended User: AI developers and specialists focused on creating and testing prompts for community contribution.
- Description: A specialized tier for crafting new prompts and workflows. It provides the necessary tools and diagnostic access to develop new prompts that can be contributed back to the open-source project, enhancing the agent for all AGPLv3 users. This is a prompt developer-focused license; access to prompt editing capabilities is included.
- Software License: GNU Affero General Public License v3.0
- Intended User: Business users or teams requiring a tailored, but not exclusive, data agent.
- Description: This license is for a version of the application that has been customized for specific business needs (e.g., a "Financial Reporting" or "Marketing Analytics" package). The only difference between this and the "App Developer" license is that the deliverable is a pre-configured solution rather than a general toolkit. It is ideal for organizations using open-source software that need a solution for a specific, common business function. This is a usage-focused license; access to prompt editing capabilities is not included.
- Software License: MIT License
- Intended User: Commercial organizations, power users, and data scientists requiring maximum flexibility and control for proprietary use.
- Description: This is the premium commercial tier and the only one that uplifts the software license to the permissive MIT License. This allows organizations to modify the code, integrate it into proprietary applications, and deploy it without any obligation to share their source code. Crucially, this is also the only tier that enables full prompt editing capabilities (including the licensing system for prompts), giving businesses complete control to customize and protect their unique analytical workflows and intellectual property. This license is designed for commercial entities that need to maintain a competitive advantage.
- Author/Initiator: Rainer Geissendoerfer (LinkedIn)
- Source Code & Contributions: The Uderia Platform is licensed under the GNU Affero General Public License v3.0. Contributions are highly welcome. Please visit the main Git repository to report issues or submit pull requests.
- Git Repository: https://github.com/rgeissen/uderia.git
This list reflects the recent enhancements and updates to the Uderia Platform, as shown on the application's welcome screen.
- 10-Mar-2026: Hybrid Search - Three search modes for Knowledge Repositories on Qdrant Cloud: Semantic (dense vector similarity), Hybrid (Reciprocal Rank Fusion combining dense + sparse BM25), and Keyword (sparse BM25 only) with per-collection configuration, adjustable keyword weight slider, capability-based UI controls, and automatic fallback to Semantic for ChromaDB and Teradata backends
- 08-Mar-2026: Teradata EVS Backend - Enterprise vector store integration with server-side embedding (Amazon Bedrock / Azure AI), server-side chunking, connection resilience with stale-connection detection and serialized reconnect, and EVS object ownership safety
- 07-Mar-2026: ChromaDB & Qdrant Cloud Backends - Two production-ready backends: ChromaDB (embedded, zero-config default) and Qdrant Cloud (managed cloud with AsyncQdrantClient, optional gRPC, deterministic UUID5 ID mapping)
- 06-Mar-2026: Vector Store Abstraction Layer - Pluggable multi-backend architecture with async-first interface, capability-based negotiation, config-fingerprinted singleton factory with per-key async locks, and unified filter/embedding provider system
- 01-Mar-2026: Context Window Management - Budget-aware five-pass orchestrator with 9 pluggable modules, 4 predefined context window types (Balanced, Knowledge-Heavy, Conversation-First, Token-Efficient), profile and session context limit sliders, dynamic adjustment rules, surplus reallocation, priority-based condensation, tiktoken BPE estimation, per-turn snapshot observability, and admin UI with live budget visualization and condensation order editor
- 29-Feb-2026: Canvas Component - Interactive code editor powered by CodeMirror 6 with SQL syntax highlighting, live database connectors, in-place query execution, and result rendering directly in the chat canvas
- 28-Feb-2026: Generative UI Components - Plugin-based component architecture with manifest-driven discovery, hot-reload, profile-level intensity control, admin governance, and CDN dependency management for extensible frontend rendering
- 27-Feb-2026: Data Visualization (Chart Component) - Interactive charting via G2Plot with 16 chart types (bar, line, pie, scatter, heatmap, gauge, radar, treemap, and more), 5-stage mapping resolution pipeline with cardinality-aware column selection, deterministic fast-path execution, and LLM-assisted fallback for ambiguous data
- 23-Feb-2026: Extensions - Post-processing transformation pipeline with #trigger, 6 built-in extensions (json, decision, extract, classify, summary, pdf), four-tier custom extension framework, and serial chaining for workflow automation
- 22-Feb-2026: Skills - Pre-processing prompt injections with !trigger, 5 built-in skills, parameterizable behavior, visual editor, and admin governance for transparent LLM context control
- 15-Feb-2026: Dual-Model Cost Breakdown - Live Status displays strategic vs tactical costs for Fusion Optimizer dual-model executions with color-coded visualization
- 14-Feb-2026: Theme-Aware Token/Cost KPIs - Live Status token and cost displays now adapt to both dark and light themes with consistent visibility
- 13-Feb-2026: n8n Integration - Visual workflow automation with three production-ready templates (Simple Query, Scheduled Reports, Slack Integration) and comprehensive deployment guides
- 08-Feb-2026: Agent Packs - Bundle complete agent teams (coordinator, experts, knowledge collections) into portable .agentpack files for one-click install, export, and marketplace sharing
- 01-Feb-2026: Text to Speech - Voice output now available for all profile classes with Escape-to-cancel and smart text truncation
- 31-Jan-2026: Document Upload & Multimodal Analysis - Attach documents and images in chat with native multimodal delivery and automatic text extraction fallback across all providers
- 30-Jan-2026: All Models Available - Access all provider models with ★ recommended highlights
- 30-Jan-2026: Light Theme - Full light mode support with proper contrast and visibility across all components
- 25-Jan-2026: Legacy Theme - Fixed color bleeding issues with opaque backgrounds and consistent gray palette
- 23-Jan-2026: Modern Theme - Enhanced glass-panel effects and refined slate/blue color palette
- 17-Jan-2026: Genie Profiles - Hierarchical AI Organizations with multi-level autonomous coordination (Parent → Child agents)
- 16-Jan-2026: Session Primer - Auto-initialize sessions with domain knowledge, transforming generic LLMs into pre-educated specialists
- 16-Jan-2026: Genie Profile Type (Beta) - Multi-profile coordination with LangChain orchestration
- 16-Jan-2026: Genie UI Features - Inline progress cards, collapsible child sessions, split view access
- 11-Jan-2026: Profile Classes - Four Execution Modes (Conversation, Efficiency, Knowledge, Genie)
- 10-Jan-2026: Export/Import Knowledge Repositories
- 09-Jan-2026: Knowledge Focused (RAG) Profile Type - Mandatory knowledge retrieval with anti-hallucination safeguards
- 09-Jan-2026: Export/Import Planner Repositories
- 02-Jan-2026: OAuth Implementation - Google, Github
- 02-Jan-2026: Email Validation for Registration
- 20-Dec-2025: Extended Prompt Management System - Dynamic Workflow Prompts
- 19-Dec-2025: Extended Prompt Management System - Dynamic Variables
- 19-Dec-2025: Extended Bootstrapping - Enhanced Bootstrapping Parameter Configuration
- 12-Dec-2025: Enhanced Prompt Encryption/Decryption Process using Database Encryption
- 12-Dec-2025: Migration from File based Application Configuration to Database Schema
- 07-Dec-2025: Financial Governance - Dashboards and LiteLLM Integration
- 06-Dec-2025: Planner Constructor: SQL Query - Document Context
- 05-Dec-2025: Consumption Profile Enforcement - Rate Limiting and Usage Quotas
- 05-Dec-2025: Planner Constructor: SQL Query - Database Context
- 29-Nov-2025: Knowledge Repository Constructor - Document Storage
- 28-Nov-2025: Knowledge Repository Integration
- 28-Nov-2025: Multi-User Authentication - JWT tokens, access tokens, user tiers
- 22-Nov-2025: Profile System - Modular Configuration & Temporary Overrides
- 21-Nov-2025: Planner Repository Constructors - Modular Plugin System
- 21-Nov-2025: Modern UI Design
- 15-Nov-2025: Flowise Integration
- 15-Nov-2025: Airflow Integration
- 14-Nov-2025: Self-Improving AI (RAG)
- 07-Nov-2025: UI Real-Time Monitoring of Rest Requests
- 31-Oct-2025: Fully configurable Context Management (Turn & Session)
- 26-Oct-2025: Turn Replay & Turn Reload Plan
- 25-Oct-2025: Stop Button Added - Ability to immediately Stop Workflows
- 24-Oct-2025: Robust Multi-Tool Phase Handling
- 11-Oct-2025: Friendli.AI Integration
- 10-Oct-2025: Context Aware Rendering of the Collateral Report
- 20-Sep-2025: Microsoft Azure Integration
- 19-Sep-2025: REST Interface for Engine Configuration, Execution & Monitoring
- 12-Sep-2025: Significant Formatting Upgrade (Canonical Baseline Model for LLM Provider Rendering)
- 05-Sep-2025: Conversation Mode (Google Cloud Credentials required)
