NOTE: this document describes the target architecture and the current state of the codebase. Some components and scenarios are not yet implemented.
This document describes the HyperSpot Server components and their roles in typical scenarios. Every feature or scenario step carries an inline priority/phase tag (p1-p5) and an implementation status indicator:
- [ ] - not implemented
- [x] - implemented
The objective of such notation is to provide a clear overview of the current state of the codebase and the next priorities of selected scenarios.
HyperSpot uses the Global Type System (specification) to implement a powerful extension point architecture where virtually everything in the system can be extended without modifying core code.
The GTS naming conventions provide a simple, human-readable, globally unique identifier and referencing system for data type definitions (e.g., JSON Schemas) and global data instances (e.g., JSON objects).
The diagram above illustrates the principal HyperSpot module architecture. The deployed component set depends on the target environment and build configuration; for example, it can be a single executable for the desktop build or multiple containers for a cloud server.
Each module encapsulates a well-defined piece of business logic and exposes versioned contracts to its consumers via Rust-native interfaces, HTTP APIs, or gRPC. Modules can also define their own plugin interfaces that allow pluggable implementations of processing and storage concerns, enabling extensibility without coupling core logic to concrete backends, as well as adapter interfaces for compile-time selection of an implementation.
All interaction between modules and between modules and their plugins happens strictly through these versioned public interfaces. No module or plugin is allowed to depend on another module’s internal structures or implementation details. This enforces loose coupling, enables independent evolution and versioning, and allows modules or plugin implementations to be replaced without impacting the rest of the system.
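The split between a versioned public contract and a plugin interface can be sketched as follows. This is only an illustration: `FileParserApiV1`, `ParserPlugin`, `FileParserModule`, and `PlainTextPlugin` are hypothetical names invented for this example and are not HyperSpot's actual interfaces.

```rust
use std::collections::HashMap;

/// Hypothetical plugin interface that a module defines for pluggable backends.
pub trait ParserPlugin {
    /// MIME type this plugin can handle.
    fn mime_type(&self) -> &'static str;
    /// Extract plain text from raw bytes.
    fn parse(&self, bytes: &[u8]) -> Result<String, String>;
}

/// Hypothetical versioned public contract; consumers depend only on this trait.
pub trait FileParserApiV1 {
    fn extract_text(&self, mime_type: &str, bytes: &[u8]) -> Result<String, String>;
}

/// Module implementation; its internals stay private to the module.
pub struct FileParserModule {
    plugins: HashMap<&'static str, Box<dyn ParserPlugin>>,
}

impl FileParserModule {
    pub fn new() -> Self {
        Self { plugins: HashMap::new() }
    }

    /// Register a plugin under the MIME type it declares.
    pub fn register(&mut self, plugin: Box<dyn ParserPlugin>) {
        self.plugins.insert(plugin.mime_type(), plugin);
    }
}

impl FileParserApiV1 for FileParserModule {
    fn extract_text(&self, mime_type: &str, bytes: &[u8]) -> Result<String, String> {
        match self.plugins.get(mime_type) {
            Some(plugin) => plugin.parse(bytes),
            None => Err(format!("no plugin registered for {}", mime_type)),
        }
    }
}

/// Example plugin: treats the input as UTF-8 text.
pub struct PlainTextPlugin;

impl ParserPlugin for PlainTextPlugin {
    fn mime_type(&self) -> &'static str {
        "text/plain"
    }

    fn parse(&self, bytes: &[u8]) -> Result<String, String> {
        String::from_utf8(bytes.to_vec()).map_err(|e| e.to_string())
    }
}
```

A consumer would hold a `&dyn FileParserApiV1` (or a generic bound on it) and never name `FileParserModule` or a concrete plugin, so either can be replaced without touching the caller.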
All modules can be divided into several categories:
- Business Logic Modules - modules implementing main SaaS service logic that can be built on top of HyperSpot
- Shared Modules - these are the building blocks for supporting functionality for SaaS services development, including Generative AI and Shared Control Plane modules
- Core Platform Integration Modules - interfaces for other modules and adapters for real Core Platform services (see below)
- Core Platform Services - external services that implement Core Platform functionality, such as tenancy management, access policies, licensing, etc.
The Core Platform Integration Modules layer abstracts integration with core platform services, such as IdP, policy management, licensing, and credentials management, which are out of scope of HyperSpot. This keeps HyperSpot reusable: it can run as a standalone platform, or it can integrate into an existing enterprise platform by wiring adapters to the platform’s services.
- Authentication/authorization: all external HTTP traffic is enforced by api-gateway middleware, and secure ORM access is scoped by SecurityContext. In-process calls must propagate SecurityContext and use SDK/clients; bypassing middlewares is not permitted for gateway paths.
- Generative AI Modules MAY depend on Shared Control Plane Modules
- Generative AI Modules MUST NOT depend on Core Platform Services directly
- Control Plane Modules MUST NOT depend on GenAI Modules
- Only integration/adapters talk to external components
- No “cross-category sideways” deps except through contracts.
- No circular dependencies allowed
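The dependency rules above lend themselves to an automated check. The sketch below encodes only the rules stated in the list; the `Category` enum, `is_forbidden`, and `has_cycle` are hypothetical names, and anything the list does not explicitly forbid is treated as allowed, which is an assumption.

```rust
use std::collections::HashMap;

/// Module categories named in the rules above.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Category {
    GenAi,
    ControlPlane,
    Integration,
    CoreService, // external Core Platform Service
}

/// Returns true when a dependency `from -> to` breaks one of the stated rules.
pub fn is_forbidden(from: Category, to: Category) -> bool {
    use Category::*;
    match (from, to) {
        // Only integration/adapter modules talk to external components.
        (Integration, CoreService) => false,
        (_, CoreService) => true,
        // Control Plane Modules MUST NOT depend on GenAI Modules.
        (ControlPlane, GenAi) => true,
        // Assumption: everything not listed as forbidden is allowed.
        _ => false,
    }
}

/// Returns true if the module graph (module -> its dependencies) has a cycle.
pub fn has_cycle<'a>(deps: &HashMap<&'a str, Vec<&'a str>>) -> bool {
    // Node state: absent = unvisited, 1 = on the current DFS path, 2 = done.
    fn visit<'g>(
        node: &'g str,
        deps: &HashMap<&'g str, Vec<&'g str>>,
        state: &mut HashMap<&'g str, u8>,
    ) -> bool {
        match state.get(node).copied() {
            Some(1) => return true, // back edge: node is already on the path
            Some(_) => return false,
            None => {}
        }
        state.insert(node, 1);
        if let Some(children) = deps.get(node) {
            for &child in children {
                if visit(child, deps, state) {
                    return true;
                }
            }
        }
        state.insert(node, 2);
        false
    }

    let mut state = HashMap::new();
    deps.keys().any(|&node| visit(node, deps, &mut state))
}
```

A check like this could run in CI against the declared module manifest, so a forbidden edge fails the build instead of surfacing at review time.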
API Gateway is the single public entry point into HyperSpot for all external clients. It terminates protocols, exposes versioned REST APIs with OpenAPI documentation, and applies a consistent middleware stack for authentication, authorization hooks, rate limiting, validation, and observability. API Gateway is responsible for request shaping and policy enforcement, but contains no business logic.
Once a request is validated, it is routed to the appropriate module via stable contracts. All domain decisions and state changes occur downstream, allowing the gateway to remain simple, auditable, and scalable while internal modules evolve independently.
Every external request MUST pass through: API Gateway → Auth Resolver → Policy Engine → License Resolver → Execution Module → Tenancy Check → Audit/Usage → Response
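The mandatory request path can be read as an ordered chain of stages that stops at the first rejection. The sketch below models only that ordering; the `Request` fields, the `Rejection` variants, and the stage functions are hypothetical stand-ins, not the real gateway types.

```rust
#[derive(Debug, PartialEq)]
pub enum Rejection {
    Unauthenticated,
    PolicyDenied,
    LicenseMissing,
}

/// Hypothetical request model: booleans stand in for the real checks.
pub struct Request {
    pub has_valid_token: bool,
    pub policy_allows: bool,
    pub license_ok: bool,
    /// Names of the stages that actually ran, in order.
    pub trace: Vec<&'static str>,
}

type Stage = fn(&mut Request) -> Result<(), Rejection>;

fn auth_resolver(r: &mut Request) -> Result<(), Rejection> {
    r.trace.push("auth_resolver");
    if r.has_valid_token { Ok(()) } else { Err(Rejection::Unauthenticated) }
}

fn policy_engine(r: &mut Request) -> Result<(), Rejection> {
    r.trace.push("policy_engine");
    if r.policy_allows { Ok(()) } else { Err(Rejection::PolicyDenied) }
}

fn license_resolver(r: &mut Request) -> Result<(), Rejection> {
    r.trace.push("license_resolver");
    if r.license_ok { Ok(()) } else { Err(Rejection::LicenseMissing) }
}

fn execution_module(r: &mut Request) -> Result<(), Rejection> {
    r.trace.push("execution_module");
    Ok(())
}

fn tenancy_check(r: &mut Request) -> Result<(), Rejection> {
    r.trace.push("tenancy_check");
    Ok(())
}

fn audit_usage(r: &mut Request) -> Result<(), Rejection> {
    r.trace.push("audit_usage");
    Ok(())
}

/// Runs the stages in the documented order; the first error short-circuits.
pub fn handle(req: &mut Request) -> Result<(), Rejection> {
    let pipeline: [Stage; 6] = [
        auth_resolver,
        policy_engine,
        license_resolver,
        execution_module,
        tenancy_check,
        audit_usage,
    ];
    for stage in pipeline {
        stage(req)?;
    }
    Ok(())
}
```

Keeping the order in one array makes it easy to assert in tests that no handler is reachable without the preceding checks.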
Provide the single public API entrypoint for HyperSpot, including request routing, auth hooks, versioned REST surface, and OpenAPI publication.
- p1 - route versioned HTTP APIs to modules and expose OpenAPI
- p1 - enforce request limits, timeouts, and basic middleware
- p2 - unified authn/z + license checks at gateway
- p3 - streaming endpoints (SSE) for long-running operations
- p4 - multi-region routing and traffic shaping policies
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Generative AI Modules provide the core AI capabilities of HyperSpot and represent the primary value layer for building AI-powered SaaS applications. These modules encapsulate domain-specific GenAI functionality such as conversational orchestration, model inference, retrieval-augmented generation (RAG), agent execution, prompt management, and tool invocation. They are responsible for transforming user intent and contextual data into AI-generated outputs while enforcing platform-level constraints such as tenancy, security, policy, and usage limits.
These modules are designed to be highly composable and extensible: they rely on shared platform services (e.g., settings, usage tracking, audit) and integrate with external AI providers or local runtimes through well-defined gateways. Generative AI Modules do not directly manage enterprise governance concerns (licensing, identity, credentials); instead, they delegate those responsibilities to control plane modules and core platform adapters to remain focused on AI behavior and orchestration logic.
- Chat Engine / API-triggered entry
- Configuration & assets (Settings, Prompts, Models)
- Retrieval (RAG, Search, Data Connectors)
- Execution (LLM Gateway, MCP, Tools)
- Orchestration (Agents, Agent Runtime)
- Persistence & feedback (Memory, Usage, Audit)
Provide conversational capabilities (chat messages, conversation history) as a core GenAI building block for SaaS applications.
- p1 - create chat sessions and append messages
- p2 - chat message interceptors and custom hooks support
- p2 - streaming assistant responses with tool-call metadata
- p3 - multi-tenant retention, export, and compliance controls
- p4 - conversation evaluation and quality metrics integration
- p5 - enterprise-grade auditability and policy enforcement across conversations
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Maintain a catalog of available models with tenant-level availability and approval workflow.
- p1 - get tenant model (availability check)
- p1 - list tenant models with filtering
- p2 - model discovery from providers (via Outbound API Gateway)
- p2 - model approval workflow (pending → approved | rejected | revoked)
- p2 - capability tagging (embeddings, vision, tools, function calling)
- p3 - auto-approval configuration per tenant/provider
- p4 - model lifecycle tracking (deprecated, archived)
- PRD
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
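The approval workflow listed for the model catalog (pending → approved | rejected | revoked) can be sketched as a small state machine. The exact transition set is an assumption: here a model can be approved or rejected only while pending, and revoked only after approval. All type and function names are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ApprovalState {
    Pending,
    Approved,
    Rejected,
    Revoked,
}

#[derive(Clone, Copy, Debug)]
pub enum ApprovalAction {
    Approve,
    Reject,
    Revoke,
}

/// Applies an action to a state, or explains why the transition is invalid.
pub fn transition(state: ApprovalState, action: ApprovalAction) -> Result<ApprovalState, String> {
    use ApprovalAction::*;
    use ApprovalState::*;
    match (state, action) {
        (Pending, Approve) => Ok(Approved),
        (Pending, Reject) => Ok(Rejected),
        // Assumption: only an approved model can be revoked.
        (Approved, Revoke) => Ok(Revoked),
        (s, a) => Err(format!("invalid transition: {:?} in state {:?}", a, s)),
    }
}

/// Availability check: only approved models are usable by a tenant.
pub fn is_available(state: ApprovalState) -> bool {
    state == ApprovalState::Approved
}
```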
Manage versioned prompt assets (system prompts, templates, chains) with governance and rollout controls.
- p1 - create, version, and retrieve prompts
- p2 - tenant-scoped and environment-scoped prompt variants
- p3 - prompt evaluation, approval workflows, and rollback
- p4 - A/B rollout and progressive delivery of prompt versions
- p5 - safety, policy, and compliance validation on prompt publish
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide unified access to multiple LLM providers with multimodal support, tool calling, and enterprise-governance controls.
- p1 - chat completion routed to configured provider
- p1 - streaming chat completion (SSE)
- p1 - embeddings generation
- p1 - multimodal input/output (vision, audio, video, documents)
- p1 - tool/function calling with schema resolution
- p1 - structured output with schema validation
- p1 - model discovery (delegation to Model Registry)
- p2 - provider fallback on failure
- p2 - retry with exponential backoff
- p2 - request/response interceptors (hook plugins)
- p2 - per-tenant budget enforcement (usage plugin)
- p2 - rate limiting (tenant and user levels)
- p2 - async jobs for long-running operations
- p2 - realtime audio (WebSocket)
- p2 - request cancellation
- p3 - cost/latency-aware routing
- p3 - embeddings batching
- p4 - audit events (audit plugin)
- PRD
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
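Two of the reliability items listed for the LLM gateway, retry with exponential backoff and provider fallback, can be illustrated with a short sketch. The function names, the doubling factor, and the cap are assumptions made for this example; the actual gateway policy is not specified here.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `max`.
pub fn backoff_delay(attempt: u32, base: Duration, max: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    base.checked_mul(factor).map_or(max, |delay| delay.min(max))
}

/// Tries each provider in order, retrying each up to `max_retries` times.
/// `call` receives the provider name and the attempt number.
pub fn call_with_fallback<F>(
    providers: &[&str],
    max_retries: u32,
    mut call: F,
) -> Result<String, String>
where
    F: FnMut(&str, u32) -> Result<String, String>,
{
    let mut last_err = String::from("no providers configured");
    for &provider in providers {
        for attempt in 0..=max_retries {
            match call(provider, attempt) {
                Ok(response) => return Ok(response),
                Err(e) => last_err = e,
            }
            // A real implementation would wait for backoff_delay(attempt, ..) here.
        }
    }
    Err(last_err)
}
```

The sketch omits jitter and does not distinguish retryable from non-retryable errors; both would normally be part of a production policy.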
Manage local model lifecycle (download, storage, loading, and runtime wiring) to support on-device/on-prem deployments.
- p1 - download and store models via pluggable backends
- p2 - manage model cache, versions, and disk quotas
- p2 - traffic tunneling for distributed inference
- p3 - start/stop local runtimes and expose endpoints to LLM gateway
- p4 - hardware-aware configuration (GPU/CPU, quantization profiles)
- p5 - fleet management for distributed on-prem deployments
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Integrate MCP-compatible tools and services as first-class capabilities for agents and automation.
- p1 - connect to MCP servers and list available tools
- p2 - enforce auth and tenant scoping on MCP tool calls
- p3 - intercept/transform MCP traffic for policy and observability
- p4 - tool discovery, caching, and capability matching
- p5 - governed tool marketplaces and tenant allowlists
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide a unified abstraction over web search providers, with consistent response shapes for downstream RAG/agents.
- p1 - execute web search queries and return normalized results
- p2 - search traffic interception and hooks for custom policies
- p2 - provider plugins with per-tenant configuration
- p3 - pluggable search providers
- p3 - safe browsing policies and content filtering
- p4 - query rewriting and enrichment via LLM gateway
- p5 - compliance and audit trails for outbound searches
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide fast local indexing and retrieval over ingested content for search and RAG, independent of external providers.
- p1 - index documents and run keyword/vector queries
- p1 - Qdrant provider support
- p1 - multi-tenant isolation
- p2 - hybrid search and relevance tuning
- p2 - other pluggable index backends (e.g., Meilisearch)
- p3 - incremental updates and delete propagation
- p4 - enterprise-scale sharding
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Orchestrate retrieval-augmented generation: chunking strategies, retrieval, context assembly, and grounded generation.
- p1 - retrieve relevant chunks and assemble prompts
- p1 - configurable chunking, ranking, and citation support
- p2 - multi-store retrieval (local index + external connectors)
- p3 - evaluation workflows for grounding, faithfulness, and latency
- p4 - governed enterprise RAG with policies, audit, and per-tenant controls
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Parse and extract structured content from user files for downstream indexing, RAG, and business workflows.
- p1 - parse common document types (DOCX, PPTX, PDF, Markdown, HTML, text) and extract text/metadata
- p2 - plugin parsers (embedded, Apache Tika, custom)
- p3 - streaming parsing for large files
- p4 - entity extraction and enrichment hooks
- p5 - compliance controls and redaction pipelines
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Store and retrieve files and media for LLM Gateway (input-media assets, generated content).
- p1 - fetch media by URL for LLM input
- p1 - store generated content (images, audio, video)
- p1 - get file metadata
- p2 - tenant quotas and usage reporting integration
- p2 - pluggable backends (filesystem, object storage)
- p3 - encryption, retention, and lifecycle policies
- p4 - compliance exports and legal hold support
- PRD
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Connect to external data sources (DBs, SaaS APIs, file stores) to ingest and synchronize data for the platform.
- p1 - define connector configs and run a basic pull/sync
- p1 - secure credential usage via Credential Resolver adapter
- p2 - incremental sync, change tracking, and scheduling hooks
- p3 - connector health monitoring and retries/backoff
- p4 - governed connector marketplace with tenant-scoped permissions
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide the agents layer as a user-facing abstraction: agent definitions, tools, skills, and orchestration policies.
- p1 - create agents with basic tool invocation
- p2 - multi-step planning and tool chaining
- p3 - policy-aware tool access and tenant scoping
- p4 - agent evaluation, monitoring, and safety guardrails
- p5 - enterprise-grade agent governance and lifecycle management
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Execute agent workloads in controlled runtimes (sandboxes), providing scheduling, isolation, and runtime observability.
- p1 - execute a single agent run with tool calls
- p2 - concurrency control, cancellation, and timeouts
- p3 - runtime isolation profiles (resource limits, sandboxing)
- p4 - distributed execution and scale-out
- p5 - regulated execution with attestations and audit integration
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Persist and retrieve agent memory (short-term and long-term) to enable personalization, continuity, and automation.
- p1 - store and retrieve episodic memory entries
- p1 - tenant isolation and proper access checks
- p2 - vector/kv backends and retrieval strategies
- p3 - privacy controls and TTLs
- p4 - memory governance and redaction workflows
- p5 - enterprise portability and compliance exports
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide workflow orchestration and serverless-style functions for automation, integrations, and agentic pipelines.
- p1 - define and execute workflows and basic functions
- p2 - scheduled triggers and event-driven execution
- p3 - integration with Jobs Manager for durable execution
- p4 - visual workflows
- p5 - reusable workflow marketplaces
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide typed configuration and preferences at tenant/user scope, supporting feature flags and customization.
- p1 - CRUD settings per tenant and per user
- p1 - schema validation and versioning
- p2 - settings inheritance rules
- p3 - feature flags and rollout controls
- p3 - events generation per setting creation/update/deletion
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Shared Control Plane Modules provide the cross-cutting governance and operational capabilities required to run HyperSpot as a secure, observable, and policy-driven system. They implement system-wide concerns such as auditing, usage tracking, policy enforcement, background job execution, eventing, settings management, and type registration. These modules define and enforce global invariants that apply uniformly across all workloads, regardless of which Generative AI modules or adapters are involved.
Shared Control Plane Modules do not contain domain-specific or generative AI logic and are not directly exposed as end-user features. Instead, they act as the authoritative control layer that all execution paths must pass through, ensuring consistency, compliance, and operational correctness. By centralizing governance and orchestration in the control plane, HyperSpot enables higher-level modules to remain focused on business and AI behavior while inheriting uniform guarantees around security, observability, and usage enforcement.
Capture immutable audit events for security-relevant and business-relevant actions across the platform.
- p1 - record audit events with actor/tenant/resource context
- p1 - query audit events with pagination and filters
- p2 - export audit events to external systems
- p3 - compliance retention policies and legal hold
- p4 - cross-tenant governance and anomaly detection signals
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide an event bus for domain events and integration events across modules with durable delivery patterns.
- p1 - publish and subscribe to basic events, replay
- p2 - event filtering (CEL)
- p3 - custom storage backend adapters (e.g. ELK, Kafka)
- p4 - streaming analytics integrations
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Measure platform usage (API calls, compute, storage) for quotas, billing, and internal capacity planning.
- p1 - record usage events with tenant or resource attribution (push model)
- p1 - comprehensive usage metrics API
- p2 - pull model
- p3 - aggregate reports and dashboards, data export
- p4 - custom storage backend support (e.g., ClickHouse)
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Run and coordinate background jobs (download/upload, benchmarks, parsing, indexing, workflows) with retries and scheduling.
- p1 - enqueue and execute jobs with status tracking
- p1 - jobs suspend/resume
- p2 - retry policies, backoff, and dead-letter handling
- p3 - scheduling and periodic jobs
- p4 - distributed workers and horizontal scale
- p5 - SLA management and priority queues per tenant
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
GTS schema-storage service for tool definitions and contracts.
- p1 - get schema by ID (for LLM Gateway tool resolution)
- p1 - batch get schemas
- p2 - validate, register and resolve types and instances by versioned identifiers
- p2 - distribute GTS instance and schema updates across modules safely via generated events
- p3 - schemas and instances import/export in different formats (YAML, RAML)
- PRD
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Manage platform processes and runtimes (in-process and out-of-process modules), including lifecycle, health, and orchestration.
- p1 - start/stop module runtimes and report lifecycle state
- p2 - resource limits control (CPU, memory)
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Maintain registry of HyperSpot nodes/deployments and their capabilities for discovery and operational management.
- p1 - register nodes and list node inventory
- p2 - node health and heartbeat tracking
- p3 - capability-aware routing and scheduling hints
- p4 - multi-region topology awareness
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide monitoring primitives and integrations: health checks, alerts, and operational dashboards.
- p1 - collect metrics
- p1 - metrics aggregates
- p2 - custom dashboards
- p2 - alert hooks and incident signals
- p3 - trace/log correlation across modules
- p4 - SLOs and error budget tracking
- p4 - Custom runtime-level metrics registration
- p5 - automated remediation workflows
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Provide generic CRUD storage for typed resources that do not warrant a dedicated module, using a fixed schema envelope (identity, ownership, timestamps) and a flexible JSON payload governed by GTS type definitions.
- p1 - create, read, update, and soft-delete typed resources with tenant isolation and GTS type-based access control
- p1 - OData $filter/$orderby and cursor-based pagination on schema fields
- p1 - GTS type existence validation via Types Registry
- p1 - pluggable storage backend (Relational Database plugin via SecureORM as default)
- p1 - configurable soft-delete retention with background purge task
- p2 - batch CRUD operations (POST /resources:batch, POST /resources:batch-get) per DNA BATCH.md
- p2 - per-resource-type lifecycle notification events (created/updated/deleted) via Events Broker
- p2 - per-resource-type audit events via Audit Module
- p3 - alternative storage plugins (search engines, vendor-provided backends) with per-type routing
- p3 - resource groups for lifecycle-linked collections
- p4 - full-text search API with search-capable plugin support
Core Platform Integration Modules provide a thin abstraction layer between HyperSpot and external or enterprise-grade platform services such as identity providers, license managers, credential stores, and outbound traffic governance systems. These modules expose minimal, stable interfaces that HyperSpot modules can depend on without being coupled to a specific vendor, protocol, or deployment environment.
The primary role of these adapter modules is decoupling: they allow HyperSpot to operate either as a standalone platform (using local implementations) or as a component embedded into a larger enterprise ecosystem. Adapter modules do not own authoritative state or business rules; instead, they translate HyperSpot’s internal contracts into calls to external core platform services, handling protocol adaptation, caching, and integration-specific concerns.
Introduces an abstraction layer over tenant relationship services. The goal is to expose a single entry point for retrieving related tenants (parents, children, siblings) without coupling modules to a specific directory implementation.
- p1 - resolve related tenant IDs (parent, children) based on given ID
- p1 - integrated adapter for single-tenant and single-user use-case (desktop app)
- p2 - tenant resolution cache with invalidation rules
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
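The tenant resolution abstraction described above, with one implementation for hierarchical tenancy and one for the single-tenant desktop case, might look like the following. `TenantResolver`, `SingleTenantResolver`, and `InMemoryResolver` are hypothetical names for this sketch, and the in-memory map stands in for a real directory service.

```rust
use std::collections::HashMap;

/// Hypothetical single entry point for tenant relationship lookups.
pub trait TenantResolver {
    fn parent(&self, tenant_id: &str) -> Option<String>;
    fn children(&self, tenant_id: &str) -> Vec<String>;
}

/// Desktop-style adapter: one tenant, no relatives.
pub struct SingleTenantResolver;

impl TenantResolver for SingleTenantResolver {
    fn parent(&self, _tenant_id: &str) -> Option<String> {
        None
    }

    fn children(&self, _tenant_id: &str) -> Vec<String> {
        Vec::new()
    }
}

/// In-memory hierarchy keyed by child ID, used here in place of a directory.
pub struct InMemoryResolver {
    parent_of: HashMap<String, String>,
}

impl InMemoryResolver {
    /// Builds the hierarchy from `(child, parent)` pairs.
    pub fn new(edges: &[(&str, &str)]) -> Self {
        let parent_of = edges
            .iter()
            .map(|(child, parent)| (child.to_string(), parent.to_string()))
            .collect();
        Self { parent_of }
    }
}

impl TenantResolver for InMemoryResolver {
    fn parent(&self, tenant_id: &str) -> Option<String> {
        self.parent_of.get(tenant_id).cloned()
    }

    fn children(&self, tenant_id: &str) -> Vec<String> {
        let mut out: Vec<String> = self
            .parent_of
            .iter()
            .filter(|(_, parent)| parent.as_str() == tenant_id)
            .map(|(child, _)| child.clone())
            .collect();
        out.sort(); // deterministic order for callers and tests
        out
    }
}
```

Modules would depend on the trait only, so switching between the two adapters is a wiring decision rather than a code change in the consumer.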
Introduces an abstraction layer over real token validation and claims extraction. It contains minimal logic, as the main goal is to provide a single entry point for policy rules retrieval.
- p1 - validate JWTs and extract claims (roles and permissions)
- p1 - integrated adapter for single-tenant and single-user use-case (desktop app)
- p2 - tokens cache with invalidation rules
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Introduces an abstraction layer over the upstream License Manager service. The goal is to provide a single entry point for license retrieval without coupling feature code to a specific subscription & billing system.
- p1 - features and quota provisioning on tenants/users/resources
- p1 - adapter for single-user and single-tenant use-cases (desktop app)
- p2 - cache and refresh license state
- p2 - metrics collection for license acquisitions
- p3 - audit with retention for license acquisitions
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Introduces an abstraction layer over the underlying Credential Store service. The goal is to provide a single entry point for credential retrieval.
- p1 - store/retrieve secrets with tenant scoping
- p1 - adapter for single-user and single-tenant use-cases (desktop app)
- p2 - metrics collection
- p3 - audit with retention
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Introduces an abstraction layer over the real Outbound API Gateway. It contains minimal logic, as the main goal is to provide a single entry point for outbound calls.
- p1 - define outbound endpoints and execute calls with tracing
- p2 - adapter for single-user and single-tenant use-cases (desktop app)
- p2 - outbound calls metrics collection
- p3 - minimalistic rate limiting
- p4 - audit with retention for outbound calls
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Core Platform Services are authoritative, enterprise-level services that may exist outside of HyperSpot and act as systems of record for critical governance domains such as accounts, identity, access policies, licensing, credentials, and outbound egress control. These components typically belong to an organization’s broader platform or SaaS ecosystem and may already be deployed, certified, and governed independently of HyperSpot.
HyperSpot does not aim to be the system of record for these capabilities at the enterprise level; instead, it integrates with external components when it operates in an integrated environment. It relies on adapter modules to interact with these external components through well-defined contracts. This approach allows HyperSpot to inherit enterprise-grade security, compliance, and governance guarantees while remaining portable, reusable, and safe to embed into existing platforms without duplicating or conflicting with core business infrastructure.
Core platform service managing accounts and tenant relationships (system of record when HyperSpot runs standalone).
- p1 - create and manage accounts/tenants and users
- p2 - hierarchical multi-tenancy
- p2 - link tenants to identities and organizations
- p3 - account lifecycle (suspend, soft-delete, hard-delete, archive, move)
- p4 - map external tenant IDs to internal IDs
- p4 - enterprise org structures and delegated administration
- p5 - federation across multiple account systems
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Core platform service managing authorization policies for resources and actions.
- p1 - user/client roles definition
- p1 - evaluate policies for API requests
- p2 - role/attribute-based policy models
- p3 - policy authoring and versioning
- p3 - enterprise SSO patterns (SAML/LDAP) via adapters
- p4 - audit integration and policy analytics
- p5 - advanced enterprise policy federation
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Core platform service responsible for local license state, quota enforcement, feature gating hooks, and integration with License Resolver.
- p1 - features and quota provisioning on tenants/users/resources
- p3 - per-resource feature check and assignment
- p2 - integrate with Usage Tracker for quota enforcement
- p3 - manage plan tiers and feature bundles
- p4 - support offline/air-gapped license operation
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Core platform service managing credentials lifecycle and access control, coordinating with the Credential Store adapter.
- p1 - manage credential metadata and access policies
- p2 - integrate with external vault backends (AWS Secrets Manager, HashiCorp Vault, etc.)
- p3 - rotation workflows and secret health checks
- p4 - delegated admin and approval workflows
- p5 - enterprise compliance audit, reporting and attestations
- TODO: Design link
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
Centralized gateway for external-API calls with credentials injection, reliability, and observability.
- p1 - HTTP requests to external APIs
- p1 - SSE streaming
- p1 - WebSocket connections
- p1 - credential injection via Credential Resolver
- p2 - retry with exponential backoff
- p2 - circuit breaker
- p2 - rate limiting (per-target)
- p2 - timeouts (connect, read, total)
- p3 - audit with retention
- TODO: PRD
- TODO: Scenarios link
- TODO: API link
- TODO: SDK link
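The circuit breaker item in the list above can be illustrated with a minimal three-state sketch. The threshold and cooldown semantics, along with the type and method names, are assumptions made for this example; the real gateway behavior is not specified here.

```rust
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq)]
pub enum State {
    Closed,   // calls flow normally
    Open,     // calls are rejected until the cooldown elapses
    HalfOpen, // cooldown elapsed: a probe call is allowed
}

pub struct CircuitBreaker {
    failure_threshold: u32,
    cooldown: Duration,
    failures: u32,
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, cooldown: Duration) -> Self {
        Self { failure_threshold, cooldown, failures: 0, opened_at: None }
    }

    /// Current state as seen at time `now` (passed in to keep this testable).
    pub fn state(&self, now: Instant) -> State {
        match self.opened_at {
            None => State::Closed,
            Some(t) if now.saturating_duration_since(t) >= self.cooldown => State::HalfOpen,
            Some(_) => State::Open,
        }
    }

    /// Whether an outbound call may be attempted at time `now`.
    pub fn allow(&self, now: Instant) -> bool {
        self.state(now) != State::Open
    }

    /// A successful call closes the breaker and resets the failure count.
    pub fn record_success(&mut self) {
        self.failures = 0;
        self.opened_at = None;
    }

    /// A failed call counts toward the threshold and (re)opens the breaker.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures += 1;
        if self.failures >= self.failure_threshold {
            self.opened_at = Some(now);
        }
    }
}
```

One breaker instance per outbound target would pair naturally with the per-target rate limiting listed alongside it.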
This diagram reflects the actual middleware stack from api-gateway (see apply_middleware_stack in modules/system/api-gateway/src/lib.rs).
Middleware execution order (outermost → innermost):
- Request ID (SetRequestId + PropagateRequestId)
- Trace span (tower-http TraceLayer)
- Timeout (30s default)
- Body limit
- CORS (if enabled)
- MIME validation
- Rate limiting (per-route RPS + in-flight semaphore)
- Error mapping (converts errors to RFC-9457 Problem)
- Auth (JWT validation → RBAC check → build SecurityContext with tenant from claims)
- Policy engine injection
- License validation (checks license_requirement from OperationSpec)
- Router → Handler
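The rate limiting step in the list above combines a per-route requests-per-second budget with a cap on in-flight requests. A minimal sketch of that combination follows, using a token bucket for the RPS part. `RouteLimiter` and its fields are hypothetical, and the choice of a token bucket is an assumption; time is passed in explicitly to keep the example deterministic.

```rust
/// Hypothetical per-route limiter: token bucket (RPS) plus in-flight cap.
pub struct RouteLimiter {
    rps: f64,
    burst: f64,
    tokens: f64,
    last_refill_secs: f64,
    max_in_flight: usize,
    in_flight: usize,
}

impl RouteLimiter {
    /// Starts with a full bucket of `burst` tokens.
    pub fn new(rps: f64, burst: f64, max_in_flight: usize) -> Self {
        Self { rps, burst, tokens: burst, last_refill_secs: 0.0, max_in_flight, in_flight: 0 }
    }

    /// Tries to admit one request at monotonic time `now_secs`.
    /// Returns false when either limit is hit; no token is consumed then.
    pub fn try_acquire(&mut self, now_secs: f64) -> bool {
        let elapsed = (now_secs - self.last_refill_secs).max(0.0);
        self.tokens = (self.tokens + elapsed * self.rps).min(self.burst);
        self.last_refill_secs = now_secs;
        if self.tokens < 1.0 || self.in_flight >= self.max_in_flight {
            return false;
        }
        self.tokens -= 1.0;
        self.in_flight += 1;
        true
    }

    /// Must be called when an admitted request finishes.
    pub fn release(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }
}
```

In an async server the in-flight part would more likely be a semaphore permit that is released on drop; the counter here only models the accounting.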
sequenceDiagram
autonumber
participant C as Client (Web/Mobile)
box "External Core Platform"
participant IdP as IdP / JWKS endpoint
participant LICM as License Manager
end
box "HyperSpot"
participant I as API gateway (api-gateway)
participant LIC as License resolver
participant M as Target module (REST handler)
participant D as Domain service
participant DB as DB (SecureConn)
participant EB as Events broker
participant AUD as Audit
participant UT as Usage tracker
end
C->>I: HTTP request (Authorization: Bearer, traceparent, x-request-id)
Note over I: 1. SetRequestId + PropagateRequestId
I->>I: Generate/propagate x-request-id
Note over I: 2. TraceLayer - create span
I->>I: Create tracing span (method, uri, request_id, trace_id)
Note over I: 3-6. Timeout → BodyLimit → CORS → MIME
I->>I: Validate request basics (timeout, size, content-type)
Note over I: 7. Rate limiting
I->>I: Check RPS bucket + in-flight semaphore
alt Rate limit exceeded
I-->>C: 429 Too Many Requests (Retry-After header)
end
Note over I: 8. Error mapping layer (wraps inner errors)
Note over I: 9. Auth layer (AuthPolicyLayer)
I->>I: Resolve route policy (public / required / optional)
alt Route is public
I->>I: Insert anonymous SecurityContext
else Route requires auth
I->>IdP: Validate JWT (cached JWKS)
IdP-->>I: Token valid + claims (subject, tenant_id, permissions[])
I->>I: RBAC check: claims.permissions vs route SecRequirement
alt RBAC denied
I-->>C: 403 Forbidden (Problem)
end
I->>I: Build SecurityContext(tenant_id, subject_id, scope)
end
Note over I: 10. Inject PolicyEngine into extensions
Note over I: 11. License validation
I->>LIC: Check license features (from OperationSpec.license_requirement)
LIC->>LICM: Check license features (from OperationSpec.license_requirement)
LICM-->>LIC: Allowed | FeatureMissing
LIC-->>I: Allowed | FeatureMissing
alt License check failed
I-->>C: 403 Forbidden (license feature required)
end
Note over I: 12. Router dispatches to handler
I->>M: Call handler (SecurityContext in Extension)
M->>D: Execute domain logic (ctx, command/query)
D->>DB: SecureConn.find/insert/update (ctx applies tenant filter)
DB-->>D: Scoped results (WHERE tenant_id IN ...)
D->>EB: Publish domain event (optional)
D->>AUD: Emit audit event (actor, tenant, resource, action)
D->>UT: Record usage (tenant, operation, tokens/bytes)
D-->>M: Domain result
M-->>I: Map to DTO + OpenAPI response
I-->>C: HTTP 200/201 (JSON) or SSE stream
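The auth step of the sequence above (route policy resolution, RBAC check, SecurityContext construction) can be sketched as a single decision function. The types below are simplified stand-ins with hypothetical names and fields. Two behaviors are assumptions not shown in the diagram: a missing token on a route that requires auth is mapped to an `Unauthorized` error, and a token presented on an optional route is treated like one on a required route.

```rust
/// Route policy as resolved by the auth layer.
pub enum RoutePolicy {
    Public,
    Optional,
    Required,
}

/// Claims extracted from a validated token.
pub struct Claims {
    pub subject: String,
    pub tenant_id: String,
    pub permissions: Vec<String>,
}

/// Simplified stand-in for the real SecurityContext.
#[derive(Debug, PartialEq)]
pub enum SecurityContext {
    Anonymous,
    Authenticated { tenant_id: String, subject_id: String },
}

#[derive(Debug, PartialEq)]
pub enum AuthError {
    Unauthorized, // assumption: no valid token on a route that requires one
    Forbidden,    // RBAC denied, as in the diagram
}

/// `claims` is `None` when no valid token was presented.
pub fn authorize(
    policy: RoutePolicy,
    claims: Option<Claims>,
    required_permission: Option<&str>,
) -> Result<SecurityContext, AuthError> {
    match (policy, claims) {
        (RoutePolicy::Public, _) => Ok(SecurityContext::Anonymous),
        (RoutePolicy::Optional, None) => Ok(SecurityContext::Anonymous),
        (RoutePolicy::Required, None) => Err(AuthError::Unauthorized),
        (_, Some(c)) => {
            // RBAC check: the route requirement must appear in the claims.
            if let Some(needed) = required_permission {
                if !c.permissions.iter().any(|have| have == needed) {
                    return Err(AuthError::Forbidden);
                }
            }
            Ok(SecurityContext::Authenticated { tenant_id: c.tenant_id, subject_id: c.subject })
        }
    }
}
```

Because the tenant comes from validated claims rather than from request parameters, downstream tenant filters can trust the resulting context.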
Chat hooks allow integrations to intercept internal message/file/search traffic within the chat system. Hooks enable:
- Blocking: Return error and stop processing
- Override: Modify content before proceeding
| Hook ID | Trigger point | Capabilities | Use case |
|---|---|---|---|
| `gts.x.genai.flow.hook.v1~x.genai.chat.user_message_pre_store.v1~` | After user message submitted, before DB store | BLOCK, OVERRIDE | DLP: scan outgoing content |
| `gts.x.genai.flow.hook.v1~x.genai.file.post_parse.v1~` | After file content parsed | INFORMATIVE | Audit, classification |
| `gts.x.genai.flow.hook.v1~x.genai.llm.pre_call.v1~` | Before final message goes to LLM | BLOCK, OVERRIDE | Content filtering, PII redaction |
| `gts.x.genai.flow.hook.v1~x.genai.llm.post_response.v1~` | After LLM response, before DB store | BLOCK, OVERRIDE | Response filtering |
| `gts.x.genai.flow.hook.v1~x.genai.search.pre_request.v1~` | Before search request (RAG or WebSearch) | BLOCK, OVERRIDE | Query sanitization |
| `gts.x.genai.flow.hook.v1~x.genai.search.post_response.v1~` | After search response received | BLOCK, OVERRIDE | Result filtering |
All hook types are registered in GTS and can be enabled or disabled per tenant/user by customers or integrations. Registered hooks are executed in priority order.
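Selecting the hooks for a trigger point and ordering them by priority can be sketched as below. The `Hook` struct and its fields are hypothetical. Two assumptions are made: a trailing `*` in a type pattern (as in `gts.x.genai.flow.hook.v1~*`) is treated as a plain prefix wildcard, and a lower priority number runs first.

```rust
/// Hypothetical registered hook record.
pub struct Hook {
    pub id: String,
    pub type_id: String,
    pub priority: i32,
    pub enabled: bool,
}

/// Matches a GTS type ID against a pattern; a trailing `*` is a prefix wildcard.
pub fn matches_type(type_id: &str, pattern: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => type_id.starts_with(prefix),
        None => type_id == pattern,
    }
}

/// Enabled hooks matching `pattern`, in execution order.
pub fn execution_order<'a>(hooks: &'a [Hook], pattern: &str) -> Vec<&'a Hook> {
    let mut selected: Vec<&Hook> = hooks
        .iter()
        .filter(|h| h.enabled && matches_type(&h.type_id, pattern))
        .collect();
    // Assumption: lower number means higher priority. The sort is stable,
    // so hooks with equal priority keep their registration order.
    selected.sort_by_key(|h| h.priority);
    selected
}
```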
sequenceDiagram
autonumber
participant C as Client UI
box "External Core Platform"
participant HK as Hook endpoint (external)
end
box "HyperSpot"
participant CE as Chat engine
participant SET as Settings service
participant TR as Types Registry
participant EGR as Outbound API gateway
participant CS as Credential Resolver
participant AUD as Audit
end
Note over CE,SET: [ ] p3 - Step 1: Check if hook is registered
CE->>SET: Get hooks for tenant/user (tenant_id, user_id, hook_type)
SET-->>CE: {hooks_enabled: true, hook_ids: ["hook_xyz"]}
alt No hooks registered
CE->>CE: Skip hook invocation, proceed normally
else Hooks registered
Note over CE,TR: [ ] p3 - Step 2: Get hook details from GTS
CE->>TR: GET /types/v1/instances?$filter=type_id eq 'gts.x.genai.flow.hook.v1~*'
Note right of CE: Filter by hook_ids from settings
TR-->>CE: Hook definitions[] {id, endpoint_url, auth_config, timeout_ms}
Note over CE,EGR: [ ] p3 - Step 3: Invoke hook via Outbound API gateway
CE->>EGR: Invoke hook (endpoint_url, auth_config, payload)
EGR->>CS: Resolve credentials (tenant_id, hook.auth_config)
CS-->>EGR: Credential material (API key, OAuth token, mTLS cert)
EGR->>HK: POST {hook_type, payload, context}
Note right of EGR: payload = message_content | file_content | search_query | llm_response
HK-->>EGR: {action: "allow" | "block" | "override", reason?, modified_content?}
EGR-->>CE: Hook response
Note over CE,AUD: [ ] p3 - Step 4: Process hook response
CE->>AUD: Audit: hook.invoked {hook_id, hook_type, action, reason}
alt action == "block"
CE->>CE: Abort processing
CE-->>CE: Return error: {code: "hook_blocked", reason}
else action == "override"
CE->>CE: Replace content with modified_content
CE->>CE: Continue processing with modified content
else action == "allow"
CE->>CE: Continue processing unchanged
end
end
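The response handling in Step 4 can be sketched as a small chain runner. This is a minimal illustration, not the HyperSpot implementation: the `Hook` struct, its `priority` field, the "lower priority runs first" ordering, and the closure standing in for the outbound HTTP call are all assumptions made for the example.

```rust
/// Outcome reported by a single hook, mirroring the `action` field in the diagram.
#[derive(Debug, Clone, PartialEq)]
enum HookAction {
    Allow,
    Block { reason: String },
    Override { modified_content: String },
}

/// Hypothetical registration record; `priority` is an assumed field.
struct Hook {
    id: String,
    priority: u32,
    /// Stand-in for the call made through the Outbound API gateway.
    invoke: Box<dyn Fn(&str) -> HookAction>,
}

/// Runs hooks in ascending priority order (assumption: lower value runs first).
/// Returns the possibly modified content, or the id and reason of the blocking hook.
fn run_hooks(mut hooks: Vec<Hook>, content: &str) -> Result<String, (String, String)> {
    hooks.sort_by_key(|h| h.priority);
    let mut current = content.to_string();
    for hook in &hooks {
        match (hook.invoke)(current.as_str()) {
            // "allow": continue with unchanged content.
            HookAction::Allow => {}
            // "override": later hooks see the modified content.
            HookAction::Override { modified_content } => current = modified_content,
            // "block": abort processing immediately.
            HookAction::Block { reason } => return Err((hook.id.clone(), reason)),
        }
    }
    Ok(current)
}
```

In this sketch an override from one hook is visible to the hooks that run after it, which is one plausible reading of "priority order"; the source does not state how overrides compose.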
user_message.pre_store:
{
"hook_type": "gts.x.genai.flow.hook.v1~x.genai.chat.user_message_pre_store.v1~",
"payload": {
"message_id": "msg_123",
"content": "Please analyze this financial report",
"attachments": [{"file_id": "file_456"}]
},
"context": {"tenant_id": "...", "user_id": "...", "conversation_id": "..."}
}
llm.pre_call:
{
"hook_type": "gts.x.genai.flow.hook.v1~x.genai.chat.llm_pre_call.v1~",
"payload": {
"messages": [...],
"tools": [...],
"model": "gpt-4",
"estimated_tokens": 4500
},
"context": {"tenant_id": "...", "conversation_id": "..."}
}
NOTE: This is the target architecture, not the current state of the codebase. Some components and scenario steps are not yet implemented.
This scenario follows patterns from LangChain/LangGraph (agent loop, state machine) and Rig (Rust AI framework):
- ReAct pattern: Reason → Act → Observe loop for tool calls
- Streaming-first: SSE for real-time token delivery
- Async file processing: Background jobs for parsing/indexing
Steps:
- User uploads file + sends message (file stored, job enqueued) — Hook: user_message.pre_store
- File processed asynchronously (parse → chunk → embed → index) — Hook: file.post_parse
- RAG retrieval from indexed documents — Hooks: search.pre_request, search.post_response
- WebSearch for real-time information (if enabled) — Hooks: search.pre_request, search.post_response
- Agent state preparation (tools + prompt + model + token budget) — Hooks: llm.pre_call
- Agent loop + SSE streaming — Hooks: llm.pre_call, llm.post_response
File upload stores the blob, then Chat Engine orchestrates job creation. The UI tracks job progress via SSE or polling before proceeding.
Key architectural points:
- API gateway remains simple (middleware + routing only)
- Chat Engine owns orchestration — it triggers the Jobs Manager
- UI must wait for job completion before file content is usable
sequenceDiagram
autonumber
participant U as User
participant C as Client UI
box "HyperSpot"
participant I as API gateway
participant FS as File storage
participant CE as Chat engine
participant HK as Hook invocation
participant JM as Jobs manager
participant DB as Chat DB
participant EB as Events broker
end
U->>C: Attach file + type message
Note over C,FS: [ ] p2 - Step 1a: Upload file (store blob only)
C->>I: POST /files/v1/upload (multipart, SecurityContext)
I->>FS: Store blob (tenant_id, content_hash)
FS-->>I: file_id, size, mime_type
I-->>C: 201 Created {file_id, size, mime_type}
Note over C,CE: [ ] p1 - Step 1b: Create chat message + trigger ingestion
C->>I: POST /chat/v1/conversations/{conv_id}/messages
Note right of C: {content: "Analyze this document", attachments: [{file_id}]}
I->>CE: Create user message (SecurityContext)
Note over CE,HK: [ ] p2 - HOOK: user_message.pre_store (see hook sub-scenario)
CE->>HK: Invoke hook (user_message.pre_store, {content, attachments})
HK-->>CE: {action: allow | block | override}
alt action == "block"
CE-->>I: 422 Unprocessable (hook_blocked)
I-->>C: 422 {error: "content_blocked", reason}
else action == "override"
CE->>CE: Replace message content with modified_content
end
CE->>DB: Persist message (conv_id, role: user, content, attachments[])
CE->>CE: Orchestration: detect attachment requires ingestion
CE->>JM: Request job: file_ingestion(file_id, tenant_id, message_id)
JM-->>CE: job_id (status: queued)
CE->>DB: Update message.job_id = job_id
CE->>EB: Publish event: chat.message.created {message_id, job_id}
CE-->>I: {message_id, job_id, status: "processing"}
I-->>C: 201 Created {message_id, job_id, status: "processing"}
Note over C,JM: [ ] p3 - Step 1c: UI tracks job progress (SSE preferred)
C->>I: GET /jobs/v1/{job_id}/stream (Accept: text/event-stream)
I->>JM: Subscribe to job progress (SecurityContext, job_id)
loop Job progress events
JM-->>I: SSE: {status: "queued" | "parsing" | "chunking" | "embedding" | "indexing"}
I-->>C: SSE: {status, progress_pct, details}
end
JM-->>I: SSE: {status: "done", doc_id, chunk_count}
I-->>C: SSE: {status: "done", doc_id}
Note over C: UI now knows file is ready for RAG retrieval
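The status values carried by the job progress stream form a simple linear progression. The sketch below models them as an enum so a client can decide when the file is usable. The type and function names are hypothetical, and the source does not list a failure status, so none is modeled here.

```rust
/// Status values taken from the SSE events in the diagram above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobStatus {
    Queued,
    Parsing,
    Chunking,
    Embedding,
    Indexing,
    Done,
}

impl JobStatus {
    /// Parses the `status` string of a progress event; unknown values yield `None`.
    fn parse(s: &str) -> Option<JobStatus> {
        match s {
            "queued" => Some(JobStatus::Queued),
            "parsing" => Some(JobStatus::Parsing),
            "chunking" => Some(JobStatus::Chunking),
            "embedding" => Some(JobStatus::Embedding),
            "indexing" => Some(JobStatus::Indexing),
            "done" => Some(JobStatus::Done),
            _ => None,
        }
    }

    /// Next stage of the ingestion pipeline, or `None` once the job is done.
    fn next(self) -> Option<JobStatus> {
        match self {
            JobStatus::Queued => Some(JobStatus::Parsing),
            JobStatus::Parsing => Some(JobStatus::Chunking),
            JobStatus::Chunking => Some(JobStatus::Embedding),
            JobStatus::Embedding => Some(JobStatus::Indexing),
            JobStatus::Indexing => Some(JobStatus::Done),
            JobStatus::Done => None,
        }
    }

    fn is_terminal(self) -> bool {
        self == JobStatus::Done
    }
}

/// Returns true once the stream has reported "done"; unknown statuses are ignored.
fn file_ready(statuses: &[&str]) -> bool {
    statuses
        .iter()
        .filter_map(|s| JobStatus::parse(s))
        .any(JobStatus::is_terminal)
}
```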
The Jobs Manager executes the file ingestion pipeline asynchronously, emitting progress events for UI tracking. When complete, Chat Engine proceeds with RAG retrieval.
sequenceDiagram
autonumber
box "HyperSpot"
participant JM as Jobs manager
participant FP as File parser gateway
participant FS as File storage
participant HK as Hook invocation
participant LLM as LLM gateway (embeddings)
participant LSI as Local search index
participant EB as Events broker
participant CE as Chat engine
participant RAG as RAG gateway
end
Note over JM,FP: [ ] p2 - Background job execution (p2: progress events)
JM->>JM: Dequeue job: file_ingestion(file_id)
JM->>EB: Emit progress: {status: "parsing"}
JM->>FS: Fetch file bytes (file_id)
FS-->>JM: File content stream
JM->>FP: Parse file (mime_type, content)
FP-->>JM: Parsed result {text, metadata, structure}
Note over JM,HK: [ ] p3 - HOOK: file.post_parse (informative only)
JM->>HK: Invoke hook (file.post_parse, {file_id, parsed_text, metadata})
Note right of JM: Informative hook - cannot block or override
Note over JM,LSI: [ ] p2 - Chunking + embedding + indexing
JM->>EB: Emit progress: {status: "chunking"}
JM->>JM: Split text into chunks (overlap, max_tokens)
JM->>EB: Emit progress: {status: "embedding"}
JM->>LLM: Generate embeddings (chunks[])
LLM-->>JM: vectors[]
JM->>EB: Emit progress: {status: "indexing"}
JM->>LSI: Index chunks (tenant_id, doc_id, chunks[], vectors[])
LSI-->>JM: indexed_count
JM->>JM: Update job status: done
JM->>EB: Emit progress: {status: "done", doc_id, chunk_count}
Note over CE,RAG: [ ] p2 - Chat engine proceeds with RAG retrieval
EB-->>CE: Event: file.ingestion.completed {message_id, doc_id}
CE->>CE: Mark message ready for processing
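The "split text into chunks (overlap, max_tokens)" step can be illustrated with a sliding window. This sketch makes a simplifying assumption: a token is a whitespace-separated word. A real pipeline would more likely count tokens with the embedding model's tokenizer. The function name `chunk_text` is hypothetical.

```rust
/// Splits `text` into chunks of at most `max_tokens` words, where each chunk
/// repeats the last `overlap` words of the previous one.
/// Assumption: a "token" is a whitespace-separated word.
fn chunk_text(text: &str, max_tokens: usize, overlap: usize) -> Vec<String> {
    assert!(
        max_tokens > 0 && overlap < max_tokens,
        "overlap must be smaller than max_tokens"
    );
    let words: Vec<&str> = text.split_whitespace().collect();
    // Each new chunk starts `step` words after the previous one.
    let step = max_tokens - overlap;
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_tokens).min(words.len());
        chunks.push(words[start..end].join(" "));
        if end == words.len() {
            break; // The last window reached the end of the text.
        }
        start += step;
    }
    chunks
}
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of indexing some words twice.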
Retrieve relevant context from indexed documents using hybrid search (vector + keyword).
sequenceDiagram
autonumber
box "HyperSpot"
participant CE as Chat engine
participant SET as Settings service
participant HK as Hook invocation
participant RAG as RAG gateway
participant LSI as Local search index
end
Note over CE,SET: [ ] p1 - Load user/tenant configuration
CE->>SET: Get settings (tenant_id, user_id)
SET-->>CE: {enabled_tool_ids[], model_policy, agent_config, websearch_enabled}
Note over CE,RAG: [ ] p2 - RAG retrieval with hooks
CE->>CE: Build search query from user message
Note over CE,HK: [ ] p3 - HOOK: search.pre_request (RAG)
CE->>HK: Invoke hook (search.pre_request, {query, search_type: "rag"})
HK-->>CE: {action: allow | block | override}
alt action == "block"
CE->>CE: Skip RAG retrieval (or return error)
else action == "override"
CE->>CE: Use modified query
end
CE->>RAG: Retrieve context (query, filters: {doc_id})
RAG->>LSI: Hybrid search (vector + keyword, tenant_id)
LSI-->>RAG: Top-K chunks with scores
RAG->>RAG: Rerank + deduplicate + format citations
RAG-->>CE: ContextPack {chunks[], citations[], token_count}
Note over CE,HK: [ ] p3 - HOOK: search.post_response (RAG)
CE->>HK: Invoke hook (search.post_response, {chunks[], citations[]})
HK-->>CE: {action: allow | block | override}
alt action == "override"
CE->>CE: Use modified chunks/citations
end
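The source does not say how the vector and keyword result lists are combined in the hybrid search. One common technique is reciprocal rank fusion (RRF), sketched below purely as an illustration; the function name, the constant `k`, and the use of chunk ids as plain strings are assumptions, not part of the HyperSpot design.

```rust
use std::collections::HashMap;

/// Fuses two ranked lists of chunk ids with reciprocal rank fusion.
/// Each list contributes 1 / (k + rank) per id, with rank starting at 1.
/// Chunks present in both lists are deduplicated and their scores summed.
fn reciprocal_rank_fusion(
    vector_hits: &[&str],
    keyword_hits: &[&str],
    k: f64,
    top_k: usize,
) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for hits in [vector_hits, keyword_hits] {
        for (rank, id) in hits.iter().enumerate() {
            *scores.entry((*id).to_string()).or_insert(0.0) += 1.0 / (k + rank as f64 + 1.0);
        }
    }
    let mut fused: Vec<(String, f64)> = scores.into_iter().collect();
    // Highest score first; ties are broken by id so the order is deterministic.
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap().then_with(|| a.0.cmp(&b.0)));
    fused.truncate(top_k);
    fused
}
```

A chunk that appears in both lists outranks one that appears in only a single list at a similar position, which is the behavior a "rerank + deduplicate" step usually aims for.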
When WebSearch is enabled, query external search engines for real-time information. Results are merged with RAG context.
WebSearch best practices:
- Query rewriting (LLM-assisted or rule-based)
- Result deduplication with RAG context
- Source URL attribution for citations
sequenceDiagram
autonumber
box "HyperSpot"
participant CE as Chat engine
participant HK as Hook invocation
participant WS as WebSearch gateway
end
Note over CE,WS: [ ] p4 - WebSearch (if enabled)
alt websearch_enabled == true
CE->>CE: Rewrite query for web search (LLM-assisted or rule-based)
Note over CE,HK: [ ] p5 - HOOK: search.pre_request (WebSearch)
CE->>HK: Invoke hook (search.pre_request, {query, search_type: "web"})
HK-->>CE: {action: allow | block | override}
alt action == "block"
CE->>CE: Skip WebSearch
else action == "override"
CE->>CE: Use modified query
end
CE->>WS: Search web (query, max_results, safe_search)
WS-->>CE: WebResults[] {title, url, snippet, published_date}
Note over CE,HK: [ ] p5 - HOOK: search.post_response (WebSearch)
CE->>HK: Invoke hook (search.post_response, {web_results[]})
HK-->>CE: {action: allow | block | override}
alt action == "override"
CE->>CE: Use filtered/modified results
end
CE->>CE: Deduplicate + merge with RAG context
CE->>CE: Format web citations with source URLs
end
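The last two steps of the diagram (deduplicate against RAG context, then format citations with source URLs) might look like the sketch below. The `WebResult` fields follow the diagram; the normalization rule, the function names, and the citation format are hypothetical. Lowercasing the whole URL is a simplification, since URL paths can be case-sensitive.

```rust
use std::collections::HashSet;

/// Subset of the WebResults fields shown in the diagram.
#[derive(Debug, Clone, PartialEq)]
struct WebResult {
    title: String,
    url: String,
    snippet: String,
}

/// Assumed normalization: drop the fragment and trailing slashes, then lowercase.
fn normalize_url(url: &str) -> String {
    let without_fragment = url.split('#').next().unwrap_or(url);
    without_fragment.trim_end_matches('/').to_lowercase()
}

/// Drops web results whose URL is already cited by RAG context or repeated
/// earlier in the web result list, preserving the original order.
fn merge_web_results(rag_source_urls: &[&str], web: Vec<WebResult>) -> Vec<WebResult> {
    let mut seen: HashSet<String> = rag_source_urls.iter().map(|u| normalize_url(u)).collect();
    web.into_iter()
        .filter(|r| seen.insert(normalize_url(&r.url)))
        .collect()
}

/// Formats a numbered citation that keeps the source URL visible.
fn format_citation(index: usize, result: &WebResult) -> String {
    format!("[{}] {} ({})", index, result.title, result.url)
}
```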
Prepare the full agent state before LLM invocation.
Key rules:
- No runtime tool validation via MCP (too slow) — rely on GTS-registered definitions
- Token budget check before LLM call — reject or mitigate if context too large
sequenceDiagram
autonumber
box "HyperSpot"
participant CE as Chat engine
participant TR as Types Registry
participant PR as Prompts registry
participant MR as Models registry
participant AM as Agent Memory
participant UT as Usage tracker
end
Note over CE,TR: [ ] p4 - Resolve tool definitions from GTS (no MCP validation)
CE->>TR: GET /types/v1/instances?$filter=type_id eq 'gts.x.genai.mcp.tools.v1~*'
Note right of CE: Filter by enabled_tool_ids from settings
TR-->>CE: Tool definitions[] {id, schema, mcp_server_uri, auth_config}
CE->>CE: Use GTS-registered tools directly (trust registration)
Note over CE,PR: [ ] p1 - Resolve prompt configuration
CE->>PR: Get prompt (conversation.agent_type, tenant_id)
PR-->>CE: {system_prompt, tool_usage_instructions, output_format}
Note over CE,MR: [ ] p1 - Select model
CE->>MR: Get model (model_policy, required_capabilities: [tools, streaming])
MR-->>CE: {model_id, provider, context_window, supports_tools}
Note over CE,AM: [ ] p5 - Load agent memory (optional)
CE->>AM: Get relevant memories (user_id, conversation_id)
AM-->>CE: Memory entries[] (episodic, semantic)
Note over CE,CE: [ ] p3 - TOKEN BUDGET CHECK (critical for production)
CE->>CE: Calculate prompt_tokens = system_prompt + history + RAG_context + web_context + tool_schemas
CE->>CE: remaining_tokens = context_window - prompt_tokens
alt remaining_tokens < min_required (e.g., 500)
CE->>CE: Apply mitigation strategy
alt Strategy: summarize history
CE->>CE: Compress older messages to summary
else Strategy: reduce RAG context
CE->>CE: Keep only top-K most relevant chunks
else Strategy: shrink tool descriptors
CE->>CE: Use compact tool descriptions
else No mitigation possible
CE-->>CE: Reject with error: "Context too large"
end
end
CE->>UT: Check user/tenant token budget remaining
UT-->>CE: {budget_remaining, budget_limit}
alt budget_remaining <= 0
CE-->>CE: Reject with error: "Token budget exceeded"
end
CE->>CE: Build AgentState {messages[], tools[], rag_context, web_context, memory, model, token_budget}
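The token budget check in the diagram reduces to simple arithmetic plus a choice of mitigation. The sketch below assumes token counts are already available per prompt part. The rule for picking a strategy (shrink the largest compressible part first) is an assumption for illustration only; the source lists the strategies but not how one is selected.

```rust
/// Token counts for each part of the prompt, as named in the diagram.
struct PromptParts {
    system_prompt: usize,
    history: usize,
    rag_context: usize,
    web_context: usize,
    tool_schemas: usize,
}

impl PromptParts {
    fn total(&self) -> usize {
        self.system_prompt + self.history + self.rag_context + self.web_context + self.tool_schemas
    }
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Mitigation {
    SummarizeHistory,
    ReduceRagContext,
    ShrinkToolDescriptors,
}

#[derive(Debug, PartialEq)]
enum BudgetDecision {
    Proceed { remaining_tokens: usize },
    Mitigate(Mitigation),
    Reject, // "Context too large"
}

/// remaining_tokens = context_window - prompt_tokens, compared with `min_required`.
fn check_budget(parts: &PromptParts, context_window: usize, min_required: usize) -> BudgetDecision {
    let remaining_tokens = context_window.saturating_sub(parts.total());
    if remaining_tokens >= min_required {
        return BudgetDecision::Proceed { remaining_tokens };
    }
    // Assumed heuristic: target the largest part that can be compressed.
    let candidates = [
        (parts.history, Mitigation::SummarizeHistory),
        (parts.rag_context, Mitigation::ReduceRagContext),
        (parts.tool_schemas, Mitigation::ShrinkToolDescriptors),
    ];
    match candidates.iter().max_by_key(|(size, _)| *size) {
        Some((size, mitigation)) if *size > 0 => BudgetDecision::Mitigate(*mitigation),
        _ => BudgetDecision::Reject,
    }
}
```

A caller would apply the returned mitigation, recompute the part sizes, and call `check_budget` again until it returns `Proceed` or `Reject`.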
This implements the ReAct pattern (Reason + Act): the agent iteratively calls the LLM, executes any requested tools, and feeds results back until the LLM produces a final answer.
sequenceDiagram
autonumber
box "External Core Platform"
participant EXT as External Tool/Service
end
box "HyperSpot"
participant CE as Chat engine
participant HK as Hook invocation
participant LLM as LLM gateway
participant PM as Policy manager
participant MCP as MCP gateway
participant EGR as Outbound API gateway
participant CS as Credential Resolver
participant AUD as Audit
participant UT as Usage tracker
participant DB as Chat DB
end
Note over CE,LLM: [ ] p4 - Agent loop starts (p2: tool execution)
CE->>CE: Initialize: iteration=0, max_iterations=10
loop ReAct Loop (until finish or max_iterations)
Note over CE,HK: [ ] p5 - HOOK: llm.pre_call (before each LLM invocation)
CE->>HK: Invoke hook (llm.pre_call, {messages[], tools[], model})
HK-->>CE: {action: allow | block | override}
alt action == "block"
CE->>CE: Abort agent loop
CE-->>CE: Return error: {code: "llm_call_blocked", reason}
else action == "override"
CE->>CE: Use modified messages/tools
end
CE->>LLM: Chat completion (messages + tools + context)
LLM-->>CE: Response {content?, tool_calls[]?, finish_reason}
Note over CE,HK: [ ] p5 - HOOK: llm.post_response (after each LLM response)
CE->>HK: Invoke hook (llm.post_response, {content, tool_calls[], finish_reason})
HK-->>CE: {action: allow | block | override}
alt action == "block"
CE->>CE: Discard response, return error
CE-->>CE: Return error: {code: "response_blocked", reason}
else action == "override"
CE->>CE: Use modified content/tool_calls
end
CE->>UT: Record LLM usage (input_tokens, output_tokens, model_id)
alt finish_reason == "stop" (no tool calls)
CE->>CE: Break loop - final answer ready
else finish_reason == "tool_calls" (p2: tool execution)
CE->>DB: Persist assistant message (tool_calls pending)
loop For each tool_call in tool_calls[]
CE->>PM: Authorize tool (SecurityContext, tool_id, args_hash)
PM-->>CE: Allow | Deny (+ reason)
alt Denied by policy
CE->>CE: tool_result = {error: "policy_denied", reason}
else Allowed
CE->>MCP: Execute tool (tool_id, args, timeout)
MCP->>EGR: Prepare egress request
EGR->>CS: Resolve credentials (tenant_id, tool.auth_config)
CS-->>EGR: Credential material
EGR->>EXT: HTTP/gRPC call to external service
EXT-->>EGR: Response
EGR-->>MCP: Normalized result
MCP-->>CE: tool_result {output, duration_ms}
CE->>AUD: Audit: tool.executed {tool_id, args_hash, status, duration}
CE->>UT: Record tool usage (tool_id, tenant_id)
end
end
CE->>CE: Append tool_results to messages[]
CE->>CE: iteration++
end
end
alt max_iterations exceeded
CE->>CE: Force stop - append "max iterations reached" message
end
CE->>DB: Persist final assistant message
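Stripped of hooks, policy checks, auditing, and persistence, the control flow of the loop above is small. The sketch below shows only that skeleton, with closures standing in for the LLM gateway and the MCP gateway; all type and function names are hypothetical, and messages are plain strings for brevity.

```rust
/// Simplified LLM response: either a final answer or a list of tool names to call.
#[derive(Debug, Clone, PartialEq)]
enum LlmResponse {
    Stop { content: String },
    ToolCalls(Vec<String>),
}

#[derive(Debug, PartialEq)]
enum LoopOutcome {
    Final(String),
    MaxIterationsReached,
}

/// Reason -> Act -> Observe until the LLM stops or `max_iterations` is hit.
fn run_agent_loop<L, T>(
    messages: &mut Vec<String>,
    max_iterations: usize,
    mut call_llm: L,
    mut run_tool: T,
) -> LoopOutcome
where
    L: FnMut(&[String]) -> LlmResponse,
    T: FnMut(&str) -> String,
{
    for _ in 0..max_iterations {
        match call_llm(messages.as_slice()) {
            // finish_reason == "stop": the final answer is ready.
            LlmResponse::Stop { content } => {
                messages.push(format!("assistant: {}", content));
                return LoopOutcome::Final(content);
            }
            // finish_reason == "tool_calls": execute each tool and feed results back.
            LlmResponse::ToolCalls(calls) => {
                for call in &calls {
                    let result = run_tool(call.as_str());
                    messages.push(format!("tool[{}]: {}", call, result));
                }
            }
        }
    }
    // Force stop, as in the "max_iterations exceeded" branch.
    messages.push("assistant: max iterations reached".to_string());
    LoopOutcome::MaxIterationsReached
}
```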
The final answer is streamed to the client using Server-Sent Events (SSE). The Chat engine uses ModKit's SseBroadcaster for efficient fan-out.
Key rules:
- SSE throttling: If user/tenant consumes too many tokens, slow down or terminate stream
- Track token budget in real-time during streaming
sequenceDiagram
autonumber
participant C as Client UI
box "HyperSpot"
participant I as API gateway
participant CE as Chat engine
participant LLM as LLM gateway
participant DB as Chat DB
participant AM as Agent Memory
participant EB as Events broker
participant AUD as Audit
participant UT as Usage tracker
end
Note over C,I: [ ] p1 - Client opens SSE connection
C->>I: GET /chat/v1/conversations/{conv_id}/stream (Accept: text/event-stream)
I->>CE: Subscribe to conversation stream (SecurityContext, conv_id)
CE-->>I: SSE connection established
I-->>C: HTTP 200 (Content-Type: text/event-stream)
Note over CE,LLM: [ ] p1 - Stream final response (or continue from agent loop)
CE->>LLM: Chat completion (messages, stream: true)
CE->>CE: Initialize: tokens_emitted=0, throttle_state=normal
loop Token streaming
LLM-->>CE: delta {content_chunk, index}
CE->>CE: Accumulate full_content
CE->>CE: tokens_emitted += estimate_tokens(chunk)
Note over CE,UT: [ ] p3 - SSE THROTTLING CHECK
CE->>UT: Update usage + check budget (tenant_id, tokens_emitted)
UT-->>CE: {budget_remaining, throttle_action}
alt throttle_action == "normal"
CE-->>I: SSE event: {"type": "delta", "content": chunk}
I-->>C: SSE: data: {"type": "delta", ...}
else throttle_action == "slow_down"
CE->>CE: Batch next N tokens before emitting
CE->>CE: Optional: sleep(throttle_delay_ms)
CE-->>I: SSE event: {"type": "delta", "content": batched_chunk, "throttled": true}
I-->>C: SSE: data: {"type": "delta", "throttled": true, ...}
else throttle_action == "terminate"
CE->>CE: Cancel LLM stream
CE-->>I: SSE event: {"type": "error", "code": "budget_exceeded", "message": "Token budget exhausted"}
I-->>C: SSE: data: {"type": "error", ...}
CE->>DB: Persist partial assistant message (truncated)
CE->>AUD: Audit: chat.response.terminated {reason: "budget_exceeded"}
I-->>C: SSE connection closed
end
end
LLM-->>CE: finish_reason: "stop", usage: {prompt_tokens, completion_tokens}
Note over CE,DB: [ ] p1 - Persist and finalize
CE->>DB: Insert assistant message (conv_id, role: assistant, content, citations[])
CE->>UT: Record final usage (tenant_id, model_id, total_tokens)
CE->>AUD: Audit: chat.response.completed {conv_id, message_id, tool_count, duration}
Note over CE,AM: [ ] p4 - Update agent memory (optional)
CE->>AM: Store episodic memory (conversation summary, key facts)
CE->>EB: Publish event: chat.response.completed
CE-->>I: SSE event: {"type": "done", "message_id": ..., "usage": {...}}
I-->>C: SSE: data: {"type": "done", ...}
I-->>C: SSE connection closed
Note over C: Client renders final message with citations
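The three throttling branches can be sketched as a small emitter that decides, per chunk, what (if anything) to send to the client. The threshold `slow_down_below`, the `batch_size` parameter, and the `Emitter` type are assumptions for the example; in the diagram the Usage tracker returns `throttle_action` directly, and ModKit's SseBroadcaster is not modeled here.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum ThrottleAction {
    Normal,
    SlowDown,
    Terminate,
}

/// Assumed policy: terminate at zero budget, slow down below a threshold.
fn throttle_action(budget_remaining: i64, slow_down_below: i64) -> ThrottleAction {
    if budget_remaining <= 0 {
        ThrottleAction::Terminate
    } else if budget_remaining < slow_down_below {
        ThrottleAction::SlowDown
    } else {
        ThrottleAction::Normal
    }
}

/// Event handed to the SSE layer; mirrors the "delta" and "error" event types.
#[derive(Debug, PartialEq)]
enum StreamEvent {
    Delta { content: String, throttled: bool },
    Error { code: &'static str },
}

struct Emitter {
    batch_size: usize,
    buffer: Vec<String>,
}

impl Emitter {
    fn new(batch_size: usize) -> Self {
        Emitter { batch_size, buffer: Vec::new() }
    }

    /// Returns the event to emit for this chunk, or `None` while batching.
    fn on_chunk(&mut self, chunk: &str, action: ThrottleAction) -> Option<StreamEvent> {
        match action {
            ThrottleAction::Normal => {
                // Flush anything buffered during an earlier slow-down with this chunk.
                self.buffer.push(chunk.to_string());
                let content = self.buffer.concat();
                self.buffer.clear();
                Some(StreamEvent::Delta { content, throttled: false })
            }
            ThrottleAction::SlowDown => {
                // Batch chunks and emit only once `batch_size` have accumulated.
                self.buffer.push(chunk.to_string());
                if self.buffer.len() < self.batch_size {
                    return None;
                }
                let content = self.buffer.concat();
                self.buffer.clear();
                Some(StreamEvent::Delta { content, throttled: true })
            }
            ThrottleAction::Terminate => Some(StreamEvent::Error { code: "budget_exceeded" }),
        }
    }
}
```

On `Terminate` the caller is still responsible for cancelling the LLM stream, persisting the partial message, and closing the SSE connection, as the diagram shows.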
This is a simpler alternative to the async scenario:
- No Jobs Manager — file is parsed immediately during the request
- No RAG — file content is injected directly into chat context
- No WebSearch — no external search engines are used
- Aligned with the current Go implementation (/chat/threads/{thread_id}/attachment and /chat/attachments)
Steps:
- User uploads file → synchronous parse → create "file attachment message" — Hook: file.post_parse
- User sends message — Hook: user_message.pre_store
- Prepare agent state + agent loop + SSE streaming — Hooks: llm.pre_call, llm.post_response (same as async Steps 5-6)
The file is uploaded, parsed immediately (using the File Parser), and a file attachment message is created with the parsed, possibly truncated, content. There is no background job, no RAG indexing, and no WebSearch.
sequenceDiagram
autonumber
participant U as User
participant C as Client UI
box "HyperSpot"
participant I as API gateway
participant CE as Chat engine
participant FP as File parser gateway
participant HK as Hook invocation
participant DB as Chat DB
participant AUD as Audit
end
U->>C: Attach file + type message
Note over C,CE: [ ] p1 - Option A: Upload to existing thread
C->>I: POST /chat/v1/threads/{thread_id}/attachment (multipart)
I->>CE: Handle attachment upload (SecurityContext, thread_id, file)
Note over CE,FP: [ ] p1 - Synchronous file parsing
CE->>CE: Validate file size (max_size_kb from config)
CE->>FP: Parse file (mime_type, content)
FP-->>CE: Parsed result {text, metadata}
Note over CE,HK: [ ] p3 - HOOK: file.post_parse (informative)
CE->>HK: Invoke hook (file.post_parse, {file_id, parsed_text, metadata})
Note right of CE: Informative hook - cannot block or override
Note over CE,CE: [ ] p1 - Apply content limits
CE->>CE: Check content length vs max_content_length
alt Content too large
CE->>CE: Truncate at whitespace boundary
CE->>CE: Mark as truncated (preserve metadata)
end
Note over CE,DB: [ ] p1 - Create file attachment message
CE->>DB: Insert message (thread_id, role: "file_attachment")
Note right of CE: {content: formatted_text, filename, file_ext, original_size, is_truncated}
CE->>AUD: Audit: chat.attachment.created {thread_id, filename, size}
CE-->>I: {message_id, thread_id, content_length, is_truncated}
I-->>C: 201 Created {message_id, is_truncated}
Note over C,CE: [ ] p1 - Option B: Create new thread with attachment
C->>I: POST /chat/v1/attachments (multipart, ?group_id)
I->>CE: Create thread + attachment (SecurityContext, group_id?, file)
CE->>DB: Create new thread (group_id)
CE->>FP: Parse file (same as above)
FP-->>CE: Parsed result
CE->>HK: [ ] p3 - HOOK: file.post_parse (informative)
CE->>CE: Apply content limits (same as above)
CE->>DB: Insert file attachment message
CE-->>I: {message_id, thread_id, content_length, is_truncated}
I-->>C: 201 Created {message_id, thread_id}
Note over C: UI can now send user message referencing this thread
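The "truncate at whitespace boundary" step needs some care with multi-byte text. The sketch below assumes `max_content_length` is measured in bytes, which the source does not specify; the function name is hypothetical. It returns the truncated text together with the `is_truncated` flag.

```rust
/// Truncates `text` to at most `max_content_length` bytes, preferring to cut at
/// the last whitespace so that no word is split. Returns (text, is_truncated).
/// Assumption: the limit is expressed in bytes.
fn truncate_at_whitespace(text: &str, max_content_length: usize) -> (&str, bool) {
    if text.len() <= max_content_length {
        return (text, false);
    }
    // Back up to a valid UTF-8 character boundary at or below the limit.
    let mut cut = max_content_length;
    while !text.is_char_boundary(cut) {
        cut -= 1;
    }
    let head = &text[..cut];
    let end = if text[cut..].starts_with(char::is_whitespace) {
        // The cut already falls between two words.
        cut
    } else {
        // Otherwise fall back to the last whitespace inside the allowed prefix;
        // if there is none, cut hard at the character boundary.
        match head.rfind(char::is_whitespace) {
            Some(pos) if pos > 0 => pos,
            _ => cut,
        }
    };
    (text[..end].trim_end(), true)
}
```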
After the file attachment message exists, the user sends their actual question. Chat Engine prepares the agent state with the file content included in the context.
sequenceDiagram
autonumber
participant C as Client UI
box "HyperSpot"
participant I as API gateway
participant CE as Chat engine
participant HK as Hook invocation
participant DB as Chat DB
participant SET as Settings service
participant TR as Types Registry
participant PR as Prompts registry
participant MR as Models registry
participant UT as Usage tracker
end
Note over C,CE: [ ] p1 - User sends message
C->>I: POST /chat/v1/threads/{thread_id}/messages
Note right of C: {content: "Summarize this document", model_name, stream: true}
I->>CE: Create user message (SecurityContext)
Note over CE,HK: [ ] p3 - HOOK: user_message.pre_store
CE->>HK: Invoke hook (user_message.pre_store, {content, attachments})
HK-->>CE: {action: allow | block | override}
alt action == "block"
CE-->>I: 422 Unprocessable (hook_blocked)
I-->>C: 422 {error: "content_blocked", reason}
else action == "override"
CE->>CE: Replace message content with modified_content
end
CE->>DB: Persist user message
Note over CE,DB: [ ] p1 - Load conversation context (including file attachment)
CE->>DB: Get thread messages (thread_id)
DB-->>CE: messages[] including file_attachment_message
Note over CE,SET: [ ] p1 - Load settings (p2: tools)
CE->>SET: Get settings (tenant_id, user_id)
SET-->>CE: {enabled_tool_ids[], model_policy}
CE->>TR: [ ] p2 - GET tool definitions (gts.x.genai.mcp.tools.v1~*)
TR-->>CE: Tool definitions[] (no runtime validation)
Note over CE,PR: [ ] p1 - Resolve prompt + model
CE->>PR: Get prompt (agent_type, tenant_id)
PR-->>CE: {system_prompt}
CE->>MR: Get model (model_policy)
MR-->>CE: {model_id, context_window}
Note over CE,CE: [ ] p2 - TOKEN BUDGET CHECK (critical for production)
CE->>CE: prompt_tokens = system_prompt + file_content + history + tool_schemas
CE->>CE: remaining_tokens = context_window - prompt_tokens
alt remaining_tokens < min_required
alt File content too large
CE->>CE: Truncate file content further
else History too long
CE->>CE: Summarize older messages
else Still too large
CE-->>I: Error: "Context exceeds model limit"
I-->>C: 400 Bad Request
end
end
CE->>UT: Check token budget
UT-->>CE: {budget_remaining}
alt budget_remaining <= 0
CE-->>I: Error: "Token budget exceeded"
I-->>C: 402 Payment Required
end
CE->>CE: Build AgentState {messages[], tools[], model, token_budget}
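The diagram above surfaces three distinct failures to the client, each with its own HTTP status. A compact way to keep that mapping in one place is sketched below; the `ChatError` type and the `to_http` function are hypothetical, while the status codes and messages are taken from the diagram.

```rust
/// Failures that the synchronous flow can return before the agent loop starts.
#[derive(Debug, Clone, PartialEq)]
enum ChatError {
    HookBlocked { reason: String },
    ContextTooLarge,
    TokenBudgetExceeded,
}

/// Maps a failure to the HTTP status and message shown in the diagram.
fn to_http(err: &ChatError) -> (u16, String) {
    match err {
        // user_message.pre_store hook returned "block".
        ChatError::HookBlocked { reason } => (422, format!("content_blocked: {}", reason)),
        // No mitigation brought the prompt under the model's context window.
        ChatError::ContextTooLarge => (400, "Context exceeds model limit".to_string()),
        // The user or tenant has no token budget left.
        ChatError::TokenBudgetExceeded => (402, "Token budget exceeded".to_string()),
    }
}
```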
For the agent loop and SSE streaming, refer to Step 6 of the async scenario above. The flow is identical:
- ReAct agent loop (LLM call → tool execution → repeat)
- SSE streaming with throttling
The only difference is that the context includes the full file attachment content (possibly truncated) directly in messages, rather than RAG-retrieved chunks with citations.
