This project implements an Express.js chat service that acts as a custom LLM for the Agora Convo AI Engine, supporting multiple LLM providers including OpenAI's APIs and DataStax Langflow. It supports both streaming and non-streaming responses, function calling, and RAG (Retrieval-Augmented Generation).
This project also implements basic tools and a tool calling mechanism. The tools use the Agora Signaling Service (RTM) to send messages into a real-time messaging channel.
This service supports multiple LLM providers:
- OpenAI Chat Completions API - The standard OpenAI chat completions endpoint
- OpenAI Responses API - OpenAI's new Responses API with improved streaming
- DataStax Langflow - Visual AI flow builder with DataStax integration
The service now includes full support for DataStax Langflow, allowing you to:
- Connect to DataStax-hosted Langflow instances
- Use custom AI flows built with Langflow's visual interface
- Maintain session state across conversations
- Execute function calls from within Langflow flows
- Stream responses in real-time
To use Langflow, configure the following environment variables:
LANGFLOW_URL=https://api.langflow.astra.datastax.com/lf/your-instance
LANGFLOW_API_KEY=your_langflow_api_key
LANGFLOW_FLOW_ID=your_flow_id
Then update your route configuration to use the Langflow service instead of OpenAI.
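As a rough sketch (module paths and export names are assumptions based on the component list below, not the actual code), the provider switch could look like this:

```typescript
// Illustrative provider selection; adjust paths/exports to match the repo.
import { config } from './libs/utils'                        // assumed central config module
import * as langflowService from './services/langflowService'
import * as openaiService from './services/openaiCompletionsService'

// Pick the service module that matches the LLM_PROVIDER environment variable.
export function getLlmService() {
  return config.llm.provider === 'langflow' ? langflowService : openaiService
}
```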
graph LR
Client[Client] <--> |Voice/Text Stream| ConvoAI[Agora Convo AI]
ConvoAI --> |ASR Text| Server[Express Server]
Server --> |Auth| AuthMiddleware[Auth Middleware]
AuthMiddleware --> ChatRouter[Chat Router]
ChatRouter --> LLMService[LLM Service<br/>#40;OpenAI/Langflow#41;]
LLMService --> |Get Context| RagService[RAG Service]
RagService --> |Return Context| LLMService
LLMService --> |System Prompt + RAG + ASR Text| LLMProvider[LLM Provider<br/>#40;OpenAI/Langflow#41;]
LLMProvider --> |Response| LLMService
LLMService --> |Function Calls| Tools[Tools Service]
Tools --> |Agora RTM API| Agora[Agora Signaling Service]
LLMService --> |Response| Server
Server --> |Response| ConvoAI
ConvoAI --> |Audio + Text| Client
subgraph Services
LLMService
RagService
Tools
end
subgraph Config
Utils[Utils/Config]
ToolDefs[Tool Definitions]
end
Services --> Config
For a detailed diagram of the sequence flow, see the Sequence Flow section, and for more information on the entities, see the Component Details and Data Models sections.
This project can be deployed to Heroku, Netlify, Render, or Vercel.
Each platform requires the appropriate configuration:
- Heroku: Uses the `app.json` file and `Procfile`
- Netlify: Uses the `netlify.toml` file and the Netlify function in `netlify/functions/api.js`
- Render: Uses the `render.yaml` file
- Vercel: Uses the `vercel.json` file
- Install dependencies:
npm install
- Create environment variables file:
cp .env.example .env
- Configure the environment variables:
# Agora Configuration
AGORA_APP_ID=your_app_id
AGORA_APP_CERTIFICATE=your_certificate
AGORA_CUSTOMER_ID=your_customer_id
AGORA_CUSTOMER_SECRET=your_customer_secret
# Agent Configuration
AGENT_ID=your_agent_id
# LLM
LLM_PROVIDER=langflow # options: langflow or openai
#OpenAI
OPENAI_API_KEY=your_openai_api_key
OPENAI_MODEL=gpt-4o-mini # or choose a different model
USE_RESPONSES_API=false # Use OpenAI Responses API instead of Chat Completions
# Langflow Configuration (for Langflow service)
LANGFLOW_URL=https://api.langflow.astra.datastax.com/lf/your-instance
LANGFLOW_API_KEY=your_langflow_api_key
LANGFLOW_FLOW_ID=your_flow_id
# Server Configuration
PORT=3000
- Start the server:
npm start
This server supports two different OpenAI API implementations:
- Chat Completions API - The standard OpenAI chat completions endpoint
- Responses API - OpenAI's new Responses API
For a detailed comparison of the two APIs, see OpenAI's Responses vs Chat Completions page.
You can switch between these APIs using the `USE_RESPONSES_API` environment variable:
# Use Responses API
USE_RESPONSES_API=true
# Use Chat Completions API
USE_RESPONSES_API=false
Both APIs provide similar functionality, but the Responses API offers a cleaner streaming model: it emits semantic events that describe exactly what changed (for example, a specific text addition), so you can write integrations that target only the event types you care about. The Chat Completions API instead appends to the content field as tokens are generated, which requires you to track the difference between successive states yourself.
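The difference is easiest to see in code. The sketch below uses the official openai Node SDK; the event names are those exposed by recent SDK versions, so verify them against the version you have installed:

```typescript
import OpenAI from 'openai'

const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

// Chat Completions: each chunk carries the next token(s) in choices[0].delta.content,
// so the consumer accumulates the text itself.
async function streamChatCompletions() {
  const stream = await client.chat.completions.create({
    model: 'gpt-4o-mini',
    messages: [{ role: 'user', content: 'Hello!' }],
    stream: true,
  })
  let text = ''
  for await (const chunk of stream) {
    text += chunk.choices[0]?.delta?.content ?? ''
  }
  return text
}

// Responses API: the stream emits typed semantic events, so you can react only to
// the event kinds you care about (here, incremental text deltas).
async function streamResponses() {
  const stream = await client.responses.create({
    model: 'gpt-4o-mini',
    input: 'Hello!',
    stream: true,
  })
  let text = ''
  for await (const event of stream) {
    if (event.type === 'response.output_text.delta') text += event.delta
  }
  return text
}
```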
Use Docker to run this application:
# Build the Docker image
docker build -t agora-convo-ai-custom-llm .
# Run the container
docker run -p 3000:3000 --env-file .env agora-convo-ai-custom-llm
You can also use Docker Compose to run the application with all required services:
# Start the services
docker-compose up -d
# View logs
docker-compose logs -f
# Stop the services
docker-compose down
This microservice is meant to be used as a drop-in companion to the Agora Convo AI service. It acts as a middleware application that accepts ASR text and processes it before sending it to the configured LLM provider. While there is an exposed chat completion endpoint, you should only need to call it directly during initial testing.
Returns a simple "pong" message to check the server's health.
Request:
curl http://localhost:3000/ping
Response:
{ "message": "pong" }
Handles chat completion requests with optional streaming support.
Request Body:
{
"messages": [{ "role": "user", "content": "Hello!" }],
"model": "gpt-4o-mini",
"stream": false,
"channel": "default",
"userId": "user123",
"appId": "app123"
}
Example Request:
curl -X POST http://localhost:3000/v1/chat/completion \
-H "Authorization: Bearer <your-llm-api-key>" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello!"}]}'
To test the LLM locally, we recommend using the `ngrok` tool to expose your local server to the internet:
ngrok http localhost:3000
Once the tunnel is running, use the generated ngrok URL to send requests to the LLM:
curl -X POST https://<ngrok-url>/v1/chat/completion \
-H "Authorization: Bearer <your-llm-api-key>" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Hello!"}]}'
Response:
- Non-streaming: JSON response with completion
- Streaming: Server-sent events (SSE) with completion chunks
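Because responses are formatted to match the OpenAI Chat Completions structure, a non-streaming reply looks roughly like this (all values are illustrative):
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1710000000,
  "model": "gpt-4o-mini",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hi! How can I help you today?" },
      "finish_reason": "stop"
    }
  ]
}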
sequenceDiagram
participant C as Client
participant CA as Agora Convo AI
participant ASR as ASR Service
participant S as Express Server
participant A as Auth Middleware
participant O as OpenAI Service
participant R as RAG Service
participant T as Tools Service
participant AI as OpenAI API
participant AG as Agora RTM
C->>CA: Stream Audio
CA->>ASR: Process Audio
ASR->>CA: Return Text
CA->>S: POST /chat/completion
S->>A: Validate Token
A->>S: Token Valid
S->>O: Process Chat Completion
O->>R: Request Context
R-->>O: Return RAG Data
O->>AI: Send System Prompt + RAG + ASR Text
AI-->>O: Return Response
alt Function Call Required
O->>T: Execute Function
T->>AG: Send RTM Message
AG-->>T: Confirm Message
T-->>O: Return Result
O->>AI: Send Updated Context
AI-->>O: Return Final Response
end
O->>S: Return Response
S->>CA: Send Response
CA->>C: Stream Audio + Text Response
1. Server (server.ts)
- Main Express application entry point
- Configures middleware (helmet, cors, morgan, json parser)
- Mounts chat routes and health check endpoint
2. Chat Completion Router (chatCompletion.ts)
- Handles POST requests to /chat/completion
- Validates request parameters
- Manages both streaming and non-streaming responses
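For the streaming case, the router can relay chunks as server-sent events; a minimal sketch (not the actual handler in chatCompletion.ts):

```typescript
import { Response } from 'express'

// Forward OpenAI-style chunks to the client as server-sent events (SSE).
async function writeSse(res: Response, chunks: AsyncIterable<unknown>) {
  res.setHeader('Content-Type', 'text/event-stream')
  res.setHeader('Cache-Control', 'no-cache')
  res.setHeader('Connection', 'keep-alive')

  for await (const chunk of chunks) {
    res.write(`data: ${JSON.stringify(chunk)}\n\n`)
  }
  res.write('data: [DONE]\n\n')
  res.end()
}
```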
3. Authentication (auth.ts)
- Middleware for token-based authentication
- Validates Bearer tokens against configuration
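A minimal sketch of that Bearer-token check, assuming the expected token is read from the central config (the real auth.ts may differ):

```typescript
import { Request, Response, NextFunction } from 'express'
import { config } from '../libs/utils'   // assumed config module

export function validateToken(req: Request, res: Response, next: NextFunction) {
  const header = req.headers.authorization ?? ''
  const token = header.startsWith('Bearer ') ? header.slice('Bearer '.length) : ''

  // Reject requests whose token does not match the configured key.
  if (!token || token !== config.agora.authToken) {
    return res.status(401).json({ error: 'Unauthorized' })
  }
  next()
}
```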
4. LLM Services
The application supports multiple LLM providers through dedicated service modules:
- OpenAI Completions (openaiCompletionsService.ts) - Standard Chat Completions API
- OpenAI Responses (openaiResponsesService.ts) - Advanced Responses API with improved streaming
- Langflow Service (langflowService.ts) - DataStax Langflow integration
- Connects to Langflow flows via the DataStax Langflow client
- Maintains session state across conversations
- Supports both streaming and non-streaming responses
- Integrates with function calling mechanism
- Formats responses to match OpenAI Chat Completions API structure
All services provide:
- RAG integration through the RAG Service
- Function calling capabilities
- Streaming and non-streaming response modes
- Compatible response formatting
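Conceptually, each provider module exposes the same contract so the chat router stays provider-agnostic; a sketch of that shared shape (names are illustrative, not the actual exports):

```typescript
export interface LLMServiceOptions {
  stream: boolean
  channel: string
  userId: string
  appId: string
}

// Every provider (OpenAI Completions, OpenAI Responses, Langflow) accepts the same
// inputs and returns either a full completion or an async stream of chunks.
export interface LLMService {
  processChatCompletion(
    messages: { role: string; content: string }[],
    options: LLMServiceOptions,
  ): Promise<unknown> | AsyncIterable<unknown>
}
```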
5. RAG Service (ragService.ts)
- Provides retrieval augmented generation data
- Maintains hardcoded knowledge base
- Formats data for system prompts
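A minimal sketch of what a hardcoded knowledge base formatted for the system prompt can look like (the real documents in ragService.ts differ):

```typescript
// Hardcoded knowledge base keyed by document id (contents are illustrative).
const ragData: Record<string, string> = {
  doc1: 'Agora Convo AI connects real-time audio to an LLM backend.',
  doc2: 'This service exposes an OpenAI-compatible /chat/completion endpoint.',
}

// Flatten the documents into a block of text appended to the system prompt.
export function formatRagContext(): string {
  return Object.entries(ragData)
    .map(([id, text]) => `[${id}] ${text}`)
    .join('\n')
}
```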
6. Tools Service (tools.ts)
- Implements function calling capabilities
- Handles Agora RTM integration
- Provides utility functions (sendPhoto, orderSandwich)
7. Tool Definitions (toolDefinitions.ts)
- Defines available functions for LLM
- Specifies function parameters and schemas
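Tool definitions follow the OpenAI function-calling schema; a sketch of how sendPhoto and orderSandwich might be declared (parameter names are illustrative):

```typescript
// Function schemas advertised to the LLM. The model decides when to call them;
// the Tools Service executes the call and relays results over Agora RTM.
export const toolDefinitions = [
  {
    name: 'sendPhoto',
    description: 'Send a photo to the user over the signaling channel',
    parameters: {
      type: 'object',
      properties: {
        subject: { type: 'string', description: 'What the photo should show' },
      },
      required: ['subject'],
    },
  },
  {
    name: 'orderSandwich',
    description: 'Place a sandwich order on behalf of the user',
    parameters: {
      type: 'object',
      properties: {
        filling: { type: 'string', description: 'Requested sandwich filling' },
      },
      required: ['filling'],
    },
  },
]
```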
8. Utils (utils.ts)
- Manages configuration and environment variables
- Validates required settings
- Provides centralized config object
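A sketch of centralized config loading and validation, following the shape of the data models below (exact fields in utils.ts may differ):

```typescript
import dotenv from 'dotenv'
dotenv.config()

// Fail fast if a required variable is missing.
function required(name: string): string {
  const value = process.env[name]
  if (!value) throw new Error(`Missing required environment variable: ${name}`)
  return value
}

export const config = {
  port: Number(process.env.PORT ?? 3000),
  agentId: process.env.AGENT_ID ?? '',
  agora: {
    appId: required('AGORA_APP_ID'),
    appCertificate: required('AGORA_APP_CERTIFICATE'),
  },
  llm: {
    provider: process.env.LLM_PROVIDER ?? 'openai',
    openaiApiKey: process.env.OPENAI_API_KEY ?? '',
    model: process.env.OPENAI_MODEL ?? 'gpt-4o-mini',
    useResponsesApi: process.env.USE_RESPONSES_API === 'true',
  },
  langflow: {
    url: process.env.LANGFLOW_URL ?? '',
    apiKey: process.env.LANGFLOW_API_KEY ?? '',
    flowId: process.env.LANGFLOW_FLOW_ID ?? '',
  },
}
```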
classDiagram
class Config {
+port: number
+agora: AgoraConfig
+llm: LLMConfig
+langflow: LangflowConfig
+agentId: string
}
class AgoraConfig {
+appId: string
+appCertificate: string
+authToken: string
}
class LLMConfig {
+openaiApiKey: string
+model: string
+useResponsesApi: boolean
}
class LangflowConfig {
+url: string
+apiKey: string
+flowId: string
}
class ChatMessage {
+role: string
+content: string
+name?: string
+function_call?: FunctionCall
}
class FunctionDefinition {
+name: string
+description: string
+parameters: FunctionParameter
}
class RagData {
+doc1: string
+doc2: string
+doc3: string
+doc4: string
}
Config -- AgoraConfig
Config -- LLMConfig
Config -- LangflowConfig