Open
Description
Summary
Implement an OpenAI provider (pkg/provider/openai) supporting chat completions, embeddings, image generation, and audio output. This will likely require extending the response content model beyond text to support multi-modal outputs.
Requirements
Core Provider
- Implement llm.Client interface for OpenAI API (chat completions, model listing)
- API key via OPENAI_API_KEY environment variable
- Support streaming and non-streaming chat completions
- Support tool/function calling
- Support thinking/reasoning (o1, o3 models)
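As a rough sketch of the first three requirements, the Go snippet below reads the key from OPENAI_API_KEY and builds (but does not send) a chat completions request with a stream flag. The helper names apiKeyFromEnv and newChatRequest and the request structs are hypothetical and not part of the existing llm.Client interface; only the endpoint URL, the Bearer authorization header, and the model, messages, and stream fields come from the public OpenAI API.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"os"
)

// Public OpenAI chat completions endpoint.
const chatCompletionsURL = "https://api.openai.com/v1/chat/completions"

// chatMessage and chatRequest are hypothetical local types that mirror
// a minimal subset of the chat completions request body.
type chatMessage struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string        `json:"model"`
	Messages []chatMessage `json:"messages"`
	Stream   bool          `json:"stream"`
}

// apiKeyFromEnv returns the key from OPENAI_API_KEY or an error if it is unset.
func apiKeyFromEnv() (string, error) {
	key := os.Getenv("OPENAI_API_KEY")
	if key == "" {
		return "", errors.New("OPENAI_API_KEY is not set")
	}
	return key, nil
}

// newChatRequest builds an HTTP request for a single user prompt.
// Setting stream to true asks the API for a streamed response.
func newChatRequest(apiKey, model, prompt string, stream bool) (*http.Request, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []chatMessage{{Role: "user", Content: prompt}},
		Stream:   stream,
	})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, chatCompletionsURL, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+apiKey)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	key, err := apiKeyFromEnv()
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	req, err := newChatRequest(key, "gpt-4o-mini", "Hello", true)
	if err != nil {
		fmt.Println("request error:", err)
		return
	}
	// The request is only constructed here, not sent.
	fmt.Println(req.Method, req.URL)
}
```

A real provider would wrap this behind llm.Client and add tool calling and reasoning options; those are left out here because their shape depends on the existing interface.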
Image Output
- Support DALL-E and GPT-image models for image generation
- Responses may contain image data (base64 or URLs) alongside text
- Extend the content block model to represent image outputs (not just text)
- Images should be renderable in Telegram (send as photo) and CLI (save to file or display URL)
Audio Output
- Support audio output from GPT-4o-audio and similar models
- Responses may contain audio data (base64 WAV/MP3)
- Extend the content block model to represent audio outputs
- Audio should be sendable in Telegram (as voice/audio message) and CLI (save to file)
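For the CLI "save to file" path, one possible approach is to decode the base64 payload and choose a file extension from the MIME type. The sketch below assumes the provider hands back a base64 string plus a MIME type; the function names (extForMIME, decodeInline, saveInline) and the small extension table are illustrative only, and the payload in main is placeholder bytes rather than real audio.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"os"
	"path/filepath"
)

// extForMIME maps a MIME type to a file extension.
// The table is illustrative and covers only WAV and MP3.
func extForMIME(mime string) string {
	switch mime {
	case "audio/wav":
		return ".wav"
	case "audio/mpeg":
		return ".mp3"
	default:
		return ".bin"
	}
}

// decodeInline decodes a standard base64 payload into raw bytes.
func decodeInline(b64 string) ([]byte, error) {
	return base64.StdEncoding.DecodeString(b64)
}

// saveInline decodes the payload and writes it to dir/name plus the
// extension derived from the MIME type. It returns the written path.
func saveInline(dir, name, mime, b64 string) (string, error) {
	data, err := decodeInline(b64)
	if err != nil {
		return "", fmt.Errorf("decode %s payload: %w", mime, err)
	}
	path := filepath.Join(dir, name+extForMIME(mime))
	if err := os.WriteFile(path, data, 0o644); err != nil {
		return "", err
	}
	return path, nil
}

func main() {
	dir, err := os.MkdirTemp("", "audio-out")
	if err != nil {
		fmt.Println("temp dir:", err)
		return
	}
	defer os.RemoveAll(dir)

	// "aGk=" is placeholder data, not a real WAV file.
	path, err := saveInline(dir, "reply", "audio/wav", "aGk=")
	if err != nil {
		fmt.Println("save:", err)
		return
	}
	fmt.Println("saved", filepath.Base(path))
}
```

The same helper would work for inline image data if the extension table were extended; sending to Telegram would instead pass the decoded bytes to the bot library in use.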
Content Model Changes
- Current schema.Content may need to support typed content blocks: text, image, audio, etc.
- Each block should carry MIME type and either inline data or a URL
- Downstream consumers (Telegram bot, CLI, API responses) need to handle multi-modal content blocks
- Consider how this interacts with session storage (storing large binary blobs vs references)
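One way the typed blocks described above could look is sketched below. ContentBlock, BlockType, and Validate are hypothetical names, not the existing schema.Content definition, and the rule that a media block carries exactly one of inline data or a URL is an assumption drawn from the "either inline data or a URL" wording.

```go
package main

import (
	"errors"
	"fmt"
)

// BlockType identifies the modality of a content block (hypothetical).
type BlockType string

const (
	BlockText  BlockType = "text"
	BlockImage BlockType = "image"
	BlockAudio BlockType = "audio"
)

// ContentBlock is a hypothetical typed block, not the current schema.Content.
type ContentBlock struct {
	Type     BlockType
	MIMEType string // e.g. "image/png"; required for media blocks
	Text     string // set for text blocks
	Data     []byte // inline payload for media blocks
	URL      string // reference to externally stored media
}

// Validate checks that a block is internally consistent.
// Assumption: media blocks carry exactly one of Data or URL.
func (b ContentBlock) Validate() error {
	switch b.Type {
	case BlockText:
		if b.Text == "" {
			return errors.New("text block has no text")
		}
		return nil
	case BlockImage, BlockAudio:
		if b.MIMEType == "" {
			return fmt.Errorf("%s block has no MIME type", b.Type)
		}
		hasData := len(b.Data) > 0
		hasURL := b.URL != ""
		if hasData == hasURL {
			return fmt.Errorf("%s block needs exactly one of data or URL", b.Type)
		}
		return nil
	default:
		return fmt.Errorf("unknown block type %q", b.Type)
	}
}

func main() {
	blocks := []ContentBlock{
		{Type: BlockText, Text: "Here is your image."},
		{Type: BlockImage, MIMEType: "image/png", URL: "https://example.com/cat.png"},
		{Type: BlockAudio, MIMEType: "audio/wav"}, // invalid: no payload
	}
	for _, b := range blocks {
		if err := b.Validate(); err != nil {
			fmt.Println(b.Type, "invalid:", err)
			continue
		}
		fmt.Println(b.Type, "ok")
	}
}
```

Keeping a URL field alongside inline data leaves room for session storage to persist a reference instead of a large binary blob, which is one possible answer to the storage question above.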
Models
- GPT-4o, GPT-4o-mini, GPT-4.1, o1, o3, o4-mini (chat)
- DALL-E 3, GPT-image (image generation)
- GPT-4o-audio (audio output)
- Embedding models (text-embedding-3-small, text-embedding-3-large)
Motivation
OpenAI is a major LLM provider and its multi-modal output capabilities (images, audio) will drive the content model to support rich responses across all providers, improving the overall architecture.