Implement OpenAI provider with multi-modal output (images, audio) #22

## Summary

Implement an OpenAI provider (`pkg/provider/openai`) supporting chat completions, embeddings, image generation, and audio output. This will likely require extending the response content model beyond text to support multi-modal outputs.

## Requirements

### Core Provider
- Implement the `llm.Client` interface for the OpenAI API (chat completions, model listing)
- Read the API key from the `OPENAI_API_KEY` environment variable
- Support streaming and non-streaming chat completions
- Support tool/function calling
- Support thinking/reasoning (o1, o3 models)

### Image Output
- Support DALL-E and GPT-image models for image generation
- Responses may contain image data (base64 or URLs) alongside text
- Extend the content block model to represent image outputs (not just text)
- Images should be renderable in Telegram (send as photo) and CLI (save to file or display URL)

### Audio Output
- Support audio output from GPT-4o-audio and similar models
- Responses may contain audio data (base64 WAV/MP3)
- Extend the content block model to represent audio outputs
- Audio should be sendable in Telegram (as voice/audio message) and CLI (save to file)

### Content Model Changes
- The current `schema.Content` may need to support typed content blocks (text, image, audio, etc.)
- Each block should carry a MIME type and either inline data or a URL
- Downstream consumers (Telegram bot, CLI, API responses) need to handle multi-modal content blocks
- Consider how this interacts with session storage (storing large binary blobs vs references)
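One possible shape for typed blocks, sketched under assumptions: `Block`, `BlockType`, `BlobStore`, `ForStorage`, and the `blob:sha256:` reference scheme are hypothetical and not part of the existing `schema` package. Media blocks must carry a MIME type and exactly one of inline data or a URL; before a session is persisted, inline payloads above a size threshold move to a content-addressed store so the session record keeps only a reference.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// BlockType enumerates hypothetical content block kinds.
type BlockType string

const (
	BlockText  BlockType = "text"
	BlockImage BlockType = "image"
	BlockAudio BlockType = "audio"
)

// Block is a hypothetical typed content block: text, or binary media carried
// either inline (Data) or by reference (URL).
type Block struct {
	Type     BlockType `json:"type"`
	Text     string    `json:"text,omitempty"`
	MIMEType string    `json:"mime_type,omitempty"`
	Data     []byte    `json:"data,omitempty"` // encoding/json emits []byte as base64
	URL      string    `json:"url,omitempty"`
}

// Validate requires media blocks to have a MIME type and exactly one of Data or URL.
func (b Block) Validate() error {
	if b.Type == BlockText {
		return nil
	}
	if b.MIMEType == "" {
		return fmt.Errorf("%s block: missing MIME type", b.Type)
	}
	if (len(b.Data) > 0) == (b.URL != "") {
		return fmt.Errorf("%s block: need exactly one of data or url", b.Type)
	}
	return nil
}

// BlobStore is a hypothetical content-addressed store for large payloads.
type BlobStore map[string][]byte

// ForStorage returns a copy of b for session storage: inline payloads larger
// than maxInline bytes are moved into store and replaced by a reference URL.
// The receiver is a value, so the caller's block keeps its inline data.
func (b Block) ForStorage(store BlobStore, maxInline int) Block {
	if len(b.Data) <= maxInline {
		return b
	}
	sum := sha256.Sum256(b.Data)
	key := hex.EncodeToString(sum[:])
	store[key] = b.Data
	b.Data = nil
	b.URL = "blob:sha256:" + key
	return b
}
```

Hashing the payload means identical images or clips repeated across turns are stored once; the trade-off is that readers of session history must resolve `blob:` references through the same store.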

### Models
- GPT-4o, GPT-4o-mini, GPT-4.1, o1, o3, o4-mini (chat)
- DALL-E 3, GPT-image (image generation)
- GPT-4o-audio (audio output)
- Embedding models (text-embedding-3-small, text-embedding-3-large)
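The model list above could back a small capability registry that routes each model to its endpoint (`/v1/chat/completions`, `/v1/images/generations`, `/v1/embeddings`). This is a sketch under assumptions: the registry, the `Capability` flags, and `endpointFor` are hypothetical; `gpt-4o-audio-preview` is assumed as the API ID for the GPT-4o-audio family; GPT-image is left out because its exact ID is not given here; and dated snapshot IDs are not handled.

```go
package main

import "fmt"

// Capability flags for a hypothetical model registry.
type Capability uint8

const (
	CapChat Capability = 1 << iota
	CapReasoning
	CapImageOut
	CapAudioOut
	CapEmbedding
)

// models maps API model IDs to capabilities. IDs for the audio family are an
// assumption; confirm them against the model-listing endpoint.
var models = map[string]Capability{
	"gpt-4o":                 CapChat,
	"gpt-4o-mini":            CapChat,
	"gpt-4.1":                CapChat,
	"o1":                     CapChat | CapReasoning,
	"o3":                     CapChat | CapReasoning,
	"o4-mini":                CapChat | CapReasoning,
	"gpt-4o-audio-preview":   CapChat | CapAudioOut,
	"dall-e-3":               CapImageOut,
	"text-embedding-3-small": CapEmbedding,
	"text-embedding-3-large": CapEmbedding,
}

// endpointFor picks the REST path for a model's primary capability.
func endpointFor(model string) (string, error) {
	caps, ok := models[model]
	if !ok {
		return "", fmt.Errorf("unknown model %q", model)
	}
	switch {
	case caps&CapEmbedding != 0:
		return "/v1/embeddings", nil
	case caps&CapImageOut != 0:
		return "/v1/images/generations", nil
	default:
		// Chat, reasoning, and audio-output models all go through chat completions.
		return "/v1/chat/completions", nil
	}
}
```

Flags rather than a single enum let one model advertise several features (for example chat plus audio output), which the content-model changes would rely on when deciding which block types a response may contain.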

## Motivation

OpenAI is a major LLM provider, and supporting its multi-modal outputs (images, audio) will push the content model to represent rich responses for all providers, improving the overall architecture.