Skip to content

Latest commit

 

History

History
28 lines (17 loc) · 436 Bytes

File metadata and controls

28 lines (17 loc) · 436 Bytes

Multimodal model capabilties

This playground is about using image, text and audio as input for models to decide and reason about human input.

multimodal

Vision interaction

Look at a single image

python inspect-image.py

Compare multiple images

python compare-images.py

Voice agents (requires access to realtime model)

launch the app

python voice-interaction/app.py