A simple prototype that combines OpenAI's Realtime API with vision. To run it, set the OPENAI_API_KEY environment variable.
The Realtime session is given a vision-describe tool. When the tool is invoked, it calls the OpenAI Vision API to describe the image, and the Realtime API streams the result back to the client.
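The flow above can be sketched as three message builders. This is a minimal illustration, not the prototype's actual code. The tool name describe_image, its question parameter, and the gpt-4o-mini model are assumptions made for the example, and the "Vision API" call is assumed to be a Chat Completions request with an image input. The exact session fields may also differ between Realtime API versions.

```typescript
// Hypothetical tool name; this README does not state the real one.
const TOOL_NAME = "describe_image";

// Realtime `session.update` event that registers one function tool.
function buildSessionUpdate() {
  return {
    type: "session.update",
    session: {
      tools: [
        {
          type: "function",
          name: TOOL_NAME,
          description: "Describe what the camera currently sees.",
          parameters: {
            type: "object",
            properties: {
              // Hypothetical parameter: what the model wants to know about the frame.
              question: { type: "string", description: "What to look for in the image." },
            },
            required: ["question"],
          },
        },
      ],
      tool_choice: "auto",
    },
  };
}

// Chat Completions request body asking a vision-capable model about one frame.
// `imageDataUrl` is a data URL such as "data:image/jpeg;base64,...".
function buildVisionRequest(imageDataUrl: string, question: string) {
  return {
    model: "gpt-4o-mini", // assumption: any vision-capable model would do
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: question },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  };
}

// Events that hand the description back to the Realtime session and
// ask it to continue, so the answer is streamed to the client.
function buildToolResultEvents(callId: string, description: string) {
  return [
    {
      type: "conversation.item.create",
      item: { type: "function_call_output", call_id: callId, output: description },
    },
    { type: "response.create" },
  ];
}
```

In this sketch the client would send the first event once after connecting, run the vision request whenever the session reports a call to the tool, and then send the two result events with the call's ID.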
Install dependencies and start the prototype:
npm install
npm run dev
Browsers only allow camera access in a secure context, so serve the app over https or the camera will not work. Running on http://localhost also works.
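As a rough illustration of that rule, the helper below guesses whether a page URL would be allowed to use the camera. The function name is hypothetical, and the check is a simplified approximation of the browser's secure-context rules, not a replacement for them.

```typescript
// Hypothetical helper: approximates the browser's secure-context rule.
// Only https and common loopback hosts are accepted here; real browsers
// apply a longer list of rules.
function isLikelySecureOrigin(href: string): boolean {
  const url = new URL(href);
  if (url.protocol === "https:") {
    return true;
  }
  // Loopback hosts are treated as trustworthy even over plain http.
  const host = url.hostname;
  return (
    host === "localhost" ||
    host.endsWith(".localhost") ||
    host === "127.0.0.1" ||
    host === "[::1]"
  );
}
```

Inside the page itself, window.isSecureContext is the authoritative check. A helper like this is mainly useful for warning early, for example when the dev server is opened through a LAN address such as http://192.168.1.20:3000.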