Article suggests text-to-speech model output format is an image

### Existing documentation URL(s)

https://developers.cloudflare.com/workers-ai/models/aura-1/

> ### Output
>
> The binding returns a `ReadableStream` with the image in JPEG or PNG format (check the model's output schema).

### What changes are you suggesting?

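I haven't worked with this Worker or model, but since it is a text-to-speech model, I assume the output is audio, not an image. The Output section should describe the audio the binding returns rather than a JPEG or PNG image.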

### Additional information

_No response_