Hi,
Thanks for creating this great project.
annotate_audio.py works well for producing emotion annotations, but it does not output an accompanying transcription of the speech. This may be my misunderstanding of your use of the word 'transcription': your example output suggests that caption contains the transcribed text, but it is actually a high-level description of the audio. It would be worth improving the naming conventions, or adding a transcription of the spoken text, to avoid further confusion.
Thanks,
Caspar
E.g. my_audio_file.json:
"caption": "A medium-quality recording of a male speaker describing a painting. The speaker sounds calm and informative, with a slightly nostalgic tone. The recording quality is decent, with no noticeable background noise.",
There are no obvious error messages in the console, although perhaps these warnings are relevant:
Transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`. See https://github.com/huggingface/transformers/pull/28687 for more details.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
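In case it is useful, here is a minimal sketch of how a verbatim transcription could be produced alongside the caption, using the Hugging Face automatic-speech-recognition pipeline with a Whisper checkpoint. The checkpoint name and the `transcription` key are my assumptions, not part of annotate_audio.py; passing `language='en'` also addresses the first warning quoted above.

```python
# Sketch only: the checkpoint name and the "transcription" key are assumptions.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # any multilingual Whisper checkpoint
)

# language="en" forces English transcription instead of auto-detection,
# which avoids the language-detection warning quoted above.
result = asr(
    "my_audio_file.wav",
    generate_kwargs={"language": "en", "task": "transcribe"},
)

annotation = {
    "caption": "...",                 # the existing high-level description
    "transcription": result["text"],  # proposed: the verbatim spoken text
}
```

That way "caption" could keep its current meaning while a separate "transcription" field carries the actual words spoken.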