
Annotate_audio.py not outputting transcription. #3

@InfantLab

Description


Hi,
Thanks for creating this great project.
The annotate_audio.py is working well to give emotion annotations, but does not output an accompanying transcription of the speech. This may be my mis-understanding of your use of the word 'transcription'. Your Example Output suggests caption contains 'transcribed text' but actually is a high level description of the text. It would be worth improving naming conventions or adding transcription of spoken text to avoid further confusion.

Thanks,
Caspar

E.g my_audio_file.json:

    "caption": "A medium-quality recording of a male speaker describing a painting. The speaker sounds calm and informative, with a slightly nostalgic tone. The recording quality is decent, with no noticeable background noise.",

There are no obvious error messages in the console, although perhaps these warnings are relevant:

    Transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass `language='en'`. See https://github.com/huggingface/transformers/pull/28687 for more details.
    The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
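For what it's worth, a verbatim transcript can be obtained by calling Whisper directly, independently of the script's own caption. This is only a minimal sketch assuming the Hugging Face `transformers` ASR pipeline is available; the model name and audio path are illustrative, not taken from the project:

```python
from transformers import pipeline

# Illustrative model and file name, not the project's actual configuration.
asr = pipeline("automatic-speech-recognition", model="openai/whisper-small")

# Passing language/task explicitly silences the auto-detection warning above:
# task="transcribe" keeps the spoken language; task="translate" outputs English.
result = asr(
    "my_audio_file.wav",
    generate_kwargs={"language": "en", "task": "transcribe"},
)
print(result["text"])  # verbatim transcription, unlike the high-level caption
```

Passing `generate_kwargs` this way also addresses the first warning, since the language and task are no longer left to detection.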
