Hey, we ran our state-of-the-art emotion captioning and emotion estimation models over all of Emilia. You can find the resulting scores here, distributed across these five repositories:
https://huggingface.co/datasets/laion/Emilia-with-Emotion-Annotations
https://huggingface.co/datasets/laion/Emilia-with-Emotion-Annotations2
https://huggingface.co/datasets/laion/Emilia-with-Emotion-Annotations3
https://huggingface.co/datasets/laion/Emilia-with-Emotion-Annotations4
https://huggingface.co/datasets/laion/Emilia-with-Emotion-Annotations5
The speaker embeddings mentioned below come from this model: https://huggingface.co/Orange/Speaker-wavLM-tbr
I would suggest a fine-tuning setup that is also conditioned on these speaker embeddings, because they capture the time-independent attributes of a voice that make up a speaker's identity without conflating them with emotion and arousal. This could eventually enable voice cloning without needing a paired reference audio: just take the embedding of the target speaker as conditioning, together with the emotion scores.
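To make the conditioning idea concrete, here is a minimal sketch of how a combined conditioning vector could be built. The embedding dimension, the number of emotion scores, and the `build_conditioning` helper are all illustrative assumptions, not properties of the released models or datasets:

```python
import numpy as np

# Illustrative assumptions -- not taken from the released models:
EMB_DIM = 256      # assumed dimensionality of the speaker embedding
N_EMOTIONS = 8     # assumed number of per-clip emotion scores

def build_conditioning(speaker_emb: np.ndarray, emotion_scores: np.ndarray) -> np.ndarray:
    """Concatenate an L2-normalized speaker embedding with emotion scores.

    The speaker embedding carries the time-independent identity of the
    voice; the emotion scores carry the expressive state. Feeding both
    to the model lets it vary one without disturbing the other.
    """
    emb = speaker_emb / (np.linalg.norm(speaker_emb) + 1e-8)
    return np.concatenate([emb, emotion_scores.astype(np.float64)])

# Usage with random placeholders standing in for real embeddings/scores:
rng = np.random.default_rng(0)
cond = build_conditioning(rng.standard_normal(EMB_DIM), rng.random(N_EMOTIONS))
```

At fine-tuning time, `cond` would replace the paired reference audio as the conditioning input; at inference, swapping in a new target speaker's embedding while keeping the emotion scores fixed is what makes pair-free cloning possible.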
Have fun! :)