Align Larger Audio File #13

Hi @cschaefer26,
You have done a nice job. I'm using your repo, but when aligning longer audio (> 1 minute) with its character (phone) sequence at inference time, the number of predicted values in the duration file (.npy file) does not match the number of characters (phones) that I input with the audio file. What is the problem here? I want to use a pretrained model (trained on a Bangla dataset [audio, phoneme sequence]) for phoneme duration prediction, so accuracy is a major concern for me.

Note: during training, I used longer audio files (10-15 seconds) and their corresponding transcriptions (phoneme sequences). I also customized your code (preprocess.py and extract_durations.py) to run inference on a single audio file and its transcription.
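The mismatch described above can be made concrete with a small check. This is only a minimal sketch: the function name `check_alignment` and its arguments are hypothetical and not part of the repo, and it assumes the duration file holds one integer frame count per input symbol.

```python
def check_alignment(durations, symbols):
    """Compare the number of predicted durations with the number of input symbols.

    `durations` is any sequence of per-symbol frame counts (assumed layout),
    `symbols` is the character or phone sequence given to the aligner.
    Returns (n_durations, n_symbols, total_frames).
    """
    n_durations = len(durations)
    n_symbols = len(symbols)
    # Sum of all durations, i.e. how many frames the alignment covers in total.
    total_frames = sum(int(d) for d in durations)

    if n_durations != n_symbols:
        print(f"Mismatch: {n_durations} durations for {n_symbols} symbols")
    else:
        print(f"OK: {n_durations} durations, {total_frames} frames in total")
    return n_durations, n_symbols, total_frames


# Toy data only, not real model output.
toy_durations = [3, 5, 2, 7]
toy_symbols = list("abcde")
check_alignment(toy_durations, toy_symbols)
```

For a real file, the values loaded with `numpy.load` on the .npy path could be passed as `durations`. Reporting the two counts together with the total number of frames may help narrow down where the sequences diverge.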
