Please update this so it works with the latest-generation DiffSinger models that include a linguistic.onnx model #2

It looks like newer-generation DiffSinger models now ship a linguistic model that takes tokens, word divisions, and word durations as input. Its outputs, encoder_out and x_masks, are then fed to the duration.onnx model.

Example below (please tell me whether the zeroes are needed in this example):
results = linguistic_model.run(None, {
    "tokens": [[26, 1, 22, 35, 11]],
    "word_div": [[3, 2, 0, 0, 0]],
    "word_dur": [[48, 24, 0, 0, 0]]
})
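
To make the question about the zeroes concrete, here is a minimal sketch of the alternative without padding. It assumes that word_div and word_dur hold one entry per word, so that the entries of word_div sum to the number of tokens, and that the model expects int64 arrays. Both are assumptions, not confirmed behaviour; the shapes and types the model actually expects can be checked with linguistic_model.get_inputs(). The helper name build_linguistic_inputs is hypothetical.

```python
import numpy as np


def build_linguistic_inputs(word_tokens, word_durs):
    """Build a feed dict for linguistic.onnx from per-word data.

    word_tokens: list of lists, the phoneme token ids of each word
    word_durs:   list of ints, one duration per word
    Assumption: word_div / word_dur are NOT zero-padded to the token count.
    """
    if len(word_tokens) != len(word_durs):
        raise ValueError("need exactly one duration per word")
    # Flatten the per-word token lists into one token sequence.
    tokens = [t for word in word_tokens for t in word]
    # word_div records how many tokens belong to each word.
    word_div = [len(word) for word in word_tokens]
    # Assumption: the model takes int64 inputs with a leading batch axis of 1.
    return {
        "tokens": np.array([tokens], dtype=np.int64),
        "word_div": np.array([word_div], dtype=np.int64),
        "word_dur": np.array([word_durs], dtype=np.int64),
    }


# Same data as the example above, with the trailing zeroes dropped.
feed = build_linguistic_inputs([[26, 1, 22], [35, 11]], [48, 24])

# Sanity check: the word divisions must account for every token.
assert feed["word_div"].sum() == feed["tokens"].shape[1]

# With an onnxruntime.InferenceSession named linguistic_model:
# results = linguistic_model.run(None, feed)
```

If the model rejects the unpadded arrays, the shape error from onnxruntime should show which length it expects for word_div and word_dur.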

Happy to get your thoughts, thank you!

