
Question about instance_group usage in CosyVoice2 Triton deployment #1834

@lucgeo


Hi,

I'm tuning CosyVoice2 performance with Triton and I would like some clarification about instance_group in config.pbtxt.
What exactly is the role of instance_group? Does increasing it allow more inference requests to run in parallel? How does it interact with dynamic batching?
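For concreteness, this is the kind of stanza I mean — a hypothetical config.pbtxt fragment (the model name, counts, and queue delay here are made up for illustration, not taken from the repo):

```protobuf
# Illustrative config.pbtxt fragment only.
name: "token2wav"
max_batch_size: 16

# My understanding: each instance is an independent copy of the model,
# so count controls how many requests (or dynamically formed batches)
# can execute at the same time on the listed GPUs.
instance_group [
  {
    count: 2
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]

# Does dynamic batching form a batch per instance, so instance_group
# effectively sets how many batches run concurrently?
dynamic_batching {
  max_queue_delay_microseconds: 1000
}
```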

For which components of the model (cosyvoice2, audio_tokenizer, speaker_embedding, tensorrt_llm, token2wav) is it useful to increase instance_group? Should multiple instances be configured for all of them, or only for specific components (e.g. tensorrt_llm or token2wav)?

What is the relationship between the number of simultaneous inference requests and instance_group?
If I want to support N concurrent TTS requests, should the instance count scale proportionally with N?
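My current mental model — purely a guess, please correct me if it's wrong — is that with dynamic batching enabled, the instance count only needs to cover the concurrency left over after batching, roughly like this:

```python
import math

def instances_needed(concurrent_requests: int, max_batch_size: int) -> int:
    """Back-of-envelope guess: if dynamic batching can pack up to
    max_batch_size requests into one execution, this many instances
    could serve all requests concurrently (ignoring latency targets
    and per-component differences)."""
    return math.ceil(concurrent_requests / max_batch_size)

# e.g. 32 concurrent requests with max_batch_size 16:
print(instances_needed(32, 16))  # -> 2
```

Is that roughly how it works, or does streaming TTS break this assumption (e.g. because a streaming request occupies an instance for its whole duration)?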

Any best practices for configuring this for low latency and stable streaming under moderate concurrency would be very helpful.

Thanks!
