Description
I would like to request the implementation of batch inference in LightX2V, allowing multiple prompts to be processed in a single forward pass to improve horizontal scaling and performance.
Motivation
Currently, LightX2V processes one prompt at a time through the `generate()` method of `LightX2VPipeline`. Processing multiple prompts requires running multiple servers in parallel, as shown in post_multi_servers_tv2.py.
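For context, a rough sketch of that multi-server workaround, assuming a generic HTTP API; the ports, endpoint path, and payload keys below are illustrative, not the actual interface used by post_multi_servers_tv2.py:

```python
# Hypothetical sketch of the current workaround: one prompt per server instance,
# dispatched in parallel. Endpoint and payload keys are assumptions.
from concurrent.futures import ThreadPoolExecutor
import requests

SERVERS = ["http://localhost:8000", "http://localhost:8001", "http://localhost:8002"]
PROMPTS = ["prompt1", "prompt2", "prompt3"]

def submit(server, prompt):
    # Each request is handled by a separate server process holding its own model copy.
    return requests.post(f"{server}/v1/tasks", json={"prompt": prompt}).json()

with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
    results = list(pool.map(submit, SERVERS, PROMPTS))
```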
Implementing batch inference would provide:
- Better performance: Reduced overhead by processing multiple prompts in a single forward pass
- Horizontal scalability: Easier processing of large volumes of requests
- Resource optimization: Better utilization of GPU memory and compute
Current Behavior
- `LightX2VPipeline.generate()` accepts individual parameters: `seed`, `prompt`, `negative_prompt`, etc.
- Models like `WanModel` and `HunyuanVideo15Model` process with `batch_size=1`
- The server handles tasks one at a time to manage GPU memory effectively
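For reference, a minimal sketch of how multiple prompts have to be handled today with the single-prompt API; the exact parameter names (e.g. `save_result_path`) are assumed from the batch example further below, not verified against the current signature:

```python
# Current per-prompt flow (sketch): one generate() call per prompt,
# so each video pays the full pipeline overhead separately.
prompts = ["prompt1", "prompt2", "prompt3"]
seeds = [42, 43, 44]

for i, (prompt, seed) in enumerate(zip(prompts, seeds)):
    pipe.generate(                            # LightX2VPipeline.generate(), single prompt
        prompt=prompt,
        negative_prompt="",                   # per-prompt negative prompt
        seed=seed,
        save_result_path=f"out{i + 1}.mp4",   # parameter name assumed
    )
```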
Proposed Solution
- Extend pipeline interface: modify `generate()` to accept lists of prompts
- Batch support in models: add a batch dimension in the `_infer_cond_uncond()` methods (see the sketch after this list)
- Memory management: adjust memory handling for larger batches
- Maintain compatibility: preserve the current API for single-prompt use
Possible Implementations
```python
# Proposed API
pipe.generate_batch(
    prompts=["prompt1", "prompt2", "prompt3"],
    negative_prompts=["neg1", "neg2", "neg3"],
    seeds=[42, 43, 44],
    save_result_paths=["out1.mp4", "out2.mp4", "out3.mp4"],
)
```
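To keep the existing single-prompt API intact (the last item in the proposed solution), `generate_batch()` could initially be a thin wrapper over sequential `generate()` calls and later be replaced by a true batched forward pass; the sketch below assumes the parameter names from the example above:

```python
# Hypothetical compatibility fallback: same batch-style signature as above,
# implemented as sequential generate() calls until true batching lands.
def generate_batch(pipe, prompts, negative_prompts, seeds, save_result_paths):
    for prompt, neg, seed, path in zip(prompts, negative_prompts, seeds, save_result_paths):
        pipe.generate(
            prompt=prompt,
            negative_prompt=neg,
            seed=seed,
            save_result_path=path,  # singular parameter name assumed from the current API
        )
```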
Expected Impact
- Significant reduction in processing time for multiple videos
- Better GPU resource utilization
- Easier high-volume production deployments