Commit 55834ed

Update docs
1 parent 003adaf commit 55834ed

1 file changed: +11 -4 lines changed

src/diffusers/pipelines/cosmos/pipeline_cosmos2_5_transfer.py

Lines changed: 11 additions & 4 deletions
@@ -151,7 +151,7 @@ def retrieve_latents(
 
 class Cosmos2_5_TransferPipeline(DiffusionPipeline):
     r"""
-    Pipeline for Cosmos Transfer2.5 base model.
+    Pipeline for Cosmos Transfer2.5, supporting auto-regressive inference.
 
     This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic methods
     implemented for all pipelines (downloading, saving, running on a particular device, etc.).
@@ -538,18 +538,25 @@ def __call__(
         num_latent_conditional_frames: Optional[int] = None,
     ):
         r"""
-        The call function supports a predict-compatible path when `controls` is `None` (or `self.controlnet` is
-        `None`). In that mode it follows the same input semantics as `Cosmos2_5_PredictPipeline`:
+        The call function can be used in two modes: with or without controls.
 
+        When controls are not provided (`controls is None`), inference works in the same manner as predict2.5 (see
+        `Cosmos2_5_PredictPipeline`). This mode strictly uses the base transformer (`self.transformer`) to perform
+        inference and accepts as input an optional `image` or `video` along with a `prompt` / `negative_prompt`, and
+        can be used in the following ways:
         - **Text2World**: `image=None`, `video=None`, `prompt` provided.
         - **Image2World**: `image` provided, `video=None`, `prompt` provided.
         - **Video2World**: `video` provided, `image=None`, `prompt` provided.
 
         When `controls` are provided and a ControlNet is attached, `controls` drive the conditioning and `video` &
-        `image` is ignored.
+        `image` is ignored. Controls are assumed to be pre-processed, e.g. edge maps are pre-computed.
 
         Setting `num_frames` will restrict the total number of frames output, if not provided or assigned to None
         (default) then the number of output frames will match the input `video`, `image` or `controls` respectively.
+        Auto-regressive inference is supported and thus a sliding window of `num_frames_per_chunk` frames are used per
+        denoising loop. In addition, when auto-regressive inference is performed, the previous
+        `num_latent_conditional_frames` or `num_conditional_frames` are used to condition the following denoising
+        inference loops.
 
         Args:
             image (`PipelineImageInput`, *optional*):
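
Reading note: the updated docstring describes two behaviours that are easier to follow with a small sketch. The code below is not taken from the pipeline. `describe_mode` and `chunk_windows` are hypothetical helpers, and two points are assumptions rather than facts from the diff: rejecting `image` and `video` together, and deriving each window by stepping `num_frames_per_chunk - num_conditional_frames` frames so that consecutive chunks share the conditioning frames. The sketch counts pixel frames only and does not model the `num_latent_conditional_frames` variant.

```python
from typing import Any, List, Optional, Tuple


def describe_mode(
    image: Optional[Any] = None,
    video: Optional[Any] = None,
    controls: Optional[Any] = None,
) -> str:
    """Hypothetical helper: name the mode the docstring describes for these inputs."""
    if controls is not None:
        # Per the docstring, controls drive conditioning and image/video are ignored.
        return "controls"
    if image is not None and video is not None:
        # Assumption: the docstring lists no mode with both inputs set.
        raise ValueError("provide at most one of `image` or `video`")
    if image is not None:
        return "Image2World"
    if video is not None:
        return "Video2World"
    return "Text2World"


def chunk_windows(
    total_frames: int,
    num_frames_per_chunk: int,
    num_conditional_frames: int,
) -> List[Tuple[int, int]]:
    """Hypothetical helper: half-open (start, end) frame windows for chunked inference.

    Assumption: each chunk after the first starts `num_conditional_frames`
    frames before the end of the previous chunk, so those frames can be
    reused as conditioning.
    """
    if total_frames <= 0 or num_frames_per_chunk <= 0:
        raise ValueError("frame counts must be positive")
    if not 0 <= num_conditional_frames < num_frames_per_chunk:
        raise ValueError("conditioning frames must be fewer than the chunk size")

    stride = num_frames_per_chunk - num_conditional_frames
    windows: List[Tuple[int, int]] = []
    start = 0
    while True:
        end = min(start + num_frames_per_chunk, total_frames)
        windows.append((start, end))
        if end >= total_frames:
            break
        start += stride
    return windows


if __name__ == "__main__":
    print(describe_mode(video=object()))
    print(chunk_windows(total_frames=10, num_frames_per_chunk=4, num_conditional_frames=1))
```

With 10 frames, a chunk size of 4 and 1 conditioning frame, the sketch yields three windows, each beginning on the last frame of the previous one. Whether the real pipeline pads or truncates a short final chunk is not stated in the diff, so the sketch simply clips it.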
