In the following experiments, we incorrectly scale the focal point by a factor defined in src/misc/dl3dv_utils.py (line 45). The trained model simply adapts to the incorrect scaling factor, since the representation is implicit and can only be observed through the renderer. For consistency with standard 3D approaches, we will fix this bug in future versions.
scenetok_va-vdc_shift4_dl3dv_finetuned
scenetok_va-vdc_shift8_dl3dv_finetuned

The intrinsics are already normalized by the stored height and width, so they do not require additional scaling. Other experiments do not have this bug.
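As a rough illustration of the issue (the helper below and the constant 256 are assumptions inferred from the flag name `dataset.scale_focal_by_256`, not the exact code in src/misc/dl3dv_utils.py):

```python
import numpy as np

# Hypothetical sketch: intrinsics are stored normalized by image
# height/width, so fx and fy already need no further scaling.
def normalize_intrinsics(K, height, width):
    K = K.astype(np.float64).copy()
    K[0, :] /= width   # fx, skew, cx
    K[1, :] /= height  # fy, cy
    return K

K = np.array([[500.0, 0.0, 128.0],
              [0.0, 500.0, 128.0],
              [0.0, 0.0, 1.0]])
K_norm = normalize_intrinsics(K, height=256, width=256)

# The bug applies an extra constant scale on top of the already
# normalized focal; the implicit representation silently adapts to it.
K_buggy = K_norm.copy()
K_buggy[0, 0] *= 256  # incorrect extra scaling of fx
K_buggy[1, 1] *= 256  # incorrect extra scaling of fy
```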
Similar to the bug above, we incorrectly scale the focal point of the context views only, since these experiments reused the same precomputed VA-VAE latents as in the previous issue.
scenetok_va-wan_shift4_dl3dv_finetuned
scenetok_va-wan_shift8_dl3dv_finetuned

In src/model/embedding/lvsm_embed.py, we expose the parameter temporal_downsample for the reshaping layer, as shown below:
```python
nn.Sequential(
    Rearrange(
        "b ... (t c) (hh ph) (ww pw) -> b ... (hh ww) (t ph pw c)",
        ph=cfg.patch_size,
        pw=cfg.patch_size,
        t=temporal_downsample,
    ),
    nn.Linear(
        cfg.in_channels * (cfg.patch_size**2),
        embed_dim,
        bias=False,
    ),
)
```

Normally, the correct value is temporal_downsample=4 for both VideoDCAE and WanVAE, but the model we trained for WanVAE uses temporal_downsample=1. This should not impact downstream camera-controlled rendering, since only the order of the channels differs before the MLP projection. The following configs are affected by this bug (same as in Bug 2):
scenetok_va-wan_shift4_dl3dv_finetuned
scenetok_va-wan_shift8_dl3dv_finetuned

Note
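To see why temporal_downsample only permutes channels within each token, here is a minimal numpy sketch of the same rearrange pattern (patchify is a hypothetical stand-in for the einops Rearrange above, with a toy tensor in place of real latents):

```python
import numpy as np

def patchify(x, ph, pw, t):
    # x: (C, H, W) with C = t * c, mimicking the pattern
    # "(t c) (hh ph) (ww pw) -> (hh ww) (t ph pw c)"
    C, H, W = x.shape
    c = C // t
    hh, ww = H // ph, W // pw
    x = x.reshape(t, c, hh, ph, ww, pw)
    x = x.transpose(2, 4, 0, 3, 5, 1)  # -> hh, ww, t, ph, pw, c
    return x.reshape(hh * ww, t * ph * pw * c)

x = np.arange(2 * 4 * 4, dtype=np.float32).reshape(2, 4, 4)
tok_correct = patchify(x, ph=2, pw=2, t=2)  # intended grouping
tok_trained = patchify(x, ph=2, pw=2, t=1)  # the trained model's grouping

# Same token count and dimension; each token holds the same set of
# values, just in a different channel order before the MLP projection.
assert tok_correct.shape == tok_trained.shape
assert np.array_equal(np.sort(tok_correct, axis=1),
                      np.sort(tok_trained, axis=1))
```

Because the permutation is fixed, the subsequent nn.Linear can absorb it, which is why the checkpoints trained with the wrong value still render correctly.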
In all future experiments, we will address these bugs. For the already trained checkpoints, we added parameters that reproduce the buggy behavior intentionally at inference time. Make sure to disable them when training your own model from scratch.
```yaml
dataset.scale_focal_by_256: true          # for Bug 1
dataset.scale_context_focal_by_256: true  # for Bug 2
model.force_incorrect: true               # for Bug 3
```