
Bug Report

1. Incorrect scaling factor

In the following experiments, we incorrectly scale the focal point by a factor defined in src/misc/dl3dv_utils.py (line 45). The trained model simply adapts to the incorrect scaling factor, since the representation is structured implicitly and can only be observed through the renderer. For consistency with standard 3D approaches, we will fix this bug in future versions.

scenetok_va-vdc_shift4_dl3dv_finetuned
scenetok_va-vdc_shift8_dl3dv_finetuned

The intrinsics are already normalized by the stored height and width and therefore do not require additional scaling. Other experiments do not have this bug.
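To make the intended convention concrete, here is a minimal sketch of normalizing pinhole intrinsics by the stored image size (the helper name and matrix layout are hypothetical, not the repo's API); once normalized this way, no further fixed scaling should be applied:

```python
import numpy as np

def normalize_intrinsics(K, height, width):
    """Normalize a 3x3 pinhole intrinsics matrix by the stored image size.

    Hypothetical helper for illustration: after this step fx, skew, cx are
    in units of image width and fy, cy in units of image height, so they
    are resolution-independent and need no extra fixed scaling.
    """
    K = K.astype(np.float64)  # astype returns a copy; original K untouched
    K[0, :] /= width   # fx, skew, cx
    K[1, :] /= height  # fy, cy
    return K

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
K_norm = normalize_intrinsics(K, height=480, width=640)
```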

2. Incorrect scaling factor for context

Similar to the above, we incorrectly scale the focal point of the context views only, since these experiments used the same precomputed latents for VA-VAE as in the previous issue.

scenetok_va-wan_shift4_dl3dv_finetuned
scenetok_va-wan_shift8_dl3dv_finetuned
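A toy illustration of what goes wrong here (the helper and the factor handling are hypothetical, for illustration only): applying an extra fixed scale to the context-view focals alone makes the context and target intrinsics disagree, even though both started from the same normalized matrix.

```python
import numpy as np

def scale_context_focal(K, factor=256.0):
    """Hypothetical helper mirroring the erroneous extra scaling applied
    to context-view focal lengths only."""
    K = K.copy()
    K[0, 0] *= factor  # fx
    K[1, 1] *= factor  # fy
    return K

# A normalized intrinsics matrix shared by context and target views.
target_K = np.array([[0.8, 0.0, 0.5],
                     [0.0, 1.0, 0.5],
                     [0.0, 0.0, 1.0]])
context_K = scale_context_focal(target_K)
# The two views now disagree on focal length despite coming from the
# same normalized intrinsics.
```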

3. Incorrect temporal_downsample factor passed to camera embedder

In src/model/embedding/lvsm_embed.py, we pass the parameter temporal_downsample to the reshaping layer, as shown below:

nn.Sequential(
    Rearrange(
        # Fold each temporal group and spatial patch into one token.
        "b ... (t c) (hh ph) (ww pw) -> b ... (hh ww) (t ph pw c)",
        ph=cfg.patch_size,
        pw=cfg.patch_size,
        t=temporal_downsample,
    ),
    nn.Linear(
        cfg.in_channels * (cfg.patch_size**2),
        embed_dim,
        bias=False,
    ),
)

Normally, the correct value is temporal_downsample=4 for both VideoDCAE and WanVAE, but the model we trained for WanVAE has temporal_downsample=1. This should not impact downstream camera-controlled rendering, since only the order of channels differs before the MLP projection is applied. The following configs are affected by this bug (same as in 2):

scenetok_va-wan_shift4_dl3dv_finetuned
scenetok_va-wan_shift8_dl3dv_finetuned

Note

We will address these bugs in all future experiments. For the released checkpoints, we added parameters that intentionally reproduce these behaviors at inference time. Make sure to disable them when training your own model from scratch:

dataset.scale_focal_by_256: true # for Bug 1
dataset.scale_context_focal_by_256: true # for Bug 2
model.force_incorrect: true # for Bug 3