
pablovela5620/mini-dust3r


Mini-Dust3r

A miniature, inference-only version of DUSt3R. Leaving out the training, data, and evaluation code makes it much easier to use. Tested on Linux, Apple Silicon Macs, and Windows (thanks @Vincentqyw).

example output

Installation

Install from PyPI with pip:

pip install mini-dust3r

Demo

A hosted demo is available on Hugging Face.

Alternatively, run the demo from source using Pixi:

git clone https://github.com/pablovela5620/mini-dust3r.git
cd mini-dust3r
pixi run app

You can also run the Rerun demo directly with:

pixi run rerun-demo

Minimal Example

This example uses Rerun to visualize the outputs.

import numpy as np
import rerun as rr
from pathlib import Path
from argparse import ArgumentParser
import torch
from jaxtyping import UInt8  # array type annotation used for rgb_list below

from mini_dust3r.api import OptimizedResult, inferece_dust3r_from_rgb, log_optimized_result
from mini_dust3r.model import AsymmetricCroCo3DStereo
from mini_dust3r.utils.image import load_images_from_dir_or_list


def main(image_dir: Path):
    if torch.backends.mps.is_available():
        device = "mps"
    elif torch.cuda.is_available():
        device = "cuda"
    else:
        device = "cpu"

    model = AsymmetricCroCo3DStereo.from_pretrained(
        "nielsr/DUSt3R_ViTLarge_BaseDecoder_512_dpt"
    ).to(device)

    # Load images from directory
    rgb_list: list[UInt8[np.ndarray, "H W 3"]] = load_images_from_dir_or_list(image_dir)

    optimized_results: OptimizedResult = inferece_dust3r_from_rgb(
        rgb_list=rgb_list,
        model=model,
        device=device,
        batch_size=1,
    )
    log_optimized_result(optimized_results, Path("world"))


if __name__ == "__main__":
    parser = ArgumentParser("mini-dust3r rerun demo script")
    parser.add_argument(
        "--image-dir",
        type=Path,
        help="Directory containing images to process",
        required=True,
    )
    rr.script_add_args(parser)
    args = parser.parse_args()
    rr.script_setup(args, "mini-dust3r")
    main(args.image_dir)
    rr.script_teardown(args)

Calling Model Directly

Calling the model directly requires three steps: converting the RGB numpy arrays to torch tensors, wrapping each tensor in an ImageDict (a TypedDict), and generating the image pairs that are fed into the DUSt3R model.

    processed_imgs: list[Float32[torch.Tensor, "3 H W"]] = [
        preprocess_rgb(rgb_img, image_size, square_ok=False) for rgb_img in rgb_list
    ]
    imgs: list[ImageDict] = [
        ImageDict(
            img=rearrange(img, "c h w -> 1 c h w"),
            true_shape=np.int32([[img.shape[1], img.shape[2]]]),
            idx=idx,
            instance=str(idx),
        )
        for idx, img in enumerate(processed_imgs)
    ]
    assert imgs, "no images found"

    # if only one image was loaded, duplicate it to feed into stereo network
    if len(imgs) == 1:
        imgs = [imgs[0], copy.deepcopy(imgs[0])]
        imgs[1]["idx"] = 1

    pairs: list[tuple[ImageDict, ImageDict]] = make_pairs(
        imgs, scene_graph="complete", prefilter=None, symmetrize=True
    )
    output: Dust3rResult = inference(pairs, model, device, batch_size=batch_size)
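
The call above passes scene_graph="complete" and symmetrize=True to make_pairs. As a rough illustration of what that combination implies, the sketch below pairs every view with every other view and then adds each pair in reverse order. It is a standalone approximation written for this README, not the library's implementation; the helper name make_complete_pairs is hypothetical.

```python
from itertools import combinations


def make_complete_pairs(items: list, symmetrize: bool = True) -> list[tuple]:
    """Pair every item with every other item (a complete scene graph).

    Hypothetical helper for illustration only; not part of mini-dust3r.
    """
    # One pair per unordered combination: N * (N - 1) / 2 pairs.
    pairs = list(combinations(items, 2))
    if symmetrize:
        # Also add each pair in reverse order: N * (N - 1) pairs in total.
        pairs += [(b, a) for a, b in pairs]
    return pairs


if __name__ == "__main__":
    views = ["img0", "img1", "img2"]
    for first, second in make_complete_pairs(views):
        print(first, "->", second)
```

Under this sketch the number of pairs grows quadratically with the number of images, which is worth keeping in mind when choosing batch_size for larger image sets.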

Inputs and Outputs

Inference Function

def inferece_dust3r_from_rgb(
    rgb_list: list[np.ndarray],
    model: AsymmetricCroCo3DStereo,
    device: Literal["cpu", "cuda", "mps"],
    batch_size: int = 1,
    image_size: Literal[224, 512] = 512,
    niter: int = 100,
    schedule: Literal["linear", "cosine"] = "linear",
    min_conf_thr: float = 0.25,
) -> OptimizedResult:

Parameters:

  • rgb_list - List of RGB images as numpy arrays.
  • model - The DUSt3R model to use for inference.
  • device - The device to use for inference ("cpu", "cuda", or "mps").
  • batch_size - The batch size for inference. Defaults to 1.
  • image_size - The size of the input images (224 or 512). Defaults to 512.
  • niter - The number of iterations for the global alignment optimization. Defaults to 100.
  • schedule - The learning rate schedule for the global alignment optimization ("linear" or "cosine"). Defaults to "linear".
  • min_conf_thr - The minimum confidence threshold for the optimized result. Defaults to 0.25.
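
To give a feel for the difference between the two schedule options, here is a minimal sketch of a linear and a cosine learning rate decay over niter steps. The formulas are the common textbook forms and the learning rate values are made up for illustration; whether the library uses exactly these expressions or values is an assumption.

```python
import math


def linear_schedule(t: float, lr_start: float, lr_end: float) -> float:
    """Interpolate linearly from lr_start (t=0) to lr_end (t=1)."""
    return (1.0 - t) * lr_start + t * lr_end


def cosine_schedule(t: float, lr_start: float, lr_end: float) -> float:
    """Decay from lr_start (t=0) to lr_end (t=1) along half a cosine wave."""
    return lr_end + (lr_start - lr_end) * (1.0 + math.cos(math.pi * t)) / 2.0


if __name__ == "__main__":
    niter = 100  # same as the default listed above
    lr_start, lr_end = 0.01, 1e-6  # hypothetical values, for illustration only
    for step in (0, 25, 50, 75, 100):
        t = step / niter  # fraction of the optimization completed
        print(
            step,
            linear_schedule(t, lr_start, lr_end),
            cosine_schedule(t, lr_start, lr_end),
        )
```

In this sketch the cosine schedule keeps the learning rate higher than the linear one during the first half of the optimization and lower during the second half; both start and end at the same values.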

Output from OptimizedResult

@dataclass
class OptimizedResult:
    rgb_hw3_list: list[Float32[np.ndarray, "h w 3"]]
    pinhole_param_list: list[PinholeParameters]
    depth_hw_list: list[Float32[np.ndarray, "h w"]]
    conf_hw_list: list[Float32[np.ndarray, "h w"]]
    masks_list: list[Bool[np.ndarray, "h w"]]
    point_cloud: trimesh.PointCloud
    mesh: trimesh.Trimesh

Fields:

  • rgb_hw3_list - List of RGB images, each of shape (h, w, 3).
  • pinhole_param_list - List of camera parameters (intrinsics and extrinsics).
  • depth_hw_list - List of normalized depth maps, each of shape (h, w).
  • conf_hw_list - List of normalized confidence maps, each of shape (h, w).
  • masks_list - List of boolean masks, each of shape (h, w).
  • point_cloud - The point cloud as a trimesh.PointCloud object.
  • mesh - The mesh as a trimesh.Trimesh object.
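
The depth, confidence, and mask arrays share the same (h, w) layout, so they can be combined per pixel. The sketch below shows one plausible way to drop low-confidence depth values using small stand-in arrays in place of a real OptimizedResult. The helper filter_depth_by_confidence is hypothetical, and treating a mask as "confidence above min_conf_thr" is an assumption about how the fields relate, not a statement about the library's internals.

```python
import numpy as np


def filter_depth_by_confidence(depth_hw, conf_hw, min_conf_thr=0.25):
    """Return (filtered depth, mask), with low-confidence pixels set to NaN.

    Hypothetical helper for illustration only; not part of mini-dust3r.
    """
    # Boolean (h, w) mask: True where the confidence exceeds the threshold.
    mask_hw = conf_hw > min_conf_thr
    # Keep depth where the mask is True, replace everything else with NaN.
    return np.where(mask_hw, depth_hw, np.nan), mask_hw


if __name__ == "__main__":
    # Stand-in arrays; a real run would take these from an OptimizedResult,
    # e.g. depth_hw_list[0] and conf_hw_list[0].
    depth = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
    conf = np.array([[0.1, 0.5], [0.9, 0.2]], dtype=np.float32)
    filtered, mask = filter_depth_by_confidence(depth, conf)
    print(mask)
    print(np.nanmean(filtered))  # mean depth over confident pixels only
```

Using NaN for rejected pixels keeps the array shape intact, so the result can still be indexed with image coordinates.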

References

Full credit goes to Naver for their great work on DUSt3R.
