
Commit e90e953

sv4d: fix readme; rename video example folder; add encoding_t as input parameter.
1 parent da40eba

22 files changed, +43 -29 lines changed

README.md

Lines changed: 5 additions & 5 deletions
@@ -9,23 +9,23 @@
 - We are releasing **[Stable Video 4D (SV4D)](https://huggingface.co/stabilityai/sv4d)**, a video-to-4D diffusion model for novel-view video synthesis. For research purposes:
   - **SV4D** was trained to generate 40 frames (5 video frames x 8 camera views) at 576x576 resolution, given 5 context frames (the input video), and 8 reference views (synthesised from the first frame of the input video, using a multi-view diffusion model like SV3D) of the same size, ideally white-background images with one object.
   - To generate longer novel-view videos (21 frames), we propose a novel sampling method using SV4D, by first sampling 5 anchor frames and then densely sampling the remaining frames while maintaining temporal consistency.
-  - You can run the community-build gradio demo locally by running `python -m scripts.demo.gradio_app_sv4d`.
+  - To run the community-build gradio demo locally, run `python -m scripts.demo.gradio_app_sv4d`.
   - Please check our [project page](https://sv4d.github.io), [tech report](https://sv4d.github.io/static/sv4d_technical_report.pdf) and [video summary](https://www.youtube.com/watch?v=RBP8vdAWTgk) for more details.
 
-**QUICKSTART** : `python scripts/sampling/simple_video_sample_4d.py --input_path assets/sv4d_example_video/test_video1.mp4 --output_folder outputs/sv4d` (after downloading [sv4d.safetensors](https://huggingface.co/stabilityai/sv4d) and [sv3d_u.safetensors](https://huggingface.co/stabilityai/sv3d) from HuggingFace into `checkpoints/`)
+**QUICKSTART** : `python scripts/sampling/simple_video_sample_4d.py --input_path assets/sv4d_videos/test_video1.mp4 --output_folder outputs/sv4d` (after downloading [sv4d.safetensors](https://huggingface.co/stabilityai/sv4d) and [sv3d_u.safetensors](https://huggingface.co/stabilityai/sv3d) from HuggingFace into `checkpoints/`)
 
 To run **SV4D** on a single input video of 21 frames:
 - Download SV3D models (`sv3d_u.safetensors` and `sv3d_p.safetensors`) from [here](https://huggingface.co/stabilityai/sv3d) and SV4D model (`sv4d.safetensors`) from [here](https://huggingface.co/stabilityai/sv4d) to `checkpoints/`
 - Run `python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>`
 - `input_path` : The input video `<path/to/video>` can be
-  - a single video file in `gif` or `mp4` format, such as `assets/sv4d_example_video/test_video1.mp4`, or
+  - a single video file in `gif` or `mp4` format, such as `assets/sv4d_videos/test_video1.mp4`, or
   - a folder containing images of video frames in `.jpg`, `.jpeg`, or `.png` format, or
   - a file name pattern matching images of video frames.
 - `num_steps` : default is 20, can increase to 50 for better quality but longer sampling time.
 - `sv3d_version` : To specify the SV3D model to generate reference multi-views, set `--sv3d_version=sv3d_u` for SV3D_u or `--sv3d_version=sv3d_p` for SV3D_p.
 - `elevations_deg` : To generate novel-view videos at a specified elevation (default elevation is 10) using SV3D_p (default is SV3D_u), run `python scripts/sampling/simple_video_sample_4d.py --input_path test_video1.mp4 --sv3d_version sv3d_p --elevations_deg 30.0`
-- **Background removal** : For input videos with plain background, (optionally) use [rembg](https://github.com/danielgatis/rembg) to remove background and crop video frames by setting `--remove_bg=True`. To obtain higher quality outputs on real-world input videos with noisy background, try segmenting the foreground object using [Cliipdrop](https://clipdrop.co/) before running SV4D.
-- **Low VRAM environment** : To run on GPUs with low VRAM, try setting `--decoding_t=1` (of frames decoded at a time) or lower video resolution like `--img_size=512`.
+- **Background removal** : For input videos with plain background, (optionally) use [rembg](https://github.com/danielgatis/rembg) to remove background and crop video frames by setting `--remove_bg=True`. To obtain higher quality outputs on real-world input videos with noisy background, try segmenting the foreground object using [Clipdrop](https://clipdrop.co/) or [SAM2](https://github.com/facebookresearch/segment-anything-2) before running SV4D.
+- **Low VRAM environment** : To run on GPUs with low VRAM, try setting `--encoding_t=1` (number of frames encoded at a time) and `--decoding_t=1` (number of frames decoded at a time), or a lower video resolution like `--img_size=512`.
 
 ![tile](assets/sv4d.gif)
 
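For the checkpoint-download step in the updated README above, one possible way to fetch the files is the `huggingface-cli download` command from `huggingface_hub`. This is a sketch, not part of the repo, and assumes a recent `huggingface_hub` release and that your HuggingFace account has access to the model repositories:

```bash
# Sketch only: fetch the checkpoints named in the README into checkpoints/.
# Run `huggingface-cli login` first if the model repos are gated for your account.
huggingface-cli download stabilityai/sv4d sv4d.safetensors --local-dir checkpoints
huggingface-cli download stabilityai/sv3d sv3d_u.safetensors sv3d_p.safetensors --local-dir checkpoints
```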
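And for the sampling step, a possible single invocation that combines the renamed example asset with the optional flags documented above, including the `--encoding_t` parameter this commit adds. The flag values are illustrative choices, not defaults mandated by the commit:

```bash
# Illustrative low-VRAM run using only flags documented in the README above;
# drop --remove_bg, --img_size, etc. if the defaults suit your input.
python scripts/sampling/simple_video_sample_4d.py \
    --input_path assets/sv4d_videos/test_video1.mp4 \
    --output_folder outputs/sv4d \
    --num_steps 20 \
    --sv3d_version sv3d_p \
    --elevations_deg 30.0 \
    --remove_bg=True \
    --encoding_t=1 \
    --decoding_t=1 \
    --img_size=512
```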
-549 KB (binary file not shown)
File renamed without changes.
