Checkpoint link update + minor changes

sayands · sayands · commit 35f9c5bfd869 · 2025-09-29T02:19:19.000-07:00
Minor typo fix

Minor fix
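
Besides the README edits, the diff below moves `self.scheduler.step()` out of `train_step` in `trainer/grounding_trainer.py` and `trainer/unified_trainer.py` and into `backward` in `trainer/build.py`, right after `self.optimizer.step()`. The scheduler therefore appears to advance once per optimizer step instead of once per epoch. The sketch below is a minimal pure-Python illustration of what that changes. The helper names (`cosine_lr`, `count_scheduler_steps`) and the numbers are hypothetical, and the cosine schedule is an assumption: the scheduler the repository actually uses, and whether its horizon is set in epochs or iterations, is not shown in this diff.

```python
import math


def cosine_lr(base_lr: float, step: int, total_steps: int) -> float:
    """Cosine-annealed learning rate after `step` scheduler steps (assumed schedule)."""
    progress = min(step, total_steps) / total_steps  # clamp once the horizon is reached
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


def count_scheduler_steps(epochs: int, batches_per_epoch: int, per_batch: bool) -> int:
    """Count how often scheduler.step() would be called over a full training run."""
    steps = 0
    for _ in range(epochs):
        for _ in range(batches_per_epoch):
            if per_batch:
                steps += 1  # new placement: directly after optimizer.step() in backward()
        if not per_batch:
            steps += 1  # old placement: once at the end of train_step()
    return steps


if __name__ == "__main__":
    epochs, batches_per_epoch = 10, 100  # hypothetical run size
    old_calls = count_scheduler_steps(epochs, batches_per_epoch, per_batch=False)
    new_calls = count_scheduler_steps(epochs, batches_per_epoch, per_batch=True)
    print("scheduler steps (per epoch):", old_calls)
    print("scheduler steps (per batch):", new_calls)
    # A horizon sized in epochs is exhausted within the first epoch under per-batch stepping.
    print("lr after epoch 1, horizon in epochs:", cosine_lr(1e-3, batches_per_epoch, epochs))
    print("lr after epoch 1, horizon in iterations:", cosine_lr(1e-3, batches_per_epoch, new_calls))
```

If the scheduler horizon was previously sized in epochs, it likely needs to be expressed as `epochs * batches_per_epoch` after this change, otherwise the learning rate would decay far earlier than intended.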
diff --git a/README.md b/README.md
@@ -68,7 +68,6 @@ assume complete data availability across all modalities. We present **CrossOver*
 
 
 # :newspaper: News
-<!-- > 📡 Stay tuned for stronger checkpoint release trained on many more datasets! -->
 - ![](https://img.shields.io/badge/New!-8A2BE2) **Version 1.0** - **CrossOver is now stronger than ever**. We recommend updating to this version; changes include:
   - More powerful pre-trained checkpoints; now available on Huggingface 👉 [here](https://huggingface.co/gradient-spaces/CrossOver/tree/main).
   - Support for 2 additional datasets - ARKitScenes & MultiScan
@@ -132,26 +131,6 @@ See [DATA.MD](DATA.md) for detailed instructions on data download, preparation a
 
 # :film_projector: Demo
 
-## Scene Retrieval Demo
-
-This demo script allows users to process a custom scene and retrieve the closest match from the supported datasets using different modalities. Detailed usage can be found inside the script. Example usage below:
-
-```bash
-$ python demo/demo_scene_retrieval.py
-```
-
-Various configurable parameters:
-
-- `--query_path`: Path to the query scene file (eg: `./example_data/dining_room/scene_cropped.ply`).
-- `--database_path`: Path to the precomputed embeddings of the database scenes downloaded before (eg: `./release_data/embed_scannet.pt`).
-- `--query_modality`: Modality of the query scene, Options: `point`, `rgb`, `floorplan`, `referral`
-- `--database_modality`: Modality used for retrieval. Same options as above.
-- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](#checkpoints)), example_path: `./checkpoints/scene_crossover_scannet+scan3r.pth/`).
-
-For embedding and pre-trained model download, refer to [generated embedding data](DATA.md#generated-embedding-data) and [checkpoints](#checkpoints) sections.
-
-> We also provide scripts for inference on a single scan of the supported datasets. Details in **Single Inference** section in [TRAIN.md](TRAIN.md).
-
 ## Instance Retrieval Demo
 
 This demo script allows users to process a custom object and run cross-modal retrieval to find the closest matched object within a target scene . Detailed usage can be found inside the script. Example usage below:
@@ -173,6 +152,27 @@ Various configurable parameters:
 - `--top_k`: Number of top results to return - default: `5`
 
 
+## Scene Retrieval Demo
+
+This demo script allows users to process a custom scene and retrieve the closest match from the supported datasets using different modalities. Detailed usage can be found inside the script. Example usage below:
+
+```bash
+$ python demo/demo_scene_retrieval.py
+```
+
+Various configurable parameters:
+
+- `--query_path`: Path to the query scene file (e.g., `./example_data/dining_room/scene_cropped.ply`).
+- `--database_path`: Path to the precomputed embeddings of the database scenes, downloaded beforehand (e.g., `./release_data/embed_scannet.pt`).
+- `--query_modality`: Modality of the query scene. Options: `point`, `rgb`, `floorplan`, `referral`.
+- `--database_modality`: Modality used for retrieval. Same options as above.
+- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](#checkpoints); example path: `./checkpoints/scene_crossover_scannet+scan3r.pth/`).
+
+For embedding and pre-trained model download, refer to [generated embedding data](DATA.md#generated-embedding-data) and [checkpoints](#checkpoints) sections.
+
+> [!TIP]
+> We also provide scripts for inference on a single scan of the supported datasets. See the **Single Inference** section in [TRAIN.md](TRAIN.md) for details.
+
 
 # :weight_lifting: Training and Inference 
 
@@ -194,13 +194,13 @@ We provide all available checkpoints on huggingface 👉 [here](https://huggingf
 |Instance CrossOver trained on 3RScan        | [3RScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/instance_crossover_scan3r.pth) |
 |Instance CrossOver trained on ScanNet        | [ScanNet](https://huggingface.co/gradient-spaces/CrossOver/tree/main/instance_crossover_scannet.pth) |
 |Instance CrossOver trained on ScanNet + 3RScan        | [ScanNet+3RScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/instance_crossover_scannet%2Bscan3r.pth) |
-|Instance CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan        | [ScanNet+3RScan+ARKitScenes+MultiScan]() |
+|Instance CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan        | [ScanNet+3RScan+ARKitScenes+MultiScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/instance_crossover_scannet%2Bscan3r%2Bmultiscan%2Barkitscenes.pth) |
 
 ##### ```scene_crossover```
 | Description            | Checkpoint Link |
 | ------------------ | -------------- |
 | Unified CrossOver trained on ScanNet + 3RScan        | [ScanNet+3RScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/scene_crossover_scannet%2Bscan3r.pth) |
-| Unified CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan        | [ScanNet+3RScan+ARKitScenes+MultiScan]() |
+| Unified CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan        | [ScanNet+3RScan+ARKitScenes+MultiScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/scene_crossover_scannet%2Bscan3r%2Bmultiscan%2Barkitscenes.pth) |
 
 
 ## 🚧 TODO List
diff --git a/TRAIN.md b/TRAIN.md
@@ -45,13 +45,13 @@ We provide all available checkpoints on huggingface 👉 [here](https://huggingf
 |Instance CrossOver trained on 3RScan        | [3RScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/instance_crossover_scan3r.pth) |
 |Instance CrossOver trained on ScanNet        | [ScanNet](https://huggingface.co/gradient-spaces/CrossOver/tree/main/instance_crossover_scannet.pth) |
 |Instance CrossOver trained on ScanNet + 3RScan        | [ScanNet+3RScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/instance_crossover_scannet%2Bscan3r.pth) |
-|Instance CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan        | [ScanNet+3RScan+ARKitScenes+MultiScan]() |
+|Instance CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan        | [ScanNet+3RScan+ARKitScenes+MultiScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/instance_crossover_scannet%2Bscan3r%2Bmultiscan%2Barkitscenes.pth) |
 
 ##### ```scene_crossover```
 | Description            | Checkpoint Link |
 | ------------------ | -------------- |
 | Unified CrossOver trained on ScanNet + 3RScan        | [ScanNet+3RScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/scene_crossover_scannet%2Bscan3r.pth) |
-| Unified CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan       | [ScanNet+3RScan+ARKitScenes+MultiScan]() |
+| Unified CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan        | [ScanNet+3RScan+ARKitScenes+MultiScan](https://huggingface.co/gradient-spaces/CrossOver/tree/main/scene_crossover_scannet%2Bscan3r%2Bmultiscan%2Barkitscenes.pth) |
 
 
 # :shield: Single Inference
diff --git a/prepare_data/README.md b/prepare_data/README.md
@@ -119,7 +119,7 @@ Scan3R/
 1. Download ARKitScenes 3dod data using the following command:
 
 ```bash
-python ARKitScenes/download_data.py 3dod --video_id_csv PATH_TO_3dod_train_val_splits.csv --download_dir PATH_TO_ARKITSCENES
+python download_data.py 3dod --video_id_csv PATH_TO_3dod_train_val_splits.csv --download_dir PATH_TO_ARKITSCENES
 ```
 The files mentioned in the above command - ```download_data.py``` and ```3dod_train_val_splits.csv``` can be found in the official repository [here](https://github.com/apple/ARKitScenes), along with more detailed instructions and descriptions of the data.
 
@@ -157,14 +157,14 @@ ARKitScenes/
 ```
 
 #### MultiScan
-1. Download MultiScan data into MultiScan/scenes and run the following to extract MultiScan data 
+1. Download MultiScan data into MultiScan/scenes and run the following to extract MultiScan data.
  
  ```bash
 cd MultiScan/scenes
 unzip '*.zip'
 rm -rf '*.zip'
 ```
-3. To generate sequence of RGB images and corresponding camera poses from the ```.mp4``` file, run the follwing
+3. To generate a sequence of RGB images and corresponding camera poses from the ```.mp4``` file, run the following:
 ```bash
 cd prepare_data/multiscan
 python preprocess_2d_multiscan.py --base_dir PATH_TO_MULTISCAN --frame_interval {frame_interval}
@@ -191,4 +191,4 @@ MultiScan/
     ├── test_scans.txt
     └── sceneverse  
         └── ssg_ref_rel2_template.json
-```
+```
diff --git a/trainer/build.py b/trainer/build.py
@@ -122,6 +122,7 @@ def backward(self, loss: torch.Tensor) -> None:
         if self.grad_norm is not None and self.accelerator.sync_gradients:
             self.accelerator.clip_grad_norm_(self.model.parameters(), 1.0)
         self.optimizer.step()
+        self.scheduler.step()
     
     def log(self, results: Dict[str, Any], mode: str = "train") -> None:
         """Log training metrics and learning rates."""
diff --git a/trainer/grounding_trainer.py b/trainer/grounding_trainer.py
@@ -37,8 +37,6 @@ def train_step(self, epoch: int) -> None:
                 
                 pbar.update(1)
         
-        self.scheduler.step()
-        
     @torch.no_grad()
     def eval_step(self, epoch: int) -> bool:
         self.model.eval()
diff --git a/trainer/unified_trainer.py b/trainer/unified_trainer.py
@@ -61,8 +61,6 @@ def train_step(self, epoch: int) -> None:
                 
                 pbar.update(1)
         
-        self.scheduler.step()
-        
     @torch.no_grad()
     def eval_step(self, epoch: int) -> bool:
         self.model.eval()