**DATA.md** (+2 −2)
## :wrench: Data Preprocessing
To speed up training and inference, we preprocess the 1D (referral), 2D (RGB + floorplan) and 3D (point cloud + CAD) data for both object instances and scenes. Note that since the 3RScan dataset does not provide frame-wise RGB segmentations, we project the 3D data to 2D and store it in `.npz` format for every scan; we provide the scripts for this projection. Here's an overview of which data features are precomputed:
- Object Instance: Referral, Multi-view RGB images, Point Cloud, & CAD (only for ScanNet)
- Scene: Referral, Multi-view RGB images, Floorplan (only for ScanNet), & Point Cloud
We provide the preprocessing scripts, which should be easily customizable for new datasets. Further instructions below.
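Since every scan's projected features live in a single `.npz` archive, they can be read back with standard NumPy. A minimal sketch — note the key names (`color`, `instance`) here are hypothetical placeholders; check the preprocessing scripts for the exact fields stored per dataset:

```python
import numpy as np

def load_projection(path):
    # An .npz file is a zip of named arrays; `data.files` lists the keys.
    with np.load(path, allow_pickle=True) as data:
        return {key: data[key] for key in data.files}

# Round-trip a dummy file in the same container format
# (hypothetical keys: per-frame "color" images, "instance" id maps).
np.savez("scan_dummy.npz",
         color=np.zeros((2, 4, 4, 3), dtype=np.uint8),
         instance=np.zeros((2, 4, 4), dtype=np.int64))
feats = load_projection("scan_dummy.npz")
print(sorted(feats.keys()))  # → ['color', 'instance']
```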
**TRAIN.md** (+28 −3)
We provide all available checkpoints on huggingface 👉 [here](https://huggingf
# :shield: Single Inference
## Instance Inference
We provide a script to perform instance-level cross-modal retrieval inference on a single scan; it reports retrieval metrics and the matched objects within the scene, across all available modality pairs. Detailed usage is in the file. Quick instructions below:
```bash
$ python single_inference/instance_inference.py
```
Various configurable parameters:
- `--dataset`: Dataset name. Options: `scannet`, `scan3r`, `arkitscenes`, `multiscan`
- `--process_dir`: Path to the processed features directory containing preprocessed object data
- `--ckpt`: Path to the pre-trained instance crossover model checkpoint (details [here](TRAIN.md#checkpoint-inventory)); example path: `./checkpoints/instance_crossover_scannet+scan3r+multiscan+arkitscenes.pth`
- `--scan_id`: Scan ID to run inference on (e.g., `scene_00004_00`)
- `--modalities`: List of modalities to use (default: `['rgb', 'point', 'cad', 'referral']`)
- `--input_dim_3d`: Input dimension for 3D features (default: 384)
- `--input_dim_2d`: Input dimension for 2D features (default: 1536)
- `--input_dim_1d`: Input dimension for 1D features (default: 768)
> **Note**: This script requires preprocessed object data for the target scene, namely `objectsDataMultimodal.npz` files generated during data preprocessing as described in [DATA.md](DATA.md/#wrench-data-preprocessing). The scan must have valid object instances across the specified modalities.
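The flags above map naturally onto an `argparse` parser. A hypothetical sketch mirroring the documented options and defaults — the flag names and defaults come from the list above, but the wiring (and the placeholder `process_dir`/`ckpt` defaults) is illustrative, not the actual script:

```python
import argparse

def build_parser():
    # Illustrative parser; the real instance_inference.py may differ.
    p = argparse.ArgumentParser(description="Instance-level cross-modal retrieval inference")
    p.add_argument("--dataset", choices=["scannet", "scan3r", "arkitscenes", "multiscan"], default="scannet")
    p.add_argument("--process_dir", default="./preprocess_feats")          # placeholder default
    p.add_argument("--ckpt", default="./checkpoints/instance_crossover.pth")  # placeholder default
    p.add_argument("--scan_id", default="scene_00004_00")
    p.add_argument("--modalities", nargs="+", default=["rgb", "point", "cad", "referral"])
    p.add_argument("--input_dim_3d", type=int, default=384)   # 3D (point cloud / CAD) feature dim
    p.add_argument("--input_dim_2d", type=int, default=1536)  # 2D (multi-view RGB) feature dim
    p.add_argument("--input_dim_1d", type=int, default=768)   # 1D (referral text) feature dim
    return p

args = build_parser().parse_args(["--dataset", "scan3r"])
print(args.dataset, args.input_dim_2d)  # → scan3r 1536
```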
## Scene Inference
We release a script to perform inference (generate scene-level embeddings) on a single scan of any supported dataset. Detailed usage is in the file. Quick instructions below:
```bash
$ python single_inference/scene_inference.py
```

Various configurable parameters:
- `--dataset`: Dataset name (Scannet / Scan3R)
- `--data_dir`: Data directory (e.g., `./datasets/Scannet`; assumes a similar structure as in `preprocess.md`)
- `--process_dir`: Preprocessed data directory (this can point to the downloaded preprocessed directory)
- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](TRAIN.md#checkpoint-inventory)); example path: `./checkpoints/scene_crossover_scannet+scan3r.pth`
- `--scan_id`: The scan ID from the dataset you'd like to calculate embeddings for (if not provided, embeddings for all scans are calculated)
The script will output embeddings in the same format as provided [here](DATA.md/#generated-embedding-data).
# :bar_chart: Evaluation
#### Cross-Modal Object Retrieval
Run the following script for object instance + scene retrieval results using the instance-based methods (refer to the script to run the instance baseline / instance crossover variants). Detailed usage is inside the script; its configuration block looks like this (update the paths for your setup):

```python
'data_dir': '/drive/datasets/Scannet',  # Update this with your data path
'process_dir': '/drive/dumps/multimodal-spaces/preprocess_feats/Scannet',  # Update this with your processed data path
'ckpt': '/drive/dumps/multimodal-spaces/runs/new_runs/instance_crossover_scannet+scan3r+multiscan+arkitscenes.pth',  # Update this with your model checkpoint
'scan_id': 'scene0568_00',  # Default scan to search in
'query_modality': 'point',  # point, rgb, referral
'target_modality': 'referral',  # point, rgb, referral, cad
'query_path': './demo_data/kitchen/scene.ply',  # Path to your query file
```
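Cross-modal retrieval between a query and target modality ultimately reduces to nearest-neighbour search in the shared embedding space. A minimal, dependency-free sketch of cosine-similarity ranking — illustrative only; the actual metric computation lives in the evaluation scripts:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def rank_targets(query, targets):
    """Return target ids sorted by decreasing similarity to the query."""
    scored = sorted(targets.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [tid for tid, _ in scored]

# Toy example: a "point" query embedding matched against "referral" embeddings.
query = [1.0, 0.0, 0.5]
targets = {"chair": [0.9, 0.1, 0.4], "table": [-1.0, 0.2, 0.0]}
print(rank_targets(query, targets))  # → ['chair', 'table']
```

Top-1/top-k recall over such rankings is the usual retrieval metric reported in this kind of evaluation.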