Commit e0cb480

scene-level embeddings updated on gdrive
1 parent cc0b3f7 commit e0cb480

4 files changed: +71 -33 lines changed


DATA.md

Lines changed: 5 additions & 2 deletions
@@ -20,8 +20,11 @@ We detail data download and release instructions for preprocessing with scripts
 ### Generated Embedding Data
 We release the scene level embeddings created with CrossOver on the currenly used datasets in [GDrive](https://drive.google.com/drive/folders/12vn5CCvnI9zagFyYrGzLLlMPTgF7rndW?usp=sharing), which can be used for cross-modal retrieval with custom data as detailed in demo section.
 
-- `embed_scannet.pt`: Scene Embeddings For All Modalities (Point Cloud, RGB, Floorplan, Referral) in ScanNet
-- `embed_scan3r.pt` : Scene Embeddings For All Modalities (Point Cloud, RGB, Referral) in 3RScan
+- `embed_scannet.npz`: Scene Embeddings For All Modalities (Point Cloud, RGB, Floorplan, Referral) in ScanNet
+- `embed_scan3r.npz` : Scene Embeddings For All Modalities (Point Cloud, RGB, Referral) in 3RScan
+- `embed_multiscan.npz` : Scene Embeddings For All Modalities (Point Cloud, RGB, Referral) in MultiScan
+- `embed_arkitscenes.npz` : Scene Embeddings For All Modalities (Point Cloud, RGB, Referral) in ARKitScenes
+
 
 > You agree to the terms of ScanNet, 3RScan, ShapeNet, Scan2CAD, MultiScan, ARKitScenes and SceneVerse datasets by downloading our hosted data.
 
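
The released `.npz` files appear to follow the layout written by `single_inference/scene_inference.py` in this commit: a single `scene` key holding a list of dicts with `scan_id`, `scene_embeds` and `masks`. A minimal sketch of reading such a file, assuming that layout holds for the hosted data. The helper name `load_scene_embeddings`, the demo file, the `(1, 768)` embedding shape and the contents of `masks` are made up for illustration.

```python
import os
import tempfile

import numpy as np


def load_scene_embeddings(path):
    """Return {scan_id: {modality: embedding}} from an embed_*.npz file.

    Hypothetical helper. The file stores pickled Python dicts, so
    allow_pickle=True is required; only load files from a trusted source.
    """
    with np.load(path, allow_pickle=True) as npz:
        scenes = npz['scene'].tolist()
    return {item['scan_id']: item['scene_embeds'] for item in scenes}


# Synthetic stand-in for a downloaded file (shape and mask values are assumptions).
demo_scenes = [
    {
        'scan_id': 'scene0000_00',
        'scene_embeds': {'point': np.ones((1, 768), dtype=np.float32)},
        'masks': {'point': True},
    },
]
demo_path = os.path.join(tempfile.mkdtemp(), 'embed_demo.npz')
np.savez(demo_path, scene=np.array(demo_scenes, dtype=object))

embeds = load_scene_embeddings(demo_path)
print(sorted(embeds))
```

Because the payload is a pickled object array, `np.load` without `allow_pickle=True` raises a `ValueError` on these files.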

README.md

Lines changed: 6 additions & 8 deletions
@@ -70,10 +70,8 @@ assume complete data availability across all modalities. We present **CrossOver*
 # :newspaper: News
 - ![](https://img.shields.io/badge/New!-8A2BE2) **Version 1.0** - **CrossOver is now stronger than ever**. We recommend updating to this version; changes include:
 - More powerful pre-trained checkpoints; now available on Huggingface 👉 [here](https://huggingface.co/gradient-spaces/CrossOver/tree/main).
-- Support for 2 additional datasets - ARKitScenes & MultiScan
-
+- Support for 2 additional datasets - ARKitScenes & MultiScan
 
-- [2025-05] Pretrained checkpoints have been moved to HuggingFace 👉 [here](https://huggingface.co/gradient-spaces/CrossOver/tree/main).
 - [2025-03] CrossOver is accepted to **CVPR 2025** as **Highlight**. 🔥
 - [2025-02] **Version 0.1** - We release CrossOver on arXiv with codebase + pre-trained checkpoints. Checkout our [paper](https://arxiv.org/abs/2502.15011) and [website](https://sayands.github.io/crossover/).
 

@@ -141,15 +139,15 @@ $ python demo/demo_instance_retrieval.py
 
 Various configurable parameters:
 
-- `--query_path`: Path to query object(point cloud, image, or text)
+- `--query_path`: Path to query object (point cloud, image, or text)
 - `--query_modality`: Query modality - Options: `point`, `rgb`, `referral`
 - `--scan_id`: Scene ID to search in
 - `--target_modality`: Target modality to match against - Options: `point`, `rgb`, `referral`, `cad`
 - `--dataset`: Dataset name - Options: `scannet`, `scan3r`, `arkitscenes`, `multiscan`
-- `--data_dir`: Path to dataset directory - default: `/drive/datasets/Scannet`
+- `--data_dir`: Path to dataset directory
 - `--process_dir`: Path to preprocessed features directory (for gt-projection-seg.npz)
-- `--ckpt`: Path to model checkpoint
-- `--top_k`: Number of top results to return - default: `5`
+- `--ckpt`: Path to pre-trained instance crossover model checkpoint (details [here](#checkpoints))
+- `--top_k`: Number of top results to return
 
 
 ## Scene Retrieval Demo
@@ -166,7 +164,7 @@ Various configurable parameters:
 - `--database_path`: Path to the precomputed embeddings of the database scenes downloaded before (eg: `./release_data/embed_scannet.pt`).
 - `--query_modality`: Modality of the query scene, Options: `point`, `rgb`, `floorplan`, `referral`
 - `--database_modality`: Modality used for retrieval. Same options as above.
-- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](#checkpoints)), example_path: `./checkpoints/scene_crossover_scannet+scan3r.pth/`).
+- `--ckpt`: Path to the pre-trained scene crossover model checkpoint (details [here](#checkpoints)), example_path: `./checkpoints/scene_crossover_scannet+scan3r.pth/`.
 
 For embedding and pre-trained model download, refer to [generated embedding data](DATA.md#generated-embedding-data) and [checkpoints](#checkpoints) sections.
 
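
The scene retrieval options above pair a query embedding with a database of precomputed scene embeddings. One plausible way to rank the database is cosine similarity followed by a top-k cut. The sketch below assumes that scoring rule, which this diff does not confirm. `top_k_scenes` and the toy 3-D vectors are hypothetical and not part of the CrossOver demo scripts.

```python
import numpy as np


def top_k_scenes(query, database, k=5):
    """Rank database scenes by cosine similarity to a query embedding.

    `database` maps scan_id -> embedding for one modality. Cosine scoring
    is an assumption made for this sketch.
    """
    scan_ids = list(database)
    db = np.stack([np.asarray(database[s], dtype=np.float64).reshape(-1)
                   for s in scan_ids])
    q = np.asarray(query, dtype=np.float64).reshape(-1)

    # Normalise so that the dot product equals cosine similarity.
    db = db / np.linalg.norm(db, axis=1, keepdims=True)
    q = q / np.linalg.norm(q)

    sims = db @ q
    order = np.argsort(-sims)[:k]
    return [(scan_ids[i], float(sims[i])) for i in order]


# Toy database with made-up 3-D embeddings.
database = {
    'scene_a': np.array([1.0, 0.0, 0.0]),
    'scene_b': np.array([0.0, 1.0, 0.0]),
    'scene_c': np.array([0.7, 0.7, 0.0]),
}
results = top_k_scenes(np.array([1.0, 0.1, 0.0]), database, k=2)
print(results)
```

Here the query and database embeddings may come from different modalities (for example an `rgb` query against `point` database entries), which only works if the model maps both into a shared embedding space.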

single_inference/datasets/arkit.py

Lines changed: 3 additions & 2 deletions
@@ -15,7 +15,7 @@
 import albumentations as A
 
 class ARKitScenesInferDataset(Dataset):
-    def __init__(self, data_dir, process_dir, voxel_size=0.02, frame_skip=5, image_size=[224, 224]) -> None:
+    def __init__(self, data_dir, process_dir, voxel_size=0.02, frame_skip=1, image_size=[224, 224]) -> None:
         self.voxel_size = voxel_size
         self.frame_skip = frame_skip
         self.image_size = image_size
@@ -45,7 +45,8 @@ def __init__(self, data_dir, process_dir, voxel_size=0.02, frame_skip=5, image_s
         self.normalize_color = A.Normalize(mean=color_mean, std=color_std)
 
     def extract_images(self, scan_id, color_path):
-        pose_data = arkit.load_poses(self.scans_dir, scan_id, skip=self.frame_skip)
+        scan_dir = osp.join(self.scans_dir, scan_id)
+        pose_data = arkit.load_poses(scan_dir, scan_id, skip=self.frame_skip)
         frame_idxs = list(pose_data.keys())
 
         pose_data_arr = []

single_inference/scene_inference.py

Lines changed: 57 additions & 21 deletions
@@ -50,8 +50,8 @@ def run_inference(args, scan_id=None):
     # print(f'Total number of parameters: {total_params}')
     # assert False
 
-    data = { 'scene': []}
     if scan_id is not None:
+        # Single scan inference
         data_dict = dataset[scan_id]
         with torch.no_grad():
             output = model(data_dict)
@@ -60,43 +60,79 @@ def run_inference(args, scan_id=None):
         for modality in output['embeddings']:
             output_np[modality] = output['embeddings'][modality].cpu().numpy()
 
-        data['scene'].append({'scan_id': scan_id, 'scene_embeds': output_np, 'masks': output['masks']})
+        data = {'scene': [{'scan_id': scan_id, 'scene_embeds': output_np, 'masks': output['masks']}]}
         save_data = {
             'scene': data['scene']
         }
         np.savez(f'embed_{args.dataset.lower()}_{scan_id}.npz', **save_data)
         log.info(f'Saved embeddings for {scan_id}.')
 
     else:
-        for idx, scan_id in tqdm(enumerate(dataset.scan_ids)):
-            data_dict = dataset[idx]
-            with torch.no_grad():
-                output = model(data_dict)
+        output_file = f'/drive/dumps/multimodal-spaces/v1.0_release/embed_{args.dataset.lower()}.npz'
+
+        existing_data = {'scene': []}
+        processed_scan_ids = set()
+
+        if osp.exists(output_file):
+            try:
+                existing_npz = np.load(output_file, allow_pickle=True)
+                existing_data = {'scene': existing_npz['scene'].tolist()}
+                processed_scan_ids = {item['scan_id'] for item in existing_data['scene']}
+                log.info(f'Loaded existing embeddings for {len(processed_scan_ids)} scans. Resuming from where we left off.')
+            except Exception as e:
+                log.warning(f'Could not load existing file {output_file}: {e}. Starting fresh.')
+                existing_data = {'scene': []}
+                processed_scan_ids = set()
+
+        remaining_scans = [(idx, scan_id) for idx, scan_id in enumerate(dataset.scan_ids)
+                           if scan_id not in processed_scan_ids]
+
+        if not remaining_scans:
+            log.info('All scans already processed.')
+            return
+
+        log.info(f'Processing {len(remaining_scans)} remaining scans out of {len(dataset.scan_ids)} total scans.')
+
+        for idx, scan_id in tqdm(remaining_scans, desc="Processing scans"):
+            try:
+                data_dict = dataset[idx]
+                with torch.no_grad():
+                    output = model(data_dict)
+
+                output_np = {}
+                for modality in output['embeddings']:
+                    output_np[modality] = output['embeddings'][modality].cpu().numpy()
+
+                existing_data['scene'].append({
+                    'scan_id': scan_id,
+                    'scene_embeds': output_np,
+                    'masks': output['masks']
+                })
 
-            output_np = {}
-            for modality in output['embeddings']:
-                output_np[modality] = output['embeddings'][modality].cpu().numpy()
+                save_data = {
+                    'scene': existing_data['scene']
+                }
+                np.savez_compressed(output_file, **save_data)
+                log.info(f'Processed and saved scan {scan_id} ({len(existing_data["scene"])}/{len(dataset.scan_ids)} total).')
 
-            data['scene'].append({'scan_id': scan_id, 'scene_embeds': output_np, 'masks': output['masks']})
-
-        save_data = {
-            'scene': data['scene']
-        }
-        np.savez(f'/drive/dumps/multimodal-spaces/v1.0_release/embed_{args.dataset.lower()}.npz', **save_data)
-        log.info(f'Saved embeddings for {len(data["scene"])} scenes.')
+            except Exception as e:
+                log.error(f'Error processing scan {scan_id}: {e}. Skipping and continuing.')
+                continue
+
+        log.info(f'Completed processing. Final embeddings saved for {len(existing_data["scene"])} scenes.')
 
 if __name__ == '__main__':
     parser = argparse.ArgumentParser(description='Scene Inference')
-    parser.add_argument('--dataset', default='Scannet', type=str, required=False)
-    parser.add_argument('--data_dir', default='/drive/datasets/Scannet', type=str, required=False)
-    parser.add_argument('--process_dir', default='/drive/dumps/multimodal-spaces/preprocess_feats/Scannet', type=str, required=False)
-    parser.add_argument('--ckpt', default='/drive/dumps/multimodal-spaces/runs/new_runs/rgb/scene_crossover_scannet+scan3r+multiscan+arkitscenes_scratch.pth', type=str, required=False)
+    parser.add_argument('--dataset', default='Scan3R', type=str, required=False)
+    parser.add_argument('--data_dir', default='/scratch/users/gauravp/datasets/Scan3R', type=str, required=False)
+    parser.add_argument('--process_dir', default='/scratch/users/gauravp/dumps/preprocess_feats/Scan3R', type=str, required=False)
+    parser.add_argument('--ckpt', default='/scratch/users/gauravp/ckpts/scene_crossover_scannet+scan3r+multiscan+arkitscenes_scratch.pth', type=str, required=False)
     parser.add_argument('--scan_id', default='', type=str, required=False)
     parser.add_argument('--input_dim_3d', default=512, type=int, required=False)
     parser.add_argument('--input_dim_2d', default=1536, type=int, required=False)
     parser.add_argument('--input_dim_1d', default=768, type=int, required=False)
     parser.add_argument('--out_dim', default=768, type=int, required=False)
-
+
     # Reproducibility
     random.seed(42)
     np.random.seed(42)
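
The resume logic in this file reloads the existing `.npz`, skips scan IDs already present, and rewrites the whole file after every scan. The sketch below reproduces that pattern in isolation. It also writes to a temporary file and swaps it in with `os.replace`, so an interrupted write cannot leave a truncated output. That atomic step is not in the commit and is shown only as a possible variation. `embed_all`, `save_atomic`, `load_existing` and `fake_embed` are hypothetical names, and `fake_embed` stands in for the model forward pass.

```python
import os
import tempfile

import numpy as np


def save_atomic(path, scenes):
    """Write the full scene list to a temp file, then swap it into place."""
    tmp_path = path + '.tmp.npz'  # ends in .npz so NumPy does not append a suffix
    np.savez_compressed(tmp_path, scene=np.array(scenes, dtype=object))
    os.replace(tmp_path, path)


def load_existing(path):
    """Return previously saved scenes, or an empty list if no file exists."""
    if not os.path.exists(path):
        return []
    with np.load(path, allow_pickle=True) as npz:
        return npz['scene'].tolist()


def embed_all(scan_ids, embed_fn, path):
    """Embed every scan not already stored in `path`, saving after each one."""
    scenes = load_existing(path)
    done = {item['scan_id'] for item in scenes}
    for scan_id in scan_ids:
        if scan_id in done:
            continue  # already processed in an earlier run
        scenes.append({'scan_id': scan_id, 'scene_embeds': embed_fn(scan_id)})
        save_atomic(path, scenes)
    return scenes


calls = []


def fake_embed(scan_id):
    """Stand-in for the model: records the call and returns a dummy embedding."""
    calls.append(scan_id)
    return {'point': np.zeros(4, dtype=np.float32)}


out_path = os.path.join(tempfile.mkdtemp(), 'embed_demo.npz')
embed_all(['s1', 's2'], fake_embed, out_path)        # first run
embed_all(['s1', 's2', 's3'], fake_embed, out_path)  # resumed run only embeds s3
print(calls)
```

Rewriting the full file after each scan keeps the logic simple, but the cost grows with the number of stored scenes. Saving every N scans would be a reasonable trade-off for large datasets.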
