- 2025.7.15: Code released.
- 2025.7.15: arXiv preprint released.
- 2025.6.26: Accepted by ICCV 2025.
3D Semantic Scene Completion (SSC) has gained increasing attention due to its pivotal role in 3D perception. Recent advancements have primarily focused on refining voxel-level features to construct 3D scenes. However, treating voxels as the basic interaction units inherently limits the utilization of class-level information, which has proven critical for enhancing the granularity of completion results. To address this, we propose Disentangling Instance and Scene Contexts (DISC), a novel dual-stream paradigm that enhances learning for both instance and scene categories through separated optimization. Specifically, we replace voxel queries with discriminative class queries, which incorporate class-specific geometric and semantic priors. Additionally, we exploit the intrinsic properties of classes to design specialized decoding modules, facilitating targeted interactions and efficient class-level information flow. Experimental results demonstrate that DISC achieves state-of-the-art (SOTA) performance on both the SemanticKITTI and SSCBench-KITTI-360 benchmarks, with mIoU scores of 17.35 and 20.55, respectively. Remarkably, DISC outperforms multi-frame SOTA methods using only single-frame input and significantly improves instance category performance, surpassing single-frame and multi-frame SOTA instance mIoU by 17.9% and 11.9%, respectively, on the SemanticKITTI hidden test set.
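The core idea above — replacing one query per voxel with a small set of class-level queries that attend to scene features — can be sketched in a toy, dependency-free form. This is only an illustration of the general cross-attention mechanism, not the paper's implementation: the grid size, feature dimension, and single attention step are all assumptions for the sketch.

```python
import math
import random

random.seed(0)

# Toy sizes: a small scene grid and feature dimension (illustrative only).
H, W, Z, D = 8, 8, 4, 16
num_classes = 20          # e.g., SemanticKITTI defines 20 classes
n_voxels = H * W * Z      # 256 voxels when flattened

def rand_mat(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

voxel_feats = rand_mat(n_voxels, D)        # flattened voxel features
class_queries = rand_mat(num_classes, D)   # one learnable query per class

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# One scaled dot-product cross-attention step: each class query aggregates
# evidence from every voxel, so the interaction units are C classes, not
# H*W*Z voxels.
updated = []
for q in class_queries:
    scores = [dot(q, v) / math.sqrt(D) for v in voxel_feats]
    m = max(scores)
    w = [math.exp(s - m) for s in scores]      # numerically stable softmax
    total = sum(w)
    w = [x / total for x in w]
    out = [sum(w[i] * voxel_feats[i][d] for i in range(n_voxels))
           for d in range(D)]
    updated.append(out)

print(len(updated), len(updated[0]))  # 20 16
```

With only 20 queries against 256 voxels, class-level information is pooled globally in one step, which is the efficiency argument behind class queries versus per-voxel queries.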
The overall architecture. (a) DISC is a novel semantic scene completion method with a dual-stream framework for specialized processing of instance and scene categories. (b) The Discriminative Query Generator (DQI) integrates geometric and contextual priors into instance and scene queries based on category attributes. (c) Details of the Adaptive Instance Layer (AIL) and the Global Scene Layer (GSL), which address, in a differentiated manner, the distinct challenges faced by instance and scene categories during reconstruction. For clarity, the Feed-Forward Network (FFN) and positional embedding are omitted in the figure.
Table 1. Quantitative results on SemanticKITTI test. Among all methods, the top three ranked approaches are marked as red, bold, and underlined. For single-frame methods, DISC achieves SOTA performance in mIoU, IoU, InsM, and ScnM. Notably, using only single-frame input, DISC surpasses even multi-frame SOTA methods in mIoU, IoU, and InsM.
Table 2. Quantitative results on SSCBench-KITTI360 test. Among all methods, the top three ranked approaches are marked as red, bold, and underlined. DISC achieves SOTA results in mIoU and InsM, while surpassing LiDAR-based methods across multiple category-specific metrics.
- Install PyTorch and Torchvision.
- Install MMDetection.
- Install the rest of the requirements with pip:

  ```shell
  pip install -r requirements.txt
  ```

- Download the pre-trained models of MaskDINO and Swin, and place them in the `pretrain` folder.
Please refer to Symphonies to complete the preparation of the SemanticKITTI and KITTI-360 datasets, and modify the corresponding path configurations in `configs/datasets`.
- Setup

  ```shell
  export PYTHONPATH=`pwd`:$PYTHONPATH
  ```

- Training

  ```shell
  python tools/train.py
  ```

- Testing

  Generate the outputs for submission on the evaluation server:

  ```shell
  python tools/test.py
  ```
We provide pretrained weights on the SemanticKITTI and KITTI-360 datasets, reproduced with the released codebase.
| Dataset | IoU | mIoU | Model Weights | Output Log |
|---|---|---|---|---|
| SemanticKITTI | 45.32 | 17.35 | Link | Log |
| KITTI360 | 47.35 | 20.55 | Link | - |
We extend our sincere gratitude to these outstanding open-source projects:
Since it is difficult to include every referenced project, please let us know if your repository is missing from the list, and we will update it accordingly.
If you find our work beneficial for your research, please consider citing our paper and giving us a star:
```bibtex
@misc{liu2025disentanglinginstancescenecontexts,
  title={Disentangling Instance and Scene Contexts for 3D Semantic Scene Completion},
  author={Enyu Liu and En Yu and Sijia Chen and Wenbing Tao},
  year={2025},
  eprint={2507.08555},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2507.08555},
}
```