Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone

S³-Net implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos can be found in the multimedia demonstrations below. The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.

Related Resources

S³-Net: Stochastic Semantic Segmentation Network

License: MIT

S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict semantic labels for each LiDAR point.

Demo Results

S³-Net Segmentation

Semantic Mapping

Semantic Navigation

Model Architecture

S³-Net uses an encoder-decoder architecture with stochastic latent representations:

Input (3 channels: scan, intensity, angle of incidence)
    │
    ▼
┌─────────────────────────────────────┐
│  Encoder (Conv1D + Residual Blocks) │
│  - Conv1D (3 → 32) stride=2         │
│  - Conv1D (32 → 64) stride=2        │
│  - Residual Stack (2 layers)        │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  VAE Reparameterization             │
│  - μ (mean) and σ (std) estimation  │
│  - Latent sampling z ~ N(μ, σ²)     │
│  - Monte Carlo KL divergence        │
└─────────────────────────────────────┘
    │
    ▼
┌─────────────────────────────────────┐
│  Decoder (Residual + TransposeConv) │
│  - Residual Stack (2 layers)        │
│  - TransposeConv1D (64 → 32)        │
│  - TransposeConv1D (32 → 10)        │
│  - Softmax (10 semantic classes)    │
└─────────────────────────────────────┘
    │
    ▼
Output (10 channels: semantic probabilities)
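
The actual architecture lives in scripts/model.py; the following is a minimal, illustrative PyTorch sketch of the encoder → reparameterization → decoder flow in the diagram. Layer widths are taken from the diagram; the class names, kernel sizes, and padding choices here are assumptions for illustration, not the repository's code.

import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    """Simple 1D residual block, standing in for the residual stacks."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=1),
        )

    def forward(self, x):
        return x + self.body(x)

class S3NetSketch(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Encoder: 3 input channels (range, intensity, angle of incidence)
        self.encoder = nn.Sequential(
            nn.Conv1d(3, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=4, stride=2, padding=1),
            ResidualBlock1D(64),
            ResidualBlock1D(64),
        )
        # Heads for the latent mean and log-variance
        self.mu_head = nn.Conv1d(64, 64, kernel_size=1)
        self.logvar_head = nn.Conv1d(64, 64, kernel_size=1)
        # Decoder: residual stack + transposed convolutions back to 10 classes
        self.decoder = nn.Sequential(
            ResidualBlock1D(64),
            ResidualBlock1D(64),
            nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, num_classes, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.mu_head(h), self.logvar_head(h)
        # Reparameterization: z ~ N(mu, sigma^2), re-sampled on every forward pass
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        logits = self.decoder(z)
        return logits, mu, logvar

With a batch of shape (B, 3, N), where N is the number of beams, this sketch yields logits of shape (B, 10, N); a softmax over the class dimension gives the per-point probabilities shown as the output above.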

Key Features:

  • 3 Input Channels: Range scan, intensity, angle of incidence
  • 10 Output Classes: Background + 9 semantic classes
  • Stochastic Inference: Multiple forward passes enable uncertainty estimation via majority voting
  • Loss Function: Cross-Entropy + Lovasz-Softmax + β-VAE KL divergence (a sketch of how these combine follows this list)
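
The training loss is implemented in scripts/train.py and scripts/lovasz_losses.py. As a rough illustration of how the three terms could be combined, a minimal sketch is shown below; the lovasz_softmax(probas, labels) call assumes the interface of the widely used reference implementation, and the KL term is written in closed form for brevity even though the paper describes a Monte Carlo estimate.

import torch
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax  # assumed interface; see scripts/lovasz_losses.py

def s3net_loss(logits, labels, mu, logvar, beta=0.01):
    """Cross-Entropy + Lovasz-Softmax + beta-weighted KL divergence (sketch)."""
    ce = F.cross_entropy(logits, labels)
    probas = F.softmax(logits, dim=1)
    lovasz = lovasz_softmax(probas, labels)
    # Analytic KL(N(mu, sigma^2) || N(0, 1)); a placeholder for the Monte Carlo estimate
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return ce + lovasz + beta * kl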

Semantic Classes

ID  Class      Description
0   Other      Background/unknown
1   Chair      Office and lounge chairs
2   Door       Doors (open/closed)
3   Elevator   Elevator doors
4   Person     Dynamic pedestrians
5   Pillar     Structural pillars/columns
6   Sofa       Sofas and couches
7   Table      Tables of all types
8   Trash bin  Waste receptacles
9   Wall       Walls and flat surfaces
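
If you need the class IDs programmatically, for example to colour predictions, a plain mapping mirroring the table above could look like this (illustrative only, not part of the repository):

# Illustrative ID -> class-name mapping mirroring the table above
SEMANTIC2D_CLASSES = {
    0: "Other",
    1: "Chair",
    2: "Door",
    3: "Elevator",
    4: "Person",
    5: "Pillar",
    6: "Sofa",
    7: "Table",
    8: "Trash bin",
    9: "Wall",
}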

Requirements

  • Python 3.7+
  • PyTorch 1.7.1+
  • TensorBoard
  • NumPy
  • Matplotlib
  • tqdm

Install dependencies:

pip install torch torchvision tensorboardX numpy matplotlib tqdm

Dataset Structure

S³-Net expects the Semantic2D dataset organized as follows:

~/semantic2d_data/
├── dataset.txt                # List of dataset folders
├── 2024-04-11-15-24-29/       # Dataset folder 1
│   ├── train.txt              # Training sample list
│   ├── dev.txt                # Validation sample list
│   ├── scans_lidar/           # Range scans (.npy)
│   ├── intensities_lidar/     # Intensity data (.npy)
│   └── semantic_label/        # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/       # Dataset folder 2
│   └── ...
└── ...

dataset.txt format:

2024-04-11-15-24-29
2024-04-04-12-16-41
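
As a rough sketch of how a single sample could be loaded with NumPy: the exact file naming inside each folder and the format of the train.txt entries are assumptions here; see scripts/train.py for the actual data loader.

import os
import numpy as np

data_root = os.path.expanduser("~/semantic2d_data")

# dataset.txt lists one dataset folder per line
with open(os.path.join(data_root, "dataset.txt")) as f:
    folders = [line.strip() for line in f if line.strip()]

folder = os.path.join(data_root, folders[0])

# train.txt lists the training samples; assumed to be one sample name per line
with open(os.path.join(folder, "train.txt")) as f:
    sample = f.readline().strip()

# Each modality is stored as a .npy file (file naming assumed)
scan = np.load(os.path.join(folder, "scans_lidar", sample + ".npy"))
intensity = np.load(os.path.join(folder, "intensities_lidar", sample + ".npy"))
label = np.load(os.path.join(folder, "semantic_label", sample + ".npy"))

print(scan.shape, intensity.shape, label.shape)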

Usage

Training

Train S³-Net on your dataset:

sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/

Arguments:

  • $1 - Training data directory (contains dataset.txt and subfolders)
  • $2 - Validation data directory

Training Configuration (in scripts/train.py):

Parameter      Default  Description
NUM_EPOCHS     20000    Total training epochs
BATCH_SIZE     1024     Samples per batch
LEARNING_RATE  0.001    Initial learning rate
BETA           0.01     β-VAE weight for KL divergence

Learning Rate Schedule:

  • Epochs 0-50000: 1e-4
  • Epochs 50000-480000: 2e-5
  • Epochs 480000+: Exponential decay

The model saves checkpoints every 2000 epochs to ./model/.
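
A piecewise schedule matching the thresholds above could be written as follows; this is illustrative only, and the actual schedule (including the decay factor, which is a placeholder here) is defined in scripts/train.py.

def learning_rate(step, decay=0.999995):
    """Piecewise schedule mirroring the thresholds listed above (sketch)."""
    if step < 50000:
        return 1e-4
    elif step < 480000:
        return 2e-5
    else:
        # Exponential decay past 480000 (decay factor is an assumed placeholder)
        return 2e-5 * (decay ** (step - 480000))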

Inference Demo

Run semantic segmentation on test data:

sh run_eval_demo.sh ~/semantic2d_data/

Arguments:

  • $1 - Test data directory (reads dev.txt for sample list)

Output:

  • ./output/semantic_ground_truth_*.png - Ground truth visualizations
  • ./output/semantic_s3net_*.png - S³-Net predictions

Example Output:

Ground Truth | S³-Net Prediction

Stochastic Inference

S³-Net performs 32 stochastic forward passes per sample and uses majority voting to determine the final prediction. This provides:

  • More robust predictions
  • Implicit uncertainty estimation
  • Reduced noise in segmentation boundaries
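
The actual inference loop is in scripts/decode_demo.py; the voting step can be sketched as follows, assuming the model returns per-class logits as in the architecture sketch earlier in this README.

import torch

@torch.no_grad()
def predict_with_voting(model, x, num_passes=32):
    """Run several stochastic forward passes and majority-vote per point."""
    votes = []
    for _ in range(num_passes):
        logits, _, _ = model(x)              # latent z is re-sampled each pass
        votes.append(logits.argmax(dim=1))   # (B, N) class indices
    votes = torch.stack(votes, dim=0)        # (num_passes, B, N)
    # Majority vote across the passes
    prediction, _ = torch.mode(votes, dim=0)
    return prediction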

File Structure

s3_net/
├── demo/                           # Demo GIFs
│   ├── 1.lobby_s3net_segmentation.gif
│   ├── 2.lobby_semantic_mapping.gif
│   └── 3.lobby_semantic_navigation.gif
├── model/
│   └── s3_net_model.pth            # Pretrained model weights
├── output/                         # Inference output directory
├── scripts/
│   ├── model.py                    # S³-Net model architecture
│   ├── train.py                    # Training script
│   ├── decode_demo.py              # Inference/demo script
│   └── lovasz_losses.py            # Lovasz-Softmax loss function
├── run_train.sh                    # Training driver script
├── run_eval_demo.sh                # Inference driver script
├── LICENSE                         # MIT License
└── README.md                       # This file

TensorBoard Monitoring

Training logs are saved to ./runs/. View training progress:

tensorboard --logdir=runs

Monitored metrics:

  • Training/Validation loss
  • Cross-Entropy loss
  • Lovasz-Softmax loss

Pre-trained Model

A pre-trained model is included at model/s3_net_model.pth. This model was trained on the Semantic2D dataset with the Hokuyo UTM-30LX-EW LiDAR sensor.

To use the pre-trained model:

sh run_eval_demo.sh ~/semantic2d_data/
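
If you prefer to load the checkpoint yourself rather than going through the shell script, a minimal sketch is shown below. It assumes the file stores PyTorch weights, possibly wrapped under a "state_dict" key; check scripts/decode_demo.py for the actual loading code and use the model class from scripts/model.py rather than the stand-in sketch.

import torch

# Stand-in model; the real class lives in scripts/model.py
model = S3NetSketch(num_classes=10)

state = torch.load("model/s3_net_model.pth", map_location="cpu")
# Unwrap the weights if the checkpoint nests them (key name is an assumption)
if isinstance(state, dict) and "state_dict" in state:
    state = state["state_dict"]
model.load_state_dict(state)
model.eval()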

Citation

@article{xie2026semantic2d,
  title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
  author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
  journal={arXiv preprint arXiv:2409.09899},
  year={2026}
}
