S³-Net implementation code for our paper "Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone". Video demos are available as multimedia demonstrations (see the GIFs under `demo/`). The Semantic2D dataset can be downloaded at: https://doi.org/10.5281/zenodo.18350696.
- Dataset Download: https://doi.org/10.5281/zenodo.18350696
- SALSA (Dataset and Labeling Framework): https://github.com/TempleRAIL/semantic2d
- S³-Net (Stochastic Semantic Segmentation): https://github.com/TempleRAIL/s3_net
- Semantic CNN Navigation: https://github.com/TempleRAIL/semantic_cnn_nav
S³-Net (Stochastic Semantic Segmentation Network) is a deep learning model for semantic segmentation of 2D LiDAR scans. It uses a Variational Autoencoder (VAE) architecture with residual blocks to predict semantic labels for each LiDAR point.
S³-Net uses an encoder-decoder architecture with stochastic latent representations:
Input (3 channels: scan, intensity, angle of incidence)
│
▼
┌─────────────────────────────────────┐
│ Encoder (Conv1D + Residual Blocks) │
│ - Conv1D (3 → 32) stride=2 │
│ - Conv1D (32 → 64) stride=2 │
│ - Residual Stack (2 layers) │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ VAE Reparameterization │
│ - μ (mean) and σ (std) estimation │
│ - Latent sampling z ~ N(μ, σ²) │
│ - Monte Carlo KL divergence │
└─────────────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Decoder (Residual + TransposeConv) │
│ - Residual Stack (2 layers) │
│ - TransposeConv1D (64 → 32) │
│ - TransposeConv1D (32 → 10) │
│ - Softmax (10 semantic classes) │
└─────────────────────────────────────┘
│
▼
Output (10 channels: semantic probabilities)
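For orientation, below is a minimal PyTorch sketch of this layout. The channel counts, strides, and 2-layer residual stacks follow the diagram above, while kernel sizes, the `ResBlock` internals, and the latent dimension are illustrative assumptions; the actual architecture is defined in `scripts/model.py`.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Simple pre-activation residual block (illustrative)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.ReLU(), nn.Conv1d(ch, ch, kernel_size=3, padding=1),
            nn.ReLU(), nn.Conv1d(ch, ch, kernel_size=1),
        )

    def forward(self, x):
        return x + self.body(x)

class S3NetSketch(nn.Module):
    """Sketch of the S³-Net encoder/VAE/decoder layout; see scripts/model.py for the real model."""
    def __init__(self, in_ch=3, num_classes=10, hidden=64, latent=64):
        super().__init__()
        # Encoder: two strided Conv1D layers + residual stack (2 layers)
        self.encoder = nn.Sequential(
            nn.Conv1d(in_ch, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.Conv1d(32, hidden, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            ResBlock(hidden), ResBlock(hidden),
        )
        # VAE heads: per-position mean and log-variance of the latent code
        self.mu = nn.Conv1d(hidden, latent, kernel_size=1)
        self.logvar = nn.Conv1d(hidden, latent, kernel_size=1)
        # Decoder: residual stack (2 layers) + two transposed Conv1D layers
        self.decoder = nn.Sequential(
            ResBlock(latent), ResBlock(latent),
            nn.ConvTranspose1d(latent, 32, kernel_size=4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, num_classes, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        # x: (B, 3, N) -> logits: (B, 10, N); apply softmax over dim 1 for class probabilities
        h = self.encoder(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        logits = self.decoder(z)
        return logits, z, mu, logvar
```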
Key Features:
- 3 Input Channels: Range scan, intensity, angle of incidence
- 10 Output Classes: Background + 9 semantic classes
- Stochastic Inference: Multiple forward passes enable uncertainty estimation via majority voting
- Loss Function: Cross-Entropy + Lovasz-Softmax + β-VAE KL divergence
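A hedged sketch of how the three loss terms could be combined is shown below. The β weight matches the default listed in the training configuration and the Monte Carlo KL estimator follows the diagram above, but the reductions, relative weighting, and the `lovasz_softmax` call (from `scripts/lovasz_losses.py`, with the scan reshaped as a 1-pixel-high image) are assumptions; `scripts/train.py` holds the authoritative loss.

```python
import torch
import torch.nn.functional as F
from lovasz_losses import lovasz_softmax  # provided in scripts/lovasz_losses.py

def monte_carlo_kl(z, mu, logvar):
    """Monte Carlo estimate of KL(q(z|x) || N(0, I)) evaluated at the sampled z."""
    std = torch.exp(0.5 * logvar)
    q = torch.distributions.Normal(mu, std)
    p = torch.distributions.Normal(torch.zeros_like(mu), torch.ones_like(std))
    return (q.log_prob(z) - p.log_prob(z)).mean()

def s3net_loss(logits, labels, z, mu, logvar, beta=0.01):
    """Cross-Entropy + Lovasz-Softmax + β-weighted KL divergence (illustrative)."""
    ce = F.cross_entropy(logits, labels)  # per-point classification term
    probs = F.softmax(logits, dim=1)
    # Common lovasz_softmax implementations expect image-shaped tensors,
    # so the 1D scan gets a dummy trailing spatial dimension here.
    lovasz = lovasz_softmax(probs.unsqueeze(-1), labels.unsqueeze(-1))
    kl = monte_carlo_kl(z, mu, logvar)
    return ce + lovasz + beta * kl
```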
| ID | Class | Description |
|---|---|---|
| 0 | Other | Background/unknown |
| 1 | Chair | Office and lounge chairs |
| 2 | Door | Doors (open/closed) |
| 3 | Elevator | Elevator doors |
| 4 | Person | Dynamic pedestrians |
| 5 | Pillar | Structural pillars/columns |
| 6 | Sofa | Sofas and couches |
| 7 | Table | Tables of all types |
| 8 | Trash bin | Waste receptacles |
| 9 | Wall | Walls and flat surfaces |
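For decoding predictions, the table above maps directly to a small lookup; `CLASS_NAMES` and `label_name` are hypothetical helpers, not part of the released scripts:

```python
# Semantic class IDs used by S³-Net (from the table above)
CLASS_NAMES = {
    0: "Other", 1: "Chair", 2: "Door", 3: "Elevator", 4: "Person",
    5: "Pillar", 6: "Sofa", 7: "Table", 8: "Trash bin", 9: "Wall",
}

def label_name(class_id: int) -> str:
    """Return the human-readable name for a predicted class ID."""
    return CLASS_NAMES.get(class_id, "Other")
```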
- Python 3.7+
- PyTorch 1.7.1+
- TensorBoard
- NumPy
- Matplotlib
- tqdm
Install dependencies:
pip install torch torchvision tensorboardX numpy matplotlib tqdm

S³-Net expects the Semantic2D dataset organized as follows:
~/semantic2d_data/
├── dataset.txt # List of dataset folders
├── 2024-04-11-15-24-29/ # Dataset folder 1
│ ├── train.txt # Training sample list
│ ├── dev.txt # Validation sample list
│ ├── scans_lidar/ # Range scans (.npy)
│ ├── intensities_lidar/ # Intensity data (.npy)
│ └── semantic_label/ # Ground truth labels (.npy)
├── 2024-04-04-12-16-41/ # Dataset folder 2
│ └── ...
└── ...
dataset.txt format:
2024-04-11-15-24-29
2024-04-04-12-16-41
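As an illustration of how one sample could be assembled from this layout, here is a hedged loading sketch; the per-sample file-naming pattern and the use of sample IDs from `train.txt`/`dev.txt` are assumptions, and the actual data loader lives in `scripts/train.py`:

```python
import os
import numpy as np

def load_sample(root, folder, sample_id):
    """Load one LiDAR sample (range scan, intensity, label) from the layout above."""
    base = os.path.join(root, folder)
    scan = np.load(os.path.join(base, "scans_lidar", f"{sample_id}.npy"))
    intensity = np.load(os.path.join(base, "intensities_lidar", f"{sample_id}.npy"))
    label = np.load(os.path.join(base, "semantic_label", f"{sample_id}.npy"))
    return scan, intensity, label

# Iterate over the dataset folders listed in dataset.txt
root = os.path.expanduser("~/semantic2d_data")
with open(os.path.join(root, "dataset.txt")) as f:
    folders = [line.strip() for line in f if line.strip()]
```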
Train S³-Net on your dataset:
sh run_train.sh ~/semantic2d_data/ ~/semantic2d_data/

Arguments:
- `$1` - Training data directory (contains `dataset.txt` and subfolders)
- `$2` - Validation data directory
Training Configuration (in scripts/train.py):
| Parameter | Default | Description |
|---|---|---|
| `NUM_EPOCHS` | 20000 | Total training epochs |
| `BATCH_SIZE` | 1024 | Samples per batch |
| `LEARNING_RATE` | 0.001 | Initial learning rate |
| `BETA` | 0.01 | β-VAE weight for KL divergence |
Learning Rate Schedule:
- Epochs 0-50000: 1e-4
- Epochs 50000-480000: 2e-5
- Epochs 480000+: exponential decay
The model saves checkpoints every 2000 epochs to ./model/.
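A hedged sketch of this piecewise schedule is below; whether the counter is epochs or optimizer steps, and the exponential decay factor, are assumptions (see `scripts/train.py` for the authoritative logic):

```python
import torch
import torch.nn as nn

def lr_for_epoch(epoch, decay=0.999):
    """Piecewise learning-rate schedule mirroring the values listed above."""
    if epoch < 50_000:
        return 1e-4
    if epoch < 480_000:
        return 2e-5
    return 2e-5 * decay ** (epoch - 480_000)  # exponential decay (factor is an assumption)

model = nn.Linear(3, 10)  # placeholder for the S³-Net model
optimizer = torch.optim.Adam(model.parameters(), lr=lr_for_epoch(0))

# Inside the training loop, before each epoch:
epoch = 0
for group in optimizer.param_groups:
    group["lr"] = lr_for_epoch(epoch)
```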
Run semantic segmentation on test data:
sh run_eval_demo.sh ~/semantic2d_data/

Arguments:
- `$1` - Test data directory (reads `dev.txt` for the sample list)
Output:
- `./output/semantic_ground_truth_*.png` - Ground truth visualizations
- `./output/semantic_s3net_*.png` - S³-Net predictions
Example output: side-by-side visualizations of the ground truth and the S³-Net prediction (written to `./output/` after running the demo).
S³-Net performs 32 stochastic forward passes per sample and uses majority voting to determine the final prediction. This provides:
- More robust predictions
- Implicit uncertainty estimation
- Reduced noise in segmentation boundaries
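A minimal sketch of this stochastic-inference procedure, assuming latent sampling stays active at test time and that `model(x)` returns per-point class logits (possibly inside a tuple); `scripts/decode_demo.py` contains the actual implementation:

```python
import torch

@torch.no_grad()
def stochastic_predict(model, x, num_passes=32):
    """Run repeated stochastic forward passes and majority-vote per LiDAR point."""
    votes = []
    for _ in range(num_passes):
        out = model(x)
        logits = out[0] if isinstance(out, tuple) else out  # (B, 10, N) class logits
        votes.append(logits.argmax(dim=1))                  # (B, N) class IDs
    votes = torch.stack(votes, dim=0)                       # (num_passes, B, N)
    prediction, _ = torch.mode(votes, dim=0)                # per-point majority vote
    # Fraction of passes agreeing with the vote doubles as a simple confidence estimate
    agreement = (votes == prediction.unsqueeze(0)).float().mean(dim=0)
    return prediction, agreement
```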
s3_net/
├── demo/ # Demo GIFs
│ ├── 1.lobby_s3net_segmentation.gif
│ ├── 2.lobby_semantic_mapping.gif
│ └── 3.lobby_semantic_navigation.gif
├── model/
│ └── s3_net_model.pth # Pretrained model weights
├── output/ # Inference output directory
├── scripts/
│ ├── model.py # S³-Net model architecture
│ ├── train.py # Training script
│ ├── decode_demo.py # Inference/demo script
│ └── lovasz_losses.py # Lovasz-Softmax loss function
├── run_train.sh # Training driver script
├── run_eval_demo.sh # Inference driver script
├── LICENSE # MIT License
└── README.md # This file
Training logs are saved to ./runs/. View training progress:
tensorboard --logdir=runs

Monitored metrics:
- Training/Validation loss
- Cross-Entropy loss
- Lovasz-Softmax loss
A pre-trained model is included at model/s3_net_model.pth. This model was trained on the Semantic2D dataset with the Hokuyo UTM-30LX-EW LiDAR sensor.
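To inspect the checkpoint directly in Python, a hedged loading sketch (whether the file stores a plain `state_dict` or a full pickled model object is an assumption; `run_eval_demo.sh` and `scripts/decode_demo.py` show the intended usage):

```python
import torch

# Load the released checkpoint; map to CPU so this works without a GPU
state = torch.load("model/s3_net_model.pth", map_location="cpu")
print(type(state))  # a state_dict (OrderedDict) or a full model object
```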
To use the pre-trained model:
sh run_eval_demo.sh ~/semantic2d_data/

Citation:

@article{xie2026semantic2d,
title={Semantic2D: Enabling Semantic Scene Understanding with 2D Lidar Alone},
author={Xie, Zhanteng and Pan, Yipeng and Zhang, Yinqiang and Pan, Jia and Dames, Philip},
journal={arXiv preprint arXiv:2409.09899},
year={2026}
}