BEVANet: Bilateral Efficient Visual Attention Network for Real-time Semantic Segmentation

arXiv paper · ICIP paper · 🤗 Hugging Face · License: MIT

This is the official repository for BEVANet (ICIP 2025 spotlight).

Highlights

Figure: Comparison of inference speed and accuracy for real-time models on the Cityscapes validation set.

  • Efficient Attention Mechanisms: We leverage large kernel attention to design the EVA block and the SDLSKA, CKS, and DLKPPM modules. These components enlarge and dynamically adjust receptive fields, enhance feature representation, capture contextual information, and refine object details, improving spatial modeling.
  • Branch Interaction: Frequent communication between the high- and low-level branches, through the bilateral architecture and the BGAF module, enhances semantic concepts and detailed contours by sharing information, enabling adaptive feature fusion.
  • Performance: BEVANet offers a superior speed-accuracy balance over existing models. It achieves real-time segmentation above 30 FPS with 81.0% mIoU on Cityscapes after pre-training on ImageNet, and maintains 79.3% mIoU without pre-training, showing less dependency on large pre-training datasets. Its variant BEVANet-S further achieves 83.1% mIoU on CamVid, demonstrating its scalability.
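
The appeal of the large-kernel design behind the EVA block can be illustrated with simple receptive-field and parameter arithmetic. This is a hedged sketch: the 5×5 depthwise + 7×7 dilated-depthwise split shown here follows the common large-kernel-attention decomposition, and the exact kernel sizes used in BEVANet's modules may differ.

```python
def chain_rf(layers):
    """Receptive field of a stack of stride-1 convolutions.
    Each layer is (kernel_size, dilation)."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each layer grows the RF by its dilated span
    return rf

def dw_params(k, channels):
    """Parameter count of a depthwise k x k convolution."""
    return k * k * channels

C = 64  # example channel count

# Dense 23x23 depthwise conv vs. the decomposed equivalent:
dense = dw_params(23, C)
decomposed = dw_params(5, C) + dw_params(7, C)

# 5x5 depthwise followed by 7x7 depthwise with dilation 3
# covers the same 23x23 receptive field.
assert chain_rf([(5, 1), (7, 3)]) == 23

print(f"dense 23x23: {dense} params, decomposed: {decomposed} params")
```

The decomposition keeps the large receptive field at a small fraction of the parameter cost, which is what makes large-kernel attention practical in a real-time model.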

Overview

Figure: Overall architecture of the proposed Bilateral Efficient Visual Attention Network (BEVANet).

Demo

Figure: Visualization results.

Models

We provide ImageNet pretrained models here to facilitate reproduction.

| Model (ImageNet) | Acc@1 | Acc@5 | Param (M) | FPS@RTX4090 |
| --- | --- | --- | --- | --- |
| BEVANet-S | 71.1 | 90.1 | 16.3 | 198.6 |
| BEVANet | 76.3 | 93.0 | 57.4 | 82.3 |

Additionally, the finetuned models on Cityscapes, CamVid, and ADE20k are available for direct deployment.

| Model (Cityscapes) | Val (% mIoU) | Param (M) | FPS@RTX3090 |
| --- | --- | --- | --- |
| BEVANet-S | 78.2 | 15.2 | 70.0 |
| BEVANet | 81.0 | 58.6 | 32.8 |

| Model (CamVid) | Val (% mIoU) | Param (M) | FLOPs (G) | FPS@RTX3090 |
| --- | --- | --- | --- | --- |
| BEVANet-S | 83.1 | 15.2 | 20.1 | 79.4 |

| Model (ADE20k) | Val (% mIoU) | Param (M) | FPS@RTX3090 |
| --- | --- | --- | --- |
| BEVANet | 39.8 | 58.9 | 73.3 |

Prerequisites

This implementation is adapted from PIDNet. Please refer to its repository for installation instructions and dataset preparation.

Usage

0. Prepare the dataset

  • (Optional) Download the ImageNet dataset and unzip it into the data/imagenet directory.
  • Download the Cityscapes, CamVid, and ADE20k datasets and extract them into the data/cityscapes, data/camvid, and data/ade20k directories, respectively.
  • Ensure that the file paths listed in data/list correctly correspond to the dataset images.
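
The layout described above can be sanity-checked with a small script. This is an illustrative helper, not part of the repository; the directory names come from the steps above, and the list-file format (whitespace-separated records with the image path first) is an assumption that may differ per dataset.

```python
import os

# Expected dataset roots from the preparation steps; data/imagenet is optional.
EXPECTED_DIRS = ["data/cityscapes", "data/camvid", "data/ade20k"]

def check_layout(root="."):
    """Return the expected dataset directories missing under root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, d))]

def check_list_file(list_path, root="."):
    """Return image paths from a data/list file that do not exist on disk.
    Assumes one whitespace-separated record per line, image path first."""
    missing = []
    with open(list_path) as f:
        for line in f:
            parts = line.split()
            if parts and not os.path.isfile(os.path.join(root, parts[0])):
                missing.append(parts[0])
    return missing

if __name__ == "__main__":
    print("missing dirs:", check_layout())
```

Running it from the repository root before training surfaces missing extractions or stale list entries early, instead of partway through an epoch.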

1. Pretraining (Optional)

  • For example, pretrain BEVANet on ImageNet:
python tools/pretrain.py --cfg configs/cityscapes/BEVANet.yaml

2. Training

  • Pretrain with step 1, or download the ImageNet pretrained models and put them into the pretrained_models/imagenet/ directory.
  • For example, finetune BEVANet on Cityscapes:
python tools/train.py --cfg configs/cityscapes/BEVANet.yaml MODEL.PRETRAINED pretrained_models/imagenet/BEVANet_ImageNet.pth
  • Or train BEVANet from scratch on Cityscapes with a batch size of 12 on 2 GPUs:
python tools/train.py --cfg configs/cityscapes/BEVANet.yaml ENV.GPUS (0,1) TRAIN.BATCH_SIZE_PER_GPU 6
  • Or train BEVANet from scratch on Cityscapes using the train and val sets together, with a batch size of 12 on 4 GPUs:
python tools/train.py --cfg configs/cityscapes/BEVANet_trainval.yaml GPUS (0,1,2,3) TRAIN.BATCH_SIZE_PER_GPU 3

3. Evaluation

  • Download the finetuned models for Cityscapes, CamVid, and ADE20k and put them into checkpoint/cityscapes/, checkpoint/camvid/, and checkpoint/ade20k/ directories, respectively.
  • For example, evaluate BEVANet on the Cityscapes val set:
python tools/eval.py --cfg configs/cityscapes/BEVANet.yaml \
                          TEST.MODEL_FILE checkpoint/cityscapes/BEVANet_VAL_Cityscapes.pt \
                          DATASET.TEST_SET list/cityscapes/val.lst
  • Or evaluate BEVANet-S on the CamVid val set:
python tools/eval.py --cfg configs/camvid/BEVANet_S.yaml \
                          TEST.MODEL_FILE checkpoint/camvid/BEVANet_S_VAL_Camvid.pt \
                          DATASET.TEST_SET list/camvid/val.lst
  • Or evaluate BEVANet on the ADE20k val set:
python tools/eval.py --cfg configs/ade20k/BEVANet.yaml \
                          TEST.MODEL_FILE checkpoint/ade20k/BEVANet_VAL_ADE20k.pt \
                          DATASET.TEST_SET list/ade20k/val.lst
  • Generate the test results of BEVANet on the Cityscapes test set:
python tools/eval.py --cfg configs/cityscapes/BEVANet.yaml \
                          TEST.MODEL_FILE checkpoint/cityscapes/BEVANet_TEST_Cityscapes.pt \
                          DATASET.TEST_SET list/cityscapes/test.lst

4. Speed Measurement

  • Measure the inference speed of BEVANet at the Cityscapes input resolution:
python models/speed/BEVANet_speed.py --cfg configs/cityscapes/BEVANet.yaml --c 19 --r 1024 2048
  • Measure the inference speed of BEVANet-S at the CamVid input resolution:
python models/speed/BEVANet_speed.py --cfg configs/camvid/BEVANet_S.yaml --c 11 --r 720 960
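
The timing loop inside such speed scripts can be sketched framework-agnostically. This is a minimal sketch with illustrative `warmup`/`iters` counts; the repository's script presumably runs a real model on GPU tensors, where a device synchronization (e.g. torch.cuda.synchronize()) is also needed before reading the clock.

```python
import time

def measure_fps(infer, warmup=10, iters=100):
    """Estimate frames per second of a callable that runs one forward pass.
    Warmup iterations let caches, JIT compilation, and kernel autotuning
    settle before measurement starts."""
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return iters / elapsed

if __name__ == "__main__":
    # Stand-in CPU workload instead of a real model forward pass.
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.1f} FPS")
```

Averaging over many iterations after a warmup phase is what makes single-model FPS numbers like those in the tables above reproducible.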

5. Custom Inputs

  • Place your images in the samples/ directory, then run the following command to segment .png inputs with the Cityscapes-pretrained BEVANet model:
python tools/custom.py --cfg configs/cityscapes/BEVANet.yaml --p 'checkpoint/cityscapes/BEVANet_VAL_Cityscapes.pt' --t '.png'
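
The --t flag above selects inputs by file extension. Gathering the matching files can be sketched as follows; `collect_samples` is a hypothetical helper for illustration, not the repository's own code.

```python
import glob
import os

def collect_samples(sample_dir="samples", ext=".png"):
    """Return a sorted list of images in sample_dir with the given
    extension, mirroring the --t extension filter of tools/custom.py."""
    return sorted(glob.glob(os.path.join(sample_dir, "*" + ext)))

if __name__ == "__main__":
    print(collect_samples())
```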

Acknowledgement

  • Our implementation is based on PIDNet; thanks for their nice contribution.

About

Official Repository for BEVANet: Bilateral Efficient Visual Attention Network for Real-time Semantic Segmentation (ICIP 2025 Spotlight Oral)
