BEVANet: Bilateral Efficient Visual Attention Network for Real-time Semantic Segmentation

arXiv paper · ICIP paper · 🤗 Hugging Face · License: MIT

This is the official repository for BEVANet (ICIP 2025 spotlight).

Highlights

Figure: Comparison of inference speed and accuracy for real-time models on the Cityscapes validation set.

  • Efficient Attention Mechanisms: We leverage large kernel attention to design the EVA block and the SDLSKA, CKS, and DLKPPM modules. These components enlarge and dynamically adjust receptive fields, enhance feature representation, capture contextual information, and refine object details, improving spatial modeling.
  • Branch Interaction: Frequent communication between the high- and low-level branches, through the bilateral architecture and the BGAF module, enhances semantic concepts and detailed contours by sharing information, enabling adaptive feature fusion.
  • Performance: BEVANet offers a superior speed-accuracy balance over existing models. It achieves real-time segmentation above 30 FPS with 81.0% mIoU on Cityscapes after pre-training on ImageNet, and maintains 79.3% mIoU without pre-training, showing less dependency on large pre-training datasets. Its variant BEVANet-S further achieves 83.1% mIoU on CamVid, demonstrating its scalability.
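
The appeal of the large-kernel design behind the EVA block can be illustrated with simple receptive-field and parameter arithmetic. This is a hedged sketch: the 5×5 depthwise + 7×7 dilated-depthwise split shown here follows the common large-kernel-attention decomposition, and the exact kernel sizes used in BEVANet's modules may differ.

```python
def chain_rf(layers):
    """Receptive field of a stack of stride-1 convolutions.
    Each layer is (kernel_size, dilation)."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each layer grows the RF by its dilated span
    return rf

def dw_params(k, channels):
    """Parameter count of a depthwise k x k convolution."""
    return k * k * channels

C = 64  # example channel count

# Dense 23x23 depthwise conv vs. the decomposed equivalent:
dense = dw_params(23, C)
decomposed = dw_params(5, C) + dw_params(7, C)

# 5x5 depthwise followed by 7x7 depthwise with dilation 3
# covers the same 23x23 receptive field.
assert chain_rf([(5, 1), (7, 3)]) == 23

print(f"dense 23x23: {dense} params, decomposed: {decomposed} params")
```

The decomposition keeps the large receptive field at a small fraction of the parameter cost, which is what makes large-kernel attention practical in a real-time model.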

Overview

Figure: Overall architecture of the proposed Bilateral Efficient Visual Attention Network (BEVANet).

Demo

Figure: Visualization results.

Models

We provide ImageNet pretrained models here to facilitate reproduction.

| Model (ImageNet) | Acc@1 | Acc@5 | Param (M) | FPS@RTX4090 |
| --- | --- | --- | --- | --- |
| BEVANet-S | 71.1 | 90.1 | 16.3 | 198.6 |
| BEVANet | 76.3 | 93.0 | 57.4 | 82.3 |

Additionally, the finetuned models on Cityscapes, CamVid, and ADE20k are available for direct deployment.

| Model (Cityscapes) | Val (% mIoU) | Param (M) | FPS@RTX3090 |
| --- | --- | --- | --- |
| BEVANet-S | 78.2 | 15.2 | 70.0 |
| BEVANet | 81.0 | 58.6 | 32.8 |

| Model (CamVid) | Val (% mIoU) | Param (M) | FLOPs (G) | FPS@RTX3090 |
| --- | --- | --- | --- | --- |
| BEVANet-S | 83.1 | 15.2 | 20.1 | 79.4 |

| Model (ADE20k) | Val (% mIoU) | Param (M) | FPS@RTX3090 |
| --- | --- | --- | --- |
| BEVANet | 39.8 | 58.9 | 73.3 |

Prerequisites

This implementation is adapted from PIDNet. Please refer to its repository for installation instructions and dataset preparation.

Usage

0. Prepare the dataset

  • (Optional) Download the ImageNet dataset and unzip it into the data/imagenet directory.
  • Download the Cityscapes, CamVid, and ADE20k datasets and extract them into the data/cityscapes, data/camvid, and data/ade20k directories, respectively.
  • Ensure that the file paths listed in data/list correctly correspond to the dataset images.
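
The layout described above can be sanity-checked with a small script. This is an illustrative helper, not part of the repository; the directory names come from the steps above, and the list-file format (whitespace-separated records with the image path first) is an assumption that may differ per dataset.

```python
import os

# Expected dataset roots from the preparation steps; data/imagenet is optional.
EXPECTED_DIRS = ["data/cityscapes", "data/camvid", "data/ade20k"]

def check_layout(root="."):
    """Return the expected dataset directories missing under root."""
    return [d for d in EXPECTED_DIRS
            if not os.path.isdir(os.path.join(root, d))]

def check_list_file(list_path, root="."):
    """Return image paths from a data/list file that do not exist on disk.
    Assumes one whitespace-separated record per line, image path first."""
    missing = []
    with open(list_path) as f:
        for line in f:
            parts = line.split()
            if parts and not os.path.isfile(os.path.join(root, parts[0])):
                missing.append(parts[0])
    return missing

if __name__ == "__main__":
    print("missing dirs:", check_layout())
```

Running it from the repository root before training surfaces missing extractions or stale list entries early, instead of partway through an epoch.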

1. Pretraining (Optional)

  • For example, pretrain BEVANet on ImageNet:
python tools/pretrain.py --cfg configs/cityscapes/BEVANet.yaml

2. Training

  • Pretrain with step 1, or download the ImageNet pretrained models and put them into the pretrained_models/imagenet/ directory.
  • For example, finetune BEVANet on Cityscapes:
python tools/train.py --cfg configs/cityscapes/BEVANet.yaml MODEL.PRETRAINED pretrained_models/imagenet/BEVANet_ImageNet.pth
  • Or train BEVANet from scratch on Cityscapes with a batch size of 12 on 2 GPUs:
python tools/train.py --cfg configs/cityscapes/BEVANet.yaml ENV.GPUS (0,1) TRAIN.BATCH_SIZE_PER_GPU 6
  • Or train BEVANet from scratch on Cityscapes using the train and val sets together, with a batch size of 12 on 4 GPUs:
python tools/train.py --cfg configs/cityscapes/BEVANet_trainval.yaml GPUS (0,1,2,3) TRAIN.BATCH_SIZE_PER_GPU 3

3. Evaluation

  • Download the finetuned models for Cityscapes, CamVid, and ADE20k and put them into checkpoint/cityscapes/, checkpoint/camvid/, and checkpoint/ade20k/ directories, respectively.
  • For example, evaluate BEVANet on the Cityscapes val set:
python tools/eval.py --cfg configs/cityscapes/BEVANet.yaml \
                          TEST.MODEL_FILE checkpoint/cityscapes/BEVANet_VAL_Cityscapes.pt \
                          DATASET.TEST_SET list/cityscapes/val.lst
  • Or evaluate BEVANet-S on the CamVid val set:
python tools/eval.py --cfg configs/camvid/BEVANet_S.yaml \
                          TEST.MODEL_FILE checkpoint/camvid/BEVANet_S_VAL_Camvid.pt \
                          DATASET.TEST_SET list/camvid/val.lst
  • Or evaluate BEVANet on the ADE20k val set:
python tools/eval.py --cfg configs/ade20k/BEVANet.yaml \
                          TEST.MODEL_FILE checkpoint/ade20k/BEVANet_VAL_ADE20k.pt \
                          DATASET.TEST_SET list/ade20k/val.lst
  • Generate the test results of BEVANet on the Cityscapes test set:
python tools/eval.py --cfg configs/cityscapes/BEVANet.yaml \
                          TEST.MODEL_FILE checkpoint/cityscapes/BEVANet_TEST_Cityscapes.pt \
                          DATASET.TEST_SET list/cityscapes/test.lst

4. Speed Measurement

  • Measure the inference speed of BEVANet at the Cityscapes input resolution:
python models/speed/BEVANet_speed.py --cfg configs/cityscapes/BEVANet.yaml --c 19 --r 1024 2048
  • Measure the inference speed of BEVANet-S at the CamVid input resolution:
python models/speed/BEVANet_speed.py --cfg configs/camvid/BEVANet_S.yaml --c 11 --r 720 960
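
The timing loop inside such speed scripts can be sketched framework-agnostically. This is a minimal sketch with illustrative `warmup`/`iters` counts; the repository's script presumably runs a real model on GPU tensors, where a device synchronization (e.g. torch.cuda.synchronize()) is also needed before reading the clock.

```python
import time

def measure_fps(infer, warmup=10, iters=100):
    """Estimate frames per second of a callable that runs one forward pass.
    Warmup iterations let caches, JIT compilation, and kernel autotuning
    settle before measurement starts."""
    for _ in range(warmup):
        infer()
    start = time.perf_counter()
    for _ in range(iters):
        infer()
    elapsed = time.perf_counter() - start
    return iters / elapsed

if __name__ == "__main__":
    # Stand-in CPU workload instead of a real model forward pass.
    fps = measure_fps(lambda: sum(range(10_000)))
    print(f"{fps:.1f} FPS")
```

Averaging over many iterations after a warmup phase is what makes single-model FPS numbers like those in the tables above reproducible.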

5. Custom Inputs

  • Place your images in the samples/ directory, then run the following command to segment .png inputs with the Cityscapes-pretrained BEVANet model:
python tools/custom.py --cfg configs/cityscapes/BEVANet.yaml --p 'checkpoint/cityscapes/BEVANet_VAL_Cityscapes.pt' --t '.png'
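
The --t flag above selects inputs by file extension. Gathering the matching files can be sketched as follows; `collect_samples` is a hypothetical helper for illustration, not the repository's own code.

```python
import glob
import os

def collect_samples(sample_dir="samples", ext=".png"):
    """Return a sorted list of images in sample_dir with the given
    extension, mirroring the --t extension filter of tools/custom.py."""
    return sorted(glob.glob(os.path.join(sample_dir, "*" + ext)))

if __name__ == "__main__":
    print(collect_samples())
```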

Acknowledgement

  • Our implementation is based on PIDNet; thanks for their nice contribution.

About

Official Repository for BEVANet: Bilateral Efficient Visual Attention Network for Real-time Semantic Segmentation (ICIP 2025 Spotlight Oral)
