This is the official repository for BEVANet (ICIP 2025 spotlight).

Comparison of inference speed and accuracy for real-time models on the Cityscapes validation set.
- Efficient Attention Mechanisms: We leverage large kernel attention to design the EVA block, SDLSKA, CKS, and DLKPPM modules. These components enlarge and dynamically adjust receptive fields, enhance feature representation, capture contextual information, and refine object details, thereby improving spatial modeling.
- Branch Interaction: Frequent communication between the high- and low-level branches through the bilateral architecture and the BGAF module enhances semantic concepts and detailed contours by sharing information, enabling adaptive feature fusion.
- Performance: BEVANet offers a superior balance of speed and accuracy compared to existing models. It achieves real-time segmentation at over 30 FPS with 81.0% mIoU on Cityscapes after pre-training on ImageNet, and maintains 79.3% mIoU without pre-training, showing reduced dependency on large pre-training datasets. Its smaller variant BEVANet-S further achieves 83.1% mIoU on CamVid, demonstrating its scalability.
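The large kernel attention used in these modules follows the common idea of decomposing a big convolution into a small depthwise convolution plus a dilated depthwise convolution (as in VAN-style LKA). The kernel sizes below are illustrative, not necessarily BEVANet's exact configuration; this sketch only shows how the stacked effective receptive field is computed:

```python
def receptive_field(layers):
    """Effective receptive field of stacked stride-1 convolutions.

    layers: list of (kernel_size, dilation) tuples, applied in order.
    """
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d  # each stride-1 layer adds (k - 1) * dilation
    return rf

# A 5x5 depthwise conv followed by a 7x7 depthwise conv with dilation 3
# covers the same field as a single 23x23 convolution, at a fraction of
# the parameters and FLOPs of the dense large kernel:
print(receptive_field([(5, 1), (7, 3)]))  # -> 23
```

This is why the decomposed design can "enlarge receptive fields" cheaply: parameter count grows with the small kernels, while the effective field grows with the dilation.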

The overall architecture of our proposed Bilateral Efficient Visual Attention Network (BEVANet).
We provide ImageNet pretrained models here to facilitate reproduction.
| Model (ImageNet) | Acc@1 | Acc@5 | Param (M) | FPS@RTX4090 |
|---|---|---|---|---|
| BEVANet-S | 71.1 | 90.1 | 16.3 | 198.6 |
| BEVANet | 76.3 | 93.0 | 57.4 | 82.3 |
Additionally, the finetuned models on Cityscapes, CamVid, and ADE20k are available for direct deployment.
| Model (Cityscapes) | Val (% mIOU) | Param (M) | FPS@RTX3090 |
|---|---|---|---|
| BEVANet-S | 78.2 | 15.2 | 70.0 |
| BEVANet | 81.0 | 58.6 | 32.8 |
| Model (CamVid) | Val (% mIOU) | Param (M) | FLOPs (G) | FPS@RTX3090 |
|---|---|---|---|---|
| BEVANet-S | 83.1 | 15.2 | 20.1 | 79.4 |
| Model (ADE20k) | Val (% mIOU) | Param (M) | FPS@RTX3090 |
|---|---|---|---|
| BEVANet | 39.8 | 58.9 | 73.3 |
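For context on the FPS columns above: throughput is typically measured by warming up the model and then timing repeated forward passes. The snippet below is a minimal, framework-agnostic sketch of that pattern, not the repository's actual speed script; GPU measurements additionally require device synchronization (e.g. `torch.cuda.synchronize()`) so that kernel-launch overhead is not mistaken for inference time.

```python
import time

def measure_fps(model, make_input, warmup=10, iters=50):
    """Generic FPS benchmark: warm up, then time repeated forward passes."""
    for _ in range(warmup):        # warmup: fill caches, trigger lazy init
        model(make_input())
    start = time.perf_counter()
    for _ in range(iters):
        model(make_input())
    elapsed = time.perf_counter() - start
    return iters / elapsed

# Toy stand-in for a segmentation network: any callable works here.
fps = measure_fps(lambda x: [v * 2 for v in x], lambda: list(range(256)))
print(f"{fps:.1f} FPS")
```

Use the repository's speed scripts (see the commands further below) to reproduce the reported numbers under the correct resolutions and class counts.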
This implementation is adapted from PIDNet. Please refer to its repository for installation and dataset preparation.
- (Optional) Download the ImageNet dataset and unzip it into the `data/imagenet` directory.
- Download the Cityscapes, CamVid, and ADE20k datasets, extracting them to the `data/cityscapes`, `data/camvid`, and `data/ade20k` directories, respectively.
- Ensure that the file paths listed in `data/list` correctly correspond to the dataset images.
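The training and evaluation commands below append `KEY VALUE` pairs after the config file; in yacs-style configuration systems such pairs override nested options from the YAML. The following is a toy illustration of that merging behaviour, not the repository's actual config code (real yacs also type-checks and casts values, whereas here everything stays a string):

```python
def merge_overrides(cfg, opts):
    """Toy yacs-style command-line merge: opts is a flat
    [KEY, VALUE, KEY, VALUE, ...] list; dotted keys address nested dicts."""
    for key, value in zip(opts[0::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for p in parents:       # walk down to the parent of the target key
            node = node[p]
        node[leaf] = value      # overwrite the leaf option
    return cfg

cfg = {"MODEL": {"PRETRAINED": ""}, "TRAIN": {"BATCH_SIZE_PER_GPU": 6}}
merge_overrides(cfg, ["MODEL.PRETRAINED",
                      "pretrained_models/imagenet/BEVANet_ImageNet.pth"])
print(cfg["MODEL"]["PRETRAINED"])
# -> pretrained_models/imagenet/BEVANet_ImageNet.pth
```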
- For example, pretrain BEVANet on ImageNet:
```
python tools/pretrain.py --cfg configs/cityscapes/BEVANet.yaml
```
- Pretrain with step 1, or download the ImageNet pretrained models and put them into the `pretrained_models/imagenet/` directory.
- For example, finetune BEVANet on Cityscapes:
```
python tools/train.py --cfg configs/cityscapes/BEVANet.yaml MODEL.PRETRAINED pretrained_models/imagenet/BEVANet_ImageNet.pth
```
- Or train BEVANet from scratch on Cityscapes with a batch size of 12 on 2 GPUs:
```
python tools/train.py --cfg configs/cityscapes/BEVANet.yaml ENV.GPUS (0,1) TRAIN.BATCH_SIZE_PER_GPU 6
```
- Or train BEVANet from scratch on Cityscapes using the train and val sets simultaneously, with a batch size of 12 on 4 GPUs:
```
python tools/train.py --cfg configs/cityscapes/BEVANet_trainval.yaml GPUS (0,1,2,3) TRAIN.BATCH_SIZE_PER_GPU 3
```
- Download the finetuned models for Cityscapes, CamVid, and ADE20k and put them into the `checkpoint/cityscapes/`, `checkpoint/camvid/`, and `checkpoint/ade20k/` directories, respectively.
- For example, evaluate BEVANet on the Cityscapes val set:
```
python tools/eval.py --cfg configs/cityscapes/BEVANet.yaml \
                     TEST.MODEL_FILE checkpoint/cityscapes/BEVANet_VAL_Cityscapes.pt \
                     DATASET.TEST_SET list/cityscapes/val.lst
```
- Or evaluate BEVANet-S on the CamVid val set:
```
python tools/eval.py --cfg configs/camvid/BEVANet_S.yaml \
                     TEST.MODEL_FILE checkpoint/camvid/BEVANet_S_VAL_Camvid.pt \
                     DATASET.TEST_SET list/camvid/val.lst
```
- Or evaluate BEVANet on the ADE20k val set:
```
python tools/eval.py --cfg configs/ade20k/BEVANet.yaml \
                     TEST.MODEL_FILE checkpoint/ade20k/BEVANet_VAL_ADE20k.pt \
                     DATASET.TEST_SET list/ade20k/val.lst
```
- Generate the testing results of BEVANet on the Cityscapes test set:
```
python tools/eval.py --cfg configs/cityscapes/BEVANet.yaml \
                     TEST.MODEL_FILE checkpoint/cityscapes/BEVANet_TEST_Cityscapes.pt \
                     DATASET.TEST_SET list/cityscapes/test.lst
```
- Measure the inference speed of BEVANet on Cityscapes:
```
python models/speed/BEVANet_speed.py --cfg configs/cityscapes/BEVANet.yaml --c 19 --r 1024 2048
```
- Measure the inference speed of BEVANet-S on CamVid:
```
python models/speed/BEVANet_speed.py --cfg configs/camvid/BEVANet_S.yaml --c 11 --r 720 960
```
- Place all your images in the `samples/` directory, then run the following command using the Cityscapes pretrained BEVANet model for the `.png` image format:
```
python tools/custom.py --cfg configs/cityscapes/BEVANet.yaml --p 'checkpoint/cityscapes/BEVANet_VAL_Cityscapes.pt' --t '.png'
```
- Our implementation is modified based on PIDNet.
- Thanks for their nice contribution.
