PlantAIM: A New Baseline Model Integrating Global Attention and Local Features for Enhanced Plant Disease Identification
This paper has been accepted in Smart Agricultural Technology (2025) [Link]
Plant diseases significantly affect the quality and yield of agricultural production. Detection has conventionally relied on plant pathologists, but recent advances in deep learning, particularly the Vision Transformer (ViT) and the Convolutional Neural Network (CNN), have made automated plant disease identification feasible. Despite their prominence, significant gaps remain in our understanding of how these models differ in feature extraction and representation, particularly in complex multi-crop disease identification tasks. The challenge arises from the simultaneous need to learn crop-specific and disease-specific features to accurately identify crop species and their associated diseases. To address this, we introduce the Plant Disease Global-Local Features Fusion Attention Model (PlantAIM), a new hybrid framework that fuses the global attention mechanisms of ViT with the local feature extraction capabilities of CNN. PlantAIM aims to improve the model's ability to simultaneously learn and focus on crop-specific and disease-specific features. We conduct extensive evaluations to assess the robustness and generalizability of PlantAIM against state-of-the-art (SOTA) models, including scenarios with limited training samples and real-world environmental data. Our results show that PlantAIM achieves superior performance. This research not only deepens our understanding of feature learning in ViT and CNN models but also sets a new benchmark in the dynamic field of plant disease identification. The code will be made available upon publication.
- We introduce the novel Plant Disease Global-Local Features Fusion Attention model (PlantAIM), which combines ViT and CNN components to enhance feature extraction for multi-crop plant disease identification.
- Our experimental results demonstrate PlantAIM's exceptional robustness and generalization, achieving state-of-the-art performance in both controlled environments and real-world scenarios.
- Our feature visualization analysis reveals that CNNs emphasize plant patterns, while ViTs focus on disease symptoms.
Plant Disease Global-Local Features Fusion Attention model (PlantAIM) [code]
- Key feature: combines ViT and CNN components to enhance feature extraction for multi-crop plant disease identification.
Proposed PlantAIM architecture.
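The global-local fusion idea can be sketched in PyTorch as follows. This is a minimal illustration under stated assumptions, not the published architecture: the branch designs, dimensions, and the 38-class head (the PlantVillage class count) are placeholders, and the real model loads pretrained ViT and CNN backbones rather than the tiny stand-in branches used here.

```python
import torch
import torch.nn as nn

class GlobalLocalFusion(nn.Module):
    """Hypothetical sketch: concatenate a ViT-style global token summary
    with CNN-style pooled local features before classification."""
    def __init__(self, num_classes=38, embed_dim=768, cnn_dim=512):
        super().__init__()
        # stand-in CNN branch (a real setup would use a pretrained backbone)
        self.cnn = nn.Sequential(
            nn.Conv2d(3, cnn_dim, kernel_size=7, stride=4, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # stand-in ViT branch: 16x16 patch embedding + one self-attention block
        self.patch = nn.Conv2d(3, embed_dim, kernel_size=16, stride=16)
        self.attn = nn.MultiheadAttention(embed_dim, num_heads=8, batch_first=True)
        self.cls = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.head = nn.Linear(embed_dim + cnn_dim, num_classes)

    def forward(self, x):
        local = self.cnn(x).flatten(1)                     # (B, cnn_dim) local features
        tokens = self.patch(x).flatten(2).transpose(1, 2)  # (B, N, embed_dim) patch tokens
        tokens = torch.cat([self.cls.expand(x.size(0), -1, -1), tokens], dim=1)
        tokens, _ = self.attn(tokens, tokens, tokens)
        global_feat = tokens[:, 0]                         # class-token global summary
        return self.head(torch.cat([global_feat, local], dim=1))
```

For a 224x224 input, `GlobalLocalFusion()(torch.randn(2, 3, 224, 224))` yields logits of shape `(2, 38)`.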
- PV Dataset: spMohanty Github (you can group all images into a single folder to directly use the csv file provided in this repo)
- PlantDoc dataset: Kaggle
- The IPM and Bing datasets will be released soon
- Download the ViT pretrained weights: link (from the rwightman timm GitHub repo)
PlantAIM (2H) >> PyTorch implementation code
PlantAIM (1H) >> PyTorch implementation code
Notes
- The csv file (image metadata) is here
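As a hedged sketch, the metadata csv could be wired into a PyTorch `Dataset` along these lines. The column names `image_path` and `label` are hypothetical, not confirmed by this repo's csv: check the actual header and adjust before use.

```python
import os
import pandas as pd
from PIL import Image
from torch.utils.data import Dataset

class PlantDiseaseCSV(Dataset):
    """Minimal csv-driven dataset; the `image_path` and `label`
    column names are assumptions, not the repo's actual schema."""
    def __init__(self, csv_file, image_root, transform=None):
        self.df = pd.read_csv(csv_file)
        self.image_root = image_root
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        # resolve the image path relative to the grouped image folder
        image = Image.open(os.path.join(self.image_root, row["image_path"])).convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        return image, int(row["label"])
```

An instance can then be passed straight to a `torch.utils.data.DataLoader` for training.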
Python 3.12.9

```shell
python -m venv py
cd .\py\Scripts
activate
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements_plantaim.txt
```
Creative Commons Attribution-Noncommercial-NoDerivative Works 4.0 International License (“the CC BY-NC-ND License”)
- Pairwise Feature Learning for Unseen Plant Disease Recognition [Paper]
  The first implementation of the FF-ViT model with a moving weighted sum. The current work improves and evaluates the FF-ViT model on a larger-scale dataset.
- Unveiling Robust Feature Spaces: Image vs. Embedding-Oriented Approaches for Plant Disease Identification [Paper]
  An analysis of image versus embedding feature spaces for plant disease identification.
- PlantAIM: A New Baseline Model Integrating Global Attention and Local Features for Enhanced Plant Disease Identification [Paper] [Github]
  The Plant Disease Global-Local Features Fusion Attention model (PlantAIM), which combines ViT and CNN components to enhance feature extraction for multi-crop plant disease identification.
- Beyond Supervision: Harnessing Self-Supervised Learning in Unseen Plant Disease Recognition [Paper] [Github]
  The Cross Learning Vision Transformer (CL-ViT) model, which incorporates self-supervised learning into a supervised model.
- Can Language Improve Visual Features For Distinguishing Unseen Plant Diseases? [Paper] [Github]
  The FF-CLIP model, which incorporates textual data as language cues to guide visual features and improve the identification of unseen plant diseases.
- Deep-Plant-Disease Dataset Is All You Need for Plant Disease Identification [Paper] [Github]
  We curated the largest plant disease dataset with text descriptions, known as Deep-Plant-Disease, comprising 248,578 images across 55 crop species, 175 disease classes, and 333 unique crop-disease compositions. We also conducted comprehensive benchmarking across multiple downstream tasks in plant disease identification under diverse conditions that simulate real-world challenges.
@article{chai2025plantaim,
title={PlantAIM: A New Baseline Model Integrating Global Attention and Local Features for Enhanced Plant Disease Identification},
author={Chai, Abel Yu Hao and Lee, Sue Han and Tay, Fei Siang and Go{\"e}au, Herv{\'e} and Bonnet, Pierre and Joly, Alexis},
journal={Smart Agricultural Technology},
pages={100813},
year={2025},
publisher={Elsevier}
}



