Multimodal Fusion for Cow Behavior Prediction

This is a research-backed project exploring multimodal sensor fusion for precise and robust dairy cow behavior recognition using real-world barn data.
Developed at IIT Ropar, this work combines video, sensor, and environmental signals using deep learning to classify behaviors such as lying, standing, feeding, and more.

This project was published at the 3rd International Conference on Agriculture-Centric Computation 2025, held in Guwahati, Assam, India (Paper ID: 93).

Overview

Traditional livestock monitoring methods (manual observation, single-sensor devices) are often inaccurate and labor-intensive, and they lack temporal or spatial resolution.

In this project, we propose two deep learning architectures that fuse data from multiple sources:

  • RGB multi-view barn videos
  • UWB location tracking
  • IMMU motion data
  • Ankle posture and head direction sensors
  • Environmental THI readings

This fusion improves behavior detection across varied lighting conditions, occlusions, and different animals. It lays the groundwork for real-time livestock monitoring systems in commercial dairy farms.

Dataset – MMCows

We use the MMCows dataset, a large-scale multimodal cow behavior dataset collected from a functioning dairy barn.

GitHub Page: Click Here

Official Project Page: https://engineering.purdue.edu/neis/research/projects/mmcows.html

Dataset on Hugging Face: MMCows @ Hugging Face

Dataset Highlights:

  • 16 dairy cows tagged with sensors for 24/7 monitoring
  • Multi-sensor streams: UWB, IMMU, pressure, ankle sensors, vaginal temperature, and RGB video
  • 213,000+ labeled image bounding boxes across 20,000 images
  • 7 behavior classes annotated per second: Walking, Standing, Lying, Feeding (Up/Down), Licking, Drinking
  • Multi-angle camera setup (GoPro HERO11, 4.5K resolution)
  • Synchronized timestamps for fusion
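
As an illustration of how the synchronized timestamps enable fusion, the sketch below aligns a hypothetical UWB stream to video frame timestamps with a nearest-timestamp join. The column names and values are invented for the example and do not reflect the dataset's actual schema.

# Illustrative sketch: align a sensor stream to video frame timestamps.
# "timestamp", "frame_id", and "uwb_x" are hypothetical column names.
import pandas as pd

frames = pd.DataFrame({"timestamp": pd.to_datetime(["2023-07-21 10:00:00",
                                                    "2023-07-21 10:00:01"]),
                       "frame_id": [0, 1]})
uwb = pd.DataFrame({"timestamp": pd.to_datetime(["2023-07-21 10:00:00.4",
                                                 "2023-07-21 10:00:01.1"]),
                    "uwb_x": [3.2, 3.4]})

# Nearest-timestamp join: each video frame picks the closest UWB reading.
aligned = pd.merge_asof(frames.sort_values("timestamp"),
                        uwb.sort_values("timestamp"),
                        on="timestamp", direction="nearest")
print(aligned)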

Methods

We developed and compared two multimodal deep learning models:

1. Fusion 1 – EfficientNet + DNN + Attention

  • EfficientNet-B0 extracts visual features
  • Sensor data is processed through a deep neural network
  • Attention mechanism fuses both streams
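
A minimal PyTorch sketch of this architecture is shown below. It assumes a recent torchvision, a 7-class output head, and one plausible form of attention fusion (a learned softmax weight per modality embedding); the layer sizes, sensor dimension, and exact attention formulation are illustrative assumptions, not the published configuration.

# Hypothetical sketch of Fusion 1: EfficientNet-B0 image branch + sensor DNN + attention fusion.
import torch
import torch.nn as nn
from torchvision.models import efficientnet_b0  # requires a recent torchvision

class Fusion1(nn.Module):
    def __init__(self, sensor_dim=16, embed_dim=256, num_classes=7):
        super().__init__()
        # Visual branch: EfficientNet-B0 backbone; replacing the classifier with
        # Identity yields the pooled 1280-d feature vector.
        self.backbone = efficientnet_b0(weights=None)
        self.backbone.classifier = nn.Identity()
        self.img_proj = nn.Linear(1280, embed_dim)
        # Sensor branch: small DNN over the concatenated sensor vector (dimension assumed).
        self.sensor_net = nn.Sequential(
            nn.Linear(sensor_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim), nn.ReLU(),
        )
        # Attention fusion: one learned scalar weight per modality embedding.
        self.attn = nn.Linear(embed_dim, 1)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, image, sensors):
        img = self.img_proj(self.backbone(image))          # (B, embed_dim)
        sen = self.sensor_net(sensors)                      # (B, embed_dim)
        stack = torch.stack([img, sen], dim=1)              # (B, 2, embed_dim)
        weights = torch.softmax(self.attn(stack), dim=1)    # (B, 2, 1)
        fused = (weights * stack).sum(dim=1)                 # (B, embed_dim)
        return self.head(fused)

# Example forward pass with dummy inputs.
logits = Fusion1()(torch.randn(2, 3, 224, 224), torch.randn(2, 16))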

2. Fusion 2 – Vision Transformer + Sensor Token (Best Performing)

  • Sensor data embedded as a token alongside image patches
  • ViT encoder learns global dependencies across modalities
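
A minimal PyTorch sketch of the sensor-token idea follows. The patch size, depth, embedding dimension, and the use of a CLS token for classification are illustrative assumptions rather than the exact published configuration.

# Hypothetical sketch of Fusion 2: ViT-style encoder with the sensor vector embedded
# as an extra token alongside the image patch tokens.
import torch
import torch.nn as nn

class Fusion2(nn.Module):
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8,
                 sensor_dim=16, num_classes=7):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.sensor_embed = nn.Linear(sensor_dim, dim)           # sensor reading -> one token
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 2, dim))  # cls + patches + sensor
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, image, sensors):
        b = image.size(0)
        patches = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, N, dim)
        tokens = torch.cat([self.cls.expand(b, -1, -1),
                            patches,
                            self.sensor_embed(sensors).unsqueeze(1)], dim=1)
        x = self.encoder(tokens + self.pos)   # global attention across patches and sensor token
        return self.head(x[:, 0])             # classify from the CLS token

# Example forward pass with dummy inputs.
logits = Fusion2()(torch.randn(2, 3, 224, 224), torch.randn(2, 16))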

Split Strategies

  • Object-wise Split: Generalization across different cows
  • Temporal Split: Generalization across environmental and lighting variations
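
The sketch below illustrates the two strategies on a hypothetical per-sample metadata table with cow_id and timestamp columns; the repository's actual split logic may differ.

# Illustrative sketch of the two split strategies (column names are assumptions).
import pandas as pd

def object_wise_split(df: pd.DataFrame, held_out_cows: set):
    """Hold out whole animals so the model must generalize to unseen cows."""
    mask = df["cow_id"].isin(held_out_cows)
    return df[~mask], df[mask]          # train, test

def temporal_split(df: pd.DataFrame, cutoff):
    """Hold out a later time range so the model must generalize across
    lighting and environmental changes."""
    mask = df["timestamp"] >= cutoff
    return df[~mask], df[mask]          # train, test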

Behavior Labels

Behavior         Description
Walking          Locomotion across barn
Standing         Stationary upright posture
Lying            Resting or sleeping state
Feeding ↑ / ↓    Eating behavior with head angle
Licking          Tongue contact with objects
Drinking         Water intake behavior
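
A label-to-index mapping such as the following can be used when training the classifiers; the ordering here is purely an assumption and not the repository's actual encoding.

# Hypothetical label-to-index mapping for the 7 annotated behavior classes.
BEHAVIOR_CLASSES = {
    "walking": 0,
    "standing": 1,
    "lying": 2,
    "feeding_up": 3,
    "feeding_down": 4,
    "licking": 5,
    "drinking": 6,
}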

Key Results

Model               Average F1 Score    Best Behavior Accuracy
UWB only            0.717               Lying (0.961)
RGB (multi-view)    0.632               Lying (0.883)
Fusion 1 (EffNet)   0.810               Lying (0.998)
Fusion 2 (ViT)      0.836               Lying (1.000), Standing (0.972)

See experiments/ for full results and ablation studies.

Project Structure

cow-behavior-fusion/
├── data/ # Dataset EDA and setup
├── models/ # Model architectures (EfficientNet, ViT)
├── preprocessing/ # Image & sensor processing scripts
├── experiments/ # Evaluation, modality ablations
├── modules/ # Cow detection & classification
├── scripts/ # Training, evaluation, configs 
└── README.md

Setup

1. Clone the Repository

git clone https://github.com/your-username/cow-behavior-fusion.git
cd cow-behavior-fusion

2. Install Dependencies

Make sure you have Python 3.8+ installed, along with compatible versions of torch, numpy, opencv-python, and transformers. A CUDA-enabled GPU is recommended for training and faster inference.

3. Download the data

Download the dataset from Hugging Face and place it in a suitable location.
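
One possible way to fetch the data programmatically is via the huggingface_hub client, as sketched below. The repository ID is a placeholder that must be replaced with the actual MMCows dataset ID, and local_dir should point wherever the scripts expect the data.

# Sketch: download the MMCows dataset snapshot from Hugging Face.
# NOTE: "<mmcows-dataset-repo-id>" is a placeholder; substitute the real dataset ID.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="<mmcows-dataset-repo-id>",
    repo_type="dataset",
    local_dir="data/mmcows",
)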

4. Running the Code

Train Model

python scripts/train.py --model fusion2 --split temporal

Evaluate Model

python scripts/evaluate.py --weights saved_models/fusion2_best.pt

Preprocess Data

python preprocessing/srgb_proc.py

Citation

If you use this work in your research or development, please cite it.

Contributors

This project was developed by students and faculty at Indian Institute of Technology Ropar (IIT Ropar):

  1. Ajeet Kumar, Department of CSE, IIT Ropar
  2. Abhinav Upadhyay, Department of CSE, IIT Ropar
  3. Varun Kukreti, Department of CSE, IIT Ropar
  4. Vajja Yashaswini, Department of CSE, IIT Ropar
  5. Dr. Neeraj Goel, Professor, Department of CSE, IIT Ropar
  6. Dr. Mukesh Saini, Professor, Department of CSE, IIT Ropar

Acknowledgements

We would like to thank:

  1. The authors of the original MMCows dataset, which formed the foundation of this research.
  2. IIT Ropar for its continuous guidance, support, and research infrastructure.
  3. The open-source community for tools such as PyTorch, Transformers, and OpenCV, which made this project possible.

Contact

For questions, issues, or collaborations, please feel free to reach out to the contributors.
