This is a research-backed project exploring multimodal sensor fusion for precise and robust dairy cow behavior recognition using real-world barn data.
Developed at IIT Ropar, this work combines video, sensor, and environmental signals using deep learning to classify behaviors such as lying, standing, feeding, and more.
This project was published at the 3rd International Conference on Agriculture-Centric Computation 2025, held in Guwahati, Assam, India (Paper ID: 93).
Traditional livestock monitoring methods (manual observation, single-sensor devices) are often inaccurate and labor-intensive, and they lack temporal or spatial resolution.
In this project, we propose two deep learning architectures that fuse data from multiple sources:
- RGB multi-view barn videos
- UWB location tracking
- IMMU motion data
- Ankle posture and head direction sensors
- Environmental temperature-humidity index (THI) readings
This fusion improves behavior detection across varied lighting conditions, occlusions, and different animals, and it lays the groundwork for real-time livestock monitoring systems in commercial dairy farms.
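To make the fused inputs concrete, here is a minimal sketch of how one synchronized sample could be represented in code; the field names, shapes, and units are illustrative assumptions, not the project's actual data schema.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultimodalSample:
    """One synchronized, per-second sample (illustrative fields/shapes only)."""
    rgb_views: np.ndarray      # (num_views, H, W, 3) multi-view barn frames
    uwb_xyz: np.ndarray        # (3,) UWB tag location in barn coordinates
    immu: np.ndarray           # (9,) accelerometer + gyroscope + magnetometer
    ankle_pressure: float      # ankle sensor reading (posture proxy)
    head_direction: float      # head direction angle in degrees
    thi: float                 # environmental temperature-humidity index
    label: int                 # one of the 7 behavior classes

# Example: build a dummy sample to sanity-check downstream code
sample = MultimodalSample(
    rgb_views=np.zeros((4, 224, 224, 3), dtype=np.uint8),
    uwb_xyz=np.zeros(3), immu=np.zeros(9),
    ankle_pressure=0.0, head_direction=0.0, thi=72.0, label=2,
)
```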
We use the MMCows dataset, a large-scale multimodal cow behavior dataset collected from a functioning dairy barn.
GitHub Page: Click Here
Official Project Page: https://engineering.purdue.edu/neis/research/projects/mmcows.html
Dataset on Hugging Face: MMCows @ Hugging Face
- 16 dairy cows tagged with sensors for 24/7 monitoring
- Multi-sensor streams: UWB, IMMU, pressure, ankle sensors, vaginal temperature, and RGB video
- 213,000+ labeled image bounding boxes across 20,000 images
- 7 behavior classes annotated per second: Walking, Standing, Lying, Feeding (head up), Feeding (head down), Licking, Drinking
- Multi-angle camera setup (GoPro HERO11, 4.5K resolution)
- Synchronized timestamps across modalities for fusion (see the alignment sketch below)
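As an illustration of how the synchronized timestamps can be used, the snippet below aligns a sensor stream to per-second annotations with a nearest-timestamp join; the file names and column names are assumptions for demonstration only.

```python
import pandas as pd

# Hypothetical files/columns: both streams carry a "timestamp" column (assumption).
labels = pd.read_csv("labels_per_second.csv", parse_dates=["timestamp"])
uwb = pd.read_csv("uwb_locations.csv", parse_dates=["timestamp"])

# Align each labeled second with the nearest UWB reading within 1 second.
aligned = pd.merge_asof(
    labels.sort_values("timestamp"),
    uwb.sort_values("timestamp"),
    on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("1s"),
)
print(aligned.head())
```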
We developed and compared two multimodal deep learning models (a minimal fusion sketch follows this list):
Fusion 1 (EfficientNet + attention):
- EfficientNet-B0 extracts visual features
- Sensor data is processed through a deep neural network
- An attention mechanism fuses both streams
Fusion 2 (ViT):
- Sensor data is embedded as a token alongside image patches
- A ViT encoder learns global dependencies across modalities
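Below is a minimal sketch of the token-fusion idea behind Fusion 2, written with standard PyTorch modules; the dimensions, depth, and module names are illustrative assumptions, not the project's actual implementation.

```python
import torch
import torch.nn as nn

class SensorTokenViT(nn.Module):
    """Illustrative ViT-style fusion: the sensor vector becomes one extra token."""
    def __init__(self, img_size=224, patch=16, dim=256, sensor_dim=16,
                 depth=4, heads=8, num_classes=7):
        super().__init__()
        num_patches = (img_size // patch) ** 2
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.sensor_embed = nn.Linear(sensor_dim, dim)   # sensor vector -> token
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 2, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, image, sensors):
        b = image.size(0)
        patches = self.patch_embed(image).flatten(2).transpose(1, 2)  # (B, N, dim)
        sensor_tok = self.sensor_embed(sensors).unsqueeze(1)          # (B, 1, dim)
        cls = self.cls_token.expand(b, -1, -1)
        tokens = torch.cat([cls, sensor_tok, patches], dim=1) + self.pos_embed
        encoded = self.encoder(tokens)                                # global attention
        return self.head(encoded[:, 0])                               # classify from CLS

# Example forward pass with dummy inputs
model = SensorTokenViT()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 16))
print(logits.shape)  # torch.Size([2, 7])
```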
Both models are evaluated under two dataset splits (an illustrative split sketch follows this list):
- Object-wise Split: generalization across different cows
- Temporal Split: generalization across environmental and lighting variations
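One way to realize an object-wise (leave-cows-out) split is to group samples by animal identity; the dataframe and column names below are assumptions for demonstration.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical per-second annotation table with a "cow_id" column (assumption).
df = pd.read_csv("annotations.csv")

# Object-wise split: all samples from a given cow land on exactly one side,
# so the test set contains cows never seen during training.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.25, random_state=42)
train_idx, test_idx = next(splitter.split(df, groups=df["cow_id"]))
train_df, test_df = df.iloc[train_idx], df.iloc[test_idx]

print(sorted(train_df["cow_id"].unique()), sorted(test_df["cow_id"].unique()))
```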
Behavior | Description |
---|---|
Walking | Locomotion across barn |
Standing | Stationary upright posture |
Lying | Resting or sleeping state |
Feeding ↑ / ↓ | Eating behavior with head angle |
Licking | Tongue contact with objects |
Drinking | Water intake behavior |
Model | Average F1 Score | Best Behavior Accuracy |
---|---|---|
UWB only | 0.717 | Lying (0.961) |
RGB (multi-view) | 0.632 | Lying (0.883) |
Fusion 1 (EffNet) | 0.810 | Lying (0.998) |
Fusion 2 (ViT) | 0.836 | Lying (1.000), Standing (0.972) |
See experiments/ for full results and ablation studies.
cow-behavior-fusion/
├── data/ # Dataset EDA and setup
├── models/ # Model architectures (EfficientNet, ViT)
├── preprocessing/ # Image & sensor processing scripts
├── experiments/ # Evaluation, modality ablations
├── modules/ # Cow detection & classification
├── scripts/ # Training, evaluation, configs
└── README.md
git clone https://github.com/your-username/cow-behavior-fusion.git
cd cow-behavior-fusion
Make sure you have Python 3.8+ and install the required packages, including compatible versions of torch, numpy, opencv-python, and transformers. A CUDA-enabled GPU is recommended for training and faster inference.
Download the dataset from Hugging Face and place it in a suitable location.
python scripts/train.py --model fusion2 --split temporal
python scripts/evaluate.py --weights saved_models/fusion2_best.pt
python preprocessing/srgb_proc.py
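The preprocessing step is expected to produce model-ready frames; below is a minimal sketch of that kind of transform using OpenCV (not the actual contents of srgb_proc.py, and the example path is hypothetical).

```python
import cv2
import numpy as np

def preprocess_frame(path, size=224):
    """Illustrative frame preprocessing: convert to RGB, resize, normalize to [0, 1]."""
    bgr = cv2.imread(path)                      # OpenCV loads images as BGR
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    resized = cv2.resize(rgb, (size, size), interpolation=cv2.INTER_AREA)
    return resized.astype(np.float32) / 255.0   # (size, size, 3) float image

# Example (hypothetical path):
# frame = preprocess_frame("data/images/cam1/000001.jpg")
```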
If you use this work in your research or development, please cite it.
This project was developed by students and faculty at Indian Institute of Technology Ropar (IIT Ropar):
- Ajeet Kumar, Department of CSE, IIT Ropar
- Abhinav Upadhyay, Department of CSE, IIT Ropar
- Varun Kukreti, Department of CSE, IIT Ropar
- Vajja Yashaswini, Department of CSE, IIT Ropar
- Dr. Neeraj Goel, Professor, Department of CSE, IIT Ropar
- Dr. Mukesh Saini, Professor, Department of CSE, IIT Ropar
We would like to thank:
- The authors of the original MMCows dataset, which formed the foundation of this research.
- IIT Ropar for its continuous guidance, support, and research infrastructure.
- The open-source community for tools such as PyTorch, Transformers, and OpenCV, which made this project possible.
For questions, issues, or collaborations, feel free to reach out to the contributors.