Bajie202401/Multimodal-Depression-CN


Multimodal Depression Detection Framework: Cross-Domain Generalization & Clinical Deployment

License: MIT · Python · PyTorch

This repository contains the official implementation of the paper "A Multimodal Depression Detection Framework Based on Large-Scale Pre-trained Models: Cross-Domain Generalization and Clinical Deployment Strategies" (submitted to Journal of Medical Systems).

We provide a robust, reproducible framework for detecting depression from speech and text using Wav2Vec 2.0 and Chinese RoBERTa, specifically designed to handle cross-domain shifts in heterogeneous clinical environments.

🌟 Key Features

  • Dual-Stream Architecture: Late fusion of acoustic (Wav2Vec 2.0) and semantic (RoBERTa) features to capture complementary clinical cues.
  • Cross-Domain Adaptation: A complete pipeline for Zero-shot Evaluation and Few-shot Fine-tuning, addressing the "generalization gap" in multi-center deployment.
  • Trustworthy AI: Integrated post-hoc calibration (temperature scaling) to ensure reliable risk probability outputs by minimizing Expected Calibration Error (ECE).
  • Reproducibility: Standardized data processing and training protocols with fixed seeds.
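To make the dual-stream design concrete, here is a minimal late-fusion sketch in PyTorch. It is illustrative only: the `DualStreamFusion` name, the 768-dimensional pooled embeddings, and the projection sizes are assumptions, not the repository's exact architecture.

```python
import torch
import torch.nn as nn

class DualStreamFusion(nn.Module):
    """Late fusion: each modality is encoded separately, then combined for classification."""

    def __init__(self, acoustic_dim=768, text_dim=768, hidden_dim=256, num_classes=2):
        super().__init__()
        # Per-modality projections into a shared space (the two "streams").
        self.acoustic_proj = nn.Sequential(nn.Linear(acoustic_dim, hidden_dim), nn.ReLU())
        self.text_proj = nn.Sequential(nn.Linear(text_dim, hidden_dim), nn.ReLU())
        # Fusion happens late: concatenation just before the classifier.
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, acoustic_emb, text_emb):
        fused = torch.cat([self.acoustic_proj(acoustic_emb), self.text_proj(text_emb)], dim=-1)
        return self.classifier(fused)

# Pooled utterance-level embeddings (e.g., mean-pooled Wav2Vec 2.0 / RoBERTa outputs).
model = DualStreamFusion()
logits = model(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```

Keeping the streams separate until the final concatenation lets each pre-trained encoder be swapped or fine-tuned independently.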

🛠️ Installation

  1. Clone the repository

    git clone https://github.com/yourusername/multimodal-depression-detection.git
    cd multimodal-depression-detection
  2. Create a virtual environment (Recommended)

    conda create -n depression-detect python=3.10
    conda activate depression-detect
  3. Install dependencies

    pip install -r requirements.txt

📂 Data Preparation

Note on Privacy: Due to strict medical data privacy regulations, the original clinical datasets (CMDC, EATD, PDCH) cannot be publicly released. Researchers should prepare their own datasets following the format below.

Expected Data Format (CSV Manifest)

Prepare a .csv file with the following columns:

| Column       | Description                      | Example                      |
| ------------ | -------------------------------- | ---------------------------- |
| `audio_path` | Path to the `.wav` file          | `data/audio/subject_001.wav` |
| `text`       | Transcribed text                 | "我最近睡眠质量很差..." ("My sleep quality has been very poor lately...") |
| `label`      | `0` (Control) or `1` (Depressed) | `1`                          |
| `split`      | One of `train` / `val` / `test`  | `train`                      |

A sample manifest is provided in data/sample_manifest.csv.
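Before launching training, it can save time to sanity-check a manifest. Below is a dependency-free validator sketch: the required column names come from the table above, but the helper name `validate_manifest` and the exact checks are illustrative, not part of the repository.

```python
import csv
import os
import tempfile

REQUIRED_COLUMNS = {"audio_path", "text", "label", "split"}
VALID_LABELS = {"0", "1"}
VALID_SPLITS = {"train", "val", "test"}

def validate_manifest(path):
    """Return a list of human-readable problems found in a manifest CSV."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            return [f"missing columns: {sorted(missing)}"]
        for i, row in enumerate(reader, start=2):  # line 1 is the header
            if row["label"] not in VALID_LABELS:
                problems.append(f"line {i}: bad label {row['label']!r}")
            if row["split"] not in VALID_SPLITS:
                problems.append(f"line {i}: bad split {row['split']!r}")
            if not row["audio_path"].endswith(".wav"):
                problems.append(f"line {i}: audio_path is not a .wav file")
    return problems

# Demo: one valid row and one row with three problems.
sample = (
    "audio_path,text,label,split\n"
    "data/audio/subject_001.wav,I have been sleeping very badly lately.,1,train\n"
    "data/audio/subject_002.mp3,Feeling fine.,2,dev\n"
)
path = os.path.join(tempfile.gettempdir(), "manifest_check_demo.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)
problems = validate_manifest(path)
print(len(problems))  # 3: bad label, bad split, non-.wav audio_path
os.remove(path)
```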

🚀 Usage

1. In-Domain Training (Baseline)

Train the multimodal model on your source dataset (e.g., CMDC-like data).

python train.py --config configs/config.yaml

2. Inference & Zero-shot Evaluation

Evaluate a trained model on a target dataset.

python infer.py \
  --config configs/config.yaml \
  --checkpoint_path outputs/best_model.ckpt \
  --output_csv outputs/predictions.csv

3. Cross-Domain Adaptation (Few-shot)

Fine-tune a pre-trained model on a small amount of target-domain data (e.g., PDCH) to mitigate distribution shift.

python fine_tune_on_pdch.py \
  --base_config configs/config.yaml \
  --target_manifest data/target_manifest.csv \
  --pretrained_ckpt outputs/source_best.ckpt
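A common few-shot recipe (an assumption here, not necessarily this script's exact behavior) is to freeze the large pre-trained encoders and update only the fusion/classification head, which limits overfitting on the small target set. A sketch of the freezing step, with illustrative parameter names:

```python
import torch.nn as nn

def freeze_for_few_shot(model, trainable_keywords=("classifier", "fusion")):
    """Freeze every parameter except those whose names contain a trainable keyword."""
    for name, param in model.named_parameters():
        param.requires_grad = any(k in name for k in trainable_keywords)
    return [n for n, p in model.named_parameters() if p.requires_grad]

# Toy stand-in for a large dual-encoder model: only the head stays trainable.
model = nn.ModuleDict({
    "encoder": nn.Linear(16, 16),      # stands in for Wav2Vec 2.0 / RoBERTa
    "classifier": nn.Linear(16, 2),    # the small task head
})
print(freeze_for_few_shot(model))  # ['classifier.weight', 'classifier.bias']
```

After freezing, pass only `filter(lambda p: p.requires_grad, model.parameters())` to the optimizer, typically with a small learning rate.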

4. Post-hoc Calibration

Apply temperature scaling to calibrate prediction probabilities.

python kfold_posthoc_calibration.py --kfold_root outputs/kfold_results
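Temperature scaling fits a single scalar T > 0 on held-out logits, replacing softmax(z) with softmax(z / T); the predicted class never changes, only the confidence. A minimal NumPy sketch using a grid search over T (the repository script may instead optimize T by gradient descent; this simplification is an assumption for illustration):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # stabilize against overflow
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def nll(logits, labels, T):
    """Negative log-likelihood of the true labels under temperature T."""
    probs = softmax(logits / T)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Pick the temperature minimizing validation NLL (simple grid search)."""
    return min(grid, key=lambda T: nll(logits, labels, T))

# Synthetic overconfident logits: large magnitude, weak true-class signal.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
logits = rng.normal(size=(200, 2)) * 8           # large magnitude -> near-1 confidences
logits[np.arange(200), labels] += 2.0            # modest real signal
T = fit_temperature(logits, labels)
print(T > 1.0)  # True: T > 1 softens an overconfident model
```

Crucially, T is fitted on a held-out (validation) split, never on the test set, so the calibrated probabilities remain an honest estimate of risk.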

📊 Results Summary

| Scenario        | Dataset          | Metric   | Score | Note                |
| --------------- | ---------------- | -------- | ----- | ------------------- |
| Baseline        | CMDC (in-domain) | F1-Macro | 98.7% | High robustness     |
| Direct transfer | PDCH (zero-shot) | F1-Macro | 26.0% | Catastrophic drop   |
| Adaptation      | PDCH (few-shot)  | F1-Macro | 70.5% | +44.5 pts. recovery |

See the paper for full experimental details.

📜 Citation

If you find this code useful, please cite our paper:

@article{meng2025multimodal,
  title={A Multimodal Depression Detection Framework Based on Large-Scale Pre-trained Models: Cross-Domain Generalization and Clinical Deployment Strategies},
  author={Meng, Yu and Wang, Yihe and Ouyang, Zhiyuan and others},
  journal={Journal of Medical Systems},
  year={2025}
}

🤝 Acknowledgement

We thank the authors of Wav2Vec 2.0 and Transformers for their open-source contributions.
