This repository contains the official implementation of the paper "A Multimodal Depression Detection Framework Based on Large-Scale Pre-trained Models: Cross-Domain Generalization and Clinical Deployment Strategies" (submitted to Journal of Medical Systems).
We provide a robust, reproducible framework for detecting depression from speech and text using Wav2Vec 2.0 and Chinese RoBERTa, specifically designed to handle cross-domain shifts in heterogeneous clinical environments.
- Dual-Stream Architecture: Late fusion of acoustic (Wav2Vec 2.0) and semantic (RoBERTa) features to capture complementary clinical cues.
- Cross-Domain Adaptation: A complete pipeline for Zero-shot Evaluation and Few-shot Fine-tuning, addressing the "generalization gap" in multi-center deployment.
- Trustworthy AI: Integrated post-hoc calibration (temperature scaling) to produce reliable risk-probability outputs by minimizing Expected Calibration Error (ECE).
- Reproducibility: Standardized data processing and training protocols with fixed seeds.
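The dual-stream design can be sketched as a concatenation-based late-fusion head over pooled encoder outputs. This is an illustrative sketch, not the repo's exact model: the 768-dim inputs match base-size Wav2Vec 2.0 and Chinese RoBERTa pooled features, and the hidden size and dropout are assumptions.

```python
import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    """Fuses pooled acoustic and semantic embeddings for binary classification.

    Dimensions are illustrative: base Wav2Vec 2.0 and Chinese RoBERTa both
    emit 768-dim pooled features; the paper's actual head may differ.
    """

    def __init__(self, audio_dim: int = 768, text_dim: int = 768, hidden: int = 256):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(audio_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden, 2),  # 0 = control, 1 = depressed
        )

    def forward(self, audio_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # Late fusion: concatenate the two modality embeddings, then classify.
        fused = torch.cat([audio_emb, text_emb], dim=-1)
        return self.classifier(fused)

# Random tensors stand in for real Wav2Vec 2.0 / RoBERTa pooled outputs.
head = LateFusionHead()
logits = head(torch.randn(4, 768), torch.randn(4, 768))
print(logits.shape)  # torch.Size([4, 2])
```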
1. Clone the repository

```bash
git clone https://github.com/yourusername/multimodal-depression-detection.git
cd multimodal-depression-detection
```

2. Create a virtual environment (recommended)

```bash
conda create -n depression-detect python=3.10
conda activate depression-detect
```

3. Install dependencies

```bash
pip install -r requirements.txt
```
Note on Privacy: Due to strict medical data privacy regulations, the original clinical datasets (CMDC, EATD, PDCH) cannot be publicly released. Researchers should prepare their own datasets following the format below.
Prepare a .csv file with the following columns:
| Column | Description | Example |
|---|---|---|
| `audio_path` | Path to the `.wav` file | `data/audio/subject_001.wav` |
| `text` | Transcribed text | "我最近睡眠质量很差..." ("My sleep quality has been very poor lately...") |
| `label` | `0` (Control) or `1` (Depressed) | `1` |
| `split` | `train` / `val` / `test` | `train` |
A sample manifest is provided in data/sample_manifest.csv.
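Before training, it can help to sanity-check a manifest against the schema above. The helper below is an illustrative sketch, not part of the repo's CLI:

```python
import csv
import io

REQUIRED = {"audio_path", "text", "label", "split"}

def validate_manifest(f) -> list[dict]:
    """Checks that a manifest CSV has the required columns and sane values."""
    rows = list(csv.DictReader(f))
    missing = REQUIRED - set(rows[0].keys())
    if missing:
        raise ValueError(f"manifest missing columns: {sorted(missing)}")
    for i, row in enumerate(rows):
        if row["label"] not in {"0", "1"}:
            raise ValueError(f"row {i}: label must be 0 or 1, got {row['label']!r}")
        if row["split"] not in {"train", "val", "test"}:
            raise ValueError(f"row {i}: unknown split {row['split']!r}")
    return rows

# In-memory CSV mirroring the schema of data/sample_manifest.csv.
sample = io.StringIO(
    "audio_path,text,label,split\n"
    "data/audio/subject_001.wav,我最近睡眠质量很差...,1,train\n"
)
rows = validate_manifest(sample)
print(len(rows))  # 1
```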
Train the multimodal model on your source dataset (e.g., CMDC-like data):

```bash
python train.py --config configs/config.yaml
```

Evaluate a trained model on a target dataset:
```bash
python infer.py \
    --config configs/config.yaml \
    --checkpoint_path outputs/best_model.ckpt \
    --output_csv outputs/predictions.csv
```

Fine-tune a pre-trained model on a small target dataset (e.g., PDCH) to mitigate distribution shift:
```bash
python fine_tune_on_pdch.py \
    --base_config configs/config.yaml \
    --target_manifest data/target_manifest.csv \
    --pretrained_ckpt outputs/source_best.ckpt
```

Apply temperature scaling to calibrate prediction probabilities:
```bash
python kfold_posthoc_calibration.py --kfold_root outputs/kfold_results
```

| Scenario | Dataset | Metric | Score | Note |
|---|---|---|---|---|
| Baseline | CMDC (In-domain) | F1-Macro | 98.7% | High robustness |
| Direct Transfer | PDCH (Zero-shot) | F1-Macro | 26.0% | Catastrophic drop |
| Adaptation | PDCH (Few-shot) | F1-Macro | 70.5% | +44.5 pts recovered |
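Temperature scaling divides the logits by a scalar T fitted on held-out data before the softmax. The NumPy sketch below shows the idea with a simple grid search and an ECE check; it is a minimal illustration, and `kfold_posthoc_calibration.py` may fit T differently (the synthetic logits and grid are assumptions):

```python
import numpy as np

def temperature_scale(logits: np.ndarray, T: float) -> np.ndarray:
    """Softmax over logits / T; T > 1 softens overconfident probabilities."""
    z = logits / T
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

def ece(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    """Expected Calibration Error over equal-width confidence bins."""
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            acc = (pred[mask] == labels[mask]).mean()
            total += mask.mean() * abs(acc - conf[mask].mean())
    return total

# Synthetic, deliberately overconfident validation logits.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 2)) * 4
labels = rng.integers(0, 2, size=200)

def nll(T: float) -> float:
    p = temperature_scale(logits, T)
    return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()

# Fit T by minimizing validation NLL over a small grid.
best_T = min(np.linspace(0.5, 5.0, 46), key=nll)
raw_ece = ece(temperature_scale(logits, 1.0), labels)
cal_ece = ece(temperature_scale(logits, best_T), labels)
```

With these overconfident synthetic logits the fitted temperature comes out above 1 and the calibrated ECE drops below the uncalibrated one, which is the behavior the calibration step is designed to produce.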
See the paper for full experimental details.
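F1-Macro, the metric reported above, is the unweighted mean of the per-class F1 scores. A self-contained sketch (illustrative, not the repo's evaluation code):

```python
import numpy as np

def f1_macro(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Unweighted mean of per-class F1 for the binary classes 0 and 1."""
    scores = []
    for c in (0, 1):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(scores))

y_true = np.array([0, 0, 1, 1])
y_pred = np.array([0, 1, 1, 1])
score = f1_macro(y_true, y_pred)
print(round(score, 3))  # 0.733
```

Because each class contributes equally, a model that always predicts one class scores poorly even on imbalanced data, which is why F1-Macro is a stricter summary than accuracy for the zero-shot transfer scenario.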
If you find this code useful, please cite our paper:
```bibtex
@article{meng2025multimodal,
  title={A Multimodal Depression Detection Framework Based on Large-Scale Pre-trained Models: Cross-Domain Generalization and Clinical Deployment Strategies},
  author={Meng, Yu and Wang, Yihe and Ouyang, Zhiyuan and others},
  journal={Journal of Medical Systems},
  year={2025}
}
```

We thank the authors of Wav2Vec 2.0 and Transformers for their open-source contributions.