HiCur-NPC: Hierarchical Feature Fusion Curriculum Learning for Multi-Modal Foundation Model in Nasopharyngeal Carcinoma
🎉 October 29: Updated the complete data collection and organization process, providing detailed download and usage plans for our dataset! 📊 For more information, check out the Data Update Documentation.
🎥 October 30: Added a video demo showcasing model inference.
🚀 January 5: Added a guide for quickly migrating HiCur-NPC to other data tasks. Using CXR data as an example, we developed HiCur-CXR.
📽️ Demonstration Video: Check out our demonstration video to see the HiCur-NPC model in action! This video showcases the model's capabilities and how it processes nasopharyngeal carcinoma data.
HiCur-with-CC.mp4
Providing precise and comprehensive diagnostic information to clinicians is crucial for improving the treatment and prognosis of nasopharyngeal carcinoma. Multi-modal foundation models, which can integrate data from various sources, have the potential to significantly enhance clinical assistance. However, several challenges remain:
- The lack of large-scale visual-language datasets for nasopharyngeal carcinoma.
- Existing pre-training and fine-tuning methods that fail to learn the hierarchical features required by complex clinical tasks.
- Current foundation models that have limited visual perception due to inadequate integration of multi-modal information.
While curriculum learning can improve a model's ability to handle multiple tasks through systematic knowledge accumulation, it does not account for hierarchical features and their dependencies, which limits the knowledge the model can acquire. To address these issues, we propose the Hierarchical Feature Fusion Curriculum Learning (HFFCL) method, which consists of three stages:
- Visual Knowledge Learning (Stage I): We introduce the Hybrid Contrastive Masked Autoencoder (HCMAE) to pre-train visual encoders on 755K multi-modal nasopharyngeal carcinoma images (CT, MRI, and endoscopy), fully extracting deep visual information (a sketch of this objective is given below).
- Coarse-Grained Alignment (Stage II): We construct a 65K-sample visual instruction fine-tuning dataset from open-source data and clinicians' diagnostic reports, achieving coarse-grained alignment between visual information and a large language model.
- Fine-Grained Fusion (Stage III): We design a Mixture of Experts Cross Attention structure for deep, fine-grained fusion of global multi-modal information (see the sketch below).
Our model outperforms previously developed specialized models in all key clinical tasks for nasopharyngeal carcinoma, including diagnosis, report generation, tumor segmentation, and prognosis.
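To make the Stage I objective concrete, the following is a minimal sketch of how a hybrid contrastive + masked-autoencoder loss could be combined. It is illustrative only, not the released HCMAE code: the encoder methods (`patchify`, `forward_masked`, `global_embed`), the projection head, and the loss weights are assumptions.

```python
# Illustrative sketch (not the released implementation) of a hybrid
# contrastive + masked-autoencoder objective, assuming a ViT-style
# encoder/decoder pair and an InfoNCE loss over two augmented views.
import torch
import torch.nn.functional as F


def hybrid_cmae_loss(encoder, decoder, proj_head, view_a, view_b,
                     mask_ratio=0.75, temperature=0.07, alpha=0.5):
    """Combine MAE-style reconstruction on masked patches of view_a with an
    InfoNCE contrastive term between global embeddings of both views."""
    # --- masked reconstruction branch (MAE-style) ---
    patches = encoder.patchify(view_a)                      # (B, N, D_patch)
    latent, mask, ids_restore = encoder.forward_masked(patches, mask_ratio)
    recon = decoder(latent, ids_restore)                    # (B, N, D_patch)
    rec_loss = ((recon - patches) ** 2).mean(dim=-1)        # per-patch MSE
    rec_loss = (rec_loss * mask).sum() / mask.sum()         # masked patches only

    # --- contrastive branch (InfoNCE over two augmented views) ---
    z_a = F.normalize(proj_head(encoder.global_embed(view_a)), dim=-1)
    z_b = F.normalize(proj_head(encoder.global_embed(view_b)), dim=-1)
    logits = z_a @ z_b.t() / temperature                    # (B, B) similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)  # matched pairs on diagonal
    con_loss = F.cross_entropy(logits, targets)

    return alpha * rec_loss + (1 - alpha) * con_loss
```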
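Similarly, the Stage III Mixture of Experts Cross Attention can be pictured as a router that weights several cross-attention "experts" through which text tokens attend to visual tokens. The module below is a hedged sketch under that assumption, not the repository's implementation; all names and hyperparameters are illustrative, and for clarity it evaluates every expert densely instead of dispatching only to the selected ones.

```python
# Illustrative sketch of a mixture-of-experts cross-attention block:
# a token-wise router assigns gate weights to several cross-attention
# experts that let text tokens attend to visual features.
import torch
import torch.nn as nn


class MoECrossAttention(nn.Module):
    def __init__(self, dim, num_heads=8, num_experts=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_experts)
        ])
        self.router = nn.Linear(dim, num_experts)  # token-wise gating
        self.top_k = top_k

    def forward(self, text_tokens, visual_tokens):
        # Route each text token to its top-k experts.
        gate_logits = self.router(text_tokens)               # (B, T, E)
        weights = gate_logits.softmax(dim=-1)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)  # (B, T, k)

        out = torch.zeros_like(text_tokens)
        for e, expert in enumerate(self.experts):
            # Every expert is evaluated here for simplicity; a real
            # implementation would dispatch only to the top-k experts.
            attn_out, _ = expert(text_tokens, visual_tokens, visual_tokens)
            w_e = (topk_w * (topk_idx == e)).sum(dim=-1, keepdim=True)  # (B, T, 1)
            out = out + w_e * attn_out
        return out + text_tokens  # residual connection
```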
- `StageI-HCMAE/`: Contains code and resources for visual knowledge learning using the Hybrid Contrastive Masked Autoencoder.
- `StageII-CGA/`: Includes scripts and datasets for coarse-grained alignment.
- `StageIII-FGF/`: Hosts the implementation of fine-grained fusion using the Mixture of Experts Cross Attention structure.
- `test/`: Provides the complete model architecture and inference examples.
This is not the full version of the repository. Some code is currently being refined and will be released once it has been validated and reconstructed to ensure usability.
To install the necessary dependencies, run:
pip install -r requirements.txt
Detailed instructions for each stage can be found in its respective folder. To test the complete model, navigate to the `test` directory and follow the instructions in the README provided there.
We welcome contributions from the community. Please fork the repository and submit a pull request with your changes. Ensure your code adheres to our style guidelines and includes appropriate tests.
This project is licensed under the Apache License. See the LICENSE file for more details.
For any questions or inquiries, please contact us at [email protected].