Automatic analysis of cholecystectomy surgical videos has significant clinical value. However, current models are limited to simple tasks like single-frame phase recognition and multi-tool classification, failing to effectively utilize video context for complex clinical reasoning. They lack the ability to integrate medical textual knowledge with cholecystectomy images and long surgical videos. We propose CholecMamba, a model that compresses video feature sequences through the Mamba architecture and deeply integrates with large-scale reasoning language models to achieve multimodal reasoning capabilities for surgical videos. Our main contributions include:
- Designing a novel architecture that enables visual feature compression and knowledge feature injection, supporting multi-task video analysis of varying lengths.
- Innovatively incorporating segmentation category information generated by large language models into the decoder, enhancing surgical video understanding and reasoning segmentation capabilities through medical knowledge logical reasoning.
- Proposing the Surgical Reasoning Synthesis method, which leverages physician annotations and reinforcement learning with large language models to create the CholecReason dataset containing 49K multi-round dialogues, establishing a new benchmark for surgical video understanding and reasoning segmentation.
Our model achieves optimal performance on existing datasets and CholecReason, with a closed-test score of 0.822, significantly outperforming the best competing model’s score of 0.728.
CholecMamba_Demo.mp4
This is not the full version of the repository. Some code is currently being refined and will be released once it has been validated and reconstructed to ensure usability.
To install the necessary dependencies, run:
pip install -r requirements.txt
Detailed instructions for each stage can be found within their respective folders. To test the complete model, navigate to the test
directory and follow the instructions in the README file provided there.
We welcome contributions from the community. Please fork the repository and submit a pull request with your changes. Ensure your code adheres to our style guidelines and includes appropriate tests.
This project is licensed under the Apache License. See the LICENSE file for more details.
For any questions or inquiries, please contact us at