# 🧠 Dissecting Role Cognition in Medical LLMs via Neuronal Ablation

## 📘 Overview

This repository supports the paper:

> "Language Does Not Equal Cognition: Uniform Neural Patterns in Role-Conditioned Medical LLMs"

## 📘 Abstract

Large language models (LLMs) have gained significant traction in medical decision support systems, particularly in medical question answering and role-playing simulations. A common practice, Prompt-Based Role Playing (PBRP), instructs models to adopt different clinical roles (e.g., medical students, residents, attending physicians) to simulate varied professional behaviors. However, the impact of such role prompts on model reasoning capabilities remains unclear. This study introduces the RP-Neuron-Activated Evaluation Framework (RPNA) to evaluate whether role prompts induce distinct, role-specific cognitive processes in LLMs or merely modify linguistic style. We test this framework on three medical QA datasets, employing neuron ablation and representation analysis to assess changes in reasoning pathways. Our results demonstrate that role prompts do not significantly enhance the medical reasoning abilities of LLMs. Instead, they primarily affect surface-level linguistic features, with no evidence of distinct reasoning pathways or cognitive differentiation across clinical roles. Despite superficial stylistic changes, the core decision-making mechanisms of LLMs remain uniform across roles, indicating that current PBRP methods fail to replicate the cognitive complexity found in real-world medical practice. These findings highlight the limitations of role-playing in medical AI and underscore the need for models that simulate genuine cognitive processes rather than imitate linguistic style.
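For readers new to the technique, the sketch below shows one common way to ablate (zero out) selected MLP neurons in a Hugging Face causal LM via a forward hook. The checkpoint, layer index, and neuron indices are hypothetical, and the module path follows Qwen2-style naming; this illustrates the general method, not the repository's exact implementation.

```python
# Minimal neuron-ablation sketch (illustrative, not the repo's exact code).
# Assumes a Hugging Face causal LM whose per-layer MLP activations can be
# zeroed with a forward hook; module paths below follow Qwen2-style naming.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # any local or remote checkpoint
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")

def make_ablation_hook(neuron_ids):
    """Return a hook that zeroes the given hidden units of an MLP activation."""
    def hook(module, inputs, output):
        output[..., neuron_ids] = 0.0
        return output
    return hook

# Hypothetical example: ablate neurons 10-19 in layer 5's activation function.
layer = model.model.layers[5].mlp.act_fn
handle = layer.register_forward_hook(make_ablation_hook(list(range(10, 20))))

prompt = "You are an attending physician. A patient presents with chest pain."
inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    logits = model(**inputs).logits  # forward pass with the neurons ablated

handle.remove()  # remove the hook to restore the unablated model
```

Comparing the ablated and unablated logits (or downstream QA accuracy) for a given set of neurons is the basic operation behind this kind of analysis.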


## 🗂️ Directory Structure

```
.
├── dataset/                 # (You provide) test sets from MedQA, MedMCQA, MMLU-Med
├── NeuronAnalysis/          # Core experimental pipeline
│   ├── data_loader.py
│   ├── model_utils.py
│   ├── prompt_effects.py
│   ├── utils.py
│   ├── exp1_P.py / exp1_sig.py          # Exp1: QA accuracy comparison
│   ├── exp2_P.py / exp2_sig.py          # Exp2: JSD divergence analysis
│   ├── exp3_P.py / exp3_sig.py          # Exp3: CKA/PCA for hierarchy perception
│   ├── exp4.1_P.py / exp4.1_sig.py      # Exp4.1: Role-specific neuron masking
│   ├── exp5cross_P.py / exp5cross_sig.py
│   └── exp5cross.py                     # Shared logic for Exp5
├── Bloomclarify.py          # Bloom-level QA classification module
├── Myutils.py               # I/O, formatting, and shared utilities
├── requirements.txt         # Python environment requirements
└── README.md                # This file
```
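For reference, the JSD and CKA metrics named in the Exp2/Exp3 entries above can be computed as follows. This is a sketch of the standard formulas (Jensen-Shannon divergence between output distributions, linear CKA between representations), not necessarily the exact code in those scripts.

```python
# Reference implementations of the two metrics named above (standard
# textbook formulas, not necessarily identical to the repo's scripts).
import numpy as np
from scipy.spatial.distance import jensenshannon

def jsd(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    return jensenshannon(p, q) ** 2  # scipy returns the distance (sqrt of JSD)

def linear_cka(X, Y):
    """Linear CKA between two (n_samples, n_features) representation matrices."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

# Toy example: output distributions / hidden states under two role prompts.
p = np.array([0.7, 0.2, 0.1]); q = np.array([0.6, 0.3, 0.1])
print("JSD:", jsd(p, q))
X = np.random.randn(100, 64); Y = X + 0.1 * np.random.randn(100, 64)
print("CKA:", linear_cka(X, Y))  # near 1.0 => near-identical representations
```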

## 🚀 Getting Started

### 🟦 1 Installation

Ensure Python 3.10+ is available, then install the dependencies:

```bash
pip install -r requirements.txt
```

### 🟦 2 Model Setup

This project uses instruction-tuned LLMs (mainly the Qwen2.5 series). You may need the following (a minimal loading sketch follows the list):

- Local weights for Qwen2.5-7B/14B/32B/72B-Instruct
- Optionally, API access to GPT-4o and DeepSeek-R1
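For local weights, a minimal setup with the Hugging Face `transformers` chat-template API might look like the sketch below; the checkpoint name and role text are illustrative, not prescribed by this repository.

```python
# Minimal PBRP-style setup sketch: load an instruction-tuned Qwen2.5 model
# and condition it with a clinical-role system prompt via the chat template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-7B-Instruct"  # or a local path to the weights
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [
    {"role": "system", "content": "You are an attending physician."},  # role prompt
    {"role": "user", "content": "A 45-year-old patient presents with chest pain. ..."},
]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tok.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```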

### 🟦 3 Run Example

Each experiment has a main logic file (`exp*_P.py`) and a significance-test file (`exp*_sig.py`). For example:

```bash
cd NeuronAnalysis
python exp1_P.py       # Run accuracy evaluation
```
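As an illustration of what a paired significance test on two prompting conditions can look like, here is a generic sketch using McNemar's exact test on per-question correctness. This is an assumed procedure for illustration only, not necessarily what the `exp*_sig.py` scripts implement.

```python
# Generic paired significance test for two prompting conditions
# (illustrative; the repo's exp*_sig.py scripts may use a different procedure).
from scipy.stats import binomtest

def mcnemar_exact(cond_a, cond_b):
    """McNemar's exact test on paired per-question correctness (0/1 lists)."""
    b = sum(1 for x, y in zip(cond_a, cond_b) if x and not y)  # A right, B wrong
    c = sum(1 for x, y in zip(cond_a, cond_b) if not x and y)  # A wrong, B right
    # Under H0 (no condition effect), discordant pairs split 50/50.
    return binomtest(b, b + c, p=0.5).pvalue

# Toy example: correctness with a role prompt vs. a plain prompt.
role_prompt  = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
plain_prompt = [1, 0, 0, 1, 1, 1, 1, 0, 1, 0]
print("McNemar p-value:", mcnemar_exact(role_prompt, plain_prompt))
```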
