# GRPO Experiments (before integration into ECG-Bench)
- Base Model: Llama-3.2-3B-Instruct
- Dataset: ECG-Expert-QA
- Method: GRPO (Group Relative Policy Optimization)
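For reference, GRPO forgoes a learned value baseline and instead normalizes each sampled completion's reward within its group. Below is a minimal sketch of the standard group-relative advantage computation (general GRPO math, not code from this repo):

```python
import numpy as np

def group_relative_advantages(rewards: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Standard GRPO advantage: z-score each reward within its group.

    rewards has shape (num_prompts, group_size): one row per prompt,
    one column per sampled completion for that prompt.
    """
    mean = rewards.mean(axis=1, keepdims=True)
    std = rewards.std(axis=1, keepdims=True)
    return (rewards - mean) / (std + eps)
```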
## Repository Structure

```
RL/
├── data/                     # Dataset files
│   ├── raw/                  # Raw ECG-Expert-QA JSON files
│   ├── processed/            # Processed training data
│   └── system_prompt.txt     # ECG QA expert system prompt
├── models/                   # Model checkpoints
│   ├── base/                 # Base Llama model
│   ├── sft/                  # Supervised fine-tuned model
│   └── grpo/                 # GRPO checkpoints
├── scripts/
│   ├── prepare_data.sh       # Prepare data for SFT/RL
│   ├── sft_train.sh          # SFT (LoRA)
│   ├── run_grpo.sh           # GRPO
│   ├── streamlit_chat.sh     # Run interface
│   └── chat_with_model.sh    # Quick script to chat with the model
├── configs/                  # Training configurations
├── utils/                    # Helper functions
├── chat.py                   # Interactive chat with models
├── requirement.txt           # Python dependencies
├── README.md
├── check_dependencies.py     # Check dependencies
├── prepare_data.py           # Data preprocessing
├── sft_train.py              # SFT training
├── analyze_data_samples.py   # Analyze data samples
└── streamlit_chat.py         # Interface
```
## Installation

The setup follows ECG-Bench. Install the uv package manager via `pip install uv`.
For Torch:

```bash
uv pip uninstall -y vllm torch torchvision torchaudio
uv pip install "torch>=2.6.0" "torchvision>=0.21.0" \
    --extra-index-url https://download.pytorch.org/whl/cu124
```

For the base installation:

```bash
uv pip install -e . --no-build-isolation
```

For installation with flash attention:

```bash
uv pip install -e ".[flash]" --no-build-isolation
```

For installation with the judge:

```bash
uv pip install -e ".[judge]"
```

For installation of all packages:

```bash
uv pip install -e ".[all]" --no-build-isolation
```

Add verl as a submodule and install it:

```bash
git submodule add https://github.com/volcengine/verl.git verl
git submodule update --init --recursive
cd verl
pip install --no-deps -e .
cd ..
```

Log in to Hugging Face:

```bash
huggingface-cli login
```

## Data Preparation

Add the ECG-Expert-QA dataset as a submodule:

```bash
git submodule add https://github.com/Zaozzz/ECG-Expert-QA data/raw/ECG-Expert-QA
git submodule update --init --recursive
```

Run the data preparation script:

```bash
bash scripts/prepare_data.sh
```
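For orientation, the conversion `prepare_data.py` performs is roughly along these lines. This is a hypothetical sketch; the `question`/`answer` field names are assumptions, not verified against the raw ECG-Expert-QA JSON:

```python
import json
from pathlib import Path

def to_chat_examples(raw_path: str, system_prompt: str) -> list[dict]:
    """Hypothetical: map raw ECG-Expert-QA records to chat-format SFT rows."""
    records = json.loads(Path(raw_path).read_text())
    return [
        {
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": r["question"]},     # assumed field name
                {"role": "assistant", "content": r["answer"]},  # assumed field name
            ]
        }
        for r in records
    ]
```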
## SFT Training

Training parameters:

- `--model_name`: Base model (`meta-llama/Llama-3.2-3B-Instruct`)
- `--num_epochs`: Number of training epochs
- `--batch_size`: Batch size
- `--gradient_accumulation_steps`: Gradient accumulation steps
- `--learning_rate`: Learning rate
- `--max_seq_length`: Maximum sequence length
- `--lora_r`: LoRA rank
- `--lora_alpha`: LoRA scaling factor
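These flags suggest a CLI in `sft_train.py` along the following lines (a sketch only; the types are inferred from the flag names and are not taken from the repo's actual code):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Sketch of the CLI implied by the flags above; only --model_name's
    # default is stated in this README, so the others are left unset.
    p = argparse.ArgumentParser(description="LoRA SFT on ECG-Expert-QA")
    p.add_argument("--model_name", default="meta-llama/Llama-3.2-3B-Instruct")
    p.add_argument("--num_epochs", type=int)
    p.add_argument("--batch_size", type=int)
    p.add_argument("--gradient_accumulation_steps", type=int)
    p.add_argument("--learning_rate", type=float)
    p.add_argument("--max_seq_length", type=int)
    p.add_argument("--lora_r", type=int)
    p.add_argument("--lora_alpha", type=int)
    return p
```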
Run SFT:

```bash
bash scripts/sft_train.sh
```

## GRPO Training

GRPO training follows the verl GRPO trainer example:
https://github.com/volcengine/verl/blob/main/examples/grpo_trainer/README.md
A reward function for the dataset needs to be designed under `verl/verl/utils/reward_score/` (imported below as the `ecg_expert_qa` module).
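The actual scoring logic isn't included here; the following is a minimal sketch matching the `compute_score(solution_str, ground_truth, method="hybrid")` signature used in the registration below, assuming "hybrid" means exact match with a token-level F1 fallback:

```python
# Hypothetical sketch of verl/verl/utils/reward_score/ecg_expert_qa.py.
# "hybrid" is assumed to mean exact match with token-F1 fallback;
# the repo's real scoring method may differ.

def compute_score(solution_str: str, ground_truth: str, method: str = "hybrid") -> float:
    pred = solution_str.strip().lower()
    gold = ground_truth.strip().lower()
    if pred == gold:
        return 1.0
    if method != "hybrid":
        return 0.0  # exact-match only
    # Partial credit from token-level F1 overlap.
    pred_tokens, gold_tokens = set(pred.split()), set(gold.split())
    overlap = len(pred_tokens & gold_tokens)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```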
Then register it in `verl/verl/utils/reward_score/__init__.py`:
```python
elif data_source == "ecg_expert_qa":
    from . import ecg_expert_qa
    res = ecg_expert_qa.compute_score(solution_str, ground_truth, method="hybrid")
```

Run GRPO training:

```bash
bash scripts/run_grpo.sh
```

## Chat

Chat with the trained model:

```bash
bash scripts/chat_with_model.sh
```

Or launch the Streamlit interface:

```bash
streamlit run streamlit_chat.py
```

## Known Issue: vLLM 0.8.4 LoRA Bug

vLLM 0.8.4 has a bug that prevents LoRA training with verl (`AttributeError: 'LoRALRUCache' object has no attribute '_LRUCache__update'`).
Fix Applied: patched `~/anaconda3/envs/rlhf/lib/python3.10/site-packages/vllm/utils.py` (lines 277-280).
What was changed:
```python
# Before (buggy):
def touch(self, key: _K) -> None:
    self._LRUCache__update(key)  # type: ignore

# After (fixed):
def touch(self, key: _K) -> None:
    # Fix for LoRA LRU cache bug - use move_to_end instead
    if key in self._LRUCache__order:  # type: ignore
        self._LRUCache__order.move_to_end(key)  # type: ignore
```
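Editing site-packages in place is fragile (the patch is lost on reinstall). The same fix can also be applied as a runtime monkey-patch before vLLM is initialized; this is a sketch, assuming vLLM 0.8.4's `LRUCache` lives in `vllm.utils` as the traceback and patched file indicate:

```python
import vllm.utils as vllm_utils

def _patched_touch(self, key):
    # Mirror the in-place fix above: move the key to the most-recently-used
    # position via the underlying OrderedDict, if it is present.
    if key in self._LRUCache__order:
        self._LRUCache__order.move_to_end(key)

# Apply before any engine/LoRA initialization.
vllm_utils.LRUCache.touch = _patched_touch
```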