199 changes: 199 additions & 0 deletions F2LLM/MULTI_MODEL_GUIDE.md
@@ -0,0 +1,199 @@
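# F2LLM Multi-Model Support Guide

## Overview

With this change, F2LLM supports multiple decoder-only model families, including Qwen, LLaMA, Baichuan, and ChatGLM.

## Supported Models

### Tested Models
- **Qwen family**: Qwen-7B, Qwen-14B, Qwen3-4B, etc.
- **LLaMA family**: LLaMA-7B, LLaMA2-13B, etc.
- **Baichuan family**: Baichuan-13B, Baichuan2-13B, etc.
- **ChatGLM family**: ChatGLM-6B, ChatGLM2-6B, etc.

### Theoretically Supported Models
Any decoder-only model that can be loaded through the transformers library should work, including:
- GPT family
- CodeT5+ (an encoder-decoder family, so it may need extra handling)
- CodeGen
- StarCoder
- Other custom decoder-only models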

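## Usage

### 1. Model Configuration

Edit the configuration file `configs/config.json`:

```json
{
  "model_path": "your-model-path",
  "model_type": "auto",  // options: auto, qwen, llama, baichuan, etc.
  "attn_implementation": "flash_attention_2",  // flash_attention_2, sdpa, or null
  "use_flash_attention": true,
  // ... other settings
}
```

The `//` comments above are annotations only. JSON does not allow comments, so leave them out of the actual file.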

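#### Configuration Notes

- **model_path**: local model path or Hugging Face model name
- **model_type**: model type, used to apply model-specific handling automatically
- **attn_implementation**: attention implementation
  - `"flash_attention_2"`: Flash Attention 2 (fastest, but requires hardware and library support)
  - `"sdpa"`: PyTorch Scaled Dot Product Attention
  - `null`: no specific attention implementation is requested
- **use_flash_attention**: whether to apply the configured `attn_implementation`; when `false`, the model is loaded without passing `attn_implementation`, whatever its value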
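
How the two attention settings combine can be sketched as follows. This is a minimal sketch based on the loading condition in `model.py` from this change; the helper name `resolve_attention_kwargs` is hypothetical and does not exist in the repository.

```python
import json


def resolve_attention_kwargs(config: dict) -> dict:
    """Return the extra keyword arguments that would go to from_pretrained.

    Mirrors the condition `use_flash_attention and attn_implementation`:
    the configured implementation is used only when both values are truthy.
    The defaults match the ones declared in arguments.py.
    """
    attn = config.get("attn_implementation", "flash_attention_2")
    enabled = config.get("use_flash_attention", True)
    if enabled and attn:
        return {"attn_implementation": attn}
    # Either the switch is off or the implementation is null: library default.
    return {}


# Hypothetical config fragment, parsed the same way a config file would be.
config = json.loads('{"attn_implementation": "sdpa", "use_flash_attention": true}')
print(resolve_attention_kwargs(config))
```

Note that, despite its name, `use_flash_attention` also gates `"sdpa"`: setting it to `false` drops any configured implementation.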

### 2. Getting the Training Data
#### Option 1: Use huggingface-cli

To download the data with the `huggingface-cli` command:

```bash
# Install huggingface-hub
pip install huggingface-hub

# Download the training data from Hugging Face; if you run into network problems, consider using a mirror
export HF_ENDPOINT=https://hf-mirror.com
huggingface-cli download codefuse-ai/F2LLM --repo-type dataset --local-dir training_data --include "*.parquet"
```

#### Option 2: Manual Download

1. Visit https://huggingface.co/datasets/codefuse-ai/F2LLM
2. Download the `.parquet` files manually
3. Save them to the `training_data/` directory
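
The same download can also be scripted from Python. The sketch below uses `snapshot_download` from `huggingface_hub`; the wrapper function `download_training_data` and its parameters are hypothetical names chosen for illustration, not part of F2LLM.

```python
import os
from typing import Optional


def download_training_data(local_dir: str = "training_data",
                           endpoint: Optional[str] = None) -> str:
    """Download only the .parquet files of the codefuse-ai/F2LLM dataset.

    `endpoint` optionally points at a mirror such as https://hf-mirror.com.
    """
    if endpoint:
        # HF_ENDPOINT is read when huggingface_hub is first imported,
        # so it has to be set before the import below.
        os.environ["HF_ENDPOINT"] = endpoint
    from huggingface_hub import snapshot_download  # pip install huggingface-hub

    return snapshot_download(
        repo_id="codefuse-ai/F2LLM",
        repo_type="dataset",
        local_dir=local_dir,
        allow_patterns=["*.parquet"],
    )


# Usage (requires network access):
# path = download_training_data(endpoint="https://hf-mirror.com")
```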

### 3. Data Preprocessing

Tokenize the data with the general-purpose tokenization script:

```bash
# Basic usage
python tokenize_data.py --model_path "meta-llama/Llama-2-7b-hf" --max_seq_length 1023

# All parameters
python tokenize_data.py \
  --model_path "baichuan-inc/Baichuan2-13B-Base" \
  --max_seq_length 1023 \
  --data_dir "training_data" \
  --output_dir "data_tokenized" \
  --num_processes 16
```

### 4. Training

```bash
# Single-GPU training
accelerate launch --config_file configs/accelerate_config.yaml run.py --config configs/config.json

# Multi-GPU training
accelerate launch --config_file configs/accelerate_config.yaml --num_processes 8 run.py --config configs/config.json
```

## Model-Specific Configuration

### LLaMA Models
```json
{
  "model_path": "meta-llama/Llama-2-7b-hf",
  "model_type": "llama",
  "attn_implementation": "sdpa",
  "use_flash_attention": true,
  "max_seq_length": 2048
}
```

### Baichuan Models
```json
{
  "model_path": "baichuan-inc/Baichuan2-13B-Base",
  "model_type": "baichuan",
  "attn_implementation": "flash_attention_2",
  "use_flash_attention": true,
  "max_seq_length": 2048
}
```

### ChatGLM Models
```json
{
  "model_path": "THUDM/chatglm3-6b-base",
  "model_type": "chatglm",
  "attn_implementation": null,
  "use_flash_attention": false,
  "max_seq_length": 2048
}
```

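## Troubleshooting

### Common Issues

1. **Flash Attention is not supported**
   - Error message: `FlashAttention only supports Ampere GPUs or newer.`
   - Fix: set `"use_flash_attention": false` or `"attn_implementation": "sdpa"`

2. **Out of memory**
   - Reduce `train_batch_size`
   - Reduce `max_seq_length`
   - Use gradient accumulation

3. **Model fails to load**
   - Make sure the model path is correct
   - Check the network connection (for models hosted on Hugging Face)
   - Read the specific error message and adjust the attention settings accordingly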
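
The Flash Attention error above can be anticipated before training by checking the GPU's compute capability. The following is a rough sketch, not part of F2LLM: the function names are hypothetical, it relies on Ampere corresponding to compute capability 8.0, and it does not check whether the `flash-attn` package is installed.

```python
from typing import Optional, Tuple


def pick_attn_implementation(capability: Optional[Tuple[int, int]]) -> Optional[str]:
    """Suggest an `attn_implementation` value from a CUDA compute capability.

    Capability (8, 0) and newer (Ampere or later) gets Flash Attention 2,
    older GPUs get "sdpa", and None (no CUDA device found) maps to the
    library default. The last case is a choice made for this sketch.
    """
    if capability is None:
        return None
    return "flash_attention_2" if capability >= (8, 0) else "sdpa"


def detect_capability() -> Optional[Tuple[int, int]]:
    """Query the first CUDA device; returns None if torch or CUDA is missing."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


print(pick_attn_implementation(detect_capability()))
```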

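### Debugging Tips

1. **Test step by step**
   ```bash
   # First test model loading
   python -c "from transformers import AutoModelForCausalLM; model = AutoModelForCausalLM.from_pretrained('your-model', trust_remote_code=True)"

   # Then test tokenization
   python tokenize_data.py --model_path "your-model" --num_processes 1
   ```

2. **Check the logs**
   - The modified code prints detailed loading information
   - Pay attention to warnings; they usually say which fallback was used

3. **Performance tuning**
   - Prefer Flash Attention 2 (if the hardware supports it)
   - Use SDPA as the second choice
   - Disable the specific attention implementation only as a last resort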
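
The preference order above (Flash Attention 2, then SDPA, then the default) matches the fallback loop added to `model.py`. A simplified, self-contained sketch of that pattern follows; `attention_candidates`, `load_with_fallback`, and `fake_loader` are hypothetical names, and the fake loader stands in for `AutoModelForCausalLM.from_pretrained` so the example runs without downloading a model.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def attention_candidates(configured: Optional[str]) -> List[Optional[str]]:
    """Configured value first, then 'sdpa', then the default (None), without repeats."""
    ordered: List[Optional[str]] = []
    for candidate in (configured, "sdpa", None):
        if candidate not in ordered:
            ordered.append(candidate)
    return ordered


def load_with_fallback(loader: Callable[[Optional[str]], T],
                       configured: Optional[str]) -> T:
    """Call loader(attn) for each candidate until one succeeds."""
    errors = []
    for attn in attention_candidates(configured):
        try:
            return loader(attn)
        except Exception as exc:  # a real loader may raise ImportError, ValueError, ...
            errors.append(f"{attn or 'default'}: {exc}")
    raise RuntimeError("all attention implementations failed: " + "; ".join(errors))


def fake_loader(attn: Optional[str]) -> str:
    # Stand-in for the real model loader; pretends Flash Attention 2 is unavailable.
    if attn == "flash_attention_2":
        raise ImportError("flash_attn is not installed")
    return f"model loaded with {attn or 'default'} attention"


print(load_with_fallback(fake_loader, "flash_attention_2"))
# prints "model loaded with sdpa attention"
```

Unlike the loop in `model.py`, this sketch skips a candidate that has already been tried.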

## Performance Comparison

| Model | Attention implementation | Training speed | Memory usage | Compatibility |
|-------|--------------------------|----------------|--------------|---------------|
| Qwen3-4B | flash_attention_2 | ★★★★★ | ★★★★★ | ★★★★☆ |
| LLaMA2-7B | sdpa | ★★★★☆ | ★★★★☆ | ★★★★★ |
| Baichuan2-13B | flash_attention_2 | ★★★★★ | ★★★★☆ | ★★★☆☆ |
| ChatGLM3-6B | default | ★★★☆☆ | ★★★☆☆ | ★★★★★ |

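## Extending Support

To support a new model type:

1. Add model-specific handling logic to `model.py`
2. Add the corresponding model type identifier to the configuration file
3. Test and verify compatibility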

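## Notes

1. **Model license**: make sure you are permitted to use the chosen model
2. **Hardware requirements**: larger models need more GPU memory
3. **Data format**: make sure the training data format matches what the model expects
4. **Tokenizer compatibility**: different models may use different tokenizers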
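
One concrete tokenizer difference is that some tokenizers ship without a pad token. The sketch below mirrors the pad-token handling added to `model.py` in this change; `StubTokenizer` and `ensure_pad_token` are hypothetical names, and the stub exposes only the attributes the logic needs so that the example runs without downloading a tokenizer.

```python
class StubTokenizer:
    """Tiny stand-in for a Hugging Face tokenizer (illustration only)."""

    def __init__(self, eos_token=None, pad_token=None):
        self.eos_token = eos_token
        self.pad_token = pad_token
        self.vocab = ["a", "b"]

    def add_special_tokens(self, mapping):
        self.pad_token = mapping["pad_token"]
        self.vocab.append(self.pad_token)
        return 1  # number of tokens added

    def __len__(self):
        return len(self.vocab)


def ensure_pad_token(tokenizer) -> bool:
    """Guarantee that tokenizer.pad_token is set.

    Returns True when a new token was added to the vocabulary, in which case
    the caller must resize the model embeddings to len(tokenizer).
    """
    if tokenizer.pad_token is not None:
        return False
    if tokenizer.eos_token is not None:
        tokenizer.pad_token = tokenizer.eos_token  # reuse EOS, vocabulary unchanged
        return False
    tokenizer.add_special_tokens({"pad_token": "[PAD]"})
    return True


tok = StubTokenizer(eos_token="</s>")
print(ensure_pad_token(tok), tok.pad_token)
```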

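## Support

If you run into a problem, please provide:
- Model name and version
- Full error log
- Hardware configuration (GPU model, memory, etc.)
- Contents of the configuration file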
4 changes: 4 additions & 0 deletions F2LLM/arguments.py
@@ -27,6 +27,10 @@ class Args:
     log_interval: int = 20
     checkpointing_steps: int = 100
     validation_steps: int = 100
+    # model configuration
+    model_type: str = "auto" # auto, qwen, llama, baichuan, etc.
+    attn_implementation: str = "flash_attention_2" # flash_attention_2, sdpa, None
+    use_flash_attention: bool = True
     # just placeholder, for logging purpose
     num_processes: int=0
 
5 changes: 4 additions & 1 deletion F2LLM/configs/config.json
@@ -1,5 +1,6 @@
 {
     "model_path": "models/qwen3-4b",
+    "model_type": "qwen",
     "experiment_id": "4b+lr.8e-6+bs.16x32+context.1024+2epochs",
     "train_data_path": "training_data/data_tokenized_qwen",
     "output_dir": "output",
@@ -15,5 +16,7 @@
     "warmup_steps": 500,
     "train_epochs": 2,
     "log_interval": 100,
-    "num_hard_neg": 7
+    "num_hard_neg": 7,
+    "attn_implementation": "flash_attention_2",
+    "use_flash_attention": true
 }
22 changes: 22 additions & 0 deletions F2LLM/configs/config_gpt_demo.json
@@ -0,0 +1,22 @@
+{
+    "model_path": "microsoft/DialoGPT-medium",
+    "model_type": "gpt2",
+    "experiment_id": "gpt-final-fix",
+    "train_data_path": "data_tokenized/data_tokenized_DialoGPT-medium",
+    "output_dir": "output",
+    "tb_dir": "output/tb",
+    "cache_dir": "cache",
+    "train_batch_size": 1,
+    "checkpointing_steps": 10,
+    "validation_steps": 10,
+    "max_seq_length": 128,
+    "learning_rate": 1e-4,
+    "min_lr": 1e-6,
+    "weight_decay": 0.01,
+    "warmup_steps": 5,
+    "train_epochs": 1,
+    "log_interval": 1,
+    "num_hard_neg": 1,
+    "attn_implementation": null,
+    "use_flash_attention": false
+}
98 changes: 91 additions & 7 deletions F2LLM/model.py
@@ -1,5 +1,6 @@
 import torch
-from transformers import AutoModel, AutoTokenizer
+from transformers import AutoModel, AutoTokenizer, GPT2LMHeadModel, AutoModelForCausalLM
+import warnings
 
 
 class F2LLM:
@@ -12,9 +13,80 @@ def __init__(self,
         self.args = args
         self.dtype = torch.bfloat16
         self.device = None # set after accelerator.prepare
-        self.lm = AutoModel.from_pretrained(model_path, trust_remote_code=True, torch_dtype=self.dtype, attn_implementation='flash_attention_2')
+
+        # Choose the attention implementation from the configuration
+        attn_implementation = getattr(args, 'attn_implementation', 'flash_attention_2') if args else 'flash_attention_2'
+        use_flash_attention = getattr(args, 'use_flash_attention', True) if args else True
+
+        # Try to load the model; supports multiple decoder-only models
+        try:
+            if use_flash_attention and attn_implementation:
+                # Use the configured attention implementation
+                self.lm = AutoModelForCausalLM.from_pretrained(
+                    model_path,
+                    trust_remote_code=True,
+                    torch_dtype=self.dtype,
+                    attn_implementation=attn_implementation
+                )
+            else:
+                # Do not request a specific attention implementation
+                self.lm = AutoModelForCausalLM.from_pretrained(
+                    model_path,
+                    trust_remote_code=True,
+                    torch_dtype=self.dtype
+                )
+        except Exception as e:
+            if use_flash_attention and attn_implementation:
+                warnings.warn(f"Failed to load model with {attn_implementation}: {e}. Trying fallback options...")
+
+            # Fallback strategy
+            fallback_options = ['sdpa', None]  # try sdpa, then no specific attention implementation
+            loaded = False
+
+            for fallback_attn in fallback_options:
+                try:
+                    if fallback_attn:
+                        self.lm = AutoModelForCausalLM.from_pretrained(
+                            model_path,
+                            trust_remote_code=True,
+                            torch_dtype=self.dtype,
+                            attn_implementation=fallback_attn
+                        )
+                    else:
+                        self.lm = AutoModelForCausalLM.from_pretrained(
+                            model_path,
+                            trust_remote_code=True,
+                            torch_dtype=self.dtype
+                        )
+                    warnings.warn(f"Successfully loaded model with {fallback_attn or 'default'} attention")
+                    loaded = True
+                    break
+                except Exception as e2:
+                    warnings.warn(f"Failed to load model with {fallback_attn or 'default'} attention: {e2}")
+                    continue
+
+            if not loaded:
+                raise RuntimeError(f"Failed to load model {model_path} with any attention implementation")
+
         self.lm.config.use_cache = False
-        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
+
+        # Load the tokenizer; trust_remote_code supports more models
+        self.tokenizer = AutoTokenizer.from_pretrained(
+            model_path,
+            trust_remote_code=True,
+            padding_side='right'  # right padding; forward() reads the token at index seq_len-1
+        )
+
+        # Make sure the tokenizer has a pad_token
+        if self.tokenizer.pad_token is None:
+            if self.tokenizer.eos_token is not None:
+                self.tokenizer.pad_token = self.tokenizer.eos_token
+            else:
+                # Add a new pad_token
+                self.tokenizer.add_special_tokens({'pad_token': '[PAD]'})
+                # The model embedding size must be adjusted to match
+                self.lm.resize_token_embeddings(len(self.tokenizer))
+
         self.max_seq_length = max_seq_length
 
     def set_device(self):
@@ -24,11 +96,23 @@ def forward(self, batch):
         bs = batch['bs']
         num_hard_neg = int((len(batch['input_ids']) - 2*bs) / bs)
 
-        outputs = self.lm(batch['input_ids'],
-                            batch['attention_mask'],
-                            )
+        outputs = self.lm(
+            input_ids=batch['input_ids'],
+            attention_mask=batch['attention_mask'],
+            return_dict=True,
+            output_hidden_states=True
+        )
+
+        # For CausalLM models, take the hidden states of the last layer
+        if hasattr(outputs, 'hidden_states') and outputs.hidden_states is not None:
+            # hidden_states is a tuple holding the hidden states of every layer
+            passage_features_all_tokens = outputs.hidden_states[-1]
+        elif hasattr(outputs, 'last_hidden_state'):
+            passage_features_all_tokens = outputs.last_hidden_state
+        else:
+            # Fall back to the first element of the model output
+            passage_features_all_tokens = outputs[0]
 
-        passage_features_all_tokens = outputs.last_hidden_state
         return {
             'query_passage_features': torch.stack([passage_features_all_tokens[i, [batch['seq_lens'][i]-1]] for i in range(bs)]),
             'passage_passage_features': torch.stack([passage_features_all_tokens[i, [batch['seq_lens'][i]-1]] for i in range(bs, 2*bs)]),
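
The last-token pooling used in `forward()` (indexing each sequence at `seq_lens[i]-1`) can be illustrated without torch. This is a simplified sketch on nested Python lists with made-up values; `last_token_features` is a hypothetical helper, and unlike the tensor code it does not keep the singleton dimension produced by the list index `[seq_len-1]`.

```python
from typing import List, Sequence


def last_token_features(hidden: Sequence[Sequence[Sequence[float]]],
                        seq_lens: Sequence[int]) -> List[List[float]]:
    """Pick the hidden vector of the last real token of each sequence.

    hidden has shape (batch, padded_len, hidden_size). With right padding,
    the last real token of sequence i sits at index seq_lens[i] - 1.
    """
    return [list(hidden[i][seq_lens[i] - 1]) for i in range(len(seq_lens))]


# Two sequences padded to length 3, hidden size 2 (made-up values).
hidden = [
    [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],   # real length 2, last position is padding
    [[5.0, 6.0], [7.0, 8.0], [9.0, 10.0]],  # real length 3
]
print(last_token_features(hidden, [2, 3]))
# prints [[3.0, 4.0], [9.0, 10.0]]
```

With left padding the same index would point at the wrong position, which is why the tokenizer is loaded with `padding_side='right'`.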