Commit 23df7f3

[template] fix vlm padding_free/logits_to_keep (#4444)

1 parent e9c3722

4 files changed: +12 -9 lines changed

docs/source/Instruction/命令行参数.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -81,7 +81,7 @@
 - 🔥padding_free: Flattens the data in a batch to avoid padding, thereby reducing memory usage and accelerating training. Defaults to False. Currently supports `swift pt/sft`.
 - Note: When using padding_free, combine it with `--attn_impl flash_attn` and "transformers>=4.44". For details, see [this PR](https://github.com/huggingface/transformers/pull/31629). (Same as packing.)
 - The supported multimodal models are the same as those supported for multimodal packing. Compared with packing, padding_free consumes no additional time or space.
-- Megatron-SWIFT uses padding_free by default, i.e. `qkv_format='thd'`.
+- Megatron-SWIFT uses padding_free by default, i.e. `qkv_format='thd'`; no additional configuration is required.
 - padding_side: The padding side when training with `batch_size>=2`. Options are 'left' and 'right'; default is 'right'. (For inference with batch_size>=2, only left padding is applied.)
 - loss_scale: Loss-weight setting for training tokens. Defaults to `'default'`, meaning all responses (including history) are weighted 1 in the cross-entropy loss, while the loss of the agent_template's `tool_response` is ignored. Options are 'default', 'last_round', 'all', 'ignore_empty_think', plus the agent-specific values 'react', 'hermes', 'qwen', 'agentflan', 'alpha_umi'. For the agent part, see [Pluginization](../Customization/插件化.md) and the [Agent documentation](./Agent支持.md).
 - 'last_round': Only compute the loss of the last-round response.
```
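To make the padding_free behavior described above concrete, here is a toy sketch (plain Python with invented token values, not ms-swift code) of what flattening a batch means: sequences are concatenated into a single row, and per-sample position_ids mark the boundaries that flash_attn's variable-length kernels (and Megatron's `qkv_format='thd'`) rely on.

```python
# Toy illustration of padding_free: instead of padding three samples to the
# longest length, their tokens are concatenated into a single row, and
# position_ids restart at 0 for each sample. Those restarts encode the
# sample boundaries, so attention never crosses from one sample into the next.
samples = [[11, 12, 13], [21, 22], [31, 32, 33, 34]]

input_ids = [tok for sample in samples for tok in sample]
position_ids = [pos for sample in samples for pos in range(len(sample))]

print(input_ids)     # [11, 12, 13, 21, 22, 31, 32, 33, 34]
print(position_ids)  # [0, 1, 2, 0, 1, 0, 1, 2, 3] -> no pad tokens needed
```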

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -82,7 +82,7 @@ Hints:
 - 🔥padding_free: Flattens the data in a batch to avoid padding, thereby reducing memory usage and accelerating training. Default is False. Currently supports `swift pt/sft`.
 - Note: When using `padding_free`, it should be combined with `--attn_impl flash_attn` and "transformers>=4.44". For details, see [this PR](https://github.com/huggingface/transformers/pull/31629). (Same as packing)
 - The supported multimodal models are the same as those supported for multimodal packing. Compared to packing, padding_free does not consume additional time or space.
-- Megatron-SWIFT uses `padding_free` by default, i.e., `qkv_format='thd'`.
+- Megatron-SWIFT uses `padding_free` by default, i.e., `qkv_format='thd'`, and no additional configuration is required.
 - padding_side: Padding side when `batch_size>=2` during training. Options are 'left' and 'right', with 'right' as the default. (For inference with batch_size>=2, only left padding is applied.)
 - loss_scale: Weight setting for the loss of training tokens. Default is `'default'`, which means that all responses (including history) are used with a weight of 1 in cross-entropy loss, and the loss from the corresponding `tool_response` in the agent_template is ignored. Possible values include: 'default', 'last_round', 'all', 'ignore_empty_think', and agent-specific options: 'react', 'hermes', 'qwen', 'agentflan', 'alpha_umi'. For more details about the agent part, please refer to [Pluginization](../Customization/Pluginization.md) and [Agent Training](./Agent-support.md).
 - 'last_round': Only calculate the loss for the last round of response.
```
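The loss_scale options documented above reduce to per-token weights on the cross-entropy loss. A toy sketch (invented tokens, not the ms-swift implementation) of how 'default' and 'last_round' differ on a two-turn dialogue:

```python
# Illustrative only (invented tokens): per-token loss weights for a
# two-turn dialogue. 'default' weights every response token 1 (history
# included); 'last_round' keeps weight only on the final response.
tokens       = ['user1', 'resp1', 'user2', 'resp2']
is_response  = [False,   True,    False,   True]
is_last_resp = [False,   False,   False,   True]

default_w    = [1.0 if flag else 0.0 for flag in is_response]
last_round_w = [1.0 if flag else 0.0 for flag in is_last_resp]

print(default_w)     # [0.0, 1.0, 0.0, 1.0]
print(last_round_w)  # [0.0, 0.0, 0.0, 1.0]
```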

swift/llm/argument/train_args.py

Lines changed: 4 additions & 3 deletions
```diff
@@ -132,14 +132,15 @@ def _init_lazy_tokenize(self):
         logger.info(f'Setting args.lazy_tokenize: {self.lazy_tokenize}')
 
     def __post_init__(self) -> None:
-        if (self.padding_free or self.packing) and self.attn_impl != 'flash_attn':
+        if self.padding_free or self.packing:
             if self.packing:
                 feature = 'packing'
                 self.padding_free = False
             else:
                 feature = 'padding_free'
-            raise ValueError(f'The "{feature}" feature needs to be used in conjunction with "flash_attn". '
-                             'Please specify `--attn_impl flash_attn`.')
+            if self.attn_impl != 'flash_attn':
+                raise ValueError(f'The "{feature}" feature needs to be used in conjunction with "flash_attn". '
+                                 'Please specify `--attn_impl flash_attn`.')
         if self.resume_from_checkpoint:
             self.resume_from_checkpoint = to_abspath(self.resume_from_checkpoint, True)
         if self.resume_only_model:
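```

The refactor above does more than relocate the raise: `self.padding_free = False` now runs even when `attn_impl` is already 'flash_attn', so packing takes precedence over padding_free in every case, not only on the error path. A standalone sketch of the resulting rule (hypothetical helper mirroring the diff's logic):

```python
# Hypothetical standalone version of the validated precedence: when both
# flags are set, packing wins and padding_free is cleared; either feature
# still requires flash_attn.
def resolve_padding_free(padding_free: bool, packing: bool, attn_impl: str) -> bool:
    if padding_free or packing:
        if packing:
            feature = 'packing'
            padding_free = False  # packing subsumes padding_free
        else:
            feature = 'padding_free'
        if attn_impl != 'flash_attn':
            raise ValueError(f'The "{feature}" feature needs "flash_attn". '
                             'Please specify `--attn_impl flash_attn`.')
    return padding_free

# Before the fix, padding_free stayed True here; now packing takes precedence.
assert resolve_padding_free(True, True, 'flash_attn') is False
```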

swift/llm/template/base.py

Lines changed: 6 additions & 4 deletions
```diff
@@ -117,7 +117,7 @@ def __init__(
         self.mode: Literal['pt', 'vllm', 'lmdeploy',  # infer
                            'train', 'rlhf', 'kto',  # train
                            'seq_cls', 'embedding', 'prm'] = 'pt'
-        self._packing = False
+        self._packing = self.padding_free
         self.use_megatron = False
         self._handles = []
         self._deepspeed_initialize = None
@@ -1172,7 +1172,7 @@ def pre_forward_hook(self, model: nn.Module, args, kwargs):
         old_kwargs = to_device(kwargs, model.device)
         kwargs = to_device(self._post_encode(model, old_kwargs), model.device)
         for k, v in old_kwargs.items():
-            if k in {'input_ids', 'attention_mask', 'labels', 'position_ids', 'output_hidden_states'
+            if k in {'input_ids', 'attention_mask', 'labels', 'position_ids', 'output_hidden_states', 'logits_to_keep'
                      } and k not in kwargs:
                 kwargs[k] = v
         if 'inputs_embeds' in kwargs:
```
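Context for the one-line change above: for multimodal models, `_post_encode` rebuilds the model inputs (typically producing `inputs_embeds`), and this loop restores whitelisted keys that the rebuild dropped. `logits_to_keep` is the transformers argument that restricts logit computation to the positions that actually need it, so silently dropping it forced full-vocabulary logits for every position on VLM forwards. A minimal sketch of the merge step (hypothetical helper name, same idea as the hook):

```python
# Minimal sketch of the pass-through: keys dropped by the multimodal
# _post_encode are restored from the original kwargs, now including
# logits_to_keep so the lm_head can still skip positions without a loss.
PRESERVED_KEYS = {'input_ids', 'attention_mask', 'labels', 'position_ids',
                  'output_hidden_states', 'logits_to_keep'}

def restore_dropped_kwargs(old_kwargs: dict, new_kwargs: dict) -> dict:
    for key, value in old_kwargs.items():
        if key in PRESERVED_KEYS and key not in new_kwargs:
            new_kwargs[key] = value
    return new_kwargs

merged = restore_dropped_kwargs({'logits_to_keep': 64, 'labels': [0, 1]},
                                {'inputs_embeds': 'tensor...'})
assert merged['logits_to_keep'] == 64  # no longer silently dropped
```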
```diff
@@ -1359,9 +1359,11 @@ def _data_collator(self, batch: List[Dict[str, Any]], *, padding_to: Optional[in
         assert self.tokenizer.pad_token_id is not None
         padding_side = self.padding_side if self.is_training else 'left'
         padding_right = padding_side == 'right'
-        packing_mode = self.use_megatron or self.padding_free or self._packing and 'position_ids' in batch[0]
+        packing_mode = self.use_megatron or self._packing
         if self.padding_free:
-            batch = self._data_flatten(batch)
+            batch[:] = self._data_flatten(batch)
+        if self._packing:
+            assert 'position_ids' in batch[0], f'batch[0]: {batch[0]}'
         res = {}
         if packing_mode:
             # only support llm
```
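Two subtleties in this hunk: `packing_mode` can be simplified because `self._packing` is now initialized from `self.padding_free` in the first hunk, and `batch[:] = self._data_flatten(batch)` overwrites the list's contents in place instead of rebinding the local name, so any code still holding a reference to the same `batch` list (such as a multimodal collator that post-processes it) also sees the flattened data. A pure-Python illustration of the rebinding difference (toy lists, not template code):

```python
# Slice assignment mutates the caller's list object; plain assignment
# only rebinds a name local to the function.
def flatten_rebind(batch):
    batch = [sum(batch, [])]  # local rebinding: invisible to the caller

def flatten_in_place(batch):
    batch[:] = [sum(batch, [])]  # mutates the shared list object

a = [[1, 2], [3]]
flatten_rebind(a)
print(a)  # [[1, 2], [3]] -> caller still sees the unflattened batch

b = [[1, 2], [3]]
flatten_in_place(b)
print(b)  # [[1, 2, 3]] -> caller sees the flattened batch
```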
