Thanks for your wonderful work!
When I train Qwen3 models, the following error occurs:
in RerankerTrainCollator.__call__(self, features)
8 def __call__(self, features):
9 student_features, teacher_features = zip(*features)
---> 10 student_collated = self.tokenizer_student.pad(sum(student_features, []),
11 padding='max_length',
12 max_length=self.max_q_len + self.max_p_len,
13 return_tensors="pt"
14 )
15 teacher_collated = self.tokenizer_teacher.pad(sum(teacher_features, []),
16 padding='max_length',
17 max_length=self.max_q_len + self.max_p_len,
18 return_tensors="pt"
19 )
20 return {"student": student_collated, "teacher": teacher_collated}
File /usr/local/lib/python3.10/dist-packages/transformers/tokenization_utils_base.py:3541, in PreTrainedTokenizerBase.pad(self, encoded_inputs, padding, max_length, pad_to_multiple_of, padding_side, return_attention_mask, return_tensors, verbose)
3539 return_tensors = "np" if return_tensors is None else return_tensors
3540 else:
-> 3541 raise ValueError(
3542 f"type of {first_element} unknown: {type(first_element)}. "
3543 "Should be one of a python, numpy, pytorch or tensorflow object."
3544 )
3546 for key, value in encoded_inputs.items():
3547 encoded_inputs[key] = to_py_obj(value)
ValueError: type of None unknown: <class 'NoneType'>. Should be one of a python, numpy, pytorch or tensorflow object.
The relevant code snippet is:
from dataclasses import dataclass

from transformers import PreTrainedTokenizer


@dataclass
class RerankerTrainCollator:
    tokenizer_student: PreTrainedTokenizer
    tokenizer_teacher: PreTrainedTokenizer
    max_q_len: int = 32
    max_p_len: int = 196

    def __call__(self, features):
        # Each item in `features` is a (student_features, teacher_features) pair.
        student_features, teacher_features = zip(*features)
        # sum(..., []) flattens the per-example lists into one list of encodings.
        student_collated = self.tokenizer_student.pad(
            sum(student_features, []),
            padding='max_length',
            max_length=self.max_q_len + self.max_p_len,
            return_tensors="pt",
        )
        teacher_collated = self.tokenizer_teacher.pad(
            sum(teacher_features, []),
            padding='max_length',
            max_length=self.max_q_len + self.max_p_len,
            return_tensors="pt",
        )
        return {"student": student_collated, "teacher": teacher_collated}
It seems that the Qwen3 tokenizer (Qwen2TokenizerFast) does not work with this method. How can I solve this problem? Thank you very much!
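The ValueError reports that `pad` found a `None` where it expected token ids, so the problem may be in the features passed to the collator rather than in `pad` itself. One possible cause (an assumption, not confirmed by the traceback) is that the dataset code builds `input_ids` from a special-token id, such as `cls_token_id` or `sep_token_id`, that is not defined for this tokenizer and is therefore `None`. The sketch below is a hypothetical helper, `find_none_fields`, that scans a list of feature dicts for `None` values; it could be run on `sum(student_features, [])` just before the `pad` call to check this.

```python
from typing import Any, Dict, List, Tuple


def find_none_fields(features: List[Dict[str, Any]]) -> List[Tuple[int, str]]:
    """Return (feature_index, key) pairs whose value is None or contains None."""
    problems = []
    for idx, feature in enumerate(features):
        for key, value in feature.items():
            if value is None:
                # The whole field is missing, e.g. input_ids=None.
                problems.append((idx, key))
            elif isinstance(value, (list, tuple)) and any(v is None for v in value):
                # A single token id inside the sequence is None.
                problems.append((idx, key))
    return problems


# Hypothetical features, only to illustrate the check; real ones come from the dataset.
sample = [
    {"input_ids": [1, 2, 3], "attention_mask": [1, 1, 1]},
    {"input_ids": [None, 5, 6], "attention_mask": [1, 1, 1]},
]
print(find_none_fields(sample))
```

If the helper reports any entries, the fix likely belongs in the code that builds the features (for example, avoiding special-token ids the tokenizer does not define), not in the collator.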