fix: Move deprecated positional arguments from SFTTrainer to SFTConfig (#399)

Luka-D · willmj · kmehant · web-flow · commit a098e702972e · 2024-12-13T12:59:39.000+05:30
* fix: set legacy behavior to false, enable new behavior

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* fix: Resolve push_to_hub_token warning

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* fix: Remove max_seq_length and dataset_text_field from SFTTrainer

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* fmt

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* fix: Resolve tokenizer.padding_side warning

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* nit: restructure warning fixes

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* fix: Add packing directly to SFTConfig

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* fmt

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>

* Removed dataset_kwargs from SFTTrainer

Removed the argument dataset_kwargs from the invocation of SFTTrainer() because it will be deprecated in V1.0.0. Instead, dataset_kwargs has been added as a key to the training_args variable, following the example provided by HF here: https://huggingface.co/docs/trl/en/sft_trainer#training-the-vision-language-model

Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>

* fix: Added max_seq_length back to SFTConfig()

Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>

* Removed legacy and padding_side args

Removed these args because they were based on changes from @willmj that haven't been approved yet.

Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>

* Moved all args to additional_args

Following @kmehant's suggestion.

Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>

* Removed packing and max_seq_length

Removed the packing and max_seq_length variables from additional_args.

Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>

* Removed check is_pretokenized_dataset

Co-authored-by: Mehant Kammakomati <kmehant@gmail.com>
Signed-off-by: Luka-D <56648891+Luka-D@users.noreply.github.com>

* Removed max_seq_length from additional_args

Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>

* Removed error.log

Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>

* fix: move packing to SFTConfig as well

Co-authored-by: Luka-D <56648891+Luka-D@users.noreply.github.com>
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>

---------

Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
Signed-off-by: Luka Dojcinovic <56648891+Luka-D@users.noreply.github.com>
Signed-off-by: Luka-D <56648891+Luka-D@users.noreply.github.com>
Signed-off-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
Co-authored-by: Will Johnson <mwjohnson728@gmail.com>
Co-authored-by: Mehant Kammakomati <kmehant@gmail.com>
Co-authored-by: Mehant Kammakomati <mehant.kammakomati2@ibm.com>
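
For reviewers, the pattern this change relies on (filter the caller's arguments down to the config's dataclass fields, then merge in the extra dataset arguments) can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the code in tuning/sft_trainer.py: `ToyConfig`, `ToyTrainArgs`, `tracker_name`, and `build_config` are hypothetical stand-ins, and `ToyConfig` only imitates a few SFTConfig fields.

```python
import dataclasses
from typing import Optional


@dataclasses.dataclass
class ToyConfig:
    """Hypothetical stand-in for SFTConfig; NOT the real trl class."""

    learning_rate: float = 1e-5
    packing: bool = False
    max_seq_length: Optional[int] = None
    dataset_text_field: Optional[str] = None
    dataset_kwargs: Optional[dict] = None


@dataclasses.dataclass
class ToyTrainArgs:
    """Hypothetical caller-side arguments, including one the config does not know."""

    learning_rate: float = 1e-5
    packing: bool = False
    max_seq_length: Optional[int] = None
    tracker_name: str = "none"  # assumed extra field, dropped by the filter below

    def to_dict(self) -> dict:
        return dataclasses.asdict(self)


def build_config(train_args, dataset_text_field, dataset_kwargs) -> ToyConfig:
    # Keep only the keys that are real fields of the config dataclass.
    allowed = {f.name for f in dataclasses.fields(ToyConfig)}
    config_kwargs = {k: v for k, v in train_args.to_dict().items() if k in allowed}

    # Arguments that used to go to the trainer constructor now go to the config.
    # The two dicts must not share keys: a duplicate keyword raises TypeError.
    additional_args = {
        "dataset_text_field": dataset_text_field,
        "dataset_kwargs": dataset_kwargs,
    }
    return ToyConfig(**config_kwargs, **additional_args)


config = build_config(
    ToyTrainArgs(packing=True, max_seq_length=512, tracker_name="example"),
    dataset_text_field="output",
    dataset_kwargs={"skip_prepare_dataset": True},
)
```

In this sketch, packing and max_seq_length reach the config through the filtered dictionary because they are fields of both toy classes, which is why they do not need to appear in `additional_args`.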
diff --git a/tuning/sft_trainer.py b/tuning/sft_trainer.py
@@ -318,27 +318,29 @@ def train(
     # this validation, we just drop the things that aren't part of the SFT Config and build one
     # from our object directly. In the future, we should consider renaming this class and / or
     # not adding things that are not directly used by the trainer instance to it.
+
     transformer_train_arg_fields = [x.name for x in dataclasses.fields(SFTConfig)]
     transformer_kwargs = {
         k: v
         for k, v in train_args.to_dict().items()
         if k in transformer_train_arg_fields
     }
-    training_args = SFTConfig(**transformer_kwargs)
+
+    additional_args = {
+        "dataset_text_field": dataset_text_field,
+        "dataset_kwargs": dataset_kwargs,
+    }
+    training_args = SFTConfig(**transformer_kwargs, **additional_args)
 
     trainer = SFTTrainer(
         model=model,
         tokenizer=tokenizer,
         train_dataset=formatted_train_dataset,
         eval_dataset=formatted_validation_dataset,
-        packing=train_args.packing,
         data_collator=data_collator,
-        dataset_text_field=dataset_text_field,
         args=training_args,
-        max_seq_length=max_seq_length,
         callbacks=trainer_callbacks,
         peft_config=peft_config,
-        dataset_kwargs=dataset_kwargs,
     )
 
     # We track additional metrics and experiment metadata after trainer object creation