Description
System Info
transformers 4.38.2
Python 3.10.19
platform Linux
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
training_args = TrainingArguments(
output_dir=config["trainer"]["save_dir"],
per_device_train_batch_size=config["data_loader"]["args"]["batch_size"],
per_device_eval_batch_size=32,
# gradient_accumulation_steps=4,
gradient_checkpointing=False,
logging_steps=500,
evaluation_strategy="steps",
eval_steps=500,
save_strategy="steps",
    optim="sgd",
optim_args="momentum=0.9,weight_decay=0.0005,nesterov=True,dampening=0.0",
learning_rate=config["optimizer"]["args"]["lr"],
# lr_scheduler_type="cosine_with_restarts",
# lr_scheduler_kwargs={"num_cycles": 2},
num_train_epochs=config["trainer"]["epochs"],
save_steps=config["trainer"]["save_ckpt_steps"],
save_total_limit=config["trainer"]["save_total_limit"],
report_to="tensorboard",
remove_unused_columns=False,
dataloader_num_workers=8,
ddp_find_unused_parameters=False,
dataloader_pin_memory=True,
prediction_loss_only=False,
load_best_model_at_end=False,
)
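For illustration, the `optim_args` string above is a comma-separated list of `key=value` pairs, and the expectation is that it gets parsed into keyword arguments for the optimizer. A minimal sketch of that parsing (`parse_optim_args` is a hypothetical helper, not the actual `Trainer` internals):

```python
import ast


def parse_optim_args(optim_args: str) -> dict:
    """Parse "momentum=0.9,nesterov=True,..." into a kwargs dict,
    converting numeric and boolean literals where possible."""
    parsed = {}
    for pair in optim_args.split(","):
        key, value = pair.split("=", 1)
        try:
            # literal_eval turns "0.9" into 0.9 and "True" into True
            parsed[key.strip()] = ast.literal_eval(value.strip())
        except (ValueError, SyntaxError):
            # fall back to the raw string for non-literal values
            parsed[key.strip()] = value.strip()
    return parsed


kwargs = parse_optim_args("momentum=0.9,weight_decay=0.0005,nesterov=True,dampening=0.0")
```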
Expected behavior
The arguments specified in `optim_args` should be parsed and passed to the optimizer, but when `optim="sgd"` the `Trainer` does not apply them, so SGD runs with its default momentum, weight decay, nesterov, and dampening settings.
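As a possible workaround until `optim_args` is honored for SGD, the optimizer can be constructed manually with the desired hyperparameters; `Trainer` accepts a pre-built optimizer via its `optimizers=(optimizer, lr_scheduler)` argument. A sketch using a toy `torch.nn.Linear` model as a stand-in (the real model and learning rate come from the config in the reproduction above):

```python
import torch

# the settings that optim_args was supposed to carry
sgd_kwargs = {"momentum": 0.9, "weight_decay": 0.0005, "nesterov": True, "dampening": 0.0}

model = torch.nn.Linear(4, 2)  # stand-in for the actual model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, **sgd_kwargs)

# then pass it to Trainer instead of relying on optim/optim_args, e.g.:
# trainer = Trainer(model=model, args=training_args, ...,
#                   optimizers=(optimizer, None))
```

Passing `None` as the scheduler lets `Trainer` create its default learning-rate scheduler for the supplied optimizer.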