Hi,
Thanks for open-sourcing the code for your work.
While reviewing the implementation and comparing it with the details in the paper, I noticed a discrepancy regarding the optimizer used for training. The paper states that the model was trained with the AdamW optimizer, but the current codebase appears to use standard Adam. Could you confirm which optimizer was actually used to produce the reported results, and update either the code or the paper accordingly?
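For context on why this matters: Adam with L2 regularization and AdamW are not equivalent, because AdamW decouples the weight decay from the adaptive gradient scaling (Loshchilov & Hutter). If the repo uses PyTorch, the fix would likely be a swap from `torch.optim.Adam(..., weight_decay=...)` to `torch.optim.AdamW` (an assumption on my part, since I haven't confirmed the framework). A minimal pure-Python sketch of a single step under both rules, to illustrate the difference:

```python
import math

def adam_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01, t=1):
    # Standard Adam with L2 regularization: the decay term is folded into
    # the gradient, so it also passes through the adaptive scaling.
    g = g + wd * w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps), m, v

def adamw_step(w, g, m, v, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01, t=1):
    # AdamW: weight decay is decoupled and applied directly to the weights,
    # outside the adaptive update.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * (m_hat / (math.sqrt(v_hat) + eps) + wd * w), m, v

# Same weight and gradient, one step of each rule: the resulting
# parameters differ, so the two optimizers are not interchangeable.
w_adam, _, _ = adam_step(1.0, 0.5, 0.0, 0.0)
w_adamw, _, _ = adamw_step(1.0, 0.5, 0.0, 0.0)
print(w_adam, w_adamw)
```

With nonzero weight decay the two updates diverge over training, so results reported with AdamW generally cannot be reproduced by the Adam code path.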