
Conversation

@jaffourt

No description provided.

Comment on lines +80 to +82
grad_x = clamp!(gradient[x], -args.clip, args.clip)
# backprop
x .-= lr .* grad_x
ToucheSir (Member):

In the spirit of promoting best practices, we have https://fluxml.ai/Flux.jl/stable/training/optimisers/#Flux.Optimise.ClipValue and the rest of https://fluxml.ai/Flux.jl/stable/training/optimisers/ for this. I imagine you'd want to use something more sophisticated than plain SGD anyhow.

jaffourt (Author):

@ToucheSir I opted not to use the library methods because this explicit calculation gave better performance. However, in the interest of promoting best practices I agree, and I can switch to the library methods.
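
For reference, the switch to library methods might look roughly like this (a sketch against the Flux.Optimise API linked above; `args.clip`, `lr`, `ps`, and `loss` mirror names from the surrounding PR code and are assumed, not confirmed):

```julia
using Flux
using Flux.Optimise: Optimiser, ClipValue, Descent

# Compose gradient clipping with plain SGD; Descent could be swapped
# for ADAM() or another optimiser as the reviewer suggests.
opt = Optimiser(ClipValue(args.clip), Descent(lr))

# Inside the training loop, replacing the manual clamp/update:
gs = gradient(() -> loss(x, y), ps)
Flux.Optimise.update!(opt, ps, gs)
```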


# logit cross entropy loss function
function loss(x, y)
Flux.reset!(model)
ToucheSir (Member):

Would highly recommend moving this outside of the loss function (i.e. into the training loop).

jaffourt (Author):

@ToucheSir Can you elaborate on this? My understanding is that the loss function is called for each batch within a single training loop, so for a sequential language model such as this one we want to reset the hidden state after each batch.

ToucheSir (Member):

I presume the suggestion is to keep it inside the for batch in data_loader loop, but outside the gradient call.
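
Concretely, that placement might look like the following sketch (hypothetical loop; `data_loader`, `ps`, and `opt` are assumed names, and `loss` here no longer calls `Flux.reset!` itself):

```julia
for (x, y) in data_loader
    # Reset the recurrent hidden state once per batch...
    Flux.reset!(model)
    # ...but outside the code being differentiated.
    gs = gradient(() -> loss(x, y), ps)
    Flux.Optimise.update!(opt, ps, gs)
end
```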

hold_out = zip(x_train[end-5:end], y_train[end-5:end])

# used for updating hyperparameters
best_val_loss = nothing
ToucheSir (Member):

Suggested change:
- best_val_loss = nothing
+ local best_val_loss
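
The intent of the suggestion is to declare the variable in the enclosing scope without a `nothing` sentinel, assigning it on the first validation pass instead. A hypothetical sketch of that pattern (assuming this sits inside a training function, with `hold_out`, `loss`, and `epochs` from the surrounding code):

```julia
local best_val_loss  # declared but undefined until first assignment

for epoch in 1:epochs
    # ... training ...
    val_loss = sum(loss(x, y) for (x, y) in hold_out)
    # @isdefined avoids comparing against a `nothing` sentinel
    if !(@isdefined best_val_loss) || val_loss < best_val_loss
        best_val_loss = val_loss
        # update hyperparameters / save checkpoint here
    end
end
```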

Co-authored-by: Brian Chen <[email protected]>
@mcabbott mcabbott added the new model new arrival in the zoo label Dec 11, 2022