Open
Description
In train_AREL.py, the reward for training the generator is calculated as:
rewards = Variable(gen_score.data - 0 * normed_seq_log_probs.data)
Why do you subtract 0 * normed_seq_log_probs.data? With a coefficient of 0 the term has no effect on the reward. In the commit history, I noticed that you previously used 0.0001 * normed_seq_log_probs.data.
I think this line corresponds to Eq. (9) in the original paper, and normed_seq_log_probs might be log π(W), so the coefficient should be 1. Could you explain your reasoning?
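To make the question concrete, here is a minimal sketch of how the coefficient changes the reward. It is plain Python rather than the repository's PyTorch code; the function name `compute_rewards`, the parameter `coeff`, and the numeric values are hypothetical and chosen only for illustration.

```python
def compute_rewards(gen_score, normed_seq_log_probs, coeff):
    """Return gen_score - coeff * normed_seq_log_probs, element-wise.

    Hypothetical helper mirroring the form of the line in train_AREL.py;
    it is not the repository's implementation.
    """
    return [s - coeff * lp for s, lp in zip(gen_score, normed_seq_log_probs)]


# Made-up example values (assumptions, not taken from any training run).
gen_score = [0.5, -0.25]
normed_seq_log_probs = [-2.0, -4.0]

# Compare the three coefficients discussed above: 0, 0.0001, and 1.
for coeff in (0.0, 0.0001, 1.0):
    print(coeff, compute_rewards(gen_score, normed_seq_log_probs, coeff))
```

With `coeff = 0.0` the rewards equal `gen_score` exactly, so the log-probability term is effectively disabled. With `coeff = 1.0` the log-probability term has the same weight as the score, which is what I would expect if the line is meant to follow Eq. (9).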