Open
Description
According to the paper, the reward is D(y') - MSE. This is confusing: the reward should reflect how well the Generator fools the Discriminator, i.e. how close D(y) and D(y') are, rather than the absolute value of D(y'). Since the discriminator outputs are not scaled, D(y') can keep increasing without bound. Shouldn't the reward be something like 1 / ||D(y') - D(y)||?
Can you elaborate on this point or provide some reference for this?
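
To make the concern concrete, here is a minimal sketch comparing the two reward definitions on toy numbers. It is not taken from the repository or the paper: the function names `paper_reward` and `gap_reward`, the `eps` term, and the sample scores are all assumptions made for illustration, with D treated as returning an unbounded scalar.

```python
import numpy as np

def paper_reward(d_fake, y_pred, y_true):
    """Reward as described above: D(y') - MSE(y', y)."""
    mse = float(np.mean((np.asarray(y_pred) - np.asarray(y_true)) ** 2))
    return float(d_fake) - mse

def gap_reward(d_fake, d_real, eps=1e-8):
    """Proposed alternative: 1 / |D(y') - D(y)|.

    eps is an assumption added here to avoid division by zero
    when the two scores coincide.
    """
    return 1.0 / (abs(float(d_fake) - float(d_real)) + eps)

# Toy data: a perfect prediction, so the MSE term is zero.
y_true = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.0, 2.0, 3.0])

# Hypothetical unscaled discriminator scores: the real score is fixed
# while the fake score keeps growing.
d_real = 2.0
for d_fake in (1.0, 2.5, 10.0, 100.0):
    print(d_fake,
          paper_reward(d_fake, y_pred, y_true),
          gap_reward(d_fake, d_real))
```

With these numbers, the first reward grows linearly with D(y') regardless of D(y), while the second peaks when the two scores are close and shrinks as they drift apart. Note that the second form is itself unbounded as D(y') approaches D(y), so it would likely need clipping or a larger `eps` in practice.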