Intuitive understanding of the algorithm? #27
Hey authors! I find your KTO paper quite interesting and would like to explore its application in my work. I am here to see if I can get a better intuitive understanding of the algorithm, especially how it compares with methods such as PPO or DPO. I could be wrong or may have missed some key points in the paper, and would appreciate it if you could point them out!
Here are some of my questions:
- Why that specific form of $r_\theta$? I didn't find any discussion of the relationship between human utility and the preference probability for a pair of sentences (the Bradley-Terry style). To me, the formula for $r_\theta$ just comes out of thin air in Definition 3.4, and a natural question is whether there is a better formulation of $r_\theta$ that gives better results. Although it is explained how this definition compares to classic prospect theory, I find it hard to understand why we should define it in nats like this.
- Why does a biased KL divergence estimate work? It is hard to see that the estimate is "good". The experiments show it works empirically, but what does that mean? Does it mean the estimate is not actually that noisy, or that it is the existence, rather than the value, of the baseline that matters?
- How does KTO intuitively work? Page 6 has a paragraph beginning "Intuitively, KTO works as follows", but does that really hold up given that the KL estimate is noisy and no gradient flows through it? The loss is not penalizing a large KL at all, and a positive KL estimate pushes the model to favor an even larger $r_\theta$. This should only make "the model increases the reward of a desirable example in a blunt manner" even worse.
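To make the last point concrete, here is a minimal sketch of how I read the per-example loss: $\lambda_D(1 - \sigma(\beta(r_\theta - z_{ref})))$ for desirable examples and $\lambda_U(1 - \sigma(\beta(z_{ref} - r_\theta)))$ for undesirable ones, with the KL estimate $z_{ref}$ treated as a detached constant (the function and variable names below are mine, not from your repo):

```python
import math

def sigmoid(t: float) -> float:
    return 1.0 / (1.0 + math.exp(-t))

def kto_loss(r_theta: float, z_ref: float, desirable: bool,
             beta: float = 0.1, lam_d: float = 1.0, lam_u: float = 1.0) -> float:
    """Per-example KTO-style loss as I understand it.

    r_theta: log pi_theta(y|x) - log pi_ref(y|x), in nats.
    z_ref:   the (biased, batch-level) KL estimate. Treated here as a plain
             constant, mirroring the detach: no gradient flows through it,
             so it only shifts the argument of the sigmoid.
    """
    if desirable:
        return lam_d * (1.0 - sigmoid(beta * (r_theta - z_ref)))
    return lam_u * (1.0 - sigmoid(beta * (z_ref - r_theta)))
```

Since `z_ref` is a constant shift inside the sigmoid, a larger KL estimate makes the loss on a desirable example larger at the same `r_theta`, which is exactly why I'd expect the model to respond by pushing `r_theta` even higher rather than shrinking the KL.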
Thanks for reading, and I look forward to hearing back!