Making sure credence is properly calibrated and conveyed using hedge words #848

@ChristopherKing42

Description

One of the complaints about ChatGPT is its overconfidence. ChatGPT is probably better than previous assistants in this regard, but I have an idea for how Open Assistant might be able to do better!

  1. We train a small model to predict the credence a claim conveys, i.e. "how likely does the speaker think this is true, expressed as a probability?". A potential starting point is the research described here.
  2. The assistant's neural network will generate a credence for each claim. We train these two aspects (at the same time as the RLHF step):
    1. The wording of the claim is such that its credence, as judged by the model in (1), is close to the credence generated by the assistant.
    2. The credence itself is calibrated. This means, for example, that 80% of the assistant's 80%-credence claims are correct. The scoring rule is just log(p) if the claim is true and log(1-p) if the claim is false (i.e. it is just cross-entropy; a sketch follows this list). (We ask the human whether the claim is true during the human feedback phase.) This scoring rule incentivizes both better knowledge and more accurate credence.
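
A minimal sketch of the scoring rule in (2.2), assuming each claim comes with a credence in (0, 1) from the assistant and a binary human verdict; the function name and tensor shapes are hypothetical, not existing Open Assistant code:

```python
import torch

def credence_calibration_loss(credence: torch.Tensor, is_true: torch.Tensor) -> torch.Tensor:
    """Log scoring rule (cross-entropy) over per-claim credences.

    credence: predicted probability that each claim is true, shape (num_claims,)
    is_true:  human verdict per claim, 1.0 if judged true else 0.0, same shape
    """
    eps = 1e-7  # keep log() finite at the boundaries
    credence = credence.clamp(eps, 1.0 - eps)
    # Score log(p) for true claims and log(1 - p) for false ones;
    # negating the mean gives the cross-entropy loss to minimize.
    log_score = is_true * credence.log() + (1.0 - is_true) * (1.0 - credence).log()
    return -log_score.mean()
```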

The important bit about credence calibration is that it gives a very large punishment when a high-credence claim is incorrect. So even though humans typically prefer confident claims, the assistant still learns to hedge its bets to avoid the possibility of a large credence penalty. (The reward for correct claims is slightly higher at high credences, though, so it is still optimal to give high credence to obvious claims.)
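
To make that asymmetry concrete (illustrative arithmetic only, not from the proposal): raising a claim's credence from 0.8 to 0.99 gains only about 0.21 in log score if the claim is correct, but costs about 3.0 extra if it is wrong.

```python
import math

# Log score for a claim stated with credence p:
#   log(p)      if the claim turns out to be true
#   log(1 - p)  if it turns out to be false
for p in (0.6, 0.8, 0.95, 0.99):
    print(f"p={p:.2f}  correct: {math.log(p):+.3f}  incorrect: {math.log(1 - p):+.3f}")
```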

(A question is whether we treat the entire response as a single claim, or split it up using NLP (perhaps part of the model in (1)).)
