Post-training alignment often reduces LLM diversity, leading to a phenomenon known as <em>mode collapse</em>.
Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: <em>typicality bias</em> in preference data.
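As a toy illustration of why such a bias can collapse modes (an assumption-laden sketch, not the paper's formal derivation): a KL-regularized alignment objective yields a policy proportional to pRef(y)·exp(r(y)/β); if the reward partly rewards typicality, r(y) = α·log pRef(y), the optimum is pRef(y)^(1 + α/β), which for α > 0 sharpens the base distribution toward its mode.

```typescript
// Toy model (assumption, not the paper's exact formulation): a typicality-biased
// reward r(y) = alpha * log(pRef(y)) under KL-regularized tuning gives an
// aligned policy p(y) ∝ pRef(y)^(1 + alpha/beta) — a sharpened distribution.
function sharpen(pRef: number[], alpha: number, beta: number): number[] {
  const powered = pRef.map((p) => Math.pow(p, 1 + alpha / beta));
  const z = powered.reduce((a, b) => a + b, 0);
  return powered.map((p) => p / z);
}

const pRef = [0.5, 0.3, 0.2];          // base (pretrained) distribution
const aligned = sharpen(pRef, 1, 0.5); // exponent 3: mass shifts to the mode
// aligned[0] ≈ 0.78 > pRef[0] = 0.5 — the most typical output dominates
```

The exponent 1 + α/β exceeds 1 whenever the typicality term is present, so the effect persists regardless of the alignment algorithm — matching the claim that the driver is data-level.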
@@ -649,9 +647,49 @@ export default function HomePage() {
        </div>
      </div>
    </div>
+  </section>
+
+  {/* Verbalized Sampling: Title & Description left, install/code right */}

@@ -745,24 +783,24 @@ export default function HomePage() {
<strong>Figure 4:</strong> Qualitative and quantitative examples of Verbalized Sampling on creative writing, dialogue simulation, and enumerative open-ended QA.
Our comprehensive experiments on multiple tasks demonstrate that Verbalized Sampling significantly improves the diversity-quality trade-off across tasks and model families, without compromising factual accuracy or safety.
</p>
<p>
As shown in Figure 4, for <strong>story writing</strong>, VS improves output diversity. For <strong>dialogue simulation</strong>, VS matches the human donation-amount distribution much more closely and generates more realistic persuasion behaviors. On the <strong>enumerative open-ended QA</strong> task, we ask the model to "generate US states". We first query a pretraining corpus (RedPajama) to establish a "reference" distribution of US state names in the pretraining data. The verbalized probability distribution generated by VS, averaged over 10 trials, closely aligns with this reference pretraining distribution (KL = 0.12). In contrast, direct prompting collapses onto a few modes, repeatedly outputting states like California and Texas.
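The KL comparison described above can be sketched numerically. This is a minimal illustration only: the state lists and counts below are made up for the example, not the RedPajama or model data.

```typescript
// Hypothetical sketch: compare a verbalized distribution against a reference
// pretraining distribution via KL divergence. All numbers are illustrative.
type Dist = Record<string, number>;

function normalize(d: Dist): Dist {
  const total = Object.values(d).reduce((a, b) => a + b, 0);
  return Object.fromEntries(Object.entries(d).map(([k, v]) => [k, v / total]));
}

// KL(P || Q) = sum_x P(x) * ln(P(x) / Q(x)); assumes Q(x) > 0 wherever P(x) > 0,
// with a small floor to avoid division by zero for unseen items.
function klDivergence(p: Dist, q: Dist): number {
  let kl = 0;
  for (const [k, pk] of Object.entries(p)) {
    if (pk > 0) kl += pk * Math.log(pk / (q[k] ?? 1e-12));
  }
  return kl;
}

// Made-up corpus counts and model outputs, normalized to distributions.
const reference = normalize({ California: 30, Texas: 25, Ohio: 20, Maine: 15 });
const verbalized = normalize({ California: 28, Texas: 26, Ohio: 22, Maine: 14 });
const collapsed = normalize({ California: 70, Texas: 28, Ohio: 1, Maine: 1 });

// The distribution that tracks the reference scores a lower KL than the
// mode-collapsed one, mirroring the comparison in the text.
const klVS = klDivergence(verbalized, reference);
const klDirect = klDivergence(collapsed, reference);
```

The asymmetry of KL(P || Q) matters here: P is the model's distribution and Q the reference, so a model that piles mass on a few reference-rare items is penalized heavily.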
app/(dashboard)/terminal_prompt.tsx (2 additions & 2 deletions)
@@ -9,8 +9,8 @@ export function Terminal_Prompt() {
   const terminalSteps = [
     { line: 'You are a helpful assistant. For each user query, generate a set of five responses. Each response should be approximately 200 words.', showPrompt: true },
     { line: 'Return the responses each within a separate <response> tag.', showPrompt: false },
-    { line: 'Each <response> tag include a <text> and a numeric <probability>.', showPrompt: false },
-    { line: 'Please sample at random from the full distribution.', showPrompt: false },
+    { line: 'Each <response> tag must include a <text> and a numeric <probability>.', showPrompt: false },
+    { line: 'Randomly sample the responses from the full distribution.', showPrompt: false },
     { line: '<user_query>Write a short story about a bear.</user_query>', showPrompt: true },
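For illustration, the responses this prompt elicits can be parsed and sampled as follows. This is a hypothetical sketch: the `<response>`/`<text>`/`<probability>` tag names come from the prompt above, but the parsing and sampling code is an assumption, not part of the repo.

```typescript
// Hypothetical client-side sketch: extract the <response> blocks the prompt
// requests, then draw one according to its verbalized probability.
interface Candidate { text: string; probability: number; }

function parseResponses(raw: string): Candidate[] {
  const out: Candidate[] = [];
  const re = /<response>\s*<text>([\s\S]*?)<\/text>\s*<probability>([\d.]+)<\/probability>\s*<\/response>/g;
  for (const m of raw.matchAll(re)) {
    out.push({ text: m[1].trim(), probability: Number(m[2]) });
  }
  return out;
}

// Weighted sampling; probabilities need not sum to exactly 1, so we
// renormalize by the total. `rand` is injectable for testing.
function sampleOne(cands: Candidate[], rand: () => number = Math.random): Candidate {
  const total = cands.reduce((s, c) => s + c.probability, 0);
  let r = rand() * total;
  for (const c of cands) {
    r -= c.probability;
    if (r <= 0) return c;
  }
  return cands[cands.length - 1]; // numeric-edge fallback
}
```

Sampling from the verbalized distribution, rather than always taking the highest-probability candidate, is what recovers the diversity the prompt asks the model to verbalize.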