Commit 5556348

sync link and the title, etc
1 parent 1b552a3 commit 5556348

4 files changed

+13
-240
lines changed

@

Whitespace-only changes.

app/(dashboard)/layout.tsx

Lines changed: 2 additions & 2 deletions
@@ -48,7 +48,7 @@ function UserMenu() {
 </Link>
 </Button>
 <Button asChild className="rounded-full">
-<Link href="https://arxiv.org/pdf/2502.00640" className="flex items-center gap-2">
+<Link href="https://arxiv.org/pdf/2510.01171" className="flex items-center gap-2">
 <FileText size={16} />
 Paper
 </Link>
@@ -63,7 +63,7 @@ function Header() {
 <div className="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8 py-4 flex justify-between items-center">
 <Link href="/" className="flex items-center">
 <CircleIcon className="h-6 w-6 text-orange-500" />
-<span className="ml-2 text-xl font-semibold text-gray-900">CollabLLM</span>
+<span className="ml-2 text-xl font-semibold text-gray-900">Verbalized Sampling</span>
 </Link>
 <div className="flex items-center space-x-4">
 <Suspense fallback={<div className="h-9" />}>

app/(dashboard)/page.tsx

Lines changed: 11 additions & 238 deletions
@@ -547,33 +547,30 @@ export default function HomePage() {
 {/* Paper Title and Authors Section */}
 <div className="text-center">
 <h1 className="text-4xl font-bold text-gray-700 tracking-tight sm:text-5xl mb-6">
-CollabLLM: From Passive Responders to Active Collaborators
+Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity
 </h1>
 
 <div className="text-xl text-gray-600 mb-2 max-w-5xl mx-auto leading-relaxed">
 <div className="mb-1">
-<a href="https://cs.stanford.edu/~shirwu/" className="text-blue-400 hover:text-blue-400 transition-colors">Shirley Wu</a><sup className="text-orange-500">1</sup>,{' '}
-<a href="https://www.microsoft.com/en-us/research/people/mgalley/" className="text-blue-400 hover:text-blue-400 transition-colors">Michel Galley</a><sup className="text-orange-500">2</sup>,{' '}
-<a href="https://www.microsoft.com/en-us/research/people/baolinpeng/" className="text-blue-400 hover:text-blue-400 transition-colors">Baolin Peng</a><sup className="text-orange-500">2</sup>,{' '}
-<a href="https://sites.google.com/site/hcheng2site" className="text-blue-400 hover:text-blue-400 transition-colors">Hao Cheng</a><sup className="text-orange-500">2</sup>,{' '}
-<a href="https://scholar.google.com/citations?user=jJglcU8AAAAJ&hl=en" className="text-blue-400 hover:text-blue-400 transition-colors">Gavin Li</a><sup className="text-orange-500">1</sup>,{' '}
-<a href="https://yao-dou.github.io/" className="text-blue-400 hover:text-blue-400 transition-colors">Yao Dou</a><sup className="text-orange-500">3</sup>,{' '}
-<a href="https://www.linkedin.com/in/wilsoncai" className="text-blue-400 hover:text-blue-400 transition-colors">Weixin Cai</a><sup className="text-orange-500">1</sup>
+<a href="https://jiayizx.github.io/" className="text-blue-400 hover:text-blue-400 transition-colors">Jiayi Zhang</a><sup className="text-orange-500">1</sup>,{' '}
+<a href="https://simonucl.github.io/" className="text-blue-400 hover:text-blue-400 transition-colors">Simon Yu</a><sup className="text-orange-500">1</sup>,{' '}
+<a href="https://www.linkedin.com/in/derekch" className="text-blue-400 hover:text-blue-400 transition-colors">Derek Chong</a><sup className="text-orange-500">2</sup>,{' '}
+<a href="https://anthonysicilia.tech/" className="text-blue-400 hover:text-blue-400 transition-colors">Anthony Sicilia</a><sup className="text-orange-500">3</sup>
 </div>
 <div>
-<a href="https://www.james-zou.com/" className="text-blue-400 hover:text-blue-400 transition-colors">James Zou</a><sup className="text-orange-500">1</sup>,{' '}
-<a href="https://cs.stanford.edu/people/jure/" className="text-blue-400 hover:text-blue-400 transition-colors">Jure Leskovec</a><sup className="text-orange-500">1</sup>,{' '}
-<a href="https://www.microsoft.com/en-us/research/people/jfgao/" className="text-blue-400 hover:text-blue-400 transition-colors">Jianfeng Gao</a><sup className="text-orange-500">2</sup>
+<a href="https://tomz.people.stanford.edu/" className="text-blue-400 hover:text-blue-400 transition-colors">Michael R. Tomz</a><sup className="text-orange-500">2</sup>,{' '}
+<a href="https://nlp.stanford.edu/~manning/" className="text-blue-400 hover:text-blue-400 transition-colors">Christopher D. Manning</a><sup className="text-orange-500">2</sup>,{' '}
+<a href="https://wyshi.github.io/" className="text-blue-400 hover:text-blue-400 transition-colors">Weiyan Shi</a><sup className="text-orange-500">1</sup>
 </div>
 </div>
 
 <div className="text-xl text-black-500 mb-4">
-<sup className="text-orange-500">1</sup>Stanford University, <sup className="text-orange-500">2</sup>Microsoft, <sup className="text-orange-500">3</sup>Georgia Tech
+<sup className="text-orange-500">1</sup>Northeastern University, <sup className="text-orange-500">2</sup>Stanford University, <sup className="text-orange-500">3</sup>West Virginia University
 </div>
 
-<div className="text-xl font-bold">
+{/* <div className="text-xl font-bold">
 <span className="text-orange-600">ICML 2025 Outstanding Paper</span>
-</div>
+</div> */}
 </div>
 </div>
 </section>
@@ -746,231 +743,7 @@ export default function HomePage() {
 </div>
 </section>
 
-{/* Blog Section */}
-<section id="blog" className="py-16 bg-white">
-<div className="max-w-7xl mx-auto px-4 sm:px-6 lg:px-8">
-{/* Blog Header */}
-<div className="text-center mb-12">
-<h2 className="text-3xl font-bold text-gray-700 sm:text-4xl mb-3">
-From the Blog
-</h2>
-<p className="text-lg text-gray-500">
-Insights and updates from our research team
-</p>
-</div>
-
-{/* Blog Post */}
-<article className="max-w-none">
-{/* Blog Post Header */}
-<div className="mb-8">
-<h1 className="text-3xl font-bold text-gray-700 sm:text-4xl mb-4 font-serif">
-Building the Future of Collaborative AI: Our Journey with CollabLLM
-</h1>
-<div className="flex items-center gap-6 text-gray-500">
-<div className="flex items-center gap-2">
-<Calendar size={16} />
-<span className="text-sm">June 12, 2025</span>
-</div>
-<div className="flex items-center gap-2">
-<Clock size={16} />
-<span className="text-sm">6 min read</span>
-</div>
-<div className="flex items-center gap-2">
-<User size={16} />
-<span className="text-sm">Shirley Wu, Michel Galley</span>
-</div>
-<ViewerSystem />
-
-</div>
-
-{/* Viewer System Component */}
-</div>
-
-{/* Blog Content */}
-<div className="">
-<div className="prose prose-lg max-w-none font-serif leading-relaxed text-gray-500">
-<p className="text-xl text-gray-500 mb-6 italic">
-"The future of AI isn't just about making models smarter—it's about making them truly collaborative partners in human endeavors."
-</p>
-
-<h2 className="text-2xl font-bold text-gray-700 mt-10 mb-4">
-The Challenge We Set Out to Solve
-</h2>
-
-
-<p className="mb-4 text-lg">
-When we first started working with large language models, we noticed something puzzling. We saw that these models were incredibly capable. However, we all experienced a particular kind of frustration, illustrated perfectly by this example from{' '}
-<a
-href="https://www.platformer.news/openai-operator-ai-agent-hands-on/"
-className="text-blue-800 hover:text-blue-800 underline"
-target="_blank"
-rel="noopener noreferrer"
->
-Casey Newton
-</a>:
-</p>
-
-
-{/* Quote Box */}
-<blockquote className="border-l-4 border-gray-500 pl-6 py-4 mb-6 bg-gray-50 rounded-r-lg italic">
-<p className="text-lg text-gray-500 mb-3">
-My most frustrating experience with Operator was my first one: trying to order groceries. </p>
-<p className="text-lg text-gray-500 mb-3">
-<em>“Help me buy groceries on Instacart,”</em> I said, expecting it to ask me some basic questions:
-Where do I live? What store do I usually buy groceries from? What kinds of groceries do I want?
-</p>
-<p className="text-lg text-gray-500">
-It didn’t ask me any of that. Instead, Operator opened Instacart in a browser tab and began
-searching for milk in grocery stores located in Des Moines, Iowa.
-</p>
-</blockquote>
-
-<p className="mb-4 text-lg text-gray-500">
-It’s genuinely surprising: one of the <strong>smartest LLMs</strong>—capable of solving graduate-level math problems—
-can still fail at basic human communication.
-</p>
-
-<p className="mb-4 text-lg text-gray-500">
-<strong>This is not a minor flaw.</strong> LLMs that lack effective communication skills pose challenges across key dimensions:
-<span className="italic"> performance, safety, and efficiency</span>.
-Ask yourself:
-</p>
-
-<ul className="list-disc list-inside mb-4 text-lg text-gray-500">
-<li>How can we get satisfactory results if LLMs make assumptions about our preferences?</li>
-<li>How reliable is it to consult AI on healthcare, legal, or financial decisions?</li>
-<li>How much time and patience are we expected to waste just trying to get our point across?</li>
-</ul>
-
-<p className="mb-6 text-lg text-gray-500">
-The problem runs deeper. We typically evaluate LLMs in <strong>simple, sanitized test environments</strong>—single-turn prompts with clear, unambiguous instructions. But is that how real communication works?
-</p>
-
-<p className="mb-6 text-lg text-gray-500">
-In real life, solving meaningful problems requires <strong>collaboration, iteration, and contextual awareness</strong>. Moreover, if humans and LLMs are going to tackle groundbreaking problems together, AI systems can't just passively respond to human requests—they need to actively <strong>stimulate human creativity</strong> and guide the collaborative process.
-</p>
-<p className="mb-6 text-lg text-gray-500">
-That’s why we’re introducing <span className="font-semibold text-black-500">CollabLLM</span>:
-a framework designed to unlock the potential of human-AI collaboration by enabling LLMs to act
-as <em>active, collaborative partners</em> rather than passive responders.
-</p>
-<h2 className="text-2xl font-bold text-gray-700 mt-10 mb-4">
-Our Breakthrough Approach
-</h2>
-
-<p className="mb-4 text-lg">
-The core idea behind CollabLLM is simple: in a multi-turn interaction, what matters most is not how good a single response is—but how it affects the rest of the conversation.
-</p>
-
-<p className="mb-4 text-lg">
-Take this scene from{' '}
-<a
-href="https://www.youtube.com/watch?v=7fbaP2YjJ40&t=245s"
-className="text-blue-800 hover:text-blue-800 underline"
-target="_blank"
-rel="noopener noreferrer"
->
-<em>Friends</em> (4:05 in the YouTube clip)
-</a>{' '}
-<a
-href="https://www.bilibili.com/video/BV1vJ4m1j7zF/?spm_id_from=333.337.search-card.all.click"
-className="text-blue-800 hover:text-blue-800 underline"
-target="_blank"
-rel="noopener noreferrer"
->
-/ (1:42 in the Bilibili clip)
-</a>
-: Rachel and Joey are talking about dating strategies. Rachel asks a seemingly simple question:
-<em>"So, where'd you grow up?"</em> Joey immediately mocks her—<em>"That's your move?"</em>—implying the question is naive.
-But a few turns later, his tone changes. He's genuinely impressed: <em>"Wow!"</em>—because the question led him to open up and connect. The key insight? <strong>What matters isn't how a response is judged in the moment, but how it shapes the entire conversation.</strong>
-</p>
-
-<p className="mb-4 text-lg">
-Now imagine a model that chooses to ask a clarifying question instead of giving a direct answer. Standard reinforcement learning from human feedback (RLHF) might penalize that—it didn't provide information right away. But if the question helps uncover useful context that improves the conversation downstream, shouldn't it be rewarded?
-</p>
-
-{/* Key Concept Highlight */}
-<div className="bg-blue-50 border-l-4 border-blue-400 p-6 my-6 rounded-r-lg">
-<p className="text-lg text-gray-800 mb-0">
-That's exactly what CollabLLM does. We define a new reward function that measures the <strong>causal effect</strong> of a model's response on the future trajectory of a conversation. We call this the <strong>Multiturn-aware Reward (MR)</strong>. It evaluates a single model action based on its longer-term impact—not just immediate helpfulness.
-</p>
-</div>
-
-<p className="mb-4 text-lg">
-<strong>Quiz:</strong> is asking a question always better than giving an answer? The answer is—not necessarily. It depends entirely on the objective.
-In most real-world situations, repeatedly asking questions without making progress is inefficient, because the ultimate goal remains unmet.
-But take the game <em>20 Questions</em> as an example—where the objective is to guess what someone is thinking by asking a limited number of yes/no questions.
-In that case, asking questions is essential, and giving an answer too early would break the format and defeat the purpose of the game.
-This is where Multiturn-aware Reward (MR) comes in: it allows the model to adapt its behavior based on the context, learning <em>when</em> to ask and <em>when</em> to answer—depending entirely on what the task requires.
-</p>
-
-<p className="mb-4 text-lg">
-Now, going back to the <em>Friends</em> example with Rachel and Joey—how do we measure the value of Rachel's question over the course of a conversation? We need two components:<br />
-1) A <strong>user simulator</strong> to generate realistic follow-up responses (e.g., what Joey might say next), and<br />
-2) An <strong>evaluator</strong> to judge whether the interaction is successful—such as whether Joey becomes more romantically engaged.
-</p>
-
-<p className="mb-4 text-lg">
-Fortunately, both parts are quite feasible. First, the model you're training—let's call it "Rachel"—serves as the policy model generating responses. To simulate realistic dialogue, we prompt another model to act as "Joey," a proxy for the user. While inspired by our earlier example, "Joey" can represent <strong>any user simulator</strong>: a shopper trying to order groceries, a student asking math questions, or a writer seeking feedback. Second, we define task-specific metrics to evaluate success. In the dating example, it might be emotional engagement; in writing, it could be clarity or persuasiveness; in a question-answering task, it might be factual correctness. These evaluation criteria can even be combined—it's entirely up to your application!
-</p>
-
-<p className="mb-4 text-lg">
-With Multiturn-aware Reward in place, the goal becomes straightforward: train the policy model to maximize this reward. In doing so, the model learns to drive the conversation effectively toward the desired outcome—whether that's solving a task, clarifying a request, or building rapport.
-</p>
-
-{/* Closing Statement */}
-<div className="bg-green-50 border-l-4 border-green-600 p-4 my-6 rounded-r-lg">
-<p className="text-lg text-gray-800 mb-0">
-After all, you don't need massive changes to build a collaborative model. Just a new way to define the objective—and a longer lens for measuring what matters in a conversation.
-</p>
-</div>
-
-
-<h2 className="text-2xl font-bold text-gray-700 mt-10 mb-4">
-Real-World Impact
-</h2>
-
-<p className="mb-4 text-gray-500 text-lg">
-The applications of collaborative AI are vast and exciting. From working on document editing to solving complex scientific problems, CollabLLM opens up new possibilities for human-AI collaboration.
-</p>
-
-<p className="mb-6 text-gray-500 text-lg">
-We've seen remarkable results in our initial testing, with collaborative LLMs outperforming non-collaboratively trained LLMs across various benchmarks. More importantly, users report a more efficient, engaging, and reliable interaction experience when working with the collaborative LLMs.
-</p>
 
-<h2 className="text-2xl font-bold text-gray-700 mt-10 mb-4">
-What's Next?
-</h2>
-
-<p className="mb-4 text-lg">
-We're continuously refining our approach, exploring new collaboration patterns. Our goal is to democratize collaborative AI and enable anyone to build more effective AI-powered solutions.
-</p>
-
-<p className="text-lg font-medium text-gray-500 mb-8">
-Join us in building the future of collaborative AI. Check out our code, contribute to the project, and help us shape the next generation of AI systems that truly understand the power of working together.
-</p>
-</div>
-
-{/* Call to Action */}
-<div className="mt-8 pt-6 border-t border-gray-200">
-<div className="flex flex-col sm:flex-row gap-4 justify-center">
-<a href="https://github.com/Wuyxin/collabllm.git" target="_blank">
-<Button size="lg" className="rounded-full">
-Explore the Code
-<ArrowRight className="ml-2 h-5 w-5" />
-</Button>
-</a>
-<a href="https://arxiv.org/pdf/2502.00640" target="_blank">
-<Button size="lg" variant="outline" className="rounded-full">
-Read Our Paper
-</Button>
-</a>
-</div>
-</div>
-</div>
-</article>
-</div>
-</section>
 </main>
 );
 }

next

Whitespace-only changes.
