
Commit d8ef452

Evan Frick authored and committed
update
1 parent 3aaa4eb commit d8ef452

File tree

1 file changed: +6 −3 lines changed


index.html

Lines changed: 6 additions & 3 deletions
@@ -45,7 +45,7 @@ <h1 class="header">About</h1>
   <br>
   <p>My research focuses on Reinforcement Learning with Human Feedback (RLHF) for fine-tuning LLMs. Currently, much of my efforts revolve around reward model training and benchmarking.</p>
   <br>
-  <p>I am also a Research Engineer at <a href="https://nexusflow.ai/">Nexusflow</a>, where I work on training LLMs like <a href="https://huggingface.co/Nexusflow/Athene-70B">Athene-70B</a>. I also work with <a href="https://lmsys.org/">LMSYS</a>, mainly on analyzing <a href="https://lmarena.ai/">Chatbot Arena</a> and building LLM benchmarks.</p>
+  <p>I am also a Research Engineer at <a href="https://nexusflow.ai/">Nexusflow</a>, where I work on training LLMs like <a href="https://huggingface.co/Nexusflow/Athene-70B">Athene-70B</a>. I also work with <a href="https://blog.lmarena.ai/about/">Chatbot Arena</a>, mainly on modeling human preferences and building LLM/RM benchmarks.</p>
   <!-- </div> -->
   </div>
   </div>
@@ -54,10 +54,13 @@ <h1 class="header">About</h1>
   <section id="publications">
   <h1 class="header">Select Publications</h1>
   <div class="hero">
-  <b><a href="https://openreview.net/pdf?id=GqDntYTTbk">Starling-7B: Improving Helpfulness and Harmlessness with RLAIF. [COLM]</a></b>
+  <b><a href="https://arxiv.org/abs/2410.14872">How to Evaluate Reward Models for RLHF [In Review]</a></b>
+  <p>Evan Frick, Tianle Li, Connor Chen, Wei-Lin Chiang, Anastasios N. Angelopoulos, Jiantao Jiao, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica. (2024).</p>
+  <br>
+  <b><a href="https://openreview.net/pdf?id=GqDntYTTbk">Starling-7B: Improving Helpfulness and Harmlessness with RLAIF. [COLM Spotlight]</a></b>
   <p>Banghua Zhu, Evan Frick, Tianhao Wu, Hanlin Zhu, Karthik Ganesan, Wei-Lin Chiang, Jian Zhang, and Jiantao Jiao. (2024).</p>
   <br>
-  <b><a href="https://arxiv.org/abs/2406.11939">From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline. [Neurips: In Review]</a></b>
+  <b><a href="https://arxiv.org/abs/2406.11939">From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline. [In Review]</a></b>
   <p>Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica. (2024).</p>
   <br>
   <b><a href="https://nexusflow.ai/blogs/athene"> Athene-70B: Redefining the Boundaries of Post-Training for Open Models.</a></b>
