about.html
Lines changed: 2 additions & 2 deletions
@@ -10,7 +10,7 @@
10
10
<header>
11
11
<nav>
12
12
[<a href="index.html">Home</a>]
13
-
[<a href="#">Software Arena</a>]
13
+
[<a href="#">SWE Arena</a>]
14
14
</nav>
15
15
</header>
16
16
@@ -71,7 +71,7 @@ <h2>Sponsors</h2>
71
71
Computer Intelligence Project is supported by donations from the following institutions: E2B, Hugging Face, IBM, and CSIRO's Data61.
72
72
</p>
73
73
<p>
74
-
We also thank the following companies for providing API credits to serve their models on Software Arena: Alibaba, and IBM.
74
+
We also thank the following companies for providing API credits to serve their models on SWE Arena: Alibaba, and IBM.
75
75
</p>
76
76
<p>
77
77
We welcome diverse forms of donations and sponsorships, including but not limited to cash, computing devices (e.g., GPUs), and cloud credits. Please contact us at <a href="mailto:[email protected]"><u>[email protected]</u></a> to learn more about sponsorships and benefits.
Software Arena extends Chatbot Arena with powerful code execution capabilities, enabling evaluation of LLM-generated programs across a wide range of outputs - from simple computations to complex visual interfaces.
21
+
SWE Arena extends Chatbot Arena with powerful code execution capabilities, enabling evaluation of LLM-generated programs across a wide range of outputs - from simple computations to complex visual interfaces.
22
22
</p>
23
23
24
-
<h2>What is Software Arena?</h2>
24
+
<h2>What is SWE Arena?</h2>
25
25
<p>
26
-
Software Arena introduces a plug-and-play code execution environment for Chatbot Arena. It enables direct evaluation of LLM capabilities in:
26
+
SWE Arena introduces a plug-and-play code execution environment for Chatbot Arena. It enables direct evaluation of LLM capabilities in:
27
27
</p>
28
28
<ul>
29
29
<li>General-purpose code execution across multiple languages</li>
30
30
<li>Output visualization ranging from text, images, to interactive UIs</li>
31
31
</ul>
32
32
33
-
<h2>Why Software Arena?</h2>
33
+
<h2>Why SWE Arena?</h2>
34
34
<p>
35
-
Software Arena is designed to address the limitations of Chatbot Arena, particularly in terms of precise code evaluation.
35
+
SWE Arena is designed to address the limitations of Chatbot Arena, particularly in terms of precise code evaluation.
36
36
Human judgement on code generation is not always reliable [<a href="https://arxiv.org/abs/2402.11296"><u>1</u></a>, <a href="https://arxiv.org/abs/2410.03837"><u>2</u></a>], and generally requires non-trivial knowledge of the language and its libraries.
37
37
We consider this a significant limitation for the development of advanced AI systems.
38
38
</p>
@@ -42,10 +42,10 @@ <h2>Why Software Arena?</h2>
42
42
<a href="https://support.anthropic.com/en/articles/9487310-what-are-artifacts-and-how-do-i-use-them"><u>Claude Artifacts</u></a> by Anthropic is one of the first features in this space to let users interact with LLM-generated frontend applications.
43
43
<a href="https://v0.dev/"><u>v0</u></a> by Vercel also allows users to ship LLM-generated frontend applications with frontend frameworks.
44
44
Based on this, <a href="https://chatbotarena.ai/webdev"><u>WebDev Arena</u></a> by Chatbot Arena and <a href="https://www.llmcodearena.com/"><u>Code Arena</u></a> by Together AI focus on evaluating LLM-generated frontend applications.
45
-
<strong>Software Arena aims to extend this capability to a wider range of outputs, not just frontend applications, but also programs that can be run on backend servers and data analysis.</strong>
45
+
<strong>SWE Arena aims to extend this capability to a wider range of outputs, not just frontend applications, but also programs that can be run on backend servers and data analysis.</strong>
46
46
</p>
47
47
<h2>Supported Outputs</h2>
48
-
<p>Software Arena can visualize various types of code execution outputs:</p>
48
+
<p>SWE Arena can visualize various types of code execution outputs:</p>
49
49
<ul>
50
50
<li>Documents (Markdown or Plain Text)</li>
51
51
<li>Websites (single HTML webpage)</li>
@@ -61,7 +61,7 @@ <h2>Supported Outputs</h2>
61
61
62
62
<h2>Technical Implementation</h2>
63
63
<p>
64
-
Software Arena builds upon FastChat, the foundation of Chatbot Arena, providing seamless code execution capabilities. The implementation focuses on:
64
+
SWE Arena builds upon FastChat, the foundation of Chatbot Arena, providing seamless code execution capabilities. The implementation focuses on:
65
65
</p>
66
66
<ul>
67
67
<li><strong>Code Execution:</strong> Secure, sandboxed environment using <a href="https://e2b.dev/"><u>E2B</u></a> for executing code in supported language (Python, JavaScript, etc.).</li>
Software Arena aims to deliver several key outcomes:
76
+
SWE Arena aims to deliver several key outcomes:
77
77
</p>
78
78
<ul>
79
79
<li><strong>Leaderboard:</strong> A dynamic Elo rating system tracking LLM performance in execution-based code generation, providing transparent comparisons across different models.</li>
@@ -83,39 +83,39 @@ <h2>Expected Outcomes</h2>
83
83
84
84
<h2>Future Plans</h2>
85
85
<p>
86
-
Software Arena is currently in the early stage of development.
86
+
SWE Arena is currently in the early stage of development.
87
87
We plan to continuously add more features towards the goal of Computer Intelligence.
88
88
</p>
89
89
<p>
90
-
Meanwhile, we are actively working with <a href="https://lmarena.ai/"><u>Chatbot Arena</u></a> to integrate Software Arena into their platform.
90
+
Meanwhile, we are actively working with <a href="https://lmarena.ai/"><u>Chatbot Arena</u></a> to integrate SWE Arena into their platform.
91
91
</p>
92
92
93
93
<h2>Frequently Asked Questions</h2>
94
94
<div class="faq-section">
95
95
<div class="faq-item">
96
-
<div class="faq-question">Why is the code execution process of Software Arena a bit slow?</div>
96
+
<div class="faq-question">Why is the code execution process of SWE Arena a bit slow?</div>
97
97
<div class="faq-answer">
98
-
Before code execution, Software Arena parses the code and installs various packages to ensure the code can be executed.
98
+
Before code execution, SWE Arena parses the code and installs various packages to ensure the code can be executed.
99
99
This is why the code execution process is a bit slow.
100
100
</div>
101
101
</div>
102
102
<div class="faq-item">
103
-
<div class="faq-question">What can not Software Arena do?</div>
103
+
<div class="faq-question">What can not SWE Arena do?</div>
104
104
<div class="faq-answer">
105
-
Currently, Software Arena does not support programming languages other than JavaScript, TypeScript, HTML, and Python.
106
-
In addition, Software Arena can not execute code that use desktop-level UIs (e.g., Tkinter, PyQt, etc.) or take user inputs from the keyboard.
105
+
Currently, SWE Arena does not support programming languages other than JavaScript, TypeScript, HTML, and Python.
106
+
In addition, SWE Arena can not execute code that use desktop-level UIs (e.g., Tkinter, PyQt, etc.) or take user inputs from the keyboard.
107
107
</div>
108
108
</div>
109
109
<div class="faq-item">
110
-
<div class="faq-question">How do I know if Software Arena will use my personal identifiable information (PII)?</div>
110
+
<div class="faq-question">How do I know if SWE Arena will use my personal identifiable information (PII)?</div>
111
111
<div class="faq-answer">
112
-
While Software Arena collects the user input, we will redact the PII (e.g., API keys, etc.) by using <a href="https://huggingface.co/bigcode/starpii"><u>StarPII</u></a>, an NER model that trained on a large-scale code dataset that can identify and mask the PII.
112
+
While SWE Arena collects the user input, we will redact the PII (e.g., API keys, etc.) by using <a href="https://huggingface.co/bigcode/starpii"><u>StarPII</u></a>, an NER model that trained on a large-scale code dataset that can identify and mask the PII.
113
113
</div>
114
114
</div>
115
115
<div class="faq-item">
116
116
<div class="faq-question">Can I contribute to the project?</div>
117
117
<div class="faq-answer">
118
-
Yes! Software Arena is an open-source project, and we welcome contributions. You can find our repository on GitHub and join our community through email. We appreciate help in various areas including development, testing, and documentation.
118
+
Yes! SWE Arena is an open-source project, and we welcome contributions. You can find our repository on GitHub and join our community through email. We appreciate help in various areas including development, testing, and documentation.