Commit 770244a

Merge branch 'lm-sys:main' into main
2 parents eb7bd11 + 587d5cf


43 files changed: +6313 −1963 lines changed

README.md

Lines changed: 33 additions & 6 deletions
@@ -1,9 +1,9 @@
 # FastChat
-| [**Demo**](https://chat.lmsys.org/) | [**Discord**](https://discord.gg/HSWAKCrnFx) | [**X**](https://x.com/lmsysorg) |
+| [**Demo**](https://lmarena.ai/) | [**Discord**](https://discord.gg/6GXcFg3TH8) | [**X**](https://x.com/lmsysorg) |

 FastChat is an open platform for training, serving, and evaluating large language model based chatbots.
-- FastChat powers Chatbot Arena (https://chat.lmsys.org/), serving over 10 million chat requests for 70+ LLMs.
-- Chatbot Arena has collected over 500K human votes from side-by-side LLM battles to compile an online [LLM Elo leaderboard](https://leaderboard.lmsys.org).
+- FastChat powers Chatbot Arena ([lmarena.ai](https://lmarena.ai)), serving over 10 million chat requests for 70+ LLMs.
+- Chatbot Arena has collected over 1.5M human votes from side-by-side LLM battles to compile an online [LLM Elo leaderboard](https://lmarena.ai/?leaderboard).

 FastChat's core features include:
 - The training and evaluation code for state-of-the-art models (e.g., Vicuna, MT-Bench).
@@ -26,7 +26,7 @@ FastChat's core features include:

 </details>

-<a href="https://chat.lmsys.org"><img src="assets/demo_narrow.gif" width="70%"></a>
+<a href="https://lmarena.ai"><img src="assets/demo_narrow.gif" width="70%"></a>

 ## Contents
 - [Install](#install)
@@ -97,7 +97,7 @@ You can use the commands below to chat with them. They will automatically downlo

 ## Inference with Command Line Interface

-<a href="https://chat.lmsys.org"><img src="assets/screenshot_cli.png" width="70%"></a>
+<a href="https://lmarena.ai"><img src="assets/screenshot_cli.png" width="70%"></a>

 (Experimental Feature: You can specify `--style rich` to enable rich text output and better text streaming quality for some non-ASCII content. This may not work properly on certain terminals.)
@@ -202,7 +202,7 @@ export FASTCHAT_USE_MODELSCOPE=True

 ## Serving with Web GUI

-<a href="https://chat.lmsys.org"><img src="assets/screenshot_gui.png" width="70%"></a>
+<a href="https://lmarena.ai"><img src="assets/screenshot_gui.png" width="70%"></a>

 To serve using the web UI, you need three main components: web servers that interface with users, model workers that host one or more models, and a controller to coordinate the webserver and model workers. You can learn more about the architecture [here](docs/server_arch.md).
@@ -237,6 +237,33 @@ This is the user interface that users will interact with.
 By following these steps, you will be able to serve your models using the web UI. You can open your browser and chat with a model now.
 If the models do not show up, try to reboot the gradio web server.

+## Launch Chatbot Arena (side-by-side battle UI)
+
+Currently, Chatbot Arena is powered by FastChat. Here is how you can launch an instance of Chatbot Arena locally.
+
+FastChat supports popular API-based models such as OpenAI, Anthropic, Gemini, Mistral, and more. To add a custom API, please refer to the model support [doc](./docs/model_support.md). Below we take OpenAI models as an example.
+
+Create a JSON configuration file `api_endpoint.json` with the API endpoints of the models you want to serve, for example:
+```
+{
+  "gpt-4o-2024-05-13": {
+    "model_name": "gpt-4o-2024-05-13",
+    "api_base": "https://api.openai.com/v1",
+    "api_type": "openai",
+    "api_key": [Insert API Key],
+    "anony_only": false
+  }
+}
+```
+For Anthropic models, specify `"api_type": "anthropic_message"` with your Anthropic key. Similarly, for Gemini models, specify `"api_type": "gemini"`. More details can be found in [api_provider.py](https://github.com/lm-sys/FastChat/blob/main/fastchat/serve/api_provider.py).
+
+To serve your own model using local GPUs, follow the instructions in [Serving with Web GUI](#serving-with-web-gui).
+
+Now you're ready to launch the server:
+```
+python3 -m fastchat.serve.gradio_web_server_multi --register-api-endpoint-file api_endpoint.json
+```
+
 #### (Optional): Advanced Features, Scalability, Third Party UI
 - You can register multiple model workers to a single controller, which can be used for serving a single model with higher throughput or serving multiple models at the same time. When doing so, please allocate different GPUs and ports for different model workers.
 ```
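The `api_endpoint.json` example above embeds the key as a literal placeholder. One way to avoid committing a secret is to generate the file from an environment variable. A minimal sketch (reading `OPENAI_API_KEY` and this generation step are our assumptions, not part of the commit):

```python
import json
import os
import tempfile

# Build the same structure as the api_endpoint.json example above,
# but pull the key from the environment instead of hard-coding it.
endpoints = {
    "gpt-4o-2024-05-13": {
        "model_name": "gpt-4o-2024-05-13",
        "api_base": "https://api.openai.com/v1",
        "api_type": "openai",
        "api_key": os.environ.get("OPENAI_API_KEY", ""),
        "anony_only": False,
    }
}

# Write the config to disk (a temp dir here; use your repo root in practice).
path = os.path.join(tempfile.mkdtemp(), "api_endpoint.json")
with open(path, "w") as f:
    json.dump(endpoints, f, indent=2)
```

The generated path can then be passed to `--register-api-endpoint-file` exactly as in the launch command above.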

docs/arena.md

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 # Chatbot Arena
-Chatbot Arena is an LLM benchmark platform featuring anonymous, randomized battles, available at https://chat.lmsys.org.
+Chatbot Arena is an LLM benchmark platform featuring anonymous, randomized battles, available at https://lmarena.ai.
 We invite the entire community to join this benchmarking effort by contributing your votes and models.

 ## How to add a new model

docs/dashinfer_integration.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# DashInfer Integration
+[DashInfer](https://github.com/modelscope/dash-infer) is a high-performance inference engine optimized for CPU environments. It accelerates LLM inference for a variety of models, including Llama, Qwen, and ChatGLM, and performs well on both Intel x64 and ARMv9 processors, making it a versatile FastChat worker for resource-constrained deployments or wherever CPU inference is preferred over GPU acceleration.
+
+## Instructions
+1. Install dash-infer.
+```
+pip install dashinfer
+```
+
+2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the dash-infer worker (`fastchat.serve.dashinfer_worker`). All other commands, such as the controller, gradio web server, and OpenAI API server, are kept the same.
+```
+python3 -m fastchat.serve.dashinfer_worker --model-path qwen/Qwen-7B-Chat --revision=master /path/to/dashinfer-model-generation-config.json
+```
+Here is an example:
+```
+python3 -m fastchat.serve.dashinfer_worker --model-path qwen/Qwen-7B-Chat --revision=master dash-infer/examples/python/model_config/config_qwen_v10_7b.json
+```
+
+If you use an already downloaded model, replace `--model-path` with a local path and choose a conversation template via the `--conv-template` option:
+```
+python3 -m fastchat.serve.dashinfer_worker --model-path ~/.cache/modelscope/hub/qwen/Qwen-7B-Chat --conv-template qwen-7b-chat /path/to/dashinfer-model-generation-config.json
+```
+All available conversation templates are listed in [fastchat/conversation.py](../fastchat/conversation.py).
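The worker invocations above differ only in their arguments (hub id vs. local path, optional template). A small helper can assemble the command line and make the local-path case explicit. A sketch (the `dashinfer_cmd` helper is hypothetical; the module path and flags mirror the examples above):

```python
import os
import shlex

def dashinfer_cmd(model_path, config_json, conv_template=None):
    # Mirrors the fastchat.serve.dashinfer_worker invocations above;
    # this helper itself is illustrative, not part of FastChat.
    cmd = ["python3", "-m", "fastchat.serve.dashinfer_worker",
           "--model-path", model_path]
    if conv_template:  # needed for already-downloaded local checkouts
        cmd += ["--conv-template", conv_template]
    cmd.append(config_json)  # DashInfer model-generation config JSON
    return cmd

# Local-checkout case from the last example above.
cmd = dashinfer_cmd(
    os.path.expanduser("~/.cache/modelscope/hub/qwen/Qwen-7B-Chat"),
    "/path/to/dashinfer-model-generation-config.json",
    conv_template="qwen-7b-chat",
)
print(shlex.join(cmd))
```

`shlex.join` quotes each argument, so the printed line can be pasted into a shell even when the model path contains spaces.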

docs/dataset_release.md

Lines changed: 1 addition & 0 deletions
@@ -2,5 +2,6 @@
 We release the following datasets based on our projects and websites.

 - [LMSYS-Chat-1M: A Large-Scale Real-World LLM Conversation Dataset](https://huggingface.co/datasets/lmsys/lmsys-chat-1m)
+- [LMSYS-Human-Preference-55k](https://huggingface.co/datasets/lmsys/lmsys-arena-human-preference-55k)
 - [Chatbot Arena Conversation Dataset](https://huggingface.co/datasets/lmsys/chatbot_arena_conversations)
 - [MT-bench Human Annotation Dataset](https://huggingface.co/datasets/lmsys/mt_bench_human_judgments)

fastchat/constants.py

Lines changed: 20 additions & 3 deletions
@@ -7,6 +7,21 @@

 REPO_PATH = os.path.dirname(os.path.dirname(__file__))

+# Survey Link URL (to be removed) #00729c
+# SURVEY_LINK = """<div style='text-align: left; margin: 20px 0;'>
+# <div style='display: inline-block; border: 2px solid #00729c; padding: 20px; padding-bottom: 10px; padding-top: 10px; border-radius: 5px;'>
+# <span style='color: #00729c; font-weight: bold;'>New Launch! Copilot Arena: <a href='https://marketplace.visualstudio.com/items?itemName=copilot-arena.copilot-arena' style='color: #00729c; text-decoration: underline;'>VS Code Extension</a> to compare Top LLMs</span>
+# </div>
+# </div>"""
+# SURVEY_LINK = ""
+
+COLOR = "#F11414"
+SURVEY_LINK = f"""<div style='text-align: center; margin: 20px 0;'>
+<div style='display: block; width: 100%; border: 2px solid {COLOR}; padding: 20px; padding-bottom: 10px; padding-top: 10px; border-radius: 5px; background-color: #FE9393'>
+<span style='font-weight: bold; font-size: 20px; color: #050505;'>🔔 New Arena UI at <a href='https://lmarena.ai/leaderboard?utm_campaign=hf_banner' target="_blank" rel="noopener noreferrer" style="color: #233F9C; text-decoration: underline;">lmarena.ai/leaderboard</a>! Check it out and give feedback!</span>
+</div>
+</div>"""
+
 ##### For the gradio web server
 SERVER_ERROR_MSG = (
     "**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**"
@@ -20,12 +35,14 @@
 MODERATION_MSG = "$MODERATION$ YOUR INPUT VIOLATES OUR CONTENT MODERATION GUIDELINES."
 CONVERSATION_LIMIT_MSG = "YOU HAVE REACHED THE CONVERSATION LENGTH LIMIT. PLEASE CLEAR HISTORY AND START A NEW CONVERSATION."
 INACTIVE_MSG = "THIS SESSION HAS BEEN INACTIVE FOR TOO LONG. PLEASE REFRESH THIS PAGE."
-SLOW_MODEL_MSG = "⚠️ Both models will show the responses all at once. Please stay patient as it may take over 30 seconds."
-RATE_LIMIT_MSG = "**RATE LIMIT OF THIS MODEL IS REACHED. PLEASE COME BACK LATER OR USE BATTLE MODE (the 1st tab).**"
+SLOW_MODEL_MSG = (
+    "⚠️ Models are thinking. Please stay patient as it may take over a minute."
+)
+RATE_LIMIT_MSG = "**RATE LIMIT OF THIS MODEL IS REACHED. PLEASE COME BACK LATER OR USE <span style='color: red; font-weight: bold;'>[BATTLE MODE](https://lmarena.ai)</span> (the 1st tab).**"
 # Maximum input length
 INPUT_CHAR_LEN_LIMIT = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 12000))
 BLIND_MODE_INPUT_CHAR_LEN_LIMIT = int(
-    os.getenv("FASTCHAT_BLIND_MODE_INPUT_CHAR_LEN_LIMIT", 24000)
+    os.getenv("FASTCHAT_BLIND_MODE_INPUT_CHAR_LEN_LIMIT", 30000)
 )
 # Maximum conversation turns
 CONVERSATION_TURN_LIMIT = 50
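Both limits touched by this hunk follow the same pattern: an environment-variable override with a hard-coded default. A standalone sketch of the pattern (the variable names match the diff; the demo override value is ours):

```python
import os

# Default applies when the variable is unset in the environment.
os.environ.pop("FASTCHAT_INPUT_CHAR_LEN_LIMIT", None)
input_limit = int(os.getenv("FASTCHAT_INPUT_CHAR_LEN_LIMIT", 12000))

# An operator can raise a limit at deploy time without touching the code;
# int() converts the string value (and rejects garbage early).
os.environ["FASTCHAT_BLIND_MODE_INPUT_CHAR_LEN_LIMIT"] = "50000"
blind_limit = int(os.getenv("FASTCHAT_BLIND_MODE_INPUT_CHAR_LEN_LIMIT", 30000))
```

Note that `os.getenv` returns its second argument unchanged when the variable is unset, so the default can be passed as an `int` directly.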
