I will list the test results of various open-source models here. You can refer to these data to select models and configure devices. Of course, evaluating LLMs is quite subjective, so I also suggest running your own evaluations tailored to your requirements. Your opinions and suggestions on the evaluation methods and results are welcome as well.
I will give the overall score in the first comment, and provide performance statistics in the second comment.
For now, I plan to finish the evaluation of several mainstream models first, and may also look at some related fine-tuned models along the way.
The tasks to be handled are as follows:
- Test cases
- ChatGPT-4 (as a reference)
  - Execute test cases
- ChatGPT-3.5 (as a reference)
  - Execute test cases
- Llama 70B Chat
  - Execute test cases
- Llama 13B Chat
  - Execute test cases
- Alpaca
  - Download model
  - Execute test cases
- Vicuna
  - Download model
  - Execute test cases
- Mistral
  - Download model
  - Execute test cases
- Falcon
  - Download model
  - Execute test cases
- Aquila
  - Download model
  - Execute test cases