I will list the test results of various open-source models here. You can refer to these data to select models and configure devices. Of course, evaluating LLMs is quite subjective, so I also suggest running your own evaluations tailored to your requirements. Your opinions and suggestions on the evaluation methods and results are welcome as well.
I will give the overall score in the first comment, and provide performance statistics in the second comment.
For now, I plan to finish the evaluation of several mainstream models first, and may also look at some related fine-tuned models along the way.
The tasks to be handled are as follows:
- Test cases
- ChatGPT-4 (as a reference)
  - Execute test cases
- ChatGPT-3.5 (as a reference)
  - Execute test cases
- Llama 70B Chat
  - Execute test cases
- Llama 13B Chat
  - Execute test cases
- Alpaca
  - Download model
  - Execute test cases
- Vicuna
  - Download model
  - Execute test cases
- Mistral
  - Download model
  - Execute test cases
- Falcon
  - Download model
  - Execute test cases
- Aquila
  - Download model
  - Execute test cases