@@ -96,13 +96,14 @@ Every model is written from scratch to maximize performance and remove layers of
 | Model | Model size | Author | Reference |
 | ----| ----| ----| ----|
-| Llama 3, 3.1, 3.2 | 1B, 3B, 8B, 70B, 405B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
+| Llama 3, 3.1, 3.2, 3.3 | 1B, 3B, 8B, 70B, 405B | Meta AI | [Meta AI 2024](https://github.com/meta-llama/llama3) |
 | Code Llama | 7B, 13B, 34B, 70B | Meta AI | [Rozière et al. 2023](https://arxiv.org/abs/2308.12950) |
-| Mixtral MoE | 8x7B, 8x22B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/mixtral-of-experts/) |
-| Mistral | 7B, 123B | Mistral AI | [Mistral AI 2023](https://mistral.ai/news/announcing-mistral-7b/) |
 | CodeGemma | 7B | Google | [Google Team, Google Deepmind](https://ai.google.dev/gemma/docs/codegemma) |
 | Gemma 2 | 2B, 9B, 27B | Google | [Google Team, Google Deepmind](https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf) |
-| Phi 3 & 3.5 | 3.8B | Microsoft | [Abdin et al. 2024](https://arxiv.org/abs/2404.14219) |
+| Phi 4 | 14B | Microsoft Research | [Abdin et al. 2024](https://arxiv.org/abs/2412.08905) |
+| Qwen2.5 | 0.5B, 1.5B, 3B, 7B, 14B, 32B, 72B | Alibaba Group | [Qwen Team 2024](https://qwenlm.github.io/blog/qwen2.5/) |
+| Qwen2.5 Coder | 0.5B, 1.5B, 3B, 7B, 14B, 32B | Alibaba Group | [Hui, Binyuan et al. 2024](https://arxiv.org/abs/2409.12186) |
+| R1 Distill Llama | 8B, 70B | DeepSeek AI | [DeepSeek AI 2025](https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf) |
 | ... | ... | ... | ... |


 <details>