The table below shows the models that are currently available in Tinker.

- In general, use MoE models, which are more cost-effective than the dense models.
- Use Base models only if you're doing research or are running the full post-training pipeline yourself.
- If you want to create a model that is good at a specific task or domain, use an existing post-trained model and fine-tune it on your own data or environment (see the sketch after this list).
- If you care about latency, use one of the Instruction models, which will start outputting tokens without a chain-of-thought.
- If you care about intelligence and robustness, use one of the Hybrid or Reasoning models, which can use long chain-of-thought.

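For example, fine-tuning one of these models amounts to pointing a training client at its name. This is a minimal sketch, assuming the `tinker` Python SDK's `ServiceClient` and its `create_lora_training_client(base_model=...)` method; check the quickstart for the exact interface.

```python
# A minimal sketch of selecting a model for LoRA fine-tuning.
# Assumes the `tinker` SDK is installed and TINKER_API_KEY is set in the
# environment; client and method names follow the Tinker quickstart.
import tinker

service_client = tinker.ServiceClient()

# Per the guidance above, an MoE post-trained model is a good default.
training_client = service_client.create_lora_training_client(
    base_model="openai/gpt-oss-20b",  # any Model Name from the table below
)
```

The `base_model` string is exactly a Model Name entry from the table in the next section.
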
## Full Listing

| Model Name | Training Type | Architecture | Size |
|---|---|---|---|
| [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b) | Reasoning | MoE | Medium |
| [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b) | Reasoning | MoE | Small |
| [deepseek-ai/DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) | Hybrid | MoE | Large |
| [deepseek-ai/DeepSeek-V3.1-Base](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base) | Base | MoE | Large |
| [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B) | Base | Dense | Large |
| [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | Instruction | Dense | Large |
| [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B) | Base | Dense | Small |
| [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | Instruction | Dense | Small |
| [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B) | Base | Dense | Compact |
| [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) | Base | Dense | Compact |
| [moonshotai/Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking) | Reasoning | MoE | Large |
| [moonshotai/Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5) | Reasoning + Vision | MoE | Large |

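Because the lineup changes over time, you can also ask the service which models it currently supports rather than relying on this table. A minimal sketch, assuming the SDK exposes a `get_server_capabilities()` call whose result carries a `supported_models` list; verify the response shape against the API reference.

```python
# Query the models currently available on the server, so the list stays
# in sync even when the table above is out of date.
# Assumes `get_server_capabilities()` returns an object with a
# `supported_models` list of entries carrying a `model_name` field.
import tinker

service_client = tinker.ServiceClient()
capabilities = service_client.get_server_capabilities()
for model in capabilities.supported_models:
    print(model.model_name)
```
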
## Legend
### Training Types

**Base**: Foundation models trained on raw text data, suitable for post-training research and custom fine-tuning.

**Instruction**: Models fine-tuned for instruction following and chat, optimized for fast inference.

**Reasoning**: Models that always produce chain-of-thought reasoning before the "visible" output that responds to the prompt.

**Hybrid**: Models that can operate in both thinking and non-thinking modes; the non-thinking mode requires a special renderer or argument that disables the chain-of-thought.

**Vision**: Vision-language models (VLMs) that can process images alongside text. See [Vision Inputs](/rendering#vision-inputs) for usage.

### Architecture

**Dense**: Standard transformer architecture with all parameters active.

**MoE**: Mixture-of-Experts architecture with sparse activation, so only a subset of parameters is active for each token.