diff --git a/docs/model-lineup.mdx b/docs/model-lineup.mdx
index ebc8c2e1..b8447d7e 100644
--- a/docs/model-lineup.mdx
+++ b/docs/model-lineup.mdx
@@ -7,38 +7,40 @@ The table below shows the models that are currently available in Tinker. We plan
 - In general, use MoE models, which are more cost effective than the dense models.
 - Use Base models only if you're doing research or are running the full post-training pipeline yourself
 - If you want to create a model that is good at a specific task or domain, use an existing post-trained model, and fine-tune it on your own data or environment.
-  - If you care about latency, use one of the Instruction models, which will start outputting tokens without a chain-of-thought.
-  - If you care about intelligence and robustness, use one of the Hybrid or Reasoning models, which can use long chain-of-thought.
+  - If you care about latency, use one of the Instruction models, which will start outputting tokens without a chain-of-thought.
+  - If you care about intelligence and robustness, use one of the Hybrid or Reasoning models, which can use long chain-of-thought.
 
 ## Full Listing
 
-| Model Name                                                                                      | Training Type | Architecture | Size      |
-| ----------------------------------------------------------------------------------------------- | ------------- | ------------ | --------- |
-| [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct)      | Vision        | MoE          | Large     |
-| [Qwen/Qwen3-VL-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct)          | Vision        | MoE          | Medium    |
-| [Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)  | Instruction   | MoE          | Large     |
-| [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507)      | Instruction   | MoE          | Medium    |
-| [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)                                  | Hybrid        | MoE          | Medium    |
-| [Qwen/Qwen3-30B-A3B-Base](https://huggingface.co/Qwen/Qwen3-30B-A3B-Base)                        | Base          | MoE          | Medium    |
-| [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B)                                          | Hybrid        | Dense        | Medium    |
-| [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)                                            | Hybrid        | Dense        | Small     |
-| [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)                                  | Base          | Dense        | Small     |
-| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)                | Instruction   | Dense        | Compact   |
-| [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b)                                | Reasoning     | MoE          | Medium    |
-| [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)                                  | Reasoning     | MoE          | Small     |
-| [deepseek-ai/DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1)                    | Hybrid        | MoE          | Large     |
-| [deepseek-ai/DeepSeek-V3.1-Base](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base)          | Base          | MoE          | Large     |
-| [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)                      | Base          | Dense        | Large     |
-| [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)    | Instruction   | Dense        | Large     |
-| [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)                        | Base          | Dense        | Small     |
-| [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)      | Instruction   | Dense        | Small     |
-| [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)                        | Base          | Dense        | Compact   |
-| [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)                        | Base          | Dense        | Compact   |
-| [moonshotai/Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking)                | Reasoning     | MoE          | Large     |
+| Model Name                                                                                      | Training Type | Architecture | Size    |
+| ----------------------------------------------------------------------------------------------- | ------------- | ------------ | ------- |
+| [Qwen/Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct)      | Vision        | MoE          | Large   |
+| [Qwen/Qwen3-VL-30B-A3B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct)          | Vision        | MoE          | Medium  |
+| [Qwen/Qwen3-235B-A22B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507)  | Instruction   | MoE          | Large   |
+| [Qwen/Qwen3-30B-A3B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Instruct-2507)      | Instruction   | MoE          | Medium  |
+| [Qwen/Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B)                                  | Hybrid        | MoE          | Medium  |
+| [Qwen/Qwen3-30B-A3B-Base](https://huggingface.co/Qwen/Qwen3-30B-A3B-Base)                        | Base          | MoE          | Medium  |
+| [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B)                                          | Hybrid        | Dense        | Medium  |
+| [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)                                            | Hybrid        | Dense        | Small   |
+| [Qwen/Qwen3-8B-Base](https://huggingface.co/Qwen/Qwen3-8B-Base)                                  | Base          | Dense        | Small   |
+| [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)                | Instruction   | Dense        | Compact |
+| [openai/gpt-oss-120b](https://huggingface.co/openai/gpt-oss-120b)                                | Reasoning     | MoE          | Medium  |
+| [openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)                                  | Reasoning     | MoE          | Small   |
+| [deepseek-ai/DeepSeek-V3.1](https://huggingface.co/deepseek-ai/DeepSeek-V3.1)                    | Hybrid        | MoE          | Large   |
+| [deepseek-ai/DeepSeek-V3.1-Base](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base)          | Base          | MoE          | Large   |
+| [meta-llama/Llama-3.1-70B](https://huggingface.co/meta-llama/Llama-3.1-70B)                      | Base          | Dense        | Large   |
+| [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct)    | Instruction   | Dense        | Large   |
+| [meta-llama/Llama-3.1-8B](https://huggingface.co/meta-llama/Llama-3.1-8B)                        | Base          | Dense        | Small   |
+| [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct)      | Instruction   | Dense        | Small   |
+| [meta-llama/Llama-3.2-3B](https://huggingface.co/meta-llama/Llama-3.2-3B)                        | Base          | Dense        | Compact |
+| [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B)                        | Base          | Dense        | Compact |
+| [moonshotai/Kimi-K2-Thinking](https://huggingface.co/moonshotai/Kimi-K2-Thinking)                | Reasoning     | MoE          | Large   |
+| [moonshotai/Kimi-K2.5](https://huggingface.co/moonshotai/Kimi-K2.5)                              | Reasoning + Vision | MoE     | Large   |
 
 ## Legend
 
 ### Training Types
+
 - **Base**: Foundation models trained on raw text data, suitable for post-training research and custom fine-tuning.
 - **Instruction**: Models fine-tuned for following instructions and chat, optimized for fast inference.
 - **Reasoning**: Models that always use chain-of-thought reasoning before their "visible" output that responds to the prompt.
@@ -46,6 +48,7 @@ The table below shows the models that are currently available in Tinker. We plan
 - **Vision**: Vision-language models (VLMs) that can process images alongside text. See [Vision Inputs](/rendering#vision-inputs) for usage.
 
 ### Architecture
+
 - **Dense**: Standard transformer architecture with all parameters active
 - **MoE**: Mixture of Experts architecture with sparse activation
 