| Gradient | Blog | X (Twitter) | Discord | arXiv
- [2025/10] 🔥 Parallax won #1 Product of The Day on Product Hunt!
- [2025/10] 🔥 Parallax version 0.0.1 has been released!
A fully decentralized inference engine developed by Gradient. Parallax lets you build your own AI cluster for model inference across a set of distributed nodes, regardless of their hardware configurations and physical locations. Its core features include:
- Host local LLM on personal devices
- Cross-platform support
- Pipeline parallel model sharding
- Paged KV cache management & continuous batching for Mac
- Dynamic request scheduling and routing for high performance
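The core idea behind pipeline-parallel model sharding can be illustrated with a minimal sketch (this is not the Parallax API — `partition_layers` and `run_pipeline` are hypothetical names for illustration): a model's layers are partitioned into contiguous stages, one stage per node, and activations flow from one stage to the next.

```python
def partition_layers(num_layers, num_nodes):
    """Split layer indices into contiguous, near-equal stages, one per node."""
    base, extra = divmod(num_layers, num_nodes)
    stages, start = [], 0
    for rank in range(num_nodes):
        size = base + (1 if rank < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages

def run_pipeline(x, stages, layer_fn):
    """Each node applies only its own stage's layers, then hands the
    activation to the next node (simulated here as a plain function call;
    in a real cluster this hop is a network transfer)."""
    for stage in stages:
        for layer_idx in stage:
            x = layer_fn(layer_idx, x)
    return x

# A 10-layer model sharded across 3 nodes of different capacities:
stages = partition_layers(num_layers=10, num_nodes=3)
print(stages)  # → [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Dummy "layer" that just increments the activation:
out = run_pipeline(1.0, stages, lambda i, x: x + 1)
print(out)     # → 11.0
```

In practice each stage would also hold its own paged KV cache, and the scheduler decides which node serves which stage based on device memory and link latency.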
The backend architecture:
- P2P communication powered by Lattica
- GPU backend powered by SGLang and vLLM
- Mac backend powered by MLX LM
We warmly welcome contributions of all kinds! For guidelines on how to get involved, please refer to our Contributing Guide.
| Model | Provider | HuggingFace Collection | Blog | Description |
|---|---|---|---|---|
| DeepSeek | DeepSeek | DeepSeek-V3.2, DeepSeek-R1 | Deep Seek AI Launches Revolutionary Language Model | DeepSeek's latest language models set new standards in natural language processing and understanding, offering strong capabilities in text generation, comprehension, and analysis. |
| MiniMax-M2 | MiniMax AI | MiniMax-M2 | MiniMax M2 & Agent: Ingenious in Simplicity | MiniMax-M2 is a compact, fast, and cost-effective MoE model (230B parameters, 10B active) built for advanced coding and agentic workflows. It offers state-of-the-art intelligence and coding abilities, delivering efficient, reliable tool use and strong multi-step reasoning for developers and agents, with high throughput and low latency for easy deployment. |
| GLM | Z AI | GLM-4.7, GLM-4.6 | GLM-4.7: Advancing the Coding Capability | "GLM" is an advanced large language model series from Z AI, including GLM-4.6 and GLM-4.7. These models feature long-context support, strong coding and reasoning performance, enhanced tool-use and agent integration, and competitive results across leading open-source benchmarks. |
| Kimi-K2 | Moonshot AI | Kimi-K2 | Kimi K2: Open Agentic Intelligence | "Kimi-K2" is Moonshot AI's Kimi-K2 model family, including Kimi-K2-Base, Kimi-K2-Instruct, and Kimi-K2-Thinking. Kimi K2 Thinking is a state-of-the-art open-source agentic model designed for deep, step-by-step reasoning and dynamic tool use. It features native INT4 quantization and a 256k context window for fast, memory-efficient inference. Uniquely stable in long-horizon tasks, Kimi K2 enables reliable autonomous workflows with consistent performance across hundreds of tool calls. |
| Qwen | Qwen | Qwen3-Next, Qwen3, Qwen2.5 | Qwen3-Next: Towards Ultimate Training & Inference Efficiency | The Qwen series is a family of large language models developed by Alibaba's Qwen team. It includes multiple generations such as Qwen2.5, Qwen3, and Qwen3-Next, each improving on model architecture, efficiency, and capabilities. The models are available in various sizes and instruction-tuned versions, with support for cutting-edge features like long context and quantization, making them suitable for a wide range of language tasks and open-source use cases. |
| gpt-oss | OpenAI | gpt-oss, gpt-oss-safeguard | Introducing gpt-oss-safeguard | gpt-oss are OpenAI's open-weight GPT models (20B & 120B). The gpt-oss-safeguard variants are reasoning-based safety classification models: developers provide their own policy at inference, and the model uses chain-of-thought to classify content and explain its reasoning. This allows flexible, policy-driven moderation in complex or evolving domains, with open weights under Apache 2.0. |
| Meta Llama 3 | Meta | Meta Llama 3, Llama 3.1, Llama 3.2, Llama 3.3 | Introducing Meta Llama 3: The most capable openly available LLM to date | "Meta Llama 3" is Meta's third-generation Llama model, available in sizes such as 8B and 70B parameters. It includes instruction-tuned and quantized (e.g., FP8) variants. |
