
Commit 8e8c885

Deployed 27b6d75 to 2.0 with MkDocs 1.6.0 and mike 2.1.3
1 parent 27b6d75 · commit 8e8c885
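A commit message in this form is what mike writes when it publishes a docs build to a version branch. As a rough sketch, the deployment that produced it would have been invoked along these lines (flags illustrative; the exact invocation is not recorded in this commit):

    # Build the site with MkDocs and commit it to the "2.0" docs version.
    mike deploy --push 2.0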

File tree

10 files changed: +147 -193 lines

Binary files changed: 194 KB and 423 KB (previews not rendered)

2.0/cli-reference/start/index.html

Lines changed: 5 additions & 5 deletions
@@ -3721,6 +3721,11 @@ <h3 id="common-options">Common Options</h3>
 <td>Port to bind the TLS server to.</td>
 </tr>
 <tr>
+<td><code>--api-port</code> value</td>
+<td><code>8080</code></td>
+<td>Port to bind the GPUStack API server to.</td>
+</tr>
+<tr>
 <td><code>--config-file</code> value</td>
 <td>(empty)</td>
 <td>Path to the YAML config file.</td>
@@ -3808,11 +3813,6 @@ <h3 id="server-options">Server Options</h3>
 </thead>
 <tbody>
 <tr>
-<td><code>--api-port</code> value</td>
-<td><code>8080</code></td>
-<td>Port to bind the gpustack server to.</td>
-</tr>
-<tr>
 <td><code>--database-port</code> value</td>
 <td><code>5432</code></td>
 <td>Port of the embedded PostgresSQL database.</td>
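The net effect of the two hunks above is that --api-port moves from the Server Options table to Common Options, keeping its 8080 default. For orientation, a hypothetical gpustack start invocation combining the flags from this page (the config path is a placeholder):

    # Bind the API server to the default port and load settings from a YAML file.
    gpustack start --api-port 8080 --config-file /etc/gpustack/config.yaml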

2.0/development/index.html

Lines changed: 3 additions & 3 deletions
@@ -3574,7 +3574,7 @@ <h2 id="set-up-environment">Set Up Environment</h2>
 <div class="highlight"><pre><span></span><code>make<span class="w"> </span>install
 </code></pre></div>
 <h2 id="run">Run</h2>
-<div class="highlight"><pre><span></span><code>poetry<span class="w"> </span>run<span class="w"> </span>gpustack
+<div class="highlight"><pre><span></span><code>uv<span class="w"> </span>run<span class="w"> </span>gpustack
 </code></pre></div>
 <h2 id="build">Build</h2>
 <div class="highlight"><pre><span></span><code>make<span class="w"> </span>build
@@ -3584,10 +3584,10 @@ <h2 id="test">Test</h2>
 <div class="highlight"><pre><span></span><code>make<span class="w"> </span><span class="nb">test</span>
 </code></pre></div>
 <h2 id="update-dependencies">Update Dependencies</h2>
-<div class="highlight"><pre><span></span><code>poetry<span class="w"> </span>add<span class="w"> </span>&lt;something&gt;
+<div class="highlight"><pre><span></span><code>uv<span class="w"> </span>add<span class="w"> </span>&lt;something&gt;
 </code></pre></div>
 <p>Or</p>
-<div class="highlight"><pre><span></span><code>poetry<span class="w"> </span>add<span class="w"> </span>--group<span class="w"> </span>dev<span class="w"> </span>&lt;something&gt;
+<div class="highlight"><pre><span></span><code>uv<span class="w"> </span>add<span class="w"> </span>--dev<span class="w"> </span>&lt;something&gt;
 </code></pre></div>
 <p>For dev/testing dependencies.</p>
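Read together, the two hunks in this file switch the development workflow from Poetry to uv. The command mapping implied by the diff:

    # before (Poetry)                          # after (uv)
    poetry run gpustack                    ->  uv run gpustack
    poetry add <something>                 ->  uv add <something>
    poetry add --group dev <something>     ->  uv add --dev <something>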

2.0/overview/index.html

Lines changed: 44 additions & 90 deletions
@@ -85,7 +85,7 @@
 <div data-md-component="skip">
 
 
-<a href="#key-features" class="md-skip">
+<a href="#tested-inference-engines-gpus-and-models" class="md-skip">
 Skip to content
 </a>
 
@@ -448,36 +448,18 @@
 <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
 
 <li class="md-nav__item">
-<a href="#key-features" class="md-nav__link">
+<a href="#tested-inference-engines-gpus-and-models" class="md-nav__link">
 <span class="md-ellipsis">
-Key Features
+Tested Inference Engines, GPUs, and Models
 </span>
 </a>
 
 </li>
 
 <li class="md-nav__item">
-<a href="#supported-accelerators" class="md-nav__link">
+<a href="#architecture" class="md-nav__link">
 <span class="md-ellipsis">
-Supported Accelerators
-</span>
-</a>
-
-</li>
-
-<li class="md-nav__item">
-<a href="#supported-models" class="md-nav__link">
-<span class="md-ellipsis">
-Supported Models
-</span>
-</a>
-
-</li>
-
-<li class="md-nav__item">
-<a href="#openai-compatible-apis" class="md-nav__link">
-<span class="md-ellipsis">
-OpenAI-Compatible APIs
+Architecture
 </span>
 </a>
 
@@ -3562,36 +3544,18 @@
 <ul class="md-nav__list" data-md-component="toc" data-md-scrollfix>
 
 <li class="md-nav__item">
-<a href="#key-features" class="md-nav__link">
-<span class="md-ellipsis">
-Key Features
-</span>
-</a>
-
-</li>
-
-<li class="md-nav__item">
-<a href="#supported-accelerators" class="md-nav__link">
-<span class="md-ellipsis">
-Supported Accelerators
-</span>
-</a>
-
-</li>
-
-<li class="md-nav__item">
-<a href="#supported-models" class="md-nav__link">
+<a href="#tested-inference-engines-gpus-and-models" class="md-nav__link">
 <span class="md-ellipsis">
-Supported Models
+Tested Inference Engines, GPUs, and Models
 </span>
 </a>
 
 </li>
 
 <li class="md-nav__item">
-<a href="#openai-compatible-apis" class="md-nav__link">
+<a href="#architecture" class="md-nav__link">
 <span class="md-ellipsis">
-OpenAI-Compatible APIs
+Architecture
 </span>
 </a>
 
@@ -3655,54 +3619,44 @@ <h1>Overview</h1>
 <a class="github-button" href="https://github.com/gpustack/gpustack/fork" data-show-count="true" data-icon="octicon-repo-forked" data-size="large" aria-label="Fork">Fork</a>
 </p>
 
-<p>GPUStack is an open-source GPU cluster manager for running AI models.</p>
-<h3 id="key-features">Key Features</h3>
+<p>GPUStack is an open-source GPU cluster manager designed for efficient AI model deployment. It lets you run models efficiently on your own GPU hardware by choosing the best inference engines, scheduling GPU resources, analyzing model architectures, and automatically configuring deployment parameters.</p>
+<p>The following figure shows how GPUStack delivers improved inference throughput over the unoptimized vLLM baseline:</p>
+<p><a class="glightbox" href="../assets/a100-throughput-comparison.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="a100-throughput-comparison" src="../assets/a100-throughput-comparison.png" /></a></p>
+<p>For detailed benchmarking methods and results, visit our <a href="https://docs.gpustack.ai/latest/performance-lab/overview/">Inference Performance Lab</a>.</p>
+<h2 id="tested-inference-engines-gpus-and-models">Tested Inference Engines, GPUs, and Models</h2>
+<p>GPUStack uses a plug-in architecture that makes it easy to add new AI models, inference engines, and GPU hardware. We work closely with partners and the open-source community to test and optimize emerging models across different inference engines and GPUs. Below is the current list of supported inference engines, GPUs, and models, which will continue to expand over time.</p>
+<p><strong>Tested Inference Engines:</strong></p>
 <ul>
-<li><strong>High Performance:</strong> Optimized for high-throughput and low-latency inference.</li>
-<li><strong>GPU Cluster Management:</strong> Efficiently manage multiple GPU clusters across different providers, including Docker-based, Kubernetes, and cloud platforms such as DigitalOcean.</li>
-<li><strong>Broad GPU Compatibility:</strong> Seamless support for GPUs from various vendors.</li>
-<li><strong>Extensive Model Support:</strong> Supports a wide range of models, including LLMs, VLMs, image models, audio models, embedding models, and rerank models.</li>
-<li><strong>Flexible Inference Backends:</strong> Built-in support for fast inference engines such as vLLM and SGLang, with the ability to integrate custom backends.</li>
-<li><strong>Multi-Version Backend Support:</strong> Run multiple versions of inference backends concurrently to meet diverse runtime requirements.</li>
-<li><strong>Distributed Inference:</strong> Supports single-node and multi-node, multi-GPU inference, including heterogeneous GPUs across vendors and environments.</li>
-<li><strong>Scalable GPU Architecture:</strong> Easily scale by adding more GPUs, nodes, or clusters to your infrastructure.</li>
-<li><strong>Robust Model Stability:</strong> Ensures high availability through automatic failure recovery, multi-instance redundancy, and intelligent load balancing.</li>
-<li><strong>Intelligent Deployment Evaluation:</strong> Automatically assesses model resource requirements, backend and architecture compatibility, OS compatibility, and other deployment factors.</li>
-<li><strong>Automated Scheduling:</strong> Dynamically allocates models based on available resources.</li>
-<li><strong>OpenAI-Compatible APIs:</strong> Fully compatible with OpenAI API specifications for seamless integration.</li>
-<li><strong>User &amp; API Key Management:</strong> Simplified management of users and API keys.</li>
-<li><strong>Real-Time GPU Monitoring:</strong> Monitor GPU performance and utilization in real time.</li>
-<li><strong>Token and Rate Metrics:</strong> Track token usage and API request rates.</li>
+<li>vLLM</li>
+<li>SGLang</li>
+<li>TensorRT-LLM</li>
+<li>MindIE</li>
 </ul>
-<h2 id="supported-accelerators">Supported Accelerators</h2>
-<p>GPUStack supports a variety of General-Purpose Accelerators, including:</p>
-<ul class="task-list">
-<li class="task-list-item"><label class="task-list-control"><input type="checkbox" disabled checked/><span class="task-list-indicator"></span></label> NVIDIA GPU</li>
-<li class="task-list-item"><label class="task-list-control"><input type="checkbox" disabled checked/><span class="task-list-indicator"></span></label> AMD GPU</li>
-<li class="task-list-item"><label class="task-list-control"><input type="checkbox" disabled checked/><span class="task-list-indicator"></span></label> Ascend NPU</li>
-<li class="task-list-item"><label class="task-list-control"><input type="checkbox" disabled checked/><span class="task-list-indicator"></span></label> Hygon DCU (Experimental)</li>
-<li class="task-list-item"><label class="task-list-control"><input type="checkbox" disabled checked/><span class="task-list-indicator"></span></label> MThreads GPU (Experimental)</li>
-<li class="task-list-item"><label class="task-list-control"><input type="checkbox" disabled checked/><span class="task-list-indicator"></span></label> Iluvatar GPU (Experimental)</li>
-<li class="task-list-item"><label class="task-list-control"><input type="checkbox" disabled checked/><span class="task-list-indicator"></span></label> MetaX GPU (Experimental)</li>
-<li class="task-list-item"><label class="task-list-control"><input type="checkbox" disabled checked/><span class="task-list-indicator"></span></label> Cambricon MLU (Experimental)</li>
+<p><strong>Tested GPUs:</strong></p>
+<ul>
+<li>NVIDIA A100</li>
+<li>NVIDIA H100/H200</li>
+<li>Ascend 910B</li>
+</ul>
+<p><strong>Tuned Models:</strong></p>
+<ul>
+<li>Qwen3</li>
+<li>gpt-oss</li>
+<li>GLM-4.5-Air</li>
+<li>GLM-4.5/4.6</li>
+<li>DeepSeek-R1</li>
+</ul>
+<h2 id="architecture">Architecture</h2>
+<p>GPUStack enables development teams, IT organizations, and service providers to deliver Model-as-a-Service at scale. It supports industry-standard APIs for LLM, voice, image, and video models. The platform includes built-in user authentication and access control, real-time monitoring of GPU performance and utilization, and detailed metering of token usage and API request rates.</p>
+<p>The figure below illustrates how a single GPUStack server can manage multiple GPU clusters across both on-premises and cloud environments. The GPUStack scheduler allocates GPUs to maximize resource utilization and selects the appropriate inference engines for optimal performance. Administrators also gain full visibility into system health and metrics through integrated Grafana and Prometheus dashboards.</p>
+<p><a class="glightbox" href="../assets/gpustack-v2-architecture.png" data-type="image" data-width="auto" data-height="auto" data-desc-position="bottom"><img alt="gpustack-v2-architecture" src="../assets/gpustack-v2-architecture.png" /></a></p>
+<p>GPUStack provides a powerful framework for deploying AI models. Its core features include:</p>
+<ul>
+<li><strong>Multi-Cluster GPU Management.</strong> Manages GPU clusters across multiple environments. This includes on-premises servers, Kubernetes clusters, and cloud providers.</li>
+<li><strong>Pluggable Inference Engines.</strong> Automatically configures high-performance inference engines such as vLLM, SGLang, and TensorRT-LLM. You can also add custom inference engines as needed.</li>
+<li><strong>Performance-Optimized Configurations.</strong> Offers pre-tuned modes for low latency or high throughput. GPUStack supports extended KV cache systems like LMCache and HiCache to reduce TTFT. It also includes built-in support for speculative decoding methods such as EAGLE3, MTP, and N-grams.</li>
+<li><strong>Enterprise-Grade Operations.</strong> Offers support for automated failure recovery, load balancing, monitoring, authentication, and access control.</li>
 </ul>
-<h2 id="supported-models">Supported Models</h2>
-<p>GPUStack uses <a href="https://github.com/vllm-project/vllm">vLLM</a>, <a href="https://github.com/sgl-project/sglang">SGLang</a>, <a href="https://www.hiascend.com/en/software/mindie">MindIE</a> and <a href="https://github.com/gpustack/vox-box">vox-box</a> as built-in inference backends, and it also supports any custom backend that can run in a container and expose a serving API. This allows GPUStack to work with a wide range of models.</p>
-<p>Models can come from the following sources:</p>
-<ol>
-<li>
-<p><a href="https://huggingface.co/">Hugging Face</a></p>
-</li>
-<li>
-<p><a href="https://modelscope.cn/">ModelScope</a></p>
-</li>
-<li>
-<p>Local File Path</p>
-</li>
-</ol>
-<p>For information on which models are supported by each built-in inference backend, please refer to the supported models section in the <a href="../user-guide/built-in-inference-backends/">Built-in Inference Backends</a> documentation.</p>
-<h2 id="openai-compatible-apis">OpenAI-Compatible APIs</h2>
-<p>GPUStack serves OpenAI compatible APIs. For details, please refer to <a href="../user-guide/openai-compatible-apis/">OpenAI Compatible APIs</a></p>
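Although the standalone "OpenAI-Compatible APIs" section is removed here, the replacement Architecture text still advertises industry-standard APIs. For readers of this diff, a minimal sketch of such a call, assuming the standard OpenAI endpoint layout (server address, API key, and model name are placeholders, not taken from this commit):

    # Chat completion against a GPUStack server's OpenAI-compatible endpoint.
    curl http://YOUR_GPUSTACK_SERVER/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $GPUSTACK_API_KEY" \
      -d '{"model": "qwen3", "messages": [{"role": "user", "content": "Hello"}]}'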

2.0/quickstart/index.html

Lines changed: 1 addition & 1 deletion
@@ -3640,7 +3640,7 @@ <h1 id="quickstart">Quickstart</h1>
 <h2 id="install-gpustack">Install GPUStack</h2>
 <div class="admonition note">
 <p class="admonition-title">Note</p>
-<p>GPUStack now supports Linux only. For Windows, use WSL2 and avoid Docker Desktop.</p>
+<p>GPUStack now supports Linux only.</p>
 </div>
 <p>If you are using NVIDIA GPUs, ensure the NVIDIA driver, <a href="https://docs.docker.com/engine/install/">Docker</a> and <a href="https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html">NVIDIA Container Toolkit</a> are installed. Then start the GPUStack with the following command:</p>
 <div class="highlight"><pre><span></span><code>sudo<span class="w"> </span>docker<span class="w"> </span>run<span class="w"> </span>-d<span class="w"> </span>--name<span class="w"> </span>gpustack<span class="w"> </span><span class="se">\</span>

2.0/search/search_index.json

Lines changed: 1 addition & 1 deletion
Large diffs are not rendered by default.
