How to deploy GPT-OSS 120B on KubeAI with multi-node GPU setup? #589
Unanswered · SupakritCRO asked this question in Q&A
Replies: 0 comments
Hi everyone 👋
I have a Kubernetes cluster with 2 control-plane nodes and 2 worker nodes; each worker has 1× NVIDIA RTX 4090 GPU.
I want to use KubeAI to deploy the GPT-OSS 120B open-weight model as an API or chat GUI for internal use, ideally with distributed inference across the two GPU nodes.
Has anyone tried this setup?
Any tips or real-world experience would be appreciated 🙏
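For context on what I've been looking at: a KubeAI `Model` manifest along these lines seems to be the starting point. This is only a sketch under assumptions — the `url`, `resourceProfile` name, and vLLM `args` below are guesses that would need to be checked against the KubeAI docs, and I'm not sure whether a single vLLM replica managed by KubeAI can actually span two separate nodes (vLLM's multi-node mode normally needs a Ray cluster), as opposed to using multiple GPUs inside one node.

```yaml
# Sketch of a KubeAI Model resource (fields assumed from KubeAI's
# documented examples; verify names/values against your KubeAI version).
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: gpt-oss-120b
spec:
  features: [TextGeneration]
  owner: openai
  url: hf://openai/gpt-oss-120b   # assumed Hugging Face model path
  engine: VLLM
  args:
    # Splitting the model across 2 GPUs; whether this works across
    # two *nodes* (rather than two GPUs in one node) is the open question.
    - --pipeline-parallel-size=2
  resourceProfile: nvidia-gpu-rtx4090:1  # hypothetical profile name
  minReplicas: 1
```

One concern with this setup: two RTX 4090s give 48 GB of VRAM total, which may not be enough for 120B-class weights even quantized, so any real-world experience on memory footprint would also help.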