Merged
74 changes: 32 additions & 42 deletions articles/run-nvidia.ipynb
@@ -18,7 +18,21 @@
"- `gpt-oss-20b`\n",
"- `gpt-oss-120b`\n",
"\n",
"In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/blogs/tech_blog) deployment guide."
"In this guide, we will run `gpt-oss-20b`. If you want to try the larger model or want more customization, refer to [this](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md) deployment guide.\n",
"\n",
"Note: input prompts should use the [harmony response](https://cookbook.openai.com/articles/openai-harmony) format for the model to work properly, though this guide does not require you to construct it by hand."
]
},
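The note above mentions the harmony response format. As an illustrative sketch only (the token and role names follow the published format, simplified here; see the linked article for the full spec), a harmony-style chat prompt can be assembled like this:

```python
# Illustrative sketch of a harmony-style chat prompt (simplified; not the full spec).
def harmony_prompt(system: str, user: str) -> str:
    # Each turn is delimited by special tokens; generation continues
    # from the trailing "<|start|>assistant" header.
    return (
        f"<|start|>system<|message|>{system}<|end|>"
        f"<|start|>user<|message|>{user}<|end|>"
        "<|start|>assistant"
    )

prompt = harmony_prompt("You are a helpful assistant.", "Hello!")
print(prompt)
```

In practice, the `openai-harmony` Python package can render these prompts for you, and chat-style serving frontends typically apply the template automatically.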
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Launch on NVIDIA Brev\n",
"You can simplify the environment setup by using [NVIDIA Brev](https://developer.nvidia.com/brev). Click the button below to launch this project on a Brev instance with the necessary dependencies pre-configured.\n",
"\n",
"Once deployed, click the \"Open Notebook\" button to get started with this guide.\n",
"\n",
"[![Launch on Brev](https://brev-assets.s3.us-west-1.amazonaws.com/nv-lb-dark.svg)](https://brev.nvidia.com/launchable/deploy?launchableID=env-30i1YjHsRWT109HL6eYxLUeHIwF)"
]
},
{
@@ -33,69 +47,45 @@
"metadata": {},
"source": [
"### Hardware\n",
"To run the 20B model and the TensorRT-LLM build process, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n",
"To run the gpt-oss-20b model, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n",
"\n",
"> Recommended GPUs: NVIDIA RTX 50 Series (e.g.RTX 5090), NVIDIA H100, or L40S.\n",
"Recommended GPUs: NVIDIA Hopper (e.g., H100, H200), NVIDIA Blackwell (e.g., B100, B200), NVIDIA RTX PRO, NVIDIA RTX 50 Series (e.g., RTX 5090).\n",
"\n",
"### Software\n",
"- CUDA Toolkit 12.8 or later\n",
"- Python 3.12 or later\n",
"- Access to the Orangina model checkpoint from Hugging Face"
"- Python 3.12 or later"
]
},
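A rough back-of-envelope check of the 20 GB recommendation (assuming MXFP4-quantized weights at roughly 4.25 bits per parameter on average, an illustrative figure rather than an official spec):

```python
# Back-of-envelope VRAM estimate for gpt-oss-20b (illustrative assumptions only).
n_params = 20e9            # ~20B parameters
bits_per_param = 4.25      # assumed average for MXFP4 weights incl. block scales
weight_gib = n_params * bits_per_param / 8 / 2**30

# Weights alone land near ~10 GiB under these assumptions; KV cache,
# activations, and runtime buffers account for the remaining headroom.
print(f"weights alone: ~{weight_gib:.1f} GiB")
```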
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Installling TensorRT-LLM"
"## Installing TensorRT-LLM\n",
"\n",
"There are multiple ways to install TensorRT-LLM. In this guide, we'll cover using a pre-built Docker container from NVIDIA NGC as well as building from source.\n",
"\n",
"If you're using NVIDIA Brev, you can skip this section."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Using NGC\n",
"### Using NVIDIA NGC\n",
"\n",
"Pull the pre-built TensorRT-LLM container for GPT-OSS from NVIDIA NGC.\n",
"Pull the pre-built [TensorRT-LLM container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags) for GPT-OSS from [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/).\n",
"This is the easiest way to get started and ensures all dependencies are included.\n",
"\n",
"`docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n",
"`docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n",
"```bash\n",
"docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n",
"docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n",
"```\n",
"\n",
"## Using Docker (build from source)\n",
"### Using Docker (Build from Source)\n",
"\n",
"Alternatively, you can build the TensorRT-LLM container from source.\n",
"This is useful if you want to modify the source code or use a custom branch.\n",
"See the official instructions here: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker\n",
"\n",
"The following commands will install required dependencies, clone the repository,\n",
"check out the GPT-OSS feature branch, and build the Docker container:\n",
" ```\n",
"#Update package lists and install required system packages\n",
"sudo apt-get update && sudo apt-get -y install git git-lfs build-essential cmake\n",
"\n",
"# Initialize Git LFS (Large File Storage) for handling large model files\n",
"git lfs install\n",
"\n",
"# Clone the TensorRT-LLM repository\n",
"git clone https://github.com/NVIDIA/TensorRT-LLM.git\n",
"cd TensorRT-LLM\n",
"\n",
"# Check out the branch with GPT-OSS support\n",
"git checkout feat/gpt-oss\n",
"\n",
"# Initialize and update submodules (required for build)\n",
"git submodule update --init --recursive\n",
"\n",
"# Pull large files (e.g., model weights) managed by Git LFS\n",
"git lfs pull\n",
"\n",
"# Build the release Docker image\n",
"make -C docker release_build\n",
"\n",
"# Run the built Docker container\n",
"make -C docker release_run \n",
"```"
"This approach is useful if you want to modify the source code or use a custom branch.\n",
"For detailed instructions, see the [official documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker)."
]
},
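Whichever route you take, a quick sanity check once you have a shell inside the container (a sketch; it assumes the container's Python environment ships the `tensorrt_llm` wheel and that the `--gpus all` flag exposed your GPU):

```bash
# Inside the running container: confirm the GPU is visible to the runtime
nvidia-smi --query-gpu=name,memory.total --format=csv

# ...and that the TensorRT-LLM wheel imports cleanly
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```

If either command fails, recheck that the host NVIDIA driver and the NVIDIA Container Toolkit are installed before debugging further.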
{
3 changes: 1 addition & 2 deletions registry.yaml
@@ -4,8 +4,7 @@
# should build pages for, and indicates metadata such as tags, creation date and
# authors for each page.


- title: Using NVIDIA TensorRT-LLM to run the 20B model
- title: Using NVIDIA TensorRT-LLM to run gpt-oss-20b
path: articles/run-nvidia.ipynb
date: 2025-08-05
authors: