|
18 | 18 | "- `gpt-oss-20b`\n",
|
19 | 19 | "- `gpt-oss-120b`\n",
|
20 | 20 | "\n",
|
21 |
| - "In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/tree/main/docs/source/blogs/tech_blog) deployment guide." |
| 21 | + "In this guide, we will run `gpt-oss-20b`, if you want to try the larger model or want more customization refer to [this](https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/blogs/tech_blog/blog9_Deploying_GPT_OSS_on_TRTLLM.md) deployment guide.\n", |
| 22 | + "\n", |
| 23 | + "Note: Your input prompts should use the [harmony response](http://cookbook.openai.com/articles/openai-harmony) format for the model to work properly, though this guide does not require it." |
| 24 | + ] |
| 25 | + }, |
| 26 | + { |
| 27 | + "cell_type": "markdown", |
| 28 | + "metadata": {}, |
| 29 | + "source": [ |
| 30 | + "#### Launch on NVIDIA Brev\n", |
| 31 | + "You can simplify the environment setup by using [NVIDIA Brev](https://developer.nvidia.com/brev). Click the button below to launch this project on a Brev instance with the necessary dependencies pre-configured.\n", |
| 32 | + "\n", |
| 33 | + "Once deployed, click on the \"Open Notebook\" button to get start with this guide\n", |
| 34 | + "\n", |
| 35 | + "[](https://brev.nvidia.com/launchable/deploy?launchableID=env-30i1YjHsRWT109HL6eYxLUeHIwF)" |
22 | 36 | ]
|
23 | 37 | },
|
24 | 38 | {
|
|
33 | 47 | "metadata": {},
|
34 | 48 | "source": [
|
35 | 49 | "### Hardware\n",
|
36 |
| - "To run the 20B model and the TensorRT-LLM build process, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n", |
| 50 | + "To run the gpt-oss-20b model, you will need an NVIDIA GPU with at least 20 GB of VRAM.\n", |
37 | 51 | "\n",
|
38 |
| - "> Recommended GPUs: NVIDIA RTX 50 Series (e.g.RTX 5090), NVIDIA H100, or L40S.\n", |
| 52 | + "Recommended GPUs: NVIDIA Hopper (e.g., H100, H200), NVIDIA Blackwell (e.g., B100, B200), NVIDIA RTX PRO, NVIDIA RTX 50 Series (e.g., RTX 5090).\n", |
39 | 53 | "\n",
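Before proceeding, one way to confirm your GPU meets the VRAM requirement is to query it with `nvidia-smi`:

```bash
# Query GPU name and total memory; expect at least 20 GB (~20480 MiB) for gpt-oss-20b
nvidia-smi --query-gpu=name,memory.total --format=csv
```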
|
40 | 54 | "### Software\n",
|
41 | 55 | "- CUDA Toolkit 12.8 or later\n",
|
42 |
| - "- Python 3.12 or later\n", |
43 |
| - "- Access to the Orangina model checkpoint from Hugging Face" |
| 56 | + "- Python 3.12 or later" |
44 | 57 | ]
|
45 | 58 | },
|
46 | 59 | {
|
47 | 60 | "cell_type": "markdown",
|
48 | 61 | "metadata": {},
|
49 | 62 | "source": [
|
50 |
| - "## Installling TensorRT-LLM" |
| 63 | + "## Installing TensorRT-LLM\n", |
| 64 | + "\n", |
| 65 | + "There are multiple ways to install TensorRT-LLM. In this guide, we'll cover using a pre-built Docker container from NVIDIA NGC as well as building from source.\n", |
| 66 | + "\n", |
| 67 | + "If you're using NVIDIA Brev, you can skip this section." |
51 | 68 | ]
|
52 | 69 | },
|
53 | 70 | {
|
54 | 71 | "cell_type": "markdown",
|
55 | 72 | "metadata": {},
|
56 | 73 | "source": [
|
57 |
| - "## Using NGC\n", |
| 74 | + "## Using NVIDIA NGC\n", |
58 | 75 | "\n",
|
59 |
| - "Pull the pre-built TensorRT-LLM container for GPT-OSS from NVIDIA NGC.\n", |
| 76 | + "Pull the pre-built [TensorRT-LLM container](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags) for GPT-OSS from [NVIDIA NGC](https://www.nvidia.com/en-us/gpu-cloud/).\n", |
60 | 77 | "This is the easiest way to get started and ensures all dependencies are included.\n",
|
61 | 78 | "\n",
|
62 |
| - "`docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n", |
63 |
| - "`docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev`\n", |
| 79 | + "```bash\n", |
| 80 | + "docker pull nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n", |
| 81 | + "docker run --gpus all -it --rm -v $(pwd):/workspace nvcr.io/nvidia/tensorrt-llm/release:gpt-oss-dev\n", |
| 82 | + "```\n", |
64 | 83 | "\n",
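Once inside the container, a quick sanity check is to import the Python package (assuming the release image ships TensorRT-LLM's Python bindings):

```bash
# Should print the installed TensorRT-LLM version without errors
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```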
|
65 |
| - "## Using Docker (build from source)\n", |
| 84 | + "## Using Docker (Build from Source)\n", |
66 | 85 | "\n",
|
67 | 86 | "Alternatively, you can build the TensorRT-LLM container from source.\n",
|
68 |
| - "This is useful if you want to modify the source code or use a custom branch.\n", |
69 |
| - "See the official instructions here: https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker\n", |
70 |
| - "\n", |
71 |
| - "The following commands will install required dependencies, clone the repository,\n", |
72 |
| - "check out the GPT-OSS feature branch, and build the Docker container:\n", |
73 |
| - " ```\n", |
74 |
| - "#Update package lists and install required system packages\n", |
75 |
| - "sudo apt-get update && sudo apt-get -y install git git-lfs build-essential cmake\n", |
76 |
| - "\n", |
77 |
| - "# Initialize Git LFS (Large File Storage) for handling large model files\n", |
78 |
| - "git lfs install\n", |
79 |
| - "\n", |
80 |
| - "# Clone the TensorRT-LLM repository\n", |
81 |
| - "git clone https://github.com/NVIDIA/TensorRT-LLM.git\n", |
82 |
| - "cd TensorRT-LLM\n", |
83 |
| - "\n", |
84 |
| - "# Check out the branch with GPT-OSS support\n", |
85 |
| - "git checkout feat/gpt-oss\n", |
86 |
| - "\n", |
87 |
| - "# Initialize and update submodules (required for build)\n", |
88 |
| - "git submodule update --init --recursive\n", |
89 |
| - "\n", |
90 |
| - "# Pull large files (e.g., model weights) managed by Git LFS\n", |
91 |
| - "git lfs pull\n", |
92 |
| - "\n", |
93 |
| - "# Build the release Docker image\n", |
94 |
| - "make -C docker release_build\n", |
95 |
| - "\n", |
96 |
| - "# Run the built Docker container\n", |
97 |
| - "make -C docker release_run \n", |
98 |
| - "```" |
| 87 | + "This approach is useful if you want to modify the source code or use a custom branch.\n", |
| 88 | + "For detailed instructions, see the [official documentation](https://github.com/NVIDIA/TensorRT-LLM/tree/feat/gpt-oss/docker)." |
99 | 89 | ]
|
100 | 90 | },
|
101 | 91 | {
|
|