Add NVIDIA TensorRT-LLM optimization guide for GPT-OSS models #1983

Merged

pap-openai merged 4 commits into openai:main from jayrodge:feature/add-nvidia-tensorrt-guide

Aug 5, 2025

Contributor

jayrodge commented Aug 5, 2025

Summary

Adds a comprehensive guide for optimizing OpenAI GPT-OSS models using NVIDIA TensorRT-LLM.

Changes

- Add detailed guide for optimizing gpt-oss-20b and gpt-oss-120b models
- Include hardware prerequisites (16 GB+ VRAM, recommended GPUs)
- Provide installation instructions for TensorRT-LLM via NGC and Docker
- Add Python API examples for model loading and inference
- Include performance optimization tips and next steps

Benefits

- Helps users optimize GPT-OSS models for high-performance inference
- Provides clear hardware requirements and setup instructions
- Includes practical code examples for immediate use

jayrodge added 3 commits, August 5, 2025 09:53

baa747d Add NVIDIA TensorRT-LLM optimization guide for GPT-OSS models
1f7a931 Convert NVIDIA TensorRT guide to Jupyter notebook format
c4f665d Update registry.yaml for NVIDIA notebook

pap-openai approved these changes

89b7eaf Merge branch 'main' into feature/add-nvidia-tensorrt-guide

pap-openai merged commit 3d32e44 into openai:main

1 check passed
