Examples of how to get started with Nemotron models
This directory contains cookbook-style guides showing how to deploy and use the models directly:
- TensorRT-LLM Launch Guide - Running Nemotron models efficiently with TensorRT-LLM
- vLLM Integration - Fast inference and scalable serving of Nemotron models with vLLM
- SGLang Deployment - Serving and interacting with Nemotron models via SGLang
- NIM Microservice - Deploying Nemotron as scalable, production-ready endpoints with NVIDIA Inference Microservices (NIM)
- Hugging Face Transformers - Loading and running Nemotron models directly with Hugging Face Transformers
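A convenient property of the serving options above is that vLLM, SGLang, and NIM (as well as TensorRT-LLM's serve mode) all expose an OpenAI-compatible `/v1/chat/completions` endpoint, so a single client works against any of them. The sketch below is a minimal, stdlib-only example; the base URL (vLLM's default local port) and the model identifier are assumptions you should replace with the values from your own deployment:

```python
import json
import urllib.request

# Assumptions: adjust both to match your deployment.
BASE_URL = "http://localhost:8000/v1"  # default vLLM port
MODEL = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"  # example model id


def build_chat_request(prompt: str, model: str = MODEL) -> dict:
    """Build an OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
        "temperature": 0.2,
    }


def send_chat_request(prompt: str) -> dict:
    """POST the payload to the running server and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print the request payload; call send_chat_request() once a server is up.
    print(json.dumps(build_chat_request("Summarize Nemotron in one sentence."), indent=2))
```

Because the payload shape is the same across backends, switching from a local vLLM server to SGLang or a NIM endpoint typically only requires changing `BASE_URL` (and an API key header, where the deployment enforces one).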