# 🤗 Hugging Face Transformers API Compatibility

NeMo AutoModel is built to work with the 🤗 Hugging Face ecosystem.
In practice, compatibility comes in two layers:

- **API compatibility**: for many workflows, you can keep your existing `transformers` code and swap in NeMo AutoModel “drop-in” wrappers (`NeMoAutoModel*`, `NeMoAutoTokenizer`) with minimal changes.
- **Artifact compatibility**: NeMo AutoModel produces **Hugging Face-compatible checkpoints** (config + tokenizer + safetensors) that can be loaded by Hugging Face Transformers and downstream tools (vLLM, SGLang, etc.).

This page summarizes what "HF compatibility" means in NeMo AutoModel, calls out differences you should be aware of, and provides side-by-side examples.

## Transformers Version Compatibility: v4 and v5

### Transformers v4 (Current Default)

NeMo AutoModel currently pins Hugging Face Transformers to the **v4** major line (see `pyproject.toml`, currently `transformers<=4.57.5`).

This means:

- NeMo AutoModel is primarily tested and released against **Transformers v4.x**
- New model releases on the Hugging Face Hub that require a newer Transformers version may also require upgrading NeMo AutoModel (just as they would require upgrading `transformers` directly)

### Transformers v5 (Forward-Compatibility and Checkpoint Interoperability)

Transformers **v5** introduces breaking changes across some internal utilities (e.g., cache APIs) and adds/reshapes tokenizer backends for some model families.

NeMo AutoModel addresses this in three complementary ways:

- **Forward-compatibility shims**: NeMo AutoModel includes small compatibility patches to smooth over known API differences across Transformers releases (for example, cache utility method names). The built-in recipes apply these patches automatically.
- **Backports where needed**: for some model families, NeMo AutoModel may vendor/backport Hugging Face code that originated in the v5 development line so users can run those models while staying on a pinned v4 dependency.
- **Stable artifact format**: NeMo AutoModel checkpoints are written in Hugging Face-compatible `save_pretrained` layouts (config + tokenizer + safetensors). These artifacts are designed to be loadable by both Transformers **v4** and **v5** (and non-Transformers tools that consume HF-style model repos).

:::{note}
If you are running Transformers v5 in another environment, you can still use NeMo AutoModel-produced consolidated checkpoints with Transformers' standard loading APIs. For details on the checkpoint layouts, see [checkpointing](checkpointing.md).
:::

## Drop-In Compatibility and Key Differences

### Drop-In (Same Mental Model as Transformers)

- **Load by model ID or local path**: `from_pretrained(...)`
- **Standard HF config objects**: `AutoConfig` / `config.json` (see the config sketch after this list)
- **Tokenizers**: standard `PreTrainedTokenizerBase` behavior, including `__call__` to create tensors and `decode`/`batch_decode`
- **Generation**: `model.generate(...)` and the usual generation kwargs

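As a concrete example of the config point above, the familiar `AutoConfig` workflow applies unchanged. This is a minimal sketch; `"gpt2"` is only an example model ID.

```python
from transformers import AutoConfig

# Reads config.json from the Hub (or a local path), exactly as in plain Transformers.
config = AutoConfig.from_pretrained("gpt2")
print(config.model_type, config.vocab_size)
```
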
### Differences (Where NeMo AutoModel Adds Value or Has Constraints)

- **Performance features**: NeMo AutoModel can automatically apply optional kernel patches/optimizations (e.g., SDPA selection, Liger kernels, DeepEP) while keeping the public model API the same.
- **Distributed training stack**: NeMo AutoModel's recipes/CLI are designed for multi-GPU/multi-node fine-tuning with PyTorch-native distributed features (FSDP2, pipeline parallelism, etc.).
- **CUDA expectation**: NeMo AutoModel's `NeMoAutoModel*` wrappers are designed and optimized for NVIDIA GPU workflows; for CPU-only inference, see the note below.

:::{important}
`NeMoAutoModelForCausalLM.from_pretrained(...)` currently assumes CUDA is available (it uses `torch.cuda.current_device()` internally). If you need CPU-only inference, use Hugging Face `transformers` directly.
:::
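
Because of this constraint, a simple availability check keeps a script portable across GPU and CPU machines. The sketch below is illustrative rather than an official pattern; `"gpt2"` is only an example model ID.

```python
import torch

model_id = "gpt2"  # example model ID

if torch.cuda.is_available():
    # GPU path: use the NeMo AutoModel drop-in wrapper.
    from nemo_automodel import NeMoAutoModelForCausalLM

    model = NeMoAutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
else:
    # CPU-only path: fall back to plain Hugging Face Transformers.
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
```
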

## API Mapping (Transformers and NeMo AutoModel)

### API Name Mapping

:::{raw} html
<table>
  <thead>
    <tr>
      <th style="width: 45%;">🤗 Hugging Face (<code>transformers</code>)</th>
      <th style="width: 45%;">NeMo AutoModel (<code>nemo_automodel</code>)</th>
      <th style="width: 10%;">Status</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td><code>transformers.AutoModelForCausalLM</code></td>
      <td><code>nemo_automodel.NeMoAutoModelForCausalLM</code></td>
      <td>✅</td>
    </tr>
    <tr>
      <td><code>transformers.AutoModelForImageTextToText</code></td>
      <td><code>nemo_automodel.NeMoAutoModelForImageTextToText</code></td>
      <td>✅</td>
    </tr>
    <tr>
      <td><code>transformers.AutoModelForSequenceClassification</code></td>
      <td><code>nemo_automodel.NeMoAutoModelForSequenceClassification</code></td>
      <td>✅</td>
    </tr>
    <tr>
      <td><code>transformers.AutoModelForTextToWaveform</code></td>
      <td><code>nemo_automodel.NeMoAutoModelForTextToWaveform</code></td>
      <td>✅</td>
    </tr>
    <tr>
      <td><code>transformers.AutoTokenizer.from_pretrained(...)</code></td>
      <td><code>nemo_automodel.NeMoAutoTokenizer.from_pretrained(...)</code></td>
      <td>✅</td>
    </tr>
    <tr>
      <td><code>model.generate(...)</code></td>
      <td><code>model.generate(...)</code></td>
      <td>🚧</td>
    </tr>
    <tr>
      <td><code>model.save_pretrained(path)</code></td>
      <td><code>model.save_pretrained(path, checkpointer=...)</code></td>
      <td>🚧</td>
    </tr>
  </tbody>
</table>
:::

## Side-by-Side Examples

### Load a Model and Tokenizer (Transformers v4)

:::{raw} html
<table>
  <thead>
    <tr>
      <th style="width: 50%;">🤗 Hugging Face (<code>transformers</code>)</th>
      <th style="width: 50%;">NeMo AutoModel (<code>nemo_automodel</code>)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="vertical-align: top;">
        <div class="highlight"><pre><code>import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
)</code></pre></div>
      </td>
      <td style="vertical-align: top;">
        <div class="highlight"><pre><code>import torch
from nemo_automodel import NeMoAutoModelForCausalLM, NeMoAutoTokenizer

model_id = "gpt2"

tokenizer = NeMoAutoTokenizer.from_pretrained(model_id)
model = NeMoAutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
)</code></pre></div>
      </td>
    </tr>
  </tbody>
</table>
:::

### Text Generation

This snippet assumes you already have a `model` and `tokenizer` (see the loading snippet above).

:::{raw} html
<table>
  <thead>
    <tr>
      <th style="width: 50%;">🤗 Hugging Face (<code>transformers</code>)</th>
      <th style="width: 50%;">NeMo AutoModel (<code>nemo_automodel</code>)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="vertical-align: top; padding-top: 0;">
        <div class="highlight" style="margin-top: 0;"><pre style="margin: 0;"><code>import torch

prompt = "Write a haiku about GPU kernels."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(out[0], skip_special_tokens=True))</code></pre></div>
      </td>
      <td style="vertical-align: top; padding-top: 0;">
        <div class="highlight" style="margin-top: 0;"><pre style="margin: 0;"><code>import torch

prompt = "Write a haiku about GPU kernels."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=64)

print(tokenizer.decode(out[0], skip_special_tokens=True))</code></pre></div>
      </td>
    </tr>
  </tbody>
</table>
:::

### Tokenizers (Transformers vs NeMo AutoModel)

NeMo AutoModel provides `NeMoAutoTokenizer` as a Transformers-like auto-tokenizer with a small registry for specialized backends (and a safe fallback when no specialization is needed).

:::{raw} html
<table>
  <thead>
    <tr>
      <th style="width: 50%;">🤗 Hugging Face (<code>transformers</code>)</th>
      <th style="width: 50%;">NeMo AutoModel (<code>nemo_automodel</code>)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td style="vertical-align: top;">
        <div class="highlight"><pre><code>from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")</code></pre></div>
      </td>
      <td style="vertical-align: top;">
        <div class="highlight"><pre><code>from nemo_automodel import NeMoAutoTokenizer

tok = NeMoAutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")</code></pre></div>
      </td>
    </tr>
  </tbody>
</table>
:::
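
Beyond loading, the returned tokenizer is expected to follow the standard `PreTrainedTokenizerBase` surface described earlier (`__call__`, `decode`/`batch_decode`). A minimal round-trip sketch, using `"gpt2"` purely as an example model ID:

```python
from nemo_automodel import NeMoAutoTokenizer

tok = NeMoAutoTokenizer.from_pretrained("gpt2")  # example model ID

# __call__ produces tensors; decode reverses them (standard HF tokenizer behavior).
enc = tok("Write a haiku about GPU kernels.", return_tensors="pt")
print(enc["input_ids"].shape)
print(tok.decode(enc["input_ids"][0], skip_special_tokens=True))
```
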

## Checkpoints: Save in NeMo AutoModel, Load Everywhere

NeMo AutoModel training recipes write checkpoints in Hugging Face-compatible layouts, including consolidated safetensors that you can load directly with Transformers:

- See [checkpointing](checkpointing.md) for checkpoint formats and example directory layouts.
- See [model coverage](../model-coverage/overview.md) for notes on how model support depends on the pinned Transformers version.

If your goal is to **train/fine-tune in NeMo AutoModel → deploy in the HF ecosystem**, the recommended workflow is to enable consolidated safetensors checkpoints and then load them with the standard HF APIs or downstream inference engines, as in the sketch below.
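
A minimal sketch of that hand-off, assuming a consolidated Hugging Face-style checkpoint directory produced by a NeMo AutoModel recipe (the path below is hypothetical):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt_dir = "/path/to/consolidated_checkpoint"  # hypothetical recipe output directory

# Standard Transformers loading APIs work on the HF-compatible layout.
tokenizer = AutoTokenizer.from_pretrained(ckpt_dir)
model = AutoModelForCausalLM.from_pretrained(ckpt_dir)
```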