
ModelOpt 0.39.0 Release


@kevalmorabia97 released this 13 Nov 07:25
f329b19

Deprecations

  • Deprecated the modelopt.torch._deploy.utils.get_onnx_bytes API. Use modelopt.torch._deploy.utils.get_onnx_bytes_and_metadata instead to access the ONNX model bytes along with its external data. See examples/onnx_ptq/download_example_onnx.py for example usage.
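A minimal migration sketch, assuming a model and dummy input are already prepared; the exact return shape of get_onnx_bytes_and_metadata (here assumed to be the ONNX bytes plus a metadata object describing external-data files) should be verified against examples/onnx_ptq/download_example_onnx.py:

```python
# Hedged sketch: the return values are assumptions, not the confirmed API.
from modelopt.torch._deploy.utils import get_onnx_bytes_and_metadata

# Before (deprecated):
#   onnx_bytes = get_onnx_bytes(model, dummy_input)

# After: the new API also returns metadata needed to resolve external data
# (weights stored outside the main .onnx file for large models).
onnx_bytes, metadata = get_onnx_bytes_and_metadata(model, dummy_input)
with open("model.onnx", "wb") as f:
    f.write(onnx_bytes)
```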

New Features

  • Added the flag op_types_to_exclude_fp16 in ONNX quantization to exclude ops from FP16/BF16 conversion. Alternatively, for custom TensorRT ops, the same effect can be achieved by specifying 'fp32' precision in trt_plugins_precision.
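A possible invocation sketch; the entry point and flag spellings other than op_types_to_exclude_fp16 and trt_plugins_precision are assumptions and should be checked against the ONNX quantization CLI help:

```shell
# Hedged sketch: verify flags via `python -m modelopt.onnx.quantization --help`.
# Keep Resize and Softmax in FP32, and pin a custom TRT plugin op to FP32.
python -m modelopt.onnx.quantization \
    --onnx_path model.onnx \
    --op_types_to_exclude_fp16 Resize Softmax \
    --trt_plugins_precision MyCustomPlugin:fp32
```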
  • Added LoRA mode support for MCore in a new peft submodule: modelopt.torch.peft.update_model(model, LORA_CFG).
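A sketch of the new peft entry point; the LORA_CFG schema below (adapter type, per-layer rank) is a hypothetical shape, not the confirmed config format, so consult the modelopt.torch.peft documentation for the supported keys:

```python
# Hedged sketch: config keys are assumptions; only update_model(model, cfg)
# is confirmed by the release notes.
from modelopt.torch.peft import update_model

LORA_CFG = {
    "adapter_type": "lora",
    "adapter_cfg": {
        "*": {"rank": 32},  # hypothetical: rank-32 adapters on all supported layers
    },
}

# `model` is assumed to be a Megatron-Core model instance.
model = update_model(model, LORA_CFG)
```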
  • Added support for PTQ and fake quantization in vLLM for fast evaluation of arbitrary quantization formats. See examples/vllm_serve for more details.
  • Added support for nemotron-post-training-dataset-v2 and nemotron-post-training-dataset-v1 in examples/llm_ptq. Defaults to a mix of cnn_dailymail and nemotron-post-training-dataset-v2 (a gated dataset accessed via the HF_TOKEN environment variable) if no dataset is specified.
  • Added support for specifying calib_seq in examples/llm_ptq to set the maximum sequence length for calibration.
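A usage sketch for the new calibration-length control; the script name and flags other than calib_seq are assumptions, so see examples/llm_ptq/README.md for the exact invocation:

```shell
# Hedged sketch: only --calib_seq is confirmed by the release notes.
# Cap each calibration sample at 2048 tokens during PTQ.
python hf_ptq.py \
    --pyt_ckpt_path <model_dir> \
    --qformat nvfp4 \
    --calib_seq 2048
```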
  • Added support for MCore MoE PTQ/QAT/QAD.
  • Added support for multi-node PTQ and export with FSDP2 in examples/llm_ptq/multinode_ptq.py. See examples/llm_ptq/README.md for more details.
  • Added support for Nemotron Nano VL v1 & v2 models in FP8/NVFP4 PTQ workflow.
  • Added flags nodes_to_include and op_types_to_include in AutoCast to force-include nodes in low-precision conversion, even if other rules would otherwise exclude them.
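A sketch of how the new AutoCast force-include flags might be passed; the entry point and argument format (regex for node names, space-separated op types) are assumptions to verify against the AutoCast documentation:

```shell
# Hedged sketch: check the AutoCast CLI help for exact spellings.
# Force decoder nodes and all MatMul/Gemm ops into low precision.
python -m modelopt.onnx.autocast \
    --onnx_path model.onnx \
    --nodes_to_include "/decoder/.*" \
    --op_types_to_include MatMul Gemm
```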
  • Added support for torch.compile and benchmarking in examples/diffusers/quantization/diffusion_trt.py.
  • Enabled native ModelOpt quantization support for FP8 and NVFP4 formats in SGLang. See SGLang quantization documentation for more details.
  • Added ModelOpt quantized checkpoints in vLLM/SGLang CI/CD pipelines (PRs are under review).
  • Added support for exporting QLoRA checkpoints finetuned using ModelOpt.

Additional Announcements

  • Starting with the next release, ModelOpt will move from odd-numbered minor versions to consecutive versioning. This means the next release will be named 0.40.0 instead of 0.41.0.