2 files changed: +32 −0 lines

modelopt/torch/quantization/plugins

Model Optimizer Changelog (Linux)
=================================

+ 0.41 (2025-12-xx)
+ ^^^^^^^^^^^^^^^^^
+
+ **Deprecations**
+
+ **New Features**
+ - Add FP8/NVFP4 KV cache quantization support for Megatron Core models.
+
0.39 (2025-11-xx)
^^^^^^^^^^^^^^^^^

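The new changelog entry above mentions FP8/NVFP4 KV cache quantization. As a rough, self-contained illustration of the underlying idea (not ModelOpt's implementation; the function name and rounding scheme here are made up), simulated ("fake") quantization scales a KV-cache tensor into the FP8 E4M3 dynamic range, clips, rounds coarsely, and rescales:

```python
# Illustrative sketch only: simulated FP8-E4M3-style quantization of a
# KV-cache tensor. This is NOT the ModelOpt API; `fake_quant_kv` is a
# hypothetical helper used to show the scale/clip/round/dequantize steps.

E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3

def fake_quant_kv(values, amax=None):
    """Scale into the E4M3 range, clip, round, and rescale.

    `values` is a flat list of floats standing in for one KV-cache tensor.
    Real FP8 rounds onto the E4M3 grid; here we round the scaled value to
    the nearest integer purely to make the precision loss visible.
    """
    if amax is None:
        amax = max(abs(v) for v in values) or 1.0
    scale = E4M3_MAX / amax
    out = []
    for v in values:
        q = max(-E4M3_MAX, min(E4M3_MAX, v * scale))  # clip to FP8 range
        q = round(q)                                  # coarse rounding step
        out.append(q / scale)                         # dequantize back
    return out

kv = [0.12, -3.5, 7.25, -0.01]
print(fake_quant_kv(kv))  # values close to the originals, with small rounding error
```

The per-tensor `amax` plays the role of the calibrated dynamic range: the largest-magnitude entry maps exactly onto the format's maximum, and everything else loses a little precision.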
+ # SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ # SPDX-License-Identifier: Apache-2.0
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ """Support quantization for Megatron Core specific layers.
+
+ This plugin provides additional support for Megatron Core models beyond what's
+ available in the main megatron plugin. Currently this module is a placeholder,
+ as the TEDotProductAttention support is implemented in the megatron.py plugin.
+ """
+
+ # The TEDotProductAttention quantization support is implemented in megatron.py.
+ # This file exists to satisfy the import in __init__.py.
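The comment above says the file exists only to satisfy an import in `__init__.py`. A common pattern for such plugin packages (sketched here with stdlib modules; this is not ModelOpt's actual `__init__.py`, and `try_import_plugin` is a hypothetical name) is to guard each plugin import so an empty placeholder or a missing optional dependency never breaks package import:

```python
# Hypothetical sketch of guarded plugin imports: each plugin module is
# imported by name, and a missing one degrades gracefully to None instead
# of raising at package-import time.

import importlib


def try_import_plugin(name, package=None):
    """Import a plugin module by name, returning None if unavailable."""
    try:
        return importlib.import_module(name, package)
    except ImportError:
        return None


# A present module loads; an absent one simply yields None.
json_plugin = try_import_plugin("json")            # stdlib stand-in: loads
missing = try_import_plugin("no_such_plugin_xyz")  # absent: returns None
print(json_plugin is not None, missing is None)    # True True
```

With this pattern, adding a placeholder module like the one above keeps the package's import list stable while the real implementation lives elsewhere.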