From cd086107d831f1acbb98bf39116fd4953df5a8ef Mon Sep 17 00:00:00 2001
From: omrialmog
Date: Wed, 17 Sep 2025 20:55:23 -0700
Subject: [PATCH 1/2] Update News README.md

Signed-off-by: omrialmog
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 72c954017..132b93103 100644
--- a/README.md
+++ b/README.md
@@ -26,6 +26,7 @@ Model Optimizer is also integrated with [NVIDIA NeMo](https://github.com/NVIDIA-
 
 ## Latest News
 
+- [2025/09/17] [An Introduction to Speculative Decoding for Reducing Latency in AI Inference](https://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/)
 - [2025/08/29] [Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training](https://developer.nvidia.com/blog/fine-tuning-gpt-oss-for-accuracy-and-performance-with-quantization-aware-training/)
 - [2025/08/01] [Optimizing LLMs for Performance and Accuracy with Post-Training Quantization](https://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/)
 - [2025/06/24] [Introducing NVFP4 for Efficient and Accurate Low-Precision Inference](https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/)

From ec6171d12727a27147a85cb89a71c4b97dba2626 Mon Sep 17 00:00:00 2001
From: omrialmog
Date: Wed, 17 Sep 2025 20:59:10 -0700
Subject: [PATCH 2/2] Update News README.md

Signed-off-by: omrialmog
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 132b93103..b16196b03 100644
--- a/README.md
+++ b/README.md
@@ -27,6 +27,7 @@ Model Optimizer is also integrated with [NVIDIA NeMo](https://github.com/NVIDIA-
 ## Latest News
 
 - [2025/09/17] [An Introduction to Speculative Decoding for Reducing Latency in AI Inference](https://developer.nvidia.com/blog/an-introduction-to-speculative-decoding-for-reducing-latency-in-ai-inference/)
+- [2025/09/11] [How Quantization Aware Training Enables Low-Precision Accuracy Recovery](https://developer.nvidia.com/blog/how-quantization-aware-training-enables-low-precision-accuracy-recovery/)
 - [2025/08/29] [Fine-Tuning gpt-oss for Accuracy and Performance with Quantization Aware Training](https://developer.nvidia.com/blog/fine-tuning-gpt-oss-for-accuracy-and-performance-with-quantization-aware-training/)
 - [2025/08/01] [Optimizing LLMs for Performance and Accuracy with Post-Training Quantization](https://developer.nvidia.com/blog/optimizing-llms-for-performance-and-accuracy-with-post-training-quantization/)
 - [2025/06/24] [Introducing NVFP4 for Efficient and Accurate Low-Precision Inference](https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/)