LUT-based compressed data type #3496
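This PR adds a look-up-table (codebook) compressed weight type to `nncf.compress_weights`: a built-in 4-bit codebook with FP8 (E4M3) values (`CB4_F8E4M3`) and a `CODEBOOK` mode that accepts a user-supplied table via `AdvancedCompressionParameters`. The sketch below is only a condensed view of the example added in this PR (see `main.py` further down); the output directory names are illustrative and not part of the PR.

```python
import numpy as np
from optimum.intel.openvino import OVModelForCausalLM

import nncf

MODEL_ID = "HuggingFaceTB/SmolLM2-360M-Instruct"

# New CB4_F8E4M3 mode: default 4-bit codebook with FP8 (E4M3) entries.
model = OVModelForCausalLM.from_pretrained(MODEL_ID, export=True, load_in_8bit=False)
model.model = nncf.compress_weights(
    model.model, mode=nncf.CompressWeightsMode.CB4_F8E4M3, ratio=1.0, group_size=64
)
model.save_pretrained("smollm2_360m_cb4_f8e4m3")  # illustrative output directory

# New CODEBOOK mode: a user-defined look-up table passed through advanced parameters.
model = OVModelForCausalLM.from_pretrained(MODEL_ID, export=True, load_in_8bit=False)
codebook = np.array([-8, -4, -2, -1, 0, 1, 2, 4, 8], dtype=np.int8)
model.model = nncf.compress_weights(
    model.model,
    mode=nncf.CompressWeightsMode.CODEBOOK,
    ratio=1.0,
    group_size=-1,
    advanced_parameters=nncf.AdvancedCompressionParameters(codebook=codebook),
)
model.save_pretrained("smollm2_360m_custom_codebook")  # illustrative output directory
```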
Merged
Changes from all commits (96 commits):
- 488cacc (alexsu52): Support scale estimation inside GPTQ
- ee64877 (alexsu52): fix for INT4_ASYM
- f22e411 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 51b4d7b (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- f66cd1e (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 7ce5a53 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- f74d156 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 5288c79 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 1becf15 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 047d7d9 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- c0c7e57 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- b74dea1 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 26a9a77 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 25fcc2c (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 26d4887 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 7748233 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- df251b3 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 4c134c4 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 6147097 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 2b94d28 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 5e312a5 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into develop
- 2fc8f9c (andreyanufr): Draft.
- 7c6795e (andreyanufr): Draft.
- 1dcdd75 (andreyanufr): Draft for codebook.
- b870d8d (andreyanufr): Compression for default codebook.
- ac26b8a (andreyanufr): Reverted change in spell check.
- 16d7a9e (andreyanufr): Fixed compression to 4bit for codebook indexes.
- 87280cc (andreyanufr): Added tests and example.
- a132ed2 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into aanuf/LUT
- 4ab1470 (andreyanufr): Added file with compression data structures.
- 6ccd252 (andreyanufr): Removed debug information.
- 22308e9 (andreyanufr): Added custom codebook to example.
- fb259fc (andreyanufr): Fixed bug with group_size=-1.
- 86acc8e (andreyanufr): Moved convert before gather.
- b54606c (andreyanufr): Removed backend specific parameter from advanced parameters.
- 72b803e (andreyanufr): Fixed tests.
- 79f34a7 (andreyanufr): Fix for prevent Gather from low-precision types be recognized as inpu…
- 1c64e7c (andreyanufr): Merge remote-tracking branch 'upstream/develop' into aanuf/LUT
- 9323381 (andreyanufr): Extend test for codebook.
- 464c097 (andreyanufr): Refactoring.
- b964c0c (andreyanufr): Delete codebook algo.
- fb834cf (andreyanufr): Merge remote-tracking branch 'upstream/develop' into aanuf/LUT_like_nf4
- 145fbf3 (andreyanufr): Refactoring.
- d4e8578 (andreyanufr): Data aware codebook.
- 97d7ecd (andreyanufr): Fixed merge conflict.
- ac0346d (andreyanufr): Fixed test.
- 5fb55e4 (andreyanufr): Fixed tests.
- bf94228 (andreyanufr): Added CB4_F8E4M3 type.
- e0d163e (andreyanufr): Merge remote-tracking branch 'upstream/develop' into aanuf/LUT
- 37a7c59 (andreyanufr): Fixed pre-commit.
- 6006be6 (andreyanufr): Applied suggestions.
- caed8a8 (andreyanufr): Fixed tests.
- 0a36b51 (andreyanufr): Added codebook parametars validation.
- 68d633b (andreyanufr): Fixed bug.
- 508aec4 (andreyanufr): Applied suggestions.
- 88e645d (andreyanufr): Resolved merge conflict.
- 79f9368 (andreyanufr): Added description for codebook parameter.
- 8c9b7b5 (andreyanufr): Renamed global parameter for codebook.
- c62c315 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into aanuf/LUT
- 9bd8c4b (andreyanufr): Removed tensor type.
- 8f6eb33 (andreyanufr): 1) Applied suggestions.
- b5f2bc3 (andreyanufr): Fixed merge conflict.
- 8c7f428 (andreyanufr): Removed data type from codebook parameters.
- db43991 (andreyanufr): Removed circular imports.
- 7c9429e (andreyanufr): Added file with constants.
- b90ccf3 (andreyanufr): Moved default codebook initialization to function.
- 8a06f88 (andreyanufr): Added test for comparison of compressed weight values for CB4_F8E4M3 …
- d6e4a76 (andreyanufr): Fixed test.
- b231848 (andreyanufr): Fixed fp8 value.
- de7b709 (andreyanufr): Test for codebook graph.
- 4737ade (andreyanufr): Changed name of file for more appropriate.
- 072a62a (andreyanufr): Changed name of file for more appropriate.
- ede9342 (andreyanufr): Return reshape_weight_for_grouped_quantization to weight_lowering.
- 4471290 (andreyanufr): Changed no ascii chracter.
- 3f9f833 (andreyanufr): Removed extra convert from fp16 to fp16.
- 67faaa7 (andreyanufr): Added test and exception which checks what codebook is sorted, not em…
- e5322e3 (andreyanufr): Fixed fp8 values in test.
- 8f18fb8 (andreyanufr): Applied suggestions.
- c838708 (andreyanufr): Applied suggestions.
- 0949a92 (andreyanufr): Applied suggestions.
- b491012 (andreyanufr): Fixed data type.
- 6bf05fc (andreyanufr): Removed torch tensor from codebook docstring.
- e44b3d8 (andreyanufr): Applied suggestion.
- f1c68d6 (andreyanufr): Applied suggestion.
- 7673381 (andreyanufr): Merge remote-tracking branch 'upstream/develop' into aanuf/LUT
- 8159e56 (andreyanufr): Fixed bug.
- b24936b (andreyanufr): Fixed bug for onnx.
- 6fdfd33 (andreyanufr): Applied suggestion.
- 17d6d2d (andreyanufr): Applied suggestions.
- 61abc6a (andreyanufr): Applied suggestions.
- d1d8232 (andreyanufr): 1) Added docstrings for codebook example.
- b8f2526 (andreyanufr): Applied suggestions.
- ca342ab (andreyanufr): Applied suggestion.
- 635ef23 (andreyanufr): Changed docstring formatting.
- 50a94aa (andreyanufr): Applied suggestions.
- 82d9e5c (andreyanufr): Applied suggestions.
Spell-check word list (file path not shown in this view):

```text
@@ -505,4 +505,4 @@ yolov
 yscale
 yujie
 yury
 zfnet
```
examples/llm_compression/openvino/smollm2_360m_codebook/README.md (26 additions, 0 deletions)
# Large Language Models FP8 Compression Example

This example demonstrates how to apply codebook compression to the [HuggingFaceTB/SmolLM2-360M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM2-360M-Instruct) model. It can be useful for evaluation and early HW enablement purposes.

## Prerequisites

To use this example:

- Create a separate Python* environment and activate it: `python3 -m venv nncf_env && source nncf_env/bin/activate`
- Install dependencies:

```bash
pip install -U pip
pip install -r requirements.txt
pip install ../../../../
```

## Run Example

To run the example:

```bash
python main.py
```

It will automatically download the baseline model, compress it, and save the resulting model.
examples/llm_compression/openvino/smollm2_360m_codebook/main.py (163 additions, 0 deletions)
```python
# Copyright (c) 2025 Intel Corporation
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
# http://www.apache.org/licenses/LICENSE-2.0
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

import warnings

import numpy as np
from optimum.intel.openvino import OVModelForCausalLM
from torch.jit import TracerWarning
from transformers import AutoTokenizer
from transformers import logging

import nncf

logging.set_verbosity_error()
warnings.filterwarnings("ignore", category=TracerWarning)


MODEL_ID = "HuggingFaceTB/SmolLM2-360M-Instruct"
COMPRESSED_MODEL_ID = "smollm2_360m_compressed_codebook"


def generate_answers(
    questions: list[str], model: OVModelForCausalLM, tokenizer: AutoTokenizer, max_new_tokens: int = 50
) -> dict[str, str]:
    """
    Generate answers for a list of questions using the provided model and tokenizer.

    :param questions: List of questions to be answered.
    :param model: The model to use for generating answers.
    :param tokenizer: The tokenizer to use for processing the input and output.
    :param max_new_tokens: Maximum number of new tokens to generate for each answer. Defaults to 50.
    :return: A dictionary mapping each question to its corresponding answer.
    """
    messages = [
        {"role": "system", "content": "You are a chatbot who always responds as short as possible."},
        {"role": "user", "content": "What is the capital of Spain?"},
        {"role": "assistant", "content": "Madrid."},
    ]
    answers_by_questions = {}

    for question in questions:
        messages.append({"role": "user", "content": question})
        input_ids = tokenizer.apply_chat_template(
            messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
        ).to(device=model.device)
        input_len = len(input_ids[0])

        output = model.generate(input_ids, max_new_tokens=max_new_tokens, do_sample=False)[0]
        answer = tokenizer.decode(output[input_len:], skip_special_tokens=True)
        answers_by_questions[question] = answer
        messages.append({"role": "assistant", "content": answer})

    return answers_by_questions


def print_answers(header: str, answers_by_questions: dict[str, str]) -> None:
    """
    Print the answers to the console.

    :param header: Header to print before the answers.
    :param answers_by_questions: Dictionary mapping questions to their answers.
    """
    print(header)
    for question, answer in answers_by_questions.items():
        print(f"Q: {question}\nA: {answer}\n")


QUESTIONS = [
    "What is the capital of France?",
    "What is the highest peak in the Alps?",
    "What is the largest city in Canada?",
    "What is the most visited city in Japan?",
]


def load_model_and_tokenizer(model_id: str, export: bool = True) -> tuple[OVModelForCausalLM, AutoTokenizer]:
    """
    Load the model and tokenizer from the specified model ID.

    :param model_id: The identifier of the model to load.
    :param export: Whether to export the model for OpenVINO. Defaults to True.
    :return: A tuple containing the loaded model and tokenizer.
    """
    tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
    model = OVModelForCausalLM.from_pretrained(
        model_id,
        export=export,
        load_in_8bit=False,
    )
    return model, tokenizer


def default_codebook_example(model_id: str, compressed_model_id: str) -> list[str]:
    """
    Example of compression with the default codebook.

    :param model_id: The identifier of the model to load.
    :param compressed_model_id: The identifier for the compressed model to save.
    :return: A list of answers generated by the model after compression.
    """
    model, tokenizer = load_model_and_tokenizer(model_id)

    answers_by_questions = generate_answers(QUESTIONS, model, tokenizer)
    print_answers("Non-optimized model outputs:\n", answers_by_questions)

    model.model = nncf.compress_weights(model.model, mode=nncf.CompressWeightsMode.CB4_F8E4M3, ratio=1.0, group_size=64)
    model.save_pretrained(compressed_model_id)
    tokenizer.save_pretrained(compressed_model_id)

    model, tokenizer = load_model_and_tokenizer(compressed_model_id, False)
    answers_by_questions = generate_answers(QUESTIONS, model, tokenizer)
    print_answers("Optimized model outputs:\n", answers_by_questions)

    return list(answers_by_questions.values())


def custom_codebook_example(model_id: str, compressed_model_id: str) -> list[str]:
    """
    Example of compression with a custom codebook.

    :param model_id: The identifier of the model to load.
    :param compressed_model_id: The identifier for the compressed model to save.
    :return: A list of answers generated by the model after compression.
    """
    model, tokenizer = load_model_and_tokenizer(model_id)

    answers_by_questions = generate_answers(QUESTIONS, model, tokenizer)
    print_answers("Non-optimized model outputs:\n", answers_by_questions)

    codebook = np.array([-8, -4, -2, -1, 0, 1, 2, 4, 8], dtype=np.int8)

    model.model = nncf.compress_weights(
        model.model,
        mode=nncf.CompressWeightsMode.CODEBOOK,
        ratio=1.0,
        group_size=-1,
        advanced_parameters=nncf.AdvancedCompressionParameters(codebook=codebook),
    )
    model.save_pretrained(compressed_model_id)
    tokenizer.save_pretrained(compressed_model_id)

    model, tokenizer = load_model_and_tokenizer(compressed_model_id, False)
    answers_by_questions = generate_answers(QUESTIONS, model, tokenizer)
    print_answers("Optimized model outputs:\n", answers_by_questions)

    return list(answers_by_questions.values())


def main():
    res = default_codebook_example(MODEL_ID, COMPRESSED_MODEL_ID)
    res += custom_codebook_example(MODEL_ID, COMPRESSED_MODEL_ID + "_custom")
    return res


if __name__ == "__main__":
    main()
```
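The commit log above notes that codebook indices are decoded through a Gather (with a Convert moved before it), so the compressed IR should contain Gather nodes fed by the look-up-table constants. The snippet below is a hedged, optional sanity check after running the example: the `openvino_model.xml` path assumes optimum-intel's default save layout, and the node-type heuristic is an assumption rather than part of this PR.

```python
import openvino as ov

# Hedged sketch: list Gather nodes in the compressed IR to spot codebook (LUT)
# decompression subgraphs. Adjust the path if your files are saved elsewhere.
core = ov.Core()
model = core.read_model("smollm2_360m_compressed_codebook/openvino_model.xml")

gather_ops = [op for op in model.get_ops() if op.get_type_name() == "Gather"]
print(f"Gather nodes found in the compressed IR: {len(gather_ops)}")
for op in gather_ops[:5]:
    print(" ", op.get_friendly_name())
```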
examples/llm_compression/openvino/smollm2_360m_codebook/requirements.txt (4 additions, 0 deletions)
```text
openvino==2025.1
optimum-intel[openvino]>=1.22.0
transformers>=4.48.0
onnx==1.17.0
```