Commit 865364f

Author: George
[Oneshot Refactor] Main refactor (#1110)
ORDER OF REVIEWS:
1. #1108
2. #1103
3. #1109
4. #1110 <- current PR

SUMMARY:
* Create a class to decouple the dependency on `main`. The `Oneshot` class consists of pre-processing, carrying out the oneshot logic, and post-processing.
* Move the oneshot class and method under `llmcompressor/entrypoints/oneshot.py`.
* Add a README in `/llmcompressor/entrypoints` with info on oneshot.
* Delete the oneshot logic from the `/finetune` directory and add a deprecation warning.
* Remove `apply`, used only for the stage-runner oneshot pathway, from `session.py` and `session_function.py`.
* Add oneshot-only calibration dataloader logic.
* Add a return value of `model: PretrainedModel` to `def oneshot`.
* Make the oneshot carry-out logic independent of `TrainingArguments`.
* Remove `overwrite_output_dir` as a oneshot input arg -> it is only used for `TrainingArguments`.
* Update the README on the `/finetune` path: remove the `oneshot` and `oneshot with fsdp` sections.
* Update the `wrap_save_pretrained` logic to run only if not already applied -> used by the stage runner to avoid double wrapping.

Entrypoints:

```python3
from llmcompressor import oneshot

oneshot(**kwargs)  # calls Oneshot
```

or

```python3
from llmcompressor import Oneshot

oneshot = Oneshot(**kwargs)
oneshot()  # preprocesses, carries out the logic, and post-processes
```

TEST PLAN:
Pass all tests and examples. Verified that https://github.com/vllm-project/llm-compressor/blob/main/examples/quantization_2of4_sparse_w4a16/llama7b_sparse_w4a16.py works as expected.

FOLLOW UPS:
* Stage runner removal
* Update the entrypoints folder with train, eval, predict, etc.

---------

Signed-off-by: George Ohashi <george@neuralmagic.com>
Signed-off-by: George <george@neuralmagic.com>
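The three-phase callable-class design described in the summary (pre-process, carry out the oneshot logic, post-process, then return the model) can be sketched in plain Python. This is a hypothetical illustration of the pattern, not the actual llmcompressor implementation; all method names and the `log` attribute are invented for the sketch.

```python
class Oneshot:
    """Illustrative three-phase callable, mirroring the pattern described
    in the PR summary. Names are hypothetical, not llmcompressor's API."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs
        self.model = kwargs.get("model")
        self.log = []  # records phase order, for illustration only

    def _pre_process(self):
        # e.g. build the oneshot-only calibration dataloader
        self.log.append("pre_process")

    def _apply_recipe(self):
        # e.g. run calibration and apply the compression modifiers,
        # independent of TrainingArguments
        self.log.append("apply_recipe")

    def _post_process(self):
        # e.g. save the compressed model and reset the session
        self.log.append("post_process")

    def __call__(self):
        self._pre_process()
        self._apply_recipe()
        self._post_process()
        return self.model  # the refactor makes oneshot return the model


def oneshot(**kwargs):
    # Functional entrypoint: construct the class and run it.
    return Oneshot(**kwargs)()
```

Keeping the phases as separate methods is what lets the functional `oneshot(**kwargs)` entrypoint and the class-based `Oneshot(**kwargs)()` entrypoint share one code path.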
1 parent 1101723 commit 865364f

File tree: 61 files changed, +680 / -270 lines


README.md (1 addition, 1 deletion)

````diff
@@ -58,7 +58,7 @@ Quantization is applied by selecting an algorithm and calling the `oneshot` API.
 ```python
 from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
 from llmcompressor.modifiers.quantization import GPTQModifier
-from llmcompressor.transformers import oneshot
+from llmcompressor import oneshot

 # Select quantization algorithm. In this case, we:
 # * apply SmoothQuant to make the activations easier to quantize
````

examples/big_models_with_accelerate/cpu_offloading_fp8.py (1 addition, 1 deletion)

```diff
@@ -1,7 +1,7 @@
 from transformers import AutoModelForCausalLM, AutoTokenizer

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import QuantizationModifier
-from llmcompressor.transformers import oneshot

 MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
 OUTPUT_DIR = MODEL_ID.split("/")[1] + "-FP8-Dynamic"
```

examples/big_models_with_accelerate/mult_gpus_int8_device_map.py (1 addition, 1 deletion)

```diff
@@ -2,9 +2,9 @@
 from datasets import load_dataset
 from transformers import AutoModelForCausalLM, AutoTokenizer

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import GPTQModifier
 from llmcompressor.modifiers.smoothquant import SmoothQuantModifier
-from llmcompressor.transformers import oneshot
 from llmcompressor.transformers.compression.helpers import calculate_offload_device_map

 MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
```

examples/big_models_with_accelerate/multi_gpu_int8.py (1 addition, 1 deletion)

```diff
@@ -1,8 +1,8 @@
 from datasets import load_dataset
 from transformers import AutoModelForCausalLM, AutoTokenizer

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import GPTQModifier
-from llmcompressor.transformers import oneshot

 MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
 SAVE_DIR = MODEL_ID.split("/")[1] + "-W8A8-Dynamic"
```

examples/multimodal_audio/whisper_example.py (1 addition, 1 deletion)

```diff
@@ -2,8 +2,8 @@
 from datasets import load_dataset
 from transformers import WhisperProcessor

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import GPTQModifier
-from llmcompressor.transformers import oneshot
 from llmcompressor.transformers.tracing import TraceableWhisperForConditionalGeneration

 # Select model and load it.
```

examples/multimodal_vision/idefics3_example.py (1 addition, 1 deletion)

```diff
@@ -4,8 +4,8 @@
 from PIL import Image
 from transformers import AutoProcessor

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import GPTQModifier
-from llmcompressor.transformers import oneshot
 from llmcompressor.transformers.tracing import TraceableIdefics3ForConditionalGeneration

 # Load model.
```

examples/multimodal_vision/llava_example.py (1 addition, 1 deletion)

```diff
@@ -3,8 +3,8 @@
 from PIL import Image
 from transformers import AutoProcessor

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import GPTQModifier
-from llmcompressor.transformers import oneshot
 from llmcompressor.transformers.tracing import TraceableLlavaForConditionalGeneration

 # Load model.
```

examples/multimodal_vision/mllama_example.py (1 addition, 1 deletion)

```diff
@@ -3,8 +3,8 @@
 from PIL import Image
 from transformers import AutoProcessor

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import GPTQModifier
-from llmcompressor.transformers import oneshot
 from llmcompressor.transformers.tracing import TraceableMllamaForConditionalGeneration

 # Load model.
```

examples/multimodal_vision/phi3_vision_example.py (1 addition, 1 deletion)

```diff
@@ -5,8 +5,8 @@
 from datasets import load_dataset
 from transformers import AutoModelForCausalLM, AutoProcessor

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import GPTQModifier
-from llmcompressor.transformers import oneshot

 # Load model.
 model_id = "microsoft/Phi-3-vision-128k-instruct"
```

examples/multimodal_vision/pixtral_example.py (1 addition, 1 deletion)

```diff
@@ -3,8 +3,8 @@
 from PIL import Image
 from transformers import AutoProcessor

+from llmcompressor import oneshot
 from llmcompressor.modifiers.quantization import GPTQModifier
-from llmcompressor.transformers import oneshot
 from llmcompressor.transformers.tracing import TraceableLlavaForConditionalGeneration

 # Load model.
```
