Merged
3 changes: 2 additions & 1 deletion src/llmcompressor/modifiers/quantization/gptq/base.py
@@ -247,7 +247,8 @@ def on_initialize(self, state: State, **kwargs) -> bool:
             warnings.warn(
                 f"Failed to trace {model_name} with inputs {input_names}. For more "
                 "information on tracing with the sequential pipeline, see "
-                "`src/llmcompressor/transformers/tracing/GUIDE.md`"
+                "https://github.com/vllm-project/llm-compressor/blob/main/"
+                "src/llmcompressor/transformers/tracing/GUIDE.md"
             )
if isinstance(exception, unfixable_errors):
raise exception
6 changes: 4 additions & 2 deletions src/llmcompressor/modifiers/smoothquant/utils.py
@@ -1,5 +1,4 @@
 import functools
-import pathlib
 from collections import namedtuple
 from typing import Dict, List, Tuple, Union

@@ -94,7 +93,10 @@ def wrapper(*args, **kwargs):
         try:
             return func(*args, **kwargs)
         except Exception as original_exception:
-            readme_location = pathlib.Path(__file__).parent / "README.md"
+            readme_location = (
+                "https://github.com/vllm-project/llm-compressor/tree/main/"
+                "src/llmcompressor/modifiers/smoothquant"
+            )
raise RuntimeError(
f"Error resolving mappings for given architecture."
f"Please refer to the README at {readme_location} for more information."
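For context, the decorator this hunk modifies wraps mapping resolution and re-raises any failure with a pointer to the docs. A standalone sketch of that pattern, where `DOCS_URL` and `resolve_mappings` are illustrative names rather than the library's actual identifiers:

```python
import functools

# Illustrative constant; in the PR this string is assigned inline to readme_location
DOCS_URL = (
    "https://github.com/vllm-project/llm-compressor/tree/main/"
    "src/llmcompressor/modifiers/smoothquant"
)


def handle_mapping_resolution_errors(func):
    """Re-raise any mapping-resolution failure with a link to the README."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as original_exception:
            raise RuntimeError(
                "Error resolving mappings for the given architecture. "
                f"Please refer to the README at {DOCS_URL} for more information."
            ) from original_exception
    return wrapper


@handle_mapping_resolution_errors
def resolve_mappings():
    raise KeyError("unsupported architecture")  # simulated failure


try:
    resolve_mappings()
except RuntimeError as err:
    message = str(err)

assert DOCS_URL in message
```

Chaining with `from original_exception` keeps the underlying traceback visible while the user-facing message carries the docs link.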
8 changes: 5 additions & 3 deletions src/llmcompressor/utils/helpers.py
@@ -1091,9 +1091,11 @@ def DisableQuantization(model: torch.nn.Module):
     """
     Disable quantization from QuantizationModifier
     """
-    model.apply(disable_quantization)
-    yield
-    model.apply(enable_quantization)
+    try:
+        model.apply(disable_quantization)
+        yield
+    finally:
model.apply(enable_quantization) will always run regardless of whether the try block succeeds or fails. Is this intended?
Also, if it fails I would add a log.

Collaborator Author


model.apply(enable_quantization) will always run regardless of whether the try block succeeds or fails. Is this intended?

Yes, that is exactly the intention of this change

Also if it fails I would add a log

If an exception is raised, the user will see the raised exception.


Sorry, why do we want to run model.apply(enable_quantization) if it raises an exception? Wouldn't it just exit?

And as long as the exception contains the info we can get from this function, it should be OK, e.g. the input args that caused the error, which are useful to log.

Collaborator Author

@kylesayrs Jan 27, 2025


Sorry, why do we want to run model.apply(enable_quantization) if it raises an exception? Wouldn't it just exit?

For example, if the sequential pipeline fails, an exception is thrown and then caught in order to fall back to the layer-wise sequential pipeline. This is an example of where an exception is raised but caught, and I do not want to have to think about the weird interactions that would result from this context not being properly exited.

This "try/finally" pattern is, in general, good practice for context managers. It frees the user from worrying about whether the context was exited properly, reducing mental load.
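The restore-on-exception behavior under discussion can be demonstrated with a minimal sketch; the `disable_feature` helper and dict-based state below are hypothetical stand-ins for `model.apply(disable_quantization)` / `model.apply(enable_quantization)`:

```python
import contextlib


@contextlib.contextmanager
def disable_feature(state):
    """Toggle a flag off for the duration of the with-block."""
    state["enabled"] = False
    try:
        yield state
    finally:
        # Runs on normal exit AND when the body raises,
        # so the flag is always restored.
        state["enabled"] = True


state = {"enabled": True}
try:
    with disable_feature(state):
        assert state["enabled"] is False
        raise ValueError("simulated pipeline failure")
except ValueError:
    pass

# The exception propagated, but the finally clause still restored the flag.
assert state["enabled"] is True
```

With `contextlib.contextmanager`, an exception raised inside the `with` body is thrown into the generator at the `yield`, so without the `try`/`finally` the restore line would be skipped whenever the body raises.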

Collaborator


Could we move this code change to a separate PR? Ideally I would like the scope of this PR to be limited to updating README paths.

+        model.apply(enable_quantization)


@contextlib.contextmanager
Expand Down
5 changes: 4 additions & 1 deletion tests/llmcompressor/modifiers/smoothquant/test_utils.py
@@ -12,7 +12,10 @@

 @pytest.mark.unit
 def test_handle_mapping_resolution_errors():
-    README_LOCATION = "llmcompressor/modifiers/smoothquant/README.md"
+    README_LOCATION = (
+        "https://github.com/vllm-project/llm-compressor/tree/main/"
+        "src/llmcompressor/modifiers/smoothquant"
+    )

@handle_mapping_resolution_errors
def func_that_raises_exception():