Merged
Changes from 2 commits
6 changes: 3 additions & 3 deletions setup.py
@@ -86,8 +86,8 @@ def localversion_func(version: ScmVersion) -> str:
         "local_scheme": localversion_func,
         "version_file": "src/llmcompressor/version.py",
     },
-    author="Neuralmagic, Inc.",
-    author_email="support@neuralmagic.com",
+    author="The vLLM Project",
+    author_email="vllm-questions@lists.berkeley.edu",
     description=(
         "A library for compressing large language models utilizing the "
         "latest techniques and research in the field for both "
@@ -102,7 +102,7 @@ def localversion_func(version: ScmVersion) -> str:
         "huggingface, compressors, compression, quantization, pruning, "
         "sparsity, optimization, model optimization, model compression, "
     ),
-    license="Apache",
+    license="Apache 2.0",
     url="https://github.com/vllm-project/llm-compressor",
     include_package_data=True,
     package_dir={"": "src"},
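This hunk only touches package metadata. As a sketch, the affected `setup()` keyword arguments would read as follows after the change (all other arguments are elided and unchanged):

```python
from setuptools import setup

setup(
    # ... unchanged arguments elided ...
    author="The vLLM Project",
    author_email="vllm-questions@lists.berkeley.edu",
    license="Apache 2.0",  # previously just "Apache"
    url="https://github.com/vllm-project/llm-compressor",
)
```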
2 changes: 1 addition & 1 deletion src/llmcompressor/modifiers/README.md
@@ -35,7 +35,7 @@ One-shot algorithm that quantizes weights, input activations and/or output activ
 calculating a range from weights or calibration data. All data is quantized to the closest
 bin using a scale and (optional) zero point. This basic quantization algorithm is
 suitable for FP8 quantization. A variety of quantization schemes are supported via the
-[compressed-tensors](https://github.com/neuralmagic/compressed-tensors) library.
+[compressed-tensors](https://github.com/vllm-project/compressed-tensors) library.
 
 ### [GPTQ](./gptq/base.py)
 One-shot algorithm that uses calibration data to select the ideal bin for weight quantization.
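The scale/zero-point scheme the README excerpt describes can be sketched as follows. This is a minimal illustration in NumPy with hypothetical helper names, not the compressed-tensors API, and it uses int8 bins rather than FP8 for simplicity:

```python
import numpy as np

def quantize(x: np.ndarray, num_bits: int = 8):
    """Affine quantization: map floats to integer bins via a scale and zero point."""
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    # Scale maps the observed float range onto the integer range.
    scale = (x.max() - x.min()) / (qmax - qmin)
    # Zero point shifts the bins so x.min() lands near qmin.
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale, zero_point):
    """Recover approximate floats from the integer bins."""
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, zp = quantize(w)
w_hat = dequantize(q, s, zp)  # matches w to within one quantization step
```

Each reconstructed value differs from the original by at most one bin width (the scale), which is the rounding error inherent to picking the closest bin.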