Commit f738972

tharapalanivel, andrea-fasoli, BrandonGroth, chichun-charlie-liu, and kcirred committed
Initial commit for optimization techniques
Co-authored-by: Andrea Fasoli <[email protected]>
Co-authored-by: Brandon Groth <[email protected]>
Co-authored-by: Charlie Liu <[email protected]>
Co-authored-by: Derrick Liu <[email protected]>
Co-authored-by: Iqbal Saraf <[email protected]>
Co-authored-by: Martin Hickey <[email protected]>
Co-authored-by: Naigang Wang <[email protected]>
Co-authored-by: Omobayode Fagbohungbe <[email protected]>
Signed-off-by: Thara Palanivel <[email protected]>
1 parent f245f25 commit f738972

69 files changed: +31143 -32 lines changed

.isort.cfg

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@ import_heading_thirdparty=Third Party
 import_heading_firstparty=First Party
 import_heading_localfolder=Local
 known_firstparty=
-known_localfolder=fms_mo
+known_localfolder=fms_mo,tests

.pylintrc

Lines changed: 10 additions & 2 deletions
@@ -63,7 +63,13 @@ ignore-patterns=^\.#
 # (useful for modules/projects where namespaces are manipulated during runtime
 # and thus existing member attributes cannot be deduced by static analysis). It
 # supports qualified module names, as well as Unix pattern matching.
-ignored-modules=
+ignored-modules=auto_gptq,
+                exllama_kernels,
+                exllamav2_kernels,
+                llmcompressor,
+                cutlass_mm,
+                pygraphviz,
+                matplotlib

 # Python code to execute, usually for sys.path manipulation such as
 # pygtk.require().

@@ -81,7 +87,7 @@ limit-inference-results=100

 # List of plugins (as comma separated values of python module names) to load,
 # usually to register additional checkers.
-load-plugins=pylint_pytest
+load-plugins=

 # Pickle collected data for later comparisons.
 persistent=yes

@@ -435,10 +441,12 @@ disable=raw-checker-failed,
         too-many-branches,
         too-many-statements,
         too-many-positional-arguments,
+        too-many-lines,
         cyclic-import,
         too-few-public-methods,
         protected-access,
         fixme,
+        logging-fstring-interpolation,
         logging-format-interpolation,
         logging-too-many-args,
         attribute-defined-outside-init,

CODEOWNERS

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 #####################################################
 #
-# List of approvers for fms-model-optimization repository
+# List of approvers for fms-model-optimizer repository
 #
 #####################################################
 #

CONTRIBUTING.md

Lines changed: 10 additions & 10 deletions
@@ -22,7 +22,7 @@ Help on open source projects is always welcome and there is always something tha

 For any contributions that need design changes/API changes, reach out to maintainers to check if an Architectural Design Record would be beneficial. Reason for ADR: teams agree on the design, to avoid back and forth after writing code. An ADR gives context on the code being written. If requested for an ADR, make a contribution [using the template](./architecture_records/template.md).

-When contributing, it's useful to start by looking at [issues](https://github.com/foundation-model-stack/fms-model-optimization/issues). After picking up an issue, writing code, or updating a document, make a pull request and your work will be reviewed and merged. If you're adding a new feature or find a bug, it's best to [write an issue](https://github.com/foundation-model-stack/fms-model-optimization/issues/new) first to discuss it with maintainers.
+When contributing, it's useful to start by looking at [issues](https://github.com/foundation-model-stack/fms-model-optimizer/issues). After picking up an issue, writing code, or updating a document, make a pull request and your work will be reviewed and merged. If you're adding a new feature or find a bug, it's best to [write an issue](https://github.com/foundation-model-stack/fms-model-optimizer/issues/new) first to discuss it with maintainers.

 To contribute to this repo, you'll use the Fork and Pull model common in many open source repositories. For details on this process, check out [The GitHub Workflow
 Guide](https://github.com/kubernetes/community/blob/master/contributors/guide/github-workflow.md)

@@ -35,9 +35,9 @@ Before sending pull requests, make sure your changes pass formatting, linting an
 #### Dependencies
 If additional new Python module dependencies are required, think about where to put them:

-- If they're required for fms-model-optimization, then append them to the [dependencies](https://github.com/foundation-model-stack/fms-model-optimization/blob/main/pyproject.toml#L28) in the pyproject.toml.
-- If they're optional dependencies for additional functionality, then put them in the pyproject.toml file like were done for [flash-attn](https://github.com/foundation-model-stack/fms-model-optimization/blob/main/pyproject.toml#L44) or [aim](https://github.com/foundation-model-stack/fms-model-optimization/blob/main/pyproject.toml#L45).
-- If it's an additional dependency for development, then add it to the [dev](https://github.com/foundation-model-stack/fms-model-optimization/blob/main/pyproject.toml#L43) dependencies.
+- If they're required for fms-model-optimizer, then append them to the [dependencies](https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/pyproject.toml#L28) in the pyproject.toml.
+- If they're optional dependencies for additional functionality, then put them in the pyproject.toml file like were done for [flash-attn](https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/pyproject.toml#L44) or [aim](https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/pyproject.toml#L45).
+- If it's an additional dependency for development, then add it to the [dev](https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/pyproject.toml#L43) dependencies.

 #### Code Review

@@ -56,19 +56,19 @@ This section guides you through submitting a bug report. Following these guideli

 #### How Do I Submit A (Good) Bug Report?

-Bugs are tracked as [GitHub issues using the Bug Report template](https://github.com/foundation-model-stack/fms-model-optimization/issues/new?template=bug_report.md). Create an issue on that and provide the information suggested in the bug report issue template.
+Bugs are tracked as [GitHub issues using the Bug Report template](https://github.com/foundation-model-stack/fms-model-optimizer/issues/new?template=bug_report.md). Create an issue on that and provide the information suggested in the bug report issue template.

 ### Suggesting Enhancements

 This section guides you through submitting an enhancement suggestion, including completely new features, tools, and minor improvements to existing functionality. Following these guidelines helps maintainers and the community understand your suggestion ✏️ and find related suggestions 🔎

 #### How Do I Submit A (Good) Enhancement Suggestion?

-Enhancement suggestions are tracked as [GitHub issues using the Feature Request template](https://github.com/foundation-model-stack/fms-model-optimization/issues/new?template=feature_request.md). Create an issue and provide the information suggested in the feature requests or user story issue template.
+Enhancement suggestions are tracked as [GitHub issues using the Feature Request template](https://github.com/foundation-model-stack/fms-model-optimizer/issues/new?template=feature_request.md). Create an issue and provide the information suggested in the feature requests or user story issue template.

 #### How Do I Submit A (Good) Improvement Item?

-Improvements to existing functionality are tracked as [GitHub issues using the User Story template](https://github.com/foundation-model-stack/fms-model-optimization/issues/new?template=user_story.md). Create an issue and provide the information suggested in the feature requests or user story issue template.
+Improvements to existing functionality are tracked as [GitHub issues using the User Story template](https://github.com/foundation-model-stack/fms-model-optimizer/issues/new?template=user_story.md). Create an issue and provide the information suggested in the feature requests or user story issue template.

 ## Development

@@ -94,7 +94,7 @@ make test

 #### Formatting

-FMS Model Optimization follows the python [pep8](https://peps.python.org/pep-0008/) coding style. The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly.
+FMS Model Optimizer follows the python [pep8](https://peps.python.org/pep-0008/) coding style. The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly.

 We use [pre-commit](https://pre-commit.com/) to enforce coding style using [black](https://github.com/psf/black), [prettier](https://github.com/prettier/prettier) and [isort](https://pycqa.github.io/isort/).

@@ -145,8 +145,8 @@ Running the command will create a single ZIP-format archive containing the libra

 Unsure where to begin contributing? You can start by looking through these issues:

-- Issues with the [`good first issue` label](https://github.com/foundation-model-stack/fms-model-optimization/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) - these should only require a few lines of code and are good targets if you're just starting contributing.
-- Issues with the [`help wanted` label](https://github.com/foundation-model-stack/fms-model-optimization/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) - these range from simple to more complex, but are generally things we want but can't get to in a short time frame.
+- Issues with the [`good first issue` label](https://github.com/foundation-model-stack/fms-model-optimizer/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) - these should only require a few lines of code and are good targets if you're just starting contributing.
+- Issues with the [`help wanted` label](https://github.com/foundation-model-stack/fms-model-optimizer/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) - these range from simple to more complex, but are generally things we want but can't get to in a short time frame.

 <!-- ## Releasing (Maintainers only)

README.md

Lines changed: 93 additions & 1 deletion
@@ -1 +1,93 @@
-# fms-model-optimization
+# FMS Model Optimizer
+
+## Introduction
+
+FMS Model Optimizer is a framework for developing reduced precision neural network models. Quantization techniques, such as [quantization-aware-training (QAT)](https://arxiv.org/abs/2407.11062), [post-training quantization (PTQ)](https://arxiv.org/abs/2102.05426), and several other optimization techniques on popular deep learning workloads are supported.
+
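For readers new to these techniques, the common thread is that low-precision arithmetic is simulated inside an ordinary floating-point model. Below is a minimal, generic PyTorch sketch of that idea: a "fake quantizer" with a straight-through estimator, the mechanism QAT relies on. It is illustrative only and is not the fms_mo API.

```python
# Minimal sketch of simulated ("fake") quantization with a straight-through estimator.
# Generic PyTorch for illustration; not the fms_mo API.
import torch


class FakeQuantSTE(torch.autograd.Function):
    """Round values to an INT8 grid in the forward pass; pass gradients straight through."""

    @staticmethod
    def forward(ctx, x, scale):
        q = torch.clamp(torch.round(x / scale), -128, 127)  # quantize to the 8-bit integer grid
        return q * scale                                     # dequantize back to float

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                             # straight-through estimator


x = torch.randn(8, requires_grad=True)
scale = x.detach().abs().max() / 127      # simple max-abs calibration
y = FakeQuantSTE.apply(x, scale)
y.sum().backward()                        # rounding is non-differentiable, yet gradients reach x
print(y)
print(x.grad)                             # all ones, passed through by the STE
```

PTQ uses the same quantize/dequantize step, but estimates `scale` from calibration data after training instead of learning through it.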
+## Highlights
+
+- **Python API to enable model quantization:** With the addition of a few lines of code, module-level and/or function-level operation replacement will be performed.
+- **Robust:** Verified for INT 8/4/2-bit quantization on Vision/Speech/NLP/Object Detection/LLM
+- **Flexible:** This package can analyze the network using PyTorch Dynamo, apply best practices, such as clip_val initialization, layer-level precision setting, optimizer param group setting, etc. Users can also easily customize any of the settings through a JSON config file, and even bypass the Dynamo tracing if preferred.
+- **State-of-the-art INT and FP quantization techniques:** For weights and activations, such as SAWB+ and PACT+, comparable or better than other published works.
+- **Supports key compute-intensive operations:** Conv2d, Linear, LSTM, MM, BMM
+
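To make the first highlight concrete: in plain PyTorch, "module-level operation replacement" amounts to walking the model and swapping compute-heavy modules for quantizing wrappers. The sketch below does this for nn.Linear using a made-up `QuantLinear` wrapper and `swap_linears` helper; the actual fms_mo API, its JSON config handling, and the Dynamo-based analysis are not shown here.

```python
# Illustrative only: generic module swapping, not the fms_mo implementation.
import torch
from torch import nn


class QuantLinear(nn.Module):
    """Wrap an nn.Linear and round its weights to an INT8 grid at call time."""

    def __init__(self, linear: nn.Linear, n_bits: int = 8):
        super().__init__()
        self.linear = linear
        self.qmax = 2 ** (n_bits - 1) - 1   # 127 for 8 bits

    def forward(self, x):
        w = self.linear.weight
        scale = w.abs().max() / self.qmax
        w_q = torch.clamp(torch.round(w / scale), -self.qmax - 1, self.qmax) * scale
        return nn.functional.linear(x, w_q, self.linear.bias)


def swap_linears(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.Linear child with a QuantLinear wrapper."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, QuantLinear(child))
        else:
            swap_linears(child)
    return module


model = swap_linears(nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)))
print(model)                               # both Linear layers now run through the wrapper
print(model(torch.randn(2, 16)).shape)     # torch.Size([2, 4])
```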
+## Supported Models
+
+| | GPTQ | FP8 | PTQ | QAT |
+|---|------|-----|-----|-----|
+| Granite |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
+| Llama |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
+| Mixtral |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
+| BERT/Roberta |:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|
+
+**Note**: Direct QAT on LLMs is not recommended
+
+## Getting Started
+
+### Requirements
+
+1. **🐧 Linux system with Nvidia GPU (V100/A100/H100)**
+2. Python 3.10 or Python 3.11
+    📋 Python 3.12 is currently not supported due to a PyTorch Dynamo constraint
+3. CUDA >=12
+
+*Optional packages based on optimization functionalities required:*
+
+- **GPTQ** is a popular compression method for LLMs:
+  - [auto_gptq](https://pypi.org/project/auto-gptq/) or build from [source](https://github.com/AutoGPTQ/AutoGPTQ)
+- If you want to experiment with **INT8** deployment in [QAT](./examples/QAT_INT8/) and [PTQ](./examples/PTQ_INT8/) examples:
+  - Nvidia GPU with compute capability > 8.0 (A100 family or higher)
+  - [Ninja](https://ninja-build.org/)
+  - Clone the [CUTLASS](https://github.com/NVIDIA/cutlass) repository
+  - `PyTorch 2.3.1` (as newer version will cause issue for the custom CUDA kernel used in these examples)
+- **FP8** is a reduced precision format like **INT8**:
+  - Nvidia H100 family or higher
+  - [llm-compressor](https://github.com/vllm-project/llm-compressor)
+- To enable compute graph plotting function (mostly for troubleshooting purpose):
+  - [graphviz](https://graphviz.org/)
+  - [pygraphviz](https://pygraphviz.github.io/)
+
+> [!NOTE]
+> PyTorch version should be < 2.4 if you would like to experiment deployment with external INT8 kernel.
+
+### Installation
+
+We recommend using a Python virtual environment with Python 3.10+. Here is how to setup a virtual environment using [Python venv](https://docs.python.org/3/library/venv.html):
+
+```
+python3 -m venv fms_mo_venv
+source fms_mo_venv/bin/activate
+```
+
+> [!TIP]
+> If you use [pyenv](https://github.com/pyenv/pyenv), [Conda Miniforge](https://github.com/conda-forge/miniforge) or other such tools for Python version management, create the virtual environment with that tool instead of venv. Otherwise, you may have issues with installed packages not being found as they are linked to your Python version management tool and not `venv`.
+
+To install `fms_mo` package from source:
+
+```shell
+python3 -m venv fms_mo_venv
+source fms_mo_venv/bin/activate
+git clone https://github.com/foundation-model-stack/fms-model-optimizer
+cd fms-model-optimizer
+pip install -e .
+```
+
+### Try It Out!
+
+To help you get up and running as quickly as possible with the FMS Model Optimizer framework, check out the following resources which demonstrate how to use the framework with different quantization techniques:
+
+- Jupyter notebook tutorials (It is recommended to begin here):
+  - [Quantization tutorial](tutorials/quantization_tutorial.ipynb):
+    - Visualizes a random Gaussian tensor step-by-step along the quantization process
+    - Build a quantizer and quantized convolution module based on this process
+- [Python script examples](./examples/)
+
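As a quick preview of what the quantization tutorial walks through, here is a standalone sketch of quantizing a random Gaussian tensor and measuring the round-trip error. It assumes symmetric per-tensor INT8 with max-abs calibration and is not an excerpt from the notebook.

```python
# Standalone sketch of a per-tensor INT8 quantize/dequantize round trip.
# Assumes symmetric quantization with max-abs calibration; not from the tutorial notebook.
import torch

torch.manual_seed(0)
x = torch.randn(1024)                       # random Gaussian tensor

n_bits = 8
qmax = 2 ** (n_bits - 1) - 1                # 127
scale = x.abs().max() / qmax                # step size of the integer grid

x_int = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)  # integer codes in [-128, 127]
x_hat = x_int * scale                                          # dequantized approximation

print(f"scale = {scale.item():.5f}")
print(f"mean |x - x_hat| = {(x - x_hat).abs().mean().item():.6f}")
```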
+## Docs
+
+Dive into the [design document](./docs/fms_mo_design.md) to get a better understanding of the
+framework motivation and concepts.
+
+## Contributing
+
+Check out our [contributing guide](CONTRIBUTING.md) to learn how to contribute.
