Commit f738972

tharapalanivel, andrea-fasoli, BrandonGroth, chichun-charlie-liu, and kcirred committed
Initial commit for optimization techniques
Co-authored-by: Andrea Fasoli <[email protected]>
Co-authored-by: Brandon Groth <[email protected]>
Co-authored-by: Charlie Liu <[email protected]>
Co-authored-by: Derrick Liu <[email protected]>
Co-authored-by: Iqbal Saraf <[email protected]>
Co-authored-by: Martin Hickey <[email protected]>
Co-authored-by: Naigang Wang <[email protected]>
Co-authored-by: Omobayode Fagbohungbe <[email protected]>
Signed-off-by: Thara Palanivel <[email protected]>
1 parent f245f25 commit f738972

69 files changed: +31143 -32 lines changed

.isort.cfg

Lines changed: 1 addition & 1 deletion
@@ -7,4 +7,4 @@ import_heading_thirdparty=Third Party
 import_heading_firstparty=First Party
 import_heading_localfolder=Local
 known_firstparty=
-known_localfolder=fms_mo
+known_localfolder=fms_mo,tests

.pylintrc

Lines changed: 10 additions & 2 deletions
@@ -63,7 +63,13 @@ ignore-patterns=^\.#
 # (useful for modules/projects where namespaces are manipulated during runtime
 # and thus existing member attributes cannot be deduced by static analysis). It
 # supports qualified module names, as well as Unix pattern matching.
-ignored-modules=
+ignored-modules=auto_gptq,
+                exllama_kernels,
+                exllamav2_kernels,
+                llmcompressor,
+                cutlass_mm,
+                pygraphviz,
+                matplotlib

 # Python code to execute, usually for sys.path manipulation such as
 # pygtk.require().

@@ -81,7 +87,7 @@ limit-inference-results=100

 # List of plugins (as comma separated values of python module names) to load,
 # usually to register additional checkers.
-load-plugins=pylint_pytest
+load-plugins=

 # Pickle collected data for later comparisons.
 persistent=yes

@@ -435,10 +441,12 @@ disable=raw-checker-failed,
         too-many-branches,
         too-many-statements,
         too-many-positional-arguments,
+        too-many-lines,
         cyclic-import,
         too-few-public-methods,
         protected-access,
         fixme,
+        logging-fstring-interpolation,
         logging-format-interpolation,
         logging-too-many-args,
         attribute-defined-outside-init,

CODEOWNERS

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 #####################################################
 #
-# List of approvers for fms-model-optimization repository
+# List of approvers for fms-model-optimizer repository
 #
 #####################################################
 #

CONTRIBUTING.md

Lines changed: 10 additions & 10 deletions
@@ -22,7 +22,7 @@ Help on open source projects is always welcome and there is always something tha

 For any contributions that need design changes/API changes, reach out to maintainers to check if an Architectural Design Record would be beneficial. Reason for ADR: teams agree on the design, to avoid back and forth after writing code. An ADR gives context on the code being written. If requested for an ADR, make a contribution [using the template](./architecture_records/template.md).

-When contributing, it's useful to start by looking at [issues](https://github.com/foundation-model-stack/fms-model-optimization/issues). After picking up an issue, writing code, or updating a document, make a pull request and your work will be reviewed and merged. If you're adding a new feature or find a bug, it's best to [write an issue](https://github.com/foundation-model-stack/fms-model-optimization/issues/new) first to discuss it with maintainers.
+When contributing, it's useful to start by looking at [issues](https://github.com/foundation-model-stack/fms-model-optimizer/issues). After picking up an issue, writing code, or updating a document, make a pull request and your work will be reviewed and merged. If you're adding a new feature or find a bug, it's best to [write an issue](https://github.com/foundation-model-stack/fms-model-optimizer/issues/new) first to discuss it with maintainers.

 To contribute to this repo, you'll use the Fork and Pull model common in many open source repositories. For details on this process, check out [The GitHub Workflow
 Guide](https://github.com/kubernetes/community/blob/master/contributors/guide/github-workflow.md)

@@ -35,9 +35,9 @@ Before sending pull requests, make sure your changes pass formatting, linting an
 #### Dependencies
 If additional new Python module dependencies are required, think about where to put them:

-- If they're required for fms-model-optimization, then append them to the [dependencies](https://github.com/foundation-model-stack/fms-model-optimization/blob/main/pyproject.toml#L28) in the pyproject.toml.
-- If they're optional dependencies for additional functionality, then put them in the pyproject.toml file like were done for [flash-attn](https://github.com/foundation-model-stack/fms-model-optimization/blob/main/pyproject.toml#L44) or [aim](https://github.com/foundation-model-stack/fms-model-optimization/blob/main/pyproject.toml#L45).
-- If it's an additional dependency for development, then add it to the [dev](https://github.com/foundation-model-stack/fms-model-optimization/blob/main/pyproject.toml#L43) dependencies.
+- If they're required for fms-model-optimizer, then append them to the [dependencies](https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/pyproject.toml#L28) in the pyproject.toml.
+- If they're optional dependencies for additional functionality, then put them in the pyproject.toml file like were done for [flash-attn](https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/pyproject.toml#L44) or [aim](https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/pyproject.toml#L45).
+- If it's an additional dependency for development, then add it to the [dev](https://github.com/foundation-model-stack/fms-model-optimizer/blob/main/pyproject.toml#L43) dependencies.

 #### Code Review

@@ -56,19 +56,19 @@ This section guides you through submitting a bug report. Following these guideli

 #### How Do I Submit A (Good) Bug Report?

-Bugs are tracked as [GitHub issues using the Bug Report template](https://github.com/foundation-model-stack/fms-model-optimization/issues/new?template=bug_report.md). Create an issue on that and provide the information suggested in the bug report issue template.
+Bugs are tracked as [GitHub issues using the Bug Report template](https://github.com/foundation-model-stack/fms-model-optimizer/issues/new?template=bug_report.md). Create an issue on that and provide the information suggested in the bug report issue template.

 ### Suggesting Enhancements

 This section guides you through submitting an enhancement suggestion, including completely new features, tools, and minor improvements to existing functionality. Following these guidelines helps maintainers and the community understand your suggestion ✏️ and find related suggestions 🔎

 #### How Do I Submit A (Good) Enhancement Suggestion?

-Enhancement suggestions are tracked as [GitHub issues using the Feature Request template](https://github.com/foundation-model-stack/fms-model-optimization/issues/new?template=feature_request.md). Create an issue and provide the information suggested in the feature requests or user story issue template.
+Enhancement suggestions are tracked as [GitHub issues using the Feature Request template](https://github.com/foundation-model-stack/fms-model-optimizer/issues/new?template=feature_request.md). Create an issue and provide the information suggested in the feature requests or user story issue template.

 #### How Do I Submit A (Good) Improvement Item?

-Improvements to existing functionality are tracked as [GitHub issues using the User Story template](https://github.com/foundation-model-stack/fms-model-optimization/issues/new?template=user_story.md). Create an issue and provide the information suggested in the feature requests or user story issue template.
+Improvements to existing functionality are tracked as [GitHub issues using the User Story template](https://github.com/foundation-model-stack/fms-model-optimizer/issues/new?template=user_story.md). Create an issue and provide the information suggested in the feature requests or user story issue template.

 ## Development

@@ -94,7 +94,7 @@ make test

 #### Formatting

-FMS Model Optimization follows the python [pep8](https://peps.python.org/pep-0008/) coding style. The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly.
+FMS Model Optimizer follows the python [pep8](https://peps.python.org/pep-0008/) coding style. The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly.

 We use [pre-commit](https://pre-commit.com/) to enforce coding style using [black](https://github.com/psf/black), [prettier](https://github.com/prettier/prettier) and [isort](https://pycqa.github.io/isort/).

@@ -145,8 +145,8 @@ Running the command will create a single ZIP-format archive containing the libra

 Unsure where to begin contributing? You can start by looking through these issues:

-- Issues with the [`good first issue` label](https://github.com/foundation-model-stack/fms-model-optimization/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) - these should only require a few lines of code and are good targets if you're just starting contributing.
-- Issues with the [`help wanted` label](https://github.com/foundation-model-stack/fms-model-optimization/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) - these range from simple to more complex, but are generally things we want but can't get to in a short time frame.
+- Issues with the [`good first issue` label](https://github.com/foundation-model-stack/fms-model-optimizer/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22) - these should only require a few lines of code and are good targets if you're just starting contributing.
+- Issues with the [`help wanted` label](https://github.com/foundation-model-stack/fms-model-optimizer/issues?q=is%3Aissue+is%3Aopen+label%3A%22help+wanted%22) - these range from simple to more complex, but are generally things we want but can't get to in a short time frame.

 <!-- ## Releasing (Maintainers only)

README.md

Lines changed: 93 additions & 1 deletion
@@ -1 +1,93 @@
-# fms-model-optimization
+# FMS Model Optimizer
+
+## Introduction
+
+FMS Model Optimizer is a framework for developing reduced precision neural network models. Quantization techniques, such as [quantization-aware-training (QAT)](https://arxiv.org/abs/2407.11062), [post-training quantization (PTQ)](https://arxiv.org/abs/2102.05426), and several other optimization techniques on popular deep learning workloads are supported.
+
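For readers new to these techniques, the common thread is that low-precision arithmetic is simulated inside an ordinary floating-point model. Below is a minimal, generic PyTorch sketch of that idea: a "fake quantizer" with a straight-through estimator, the mechanism QAT relies on. It is illustrative only and is not the fms_mo API.

```python
# Minimal sketch of simulated ("fake") quantization with a straight-through estimator.
# Generic PyTorch for illustration; not the fms_mo API.
import torch


class FakeQuantSTE(torch.autograd.Function):
    """Round values to an INT8 grid in the forward pass; pass gradients straight through."""

    @staticmethod
    def forward(ctx, x, scale):
        q = torch.clamp(torch.round(x / scale), -128, 127)  # quantize to the 8-bit integer grid
        return q * scale                                     # dequantize back to float

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None                             # straight-through estimator


x = torch.randn(8, requires_grad=True)
scale = x.detach().abs().max() / 127      # simple max-abs calibration
y = FakeQuantSTE.apply(x, scale)
y.sum().backward()                        # rounding is non-differentiable, yet gradients reach x
print(y)
print(x.grad)                             # all ones, passed through by the STE
```

PTQ uses the same quantize/dequantize step, but estimates `scale` from calibration data after training instead of learning through it.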
+## Highlights
+
+- **Python API to enable model quantization:** With the addition of a few lines of code, module-level and/or function-level operation replacement will be performed.
+- **Robust:** Verified for INT 8/4/2-bit quantization on Vision/Speech/NLP/Object Detection/LLM
+- **Flexible:** This package can analyze the network using PyTorch Dynamo, apply best practices, such as clip_val initialization, layer-level precision setting, optimizer param group setting, etc. Users can also easily customize any of the settings through a JSON config file, and even bypass the Dynamo tracing if preferred.
+- **State-of-the-art INT and FP quantization techniques:** For weights and activations, such as SAWB+ and PACT+, comparable or better than other published works.
+- **Supports key compute-intensive operations:** Conv2d, Linear, LSTM, MM, BMM
+
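To make the first highlight concrete: in plain PyTorch, "module-level operation replacement" amounts to walking the model and swapping compute-heavy modules for quantizing wrappers. The sketch below does this for nn.Linear using a made-up `QuantLinear` wrapper and `swap_linears` helper; the actual fms_mo API, its JSON config handling, and the Dynamo-based analysis are not shown here.

```python
# Illustrative only: generic module swapping, not the fms_mo implementation.
import torch
from torch import nn


class QuantLinear(nn.Module):
    """Wrap an nn.Linear and round its weights to an INT8 grid at call time."""

    def __init__(self, linear: nn.Linear, n_bits: int = 8):
        super().__init__()
        self.linear = linear
        self.qmax = 2 ** (n_bits - 1) - 1   # 127 for 8 bits

    def forward(self, x):
        w = self.linear.weight
        scale = w.abs().max() / self.qmax
        w_q = torch.clamp(torch.round(w / scale), -self.qmax - 1, self.qmax) * scale
        return nn.functional.linear(x, w_q, self.linear.bias)


def swap_linears(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.Linear child with a QuantLinear wrapper."""
    for name, child in module.named_children():
        if isinstance(child, nn.Linear):
            setattr(module, name, QuantLinear(child))
        else:
            swap_linears(child)
    return module


model = swap_linears(nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)))
print(model)                               # both Linear layers now run through the wrapper
print(model(torch.randn(2, 16)).shape)     # torch.Size([2, 4])
```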
+## Supported Models
+
+| | GPTQ | FP8 | PTQ | QAT |
+|---|------|-----|-----|-----|
+| Granite |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
+| Llama |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
+| Mixtral |:white_check_mark:|:white_check_mark:|:white_check_mark:|:black_square_button:|
+| BERT/Roberta |:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|
+
+**Note**: Direct QAT on LLMs is not recommended
+
+## Getting Started
+
+### Requirements
+
+1. **🐧 Linux system with Nvidia GPU (V100/A100/H100)**
+2. Python 3.10 or Python 3.11
+    📋 Python 3.12 is currently not supported due to a PyTorch Dynamo constraint
+3. CUDA >=12
+
+*Optional packages based on optimization functionalities required:*
+
+- **GPTQ** is a popular compression method for LLMs:
+  - [auto_gptq](https://pypi.org/project/auto-gptq/) or build from [source](https://github.com/AutoGPTQ/AutoGPTQ)
+- If you want to experiment with **INT8** deployment in [QAT](./examples/QAT_INT8/) and [PTQ](./examples/PTQ_INT8/) examples:
+  - Nvidia GPU with compute capability > 8.0 (A100 family or higher)
+  - [Ninja](https://ninja-build.org/)
+  - Clone the [CUTLASS](https://github.com/NVIDIA/cutlass) repository
+  - `PyTorch 2.3.1` (as newer version will cause issue for the custom CUDA kernel used in these examples)
+- **FP8** is a reduced precision format like **INT8**:
+  - Nvidia H100 family or higher
+  - [llm-compressor](https://github.com/vllm-project/llm-compressor)
+- To enable compute graph plotting function (mostly for troubleshooting purpose):
+  - [graphviz](https://graphviz.org/)
+  - [pygraphviz](https://pygraphviz.github.io/)
+
+> [!NOTE]
+> PyTorch version should be < 2.4 if you would like to experiment deployment with external INT8 kernel.
+
+### Installation
+
+We recommend using a Python virtual environment with Python 3.10+. Here is how to setup a virtual environment using [Python venv](https://docs.python.org/3/library/venv.html):
+
+```
+python3 -m venv fms_mo_venv
+source fms_mo_venv/bin/activate
+```
+
+> [!TIP]
+> If you use [pyenv](https://github.com/pyenv/pyenv), [Conda Miniforge](https://github.com/conda-forge/miniforge) or other such tools for Python version management, create the virtual environment with that tool instead of venv. Otherwise, you may have issues with installed packages not being found as they are linked to your Python version management tool and not `venv`.
+
+To install `fms_mo` package from source:
+
+```shell
+python3 -m venv fms_mo_venv
+source fms_mo_venv/bin/activate
+git clone https://github.com/foundation-model-stack/fms-model-optimizer
+cd fms-model-optimizer
+pip install -e .
+```
+
+### Try It Out!
+
+To help you get up and running as quickly as possible with the FMS Model Optimizer framework, check out the following resources which demonstrate how to use the framework with different quantization techniques:
+
+- Jupyter notebook tutorials (It is recommended to begin here):
+  - [Quantization tutorial](tutorials/quantization_tutorial.ipynb):
+    - Visualizes a random Gaussian tensor step-by-step along the quantization process
+    - Build a quantizer and quantized convolution module based on this process
+- [Python script examples](./examples/)
+
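As a quick preview of what the quantization tutorial walks through, here is a standalone sketch of quantizing a random Gaussian tensor and measuring the round-trip error. It assumes symmetric per-tensor INT8 with max-abs calibration and is not an excerpt from the notebook.

```python
# Standalone sketch of a per-tensor INT8 quantize/dequantize round trip.
# Assumes symmetric quantization with max-abs calibration; not from the tutorial notebook.
import torch

torch.manual_seed(0)
x = torch.randn(1024)                       # random Gaussian tensor

n_bits = 8
qmax = 2 ** (n_bits - 1) - 1                # 127
scale = x.abs().max() / qmax                # step size of the integer grid

x_int = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)  # integer codes in [-128, 127]
x_hat = x_int * scale                                          # dequantized approximation

print(f"scale = {scale.item():.5f}")
print(f"mean |x - x_hat| = {(x - x_hat).abs().mean().item():.6f}")
```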
+## Docs
+
+Dive into the [design document](./docs/fms_mo_design.md) to get a better understanding of the
+framework motivation and concepts.
+
+## Contributing
+
+Check out our [contributing guide](CONTRIBUTING.md) to learn how to contribute.
