
Commit 81ff41f

Add spell checker
This adds the pyspelling spell check automation tool. pyspelling is a wrapper around the CLI of Aspell or Hunspell, which are spell checker tools. The PR pins pyspelling to Aspell because the underlying spell checkers can differ in output; specifying Aspell keeps the results consistent.

Closes #31

Signed-off-by: Martin Hickey <[email protected]>
1 parent e8bc88e commit 81ff41f
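To reproduce the CI check locally before pushing, a contributor can mirror the steps of the workflow added below. This is a minimal sketch, assuming a Debian/Ubuntu host and that the `spellcheck` tox environment added by this commit is present in `tox.ini`:

```bash
# Install the Aspell binary and its English dictionary (the backend pyspelling wraps)
sudo apt-get update
sudo apt-get install -y aspell aspell-en

# Install tox and run the same environment the CI job invokes
python -m pip install --upgrade tox
python -m tox -e spellcheck
```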

File tree

12 files changed: +225, -17 lines


.github/workflows/spellcheck.yml

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
+name: Spellcheck
+
+on:
+  pull_request:
+    branches:
+      - main
+      - "release-**"
+    paths:
+      - '**.md'
+      - 'tox.ini'
+      - '.spellcheck*'
+      - '.github/workflows/spellcheck.yml' # This workflow file
+
+env:
+  LC_ALL: en_US.UTF-8
+
+defaults:
+  run:
+    shell: bash
+
+permissions:
+  contents: read
+
+jobs:
+  spellcheck:
+    runs-on: ubuntu-latest
+    steps:
+      - name: "Harden Runner"
+        uses: step-security/harden-runner@0080882f6c36860b6ba35c610c98ce87d4e2f26f # v2.10.2
+        with:
+          egress-policy: audit # TODO: change to 'egress-policy: block' after a couple of runs
+
+      - name: Checkout Code
+        uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
+        with:
+          fetch-depth: 0
+
+      - name: Install aspell
+        run: |
+          sudo apt-get update
+          sudo apt-get install -y aspell aspell-en
+
+      - name: Setup Python 3.11
+        uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0
+        with:
+          python-version: 3.11
+          cache: pip
+          cache-dependency-path: |
+            **/pyproject.toml
+
+      - name: Install tox dependencies
+        run: python -m pip install --upgrade tox
+
+      - name: Run spellchecker
+        run: python -m tox -e spellcheck

.gitignore

Lines changed: 5 additions & 2 deletions
@@ -34,9 +34,12 @@ venv/
 # Build output
 /build/lib/
 
-# generated by setuptools_scm
+# Generated by setuptools_scm
 /fms_mo/_version.py
 
-#Generated by tests
+# Generated by tests
 qcfg.json
 
+# Generated by spelling check
+dictionary.dic
+

.spellcheck-en-custom.txt

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
+activations
+ADR
+Args
+AutoGPTQ
+autoregressive
+backpropagation
+bmm
+BMM
+BRECQ
+CLI
+Conda
+config
+Conv
+CUDA
+CUDAGRAPH
+dataset
+datautils
+Deployable
+dequant
+dequantize
+dequantization
+dq
+DQ
+dev
+eval
+fms
+fp
+FP
+frac
+gptq
+GPTQ
+GPTQArgs
+graphviz
+GPTQ
+hyperparameters
+Inductor
+inferenced
+inferencing
+isort
+Jupyter
+Kubernetes
+KV
+kvcache
+len
+lfloor
+llm
+LLM
+lm
+lossy
+LSTM
+matmul
+matmuls
+maxperCh
+maxpertoken
+Miniforge
+mins
+Mixtral
+MSE
+msec
+natively
+nbatch
+nbits
+NLP
+Nouterloop
+Nvidia
+Nvidia's
+orchestrator
+param
+pre
+ptq
+PTQ
+py
+pyenv
+pylint
+pygraphviz
+pyproject
+pytest
+QAT
+QAT'ed
+quant
+quantized
+quantizer
+quantizers
+quantizes
+Quantizing
+QW
+rceil
+repo
+representable
+runtime
+Runtime
+SAWB
+sexualized
+SmoothQuant
+socio
+sparsification
+SQuAD
+straightforward
+tokenization
+tokenized
+Tokenized
+tokenizer
+Tokenizer
+toml
+Unquantized
+vals
+venv
+vllm
+xs
+zp
+

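When the checker flags a legitimate project term, the term is added to this custom wordlist. A minimal sketch of that step; the word `Hunspell` is only an illustrative placeholder, and re-sorting is optional since pyspelling does not require the list to be ordered:

```bash
# Allow a new term by appending it to the custom dictionary
echo "Hunspell" >> .spellcheck-en-custom.txt

# Optionally keep the list roughly alphabetical (case-insensitive, in place)
sort -f -o .spellcheck-en-custom.txt .spellcheck-en-custom.txt
```
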
.spellcheck.yml

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+matrix:
+- name: markdown
+  aspell:
+    lang: en
+    d: en_US
+    camel-case: true
+    mode: markdown
+  sources:
+  - "**/*.md|!CODEOWNERS.md|!build/**|!.tox/**|!venv/**"
+  dictionary:
+    wordlists:
+    - .spellcheck-en-custom.txt
+  pipeline:
+  - pyspelling.filters.context:
+      context_visible_first: true
+      escapes: '\\[\\`~]'
+      delimiters:
+      # Ignore multiline content between fences (fences can have 3 or more back ticks)
+      # ```language
+      # content
+      # ```
+      - open: '(?s)^(?P<open> *`{3,}).*?$'
+        close: '^(?P=open)$'
+      # Ignore text between inline back ticks
+      - open: '(?P<open>`+)'
+        close: '(?P=open)'

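With this configuration in place, the checker can also be run directly instead of through tox. A minimal sketch, assuming `pyspelling` is installed from PyPI and the Aspell binary is already on the PATH; `--name markdown` selects the matrix task defined above:

```bash
# Install the pyspelling wrapper
python -m pip install pyspelling

# Run only the "markdown" task from .spellcheck.yml
pyspelling --config .spellcheck.yml --name markdown
```
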
docs/fms_mo_design.md

Lines changed: 3 additions & 3 deletions
@@ -37,9 +37,9 @@ The quantization process can be illustrated in the following plots:
 
 ### Quantization-aware training (QAT)
 
-In order to accommodate the quantization errors, one straightfoward technique is to take quantization/dequantization into account during the training process, hence the name quantization-aware training [(QAT)](https://arxiv.org/pdf/1712.05877), as illustrated by Step 1 of the following figure. The training optimizer will then adjust the parameters of the model, e.g. weights, accordingly so that the resulting accuracy will be comparable to the original FP32 model.
+In order to accommodate the quantization errors, one straightforward technique is to take quantization/dequantization into account during the training process, hence the name quantization-aware training [(QAT)](https://arxiv.org/pdf/1712.05877), as illustrated by Step 1 of the following figure. The training optimizer will then adjust the parameters of the model, e.g. weights, accordingly so that the resulting accuracy will be comparable to the original FP32 model.
 
-There are many other techniques, such as post-training quantization ([PTQ](https://arxiv.org/abs/2102.05426)), that can achieve similar outcome. Users will need to pick the proper method for their specific task based on model size, dataset size, resource available, and other consideraions.
+There are many other techniques, such as post-training quantization ([PTQ](https://arxiv.org/abs/2102.05426)), that can achieve similar outcome. Users will need to pick the proper method for their specific task based on model size, dataset size, resource available, and other considerations.
 
 ![Quantize and deploy](./images/layer_swapping.png)
 
@@ -91,7 +91,7 @@ For generative LLMs, very often the bottleneck of inference is no longer the com
 
 The key architectural components are:
 1. **`model_analyzer`**, which traces the model and identifies the layers/operations to be quantized or to be skipped. It will try to recognize several well-known structures and configure based on best practice. However, users could also choose to bypass the tracing and manually specify the desired configuration with full flexibility.
-2. **A set of `wrappers`**. As shown in the figure above, the preparation for QAT and deployment can be viewed as a "layer swapping" process. One could identify a desired `torch.nn.Linear` layer to be quantized, e.g. Linear1 in the plot, and replace it with a `QLinear` wrapper, which contains a set of `quantizers` that can quantize/dequantize the inputs and weights before the Linear operation. Similarly, the `QLinear` wrapper for deployment stage will quantize the inputs, perform INT matmul, then dequantize the outcome. It is mathmatically equivalanet to the wrapper used in QAT, but it can utilize the INT compute engine.
+2. **A set of `wrappers`**. As shown in the figure above, the preparation for QAT and deployment can be viewed as a "layer swapping" process. One could identify a desired `torch.nn.Linear` layer to be quantized, e.g. Linear1 in the plot, and replace it with a `QLinear` wrapper, which contains a set of `quantizers` that can quantize/dequantize the inputs and weights before the Linear operation. Similarly, the `QLinear` wrapper for deployment stage will quantize the inputs, perform INT matmul, then dequantize the outcome. It is mathematically equivalent to the wrapper used in QAT, but it can utilize the INT compute engine.
 
 
 ### Interfaces

examples/DQ_SQ/README.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ Here, we provide an example of direct quantization. In this case, we demonstrate
 ## Requirements
 - [FMS Model Optimizer requirements](../../README.md#requirements)
 
-## Quickstart
+## QuickStart
 
 **1. Prepare Data** for calibration process by converting into its tokenized form. An example of tokenization using `LLAMA-3-8B`'s tokenizer is below.
 
@@ -55,7 +55,7 @@ The perplexity of the INT8 and FP8 quantized models on the `wikitext` dataset is
 |`Llama3-8b`|INT8 |maxpertoken |maxperCh |yes |yes |6.21 |
 | |FP8 |fp8_e4m3_scale|fp8_e4m3_scale|yes |yes |6.19 |
 
-## Code Walkthrough
+## Code Walk-through
 
 **1. KV caching**
 
examples/FP8_QUANT/README.md

Lines changed: 2 additions & 2 deletions
@@ -24,7 +24,7 @@ This is an example of mature FP8, which under the hood leverages some functional
 > [!CAUTION]
 > `vllm` may require a specific PyTorch version that is different from what is installed in your current environment and it may force install without asking. Make sure it's compatible with your settings or create a new environment if needed.
 
-## Quickstart
+## QuickStart
 This end-to-end example utilizes the common set of interfaces provided by `fms_mo` for easily applying multiple quantization algorithms with FP8 being the focus of this example. The steps involved are:
 
 1. **FP8 quantization through CLI**. Other arguments could be found here [FP8Args](../../fms_mo/training_args.py#L84).
@@ -88,7 +88,7 @@ This end-to-end example utilizes the common set of interfaces provided by `fms_m
 | | |none | 5|perplexity||3.8915|± |0.3727|
 ```
 
-## Code Walkthrough
+## Code Walk-through
 
 1. The non-quantized pre-trained model is loaded using model wrapper from `llm-compressor`. The corresponding tokenizer is constructed as well.
 
examples/GPTQ/README.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ For generative LLMs, very often the bottleneck of inference is no longer the com
 ```
 
 
-## Quickstart
+## QuickStart
 This end-to-end example utilizes the common set of interfaces provided by `fms_mo` for easily applying multiple quantization algorithms with GPTQ being the focus of this example. The steps involved are:
 
 1. **Convert the dataset into its tokenized form.** An example of tokenization using `LLAMA-3-8B`'s tokenizer is below.
@@ -109,7 +109,7 @@ This end-to-end example utilizes the common set of interfaces provided by `fms_m
 > There is some randomness in generating the model and data, the resulting accuracy may vary ~$\pm$ 0.05.
 
 
-## Code Walkthrough
+## Code Walk-through
 
 1. Command line arguments will be used to create a GPTQ quantization config. Information about the required arguments and their default values can be found [here](../../fms_mo/training_args.py)
 
examples/PTQ_INT8/README.md

Lines changed: 2 additions & 2 deletions
@@ -15,7 +15,7 @@ This is an example of [block sequential PTQ](https://arxiv.org/abs/2102.05426).
 - `PyTorch 2.3.1` (as newer version will cause issue for the custom CUDA kernel)
 
 
-## Quickstart
+## QuickStart
 
 > [!NOTE]
 > This example is based on the HuggingFace [Transformers Question answering example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering). Unlike our [QAT example](../QAT_INT8/README.md), which utilizes the training loop of the original code, our PTQ function will control the loop and the program will end before entering the original loop. Make sure the model doesn't get "tuned" twice!
@@ -106,7 +106,7 @@ The table below shows results obtained for the conditions listed:
 `Nouterloop` and `ptq_nbatch` are PTQ specific hyper-parameter.
 Above experiments were run on v100 machine.
 
-## Code Walkthrough
+## Code Walk-through
 
 In this section, we will deep dive into what happens during the example steps.
 
examples/QAT_INT8/README.md

Lines changed: 3 additions & 3 deletions
@@ -23,7 +23,7 @@ In the following example, we will first create a fine-tuned FP16 model, and then
 - `PyTorch 2.3.1` (as newer version will cause issue for the custom CUDA kernel)
 
 
-## Quickstart
+## QuickStart
 
 > [!NOTE]
 > This example is based on the HuggingFace [Transformers Question answering example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/question-answering).
@@ -101,7 +101,7 @@ For comparison purposes, here are some of the results we found during testing wh
 > [!NOTE]
 > Accuracy could vary ~ +-0.2 from run to run.
 
-|model|batchsize|torch.compile|accuracy(F1)|inference speed (msec)|
+|model|batch size|torch.compile|accuracy(F1)|inference speed (msec)|
 |----|--:|---------:|----:|------------:|
 |fp16|128|eager |88.21 (as fine-tuned) |126.38|
 | |128|Inductor | |71.59|
@@ -116,7 +116,7 @@ For comparison purposes, here are some of the results we found during testing wh
 
 <sup>3</sup> `CUDAGRAPH` is the most effective way to minimize job launching overheads and can achieve ~2X end-to-end speed-up in this case. However, there seem to be bugs associated with this option at the moment. Further investigation is still on-going.
 
-## Code Walkthrough
+## Code Walk-through
 
 In this section, we will deep dive into what happens during the example steps.
 