
Commit e53751d

Merge pull request #41 from hickeyma/bug/fix-notebook
fix: Update the quantization notebook tutorial
2 parents 3eac12a + 4d0d35d commit e53751d

File tree

2 files changed: +67 −37 lines changed

tutorials/quantization_tutorial.ipynb

Lines changed: 67 additions & 37 deletions
@@ -15,17 +15,17 @@
 "id": "ac31e8e4",
 "metadata": {},
 "source": [
-"# Summary\n",
+"## Summary\n",
 "\n",
-"This notebook demonstrates the development of quantized models for 4-bit inferencing using our model optimization library `fms_mo`. `fms_mo` is a Python package for the development of reduced precision neural network models which provides state-of-the-art quantization techniques, together with automated tools to apply these techniques in Pytorch environments for Quantization-Aware-Training (QAT) of popular deep learning workloads. The resulting low-precision models can be deployed on GPUs or other accelerators.\n",
+"This notebook demonstrates the development of quantized models for 4-bit inferencing using our [model optimization library](https://pypi.org/project/fms-model-optimizer/). [FMS Model Optimizer](https://github.com/foundation-model-stack/fms-model-optimizer/) is a Python framework for the development of reduced precision neural network models which provides state-of-the-art quantization techniques, together with automated tools to apply these techniques in Pytorch environments for Quantization-Aware-Training (QAT) of popular deep learning workloads. The resulting low-precision models can be deployed on GPUs or other accelerators.\n",
 "\n",
 "We will demonstrate the following:\n",
 "- How input data can be quantized\n",
 "- How quantization can be applied to a convolution layer\n",
 "- How to automate the quantization process\n",
 "- How a quantized convolution layer performs on a typical image\n",
 "\n",
-"`fms_mo` can be applied across a variety of computer vision and natural language processing tasks to speed up inferencing, reduce power requirements, and reduce model size, while maintaining comparable model accuracy."
+"FMS Model Optimizer can be applied across a variety of computer vision and natural language processing tasks to speed up inferencing, reduce power requirements, and reduce model size, while maintaining comparable model accuracy."
 ]
 },
 {
@@ -34,10 +34,10 @@
 "id": "bedbc959",
 "metadata": {},
 "source": [
-"# Table of Contents\n",
+"## Table of Contents\n",
 "\n",
 "* <a href=\"#fms_mo_quantizer\">Step 1. Quantize a normal data distribution</a>\n",
-" * <a href=\"#fms_mo_import\">Import code libraries and data</a>\n",
+" * <a href=\"#fms_mo_import\">Import code libraries</a>\n",
 " * <a href=\"#geninput\">Generate input data</a>\n",
 " * <a href=\"#clip\">Clip input data </a>\n",
 " * <a href=\"#quant\">Scale, shift, and quantize data</a>\n",
@@ -49,7 +49,7 @@
 " * <a href=\"#3p5\">Generate weights and bias</a>\n",
 " * <a href=\"#3p6\">Quantize weights</a>\n",
 " * <a href=\"#3p7\">Feed quantized data, weights, and bias into convolution layer</a>\n",
-"* <a href=\"#fms_mo\">Step 3. Use `fms_mo` to automate quantization</a>\n",
+"* <a href=\"#fms_mo\">Step 3. Use FMS Model Optimizer to automate quantization</a>\n",
 "* <a href=\"#fms_mo_visual\">Step 4. Try a convolution layer on a quantized image</a>\n",
 "* <a href=\"#fms_mo_conclusion\">Conclusion</a>\n",
 "* <a href=\"#fms_mo_learn\">Learn more</a>"
@@ -63,7 +63,7 @@
 "source": [
 "<a id=\"`fms_mo`_quantizer\"></a>\n",
 " \n",
-"# Step 1. Quantize a normal data distribution\n",
+"## Step 1. Quantize a normal data distribution\n",
 "\n",
 "In this section we show how quantization works using a randomly generated normal distribution of input data. We will feed the input data to a quantizer and show the quantized output.\n",
 "\n",
@@ -94,7 +94,18 @@
 "metadata": {},
 "source": [
 "<a id=\"`fms_mo`_import\"></a>\n",
-"# Import code libraries and data"
+"\n",
+"### Import code libraries"
+]
+},
+{
+"cell_type": "code",
+"execution_count": null,
+"id": "2af65016",
+"metadata": {},
+"outputs": [],
+"source": [
+"! pip install fms-model-optimizer"
 ]
 },
 {
@@ -127,7 +138,8 @@
 "source": [
 "\n",
 "<a id=\"geninput\"></a>\n",
-"## Generate input data\n",
+"\n",
+"### Generate input data\n",
 "\n",
 "For simplicity, we generate a normal distribution of input data, with the mean set to 0 and standard deviation set to 1. A sample size of 1 million is chosen.\n",
 "\n",
@@ -164,7 +176,8 @@
 "metadata": {},
 "source": [
 "<a id=\"clip\"></a>\n",
-"## Clip input data\n",
+"\n",
+"### Clip input data\n",
 "\n",
 "Quantization of a tensor means that we can only use a limited number of distinct values (16 in the case of 4-bit precision) to represent all the numbers in the tensor. For 4-bit precision, we will need to:\n",
 "- determine the range we want to represent, i.e. $[ -\\infty, \\infty] => [ \\alpha_l , \\alpha_u]$, which means anything above $\\alpha_u$ or below $\\alpha_l$ will become $\\alpha_u$ and $\\alpha_l$ respectively.\n",
@@ -248,7 +261,8 @@
 "metadata": {},
 "source": [
 "<a id=\"quant\"></a>\n",
-"## Scale, shift, and quantize data\n",
+"\n",
+"### Scale, shift, and quantize data\n",
 "\n",
 "Here we choose to use 4-bit integer for this quantization, with zp = clip_min. \n",
 "\n",
@@ -306,7 +320,8 @@
 "metadata": {},
 "source": [
 "<a id=\"dequant\"></a>\n",
-"## Dequantize data\n",
+"\n",
+"### Dequantize data\n",
 "\n",
 "The last step is to dequantize the quantized data $y_{int}$ back to the range [-2.5, 2.5] so that it overlays the original distribution. <br>\n",
 "<font size=4>\n",
@@ -339,7 +354,7 @@
 "id": "e794bc37",
 "metadata": {},
 "source": [
-"# An example of symmetric vs asymmetric quantization "
+"### An example of symmetric vs asymmetric quantization "
 ]
 },
 {
@@ -425,7 +440,8 @@
 "metadata": {},
 "source": [
 "<a id=\"conv\"></a>\n",
-"# Step 2. Quantize a convolution layer\n",
+"\n",
+"## Step 2. Quantize a convolution layer\n",
 "\n",
 "In this section, we show how to manually quantize a Convolution layer, i.e. quantizing the input data and weights, and then feed them into a convolution computation. \n",
 "\n",
@@ -441,7 +457,8 @@
 "metadata": {},
 "source": [
 "<a id=\"3p2\"></a>\n",
-"## Generate input data\n",
+"\n",
+"### Generate input data\n",
 "\n",
 "Similar to Step 1, the input data is a randomly generated normal distribution. We generate 1 input sample with 3 channels, 32 pixel width, and 32 pixel height."
 ]
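A minimal sketch of the input described above, assuming PyTorch's NCHW layout; the name `x` is illustrative:

```python
import torch

# 1 sample, 3 channels, 32x32 pixels (NCHW)
x = torch.randn(1, 3, 32, 32)
```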
@@ -472,7 +489,8 @@
 "metadata": {},
 "source": [
 "<a id=\"3p3\"></a>\n",
-"## Quantize input data\n"
+"\n",
+"### Quantize input data\n"
 ]
 },
 {
@@ -501,7 +519,8 @@
 "metadata": {},
 "source": [
 "<a id=\"3p4\"></a>\n",
-"## Create a single layer convolution network"
+"\n",
+"### Create a single layer convolution network"
 ]
 },
 {
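A hedged sketch of such a network; the hunk does not show the actual cell, so the channel counts and kernel size below are assumptions:

```python
import torch.nn as nn

# The whole network is a single convolution layer
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3)
```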
@@ -534,7 +553,8 @@
 "metadata": {},
 "source": [
 "<a id=\"3p5\"></a>\n",
-"## Generate weights and bias\n",
+"\n",
+"### Generate weights and bias\n",
 "\n",
 "To simulate the quantization of a pretrained model we set the weights manually to a normal distribution of values. Bias will be set to zeros because we don't plan on using bias."
 ]
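Continuing that sketch, one way to simulate pretrained weights as described; this is illustrative, not the notebook's cell:

```python
import torch

# Normally distributed weights; bias zeroed because it will not be used
with torch.no_grad():
    conv.weight.normal_(mean=0.0, std=1.0)
    conv.bias.zero_()
```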
@@ -568,7 +588,8 @@
 "metadata": {},
 "source": [
 "<a id=\"3p6\"></a>\n",
-"## Quantize weights\n"
+"\n",
+"### Quantize weights\n"
 ]
 },
 {
@@ -596,7 +617,8 @@
 "metadata": {},
 "source": [
 "<a id=\"3p7\"></a>\n",
-"## Feed quantized data, weights, and bias into convolution layer\n"
+"\n",
+"### Feed quantized data, weights, and bias into convolution layer\n"
 ]
 },
 {
@@ -623,7 +645,8 @@
 "id": "e62182e6",
 "metadata": {},
 "source": [
-"## Now we plot four cases to determine how well quantization works with convolution:\n",
+"**Now we plot four cases to determine how well quantization works with convolution:**\n",
+"\n",
 "1. both input and weights are not quantized\n",
 "2. quantized weights with raw input\n",
 "3. raw weights with quantized input\n",
@@ -679,17 +702,18 @@
 "metadata": {},
 "source": [
 "<a id=\"fms_mo\"></a>\n",
-"# Step 3. Use `fms_mo` to automate quantization\n",
 "\n",
-"In this section we show how to reduce manual effort in the quantization process by using our model optimization package (`fms_mo`) library to automate the process.\n",
+"## Step 3. Use FMS Model Optimizer to automate quantization\n",
 "\n",
-"For simplicity we will use a 1-layer toy network as an example, but `fms_mo` can handle more complicated networks. \n",
+"In this section we show how to reduce manual effort in the quantization process by using our model optimization library to automate the process.\n",
+"\n",
+"For simplicity we will use a 1-layer toy network as an example, but FMS Model Optimizer can handle more complicated networks. \n",
 "\n",
 "As in Step 2, to simulate the quantization of a pretrained model we set the weights manually to a normal distribution of values. Bias will be set to zeros because we don't plan on using bias.\n",
 "\n",
 "We initialize the configuration (dictionary), manually modify the parameters of interest, then run a \"model prep\" to quantize the network automatically. The results will be identical to the output `y_quant` shown in Step 2.\n",
 "\n",
-"The parameters `nbits_w` and `nbits_a` will be used to control the precision of (most of) the modules identified by `fms_mo` that can be quantized."
+"The parameters `nbits_w` and `nbits_a` will be used to control the precision of (most of) the modules identified by FMS Model Optimizer that can be quantized."
 ]
 },
 {
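A hedged sketch of the flow this cell describes. `qconfig_init` and `qmodel_prep` are the entry points shown in the FMS Model Optimizer examples, but the argument order and the toy network below are assumptions, not code from the notebook:

```python
import torch
import torch.nn as nn
from fms_mo import qconfig_init, qmodel_prep  # entry points assumed from the library's examples

# 1-layer toy network, as in the notebook's description
model = nn.Sequential(nn.Conv2d(3, 64, 3))

# Initialize the config dictionary, then set the parameters of interest:
# 4-bit weights (nbits_w) and 4-bit activations (nbits_a)
qcfg = qconfig_init()
qcfg["nbits_w"] = 4
qcfg["nbits_a"] = 4

# "Model prep" swaps quantizable modules for quantized versions; an example
# input lets it trace the network (argument order is an assumption)
example_input = torch.randn(1, 3, 32, 32)
qmodel_prep(model, example_input, qcfg)
```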
@@ -766,7 +790,8 @@
 "metadata": {},
 "source": [
 "<a id=\"`fms_mo`_visual\"></a>\n",
-"# Step 4. Try a convolution layer on a quantized image\n",
+"\n",
+"## Step 4. Try a convolution layer on a quantized image\n",
 "\n",
 "In this section we pass an image of a lion through a quantizer and convolution layer to observe the performance of the quantizer with convolution."
 ]
@@ -778,7 +803,14 @@
 "metadata": {},
 "outputs": [],
 "source": [
-"img = Image.open(\"lion.png\")\n",
+"import os, wget\n",
+"IMG_FILE_NAME = 'lion.png'\n",
+"url = 'https://raw.githubusercontent.com/foundation-model-stack/fms-model-optimizer/main/tutorials/images/' + IMG_FILE_NAME\n",
+"\n",
+"if not os.path.isfile(IMG_FILE_NAME):\n",
+" wget.download(url, out=IMG_FILE_NAME)\n",
+"\n",
+"img = Image.open(IMG_FILE_NAME)\n",
 "img"
 ]
 },
@@ -866,12 +898,14 @@
 "metadata": {},
 "source": [
 "<a id=\"`fms_mo`_conclusion\"></a>\n",
-"# Conclusion\n",
+"\n",
+"## Conclusion\n",
+"\n",
 "This notebook provided the following demonstrations:\n",
 "\n",
 "- In Step 1, we showed how quantization can be applied manually to a randomly generated normal distribution of input data.\n",
 "- In Step 2, we showed how to apply quantization to a convolution layer.\n",
-"- In Step 3, we showed how to automate the quantization process using `fms_mo`.\n",
+"- In Step 3, we showed how to automate the quantization process using FMS Model Optimizer.\n",
 "- In Step 4, we observed the performance of a quantized convolution layer on an image of a lion."
 ]
 },
@@ -882,15 +916,11 @@
 "metadata": {},
 "source": [
 "<a id=\"`fms_mo`_learn\"></a>\n",
-"# Learn more \n",
-"Please see [example scripts](https://github.ibm.com/ai-chip-toolchain/sq1e/tree/fms_mo/examples/) for more practical use of `fms_mo`\n"
+"\n",
+"## Learn more \n",
+"\n",
+"Please see [example scripts](https://github.com/foundation-model-stack/fms-model-optimizer/tree/main/examples) for more practical use of FMS Model Optimizer.\n"
 ]
-},
-{
-"cell_type": "markdown",
-"id": "0a409825",
-"metadata": {},
-"source": []
 }
 ],
 "metadata": {
