|
15 | 15 | "id": "ac31e8e4", |
16 | 16 | "metadata": {}, |
17 | 17 | "source": [ |
18 | | - "# Summary\n", |
| 18 | + "## Summary\n", |
19 | 19 | "\n", |
20 | | - "This notebook demonstrates the development of quantized models for 4-bit inferencing using our model optimization library `fms_mo`. `fms_mo` is a Python package for the development of reduced precision neural network models which provides state-of-the-art quantization techniques, together with automated tools to apply these techniques in Pytorch environments for Quantization-Aware-Training (QAT) of popular deep learning workloads. The resulting low-precision models can be deployed on GPUs or other accelerators.\n", |
| 20 | + "This notebook demonstrates the development of quantized models for 4-bit inferencing using our [model optimization library](https://pypi.org/project/fms-model-optimizer/). [FMS Model Optimizer](https://github.com/foundation-model-stack/fms-model-optimizer/) is a Python framework for the development of reduced precision neural network models which provides state-of-the-art quantization techniques, together with automated tools to apply these techniques in Pytorch environments for Quantization-Aware-Training (QAT) of popular deep learning workloads. The resulting low-precision models can be deployed on GPUs or other accelerators.\n", |
21 | 21 | "\n", |
22 | 22 | "We will demonstrate the following:\n", |
23 | 23 | "- How input data can be quantized\n", |
24 | 24 | "- How quantization can be applied to a convolution layer\n", |
25 | 25 | "- How to automate the quantization process\n", |
26 | 26 | "- How a quantized convolution layer performs on a typical image\n", |
27 | 27 | "\n", |
28 | | - "`fms_mo` can be applied across a variety of computer vision and natural language processing tasks to speed up inferencing, reduce power requirements, and reduce model size, while maintaining comparable model accuracy." |
| 28 | + "FMS Model Optimizer can be applied across a variety of computer vision and natural language processing tasks to speed up inferencing, reduce power requirements, and reduce model size, while maintaining comparable model accuracy." |
29 | 29 | ] |
30 | 30 | }, |
31 | 31 | { |
|
34 | 34 | "id": "bedbc959", |
35 | 35 | "metadata": {}, |
36 | 36 | "source": [ |
37 | | - "# Table of Contents\n", |
| 37 | + "## Table of Contents\n", |
38 | 38 | "\n", |
39 | 39 | "* <a href=\"#fms_mo_quantizer\">Step 1. Quantize a normal data distribution</a>\n", |
40 | | - " * <a href=\"#fms_mo_import\">Import code libraries and data</a>\n", |
| 40 | + " * <a href=\"#fms_mo_import\">Import code libraries</a>\n", |
41 | 41 | " * <a href=\"#geninput\">Generate input data</a>\n", |
42 | 42 | " * <a href=\"#clip\">Clip input data </a>\n", |
43 | 43 | " * <a href=\"#quant\">Scale, shift, and quantize data</a>\n", |
|
49 | 49 | " * <a href=\"#3p5\">Generate weights and bias</a>\n", |
50 | 50 | " * <a href=\"#3p6\">Quantize weights</a>\n", |
51 | 51 | " * <a href=\"#3p7\">Feed quantized data, weights, and bias into convolution layer</a>\n", |
52 | | - "* <a href=\"#fms_mo\">Step 3. Use `fms_mo` to automate quantization</a>\n", |
| 52 | + "* <a href=\"#fms_mo\">Step 3. Use FMS Model Optimizer to automate quantization</a>\n", |
53 | 53 | "* <a href=\"#fms_mo_visual\">Step 4. Try a convolution layer on a quantized image</a>\n", |
54 | 54 | "* <a href=\"#fms_mo_conclusion\">Conclusion</a>\n", |
55 | 55 | "* <a href=\"#fms_mo_learn\">Learn more</a>" |
|
63 | 63 | "source": [ |
64 | 64 | "<a id=\"`fms_mo`_quantizer\"></a>\n", |
65 | 65 | " \n", |
66 | | - "# Step 1. Quantize a normal data distribution\n", |
| 66 | + "## Step 1. Quantize a normal data distribution\n", |
67 | 67 | "\n", |
68 | 68 | "In this section we show how quantization works using a randomly generated normal distribution of input data. We will feed the input data to a quantizer and show the quantized output.\n", |
69 | 69 | "\n", |
|
94 | 94 | "metadata": {}, |
95 | 95 | "source": [ |
96 | 96 | "<a id=\"`fms_mo`_import\"></a>\n", |
97 | | - "# Import code libraries and data" |
| 97 | + "\n", |
| 98 | + "### Import code libraries" |
| 99 | + ] |
| 100 | + }, |
| 101 | + { |
| 102 | + "cell_type": "code", |
| 103 | + "execution_count": null, |
| 104 | + "id": "2af65016", |
| 105 | + "metadata": {}, |
| 106 | + "outputs": [], |
| 107 | + "source": [ |
| 108 | + "! pip install fms-model-optimizer" |
98 | 109 | ] |
99 | 110 | }, |
100 | 111 | { |
|
127 | 138 | "source": [ |
128 | 139 | "\n", |
129 | 140 | "<a id=\"geninput\"></a>\n", |
130 | | - "## Generate input data\n", |
| 141 | + "\n", |
| 142 | + "### Generate input data\n", |
131 | 143 | "\n", |
132 | 144 | "For simplicity, we generate a normal distribution of input data, with the mean set to 0 and standard deviation set to 1. A sample size of 1 million is chosen.\n", |
133 | 145 | "\n", |
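A minimal sketch of what the corresponding code cell might contain (the variable name is an assumption, not necessarily the notebook's own):

```python
import torch

torch.manual_seed(0)
# 1 million samples from a standard normal distribution (mean 0, standard deviation 1)
x = torch.randn(1_000_000)
print(x.mean().item(), x.std().item())   # roughly 0 and roughly 1
```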
|
164 | 176 | "metadata": {}, |
165 | 177 | "source": [ |
166 | 178 | "<a id=\"clip\"></a>\n", |
167 | | - "## Clip input data\n", |
| 179 | + "\n", |
| 180 | + "### Clip input data\n", |
168 | 181 | "\n", |
169 | 182 | "Quantization of a tensor means that we can only use a limited number of distinct values (16 in the case of 4-bit precision) to represent all the numbers in the tensor. For 4-bit precision, we will need to:\n", |
170 | 183 | "- determine the range we want to represent, i.e. $[ -\\infty, \\infty] => [ \\alpha_l , \\alpha_u]$, which means anything above $\\alpha_u$ or below $\\alpha_l$ will become $\\alpha_u$ and $\\alpha_l$ respectively.\n", |
|
248 | 261 | "metadata": {}, |
249 | 262 | "source": [ |
250 | 263 | "<a id=\"quant\"></a>\n", |
251 | | - "## Scale, shift, and quantize data\n", |
| 264 | + "\n", |
| 265 | + "### Scale, shift, and quantize data\n", |
252 | 266 | "\n", |
253 | 267 | "Here we choose to use 4-bit integer for this quantization, with zp = clip_min. \n", |
254 | 268 | "\n", |
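Putting the clipping, scaling, and shifting described above into code, a minimal sketch could look like this; the clip range [-2.5, 2.5] is taken from the dequantization text later in this step, and the variable names are assumptions:

```python
import torch

x = torch.randn(1_000_000)

# clip: anything outside [clip_min, clip_max] saturates to the bounds
clip_min, clip_max = -2.5, 2.5
x_clip = x.clamp(clip_min, clip_max)

# scale and shift so that [clip_min, clip_max] maps onto the 16 integer levels
n_bits = 4
scale = (clip_max - clip_min) / (2 ** n_bits - 1)   # 15 steps between 16 levels
y_int = torch.round((x_clip - clip_min) / scale)    # integers in [0, 15]
print(y_int.unique())
```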
|
306 | 320 | "metadata": {}, |
307 | 321 | "source": [ |
308 | 322 | "<a id=\"dequant\"></a>\n", |
309 | | - "## Dequantize data\n", |
| 323 | + "\n", |
| 324 | + "### Dequantize data\n", |
310 | 325 | "\n", |
311 | 326 | "The last step is to dequantize the quantized data $y_{int}$ back to the range [-2.5, 2.5] so that it overlays the original distribution. <br>\n", |
312 | 327 | "<font size=4>\n", |
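Continuing that sketch, dequantization maps the integer levels back into the clipped range (again, the names are assumptions carried over from the sketch above):

```python
# map the 4-bit integers back to real values in [clip_min, clip_max]
y_dq = y_int * scale + clip_min
print(y_dq.unique())   # at most 16 distinct values between -2.5 and 2.5
```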
|
339 | 354 | "id": "e794bc37", |
340 | 355 | "metadata": {}, |
341 | 356 | "source": [ |
342 | | - "# An example of symmetric vs asymmetric quantization " |
| 357 | + "### An example of symmetric vs asymmetric quantization " |
343 | 358 | ] |
344 | 359 | }, |
345 | 360 | { |
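As a rough illustration of the difference between the two schemes (a self-contained sketch under the notebook's 4-bit setting, not fms_mo's API):

```python
import torch

def fake_quantize(x, clip_min, clip_max, n_bits=4, symmetric=False):
    """Quantize-dequantize round trip; an illustrative sketch only."""
    if symmetric:
        # symmetric: zero point is 0 and the integer levels are centred on zero
        bound = max(abs(clip_min), abs(clip_max))
        scale = bound / (2 ** (n_bits - 1) - 1)          # 7 levels on each side of 0
        q = torch.round(x.clamp(-bound, bound) / scale)
        return q * scale
    # asymmetric: a nonzero zero point lets the levels cover a skewed range exactly
    scale = (clip_max - clip_min) / (2 ** n_bits - 1)    # 15 steps across 16 levels
    q = torch.round((x.clamp(clip_min, clip_max) - clip_min) / scale)
    return q * scale + clip_min

x = torch.randn(1000)
print(fake_quantize(x, -2.5, 2.5, symmetric=True).unique().numel())   # at most 15 levels
print(fake_quantize(x, -1.0, 3.0, symmetric=False).unique().numel())  # at most 16 levels
```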
|
425 | 440 | "metadata": {}, |
426 | 441 | "source": [ |
427 | 442 | "<a id=\"conv\"></a>\n", |
428 | | - "# Step 2. Quantize a convolution layer\n", |
| 443 | + "\n", |
| 444 | + "## Step 2. Quantize a convolution layer\n", |
429 | 445 | "\n", |
430 | 446 | "In this section, we show how to manually quantize a Convolution layer, i.e. quantizing the input data and weights, and then feed them into a convolution computation. \n", |
431 | 447 | "\n", |
|
441 | 457 | "metadata": {}, |
442 | 458 | "source": [ |
443 | 459 | "<a id=\"3p2\"></a>\n", |
444 | | - "## Generate input data\n", |
| 460 | + "\n", |
| 461 | + "### Generate input data\n", |
445 | 462 | "\n", |
446 | 463 | "Similar to Step 1, the input data is a randomly generated normal distribution. We generate 1 input sample with 3 channels, 32 pixel width, and 32 pixel height." |
447 | 464 | ] |
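A minimal sketch of this cell (the variable name is an assumption):

```python
import torch

torch.manual_seed(0)
# one sample, 3 channels, 32x32 pixels: shape is (batch, channels, height, width)
x = torch.randn(1, 3, 32, 32)
```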
|
472 | 489 | "metadata": {}, |
473 | 490 | "source": [ |
474 | 491 | "<a id=\"3p3\"></a>\n", |
475 | | - "## Quantize input data\n" |
| 492 | + "\n", |
| 493 | + "### Quantize input data\n" |
476 | 494 | ] |
477 | 495 | }, |
478 | 496 | { |
|
501 | 519 | "metadata": {}, |
502 | 520 | "source": [ |
503 | 521 | "<a id=\"3p4\"></a>\n", |
504 | | - "## Create a single layer convolution network" |
| 522 | + "\n", |
| 523 | + "### Create a single layer convolution network" |
505 | 524 | ] |
506 | 525 | }, |
507 | 526 | { |
|
534 | 553 | "metadata": {}, |
535 | 554 | "source": [ |
536 | 555 | "<a id=\"3p5\"></a>\n", |
537 | | - "## Generate weights and bias\n", |
| 556 | + "\n", |
| 557 | + "### Generate weights and bias\n", |
538 | 558 | "\n", |
539 | 559 | "To simulate the quantization of a pretrained model we set the weights manually to a normal distribution of values. Bias will be set to zeros because we don't plan on using bias." |
540 | 560 | ] |
|
568 | 588 | "metadata": {}, |
569 | 589 | "source": [ |
570 | 590 | "<a id=\"3p6\"></a>\n", |
571 | | - "## Quantize weights\n" |
| 591 | + "\n", |
| 592 | + "### Quantize weights\n" |
572 | 593 | ] |
573 | 594 | }, |
574 | 595 | { |
|
596 | 617 | "metadata": {}, |
597 | 618 | "source": [ |
598 | 619 | "<a id=\"3p7\"></a>\n", |
599 | | - "## Feed quantized data, weights, and bias into convolution layer\n" |
| 620 | + "\n", |
| 621 | + "### Feed quantized data, weights, and bias into convolution layer\n" |
600 | 622 | ] |
601 | 623 | }, |
602 | 624 | { |
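Taken together, the Step 2 cells can be approximated by the following sketch: quantize the weights and the input, then feed both into the convolution. The layer sizes and variable names here are assumptions, not the notebook's exact code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def quantize_dequantize(t, clip_min, clip_max, n_bits=4):
    """Round-trip a tensor through 2**n_bits integer levels (simple asymmetric scheme)."""
    scale = (clip_max - clip_min) / (2 ** n_bits - 1)
    q = torch.round((t.clamp(clip_min, clip_max) - clip_min) / scale)
    return q * scale + clip_min

torch.manual_seed(0)
x = torch.randn(1, 3, 32, 32)                          # input sample from Step 2
conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)       # single-layer convolution network
with torch.no_grad():
    conv.weight.copy_(torch.randn_like(conv.weight))   # simulate pretrained weights
    conv.bias.zero_()                                  # bias unused, so set to zeros

x_q = quantize_dequantize(x, -2.5, 2.5)
w_q = quantize_dequantize(conv.weight.data,
                          conv.weight.min().item(),
                          conv.weight.max().item())

y_raw   = F.conv2d(x,   conv.weight, conv.bias, padding=1)   # no quantization
y_quant = F.conv2d(x_q, w_q,         conv.bias, padding=1)   # quantized input and weights
print((y_raw - y_quant).abs().mean().item())                 # error introduced by 4-bit quantization
```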
|
623 | 645 | "id": "e62182e6", |
624 | 646 | "metadata": {}, |
625 | 647 | "source": [ |
626 | | - "## Now we plot four cases to determine how well quantization works with convolution:\n", |
| 648 | + "**Now we plot four cases to determine how well quantization works with convolution:**\n", |
| 649 | + "\n", |
627 | 650 | "1. both input and weights are not quantized\n", |
628 | 651 | "2. quantized weights with raw input\n", |
629 | 652 | "3. raw weights with quantized input\n", |
|
679 | 702 | "metadata": {}, |
680 | 703 | "source": [ |
681 | 704 | "<a id=\"fms_mo\"></a>\n", |
682 | | - "# Step 3. Use `fms_mo` to automate quantization\n", |
683 | 705 | "\n", |
684 | | - "In this section we show how to reduce manual effort in the quantization process by using our model optimization package (`fms_mo`) library to automate the process.\n", |
| 706 | + "## Step 3. Use FMS Model Optimizer to automate quantization\n", |
685 | 707 | "\n", |
686 | | - "For simplicity we will use a 1-layer toy network as an example, but `fms_mo` can handle more complicated networks. \n", |
| 708 | + "In this section we show how to reduce manual effort in the quantization process by using our model optimization library to automate the process.\n", |
| 709 | + "\n", |
| 710 | + "For simplicity we will use a 1-layer toy network as an example, but FMS Model Optimizer can handle more complicated networks. \n", |
687 | 711 | "\n", |
688 | 712 | "As in Step 2, to simulate the quantization of a pretrained model we set the weights manually to a normal distribution of values. Bias will be set to zeros because we don't plan on using bias.\n", |
689 | 713 | "\n", |
690 | 714 | "We initialize the configuration (dictionary), manually modify the parameters of interest, then run a \"model prep\" to quantize the network automatically. The results will be identical to the output `y_quant` shown in Step 2.\n", |
691 | 715 | "\n", |
692 | | - "The parameters `nbits_w` and `nbits_a` will be used to control the precision of (most of) the modules identified by `fms_mo` that can be quantized." |
| 716 | + "The parameters `nbits_w` and `nbits_a` will be used to control the precision of (most of) the modules identified by FMS Model Optimizer that can be quantized." |
693 | 717 | ] |
694 | 718 | }, |
695 | 719 | { |
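The workflow described above might look roughly like the sketch below. The config keys `nbits_w` and `nbits_a` come from the text; the helper names `qconfig_init` and `qmodel_prep` and their exact signatures are assumptions to be checked against the FMS Model Optimizer documentation:

```python
import torch
import torch.nn as nn
from fms_mo import qconfig_init, qmodel_prep   # assumed entry points; check the fms_mo docs

# 1-layer toy network, as in the text (layer sizes are assumptions)
model = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False))
example_input = torch.randn(1, 3, 32, 32)

qcfg = qconfig_init()        # initialize the quantization config (a dictionary)
qcfg["nbits_w"] = 4          # 4-bit weights
qcfg["nbits_a"] = 4          # 4-bit activations

qmodel_prep(model, example_input, qcfg)   # "model prep": swap supported modules for quantized ones
print(model)                              # the Conv2d should now appear as a quantized module
```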
|
766 | 790 | "metadata": {}, |
767 | 791 | "source": [ |
768 | 792 | "<a id=\"`fms_mo`_visual\"></a>\n", |
769 | | - "# Step 4. Try a convolution layer on a quantized image\n", |
| 793 | + "\n", |
| 794 | + "## Step 4. Try a convolution layer on a quantized image\n", |
770 | 795 | "\n", |
771 | 796 | "In this section we pass an image of a lion through a quantizer and convolution layer to observe the performance of the quantizer with convolution." |
772 | 797 | ] |
|
778 | 803 | "metadata": {}, |
779 | 804 | "outputs": [], |
780 | 805 | "source": [ |
781 | | - "img = Image.open(\"lion.png\")\n", |
| 806 | + "import os, wget\n", |
| 807 | + "IMG_FILE_NAME = 'lion.png'\n", |
| 808 | + "url = 'https://raw.githubusercontent.com/foundation-model-stack/fms-model-optimizer/main/tutorials/images/' + IMG_FILE_NAME\n", |
| 809 | + "\n", |
| 810 | + "if not os.path.isfile(IMG_FILE_NAME):\n", |
| 811 | + " wget.download(url, out=IMG_FILE_NAME)\n", |
| 812 | + "\n", |
| 813 | + "img = Image.open(IMG_FILE_NAME)\n", |
782 | 814 | "img" |
783 | 815 | ] |
784 | 816 | }, |
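A minimal sketch of this step, once the image has been downloaded by the cell above; the preprocessing, clip range, and kernel shape are assumptions:

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image

img = Image.open("lion.png").convert("RGB")
# (1, 3, H, W) tensor with pixel values scaled to [0, 1]
x = torch.from_numpy(np.array(img)).permute(2, 0, 1).float().unsqueeze(0) / 255.0

# 4-bit quantize/dequantize of the pixel values (their range is already [0, 1])
scale = 1.0 / (2 ** 4 - 1)
x_q = torch.round(x / scale) * scale

# the same random convolution applied to the raw and the quantized image for comparison
torch.manual_seed(0)
weight = torch.randn(8, 3, 3, 3)
y_raw   = F.conv2d(x,   weight, padding=1)
y_quant = F.conv2d(x_q, weight, padding=1)
print((y_raw - y_quant).abs().mean().item())
```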
|
866 | 898 | "metadata": {}, |
867 | 899 | "source": [ |
868 | 900 | "<a id=\"`fms_mo`_conclusion\"></a>\n", |
869 | | - "# Conclusion\n", |
| 901 | + "\n", |
| 902 | + "## Conclusion\n", |
| 903 | + "\n", |
870 | 904 | "This notebook provided the following demonstrations:\n", |
871 | 905 | "\n", |
872 | 906 | "- In Step 1, we showed how quantization can be applied manually to a randomly generated normal distribution of input data.\n", |
873 | 907 | "- In Step 2, we showed how to apply quantization to a convolution layer.\n", |
874 | | - "- In Step 3, we showed how to automate the quantization process using `fms_mo`.\n", |
| 908 | + "- In Step 3, we showed how to automate the quantization process using FMS Model Optimizer.\n", |
875 | 909 | "- In Step 4, we observed the performance of a quantized convolution layer on an image of a lion." |
876 | 910 | ] |
877 | 911 | }, |
|
882 | 916 | "metadata": {}, |
883 | 917 | "source": [ |
884 | 918 | "<a id=\"`fms_mo`_learn\"></a>\n", |
885 | | - "# Learn more \n", |
886 | | - "Please see [example scripts](https://github.ibm.com/ai-chip-toolchain/sq1e/tree/fms_mo/examples/) for more practical use of `fms_mo`\n" |
| 919 | + "\n", |
| 920 | + "## Learn more \n", |
| 921 | + "\n", |
| 922 | + "Please see [example scripts](https://github.com/foundation-model-stack/fms-model-optimizer/tree/main/examples) for more practical use of FMS Model Optimizer.\n" |
887 | 923 | ] |
888 | | - }, |
889 | | - { |
890 | | - "cell_type": "markdown", |
891 | | - "id": "0a409825", |
892 | | - "metadata": {}, |
893 | | - "source": [] |
894 | 924 | } |
895 | 925 | ], |
896 | 926 | "metadata": { |
|