
Commit cbc536d

ezelanza authored and mvafin committed
Improve VLM quantization notebook structure with clear step-by-step organization (#1385)
- Add Step 1: Installation and Setup
- Add Step 2: Data Preparation with sample image display
- Add Step 3: Load Original Model and Test
- Add Step 4: Configure and Apply Quantization (with substeps 4a and 4b)
- Add Step 5: Compare Results (with substeps 5a and 5b)
- Add Conclusion section summarizing benefits
- Improve readability and educational flow of the notebook
1 parent 8dd9fd2 commit cbc536d

File tree

1 file changed: +153 -15 lines changed


notebooks/openvino/visual_language_quantization.ipynb

Lines changed: 153 additions & 15 deletions
@@ -12,6 +12,20 @@
     "Quantization is a technique to reduce the computational and memory costs of running inference by representing the weights and / or the activations with lower precision data types like 8-bit or 4-bit.\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "b70eeef0",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "## Step 1: Installation and Setup\n",
+    "\n",
+    "First, let's install the required dependencies."
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "e8ebc847-8181-4c8a-9236-12cb23904773",
@@ -33,6 +47,28 @@
     "#! pip install \"optimum-intel[openvino]\" datasets num2words"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "7a179812",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "## Step 2: Preparation\n",
+    "\n",
+    "Now let's load the processor and prepare our input data. We'll use a sample image of a bee on a flower and ask the model what's on the flower.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "860ff939",
+   "metadata": {},
+   "source": [
+    "![image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg)"
+   ]
+  },
   {
    "cell_type": "markdown",
    "id": "f253327b-af28-41de-b010-8edbec3c2c4a",
@@ -82,6 +118,20 @@
     "print(img_url)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0c9c5734",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "## Step 3: Load Original Model and Test\n",
+    "\n",
+    "Let's load the original FP32 model and test it with our prepared inputs to establish a baseline.\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 3,
@@ -115,6 +165,32 @@
     "print(generated_texts[0])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "1075a71e",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "## Step 4: Configure and Apply Quantization\n",
+    "\n",
+    "Now we'll configure the quantization settings and apply them to create an INT8 version of our model. We'll use weight-only quantization for size reduction with minimal accuracy loss. You can explore other quantization options [here](https://huggingface.co/docs/optimum/en/intel/openvino/optimization).\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bfd08433",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "### Step 4a: Configure Quantization Settings\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 4,
@@ -149,6 +225,18 @@
     ")\n"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "e159efa8",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "### Step 4b: Apply Quantization\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 5,
@@ -317,6 +405,32 @@
     "q_model.save_pretrained(int8_model_path)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "0558b3b8",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "## Step 5: Compare Results\n",
+    "\n",
+    "Let's test the quantized model and compare it with the original model in terms of both output quality and model size.\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a52faa10",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "### Step 5a: Test Quantized Model Output\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 6,
@@ -343,6 +457,20 @@
     "print(generated_texts[0])"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "id": "5d7778bf",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "### Step 5b: Compare Model Sizes\n",
+    "\n",
+    "Now let's compare the file sizes of the original FP32 model and the quantized INT8 model:\n"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 7,
@@ -365,32 +493,42 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
-   "id": "8fd53000-1bad-4058-83c7-252f92e6d966",
+   "execution_count": null,
+   "id": "3c862277",
    "metadata": {},
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "FP32 model size: 1028.25 MB\n",
-      "INT8 model size: 260.94 MB\n",
-      "INT8 size decrease: 3.94x\n"
-     ]
-    }
-   ],
+   "outputs": [],
    "source": [
     "fp32_model_size = get_model_size(fp32_model_path)\n",
     "int8_model_size = get_model_size(int8_model_path)\n",
     "print(f\"FP32 model size: {fp32_model_size:.2f} MB\")\n",
     "print(f\"INT8 model size: {int8_model_size:.2f} MB\")\n",
     "print(f\"INT8 size decrease: {fp32_model_size / int8_model_size:.2f}x\")"
    ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "43531db0",
+   "metadata": {
+    "vscode": {
+     "languageId": "raw"
+    }
+   },
+   "source": [
+    "## Conclusion\n",
+    "\n",
+    "Great! We've successfully quantized our VLM using Optimum Intel. The results show:\n",
+    "\n",
+    "1. **Quality**: The quantized model produces the same output as the original model\n",
+    "2. **Size**: We achieved approximately a 4x reduction in model size (from ~1 GB to ~260 MB)\n",
+    "3. **Performance**: The INT8 model is significantly smaller while maintaining accuracy\n",
+    "\n",
+    "This demonstrates how quantization can significantly reduce model size while preserving accuracy for visual language tasks.\n"
+   ]
   }
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
+   "display_name": "openvino_env",
    "language": "python",
    "name": "python3"
   },
@@ -404,7 +542,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.9.18"
+   "version": "3.12.7"
   }
  },
 "nbformat": 4,
