2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ Below are list of available recipes grouped by different criteria. Click the lin
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| [google-bert-bert-base-multilingual-cased](google-bert-bert-base-multilingual-cased/aitk) | [laion-CLIP-ViT-B-32-laion2B-s34B-b79K](laion-CLIP-ViT-B-32-laion2B-s34B-b79K/aitk) | [deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B](deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/aitk) | [meta-llama-Llama-3.2-1B-Instruct](meta-llama-Llama-3.2-1B-Instruct/NvTensorRtRtx) | [mistralai-Mistral-7B-Instruct-v0.3](mistralai-Mistral-7B-Instruct-v0.3/aitk) | [microsoft-Phi-3.5-mini-instruct](microsoft-Phi-3.5-mini-instruct/aitk) | [microsoft-Phi-3.5-mini-instruct](microsoft-Phi-3.5-mini-instruct/NvTensorRtRtx) | [Qwen-Qwen2.5-1.5B-Instruct](Qwen-Qwen2.5-1.5B-Instruct/NvTensorRtRtx) | [microsoft-resnet-50](microsoft-resnet-50/aitk) | [google-vit-base-patch16-224](google-vit-base-patch16-224/aitk) |
| [intel-bert-base-uncased-mrpc](intel-bert-base-uncased-mrpc/aitk) | [openai-clip-vit-base-patch16](openai-clip-vit-base-patch16/aitk) | | [meta-llama-Llama-3.2-1B-Instruct](meta-llama-Llama-3.2-1B-Instruct/aitk) | | [microsoft-Phi-4-mini-reasoning](microsoft-Phi-4-mini-reasoning/aitk) | | [Qwen-Qwen2.5-1.5B-Instruct](Qwen-Qwen2.5-1.5B-Instruct/aitk) | | |
| | [openai-clip-vit-base-patch32](openai-clip-vit-base-patch32/aitk) | | | | | | [deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B](deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/NvTensorRtRtx) | | |
| [BAAI/bge-small-en-v1.5](baai-bge-small-en-v1.5/aitk) | [openai-clip-vit-base-patch32](openai-clip-vit-base-patch32/aitk) | | | | | | [deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B](deepseek-ai-DeepSeek-R1-Distill-Qwen-1.5B/NvTensorRtRtx) | | |
<!-- end_arch_models -->
</details>

6 changes: 6 additions & 0 deletions baai-bge-small-en-v1.5/aitk/.gitignore
@@ -0,0 +1,6 @@
__pycache__
/cache
/history/*/*
!/history/*/history.config
!/history/*/olive_config.json
.DS_Store
165 changes: 165 additions & 0 deletions baai-bge-small-en-v1.5/aitk/README.md
@@ -0,0 +1,165 @@
# BGE-Small-EN-v1.5 Optimization

This folder contains examples of BGE-Small-EN-v1.5 optimization using different workflows for various hardware accelerators.

## Model Overview

BGE-Small-EN-v1.5 is a lightweight English text embedding model developed by BAAI (Beijing Academy of Artificial Intelligence). The model is optimized for sentence and text embedding tasks, providing high-quality vector representations for downstream applications such as semantic search, text classification, and similarity matching.
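
As a concrete illustration of the task (not part of this recipe), the sketch below computes sentence embeddings with the unoptimized HuggingFace model. BGE models use the L2-normalized `[CLS]` token embedding as the sentence vector:

```python
# Illustrative only: baseline embeddings from the unoptimized model.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-small-en-v1.5")
model = AutoModel.from_pretrained("BAAI/bge-small-en-v1.5")
model.eval()

sentences = ["How do I reset my card PIN?", "I need to change the PIN on my card."]
inputs = tokenizer(sentences, padding="max_length", truncation=True,
                   max_length=128, return_tensors="pt")
with torch.no_grad():
    last_hidden = model(**inputs).last_hidden_state
# BGE takes the [CLS] token (position 0), L2-normalized, as the sentence embedding.
embeddings = torch.nn.functional.normalize(last_hidden[:, 0], p=2, dim=1)
print(embeddings[0] @ embeddings[1])  # cosine similarity, since both are unit vectors
```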

## Optimization Workflows

This directory provides three different optimization workflows targeting specific hardware accelerators:

- **QDQ for Qualcomm NPU**: Post-training QDQ (Quantize/DeQuantize) quantization for Qualcomm Neural Processing Units
- **QDQ for AMD NPU**: Post-training QDQ quantization for AMD Neural Processing Units
- **OpenVINO for Intel NPU**: OpenVINO optimization for Intel Neural Processing Units

## Workflow Details

### QDQ for Qualcomm NPU

This workflow performs post-training QDQ quantization for Qualcomm NPU acceleration. It follows the optimization pipeline below; a standalone conversion sketch appears after the feature list.

- *HuggingFace Model → ONNX Model → Quantized ONNX Model*

**Configuration File**: `bge-small-en-v1.5_qdq_qnn.json`

**Key Features**:
- Uses QNN (Qualcomm Neural Network) execution provider
- Applies static post-training quantization calibrated on sample data
- Optimized for Qualcomm NPU hardware architecture
- Supports both activation and weight quantization
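
The first pipeline step (HuggingFace → ONNX) is handled by Olive's conversion pass; for orientation only, a roughly equivalent standalone export with Optimum (which the requirements list includes) might look like this — the output directory is hypothetical:

```python
# Illustrative HuggingFace -> ONNX export; Olive performs this step internally.
from optimum.onnxruntime import ORTModelForFeatureExtraction

ort_model = ORTModelForFeatureExtraction.from_pretrained(
    "BAAI/bge-small-en-v1.5",
    export=True,  # convert the PyTorch checkpoint to ONNX on load
)
ort_model.save_pretrained("bge-small-en-v1.5-onnx")  # hypothetical output dir
```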

### QDQ for AMD NPU

This workflow performs post-training QDQ quantization for AMD NPU acceleration. It follows the optimization pipeline:

- *HuggingFace Model → ONNX Model → Quantized ONNX Model*

**Configuration File**: `bge-small-en-v1.5_qdq_amd.json`

**Key Features**:
- Optimized for the AMD NPU hardware architecture
- Applies static post-training quantization calibrated on sample data
- Supports both activation and weight quantization

### OpenVINO for Intel NPU

This workflow performs OpenVINO optimization for Intel NPU acceleration. It follows the optimization pipeline:

- *HuggingFace Model → OpenVINO IR Model*

**Configuration File**: `bge-small-en-v1.5_context_ov_static.json`

**Key Features**:
- Uses OpenVINO execution provider for Intel NPU
- Implements static quantization for optimal performance
- Custom user script for specialized data processing
- Enhanced accuracy evaluation using MTEB benchmarks
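
For orientation, loading an OpenVINO IR on an Intel NPU with the `openvino` runtime looks roughly like the sketch below (paths are hypothetical; this recipe additionally wraps the IR via the encapsulation pass so it can run through ONNX Runtime):

```python
# Illustrative only: run an OpenVINO IR directly on the NPU device.
import numpy as np
import openvino as ov

core = ov.Core()
compiled = core.compile_model("openvino_model.xml", device_name="NPU")
outputs = compiled({
    "input_ids": np.zeros((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
    "token_type_ids": np.zeros((1, 128), dtype=np.int64),
})
```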

## Dataset Information

### Quantization Datasets
- **QNN/AMD NPU**: Uses MTEB Banking77 test split for quantization calibration
- **Intel NPU**: Uses Wikipedia train split (300 samples) with custom preprocessing
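
The actual `user_script.py` ships with the recipe; purely as a hypothetical sketch, a custom loader like the `bge_small_en_dataset` type referenced in the OpenVINO config could be registered through Olive's dataset registry roughly as follows (the function body and signature are assumptions):

```python
# Hypothetical sketch; the recipe's real user_script.py may differ.
from datasets import load_dataset
from olive.data.registry import Registry

@Registry.register_dataset()
def bge_small_en_dataset(data_name: str, split: str, max_samples: int, **kwargs):
    # Take the first `max_samples` rows of the requested split for calibration.
    return load_dataset(data_name, split=f"{split}[:{max_samples}]")
```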

### Evaluation Datasets
- **Primary**: MTEB Banking77 classification task
- **Evaluation Metric**: Custom embedding accuracy for semantic similarity
- **Benchmark**: MTEB (Massive Text Embedding Benchmark) for standardized evaluation
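
For a standardized baseline outside of Olive, the MTEB harness can score the original model on Banking77; a sketch assuming the classic `mteb` API and `sentence-transformers` (not in the requirements list below):

```python
# Illustrative baseline: score the unoptimized model on MTEB Banking77.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")
evaluation = MTEB(tasks=["Banking77Classification"])
results = evaluation.run(model, output_folder="mteb_results")
```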

## Performance Evaluation Results

The following results were measured with standard embedding benchmarks and the performance settings in each configuration. All evaluations use the MTEB Banking77 dataset for consistency.

### Qualcomm NPU (QNN) Performance

| Metric | Value |
|--------|-------|
| **Accuracy** | 85.57% |
| **Latency (avg)** | 14.83 ms |
| **Latency (min)** | 13.66 ms |
| **Latency (max)** | 17.92 ms |
| **Latency (p90)** | 15.52 ms |
| **Throughput (avg)** | 70.97 inferences/sec |
| **Throughput (max)** | 72.83 inferences/sec |
| **Throughput (min)** | 68.47 inferences/sec |

### AMD NPU Performance

| Metric | Value |
|--------|-------|
| **Accuracy** | 83.66% |
| **Latency (avg)** | 8.58 ms |
| **Latency (min)** | 7.54 ms |
| **Latency (max)** | 9.43 ms |
| **Latency (p90)** | 9.13 ms |
| **Throughput (avg)** | 107.26 inferences/sec |
| **Throughput (max)** | 130.15 inferences/sec |
| **Throughput (min)** | 88.90 inferences/sec |

### Intel NPU Performance

| Metric | Value |
|--------|-------|
| **Accuracy** | 85.42% |
| **Latency (avg)** | 3.33 ms |
| **Latency (min)** | 2.30 ms |
| **Latency (max)** | 6.39 ms |
| **Latency (p90)** | 4.01 ms |
| **Throughput (avg)** | 312.15 inferences/sec |
| **Throughput (max)** | 421.12 inferences/sec |
| **Throughput (min)** | 199.13 inferences/sec |

## Optimization Techniques

### Quantization Strategies
- **QDQ Quantization**: Static post-training quantization used for the QNN and AMD NPU workflows
- **Static Quantization**: Used for the Intel NPU workflow with OpenVINO
- **Mixed Precision**: Activation and weight quantization types can be set independently in each workflow
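
Olive drives quantization through the pass settings in each config; conceptually, the QDQ workflows correspond to ONNX Runtime static quantization in QDQ format, sketched standalone below (file paths, quantization types, and the calibration reader are illustrative assumptions):

```python
# Conceptual sketch of static QDQ quantization with ONNX Runtime;
# the recipes configure this through Olive passes instead.
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class EmbeddingCalibrationReader(CalibrationDataReader):
    """Feeds pre-tokenized calibration batches to the quantizer."""
    def __init__(self, batches):
        self._iter = iter(batches)
    def get_next(self):
        return next(self._iter, None)  # None signals the end of calibration

# A single dummy batch; real calibration would tokenize e.g. MTEB Banking77.
batches = [{
    "input_ids": np.zeros((1, 128), dtype=np.int64),
    "attention_mask": np.ones((1, 128), dtype=np.int64),
    "token_type_ids": np.zeros((1, 128), dtype=np.int64),
}]

quantize_static(
    model_input="bge-small-en-v1.5.onnx",       # hypothetical path
    model_output="bge-small-en-v1.5.qdq.onnx",  # hypothetical path
    calibration_data_reader=EmbeddingCalibrationReader(batches),
    quant_format=QuantFormat.QDQ,  # insert QuantizeLinear/DequantizeLinear pairs
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QInt8,
)
```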

### Model Optimization Features
- **Input Optimization**: Fixed input shapes for better inference performance
- **Memory Optimization**: Efficient memory usage through quantization
- **Hardware-Specific Tuning**: Custom optimizations for each NPU architecture
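
Fixing the input shapes can also be done directly with ONNX Runtime's model utilities; a sketch under the assumption that `onnxruntime.tools.onnx_model_utils` is available (paths hypothetical):

```python
# Sketch: pin the dynamic sequence dimensions to the fixed (1, 128) shape
# these recipes use, mirroring what the Olive passes configure.
import onnx
from onnxruntime.tools.onnx_model_utils import make_input_shape_fixed

model = onnx.load("bge-small-en-v1.5.onnx")  # hypothetical path
for name in ("input_ids", "attention_mask", "token_type_ids"):
    make_input_shape_fixed(model.graph, name, [1, 128])
onnx.save(model, "bge-small-en-v1.5.fixed.onnx")
```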

## Requirements

The following dependencies are required for running the optimization workflows:

```
olive-ai
datasets
optimum
mteb
polars-lts-cpu (QNN only)
```

## Usage

1. **Select Workflow**: Choose the appropriate configuration file based on your target hardware:
- For Qualcomm NPU: `bge-small-en-v1.5_qdq_qnn.json`
- For AMD NPU: `bge-small-en-v1.5_qdq_amd.json`
- For Intel NPU: `bge-small-en-v1.5_context_ov_static.json`

2. **Configure Parameters**: Adjust quantization parameters such as activation type, weight type, and quantization dataset according to your specific requirements.

3. **Run Optimization**: Execute the optimization pipeline using the selected configuration (see the sketch after these steps).

4. **Evaluate Results**: Use the provided evaluation scripts to assess model performance on your target hardware.
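
Assuming the `olive-ai` package is installed, a workflow can be launched from Python (or with the equivalent `olive run` CLI); a minimal sketch:

```python
# Minimal sketch: run an Olive workflow from Python.
# Roughly equivalent to: olive run --config <config file>
from olive.workflows import run as olive_run

olive_run("bge-small-en-v1.5_qdq_qnn.json")  # pick the config for your target NPU
```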

## Performance Notes

- **Accuracy**: Measured with a custom embedding-accuracy metric on the MTEB Banking77 task
- **Latency**: Measured in milliseconds per inference
- **Throughput**: Measured in inferences per second

## Model Information

- **Model ID**: `BAAI/bge-small-en-v1.5`
- **Model Type**: Text Embedding Model
- **Framework**: HuggingFace Transformers
- **Optimization Target**: Hardware-specific acceleration for embedding generation

*Note: Performance metrics may vary depending on hardware specifications, system environment, and workload characteristics. The values provided here are for reference and may not reflect performance on all devices or configurations.*
203 changes: 203 additions & 0 deletions baai-bge-small-en-v1.5/aitk/bge-small-en-v1.5_context_ov_static.json
@@ -0,0 +1,203 @@
{
"input_model": {
"type": "HfModel",
"model_path": "BAAI/bge-small-en-v1.5",
"task": "feature-extraction",
"io_config": {
"input_names": [
"input_ids",
"attention_mask",
"token_type_ids"
],
"input_shapes": [
[
1,
128
],
[
1,
128
],
[
1,
128
]
],
"input_types": [
"int64",
"int64",
"int64"
],
"output_names": [
"last_hidden_state",
"state"
]
}
},
"systems": {
"local_system": {
"type": "LocalSystem",
"accelerators": [
{
"device": "npu",
"execution_providers": [
"OpenVINOExecutionProvider"
]
}
]
}
},
"data_configs": [
{
"name": "quantize_data_config",
"user_script": "user_script.py",
"load_dataset_config": {
"type": "bge_small_en_dataset",
"data_name": "wikipedia",
"split": "train",
"max_samples": 300
},
"dataloader_config": {
"batch_size": 1,
"drop_last": true
}
},
{
"name": "accuracy_data_config",
"type": "HuggingfaceContainer",
"load_dataset_config": {
"data_name": "mteb/banking77",
"split": "test"
},
"pre_process_data_config": {
"max_length": 128,
"padding": "max_length",
"input_cols": ["text"]
},
"dataloader_config": {
"batch_size": 1
}
},
{
"name": "evaluation_data_config",
"type": "HuggingfaceContainer",
"load_dataset_config": {
"data_name": "mteb/banking77",
"split": "test"
},
"pre_process_data_config": {
"max_length": 128,
"padding": "max_length",
"input_cols": ["text"]
},
"dataloader_config": {
"batch_size": 1
}
}
],
"evaluators": {
"common_evaluator": {
"metrics": [
{
"name": "accuracy",
"type": "custom",
"sub_types": [
{
"name": "embedding_accuracy",
"priority": 1,
"higher_is_better": true,
"goal": { "type": "max-degradation", "value": 0.05 }
}
],
"user_config": {
"user_script": "user_script.py",
"evaluate_func": "eval_accuracy"
}
},
{
"name": "latency",
"type": "latency",
"data_config": "evaluation_data_config",
"sub_types": [
{ "name": "avg", "priority": 2, "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p50", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p75", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p90", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p95", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p99", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "min", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "max", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } }
]
},
{
"name": "throughput",
"type": "throughput",
"data_config": "evaluation_data_config",
"sub_types": [
{ "name": "avg", "priority": 3, "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p50", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p75", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p90", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p95", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "p99", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "min", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } },
{ "name": "max", "metric_config": { "warmup_num": 20, "repeat_test_num": 100 } }
]
}
]
}
},
"passes": {
"optimum_convert": {
"type": "OpenVINOOptimumConversion",
"extra_args": {
"device": "npu",
"task": "feature-extraction"
}
},
"io_update": {
"type": "OpenVINOIoUpdate",
"input_shapes": [
[
1,
128
],
[
1,
128
],
[
1,
128
]
],
"static": true
},
"ov_quantize": {
"type": "OpenVINOQuantization",
"target_device": "npu",
"data_config": "quantize_data_config",
"model_type": "TRANSFORMER",
"user_script": "user_script.py",
"transform_fn": "custom_transform_func",
"extra_configs": [
{
"advanced_quantization_parameters": {
"smooth_quant_alpha": 0.6
}
}
]
},
"encapsulation": {
"type": "OpenVINOEncapsulation",
"target_device": "npu",
"ov_version": "2025.1"
}
},
"cache_dir": "cache",
"evaluate_input_model": false,
"evaluator": "common_evaluator",
"host": "local_system",
"output_dir": "models/bge-small-en-v1.5/openvino",
"target": "local_system"
}