feat: Add shopify dataset for q3vl inference (#152)
* Initial commit for adding shopify dataset to predefined
* temporary test script
* Tested shopify dataset
* pre-commit formatting
* Remove unused counter and modified default image format
* Apply the suggestion to use df.to_dict instead of iterrows
* Unused import
* Update folder and class names
* Update folder and class names
* Offline perf yaml verified
* Rename
* Accuracy results tested
* Add the updated Readme
* Add unit tests for new preset dataset and scorer
* precommit formatting
* Potential fix for pull request finding 'Unused global variable'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* Config and readme related updates
* Refactor metadata related schema
* Load_from_huggingface returns a HF dataset instead of pd frame.
* Follow up fixes for updating load_from_huggingface
* Adding Pillow to dependency for dataset decoding
* Add logging for worker settings and also make worker init timeout configurable
* Load from disk directly loads what's inside, no split kwarg supported
* No default cache dir needed.
* Make zmq_rev/send buffer size configurable
* Finalize the yaml file
* Add new yaml config args to unit test
* Fix typing
* precommit update
* Put metadata to a separate file for better formatting
* format fix
* ruff fix
* remove redundant type checks as hf dataset sample is fetched
* update pytest as PIL image input is assumed
* Potential fix for pull request finding 'Unused global variable'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* Potential fix for pull request finding 'An assert statement has a side-effect'
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
* Revert "Adding Pillow to dependency for dataset decoding"
This reverts commit 812fad1.
* rename output to response to align with the function name
* Add example calculation in docstring and update the naming
* Revert the changes to load_from_huggingface
* use hf original load_datasets
* Remove uv file
* Fix Pillow version
---------
Co-authored-by: Zhihan Jiang <68881590+nvzhihanj@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
# Running Endpoints with Qwen3-VL-235B-A22B on Shopify Product Catalogue

This document describes how to run MLPerf Q3VL benchmarking against the inference endpoints, using the [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct) model and [Shopify's Product Catalogue dataset](https://huggingface.co/datasets/Shopify/product-catalogue) for multimodal product taxonomy classification.

## Get Dataset

The Shopify Product Catalogue dataset is loaded from Hugging Face and is generated automatically on first run. Images are converted to base64 for storage.

```
# Dataset is auto-downloaded from https://huggingface.co/datasets/Shopify/product-catalogue
# No manual download required - DataLoaderFactory handles it

export HF_TOKEN=<your Hugging Face token>  # Optional for public model; may help with rate limits
hf download $MODEL_NAME
```
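The dataset description above mentions that images are converted to base64 for storage. As a rough illustration only, the sketch below encodes raw image bytes with the Python standard library. The helper names `encode_image_bytes` and `decode_image_bytes` and the data-URL layout are assumptions made for this example; they are not taken from the repository.

```python
import base64


def encode_image_bytes(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Return a base64 data URL for raw image bytes (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def decode_image_bytes(data_url: str) -> bytes:
    """Recover the raw bytes from a data URL produced by encode_image_bytes."""
    _, _, encoded = data_url.partition(";base64,")
    return base64.b64decode(encoded)


# The eight-byte PNG file signature, used here as stand-in image data.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
data_url = encode_image_bytes(PNG_SIGNATURE)
```

If the sample is a Pillow image rather than raw bytes, one option is to save it into an `io.BytesIO` buffer first and pass the buffer contents to the encoder.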

The model is available at [Qwen3-VL-235B-A22B-Instruct](https://huggingface.co/Qwen/Qwen3-VL-235B-A22B-Instruct); no access request is required.

**Note:** The Shopify Product Catalogue includes the `ground_truth_category`, `ground_truth_brand`, and `ground_truth_is_secondhand` fields from the Hugging Face dataset. For accuracy evaluation, use the `shopify_category_f1` scorer, which computes hierarchical F1 over the category taxonomy and matches the [MLCommons Q3VL evaluation](https://github.com/mlcommons/inference/blob/master/multimodal/qwen3-vl/src/mlperf_inf_mm_q3vl/evaluation.py).
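To make "hierarchical F1" concrete, the sketch below shows one common definition: each category path is expanded into its set of ancestor prefixes, and precision and recall are micro-averaged over the overlap of those sets. This is an illustration under assumptions, not the `shopify_category_f1` implementation. The `>` separator, the function names, and the example category names are hypothetical, and the MLCommons reference may differ in detail.

```python
from __future__ import annotations


def category_prefixes(category: str, separator: str = ">") -> set[tuple[str, ...]]:
    """Expand 'A > B > C' into {('A',), ('A', 'B'), ('A', 'B', 'C')}."""
    parts = [part.strip() for part in category.split(separator) if part.strip()]
    return {tuple(parts[: depth + 1]) for depth in range(len(parts))}


def hierarchical_f1(predictions: list[str], ground_truths: list[str]) -> float:
    """Micro-averaged hierarchical F1 over paired category paths."""
    if len(predictions) != len(ground_truths):
        raise ValueError("predictions and ground_truths must have equal length")
    overlap = predicted_total = truth_total = 0
    for predicted, truth in zip(predictions, ground_truths):
        predicted_nodes = category_prefixes(predicted)
        truth_nodes = category_prefixes(truth)
        overlap += len(predicted_nodes & truth_nodes)
        predicted_total += len(predicted_nodes)
        truth_total += len(truth_nodes)
    if overlap == 0:
        # No shared ancestors at all (this also covers empty predictions).
        return 0.0
    precision = overlap / predicted_total
    recall = overlap / truth_total
    return 2 * precision * recall / (precision + recall)


# Two of three levels agree, so precision = recall = 2/3 and F1 = 2/3.
score = hierarchical_f1(
    ["Apparel > Shirts > T-Shirts"],
    ["Apparel > Shirts > Polos"],
)
```

Under this definition a prediction that gets the upper levels right earns partial credit, which a flat exact-match metric would not give.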

To add accuracy evaluation, include an accuracy dataset alongside the performance dataset:

```yaml
datasets:
  - name: shopify_product_catalogue::q3vl
    type: "performance"
    force: true
  - name: shopify_product_catalogue::q3vl
    type: "accuracy"
    force: true
    accuracy_config:
      eval_method: "shopify_category_f1"
      ground_truth: "ground_truth_category"
      extractor: "identity_extractor"  # Required by benchmark; scorer parses JSON internally
      num_repeats: 1
```
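The comment on `extractor` suggests a division of labour: the extractor passes the model response through unchanged, and the scorer parses the JSON itself. The sketch below illustrates that flow under assumptions. The function bodies and the `category` and `brand` response fields are hypothetical and chosen for illustration; they are not taken from the repository.

```python
import json


def identity_extractor(response: str) -> str:
    """Return the raw model response unchanged (hypothetical stand-in)."""
    return response


def parse_predicted_category(response: str) -> str:
    """Parse a JSON response and return its category, or '' if unusable."""
    try:
        payload = json.loads(response)
    except json.JSONDecodeError:
        return ""
    if not isinstance(payload, dict):
        return ""
    category = payload.get("category", "")
    return category if isinstance(category, str) else ""


# Made-up response text; the field names are assumptions for this example.
raw_response = '{"category": "Apparel > Shirts", "brand": "ExampleBrand"}'
predicted = parse_predicted_category(identity_extractor(raw_response))
```

Returning an empty string for malformed output lets a scorer count the sample as a miss instead of failing the whole run, which is one possible design choice for handling unparseable responses.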
## Benchmark Qwen3-VL-235B-A22B using a config file