fix error test

yangw-dev · yangw-dev · commit 99df0fe0da4e · 2025-06-18T18:38:36.000-07:00
Signed-off-by: Yang Wang <elainewy@meta.com>
diff --git a/.ci/scripts/benchmark_tooling/README.md b/.ci/scripts/benchmark_tooling/README.md
@@ -1,20 +1,48 @@
-# Benchmark Tooling
+# Executorch Benchmark Tooling
 
-A library providing tools for fetching, processing, and analyzing ExecutorchBenchmark data from the HUD Open API.
+A library providing tools for fetching, processing, and analyzing ExecutorchBenchmark data from the HUD Open API. This tooling helps compare performance metrics between private and public devices with identical settings.
+
+## Table of Contents
+
+- [Overview](#overview)
+- [Installation](#installation)
+- [Tools](#tools)
+  - [get_benchmark_analysis_data.py](#get_benchmark_analysis_datapy)
+    - [Quick Start](#quick-start)
+    - [Command Line Options](#command-line-options)
+    - [Example Usage](#example-usage)
+    - [Working with Output Files](#working-with-output-files-csv-and-excel)
+    - [Python API Usage](#python-api-usage)
+- [Running Unit Tests](#running-unit-tests)
+
+## Overview
+
+The Executorch Benchmark Tooling provides a suite of utilities designed to:
+
+- Fetch benchmark data from HUD Open API for specified time ranges
+- Clean and process data by filtering out failures
+- Compare metrics between private and public devices with matching configurations
+- Generate analysis reports in various formats (CSV, Excel, JSON)
+- Support filtering by device pools, backends, and models
+
+This tooling is particularly useful for performance analysis, regression testing, and cross-device comparisons.
 
 ## Installation
 
 Install dependencies:
+
 ```bash
 pip install -r requirements.txt
 ```
 
 ## Tools
 
 ### get_benchmark_analysis_data.py
-This script mainlu used to generate analysis data between private device and public device with same settings.
 
-It fetches benchmark data from HUD Open API for a time range, then cleans the data with FAILURE inidcator, and retrieves all private device metrics and equivalent public device  metrics based on [model, backend, device_pool_names, arch]. User can filter the data by specifying private device_pool_names, backends, and models for private devices.
+This script is mainly used to generate analysis data comparing private devices with public devices using the same settings.
+
+It fetches benchmark data from HUD Open API for a specified time range, cleans the data by removing entries with FAILURE indicators, and retrieves all private device metrics along with equivalent public device metrics based on matching [model, backend, device_pool_names, arch] configurations. Users can filter the data by specifying private device_pool_names, backends, and models.
+
 #### Quick Start
 
 ```bash
@@ -37,38 +65,42 @@ python3 .ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py \
   - `print`: Display results in console
   - `json`: Generate JSON file
   - `df`: Display results in DataFrame format: `{'private': List[{'groupInfo':Dict,'df': DF},...],'public':List[{'groupInfo':Dict,'df': DF}]`
-  - `excel`: Generate Excel files with multiple sheets, the field in first row and first column contains the json string of the raw metadata
-  - `csv`: Generate CSV files in separate folders, the field in first row and first column contains the json string of the raw metadata
+  - `excel`: Generate Excel files with multiple sheets; the cell in the first row and first column contains the JSON string of the raw metadata
+  - `csv`: Generate CSV files in separate folders; the cell in the first row and first column contains the JSON string of the raw metadata
 - `--outputDir`: Directory to save output files (default: current directory)
 
 ##### Filtering Options:
 
 - `--private-device-pools`: Filter by private device pool names (e.g., "samsung-galaxy-s22-5g", "samsung-galaxy-s22plus-5g")
-- `--backends`: Filter by specific backend names (e.g.,  "qnn-q8" , ""llama3-spinquan)
-- `--models`: Filter by specific model names (e.g "mv3" "meta-llama-llama-3.2-1b-instruct-qlora-int4-eo8")
+- `--backends`: Filter by specific backend names (e.g., "qnn-q8", "llama3-spinquan")
+- `--models`: Filter by specific model names (e.g., "mv3", "meta-llama-llama-3.2-1b-instruct-qlora-int4-eo8")
 
 #### Example Usage
-call multiple private device pools and models:
-this fetches all the private table data that has model `llama-3.2-1B` and `mv3`
+
+Filter by multiple private device pools and models:
 ```bash
+# This fetches all private table data for models 'llama-3.2-1B' and 'mv3'
 python3 get_benchmark_analysis_data.py \
---startTime "2025-06-01T00:00:00" \
---endTime "2025-06-11T00:00:00" \
---private-device-pools 'apple_iphone_15_private' 'samsung_s22_private' \
---models 'meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8' 'mv3'
+  --startTime "2025-06-01T00:00:00" \
+  --endTime "2025-06-11T00:00:00" \
+  --private-device-pools 'apple_iphone_15_private' 'samsung_s22_private' \
+  --models 'meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8' 'mv3'
 ```
 
-this fetches all the private iphone table data that has model `llama-3.2-1B` and `mv3`, and associated public iphone
+Filter by specific device pool and models:
 ```bash
+# This fetches all private iPhone table data for models 'llama-3.2-1B' and 'mv3',
+# and associated public iPhone data
 python3 get_benchmark_analysis_data.py \
---startTime "2025-06-01T00:00:00" \
---endTime "2025-06-11T00:00:00" \
---private-device-pools 'apple_iphone_15_private' \
---models 'meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8' 'mv3'
+  --startTime "2025-06-01T00:00:00" \
+  --endTime "2025-06-11T00:00:00" \
+  --private-device-pools 'apple_iphone_15_private' \
+  --models 'meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8' 'mv3'
 ```
+
 #### Working with Output Files CSV and Excel
 
-You can use methods in `common.py` to convert the file data back to DataFrame format, those methods read the first row in csv/excel file, and return result with format list of {"groupInfo":DICT, "df":df.Dataframe{}} format.
+You can use methods in `common.py` to convert the file data back to DataFrame format. These methods read the first row of a CSV/Excel file and return a list of entries of the form `{"groupInfo": dict, "df": DataFrame}`.
 
 ```python
 import logging
@@ -126,7 +158,7 @@ results = fetcher.output_data(output_type="excel", output_dir="./results")
 
 The benchmark tooling includes unit tests to ensure functionality.
 
-### Using pytest
+### Using pytest for unit tests
 
 ```bash
 # From the executorch root directory