Use the full dataset for performance benchmarks by farook-edev · Pull Request #1069 · mlcommons/mobile_app_open

farook-edev · 2025-11-10T09:50:57Z

I also included some cleanup for debugging prints and such.

This should close #1059, and possibly #1058 and #1060 as well, since no work remains on any of them.

@freedomtan can you confirm?

github-actions · 2025-11-10T09:51:07Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

sonarqubecloud · 2025-11-10T10:25:26Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
100.0% Duplication on New Code


freedomtan · 2025-11-11T05:52:07Z

How do we test this?

farook-edev · 2025-11-11T05:57:24Z

How do we test this?

Run the cmdline tool or the app in Performance mode and check the summary: it should show 100 samples instead of 1.
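For anyone who wants to script that check, here is a minimal sketch in Python. It assumes the summary is available as text in the `key : value` layout that LoadGen prints, and that the expected count is 100 as stated above. The helper names `read_summary_params` and `uses_full_dataset` are hypothetical and are not part of the app or of LoadGen.

```python
def read_summary_params(text: str) -> dict[str, str]:
    """Collect 'key : value' lines from a LoadGen-style summary into a dict.

    Hypothetical helper for illustration; lines without a value
    (separators, section titles) are skipped.
    """
    params: dict[str, str] = {}
    for line in text.splitlines():
        # Split on the first colon only, so the value is kept whole.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            params[key.strip()] = value.strip()
    return params


def uses_full_dataset(text: str, expected: int = 100) -> bool:
    """Return True when performance_sample_count equals the expected size.

    The default of 100 is taken from the comment above, not from the code base.
    """
    params = read_summary_params(text)
    return int(params.get("performance_sample_count", "0")) == expected
```

Passing the contents of a saved summary to `uses_full_dataset` returns `False` for a run that still reports a single sample.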

freedomtan · 2025-11-11T06:08:56Z

@freedomtan to test it.

freedomtan · 2025-11-11T06:10:45Z

@farook-edev please share the datasets so that @anhappdev can upload them to the CDN.

freedomtan

YES, the performance_sample_count is reasonable now on Pixel 10.

================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode     : PerformanceOnly
90th percentile latency (ns) : 6196843547
90th first token percentile latency (ns) : 5568099822
Result is : VALID
  Min duration satisfied : Yes
  Min queries satisfied : Skipped
  Early stopping satisfied: Yes
TTFT Early Stopping Result:
 * Processed at least 64 queries (66).
 * Would discard 0 highest latency queries.
 * Early stopping 90th percentile estimate: 17103484388
 * Not enough queries processed for 99th percentile
 early stopping estimate (would need to process at
 least 662 total queries).
TPOT Early Stopping Result:
 * Processed at least 64 queries (66).
 * Would discard 0 highest latency queries.
 * Early stopping 90th percentile estimate: 834750626
 * Not enough queries processed for 99th percentile
 early stopping estimate (would need to process at
 least 662 total queries).

================================================
Additional Stats
================================================
QPS w/ loadgen overhead         : 0.21
QPS w/o loadgen overhead        : 0.21

Min latency (ns)                : 1436503361
Max latency (ns)                : 17598149232
Mean latency (ns)               : 4682296887
50.00 percentile latency (ns)   : 4288985862
90.00 percentile latency (ns)   : 6196843547
95.00 percentile latency (ns)   : 8122990944
97.00 percentile latency (ns)   : 13729710479
99.00 percentile latency (ns)   : 17598149232
99.90 percentile latency (ns)   : 17598149232

TPS w/ loadgen overhead         : 0.45
TPS w/o loadgen overhead        : 0.44
Min First Token latency (ns)                : 1436326199
Max First Token latency (ns)                : 17103484388
Mean First Token latency (ns)               : 4058709108
50.00 percentile first token latency (ns)   : 3507652997
90.00 percentile first token latency (ns)   : 5568099822
95.00 percentile first token latency (ns)   : 7330172688
97.00 percentile first token latency (ns)   : 12895301937
99.00 percentile first token latency (ns)   : 17103484388
99.90 percentile first token latency (ns)   : 17103484388

Min Time to Output Token (ns)                : -177162
Max Time to Output Token (ns)                : 834750626
Mean Time to Output Token (ns)               : 576319279
50.00 percentile time to output token (ns)   : 515973151
90.00 percentile time to output token (ns)   : 790773282
95.00 percentile time to output token (ns)   : 802750365
97.00 percentile time to output token (ns)   : 834408542
99.00 percentile time to output token (ns)   : 834750626
99.90 percentile time to output token (ns)   : 834750626

================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
ttft_latency (ns): 100000000
tpot_latency (ns): 100000000
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 300000
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 100

No warnings encountered during test.

1 ERROR encountered. See detailed log.
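The summary above reports every latency in nanoseconds, and one value (Min Time to Output Token) is negative. A small sketch like the following could convert the `(ns)` metrics to seconds and list any negative ones for follow-up. It only reads the text layout shown in this log. The function names are hypothetical, and the script makes no claim about why a value is negative.

```python
import re

# Matches lines such as "Min latency (ns)   : 1436503361".
# Lines in other units, e.g. "min_duration (ms): 60000", do not match.
NS_LINE = re.compile(r"^\s*(?P<name>.+?\(ns\))\s*:\s*(?P<value>-?\d+)\s*$")


def latency_seconds(text: str) -> dict[str, float]:
    """Map each '(ns)' metric name in the summary to its value in seconds."""
    metrics: dict[str, float] = {}
    for line in text.splitlines():
        match = NS_LINE.match(line)
        if match:
            metrics[match.group("name")] = int(match.group("value")) / 1e9
    return metrics


def negative_metrics(text: str) -> list[str]:
    """Return the names of '(ns)' metrics whose reported value is below zero."""
    return [name for name, secs in latency_seconds(text).items() if secs < 0]
```

Run against the summary above, `negative_metrics` would return only the Min Time to Output Token entry, since that is the only `(ns)` line with a negative value.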

use the entire dataset for performance

2b9c54c

farook-edev requested review from anhappdev and freedomtan November 10, 2025 09:50

farook-edev requested a review from a team as a code owner November 10, 2025 09:50

This was linked to issues Nov 10, 2025

LLM Dataset Implementation #1058

Open

LLM TinyMMLU Dataset Implementation #1059

Open

LLM IFEval Dataset Implementation #1060

Open

freedomtan approved these changes Nov 11, 2025

View reviewed changes

farook-edev merged commit 2510839 into submission-v6.0 Nov 11, 2025
27 checks passed

farook-edev deleted the farook/full-dataset-performance branch November 11, 2025 08:18

github-actions bot locked and limited conversation to collaborators Nov 11, 2025

farook-edev merged 1 commit into submission-v6.0 from farook/full-dataset-performance
