Use the full dataset for performance benchmarks #1069
Merged
farook-edev merged 1 commit into submission-v6.0 on Nov 11, 2025
This was linked to issues on Nov 10, 2025
Contributor
How do we test this?
Contributor, Author
Run the command-line tool or the app in Performance mode and check the summary; it should show 100 samples instead of 1.
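One way to script that check, as a minimal sketch: it assumes the summary has been captured as text in the `key : value` layout that LoadGen prints, and the helper name `performance_sample_count` is made up for illustration; it is not part of the app or of LoadGen.

```python
import re


def performance_sample_count(summary_text: str) -> int:
    """Return the performance_sample_count value found in a summary string."""
    # Match a line such as "performance_sample_count : 100".
    match = re.search(
        r"^\s*performance_sample_count\s*:\s*(\d+)\s*$",
        summary_text,
        re.MULTILINE,
    )
    if match is None:
        raise ValueError("performance_sample_count not found in summary")
    return int(match.group(1))


# Hypothetical excerpt of a summary, used only to exercise the helper.
sample = """\
performance_issue_same_index : 0
performance_sample_count : 100
"""
assert performance_sample_count(sample) == 100
```

A run that still used a single sample would return 1 here, so the comparison against 100 is the whole test.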
Contributor
@freedomtan to test it.
Contributor
@farook-edev please share the datasets so that @anhappdev can upload them to the CDN.
freedomtan approved these changes on Nov 11, 2025 and left a comment:
Yes, performance_sample_count is now reasonable on the Pixel 10:
================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode : PerformanceOnly
90th percentile latency (ns) : 6196843547
90th first token percentile latency (ns) : 5568099822
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Skipped
Early stopping satisfied: Yes
TTFT Early Stopping Result:
* Processed at least 64 queries (66).
* Would discard 0 highest latency queries.
* Early stopping 90th percentile estimate: 17103484388
* Not enough queries processed for 99th percentile
early stopping estimate (would need to process at
least 662 total queries).
TPOT Early Stopping Result:
* Processed at least 64 queries (66).
* Would discard 0 highest latency queries.
* Early stopping 90th percentile estimate: 834750626
* Not enough queries processed for 99th percentile
early stopping estimate (would need to process at
least 662 total queries).
================================================
Additional Stats
================================================
QPS w/ loadgen overhead : 0.21
QPS w/o loadgen overhead : 0.21
Min latency (ns) : 1436503361
Max latency (ns) : 17598149232
Mean latency (ns) : 4682296887
50.00 percentile latency (ns) : 4288985862
90.00 percentile latency (ns) : 6196843547
95.00 percentile latency (ns) : 8122990944
97.00 percentile latency (ns) : 13729710479
99.00 percentile latency (ns) : 17598149232
99.90 percentile latency (ns) : 17598149232
TPS w/ loadgen overhead : 0.45
TPS w/o loadgen overhead : 0.44
Min First Token latency (ns) : 1436326199
Max First Token latency (ns) : 17103484388
Mean First Token latency (ns) : 4058709108
50.00 percentile first token latency (ns) : 3507652997
90.00 percentile first token latency (ns) : 5568099822
95.00 percentile first token latency (ns) : 7330172688
97.00 percentile first token latency (ns) : 12895301937
99.00 percentile first token latency (ns) : 17103484388
99.90 percentile first token latency (ns) : 17103484388
Min Time to Output Token (ns) : -177162
Max Time to Output Token (ns) : 834750626
Mean Time to Output Token (ns) : 576319279
50.00 percentile time to output token (ns) : 515973151
90.00 percentile time to output token (ns) : 790773282
95.00 percentile time to output token (ns) : 802750365
97.00 percentile time to output token (ns) : 834408542
99.00 percentile time to output token (ns) : 834750626
99.90 percentile time to output token (ns) : 834750626
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
ttft_latency (ns): 100000000
tpot_latency (ns): 100000000
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 300000
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 100
No warnings encountered during test.
1 ERROR encountered. See detailed log.
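The nanosecond figures in a summary like this are easier to compare in seconds. A rough sketch of pulling the `key : value` lines into a dictionary follows; `parse_summary` and `ns_to_seconds` are hypothetical helper names, not LoadGen functions, and the parser assumes each field sits on one line with the key before the first colon.

```python
def parse_summary(text: str) -> dict[str, str]:
    """Collect "key : value" lines from a summary into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # Skip banners, section titles, and headers that end in a bare colon.
        if sep and value.strip():
            # Keys that repeat (e.g. in the TTFT and TPOT blocks) keep the last value.
            fields[key.strip()] = value.strip()
    return fields


def ns_to_seconds(ns: str) -> float:
    """Convert a nanosecond count given as text into seconds."""
    return int(ns) / 1e9


# Two lines copied from the summary above, plus one header line that is skipped.
excerpt = """\
Mode : PerformanceOnly
90th percentile latency (ns) : 6196843547
TTFT Early Stopping Result:
"""
fields = parse_summary(excerpt)
p90_seconds = ns_to_seconds(fields["90th percentile latency (ns)"])
assert round(p90_seconds, 3) == 6.197
```

Values stay as strings so that non-numeric fields such as `Mode` survive unchanged; conversion happens only where the caller asks for it.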



I also included some cleanup of debugging prints and the like.
This should close #1059, and possibly #1058 and #1060, since no work remains on any of them.
@freedomtan, can you confirm?