Use the full dataset for performance benchmarks #1069
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
How do we test this?
Running the cmdline or app in Performance mode and checking the summary should show 100 samples instead of 1.

@freedomtan to test it.

@farook-edev please share the datasets so that @anhappdev can upload them to the CDN.
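
The test described above (the summary should report 100 samples instead of 1) could be scripted roughly as follows. This is only a sketch: `parse_summary` and `check_sample_count` are hypothetical helper names, and it assumes the summary is plain text with one `key : value` pair per line.

```python
def parse_summary(text):
    """Parse 'key : value' lines of a summary into a dict of strings.

    Separator lines, headings, and lines without a value are skipped.
    """
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields[key.strip()] = value.strip()
    return fields


def check_sample_count(text, expected=100):
    """Return True if performance_sample_count equals the expected value."""
    fields = parse_summary(text)
    return int(fields["performance_sample_count"]) == expected


# A shortened stand-in for the summary text (assumption: real runs print more fields).
sample = """\
Scenario : SingleStream
Mode : PerformanceOnly
performance_sample_count : 100
"""

print(check_sample_count(sample))
```

Passing the full summary text from a Performance-mode run in place of `sample` performs the same check on a real result.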
freedomtan left a comment:
YES, the performance_sample_count is reasonable now on Pixel 10.
================================================
MLPerf Results Summary
================================================
SUT name : TFLite
Scenario : SingleStream
Mode : PerformanceOnly
90th percentile latency (ns) : 6196843547
90th first token percentile latency (ns) : 5568099822
Result is : VALID
Min duration satisfied : Yes
Min queries satisfied : Skipped
Early stopping satisfied: Yes
TTFT Early Stopping Result:
* Processed at least 64 queries (66).
* Would discard 0 highest latency queries.
* Early stopping 90th percentile estimate: 17103484388
* Not enough queries processed for 99th percentile
early stopping estimate (would need to process at
least 662 total queries).
TPOT Early Stopping Result:
* Processed at least 64 queries (66).
* Would discard 0 highest latency queries.
* Early stopping 90th percentile estimate: 834750626
* Not enough queries processed for 99th percentile
early stopping estimate (would need to process at
least 662 total queries).
================================================
Additional Stats
================================================
QPS w/ loadgen overhead : 0.21
QPS w/o loadgen overhead : 0.21
Min latency (ns) : 1436503361
Max latency (ns) : 17598149232
Mean latency (ns) : 4682296887
50.00 percentile latency (ns) : 4288985862
90.00 percentile latency (ns) : 6196843547
95.00 percentile latency (ns) : 8122990944
97.00 percentile latency (ns) : 13729710479
99.00 percentile latency (ns) : 17598149232
99.90 percentile latency (ns) : 17598149232
TPS w/ loadgen overhead : 0.45
TPS w/o loadgen overhead : 0.44
Min First Token latency (ns) : 1436326199
Max First Token latency (ns) : 17103484388
Mean First Token latency (ns) : 4058709108
50.00 percentile first token latency (ns) : 3507652997
90.00 percentile first token latency (ns) : 5568099822
95.00 percentile first token latency (ns) : 7330172688
97.00 percentile first token latency (ns) : 12895301937
99.00 percentile first token latency (ns) : 17103484388
99.90 percentile first token latency (ns) : 17103484388
Min Time to Output Token (ns) : -177162
Max Time to Output Token (ns) : 834750626
Mean Time to Output Token (ns) : 576319279
50.00 percentile time to output token (ns) : 515973151
90.00 percentile time to output token (ns) : 790773282
95.00 percentile time to output token (ns) : 802750365
97.00 percentile time to output token (ns) : 834408542
99.00 percentile time to output token (ns) : 834750626
99.90 percentile time to output token (ns) : 834750626
================================================
Test Parameters Used
================================================
samples_per_query : 1
target_qps : 1000
ttft_latency (ns): 100000000
tpot_latency (ns): 100000000
max_async_queries : 1
min_duration (ms): 60000
max_duration (ms): 300000
min_query_count : 100
max_query_count : 0
qsl_rng_seed : 3066443479025735752
sample_index_rng_seed : 10688027786191513374
schedule_rng_seed : 14962580496156340209
accuracy_log_rng_seed : 0
accuracy_log_probability : 0
accuracy_log_sampling_target : 0
print_timestamps : 0
performance_issue_unique : 0
performance_issue_same : 0
performance_issue_same_index : 0
performance_sample_count : 100
No warnings encountered during test.
1 ERROR encountered. See detailed log.
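
The summary reports every latency in nanoseconds, which is hard to read at this scale. The sketch below converts a few of the values copied from the summary above into seconds. It also compares the reported QPS with the reciprocal of the mean latency. That comparison rests on an assumption the log does not state: with `max_async_queries : 1`, throughput should be roughly one query per mean latency.

```python
NS_PER_S = 1_000_000_000


def ns_to_s(ns):
    """Convert a duration in nanoseconds to seconds."""
    return ns / NS_PER_S


# Values copied from the summary above.
mean_latency_ns = 4682296887
p90_latency_ns = 6196843547
p90_ttft_ns = 5568099822
reported_qps = 0.21

print(f"mean latency: {ns_to_s(mean_latency_ns):.3f} s")
print(f"p90 latency:  {ns_to_s(p90_latency_ns):.3f} s")
print(f"p90 TTFT:     {ns_to_s(p90_ttft_ns):.3f} s")

# Assumption: one query in flight at a time, so QPS is about 1 / mean latency.
implied_qps = 1 / ns_to_s(mean_latency_ns)
print(f"implied QPS:  {implied_qps:.2f} (reported {reported_qps})")
```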



I also included some cleanup of debugging prints and such.
This should close #1059, and maybe #1058 and #1060, since there's no more work to be done on any of them.
@freedomtan can you confirm?