Feat. Benchmark Sets #1095
base: submission-v6.0
Conversation
@freedomtan to check that the running order is kept unchanged (the offline benchmark should be the last one to run).
```diff
 benchmark_setting {
-  benchmark_id: "image_classification_v2"
+  benchmark_id: "image_classification_online_v2"
```
We collect run results shared by users. This change will require a migration if we analyse the results over time.
@farook-edev @anhappdev would like to test, but we don't have a download link for the reference models.
The files are available here; I think the model isn't there because it hadn't been decided at the time.
This PR adds a new element (BenchmarkSet) which bundles together benchmarks that are mostly similar but need to be run separately (i.e., different models or datasets but the same function).
Under the hood the benchmarks work exactly the same; no C++ logic has been changed. The added configuration is only for the frontend.
The way it works is by bundling similar benchmarks under a set and having each benchmark become active when the options it requires are active. For example, take LLM: say we have 3 models and 3 dataset implementations to test (`ModelA-DatasetB`, `ModelC-DatasetA`, and so on), which adds up to 9 benchmarks. Benchmark `ModelA-DatasetC` will define 2 required options, `Model-A` and `Dataset-C`, and the Benchmark Set will contain 6 options in 2 categories: Models (A, B, C) and Datasets (A, B, C). If a user then enables Models A and C and Dataset A, the set will automatically activate `ModelA-DatasetA` and `ModelC-DatasetA` and disable all the others. The benefit of this approach is that instead of having 9 benchmarks that are basically the same, we'll have 1 set containing 6 options, while the core benchmarking code never sees the sets or options.
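To make the selection rule concrete, here is a minimal sketch in Python of the activation logic described above. This is not the PR's actual code (the real change is frontend configuration only); the class and field names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Benchmark:
    benchmark_id: str
    required_options: frozenset  # e.g. {"Model-A", "Dataset-C"}

@dataclass
class BenchmarkSet:
    benchmarks: list
    active_options: set = field(default_factory=set)

    def enable(self, option: str) -> None:
        self.active_options.add(option)

    def active_benchmarks(self) -> list:
        # A benchmark runs only when every option it requires is enabled;
        # this matches the ModelA/ModelC example in the description.
        return [b for b in self.benchmarks
                if b.required_options <= self.active_options]

# 3 models x 3 datasets -> 9 benchmarks in one set with 6 options.
llm_set = BenchmarkSet(benchmarks=[
    Benchmark(f"llm_model{m}_dataset{d}", frozenset({f"Model-{m}", f"Dataset-{d}"}))
    for m in "ABC" for d in "ABC"
])

for option in ("Model-A", "Model-C", "Dataset-A"):
    llm_set.enable(option)

print([b.benchmark_id for b in llm_set.active_benchmarks()])
# -> ['llm_modelA_datasetA', 'llm_modelC_datasetA']
```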
This PR also applies the implementation described above to `image_classification_v2`, combining the default and offline versions into a set and providing 2 options to enable and disable the benchmarks. This is only a secondary improvement, since the system is mainly meant to tidy up the (at least) 4 benchmarks that LLM will add. I've also included a video of the system in action:
optionvid.mp4
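As a usage note, the same hypothetical sketch can model the `image_classification_v2` set from this PR; the option names below are illustrative and not taken from the actual config.

```python
# Reuses the hypothetical Benchmark/BenchmarkSet classes from the sketch above.
# Option names ("Single-stream", "Offline") are made up for illustration;
# the benchmark IDs follow the default/offline pairing described in the PR.
ic_set = BenchmarkSet(benchmarks=[
    Benchmark("image_classification_v2", frozenset({"Single-stream"})),
    Benchmark("image_classification_offline_v2", frozenset({"Offline"})),
])
ic_set.enable("Offline")
print([b.benchmark_id for b in ic_set.active_benchmarks()])
# -> ['image_classification_offline_v2']
```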
Closes #1082