Conversation
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
```python
from abc import ABC, abstractmethod


class BaseCheck(ABC):
```
We can utilize the `__init_subclass__` feature of Python to handle this registry-like functionality:
https://peps.python.org/pep-0487/#subclass-registration

I believe a better paradigm would be something like:

- Implement a Checker subclass that inherits BaseChecker.
- Each Checker class will implement some number of methods that are prefixed with `check_`.
- BaseChecker's `__call__` and `execute()` methods will check all attributes on the class and run (in sequence) all attributes that are callable and start with the string `check_`.
- If there is a dependency, we can implement an `@BaseCheck.mark_dependency(<str>, ...)` decorator where you can pass in a list of strings that are the function names which need to be executed before the current check.

From what I could tell, all the implemented Check classes have an `__init__` method with the arguments `log, path, config, submission_logs`. The BaseCheck class should probably do the same and just store the values in `self` to be used by the subclasses later.
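The pattern above can be sketched roughly as follows. This is an illustrative draft only, assuming the constructor signature mentioned in the comment; the names `BaseCheck`, `mark_dependency`, and `ExampleCheck` follow the review discussion, not the PR's actual code.

```python
from abc import ABC


class BaseCheck(ABC):
    """Sketch: subclass registry (PEP 487) plus check_* discovery."""

    registry = []

    def __init_subclass__(cls, **kwargs):
        # PEP 487: every subclass registers itself automatically.
        super().__init_subclass__(**kwargs)
        BaseCheck.registry.append(cls)

    def __init__(self, log, path, config, submission_logs):
        # Store the common constructor arguments for subclasses.
        self.log = log
        self.path = path
        self.config = config
        self.submission_logs = submission_logs

    @staticmethod
    def mark_dependency(*names):
        # Record the checks that must run before the decorated one.
        def decorator(fn):
            fn._depends_on = names
            return fn
        return decorator

    def __call__(self):
        # Run every callable attribute whose name starts with "check_".
        results = {}
        for name in dir(self):
            if name.startswith("check_"):
                fn = getattr(self, name)
                if callable(fn):
                    # Naive dependency handling: run dependencies first.
                    for dep in getattr(fn, "_depends_on", ()):
                        if dep not in results:
                            results[dep] = getattr(self, dep)()
                    if name not in results:
                        results[name] = fn()
        return all(results.values())


class ExampleCheck(BaseCheck):
    def check_path_exists(self):
        return True

    @BaseCheck.mark_dependency("check_path_exists")
    def check_contents(self):
        return True
```

With this shape, defining a subclass is enough to register it, and `ExampleCheck(log, path, config, logs)()` runs all of its `check_*` methods in dependency order.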
```python
v = self.execute(check)
valid &= v
if not valid:
    return False
```
I'm wondering if it makes more sense to run every check here and return the success value of each, keyed by the checker's class? Something like:

```python
{
    AccuracyCheck: True,
    ComplianceCheck: False,
    PerformanceCheck: True,
    ...
}
```

I can see this being clunky if many tests depend on each other, which means that check failures will cascade. In that case my question is: should there be a system to determine the dependencies of each test? Something like MeasurementsCheck depends on DirectoryStructureCheck, etc.
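A minimal sketch of the suggestion, running every check and collecting the result per class instead of returning early. The check classes and `run_all_checks` helper here are hypothetical, not taken from the PR:

```python
def run_all_checks(checks):
    """Run each check object and collect its result, keyed by class.

    `checks` is an iterable of callables returning a bool; no early
    return, so every check's outcome is visible in the report.
    """
    results = {}
    for check in checks:
        results[type(check)] = check()
    return results


class AccuracyCheck:
    def __call__(self):
        return True


class ComplianceCheck:
    def __call__(self):
        return False


results = run_all_checks([AccuracyCheck(), ComplianceCheck()])
```

The overall pass/fail is then just `all(results.values())`, while the per-class dict can be printed or exported for the submitter.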
```python
if model in self.config.base["models_TEST04"]:
    test_list.append("TEST04")
if model in self.config.base["models_TEST06"]:
    test_list.append("TEST06")
```
Does it make more sense to have a ComplianceCheck class (which inherits BaseCheck), then have individual TEST0XCheck subclasses?
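One possible shape for that suggestion, sketched with hypothetical class and model names (the per-test model lists here are placeholders, not the PR's real config):

```python
class ComplianceCheck:
    """Hypothetical base class for per-test compliance checks."""

    models = ()  # models this compliance test applies to

    @classmethod
    def applies_to(cls, model):
        return model in cls.models


class Test04Check(ComplianceCheck):
    models = ("resnet50",)  # placeholder model list


class Test06Check(ComplianceCheck):
    models = ("llama2-70b",)  # placeholder model list


def compliance_tests_for(model):
    # Each TEST0X becomes a subclass; selection replaces the
    # per-test membership ifs.
    return [cls for cls in ComplianceCheck.__subclasses__()
            if cls.applies_to(model)]
```

This moves the `models_TEST0X` membership logic into each subclass, so adding a new compliance test means adding a class rather than another `if`.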
```python
if args.scenarios_to_skip:
    scenarios_to_skip = [
        scenario for scenario in args.scenarios_to_skip.split(",")]
```
Can you add a formatter to the project like autopep8 or black?
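For reference, black is typically configured through `pyproject.toml`; this is a generic sketch of such a configuration, not something taken from this repository:

```toml
[tool.black]
line-length = 88
target-version = ["py310"]
```

Running `black .` (after `pip install black`) would then format the project in place.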
```python
for logs in loader.load():
    # Initialize check classes
    performance_checks = PerformanceCheck(
        log, logs.loader_data["perf_path"], config, logs)
```
The overloading of the term `log` here feels clunky and confusing. A few comments:

- It seems like bad practice to create a logger named 'main' and pass it around to each Checker. It would be better for each file to have its own logger (`logging.getLogger(__file__)`) so that if (for instance) ExampleCheck did `log.info("Missing file ___")`, the message in console would show that it originated in ExampleCheck rather than `main`.
- If each file has its own logger, you no longer need to pass around `log` everywhere (makes it more concise).
- If we are passing in `logs`, is there a point to also passing in `logs.loader_data[key]`? Can't that just be extracted by the Check's `__init__` method?
- If this is simplified down to just `xxxx_check = XXXXCheck(config, logs)`, then it can be further simplified down to:

```python
for logs in loader.load():
    for check_cls in [PerformanceCheck, ...]:
        check_cls(config, logs)()
```
```python
measurements_checks()
power_checks()
```

```python
with open(args.csv, "w") as csv:
```
Why are these empty files here?
```python
def load_config(self, version):
    # TODO: Load values from
    self.models = self.base["models"]
```
I mentioned this in the GH Issue, but if the giant model dict is already being stored in a Python file, it should probably be refactored into some hierarchy of dataclasses. Having a class based representation would also make doing the key -> property remapping you're doing here either easier or unnecessary.
Having it as dataclasses rather than a dict also makes the schema of the config more defined and easier to navigate.
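A rough sketch of that refactor, assuming a hypothetical two-level hierarchy (the field names and values here are placeholders, not the PR's actual config schema):

```python
from dataclasses import dataclass, field


@dataclass
class ModelConfig:
    # Hypothetical per-model fields; the real schema lives in the PR.
    name: str
    compliance_tests: list = field(default_factory=list)


@dataclass
class SubmissionConfig:
    version: str
    models: dict = field(default_factory=dict)

    def model(self, name):
        # Typed lookup replaces ad-hoc dict key remapping.
        return self.models[name]


config = SubmissionConfig(
    version="5.1",
    models={"resnet50": ModelConfig("resnet50", ["TEST01", "TEST04"])},
)
```

With dataclasses, the config's shape is explicit in the class definitions, attribute access replaces string-keyed lookups, and tools can autocomplete and type-check the fields.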
Hi @pgmpablo157321, could you please update the Summary section of the README to reflect what the updated command looks like?
Initial tests with the 5.1 submission seem to be passing.
@nv-alicheng @nvzhihanj to take a look and provide guidance on whether to merge for 6.0.
@arjunsuresh Hi Arjun, I'm not sure if you're the right person for this. Could you, or someone else from the automation group, help address the ResNet50 and RetinaNet errors in some of the checks? It looks like this is affecting multiple PRs that aren't related to ResNet or RetinaNet at all.
@pgmpablo157321 Thanks for the PR! It seems like some of Alice's comments have not been addressed yet. I would suggest that we either address them in this PR, or document the TODOs (in the issue) and address them in follow-up PRs.
Pablo confirms this is ready for merging.
WG: It is OK to merge it.

#1670
Testing command (outside the inference repo):

Implemented checks:
- performance
- accuracy
- compliance
- measurements
- power
- system