Add Workload object and execution methods #18
base: main
Conversation
rolfmorel left a comment:
Here's a partial pass through it. Will try to complete the first pass through tomorrow!
```python
        pass

    @abstractmethod
    def get_input_arrays(self, execution_engine) -> list:
```
-> abc.Sequence[ir.Value]?
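Spelled out, the suggested signature would be something like this (a sketch, assuming `collections.abc` is the `abc` being referred to and `ir` is the upstream MLIR bindings module):

```python
from abc import ABC, abstractmethod
from collections.abc import Sequence

from mlir import ir


class Workload(ABC):
    @abstractmethod
    def get_input_arrays(self, execution_engine) -> Sequence[ir.Value]:
        """Return the inputs as a read-only sequence rather than a bare list."""
```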
If each Workload were explicitly associated with a Target (or more than one), i.e. had a Target member, and each Target kept track of its associated execution_engine, this signature would become simpler.
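A rough sketch of that shape (all names here are illustrative, not from the PR):

```python
class Target:
    """Illustrative: a Target that owns (and caches) its execution engine."""

    def __init__(self, name: str):
        self.name = name
        self._execution_engine = None

    @property
    def execution_engine(self):
        # However the engine gets created, the Target is the one
        # place that knows about it.
        return self._execution_engine


class Workload:
    def __init__(self, target: Target):
        self.target = target

    def get_input_arrays(self) -> list:
        # No execution_engine parameter needed any more.
        engine = self.target.execution_engine
        ...
```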
Creating the execution engine requires a fully lowered payload IR, so keeping the engine in the Workload object complicates re-use of this object, e.g. for different schedule parameters.
The flow is something like: ParametrizedWorkload(schedule with unknown params, initial payload IR, correctness test, target-specific info, ...) -> ConcreteWorkload(lowered payload IR, target-specific info, ...).
In this proposal the Workload object addresses the first part - it is a re-usable high-level description of the problem. Every time one asks for a payload or schedule module, a new module is generated (the payload module is lowered in place, so it cannot be re-used; the schedule may depend on the given parameters). The latter part - lowering to a final payload IR and executing it - is handled via the execution helper functions. We could design this differently, e.g. by adding another "ConcreteWorkload" object that is associated with an execution engine (either in this PR or later). All ideas are much appreciated!
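Sketched as code, that two-stage flow might look like this (the class shapes are illustrative; only the two stages themselves are from the comment above):

```python
class ParametrizedWorkload:
    """Re-usable high-level description: schedule with unknown params,
    initial payload IR, correctness test, target-specific info, ..."""

    def payload_module(self):
        # Generates a fresh module on every call, since lowering
        # happens in-place and would otherwise destroy it.
        ...

    def schedule_module(self, params):
        # May depend on the given schedule parameters.
        ...


class ConcreteWorkload:
    """Lowered payload IR plus target-specific info; since its IR is
    final, an execution engine could live here."""

    def __init__(self, lowered_payload_ir, target_info):
        self.lowered_payload_ir = lowered_payload_ir
        self.target_info = target_info
```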
```python
        pass

    @abstractmethod
    def get_complexity(self) -> list:
```
If the different "dimensions" of the returned thing are known, and have sensible names, I would go with a NamedTuple.
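Given that the implementation below returns (flop_count, memory_reads, memory_writes), that could look like:

```python
from typing import NamedTuple


class Complexity(NamedTuple):
    flop_count: int
    memory_reads: int
    memory_writes: int
```

Returning `Complexity(flop_count, memory_reads, memory_writes)` stays compatible with tuple unpacking, while callers gain `complexity.flop_count` over positional indexing.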
```python
    print(schedule_module)


def lower_payload(
```
Shouldn't this just be a non-abstractmethod method on the Workload class?
```python
    dump_schedule: bool = False,
):
    if not dump_kernel or dump_kernel != "initial":
        with context, location:
```
In my mind there are very few cases where you have a reference to a mlir.ir.Context and haven't entered its context manager yet. The story for mlir.ir.Location is similar (there you might actually want to enter and exit nested Locations based on the scope of the IR you're building).
Have a look at #20 (comment) and #20 (comment). My take is that the function which acts as the process's main() should set up and enter a mlir.ir.Context and set a mlir.ir.Location.unknown(). If we do that, the rest of the code should barely need to think about these anymore.
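Concretely, something like this at the entry point (a sketch; run_tuning is a stand-in for whatever the driver actually does):

```python
from mlir import ir


def main():
    # Enter the one Context and set a default Location up front; code
    # below this point can build IR without mentioning either again.
    with ir.Context(), ir.Location.unknown():
        run_tuning()  # stand-in for the actual driver logic


if __name__ == "__main__":
    main()
```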
```python
    if not dump_kernel or dump_kernel != "initial":
        with context, location:
            # invoke transform interpreter directly
            transform_interpreter.apply_named_sequence(
```
Now that we have llvm/llvm-project#168223, this can probably be simplified. (E.g. do we need apply_transform_schedule?)
```python
        return (flop_count, memory_reads, memory_writes)

    def payload_module(self):
        with self.context, self.location:
```
Suggested change (remove this line):
```diff
-        with self.context, self.location:
```
```python
        A = f.arguments[0]
        B = f.arguments[1]
        C = f.arguments[2]
        a_tensor = bufferization.ToTensorOp(tensor_t, A, restrict=True)
```
As a general principle we can set for ourselves right at the start of the project: in case the upstream bindings already have a workable snake_case version of an op, let's use that over the CamelCaseOp version. The crux of the argument is that this will make the Python code look closer to a terse version of the MLIR textual format.
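For the line above that would read (assuming the generated bindings expose a matching snake_case builder for this op):

```python
from mlir.dialects import bufferization

# CamelCase op class, as currently in the PR:
a_tensor = bufferization.ToTensorOp(tensor_t, A, restrict=True)

# snake_case builder, closer to the textual `bufferization.to_tensor`:
a_tensor = bufferization.to_tensor(tensor_t, A, restrict=True)
```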
```python
        c_tensor = bufferization.ToTensorOp(
            tensor_t, C, restrict=True, writable=True
        )
        add = linalg.add(a_tensor, b_tensor, outs=[c_tensor])
```
Is there a reason to prefer doing the linalg.add on tensors rather than on memrefs? If anything, the IR would be even simpler.
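A sketch of the memref variant (assuming A, B and C are the memref block arguments from the hunk above):

```python
from mlir.dialects import linalg

# linalg ops on memrefs produce no results, so the to_tensor /
# materialize round-trip disappears entirely:
linalg.add(A, B, outs=[C])
```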
```python
        schedule_module.operation.attributes[
            "transform.with_named_sequence"] = (ir.UnitAttr.get())
        with ir.InsertionPoint(schedule_module.body):
            named_sequence = transform.NamedSequenceOp(
```
Suggested change:
```diff
-            named_sequence = transform.NamedSequenceOp(
+            named_sequence = transform.named_sequence(
```
```python
            if dump_kernel == "bufferized":
                ...
            transform.YieldOp()
        return schedule_module
```
Do we want this in the committed version?
An alternative to consider is inserting transform.print %mod if some verbosity flag has been set.
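For instance (a sketch; whether the bindings expose a PrintOp builder under exactly this name and signature would need checking):

```python
if verbosity > 0:  # hypothetical verbosity flag
    # Emit `transform.print %mod` into the schedule so the module is
    # dumped when the interpreter reaches this point.
    transform.PrintOp(target=mod)
```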
Implements #16.
Adds a Workload object and execution engine utility functions that can execute it.