Review updated until commit 2376857 Description
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 PR contains tests |
| ⚡ Recommended focus areas for review |
Namespace inconsistency for scope types
hir::ForLoop is used for for-loops but kir::IfThenElse is used for if-then-else. This should be verified to ensure both are using the correct types. If both should be from the same namespace (hir or kir), this could be a bug.
Test failures
(Medium, 1) nvFuser HopperMatmulTest matmul accuracy mismatch on H100
| Test Name | H100 | Source |
|---|---|---|
| HopperMatmulTest.PingPongPersistent | ❌ | Link |
…located when not inside for loop
|
!test |
|
!test |
|
!test |
Greptile Summary
This PR replaces the simple linear scan in Key structural changes:
Confidence Score: 4/5
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[AllocateAndDeallocate::runPass] --> B[insertAllocations]
A --> C[insertDeallocations]
A --> D[checkMemoryLeak]
B --> B1[DominatorTree\nforward DFS order]
B1 --> B2[pre_fn: insert kir::Allocate\nbefore expr if output needs prealloc]
B2 --> B3[post_fn: remove output from\ndefined set on scope exit]
C --> C1[PostDominatorTree\nreverse execution order]
C1 --> C2[LowestCommonAncestor\ncompute LCA per TensorView]
C2 --> C3{tv is fusion\ninput or output?}
C3 -- Yes --> C4[skip]
C3 -- No --> C5[insert hir::Deallocate\nimmediately after LCA node]
D --> D1[Build new PostDominatorTree\non modified container]
D1 --> D2[DFS: pre_fn adds TVs\nto allocated set]
D2 --> D3[post_fn: erase TV\nwhen Deallocate hit]
D3 --> D4{remaining allocated\nare all fusion I/O?}
D4 -- Yes --> D5[pass]
D4 -- No --> D6[NVF_ERROR: memory leak]
Last reviewed commit: 2376857 |
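The checkMemoryLeak branch of the flowchart can be illustrated with a much smaller model. The sketch below is not the pass itself: it assumes a flat list of allocate/deallocate events instead of a PostDominatorTree traversal, and the names `Op`, `Event`, and `findLeaks` are hypothetical. It only shows the bookkeeping idea: track what is allocated, erase on deallocate, and report anything left over that is not a fusion input or output.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical event model; the real pass walks host IR expressions.
enum class Op { Allocate, Deallocate };

struct Event {
  Op op;
  std::string tv;  // name of the tensor the event refers to
};

// Returns the tensors that are still allocated after all events have been
// processed and that are not fusion inputs or outputs. An empty result
// corresponds to the "pass" branch of the flowchart.
std::vector<std::string> findLeaks(
    const std::vector<Event>& events,
    const std::set<std::string>& fusion_io) {
  std::set<std::string> allocated;
  for (const Event& e : events) {
    if (e.op == Op::Allocate) {
      allocated.insert(e.tv);
    } else {
      allocated.erase(e.tv);
    }
  }
  std::vector<std::string> leaks;
  for (const std::string& tv : allocated) {
    if (fusion_io.count(tv) == 0) {
      leaks.push_back(tv);
    }
  }
  return leaks;
}
```

In the real pass a non-empty result would presumably be reported through NVF_ERROR, as the D6 node indicates.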
Additional Comments (2)
The The diff shows this was introduced by this PR, changing
Node(Scope* scope, Scope::Iterator iterator, const Node* parent)
where Beyond the type error, even if the iterator were stored,
lca_node->scope()->insert(std::next(lca_node->iterator()), deallocate);
The fix is to convert to a forward iterator before storing. The element pointed to by reverse iterator it is the one at std::prev(it.base()):
for (auto it = exprs.rbegin(); it != exprs.rend(); ++it) {
  Expr* e = *it;
  Scope::Iterator fwd_it = std::prev(it.base());
  auto [node_it, inserted] = nodes_.try_emplace(e, &scope, fwd_it, parent);
}
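The reverse-to-forward conversion discussed in this comment can be checked in isolation. The sketch below assumes a `std::list<int>` as a stand-in for the scope's expression list (the real `Scope` type is not shown here), and `collectForwardIterators` and `insertAfter` are hypothetical helper names. Forward iterators are collected first and the list is mutated only afterwards, because inserting right after the current element while a reverse iteration is in flight changes which element the reverse iterator refers to.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <vector>

// Hypothetical stand-in for a scope's expression list.
using ExprList = std::list<int>;

// Walk the list in reverse and record, for each element, a forward iterator
// that refers to that same element. rit.base() points one past the element
// in forward order, so std::prev(rit.base()) is the element itself.
std::vector<ExprList::iterator> collectForwardIterators(ExprList& exprs) {
  std::vector<ExprList::iterator> out;
  for (auto rit = exprs.rbegin(); rit != exprs.rend(); ++rit) {
    ExprList::iterator fwd = std::prev(rit.base());
    assert(&*fwd == &*rit);  // both iterators name the same element
    out.push_back(fwd);
  }
  return out;
}

// Insert `value` immediately after the element at `pos`. std::list::insert
// places the new element before its iterator argument, hence std::next.
void insertAfter(ExprList& exprs, ExprList::iterator pos, int value) {
  exprs.insert(std::next(pos), value);
}
```

Because `std::list` insertion does not invalidate existing iterators, the stored forward iterators stay usable for later insertions.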
|
!test |
|
!test |
wujingyue
left a comment
Almost there. I'm reviewing the LCA part...
Made-with: Cursor
!test |
  computeLcaMap();
}

const std::unordered_map<TensorView*, const Node*>& getLcaMap() const {
This still exposes the map. Consider a getLca that takes a TV and returns a Node.
I attempted using hic.vals() first, but since that contains TVs inside kernels, there is no LCA for them.
One way is to ignore any TV that does not have an LCA, but that would miss any actual error in traversal / recording LCAs. I find using the LCA map more understandable than skipping over some TensorViews.
Am I missing any other method here?
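For reference, the accessor shape suggested in this thread might look roughly like the sketch below. Everything here is an assumption for illustration: `TensorView` and `Node` are empty stand-ins, and `LcaIndex`, `record`, and `hasLca` are hypothetical names, not part of the PR. The lookup throws when no LCA was recorded, so a missing entry is surfaced as an error and not silently skipped.

```cpp
#include <cassert>
#include <stdexcept>
#include <unordered_map>

// Hypothetical stand-ins for the real IR types.
struct TensorView {
  int id;
};
struct Node {
  int depth;
};

// Keeps the TV -> LCA map private and exposes only per-TV queries.
class LcaIndex {
 public:
  void record(const TensorView* tv, const Node* lca) {
    lca_[tv] = lca;
  }

  // True if an LCA was recorded for `tv`.
  bool hasLca(const TensorView* tv) const {
    return lca_.count(tv) != 0;
  }

  // Returns the recorded LCA, or throws if `tv` was never recorded.
  const Node* getLca(const TensorView* tv) const {
    auto it = lca_.find(tv);
    if (it == lca_.end()) {
      throw std::out_of_range("no LCA recorded for this TensorView");
    }
    return it->second;
  }

 private:
  std::unordered_map<const TensorView*, const Node*> lca_;
};
```

This does not by itself answer which set of TVs the caller should iterate over, which is the concern raised in the reply. It only shows how the map could stay an implementation detail.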
Additional Comments (4)
The guard validates only top-level expressions for pre-existing Additionally, The check should recursively walk inner scopes, consistent with where deallocs are now being inserted:
// Recursive helper to assert no pre-existing deallocates in any scope
std::function<void(const Scope&)> assertNoDeallocate;
assertNoDeallocate = [&](const Scope& scope) {
for (Expr* expr : scope.exprs()) {
NVF_ERROR(
!expr->isA<hir::Deallocate>(),
"Expected hostir container to not have deallocate, but found one anyways: ",
expr);
if (auto* loop = dynamic_cast<hir::ForLoop*>(expr)) {
assertNoDeallocate(loop->body());
}
if (auto* ite = dynamic_cast<kir::IfThenElse*>(expr)) {
assertNoDeallocate(ite->thenBody());
assertNoDeallocate(ite->elseBody());
}
}
};
assertNoDeallocate(hic.topLevel());
The While the tree is correctly structured as a single rooted tree for any valid
while (a != b) {
NVF_ERROR(a != nullptr && b != nullptr,
"findLca reached null parent before finding a common ancestor");
a = a->parent();
b = b->parent();
}
The However, consider a TV that is the output of the last expression before a branch but is consumed only in one branch. Including the producer in the LCA shifts the insertion point to at least the producer level even if the consumer would have been sufficient. While not wrong, it could cause deallocation to be deferred beyond the strictly necessary point when Is including outputs here intentional, or was the intent to only track the production point via the
From
So However, the same attribute-vs-input distinction means that, in |
|
!test |
Additional Comments (2)
In the final In a valid, single-rooted PDT this is unreachable because any two nodes at the same depth must share the same root as a common ancestor. However, if the tree is somehow malformed (e.g. a bug in A defensive assertion or early-exit would harden this:
while (a != b) {
NVF_ERROR(a != nullptr && b != nullptr,
"findLca: reached tree root without converging; "
"PostDominatorTree may have multiple roots");
a = a->parent();
b = b->parent();
}
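To make the walk under discussion concrete, here is a minimal parent-pointer LCA sketch. It is not the PR's `findLca`: the `Node` struct, its `depth` field, and the use of `std::logic_error` in place of NVF_ERROR are assumptions, and depths are assumed consistent (root at 0, each child one deeper than its parent). One detail worth noting: once both pointers are at the same depth they reach null on the same step, so with two distinct roots the loop condition `a != b` becomes false before an in-loop null check could fire. This sketch therefore checks the result after the loop.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical tree node: parent pointer plus cached depth (root = 0).
struct Node {
  const Node* parent;
  int depth;
};

// Lowest common ancestor by parent-pointer walking.
// A null argument is treated as "no constraint yet", so the other node
// is returned unchanged.
const Node* findLca(const Node* a, const Node* b) {
  if (a == nullptr) {
    return b;
  }
  if (b == nullptr) {
    return a;
  }
  // Lift the deeper node until both are at the same depth.
  while (a->depth > b->depth) {
    a = a->parent;
  }
  while (b->depth > a->depth) {
    b = b->parent;
  }
  // Walk up in lockstep until the pointers meet.
  while (a != b) {
    a = a->parent;
    b = b->parent;
  }
  // Both null here means the nodes lived under different roots.
  if (a == nullptr) {
    throw std::logic_error("findLca: nodes do not share a root");
  }
  return a;
}
```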
The The guard is harmless but may mislead readers into thinking there is a real code path that produces a null LCA. Consider removing it or replacing it with a comment explaining why the null case is unreachable, e.g.:
// `lca_node` is always non-null: computeLcaMap initialises every entry
// with findLca(nullptr, node) which returns node, and subsequent calls
// return a non-null common ancestor.
NVF_ERROR(lca_node != nullptr, ...);
No description provided.