Merged
15 changes: 6 additions & 9 deletions csrc/host_ir/lowering.cpp
@@ -192,11 +192,11 @@ void lowerSegment(
out,
DomainType::kLoop,
{ParallelType::Stream})) {
auto [i, inserted] = replacement_map.try_emplace(
in,
hir::shardByStream(in, innermost.loop->index(), communication));
Review comment (Collaborator Author):
I misunderstood try_emplace's lazy construction: it only avoids constructing the mapped value when the key already exists, but its arguments are still evaluated eagerly. So shardByStream would be called regardless of whether the key is present.

if (inserted) {
innermost_scope.pushBack(i->second->definition());
Val*& sharded_in = replacement_map[in];
if (sharded_in == nullptr) {
sharded_in =
hir::shardByStream(in, innermost.loop->index(), communication);
innermost_scope.pushBack(sharded_in->definition());
}
}

@@ -210,7 +210,7 @@ void lowerSegment(
nullptr) {
innermost.parent_scope->insert(
innermost.parent_insertion_point, allocate);
auto [i, inserted] = replacement_map.try_emplace(
Review comment (Collaborator Author):
This is OK because of the assert inserted below. However, it's not necessary because the value type, Val*, doesn't involve construction.

auto [i, inserted] = replacement_map.emplace(
out,
hir::shardByStream(out, innermost.loop->index(), communication));
NVF_ERROR(inserted, "The input segmented fusion should be SSA.");
@@ -314,9 +314,6 @@ void lowerSegment(
innermost.parent_insertion_point, allocate);
// Loop is stream parallelized but allocation is not. Therefore,
// `out` should be allocated outside the loop.
//
// I use try_emplace here so shardByStream is called only when `out`
// is missing.

Review comment (Collaborator Author):

The comment is outdated.
TensorView* sharded_out =
hir::shardByStream(out, innermost.loop->index(), e);
replacement_map[out] = sharded_out;
2 changes: 1 addition & 1 deletion csrc/multidevice/utils.cpp
@@ -114,7 +114,7 @@ std::unordered_map<ParallelType, IterDomain*> mapDeviceAndStreamParallelTypeToId
}

NVF_ERROR(
parallel_type_to_id.try_emplace(parallel_type, id).second,
Review comment (Collaborator Author):
Ditto: the value type is a pointer, so plain emplace suffices.

parallel_type_to_id.emplace(parallel_type, id).second,
"Found multiple loop IterDomains with the same parallel type (",
parallel_type,
"): ",