The getCUDAStream method has been significantly simplified by replacing the manual lookup-then-insert logic with a single try_emplace call. While this is generally an improvement, verify that the new logic handles all edge cases, particularly how the device index is determined when the communicator is null or unavailable.
c10::cuda::CUDAStream HostIrEvaluator::getCUDAStream(Stream* stream) {
  StreamKey stream_key = stream;
  // if stream points to an index, it represents the dynamic value of that index
  if (Val* index = stream->index(); index != nullptr) {
    auto value = expr_evaluator_.evaluate(index);
    NVF_ERROR(value.hasValue() && value.is<int64_t>());
    stream_key = value.as<int64_t>();
  }
  auto [it, inserted] =
      streams_.try_emplace(stream_key, c10::cuda::getStreamFromPool());
  return it->second;
}
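One edge case worth checking in the try_emplace version: C++ evaluates every function argument before the call is entered, so c10::cuda::getStreamFromPool() runs on every getCUDAStream call, including cache hits; try_emplace only skips constructing the map entry. The sketch below contrasts that eager pattern with a find-then-emplace variant that queries the pool only on a miss. FakeStream, FakePool, getEager and getLazy are hypothetical stand-ins so the example runs without a GPU; they are not c10 or nvFuser APIs.

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for c10::cuda::CUDAStream (no CUDA needed).
struct FakeStream {
  int id;
};

// Hypothetical stand-in for the stream pool; counts how often it is queried.
struct FakePool {
  int calls = 0;
  FakeStream acquire() { return FakeStream{calls++}; }
};

// Same shape as the PR: pool.acquire() is evaluated before try_emplace is
// entered, so the pool is queried on every call, even when `key` is cached.
FakeStream getEager(std::map<int, FakeStream>& streams, FakePool& pool,
                    int key) {
  auto it = streams.try_emplace(key, pool.acquire()).first;
  return it->second;
}

// Lazy variant: the pool is queried only on a cache miss.
FakeStream getLazy(std::map<int, FakeStream>& streams, FakePool& pool,
                   int key) {
  auto it = streams.find(key);
  if (it == streams.end()) {
    it = streams.emplace(key, pool.acquire()).first;
  }
  return it->second;
}
```

With three calls on the same key, the eager version queries the pool three times and the lazy one once, yet both keep returning the stream cached on the first call. Whether the extra queries matter depends on getStreamFromPool's side effects (for instance, if it advances a round-robin index, later cache misses may receive different streams than before); that is worth confirming against the c10 source rather than assuming.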
The PR removes the communicator_->is_available() check that was previously used to determine the device index. This change assumes that the deviceId() method is safe to call even when the communicator is null or unavailable. Validate this assumption and make sure the change introduces no runtime failures.
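The removed check amounted to a null-and-availability guard in front of deviceId(). A minimal sketch of such a guard, assuming a Communicator that exposes only the is_available() and deviceId() members named in this review; the struct body and the resolveDeviceIndex helper are hypothetical, and what the fallback should be (for example, the current CUDA device) is left to the caller.

```cpp
#include <cassert>

// Hypothetical minimal communicator: only the two members named in the review
// are modeled; the real nvFuser Communicator has more state.
struct Communicator {
  bool available = false;
  int device_id = 0;
  bool is_available() const { return available; }
  int deviceId() const { return device_id; }
};

// Hypothetical helper: consult the communicator only when it exists and is
// available; otherwise return the caller-supplied fallback device.
int resolveDeviceIndex(const Communicator* comm, int fallback_device) {
  if (comm != nullptr && comm->is_available()) {
    return comm->deviceId();
  }
  return fallback_device;
}
```

Keeping the guard in one helper makes the null case explicit instead of relying on deviceId() being callable on a missing communicator.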
A new AssignStreams pass has been added that modifies the host IR by inserting stream management operations. While this appears to be part of the stream parallelism feature, ensure the logic correctly handles stream synchronization and doesn't introduce race conditions or performance regressions.
void AssignStreams::runPass(Fusion* fusion) {
  auto* hic = dynamic_cast<HostIrContainer*>(fusion);
  NVF_CHECK(hic != nullptr);
  FusionGuard fg(hic);
  for (auto it = hic->topLevel().exprs().begin();
       it != hic->topLevel().exprs().end();) {
    auto next_it = std::next(it);
    auto* for_loop = dynamic_cast<ForLoop*>(*it);
    if (for_loop == nullptr) {
      it = next_it;
      continue;
    }
    // We should check that the loop is stream-parallel. This is not necessary
    // at this moment because all loops are stream-parallel. This is also hard
    // to do because hir::ForLoop doesn't point to the source IterDomain.
    auto* get_current_stream = IrBuilder::create<GetCurrentStream>();
    Stream* main_stream = get_current_stream->stream();
    hic->topLevel().insert(it, get_current_stream);
    // At the beginning of each iteration: set stream and synchronize with main
    // stream
    auto* worker_stream = IrBuilder::create<Stream>(for_loop->index());
    auto* set_stream = IrBuilder::create<SetCurrentStream>(worker_stream);
    auto* sync_main = IrBuilder::create<Synchronize>(main_stream);
    auto old_begin = for_loop->body().exprs().begin();
    for_loop->body().insert(old_begin, set_stream);
    for_loop->body().insert(old_begin, sync_main);
    // After the loop: create a joining loop to synchronize all worker streams
    hic->topLevel().insert(
        next_it, IrBuilder::create<SetCurrentStream>(main_stream));
    auto* join_loop = IrBuilder::create<ForLoop>(
        for_loop->index(), for_loop->start(), for_loop->stop());
    hic->topLevel().insert(next_it, join_loop);
    // In the joining loop: synchronize each worker stream
    auto* join_worker_stream = IrBuilder::create<Stream>(join_loop->index());
    auto* sync_worker = IrBuilder::create<Synchronize>(join_worker_stream);
    join_loop->body().push_back(sync_worker);
    it = next_it;
  }
}
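To check the order in which the pass inserts operations, here is a toy model of the same rewrite on a hypothetical list-based IR. Expr, assignStreams and dump are illustrative names, not nvFuser APIs, and ops are plain strings; the model reproduces only the insertion order, not CUDA stream semantics.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical toy IR node: `op` names the operation; `body` is non-empty only
// for loops. (std::list of an incomplete type is allowed since C++17.)
struct Expr {
  std::string op;
  std::list<Expr> body;
  bool isLoop() const { return op == "ForLoop"; }
};

// Mirrors the insertion order of AssignStreams::runPass on the toy IR.
void assignStreams(std::list<Expr>& top) {
  for (auto it = top.begin(); it != top.end();) {
    auto next_it = std::next(it);
    if (!it->isLoop()) {
      it = next_it;
      continue;
    }
    // Capture the main stream right before the loop.
    top.insert(it, Expr{"GetCurrentStream", {}});
    // Each iteration: switch to its worker stream, then wait for main.
    auto old_begin = it->body.begin();
    it->body.insert(old_begin, Expr{"SetCurrentStream(worker[i])", {}});
    it->body.insert(old_begin, Expr{"Synchronize(main)", {}});
    // After the loop: restore main, then join every worker stream.
    top.insert(next_it, Expr{"SetCurrentStream(main)", {}});
    top.insert(next_it, Expr{"ForLoop", {Expr{"Synchronize(worker[i])", {}}}});
    it = next_it;  // skip the nodes that were just inserted
  }
}

// Flattens the IR to one line per node, indenting loop bodies by two spaces.
std::vector<std::string> dump(const std::list<Expr>& exprs,
                              const std::string& indent = "") {
  std::vector<std::string> out;
  for (const Expr& e : exprs) {
    out.push_back(indent + e.op);
    for (const std::string& line : dump(e.body, indent + "  ")) {
      out.push_back(line);
    }
  }
  return out;
}
```

Applied to a loop whose body is a single Matmul, the model yields GetCurrentStream, then the loop with SetCurrentStream(worker[i]), Synchronize(main), Matmul, then SetCurrentStream(main) and the join loop with Synchronize(worker[i]): a fork-join shape. Assuming Synchronize(s) makes the current stream wait on s, the order inside the body matters for the race-condition concern above: if Synchronize(main) were issued before SetCurrentStream(worker[i]), the wait would land on the main stream itself and the worker could run ahead of earlier main-stream work.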