Skip to content

Conversation

@wujingyue
Copy link
Collaborator

No description provided.

@wujingyue
Copy link
Collaborator Author

!test

{ParallelType::Stream})) {
auto [i, inserted] = replacement_map.try_emplace(
in,
hir::shardByStream(in, innermost.loop->index(), communication));
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I misunderstood try_emplace's lazy construction. shardByStream will be called anyway regardless of the key.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Jan 5, 2026

Greptile Summary

This PR fixes incorrect uses of try_emplace in the host IR lowering code. The main fix addresses a performance/correctness bug where hir::shardByStream() was being called unnecessarily even when a key already existed in the map, since try_emplace evaluates its value argument before checking key existence. The fix uses operator[] with a nullptr check instead, ensuring shardByStream is only called when needed. Additional changes replace try_emplace with emplace in contexts where insertion is expected to always succeed (enforced by NVF_ERROR), improving code consistency and clarity.

Key changes:

  • Fixed replacement_map usage in Communication case to avoid redundant shardByStream calls
  • Replaced try_emplace with emplace where SSA property guarantees unique insertions
  • Removed obsolete comment about try_emplace optimization that no longer applies

Confidence Score: 5/5

  • This PR is safe to merge with no risk - it fixes a clear bug with proper replacement code
  • The changes are straightforward C++ standard library usage fixes with clear correctness improvements. The first change fixes a real bug where try_emplace caused unnecessary function calls, while the other changes improve consistency. All modifications maintain existing error checking and SSA guarantees.
  • No files require special attention

Important Files Changed

Filename Overview
csrc/host_ir/lowering.cpp Fixed incorrect try_emplace usage that caused unnecessary shardByStream calls, and improved consistency by replacing another try_emplace with emplace
csrc/multidevice/utils.cpp Changed try_emplace to emplace for consistency where insertion is expected to always succeed

Sequence Diagram

sequenceDiagram
    participant LowerSegment as lowerSegment()
    participant ReplacementMap as replacement_map
    participant ShardByStream as hir::shardByStream()
    
    Note over LowerSegment: Processing Communication exprs
    
    LowerSegment->>ReplacementMap: Check if 'in' exists using operator[]
    alt in is nullptr (not found)
        LowerSegment->>ShardByStream: Call shardByStream(in, ...)
        ShardByStream-->>ReplacementMap: Store sharded_in
        LowerSegment->>LowerSegment: pushBack(definition())
    else in exists (not nullptr)
        Note over LowerSegment: Skip - already sharded
    end
    
    Note over LowerSegment: Processing output allocation
    
    LowerSegment->>ReplacementMap: emplace(out, sharded_out)
    alt Insertion succeeds
        Note over LowerSegment: SSA property maintained
        LowerSegment->>LowerSegment: pushBack(definition())
    else Insertion fails
        LowerSegment->>LowerSegment: NVF_ERROR - SSA violation
    end
    
    Note over LowerSegment: ExprEval case
    
    loop For each input
        LowerSegment->>ReplacementMap: contains(in)?
        alt Not in map
            LowerSegment->>ShardByStream: shardByStream(in, ...)
            ShardByStream-->>ReplacementMap: Store via operator[]
        end
    end
    
    loop For each output
        LowerSegment->>ReplacementMap: Check !contains(out)
        Note over LowerSegment: SSA check via NVF_ERROR
        LowerSegment->>ReplacementMap: Store via operator[]
    end
Loading

nullptr) {
innermost.parent_scope->insert(
innermost.parent_insertion_point, allocate);
auto [i, inserted] = replacement_map.try_emplace(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is OK because of the assert inserted below. However, it's not necessary because the value type, Val*, doesn't involve construction.

}

NVF_ERROR(
parallel_type_to_id.try_emplace(parallel_type, id).second,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

// Loop is stream parallelized but allocation is not. Therefore,
// `out` should be allocated outside the loop.
//
// I use try_emplace here so shardByStream is called only when `out`
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment is outdated.

@wujingyue wujingyue requested a review from mdavis36 January 5, 2026 04:05
@github-actions
Copy link

github-actions bot commented Jan 5, 2026

Review updated until commit 21a25dd

Description

  • Replace incorrect try_emplace usage with direct map access pattern in lowerSegment function

  • Change try_emplace to emplace where insertion success checking is needed

  • Remove outdated comment about try_emplace usage rationale

  • Fix similar try_emplace misuse in device mapping utility function

Changes walkthrough

Relevant files
Bug fix
lowering.cpp
Fix try_emplace usage in lowering logic                                   

csrc/host_ir/lowering.cpp

  • Replace try_emplace with direct map access replacement_map[in] and
    null check
  • Change try_emplace to emplace for proper insertion with success
    validation
  • Remove comment explaining try_emplace usage rationale
  • Update allocation logic to use direct assignment pattern
  • +6/-9     
    utils.cpp
    Fix try_emplace in device mapping utility                               

    csrc/multidevice/utils.cpp

  • Replace try_emplace with emplace in parallel type mapping
  • Maintain error checking for duplicate parallel types
  • +1/-1     

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review
    Behavioral Change in Duplicate Detection

    In the second hunk (lines 213-217), the code changes from try_emplace to emplace. This is a significant behavioral change: try_emplace will not overwrite existing keys, while emplace will overwrite them. The original code checked .second to detect when a key already existed (which would be an error according to the SSA comment), but emplace changes this to silently overwrite. This could hide bugs where the same key is processed multiple times.

    auto [i, inserted] = replacement_map.emplace(
        out,
        hir::shardByStream(out, innermost.loop->index(), communication));
    NVF_ERROR(inserted, "The input segmented fusion should be SSA.");
    innermost_scope.pushBack(i->second->definition());
    Similar Duplicate Detection Issue

    In the fourth hunk (lines 117-121), the code similarly changes from try_emplace to emplace. The error message indicates this is meant to detect multiple loop IterDomains with the same parallel type, but emplace will silently overwrite instead of detecting this error condition. This could mask serious logic errors in parallel type mapping.

    parallel_type_to_id.emplace(parallel_type, id).second,
    "Found multiple loop IterDomains with the same parallel type (",
    parallel_type,
    "): ",
    toDelimitedString(domain));

    @wujingyue
    Copy link
    Collaborator Author

    !test

    Copy link
    Collaborator

    @mdavis36 mdavis36 left a comment

    Choose a reason for hiding this comment

    The reason will be displayed to describe this comment to others. Learn more.

    LGTM

    @wujingyue
    Copy link
    Collaborator Author

    !test

    @wujingyue wujingyue merged commit ba8c704 into main Jan 6, 2026
    43 of 44 checks passed
    @wujingyue wujingyue deleted the wjy/emplace branch January 6, 2026 04:51
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    3 participants