Scatter is nonblocking when called on a communicator other than GroupWorld #2456

@shannong

I'm implementing a custom alltoall that relies on multiple scatter steps over custom communicators. When scatter is called on one of the custom communicators, it appears to be non-blocking, which results in an error saying that the EventQueue is empty. The simulation still finishes and reports latencies, though.

Adding a scatter on one or more communicators that include the ranks not in the first communicator corrects the issue.
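
To make the workaround concrete, here is a minimal sketch of the rank bookkeeping in plain Python (it does not use SST or the Ember API). It assumes the topology values from the example simulation in this issue (2 groups, 32 routers per group, 4 hosts per router, one rank per node) and a hypothetical first communicator that covers only group 0. The helper name `uncovered_ranks` is made up for illustration, and the communicator layout in the linked motif may differ.

```python
# Illustrative only: plain-Python rank bookkeeping, not Ember or SST code.
# Topology values are copied from the example simulation in this issue.
HOSTS_PER_ROUTER = 4
ROUTERS_PER_GROUP = 32
NUM_GROUPS = 2

group_size = HOSTS_PER_ROUTER * ROUTERS_PER_GROUP  # ranks per dragonfly group
world_size = group_size * NUM_GROUPS               # assumes one rank per node


def uncovered_ranks(world_size, communicators):
    """Return the sorted ranks that belong to none of the given communicators."""
    covered = set()
    for comm in communicators:
        covered.update(comm)
    return sorted(set(range(world_size)) - covered)


# Hypothetical first communicator: every rank in group 0.
first_comm = list(range(group_size))

# These are the ranks that would need a scatter on some other communicator
# for the workaround described above to apply to them.
missing = uncovered_ranks(world_size, [first_comm])
print(f"{len(missing)} of {world_size} ranks are outside the first communicator")
```

Under these assumptions, the ranks in `missing` are the ones that a second communicator, and a scatter on it, would have to cover.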

Link to example code: https://github.com/shannong/sst-elements/blob/multilevel-hierarchical/src/sst/elements/ember/mpi/motifs/emberalltoall.cc#L189

If the else block at lines 198-201 is removed, the event queue error is printed in the output.

Example simulation:

import sst
from sst.merlin.base import * 
from sst.merlin.endpoint import *
from sst.merlin.interface import *
from sst.merlin.topology import *

from sst.ember import *

if __name__ == "__main__":

    PlatformDefinition.setCurrentPlatform("firefly-defaults")

    sst.setStatisticLoadLevel(15)
    sst.enableAllStatisticsForAllComponents()
    sst.setStatisticOutput("sst.statOutputConsole")
    

    ### set up the topology
    topo = topoDragonFly()
    topo.hosts_per_router = 4
    topo.routers_per_group = 32
    topo.intergroup_links = 4
    topo.num_groups = 2
    topo.algorithm = ["minimal", "ugal"]

    group_size = topo.hosts_per_router * topo.routers_per_group

    # Set up the routers
    router = hr_router()
    router.link_bw = "25GB/s"
    router.flit_size = "8B"
    router.xbar_bw = "30GB/s"
    router.input_latency = "20ns"
    router.output_latency = "20ns"
    router.input_buf_size = "256kB"
    router.output_buf_size = "256kB"
    router.num_vns = 2
    router.xbar_arb = "merlin.xbar_arb_lru"

    topo.router = router
    topo.link_latency = "20ns"      

    networkif = ReorderLinkControl()
    networkif.link_bw = "25GB/s"
    networkif.input_buf_size = "256kB" 
    networkif.output_buf_size = "256kB"

    ep = EmberMPIJob(0, topo.getNumNodes(), numCores=1)
    ep.network_interface = networkif
    ep.addMotif("Init")
    ep.addMotif("Alltoall") # look at different sizes here (< 500 bytes, 500 < n < 8k, > 8k) 
    ep.addMotif("Fini")
    ep.nic.nic2host_lat="100ns"
    
    system = System()
    system.setTopology(topo, 1)
    system.allocateNodes(ep, "linear")

    system.build()

    sst.setStatisticLoadLevel(16)
    sst.enableAllStatisticsForAllComponents()
    sst.setStatisticOutput("sst.statOutputCSV")
    sst.setStatisticOutputOptions({
        "filepath" : "/users/skinkead/carc-scratch/frontier/hierarchical/hierarchical1-2-frontier.csv",
        "separator" : ", "
    })
