I'm implementing a custom alltoall that relies on multiple scatter steps and custom communicators. When scatter is called on one of the custom communicators, it appears to be non-blocking, which results in an error that the EventQueue is empty. The simulation does finish and report latencies, though.
Adding a scatter on any communicator(s) that include the ranks not in the first communicator corrects the issue.
Link to example code: https://github.com/shannong/sst-elements/blob/multilevel-hierarchical/src/sst/elements/ember/mpi/motifs/emberalltoall.cc#L189
If the else block at lines 198-201 of that file is removed, the event queue error is printed in the output.
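To make the workaround concrete, here is a rough bookkeeping sketch in plain Python (not Ember code) that lists which world ranks would never take part in a scatter if only the first custom communicator calls it. The split into one communicator per dragonfly group is an assumption made for illustration, and the helpers `split_by_group` and `ranks_without_scatter` are hypothetical names; the linked motif may build its communicators differently.

```python
# Topology values taken from the example simulation in this issue.
HOSTS_PER_ROUTER = 4
ROUTERS_PER_GROUP = 32
NUM_GROUPS = 2


def split_by_group(num_ranks, group_size):
    """Assumed layout: one communicator (list of world ranks) per group."""
    return [
        list(range(start, min(start + group_size, num_ranks)))
        for start in range(0, num_ranks, group_size)
    ]


def ranks_without_scatter(num_ranks, comms_with_scatter):
    """World ranks that belong to none of the communicators calling scatter."""
    covered = {rank for comm in comms_with_scatter for rank in comm}
    return [rank for rank in range(num_ranks) if rank not in covered]


group_size = HOSTS_PER_ROUTER * ROUTERS_PER_GROUP
num_ranks = group_size * NUM_GROUPS
comms = split_by_group(num_ranks, group_size)

# Scatter only on the first communicator: the second group's ranks are left out.
only_first = ranks_without_scatter(num_ranks, comms[:1])
# Scatter on every communicator: no rank is left out.
all_comms = ranks_without_scatter(num_ranks, comms)

print(len(only_first), len(all_comms))
```

Under this assumed layout, the ranks in `only_first` are the ones that the extra scatter in the else block would cover.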
Example simulation:
import sst
from sst.merlin.base import *
from sst.merlin.endpoint import *
from sst.merlin.interface import *
from sst.merlin.topology import *
from sst.ember import *
if __name__ == "__main__":
    PlatformDefinition.setCurrentPlatform("firefly-defaults")
    sst.setStatisticLoadLevel(15)
    sst.enableAllStatisticsForAllComponents()
    sst.setStatisticOutput("sst.statOutputConsole")

    ### set up the topology
    topo = topoDragonFly()
    topo.hosts_per_router = 4
    topo.routers_per_group = 32
    topo.intergroup_links = 4
    topo.num_groups = 2
    topo.algorithm = ["minimal", "ugal"]
    group_size = topo.hosts_per_router * topo.routers_per_group

    # Set up the routers
    router = hr_router()
    router.link_bw = "25GB/s"
    router.flit_size = "8B"
    router.xbar_bw = "30GB/s"
    router.input_latency = "20ns"
    router.output_latency = "20ns"
    router.input_buf_size = "256kB"
    router.output_buf_size = "256kB"
    router.num_vns = 2
    router.xbar_arb = "merlin.xbar_arb_lru"
    topo.router = router
    topo.link_latency = "20ns"

    networkif = ReorderLinkControl()
    networkif.link_bw = "25GB/s"
    networkif.input_buf_size = "256kB"
    networkif.output_buf_size = "256kB"

    ep = EmberMPIJob(0, topo.getNumNodes(), numCores=1)
    ep.network_interface = networkif
    ep.addMotif("Init")
    ep.addMotif("Alltoall")  # look at different sizes here (< 500 bytes, 500 < n < 8k, > 8k)
    ep.addMotif("Fini")
    ep.nic.nic2host_lat = "100ns"

    system = System()
    system.setTopology(topo, 1)
    system.allocateNodes(ep, "linear")
    system.build()

    sst.setStatisticLoadLevel(16)
    sst.enableAllStatisticsForAllComponents()
    sst.setStatisticOutput("sst.statOutputCSV")
    sst.setStatisticOutputOptions({
        "filepath": "/users/skinkead/carc-scratch/frontier/hierarchical/hierarchical1-2-frontier.csv",
        "separator": ", "
    })