Commit 7ba8f98

grondo authored and mergify[bot] committed
cmd: support addition of node 0 brokers in flux-batch/alloc
Problem: For subinstances of Flux with large node counts, it may be useful to launch an extra set of brokers on node 0 of the allocation to assist with handling message distribution, but there's no easy way for users to request that in flux-alloc and flux-batch.

Add a new `--add-brokers=N` option to `flux alloc` and `flux batch` which requests that `N` extra brokers be started on node 0 of the allocation. The option is hidden and undocumented for now in case a better solution is implemented in the future.

The `--add-brokers` option is only available when a number of nodes is explicitly requested with `-N, --nodes`.

The `--add-brokers` option adjusts the job's taskmap to force extra brokers onto the first allocated node and automatically excludes the extra ranks to ensure their resources are not available for scheduling. The jobspec is also updated to reflect the updated task count.
1 parent 221eb5f commit 7ba8f98
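
The taskmap and exclusion arithmetic described in the commit message can be illustrated with a small standalone sketch. It does not import Flux: `build_taskmap`, `expand_taskmap`, and `extra_ranks` are hypothetical helper names, and the expansion assumes each manual taskmap block has the form `[nodeid, nnodes, ppn, repeat]`. Only the f-string inside `build_taskmap` is taken from the diff.

```python
import json


def build_taskmap(nnodes, nbrokers):
    """Return the manual taskmap string, using the same f-string as the diff."""
    return f"manual:[[0,1,{1+nbrokers},1],[1,{nnodes-1},1,1]]"


def expand_taskmap(taskmap):
    """Expand a manual taskmap into {nodeid: [task ranks]}.

    Assumption: each block is [nodeid, nnodes, ppn, repeat] and task
    ranks are assigned in block order.
    """
    blocks = json.loads(taskmap.split(":", 1)[1])
    layout = {}
    rank = 0
    for nodeid, nnodes, ppn, repeat in blocks:
        for _ in range(repeat):
            for node in range(nodeid, nodeid + nnodes):
                for _ in range(ppn):
                    layout.setdefault(node, []).append(rank)
                    rank += 1
    return layout


def extra_ranks(nbrokers):
    """Ranks 1..nbrokers: the extra brokers that get excluded from scheduling."""
    return list(range(1, nbrokers + 1))


# Example: 4 nodes with 2 extra brokers on node 0
taskmap = build_taskmap(nnodes=4, nbrokers=2)
layout = expand_taskmap(taskmap)
excluded = extra_ranks(2)
```

Under these assumptions, node 0 hosts ranks 0, 1, and 2, the remaining nodes host one broker each, and ranks 1 and 2 are the ones added to `resource.exclude`.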

1 file changed: +33 -1 lines changed

src/bindings/python/flux/cli/base.py

Lines changed: 33 additions & 1 deletion
@@ -1661,6 +1661,9 @@ class BatchAllocCmd(MiniCmd):
     def __init__(self, prog, usage=None, description=None, exclude_io=True):
         self.t0 = None
         super().__init__(prog, usage, description, exclude_io)
+        self.parser.add_argument(
+            "--add-brokers", default=0, type=int, help=argparse.SUPPRESS
+        )
         self.parser.add_argument(
             "--conf",
             metavar="CONF",
@@ -1739,6 +1742,35 @@ def init_common(self, args):
             args.broker_opts = args.broker_opts or []
             args.broker_opts.append("-Scontent.dump=" + args.dump)
 
+        if args.add_brokers > 0:
+            if not args.nodes:
+                raise ValueError(
+                    "--add-brokers may only be specified with -N, --nnodes"
+                )
+            nbrokers = args.add_brokers
+            nnodes = args.nodes
+
+            # Force update taskmap with extra ranks on nodeid 0:
+            args.taskmap = f"manual:[[0,1,{1+nbrokers},1],[1,{nnodes-1},1,1]]"
+
+            # Exclude the additional brokers via configuration. However,
+            # don't throw away any ranks already excluded by the user.
+            # Note: raises an exception if user excluded by hostname (unlikely)
+            exclude = IDset(args.conf.get("resource.exclude", default="")).set(
+                1, nbrokers
+            )
+            args.conf.update(f'resource.exclude="{exclude}"')
+
     def update_jobspec_common(self, args, jobspec):
         """Common jobspec update code for batch/alloc"""
-        pass
+        # If args.add_brokers is being used, update jobspec task count
+        # to accurately reflect the updated task count.
+        if args.add_brokers > 0:
+            # Note: args.nodes required with add_brokers already checked above
+            total_tasks = args.nodes + args.add_brokers
+
+            # Overwrite task count with new total_tasks:
+            jobspec.tasks[0]["count"] = {"total": total_tasks}
+
+            # remove per-resource shell option which is no longer necessary:
+            del jobspec.attributes["system"]["shell"]["options"]["per-resource"]
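
The jobspec adjustment in `update_jobspec_common` can be tried out on a plain dictionary. The sketch below is only an illustration: `update_task_count` is a hypothetical function, the sample jobspec contents are made up for the example, and a nested dict stands in for the real Flux `Jobspec` object. Unlike the `del` in the diff, the sketch uses `pop` with a default so that a missing `per-resource` key does not raise `KeyError`.

```python
def update_task_count(jobspec, nnodes, add_brokers):
    """Set the total task count and drop the per-resource shell option.

    Hypothetical stand-in for update_jobspec_common(), operating on a
    plain dict instead of a Flux Jobspec object.
    """
    if add_brokers > 0:
        # One broker per node plus the extra brokers placed on node 0
        jobspec["tasks"][0]["count"] = {"total": nnodes + add_brokers}
        # The diff uses `del`; pop() tolerates a missing key (sketch-only choice)
        options = jobspec["attributes"]["system"]["shell"]["options"]
        options.pop("per-resource", None)
    return jobspec


# Illustrative jobspec fragment (contents assumed for the example)
sample = {
    "tasks": [{"command": ["flux", "broker"], "slot": "task", "count": {"per_slot": 1}}],
    "attributes": {
        "system": {"shell": {"options": {"per-resource": {"type": "node"}}}}
    },
}

updated = update_task_count(sample, nnodes=4, add_brokers=2)
```

With 4 nodes and 2 extra brokers, the task count becomes `{"total": 6}`, which matches the six ranks placed by the manual taskmap.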
