Commit 40d6f4c

Merge pull request #5605 from garlick/rexec_shell
shell: add rexec plugin
2 parents 650223d + 6693d11 commit 40d6f4c

12 files changed: +544 -100 lines changed

doc/Makefile.am

Lines changed: 0 additions & 2 deletions
@@ -382,7 +382,6 @@ if ENABLE_DOCS
 man_MANS = $(MAN1_FILES) $(MAN3_FILES) $(MAN5_FILES) $(MAN7_FILES)
 $(RST_FILES): \
 man1/common/resources.rst \
-man1/common/nodeset.rst \
 man1/common/job-param-additional.rst \
 man1/common/job-param-batch.rst \
 man1/common/job-param-common.rst \
@@ -461,7 +460,6 @@ EXTRA_DIST = \
 $(RST_FILES) \
 man1/index.rst \
 man1/common/resources.rst \
-man1/common/nodeset.rst \
 man1/common/job-param-additional.rst \
 man1/common/job-param-batch.rst \
 man1/common/job-param-common.rst \

doc/man1/common/nodeset.rst

Lines changed: 0 additions & 22 deletions
This file was deleted.

doc/man1/flux-exec.rst

Lines changed: 76 additions & 39 deletions
@@ -2,66 +2,85 @@
 flux-exec(1)
 ============
 
-
 SYNOPSIS
---------
-**flux** **exec** [--noinput] [*--label-io] [*—dir=DIR'] [*--rank=NODESET*] [*--verbose*] COMMANDS...
+========
+
+**flux** **exec** [*--noinput*] [*--label-io*] [*—dir=DIR*] [*--rank=IDSET*] [*--verbose*] *COMMAND...*
 
 DESCRIPTION
 ===========
 
 .. program:: flux exec
 
-:program:`flux exec` runs commands across one or more Flux broker ranks using
-the *broker.exec* service. The commands are executed as direct children
-of the broker, and the broker handles buffering stdout and stderr and
-sends the output back to :program:`flux exec` which copies output to its own
-stdout and stderr.
-
-On receipt of SIGINT and SIGTERM signals, :program:`flux exec` shall forward
-the received signal to all currently running remote processes.
-
-In the event subprocesses are hanging or ignoring SIGINT, two SIGINT
-signals (typically sent via Ctrl+C) in short succession can force
-:program:`flux exec` to exit.
-
-:program:`flux exec` is meant as an administrative and test utility, and cannot
-be used to launch Flux jobs.
+:program:`flux exec` remotely executes one or more copies of *COMMAND*,
+similar to :linux:man1:`pdsh`. It bypasses the scheduler and is intended
+for launching administrative commands or tool daemons, not for launching
+parallel jobs. For that, see :man1:`flux-run`.
 
+By default, *COMMAND* runs across all :man1:`flux-broker` processes. If the
+:option:`--jobid` option is specified, the commands are run across a job's
+:man1:`flux-shell` processes. Normally this means that one copy of *COMMAND*
+is executed per node, but in unusual cases it could mean more (e.g. if the
+Flux instance was started with multiple brokers per node).
 
-EXIT STATUS
-===========
-
-In the case that all processes are successfully launched, the exit status
-of :program:`flux exec` is the largest of the remote process exit codes.
-
-If a non-existent rank is targeted, :program:`flux exec` will return with
-code 68 (EX_NOHOST from sysexits.h).
-
-If one or more remote commands are terminated by a signal, then
-:program:`flux exec` exits with exit code 128+signo.
+Standard output and standard error of the remote commands are captured
+and combined on the :program:`flux exec` standard output and standard error.
+Standard input of :program:`flux exec` is captured and broadcast to standard
+input of the remote commands.
 
+On receipt of SIGINT and SIGTERM signals, :program:`flux exec` forwards
+the received signal to the remote processes. When standard input of
+:program:`flux exec` is a terminal, :kbd:`Control-C` may be used to send
+SIGINT. Two of those in short succession can force :program:`flux exec`
+to exit in the event that remote processes are hanging.
 
 OPTIONS
 =======
 
 .. option:: -l, --label-io
 
-Label lines of output with the source RANK.
+Label lines of output with the source broker RANK. This option is not
+affected by :option:`--jobid`.
 
 .. option:: -n, --noinput
 
 Do not attempt to forward stdin. Send EOF to remote process stdin.
 
 .. option:: -d, --dir=DIR
 
-Set the working directory of remote *COMMANDS* to *DIR*. The default is to
+Set the working directory of remote *COMMAND* to *DIR*. The default is to
 propagate the current working directory of flux-exec(1).
 
-.. option:: -r, --rank=NODESET
+.. option:: -r, --rank=IDSET
+
+Target specific ranks, where *IDSET* is a set of zero-origin node ranks in
+RFC 22 format. If :option:`--jobid` is specified, the ranks are interpreted
+as an index into the list of nodes assigned to the job. Otherwise, they
+refer to the nodes assigned to the Flux instance.
+
+The default is to target all ranks. As a special case, :option:`--rank=all`
+is accepted and behaves the same as the default.
+
+.. option:: -x, --exclude=IDSET
 
-Target specific ranks in *NODESET*. Default is to target "all" ranks.
-See `NODESET FORMAT`_ below for more information.
+Exclude specific ranks. *IDSET* is as described in :option:`--rank`.
+
+.. option:: -j, --jobid=JOBID
+
+Run *COMMAND* on the nodes allocated to *JOBID* instead of the nodes
+assigned to the Flux instance.
+
+This uses the exec service embedded in :man1:`flux-shell` rather than
+:man1:`flux-broker`.
+
+The interpretation of :option:`--rank` and :option:`--exclude` is adjusted
+as noted in their descriptions. For example, :option:`flux exec -j ID -r 0`
+will run only on the first node assigned to *JOBID*, and
+:option:`flux exec -j ID -x 0` will run on all nodes assigned to *JOBID*
+except the first node.
+
+This option is only available when the job owner is the same as the Flux
+instance owner.
 
 .. option:: -v, --verbose
 
@@ -73,22 +92,40 @@ OPTIONS
 
 .. option:: --with-imp
 
-Prepend the full path to :program:`flux-imp run` to *COMMANDS*. This option
+Prepend the full path to :program:`flux-imp run` to *COMMAND*. This option
 is mostly meant for testing or as a convenience to execute a configured
 ``prolog`` or ``epilog`` command under the IMP. Note: When this option is
 used, or if :program:`flux-imp` is detected as the first argument of
-*COMMANDS*, :program:`flux exec` will use :program:`flux-imp kill` to
+*COMMAND*, :program:`flux exec` will use :program:`flux-imp kill` to
 signal remote commands instead of the normal builtin subprocess signaling
 mechanism.
 
+CAVEATS
+=======
+
+In a multi-user flux instance, access to the rank 0 broker execution
+service is restricted to requests that originate from the local broker.
+Therefore, :program:`flux exec` (without :option:`--jobid`) must be run
+from the rank 0 broker if rank 0 is included in the target *IDSET*.
+
+EXIT STATUS
+===========
 
-NODESET FORMAT
-==============
+In the case that all processes are successfully launched, the exit status
+of :program:`flux exec` is the largest of the remote process exit codes.
 
-.. include:: common/nodeset.rst
+If a non-existent rank is targeted, :program:`flux exec` will return with
+code 68 (EX_NOHOST from sysexits.h).
 
+If one or more remote commands are terminated by a signal, then
+:program:`flux exec` exits with exit code 128+signo.
 
 RESOURCES
 =========
 
 .. include:: common/resources.rst
+
+FLUX RFC
+========
+
+:doc:`rfc:spec_22`

doc/test/spell.en.pws

Lines changed: 1 addition & 2 deletions
@@ -65,7 +65,6 @@ graphviz
 prepended
 multipart
 Tpng
-nodesets
 keepalives
 emerg
 gettimeofday
@@ -117,7 +116,6 @@ ary
 baz
 EPGM
 modopts
-nodeset
 noexec
 pre
 slurm
@@ -831,3 +829,4 @@ unlinks
 VM
 fred
 unmapped
+kbd
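
The rank selection described in the flux-exec.rst change (``--rank`` picks a set of ranks, ``--exclude`` removes ranks from it, and ``all`` is the default) can be sketched in Python. This is an illustrative sketch only, not code from this commit: ``parse_idset`` and ``select_targets`` are hypothetical helpers rather than Flux APIs, and the parser assumes a simplified subset of the RFC 22 idset format (comma-separated integers and ``lo-hi`` ranges, optionally bracketed). With ``--jobid``, ``size`` would stand for the number of nodes assigned to the job rather than the size of the instance.

```python
def parse_idset(text: str) -> set[int]:
    """Parse a simplified idset such as "0-3,7" or "[0-3]" into a set of ints.

    Assumption: only plain integers and lo-hi ranges are handled here.
    """
    ids: set[int] = set()
    for part in text.strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            ids.update(range(lo, hi + 1))
        else:
            ids.add(int(part))
    return ids


def select_targets(size: int, rank: str = "all", exclude: str = "") -> list[int]:
    """Return the sorted ranks selected by --rank minus those in --exclude.

    Hypothetical helper: `size` is the number of candidate ranks (0..size-1).
    """
    everything = set(range(size))
    chosen = everything if rank == "all" else parse_idset(rank)
    missing = chosen - everything
    if missing:
        # The man page says flux exec exits with code 68 (EX_NOHOST) when a
        # non-existent rank is targeted; this sketch raises an exception instead.
        raise ValueError(f"non-existent ranks: {sorted(missing)}")
    return sorted(chosen - parse_idset(exclude))


if __name__ == "__main__":
    print(select_targets(4, rank="0-1,3"))
    print(select_targets(4, exclude="0"))
```

Under these assumptions, ``select_targets(4, exclude="0")`` mirrors the ``flux exec -j ID -x 0`` example from the man page: every rank except the first is selected.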
