
Commit ea11197

d4l3k authored and facebook-github-bot committed
local_scheduler: add local_cwd scheduler + improve default handling (#203)
Summary:

* `local_cwd`: adds a new scheduler/image provider that overrides the image with the current working directory. This is intended to be a better approach for local launching since it doesn't require interacting with images.
* Removes the `local_dir` scheduler in favor of `local_cwd`.
* Changes how the default scheduler is specified. It was unclear what `default` meant without reading the code, so the CLI now specifies the default through the argparse `default`.
* Updates the documentation to use `local_cwd` and `local_docker` instead.

Pull Request resolved: #203

Test Plan:

```
$ pytest
$ pyre
$ env O="--port 4444" make clean livehtml
```

Go to the quickstart page and ensure the commands run without any changes.

CI

Reviewed By: aivanou

Differential Revision: D31178356

Pulled By: d4l3k

fbshipit-source-id: e1fff784184b1b3a41a01cbed2d96dec2345f774
1 parent ad80696 commit ea11197


15 files changed, +217 -131 lines changed


docs/source/quickstart.rst

Lines changed: 17 additions & 29 deletions
@@ -26,7 +26,7 @@ Echo looks familiar and simple. Lets understand how to run ``utils.echo``.

 .. code-block:: shell-session

-$ torchx run --scheduler local utils.echo --help
+$ torchx run --scheduler local_cwd utils.echo --help
 usage: torchx run echo [-h] [--msg MSG]

 Echos a message
@@ -39,7 +39,7 @@ We can see that it takes a ``--msg`` argument. Lets try running it locally

 .. code-block:: shell-session

-$ torchx run --scheduler local utils.echo --msg "hello world"
+$ torchx run --scheduler local_cwd utils.echo --msg "hello world"

 .. note:: ``echo`` in this context is just an app spec. It is not the application
 logic itself but rather just the "job definition" for running `/bin/echo`.
@@ -83,7 +83,7 @@ Now copy paste the following into ``test.py``
 specs.Role(
 name="echo",
 entrypoint="/bin/echo",
-image="/tmp",
+image="ubuntu:latest",
 args=[f"replica #{specs.macros.replica_id}: {msg}"],
 num_replicas=num_replicas,
 )
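The `args` entry in `test.py` embeds `specs.macros.replica_id`, which is filled in separately for each replica; that is why the quickstart run prints one `replica #N: foobar` line per replica. The sketch below shows that per-replica substitution, assuming the macro behaves like a `${replica_id}` placeholder expanded with `string.Template`. This is an illustration, not necessarily how TorchX implements macros. `REPLICA_ID` and `expand_args` are hypothetical names.

```python
from string import Template
from typing import List

# Hypothetical stand-in for specs.macros.replica_id: assumed to be a
# "${replica_id}" placeholder that is substituted once per replica.
REPLICA_ID = "${replica_id}"


def expand_args(args: List[str], num_replicas: int) -> List[List[str]]:
    """Return one argument list per replica with the replica id filled in."""
    return [
        [Template(arg).safe_substitute(replica_id=str(i)) for arg in args]
        for i in range(num_replicas)
    ]


msg = "foobar"
# Mirrors args=[f"replica #{specs.macros.replica_id}: {msg}"] from test.py.
role_args = [f"replica #{REPLICA_ID}: {msg}"]
for replica_args in expand_args(role_args, num_replicas=4):
    print(*replica_args)
```

`safe_substitute` leaves any unknown `$name` placeholders untouched instead of raising, which keeps the sketch tolerant of other macros it does not model.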
@@ -94,17 +94,15 @@ Notice that

 1. Unlike ``--msg``, ``--num_replicas`` does not have a default value
 indicating that it is a required argument.
-2. We use a local dir (``/tmp``) as the ``image``. In practice this will be
-the identifier of the package (e.g. Docker image) that the scheduler supports.
-3. ``test.py`` does **not** contain the logic of the app and is
+2. ``test.py`` does **not** contain the logic of the app and is
 simply a job definition.


 Now lets try running our custom ``echo``

 .. code-block:: shell-session

-$ torchx run --scheduler local ~/test.py:echo --num_replicas 4 --msg "foobar"
+$ torchx run --scheduler local_cwd ~/test.py:echo --num_replicas 4 --msg "foobar"

 replica #0: foobar
 replica #1: foobar
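The quickstart text says that `--num_replicas` is required because it has no default value, while `--msg` has one. That suggests component parameters are turned into CLI flags from the function signature. The sketch below shows such a mapping using `inspect.signature` and `argparse`. It is not TorchX's actual parser builder. `echo` is a hypothetical stand-in for the `test.py` component: it returns a dict instead of a `specs.AppDef`, and its `msg` default is made up. `parser_for` is also hypothetical.

```python
import argparse
import inspect
from typing import Any, Callable, Dict


def echo(num_replicas: int, msg: str = "hello world") -> Dict[str, Any]:
    """Hypothetical stand-in for the test.py component (returns a dict, not an AppDef)."""
    return {"num_replicas": num_replicas, "msg": msg}


def parser_for(fn: Callable[..., Any]) -> argparse.ArgumentParser:
    """Build a parser with one --flag per parameter of fn.

    Parameters without a default become required flags. Assumes every
    parameter is annotated with a callable type such as int or str
    (i.e. no `from __future__ import annotations`).
    """
    parser = argparse.ArgumentParser(prog=fn.__name__)
    for name, param in inspect.signature(fn).parameters.items():
        has_default = param.default is not inspect.Parameter.empty
        parser.add_argument(
            f"--{name}",
            type=param.annotation,
            required=not has_default,
            default=param.default if has_default else None,
        )
    return parser


args = parser_for(echo).parse_args(["--num_replicas", "4", "--msg", "foobar"])
print(echo(**vars(args)))
```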
@@ -113,13 +111,14 @@ Now lets try running our custom ``echo``

 Running on Other Images
 -----------------------------
-So far we've run ``utils.echo`` with ``image=/tmp``. This means that the
-``entrypoint`` we specified is relative to ``/tmp``. That did not matter for us
+So far we've run ``utils.echo`` with the ``local_cwd`` scheduler. This means that the
+``entrypoint`` we specified is relative to the current working directory and
+ignores the specified image. That did not matter for us
 since we specified an absolute path as the entrypoint (``entrypoint=/bin/echo``).
-Had we specified ``entrypoint=echo`` the local scheduler would have tried to invoke
-``/tmp/echo``.
+Had we specified ``entrypoint=echo`` the local_cwd scheduler would have tried to invoke
+``echo`` relative to the current directory and the specified PATH.

-If you have a pre-built application binary, setting the image to a local directory is a
+If you have a pre-built application binary, using local_cwd is a
 quick way to validate the application and the ``specs.AppDef``. But its not all
 that useful if you want to run the application on a remote scheduler
 (see :ref:`quickstart:Running On Other Schedulers`).
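The new wording distinguishes an absolute entrypoint such as `/bin/echo`, which is used as-is, from a bare name such as `echo`, which is looked up relative to the current directory and `PATH`. The sketch below approximates this with `shutil.which`. It assumes that names containing a path separator are resolved against the working directory and that bare names are searched on `PATH`. `resolve_entrypoint` is a hypothetical helper, not the scheduler's actual code.

```python
import os
import shutil
from typing import Optional


def resolve_entrypoint(entrypoint: str, cwd: str, path: Optional[str] = None) -> Optional[str]:
    """Approximate how a cwd-based launcher could locate an entrypoint.

    Assumption: names with a path separator are taken relative to cwd
    (absolute paths are left alone); bare names are searched on PATH
    (or on `path` if given). Returns None if a bare name is not found.
    """
    if os.sep in entrypoint:
        # os.path.join drops cwd when entrypoint is already absolute.
        return os.path.join(cwd, entrypoint)
    return shutil.which(entrypoint, path=path)


print(resolve_entrypoint("/bin/echo", cwd=os.getcwd()))
print(resolve_entrypoint("echo", cwd=os.getcwd()))
```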
@@ -128,18 +127,8 @@ that useful if you want to run the application on a remote scheduler
 supported by the scheduler. Refer to the scheduler documentation to find out
 what container image is supported by the scheduler you want to use.

-For ``local`` scheduler we can see that it supports both a local directory
-and docker as the image:
-
-.. code-block:: shell-session
-
-$ torchx runopts local
-
-{ 'image_type': { 'default': 'dir',
-'help': 'image type. One of [dir, docker]',
-'type': 'str'},
-... <omitted for brevity> ...
-
+To match remote image behavior we can use the ``local_docker`` scheduler which
+will launch the image via docker and run the same application.

 .. note:: Before proceeding, you will need docker installed. If you have not done so already
 follow the install instructions on: https://docs.docker.com/get-docker/
@@ -178,8 +167,7 @@ Try running the echo app

 .. code-block:: shell-session

-$ torchx run --scheduler local \
---scheduler_args image_type=docker \
+$ torchx run --scheduler local_docker \
 ~/test.py:echo \
 --num_replicas 4 \
 --msg "foobar from docker!"
@@ -209,15 +197,15 @@ required by the scheduler you are planning to use
 .. code-block:: shell-session

 $ torchx runopts <sched_name>
-$ torchx runopts local
+$ torchx runopts local_docker

 Now that you've figured out what scheduler args are required, launch your app

 .. code-block:: shell-session

 $ torchx run --scheduler <sched_name> --scheduler_args <k1=v1,k2=v2,...> \
 utils.sh ~/my_app.py <app_args...>
-$ torchx run --scheduler local --scheduler_args image_type=dir,log_dir=/tmp \
+$ torchx run --scheduler local_cwd --scheduler_args log_dir=/tmp \
 utils.sh ~/my_app.py --foo=bar

 .. note:: If your app args overlap with the ``run`` subcommand's args, you
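`--scheduler_args` takes a comma-delimited list of `k=v` pairs, as in `log_dir=/tmp`. The sketch below turns such a string into a dict. It assumes that values contain no commas and that everything after the first `=` belongs to the value. `parse_scheduler_args` is a hypothetical helper; TorchX's own parsing may differ, for example in how it converts value types.

```python
from typing import Dict


def parse_scheduler_args(raw: str) -> Dict[str, str]:
    """Split "k1=v1,k2=v2" into {"k1": "v1", "k2": "v2"}.

    Assumes values never contain commas; everything after the first "="
    in a pair is kept as the value (so "a=b=c" yields {"a": "b=c"}).
    Empty segments, e.g. from a trailing comma, are skipped.
    """
    cfg: Dict[str, str] = {}
    for pair in filter(None, raw.split(",")):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected k=v, got {pair!r}")
        cfg[key.strip()] = value.strip()
    return cfg


print(parse_scheduler_args("log_dir=/tmp"))
```

Values are kept as strings here. Converting them to the types advertised by `torchx runopts` would be a separate step.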
@@ -227,7 +215,7 @@ Now that you've figured out what scheduler args are required, launch your app

 .. code-block:: shell-session

-$ torchx run --scheduler local ~/my_app.py -- --scheduler foobar
+$ torchx run --scheduler local_docker ~/my_app.py -- --scheduler foobar


 Next Steps

docs/source/schedulers.rst

Lines changed: 6 additions & 1 deletion
@@ -5,6 +5,11 @@ torchx.schedulers
 .. currentmodule:: torchx.schedulers

 .. autofunction:: get_schedulers
+.. autofunction:: get_scheduler_factories
+.. autofunction:: get_default_scheduler_name

 .. autoclass:: Scheduler
-:members:
+:members:
+
+.. autoclass:: SchedulerFactory
+:members:
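The newly documented `get_scheduler_factories`, `get_default_scheduler_name`, and `SchedulerFactory` suggest a registry that maps scheduler names to constructors, from which a default name can be derived. The sketch below shows that shape under stated assumptions. The function names, the factory signature (session name in, scheduler out), `FakeScheduler`, and the rule that the default is the first registered scheduler are all hypothetical and not taken from TorchX's source. The scheduler order is borrowed from the `torchx runopts` listing in this commit.

```python
from typing import Callable, Dict


class FakeScheduler:
    """Hypothetical placeholder standing in for a real scheduler."""

    def __init__(self, backend: str, session_name: str) -> None:
        self.backend = backend
        self.session_name = session_name


# Hypothetical factory type: session name in, scheduler instance out.
FactorySketch = Callable[[str], FakeScheduler]


def _make_factory(backend: str) -> FactorySketch:
    def factory(session_name: str) -> FakeScheduler:
        return FakeScheduler(backend, session_name)

    return factory


def scheduler_factories() -> Dict[str, FactorySketch]:
    # Dicts keep insertion order, so "first registered" is well defined.
    names = ["local_docker", "local_cwd", "slurm", "kubernetes"]
    return {name: _make_factory(name) for name in names}


def default_scheduler_name() -> str:
    # Assumption: the default is the first registered factory.
    return next(iter(scheduler_factories()))


def make_schedulers(session_name: str) -> Dict[str, FakeScheduler]:
    # Instantiate every registered scheduler for one session.
    return {name: make(session_name) for name, make in scheduler_factories().items()}


print(default_scheduler_name())
print(sorted(make_schedulers("demo")))
```

Registering factories instead of instances means that listing the available names (for `--help` or a default) does not require constructing any scheduler.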

docs/source/schedulers/local.rst

Lines changed: 2 additions & 2 deletions
@@ -13,8 +13,8 @@ Image Providers
 .. autoclass:: ImageProvider
 :members:

-.. autoclass:: LocalDirectoryImageProvider
+.. autoclass:: DockerImageProvider
 :members:

-.. autoclass::DockerImageProvider
+.. autoclass:: CWDImageProvider
 :members:
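The image providers documented here decide what an app's `image` becomes when it is launched on the local host. Per the commit summary, the CWD provider "overrides the image with the current working directory". The sketch below contrasts that idea with a provider that keeps the image reference. `ImageProviderSketch`, its `fetch` method, and both subclasses are hypothetical names. The docker variant only passes the image name through and does not pull or run anything.

```python
import abc
import os


class ImageProviderSketch(abc.ABC):
    """Hypothetical interface: map an AppDef image to something runnable locally."""

    @abc.abstractmethod
    def fetch(self, image: str) -> str:
        """Return the local location (or reference) to launch from."""


class CwdImageProviderSketch(ImageProviderSketch):
    """Ignores the requested image and launches from the current directory."""

    def fetch(self, image: str) -> str:
        return os.getcwd()


class DockerImageProviderSketch(ImageProviderSketch):
    """Keeps the image name; a real provider would pull it and run a container."""

    def fetch(self, image: str) -> str:
        return image


for provider in (CwdImageProviderSketch(), DockerImageProviderSketch()):
    print(type(provider).__name__, provider.fetch("ubuntu:latest"))
```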

torchx/cli/__init__.py

Lines changed: 21 additions & 14 deletions
@@ -31,13 +31,23 @@
 3. touch
 ... <omitted for brevity>

-Listing the supported schedulers
-----------------------------------
+Listing the supported schedulers and arguments
+-------------------------------------------------
 To get a list of supported schedulers that you can launch your job into run:

 .. code-block:: shell-session

-$ torchx schedulers
+$ torchx runopts
+local_docker:
+{ 'log_dir': { 'default': 'None',
+'help': 'dir to write stdout/stderr log files of replicas',
+'type': 'str'}}
+local_cwd:
+...
+slurm:
+...
+kubernetes:
+...

 Running a component as a job
 ---------------------------------
@@ -80,27 +90,24 @@ def my_trainer(foo: int, bar: str) -> specs.AppDef:

 2. arguments to the scheduler (``--scheduler_args``, also known as ``run_options`` or ``run_configs``),
 each scheduler takes different args, to find out the args for a specific scheduler run (command for
-``local`` scheduler shown below:
+``local_cwd`` scheduler shown below:

 .. code-block:: shell-session

-$ torchx runopts local
-{ 'image_fetcher': { 'default': 'dir',
-'help': 'image fetcher type',
-'type': 'str'},
-'log_dir': { 'default': 'None',
+$ torchx runopts local_cwd
+{ 'log_dir': { 'default': 'None',
 'help': 'dir to write stdout/stderr log files of replicas',
 'type': 'str'}}

 # pass run options as comma-delimited k=v pairs
-$ torchx run --scheduler local --scheduler_args image_fetcher=dir,log_dir=/tmp ...
+$ torchx run --scheduler local_cwd --scheduler_args log_dir=/tmp ...

 3. arguments to the component (the app args are included here), this also depends on the
 component and can be seen with the ``--help`` string on the component

 .. code-block:: shell-session

-$ torchx run --scheduler local utils.echo --help
+$ torchx run --scheduler local_cwd utils.echo --help
 usage: torchx run echo.torchx [-h] [--msg MSG]

 Echos a message
@@ -109,11 +116,11 @@ def my_trainer(foo: int, bar: str) -> specs.AppDef:
 -h, --help show this help message and exit
 --msg MSG Message to echo

-Putting everything together, running ``echo`` with the ``local`` scheduler:
+Putting everything together, running ``echo`` with the ``local_cwd`` scheduler:

 .. code-block:: shell-session

-$ torchx run --scheduler local --scheduler_args image_fetcher=dir,log_dir=/tmp utils.echo --msg "hello $USER"
+$ torchx run --scheduler local_cwd --scheduler_args log_dir=/tmp utils.echo --msg "hello $USER"
 === RUN RESULT ===
 Launched app: local://torchx_kiuk/echo_ecd30f74

@@ -137,7 +144,7 @@ def my_trainer(foo: int, bar: str) -> specs.AppDef:

 .. code-block:: shell-session

-$ torchx run --dryrun utils.echo --msg hello_world
+$ torchx run --dryrun utils.echo --msg hello_world
 === APPLICATION ===
 { 'metadata': {},
 'name': 'echo',

torchx/cli/cmd_run.py

Lines changed: 3 additions & 3 deletions
@@ -15,7 +15,7 @@
 from pyre_extensions import none_throws
 from torchx.cli.cmd_base import SubCommand
 from torchx.runner import Runner, get_runner
-from torchx.schedulers import get_scheduler_factories
+from torchx.schedulers import get_scheduler_factories, get_default_scheduler_name
 from torchx.specs.finder import (
 _Component,
 get_components,
@@ -77,7 +77,7 @@ def add_arguments(self, subparser: argparse.ArgumentParser) -> None:
 "--scheduler",
 type=str,
 help=f"Name of the scheduler to use. One of: [{','.join(scheduler_names)}]",
-default="default",
+default=get_default_scheduler_name(),
 )
 subparser.add_argument(
 "--scheduler_args",
@@ -140,7 +140,7 @@ def _run(self, runner: Runner, args: argparse.Namespace) -> None:
 app_handle = cast(specs.AppHandle, result)
 print(app_handle)

-if args.scheduler == "local":
+if args.scheduler.startswith("local"):
 self._wait_and_exit(runner, app_handle)
 else:
 logger.info("=== RUN RESULT ===")
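Both `cmd_run.py` changes can be tried in isolation. The `--scheduler` flag now defaults to a concrete scheduler name instead of the placeholder string `"default"`. The wait-for-completion branch now uses a prefix match, so every `local*` scheduler is waited on. The sketch below reproduces both behaviors. `SCHEDULER_NAMES`, `default_scheduler_name`, and `waits_for_completion` are hypothetical stand-ins, not TorchX code.

```python
import argparse
from typing import List

# Hypothetical scheduler list; order matters only for the default below.
SCHEDULER_NAMES: List[str] = ["local_docker", "local_cwd", "slurm", "kubernetes"]


def default_scheduler_name() -> str:
    # Stand-in for torchx.schedulers.get_default_scheduler_name().
    return SCHEDULER_NAMES[0]


def waits_for_completion(scheduler: str) -> bool:
    # Same test as the new CLI branch: any "local*" scheduler blocks until exit.
    return scheduler.startswith("local")


parser = argparse.ArgumentParser(prog="run")
parser.add_argument(
    "--scheduler",
    type=str,
    help=f"Name of the scheduler to use. One of: [{','.join(SCHEDULER_NAMES)}]",
    default=default_scheduler_name(),
)

for argv in ([], ["--scheduler", "local_cwd"], ["--scheduler", "slurm"]):
    args = parser.parse_args(argv)
    print(args.scheduler, waits_for_completion(args.scheduler))
```

Because the default is now a real name, `args.scheduler` never holds the ambiguous string `"default"`. The prefix check is convenient, but it would also match any future scheduler whose name starts with `local`. An explicit set such as `{"local_cwd", "local_docker"}` is the stricter alternative.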

torchx/cli/test/cmd_run_test.py

Lines changed: 5 additions & 5 deletions
@@ -45,7 +45,7 @@ def test_run_with_user_conf_abs_path(self) -> None:
 args = self.parser.parse_args(
 [
 "--scheduler",
-"local",
+"local_cwd",
 str(Path(__file__).parent / "components.py:touch"),
 "--file",
 str(self.tmpdir / "foobar.txt"),
@@ -60,7 +60,7 @@ def test_run_with_relpath(self) -> None:
 args = self.parser.parse_args(
 [
 "--scheduler",
-"local",
+"local_cwd",
 str(Path(__file__).parent / "components.py:touch_v2"),
 "--file",
 str(self.tmpdir / "foobar.txt"),
@@ -83,7 +83,7 @@ def test_run_terminate_on_received_signal(
 args = self.parser.parse_args(
 [
 "--scheduler",
-"local",
+"local_cwd",
 str(Path(__file__).parent / "components.py:touch_v2"),
 "--file",
 str(self.tmpdir / "foobar.txt"),
@@ -99,7 +99,7 @@ def test_run_missing(self) -> None:
 args = self.parser.parse_args(
 [
 "--scheduler",
-"local",
+"local_cwd",
 "1234_does_not_exist.torchx",
 ]
 )
@@ -111,7 +111,7 @@ def test_run_dryrun(self, mock_runner_run: MagicMock) -> None:
 [
 "--dryrun",
 "--scheduler",
-"local",
+"local_cwd",
 "utils.echo",
 "--image",
 "/tmp",

torchx/cli/test/main_test.py

Lines changed: 10 additions & 6 deletions
@@ -5,6 +5,7 @@
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.

+import os
 import unittest
 from pathlib import Path

@@ -17,17 +18,22 @@


 class CLITest(unittest.TestCase):
+def setUp(self) -> None:
+self.old_cwd = os.getcwd()
+os.chdir(_root / "container")
+
+def tearDown(self) -> None:
+os.chdir(self.old_cwd)
+
 def test_run_abs_config_path(self) -> None:
 main(
 [
 "run",
 "--scheduler",
-"local",
+"local_cwd",
 str(_root / "components.py:simple"),
 "--num_trainers",
 "2",
-"--trainer_image",
-str(_root / "container"),
 ]
 )

@@ -36,11 +42,9 @@ def test_run_builtin_config(self) -> None:
 [
 "run",
 "--scheduler",
-"local",
+"local_cwd",
 _SIMPLE_CONF,
 "--num_trainers",
 "2",
-"--trainer_image",
-str(_root / "container"),
 ]
 )
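The new `setUp`/`tearDown` pair changes into the `container` directory, which was previously passed as `--trainer_image`, and restores the original directory afterwards. Presumably this is so that `local_cwd` launches the test app from that directory. The same save-and-restore pattern can be written as a context manager. `working_directory` below is a hypothetical helper; on Python 3.11+ the standard library's `contextlib.chdir` does the same job.

```python
import contextlib
import os
import tempfile
from typing import Iterator, Union


@contextlib.contextmanager
def working_directory(path: Union[str, os.PathLike]) -> Iterator[None]:
    """Temporarily chdir into path, restoring the previous cwd even on error."""
    old_cwd = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(old_cwd)


with tempfile.TemporaryDirectory() as tmp:
    with working_directory(tmp):
        print(os.getcwd())
    print(os.getcwd())
```

The `try`/`finally` plays the role of `tearDown`: the old directory is restored even if the body raises, so a failing test cannot leak its working directory into the next one.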

torchx/components/__init__.py

Lines changed: 2 additions & 2 deletions
@@ -20,11 +20,11 @@

 # using via sdk
 from torchx.runner import get_runner
-get_runner().run_component("distributed.ddp", app_args=[], scheduler="local", ...)
+get_runner().run_component("distributed.ddp", app_args=[], scheduler="local_cwd", ...)

 # using via torchx-cli

->> torchx run --scheduler local distributed.ddp --param1 --param2
+>> torchx run --scheduler local_cwd distributed.ddp --param1 --param2


 Components development
