Commit 59d8dae

Make GitArchivePackager behavior consistent with subpath and include_pattern (#101)
* Make git packager consistent and add docs

Signed-off-by: Hemil Desai <[email protected]>

* DockerExecutor

Signed-off-by: Hemil Desai <[email protected]>

---------

Signed-off-by: Hemil Desai <[email protected]>
1 parent 7527bc8 commit 59d8dae

3 files changed: 111 additions, 12 deletions

docs/source/guides/execution.md

Lines changed: 42 additions & 3 deletions
@@ -4,6 +4,7 @@ After configuring NeMo-Run, the next step is to execute it. Nemo-Run decouples c

 Each execution of a single configured task requires an executor. Nemo-Run provides `run.Executor`, which are APIs to configure your remote executor and set up the packaging of your code. Currently we support:
 - `run.LocalExecutor`
+- `run.DockerExecutor`
 - `run.SlurmExecutor` with an optional `SSHTunnel` for executing on Slurm clusters from your local machine
 - `run.SkypilotExecutor` (available under the optional feature `skypilot` in the python package).

@@ -36,10 +37,13 @@ The packager support matrix is described below:
 | Executor | Packagers |
 |----------|----------|
 | LocalExecutor | run.Packager |
-| SlurmExecutor | run.GitArchivePackager |
-| SkypilotExecutor | run.GitArchivePackager |
+| DockerExecutor | run.Packager, run.GitArchivePackager, run.PatternPackager |
+| SlurmExecutor | run.Packager, run.GitArchivePackager, run.PatternPackager |
+| SkypilotExecutor | run.Packager, run.GitArchivePackager, run.PatternPackager |

-`run.Packager` is a passthrough base packager. `run.GitArchivePackager` uses `git archive` to package your code. Refer to the API reference for `run.GitArchivePackager` to see the exact mechanics of packaging using `git archive`.
+`run.Packager` is a passthrough base packager.
+
+`run.GitArchivePackager` uses `git archive` to package your code. Refer to the API reference for `run.GitArchivePackager` to see the exact mechanics of packaging using `git archive`.
 At a high level, it works in the following way:
 1. base_path = `git rev-parse --show-toplevel`.
 2. Optionally define a subpath as `base_path/GitArchivePackager.subpath` by setting `subpath` attribute on `GitArchivePackager`.
@@ -60,6 +64,20 @@ If you're executing a Python function, this working directory will automatically

 > **_NOTE:_** git archive doesn't package uncommitted changes. In the future, we may add support for including uncommitted changes while honoring `.gitignore`.

+`run.PatternPackager` is a packager that uses a pattern to package your code. It is useful for packaging code that is not under version control. For example, if you have a directory structure like this:
+```
+- docs
+- src
+  - your_library
+```
+
+You can use `run.PatternPackager` to package your code by specifying `include_pattern` as `src/**` and `relative_path` as `os.getcwd()`. This will package the entire `src` directory. The command used to get the list of files to package is:
+
+```bash
+# relative_include_pattern = os.path.relpath(self.include_pattern, self.relative_path)
+cd {relative_path} && find {relative_include_pattern} -type f
+```
+
 ### Defining Executors
 Next, We'll describe details on setting up each of the executors below.

@@ -69,6 +87,27 @@ The LocalExecutor is the simplest executor. It executes your task locally in a s

 The easiest way to define one is to call `run.LocalExecutor()`.

+#### DockerExecutor
+
+The DockerExecutor enables launching a task using `docker` on your local machine. It requires `docker` to be installed and running as a prerequisite.
+
+The DockerExecutor uses the [docker python client](https://docker-py.readthedocs.io/en/stable/) and most of the options are passed directly to the client.
+
+Below is an example of configuring a Docker Executor
+
+```python
+run.DockerExecutor(
+    container_image="python:3.12",
+    num_gpus=-1,
+    runtime="nvidia",
+    ipc_mode="host",
+    shm_size="30g",
+    volumes=["/local/path:/path/in/container"],
+    env_vars={"PYTHONUNBUFFERED": "1"},
+    packager=run.Packager(),
+)
+```
+
 #### SlurmExecutor

 The SlurmExecutor enables launching the configured task on a Slurm Cluster with Pyxis.  Additionally, you can configure a `run.SSHTunnel`, which enables you to execute tasks on the Slurm cluster from your local machine while NeMo-Run manages the SSH connection for you. This setup supports use cases such as launching the same task on multiple Slurm clusters.
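
For readers following the `run.PatternPackager` description in the docs change above, the file-selection step can be approximated in pure Python. This is a minimal sketch and not the library's implementation: `list_files_to_package` is a hypothetical helper, the scratch file names are made up, and `glob` stands in for the shell `find` command, so hidden files and symlinks may be handled differently.

```python
import glob
import os
import tempfile


def list_files_to_package(include_pattern: str, relative_path: str) -> list[str]:
    """Hypothetical helper: list files matching include_pattern, relative to relative_path."""
    # Same relpath step as the comment in the documented command.
    relative_include_pattern = os.path.relpath(include_pattern, relative_path)
    # glob stands in for `cd {relative_path} && find {relative_include_pattern} -type f`.
    matches = glob.glob(os.path.join(relative_path, relative_include_pattern), recursive=True)
    # Keep regular files only (the `-type f` part) and report them relative to relative_path.
    return sorted(os.path.relpath(m, relative_path) for m in matches if os.path.isfile(m))


# Recreate the example layout (docs/, src/your_library/) in a scratch directory.
with tempfile.TemporaryDirectory() as root:
    for rel in ("docs/index.md", "src/setup.py", "src/your_library/__init__.py"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("")
    files = list_files_to_package(os.path.join(root, "src", "**"), root)

print(files)
```

Only paths under `src` are returned; `docs/index.md` is left out because it does not match the include pattern.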

src/nemo_run/core/packaging/git.py

Lines changed: 25 additions & 9 deletions
@@ -107,6 +107,8 @@ def package(self, path: Path, job_dir: str, name: str) -> str:
             assert not bool(
                 untracked_files
             ), "Your repo has untracked files. Please track your files via git or set check_untracked_files to False to proceed with packaging."
+
+        ctx = Context()
         if self.include_pattern:
             include_pattern_relative_path = self.include_pattern_relative_path or shlex.quote(
                 str(git_base_path)
@@ -117,17 +119,31 @@ def package(self, path: Path, job_dir: str, name: str) -> str:
             # we first add git files into an uncompressed archive
             # then we add an extra files from pattern to that archive
             # finally we compress it (cannot compress right away, since adding files is not possible)
-            cmd = (
-                f"(cd {shlex.quote(str(git_base_path))} && git ls-files {git_sub_path} "
-                f"| tar -cf {output_file}.tmp -C {shlex.quote(str(git_base_path))} -T -) "
-                f"&& (cd {include_pattern_relative_path} && find {relative_include_pattern} -type f "
-                f"| tar -rf {output_file}.tmp -C {include_pattern_relative_path} -T -) "
-                f"&& gzip -c {output_file}.tmp > {output_file} && rm {output_file}.tmp"
+            git_archive_cmd = (
+                f"git archive --format=tar --output={output_file}.tmp {self.ref}:{git_sub_path}"
             )
+            include_pattern_cmd = f"find {relative_include_pattern} -type f | tar -cf {os.path.join(git_base_path, 'additional.tmp')} -T -"
+            tar_concatenate_cmd = f"tar -Af {output_file}.tmp additional.tmp"
+            gzip_cmd = f"gzip -c {output_file}.tmp > {output_file}"
+            rm_cmd = f"rm {output_file}.tmp additional.tmp"
+
+            with ctx.cd(git_base_path):
+                ctx.run(git_archive_cmd)
+
+            with ctx.cd(include_pattern_relative_path):
+                ctx.run(include_pattern_cmd)
+
+            with ctx.cd(git_base_path):
+                ctx.run(tar_concatenate_cmd)
+                ctx.run(gzip_cmd)
+                ctx.run(rm_cmd)
         else:
-            cmd = f"cd {shlex.quote(str(git_base_path))} && git archive --format=tar.gz --output={output_file} {self.ref}:{git_sub_path}"
-        ctx = Context()
-        ctx.run(cmd)
+            with ctx.cd(git_base_path):
+                git_archive_cmd = (
+                    f"git archive --format=tar.gz --output={output_file} {self.ref}:{git_sub_path}"
+                )
+                ctx.run(git_archive_cmd)
+
         return output_file

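
The comment in the hunk above explains why the include-pattern branch builds an uncompressed tar first: files cannot be added to an archive that is already compressed. A rough sketch of that three-step flow follows, using Python's `tarfile` and `gzip` modules as stand-ins for `git archive`, `tar -A`, and `gzip`. The helper name `build_archive` and the file names are hypothetical, and this is not the packager's code.

```python
import gzip
import os
import shutil
import tarfile
import tempfile


def build_archive(base_files, extra_files, root, output_file):
    """Hypothetical helper: tar base files, append extras, then gzip the result."""
    tmp_tar = output_file + ".tmp"
    # Step 1: write the base files to an uncompressed tar
    # (stand-in for `git archive --format=tar`).
    with tarfile.open(tmp_tar, "w") as tar:
        for rel in base_files:
            tar.add(os.path.join(root, rel), arcname=rel)
    # Step 2: append the extra files; append mode only works on uncompressed tars.
    with tarfile.open(tmp_tar, "a") as tar:
        for rel in extra_files:
            tar.add(os.path.join(root, rel), arcname=rel)
    # Step 3: compress the combined tar and remove the temporary file.
    with open(tmp_tar, "rb") as src, gzip.open(output_file, "wb") as dst:
        shutil.copyfileobj(src, dst)
    os.remove(tmp_tar)
    return output_file


with tempfile.TemporaryDirectory() as root:
    for rel in ("tracked/a.py", "extra/b.txt"):
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as f:
            f.write("content")
    out = build_archive(["tracked/a.py"], ["extra/b.txt"], root, os.path.join(root, "pkg.tar.gz"))
    with tarfile.open(out, "r:gz") as tar:
        names = tar.getnames()
    tmp_removed = not os.path.exists(out + ".tmp")

print(names)
```

The final archive contains both the base files and the appended extras, which is the property the new `test_package_with_include_pattern_and_subpath` test checks for the real packager.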

test/core/packaging/test_git.py

Lines changed: 44 additions & 0 deletions
@@ -162,6 +162,50 @@ def test_package_with_include_pattern(packager, temp_repo):
         assert not cmp.diff_files


+@patch("nemo_run.core.packaging.git.Context", MockContext)
+def test_package_with_include_pattern_and_subpath(packager, temp_repo):
+    temp_repo = Path(temp_repo)
+    # Create extra files
+    (temp_repo / "extra").mkdir()
+    with open(temp_repo / "extra" / "extra_file1.txt", "w") as f:
+        f.write("Extra file 1")
+    with open(temp_repo / "extra" / "extra_file2.txt", "w") as f:
+        f.write("Extra file 2")
+
+    # Create extra files
+    (temp_repo / "extra2").mkdir()
+    with open(temp_repo / "extra2" / "extra2_file1.txt", "w") as f:
+        f.write("Extra file 1")
+    with open(temp_repo / "extra2" / "extra2_file2.txt", "w") as f:
+        f.write("Extra file 2")
+    subprocess.check_call(
+        [f"cd {temp_repo} && git add extra2 && git commit -m 'Extra2 commit'"], shell=True
+    )
+
+    packager = GitArchivePackager(ref="HEAD", include_pattern="extra", subpath="extra2")
+    with tempfile.TemporaryDirectory() as job_dir:
+        output_file = packager.package(Path(temp_repo), job_dir, "test_package")
+        assert os.path.exists(output_file)
+        subprocess.check_call(shlex.split(f"mkdir -p {os.path.join(job_dir, 'extracted_output')}"))
+        subprocess.check_call(
+            shlex.split(f"tar -xvzf {output_file} -C {os.path.join(job_dir, 'extracted_output')}"),
+        )
+        cmp = filecmp.dircmp(
+            os.path.join(temp_repo, "extra"),
+            os.path.join(job_dir, "extracted_output", "extra"),
+        )
+        assert cmp.left_list == cmp.right_list
+        assert not cmp.diff_files
+
+        cmp = filecmp.dircmp(
+            os.path.join(temp_repo, "extra2"),
+            os.path.join(job_dir, "extracted_output"),
+            ignore=["extra"],
+        )
+        assert cmp.left_list == cmp.right_list
+        assert not cmp.diff_files
+
+
 @patch("nemo_run.core.packaging.git.Context", MockContext)
 def test_package_with_include_pattern_multiple_directories(packager, temp_repo):
     temp_repo = Path(temp_repo)
