Commit 0c68809

Documentation Restructuring (#350)
* Fixing documentation layout
  Signed-off-by: Andrew Schilling <[email protected]>
* documentation.md
  Signed-off-by: Andrew Schilling <[email protected]>
* Removing live-server
  Signed-off-by: Andrew Schilling <[email protected]>
* Correcting .vscode
  Signed-off-by: Andrew Schilling <[email protected]>
---------
Signed-off-by: Andrew Schilling <[email protected]>
1 parent f104fe6 commit 0c68809

15 files changed: +206 -67 lines changed
Lines changed: 18 additions & 6 deletions
Columns: original file line number, diff line number, diff line change
@@ -33,18 +33,34 @@
3333
"sphinx.ext.githubpages",
3434
"sphinx.ext.napoleon",
3535
"sphinxcontrib.mermaid",
36+
"sphinx_copybutton",
37+
"sphinx_new_tab_link",
3638
]
3739

3840
templates_path = ["_templates"]
39-
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
41+
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "documentation.md"]
42+
43+
# -- Options for MyST Parser (Markdown) --------------------------------------
44+
# MyST Parser settings
45+
myst_enable_extensions = [
46+
"dollarmath", # Enables dollar math for inline math
47+
"amsmath", # Enables LaTeX math for display mode
48+
"colon_fence", # Enables code blocks using ::: delimiters instead of ```
49+
"deflist", # Supports definition lists with term: definition format
50+
"fieldlist", # Enables field lists for metadata like :author: Name
51+
"tasklist", # Adds support for GitHub-style task lists with [ ] and [x]
52+
]
53+
myst_heading_anchors = 5 # Generates anchor links for headings up to level 5
54+
myst_fence_as_directive = ["mermaid"]
55+
4056
python_maximum_signature_line_length = 88
4157

4258
# Autoapi settings
4359
autoapi_generate_api_docs = True
4460
autoapi_keep_files = False
4561
autoapi_add_toctree_entry = False
4662
autoapi_type = "python"
47-
autoapi_dirs = ["../../nemo_run"]
63+
autoapi_dirs = ["../nemo_run"]
4864
autoapi_file_pattern = "*.py"
4965
autoapi_root = "api"
5066
autoapi_options = [
@@ -58,10 +74,6 @@
5874
# Autodoc settings
5975
autodoc_typehints = "signature"
6076

61-
# MyST settings
62-
myst_heading_anchors = 3
63-
myst_fence_as_directive = ["mermaid"]
64-
6577
# Napoleon settings
6678
napoleon_google_docstring = True
6779
napoleon_numpy_docstring = True
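
The `autoapi_dirs` change from `../../nemo_run` to `../nemo_run` is consistent with `conf.py` now sitting one directory closer to the repository root. The following is a minimal sketch of that reasoning, not a statement of how the repository was laid out: it assumes that relative `autoapi_dirs` entries are resolved against the documentation source directory, that the old source directory was `docs/source/` (an assumption, not shown in this diff), and that the new one is `docs/`. The helper `resolve_from` and the `repo/` prefix are hypothetical names used only for illustration.

```python
import posixpath


def resolve_from(source_dir: str, relative: str) -> str:
    """Join a relative path onto source_dir and collapse any '..' segments."""
    return posixpath.normpath(posixpath.join(source_dir, relative))


# Hypothetical layouts: the old location is an assumption, not taken from the diff.
old = resolve_from("repo/docs/source", "../../nemo_run")
new = resolve_from("repo/docs", "../nemo_run")
stale = resolve_from("repo/docs", "../../nemo_run")

print(old)    # repo/nemo_run
print(new)    # repo/nemo_run
print(stale)  # nemo_run (one level above the hypothetical repo root)
```

Under these assumptions, both the old and the new setting point at the same package directory, while leaving the old value in place after the move would point outside the repository.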

docs/documentation.md

Lines changed: 47 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,47 @@
1+
# Documentation Development
2+
3+
- [Documentation Development](#documentation-development)
4+
- [Build the Documentation](#build-the-documentation)
5+
- [Live Building](#live-building)
6+
7+
8+
## Build the Documentation
9+
10+
The following sections describe how to set up and build the NeMo Run documentation.
11+
12+
Switch to the documentation source folder and generate HTML output.
13+
14+
```sh
15+
cd docs/
16+
uv run --group docs sphinx-build . _build/html
17+
```
18+
19+
* The resulting HTML files are generated in a `_build/html` folder that is created under the project `docs/` folder.
20+
* The generated Python API docs are placed in `apidocs` under the `docs/` folder.
21+
22+
## Checking for Broken Links
23+
24+
To check for broken HTTP links in the docs, run this command:
25+
26+
```sh
27+
cd docs/
28+
uv run --group docs sphinx-build --builder linkcheck . _build/linkcheck
29+
```
30+
31+
It will output a JSON file at `_build/linkcheck/output.json` with links it found while building the
32+
docs. Records will have a status of `broken` if the link is not reachable. The `docs/conf.py` file is
33+
configured to ignore GitHub links because the CI test will often experience rate limit errors.
34+
Comment out the `linkcheck_ignore` variable there to check all the links.
35+
36+
## Live Building
37+
38+
When writing documentation, it can be helpful to serve the documentation and have it update live while you edit.
39+
40+
To do so, run:
41+
42+
```sh
43+
cd docs/
44+
uv run --group docs sphinx-autobuild . _build/html --port 12345 --host 0.0.0.0
45+
```
46+
47+
Open a web browser and go to `http://${HOST_WHERE_SPHINX_COMMAND_RUN}:12345` to view the output.
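
The `output.json` file described under "Checking for Broken Links" can be filtered with a few lines of Python. The following is a minimal sketch, assuming the file holds one JSON object per line with `filename`, `lineno`, `status`, and `uri` keys; check this against the output of your Sphinx version before relying on it. The helper name `broken_links` and the sample records are hypothetical.

```python
from __future__ import annotations

import json
from typing import Iterable


def broken_links(lines: Iterable[str]) -> list[dict]:
    """Return the linkcheck records whose status is "broken"."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if record.get("status") == "broken":
            records.append(record)
    return records


# Hypothetical sample records standing in for _build/linkcheck/output.json.
sample = [
    '{"filename": "index.md", "lineno": 3, "status": "working", "uri": "https://example.com"}',
    '{"filename": "guides/cli.md", "lineno": 9, "status": "broken", "uri": "https://example.invalid/x"}',
]

for rec in broken_links(sample):
    print(f'{rec["filename"]}:{rec["lineno"]}: {rec["uri"]}')
```

To run it against a real build, pass an open file handle instead of the sample list, for example `with open("_build/linkcheck/output.json") as fh: broken_links(fh)`.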
File renamed without changes.
Lines changed: 10 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -2,7 +2,7 @@
22

33
NeMo Run CLI is a Python-based command-line tool designed to efficiently configure and execute machine learning experiments. It provides a type-safe, Python-centric alternative to argparse and Hydra, streamlining workflows from prototyping to scaling across diverse environments.
44

5-
## 1. Introduction
5+
## Introduction
66

77
NeMo Run CLI simplifies experiment management by leveraging Python's capabilities:
88

@@ -65,15 +65,15 @@ def train():
6565
- **Typer**: General-purpose CLIs with good documentation that don't require nested configuration
6666
- **argparse**: Simple scripts with minimal configuration needs and standard library requirements
6767

68-
## 2. Core Concepts
68+
## Core Concepts
6969

7070
- **Entrypoints**: Python functions decorated with `@run.cli.entrypoint` serving as primary CLI commands.
7171
- **Factories**: Functions decorated with `@run.cli.factory` that configure complex objects (e.g., models, optimizers).
7272
- **Partials**: Reusable, partially configured functions enabling flexible experiment definitions.
7373
- **Experiments**: Groups of tasks executed sequentially or concurrently.
7474
- **RunContext**: Manages execution settings, including executor configurations.
7575

76-
## 3. Getting Started
76+
## Getting Started
7777

7878
### Example 1: Basic Entrypoint
7979

@@ -116,7 +116,7 @@ Output:
116116
Unknown argument 'epocks'. Did you mean 'epochs'?
117117
```
118118

119-
## 4. Advanced Configuration
119+
## Advanced Configuration
120120

121121
### Nested Configurations with Dataclasses
122122

@@ -213,25 +213,25 @@ File contents:
213213
│ 2 target = "main.train_model"
214214
│ 3 batch_size = 32
215215
│ 4 epochs = 10
216-
│ 5
216+
│ 5
217217
│ 6 [model]
218218
│ 7 target = "main.Model"
219219
│ 8 activation = "relu"
220220
│ 9 hidden_size = 256
221221
│ 10 num_layers = 5
222-
│ 11
222+
│ 11
223223
│ 12 [optimizer]
224224
│ 13 target = "main.Optimizer"
225225
│ 14 betas = [ 0.9, 0.999,]
226226
│ 15 learning_rate = 0.001
227227
│ 16 weight_decay = 1e-5
228-
│ 17
228+
│ 17
229229
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
230230
Export complete. Skipping execution.
231231
232232
```
233233

234-
## 5. Executors
234+
## Executors
235235

236236
Executors determine where your code runs, such as local environments, Docker containers, or Slurm clusters.
237237

@@ -283,7 +283,7 @@ def slurm_cluster() -> run.Executor:
283283
job_dir=BASE_DIR,
284284
container_image="nvcr.io/nvidia/nemo:dev",
285285
container_mounts=[
286-
f"/home/{USER}:/home/{USER}",
286+
f"/home/{USER}:/home/{USER}",
287287
"/lustre:/lustre",
288288
],
289289
time="4:00:00",
@@ -298,7 +298,7 @@ Execute lazily:
298298
python script.py --lazy model=alexnet epochs=5 run.executor=slurm_cluster run.executor.nodes=2
299299
```
300300

301-
## 6. Advanced CLI Features
301+
## Advanced CLI Features
302302

303303
### Dry Runs and Help Messages
304304

@@ -393,4 +393,3 @@ The help output clearly shows:
393393
5. Registered factory functions for each complex argument type
394394

395395
This makes it easy for users to discover what factory functions they can use to configure complex arguments like `model` and `optimizer`, along with information about where these factories are defined (module name and line number).
396-
Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -113,7 +113,10 @@ In our context, this is equivalent to:
113113
_target_: nemo.collections.llm.gpt.model.llama.Llama3Config8B
114114
seq_length: 16384
115115
```
116-
> Note: we've used the [Hydra instantiation](https://hydra.cc/docs/advanced/instantiate_objects/overview/) syntax here.
116+
117+
```{note}
118+
We've used the [Hydra instantiation](https://hydra.cc/docs/advanced/instantiate_objects/overview/) syntax here.
119+
```
117120

118121
Python operations are performed on the config rather than directly on the class. For example:
119122

Lines changed: 15 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -14,9 +14,13 @@ A tuple of task and executor form an execution unit. A key goal of NeMo-Run is t
1414
Once an execution unit is created, the next step is to run it. The `run.run` function executes a single task, whereas `run.Experiment` offers more fine-grained control to define complex experiments. `run.run` wraps `run.Experiment` with a single task. `run.Experiment` is an API to launch and manage multiple tasks all using pure Python.
1515
`run.Experiment` takes care of storing the run metadata, launching the run on the specified cluster, and syncing the logs. It also provides management tools to inspect and reproduce past experiments. `run.Experiment` is inspired by [xmanager](https://github.com/google-deepmind/xmanager/tree/main) and uses [TorchX](https://pytorch.org/torchx/latest/) under the hood to handle execution.
1616

17-
> **_NOTE:_** NeMo-Run assumes familiarity with Docker and uses a docker image as the environment for remote execution. This means you must provide a Docker image that includes all necessary dependencies and configurations when using a remote executor.
17+
```{note}
18+
NeMo-Run assumes familiarity with Docker and uses a Docker image as the environment for remote execution. This means you must provide a Docker image that includes all necessary dependencies and configurations when using a remote executor.
19+
```
1820

19-
> **_NOTE:_** All the experiment metadata is stored under `NEMORUN_HOME` env var on the machine where you launch the experiments. By default, the value for `NEMORUN_HOME` value is `~/.run`. Be sure to change this according to your needs.
21+
```{note}
22+
All experiment metadata is stored under the directory given by the `NEMORUN_HOME` environment variable on the machine where you launch the experiments. By default, `NEMORUN_HOME` is `~/.run`. Be sure to change this according to your needs.
23+
```
2024

2125
## Executors
2226
Executors are dataclasses that configure your remote executor and set up the packaging of your code. All supported executors inherit from the base class `run.Executor`, but have configuration parameters specific to their execution environment. There is an initial cost to understanding the specifics of your executor and setting it up, but this effort is easily amortized over time.
@@ -29,7 +33,9 @@ We support the following `launchers`:
2933
- `torchrun` or `run.Torchrun`: This will launch the task using `torchrun`. See the `Torchrun` class for configuration options. You can use it using `executor.launcher = "torchrun"` or `executor.launcher = Torchrun(...)`.
3034
- `ft` or `run.core.execution.FaultTolerance`: This will launch the task using NVIDIA's fault tolerant launcher. See the `FaultTolerance` class for configuration options. You can use it using `executor.launcher = "ft"` or `executor.launcher = FaultTolerance(...)`.
3135

32-
> **_NOTE:_** Launcher may not work very well with `run.Script`. Please report any issues at https://github.com/NVIDIA-NeMo/Run/issues.
36+
```{attention}
37+
Launcher may not work very well with `run.Script`. Please report any issues at [https://github.com/NVIDIA-NeMo/Run/issues](https://github.com/NVIDIA-NeMo/Run/issues).
38+
```
3339

3440
### Packagers
3541

@@ -65,7 +71,9 @@ Your working directory at the time of execution will look like:
6571
```
6672
If you're executing a Python function, this working directory will automatically be included in your Python path.
6773

68-
> **_NOTE:_** git archive doesn't package uncommitted changes. In the future, we may add support for including uncommitted changes while honoring `.gitignore`.
74+
```{note}
75+
Git archive doesn't package uncommitted changes. In the future, we may add support for including uncommitted changes while honoring `.gitignore`.
76+
```
6977

7078
`run.PatternPackager` is a packager that uses a pattern to package your code. It is useful for packaging code that is not under version control. For example, if you have a directory structure like this:
7179
```
@@ -228,7 +236,9 @@ As demonstrated in the examples, defining executors in Python offers great flexi
228236

229237
The `DGXCloudExecutor` integrates with a DGX Cloud cluster's Run:ai API to launch distributed jobs. It uses REST API calls to authenticate, identify the target project and cluster, and submit the job specification.
230238

231-
> **_WARNING:_** Currently, the `DGXCloudExecutor` is only supported when launching experiments *from* a pod running on the DGX Cloud cluster itself. Furthermore, this launching pod must have access to a Persistent Volume Claim (PVC) where the experiment/job directories will be created, and this same PVC must also be configured to be mounted by the job being launched.
239+
```{warning}
240+
Currently, the `DGXCloudExecutor` is only supported when launching experiments *from* a pod running on the DGX Cloud cluster itself. Furthermore, this launching pod must have access to a Persistent Volume Claim (PVC) where the experiment/job directories will be created, and this same PVC must also be configured to be mounted by the job being launched.
241+
```
232242

233243
Here's an example configuration:
234244

Lines changed: 6 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
1-
Guides
2-
=================
1+
# Guides
32

4-
```{toctree}
3+
4+
:::{toctree}
55
:maxdepth: 2
66
:hidden:
77

@@ -11,7 +11,7 @@ execution
1111
management
1212
ray
1313
cli
14-
```
14+
:::
1515

1616
Welcome to the NeMo-Run guides! This section provides comprehensive documentation on how to use NeMo-Run effectively for your machine learning experiments.
1717

@@ -36,7 +36,7 @@ For more advanced usage:
3636
NeMo-Run is built around three core responsibilities:
3737

3838
1. **Configuration** - Define your ML experiments using a flexible, Pythonic configuration system.
39-
2. **Execution** - Run your experiments seamlessly across local machines, Slurm clusters, cloud providers, and more.
40-
3. **Management** - Track, reproduce, and organize your experiments with built-in experiment management.
39+
1. **Execution** - Run your experiments seamlessly across local machines, Slurm clusters, cloud providers, and more.
40+
1. **Management** - Track, reproduce, and organize your experiments with built-in experiment management.
4141

4242
Each guide dives deep into these concepts with practical examples and best practices. Choose a guide above to get started!
Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,9 @@ exp = Experiment("My Experiment")
1212

1313
When executed, it will automatically generate a unique experiment ID for you, which represents one unique run of the experiment.
1414

15-
> [!NOTE] > `Experiment` is a context manager and `Experiment.add` and `Experiment.run` methods can currently only be used after entering the context manager.
15+
```{note}
16+
`Experiment` is a context manager and `Experiment.add` and `Experiment.run` methods can currently only be used after entering the context manager.
17+
```
1618

1719
## Add Tasks
1820

@@ -73,7 +75,7 @@ You can check the status of an experiment using the `status` method:
7375
exp.status()
7476
```
7577

76-
This method will display information about the status of each task in the experiment. The following is a sample output from the status of experiment in [hello_scripts.py](../../../examples/hello-world/hello_scripts.py):
78+
This method displays information about the status of each task in the experiment. The following is a sample output from the status of the experiment in [hello_scripts.py](../../examples/hello-world/hello_scripts.py):
7779

7880
```bash
7981
Experiment Status for experiment_with_scripts_1730761155
File renamed without changes.
File renamed without changes.
