NVIDIA-NeMo
diff --git a/.gitignore b/.gitignore
Lines changed: 1 addition & 0 deletions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
Lines changed: 30 additions & 11 deletions
diff --git a/docs/Makefile b/docs/Makefile
Lines changed: 0 additions & 20 deletions
diff --git a/docs/make.bat b/docs/make.bat
Lines changed: 0 additions & 35 deletions
diff --git a/docs/source/conf.py b/docs/source/conf.py
Lines changed: 31 additions & 12 deletions
diff --git a/docs/source/faqs.md b/docs/source/faqs.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/source/guides/management.md b/docs/source/guides/management.md
Lines changed: 35 additions & 31 deletions
@@ -71,6 +71,7 @@ instance/
 
 # Sphinx documentation
 docs/_build/
+docs/source/api/
 
 # PyBuilder
 .pybuilder/
 
@@ -17,6 +17,7 @@ Not all steps are necessary for some contributions, so read the linked sections
   - [Whom should you ask for review:](#whom-should-you-ask-for-review)
 
 ## General principles
+
 1. **User-oriented**: make it easy for end users, even at the cost of writing more code in the background
 1. **Robust**: make it hard for users to make mistakes.
 1. **Reusable**: for every piece of code, think about how it can be reused in the future and make it easy to be reused.
@@ -25,6 +26,7 @@ Not all steps are necessary for some contributions, so read the linked sections
 1. **Sensible**: code should make sense. If you think a piece of code might be confusing, write comments.
 
 ## Environment Setup
+
 We use [uv](https://docs.astral.sh/uv/) to develop NeMo Run. The following steps should get you started with the dev environment:
 
 1. Install [uv](https://docs.astral.sh/uv/getting-started/installation/)
@@ -34,16 +36,19 @@ We use [uv](https://docs.astral.sh/uv/) to develop NeMo Run. The following steps
 If all tests passed, then you should be good to get started with the development of NeMo-Run.
 
 ## Code Structure
+
 The repository is home to flexible Python modules, sample scripts, tests, and more.
 Here is a brief overview of where everything lives:
+
 - [docker](docker/) - Dockerfiles to build NeMo with NeMo Run.
 - [docs](docs/) - Walkthroughs and guides the library.
 - [examples](examples/) - Examples for how users may want to use NeMo Run.
 - [src](src/) -
-    - [nemo_run](src/nemo_run/) - The source code for NeMo Run.
+  - [nemo_run](src/nemo_run/) - The source code for NeMo Run.
 - [test](test/) - Unit tests.
 
 ## Examples and Documentation
+
 Examples provide an easy way for users to see how the NeMo Run works in action.
 They should be incredibly lightweight and rely mostly on `nemo_run` for their functionality
 Most should be designed for a user to get up and running on their local machines, but examples on remote clusters are welcomed if it makes sense.
@@ -54,35 +59,49 @@ It should include both an explanation of the API, and how it's used in its corre
 The documentation should also cover potential pitfalls and caveats.
 This existing examples and documentation should serve as a good reference to what is expected.
 
+### Building Documentation
+
+Run the following commands to switch to the project documentation folder and generate HTML output.
+
+```sh
+cd docs/
+uv run --group docs sphinx-build source/ _build/html
+```
+
+The resulting HTML files are generated in a `_build/html` folder created under the project `docs/` folder.
+
 ## Python style
-We use [``ruff``](https://docs.astral.sh/ruff/) for linting and formatting. To lint and format your code, you can run `uv run --group lint -- ruff check` and `uv run --group lint -- ruff format` respectively.
+
+We use [`ruff`](https://docs.astral.sh/ruff/) for linting and formatting. To lint and format your code, you can run `uv run --group lint -- ruff check` and `uv run --group lint -- ruff format` respectively.
 
 ## Unit tests
+
 Unit tests should be simple and fast.
 Developers should be able to run them frequently while developing without any slowdown.
 
 ## Pull Requests (PR) Guidelines
 
 **Send your PRs to the `main` or `dev` branch**
 
-1) Make sure your PR does one thing. Have a clear answer to "What does this PR do?".
-2) Read General Principles and style guide above
-3) Make sure you sign off your commits. E.g. use ``git commit --signoff`` when committing.
-4) Make sure all unit tests finish successfully before sending PR
-5) Send your PR and request a review
+1. Make sure your PR does one thing. Have a clear answer to "What does this PR do?".
+2. Read General Principles and style guide above
+3. Make sure you sign off your commits. E.g. use `git commit --signoff` when committing.
+4. Make sure all unit tests finish successfully before sending PR
+5. Send your PR and request a review
 
 The `dev` branch is for active development and may be unstable. Unit tests are expected to pass before merging into `dev` or `main`.
 Every release `dev` and `main` will sync to be the same.
 
 ## Sign Your Work
 
-* We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
+- We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.
+
+  - Any contribution which contains commits that are not Signed-Off will not be accepted.
 
-  * Any contribution which contains commits that are not Signed-Off will not be accepted.
+- To sign off on a commit you simply use the `--signoff` option when committing your changes:
 
-* To sign off on a commit you simply use the `--signoff` option when committing your changes:
   ```bash
-  $ git commit --signoff -m "Add cool feature."
+  git commit --signoff -m "Add cool feature."
   ```
 
 ## Full text of the DCO:
 
@@ -6,8 +6,13 @@
 # -- Project information -----------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#project-information
 
+import sys
+from pathlib import Path
+
+sys.path.append(str(Path("../../examples").resolve()))
+
 project = "NeMo-Run"
-copyright = "2024, NVIDIA"
+copyright = "2025, NVIDIA"
 author = "NVIDIA"
 release = "0.1.0"
 
@@ -16,6 +21,7 @@
 
 extensions = [
     "myst_parser",
+    "autoapi.extension",
     "sphinx.ext.autodoc",
     "sphinx.ext.doctest",
     "sphinx.ext.intersphinx",
@@ -29,16 +35,30 @@
 ]
 
 templates_path = ["_templates"]
-exclude_patterns = []
+exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]
 python_maximum_signature_line_length = 88
-autodoc_default_options = {
-    "member-order": "bysource",
-    "special-members": "__init__",
-    "undoc-members": True,
-    "exclude-members": "__weakref__",
-}
-autodoc_class_signature = "separated"
-autodoc_inherit_docstrings = True
+
+# Autoapi settings
+autoapi_generate_api_docs = True
+autoapi_keep_files = False
+autoapi_add_toctree_entry = False
+autoapi_type = "python"
+autoapi_dirs = ["../../nemo_run"]
+autoapi_file_pattern = "*.py"
+autoapi_root = "api"
+autoapi_options = [
+    "members",
+    "show-inheritance",
+    "show-module-summary",
+    "special-members",
+    "imported-members",
+]
+
+# Autodoc settings
+autodoc_typehints = "signature"
+
+# MyST settings
+myst_heading_anchors = 3
 
 # Napoleon settings
 napoleon_google_docstring = True
@@ -56,5 +76,4 @@
 # -- Options for HTML output -------------------------------------------------
 # https://www.sphinx-doc.org/en/master/usage/configuration.html#options-for-html-output
 
-html_theme = "alabaster"
-# html_static_path = ["_static"]
+html_theme = "nvidia_sphinx_theme"
@@ -84,7 +84,7 @@ def control_flow_config() -> run.Config[llm.PreTrainingDataModule]:
 
 ### **Q:** I made a change locally in my git repo and tested it using the local executor. However, the change is not reflected in the remote job.
 
-**A**: This is most likely because you haven't committed the changes. See details about `GitArchivePackager` [here](./execution.md#packagers) to learn more.
+**A**: This is most likely because you haven't committed the changes. See details about `GitArchivePackager` [here](guides/execution.md#packagers) to learn more.
 
 ### **Q:** I made a change locally _outside_ my git repo and tested it using the local executor. However, the change is not reflected in the remote job.
 
 
@@ -2,40 +2,42 @@
 
 The central component for management of tasks in NeMo-Run is the `Experiment` class. It allows you to define, launch, and manage complex workflows consisting of multiple tasks. This guide provides an overview of the `Experiment` class, its methods, and how to use it effectively.
 
-**Creating an Experiment**
----------------------------
+## **Creating an Experiment**
 
 To create an experiment, you can instantiate the `Experiment` class by passing in a descriptive title:
+
 ```python
 exp = Experiment("My Experiment")
 ```
+
 When executed, it will automatically generate a unique experiment ID for you, which represents one unique run of the experiment.
 
-> [!NOTE]
-> `Experiment` is a context manager and `Experiment.add` and `Experiment.run` methods can currently only be used after entering the context manager.
+> [!NOTE] > `Experiment` is a context manager and `Experiment.add` and `Experiment.run` methods can currently only be used after entering the context manager.
 
-**Adding Tasks**
------------------
+## **Adding Tasks**
 
 You can add tasks to an experiment using the `add` method. This method supports tasks of the following kind:
 
-1. A single task which is an instance of either `run.Partial` or `run.Script`, along with its executor.
-```python
-with exp:
-    exp.add(task_1, executor=run.LocalExecutor())
-```
+- A single task which is an instance of either `run.Partial` or `run.Script`, along with its executor.
 
-2. A list of tasks, each of which is an instance of either `run.Partial` or `run.Script`, along with a single executor or a list of executors for each task in the group. Currently, all tasks in the group will be executed in parallel.
-```python
-with exp:
-    exp.add([task_2, task_3], executor=run.DockerExecutor(...))
-```
+  ```python
+  with exp:
+      exp.add(task_1, executor=run.LocalExecutor())
+  ```
+
+- A list of tasks, each of which is an instance of either `run.Partial` or `run.Script`, along with a single executor or a list of executors for each task in the group. Currently, all tasks in the group will be executed in parallel.
+
+  ```python
+  with exp:
+      exp.add([task_2, task_3], executor=run.DockerExecutor(...))
+  ```
 
 You can specify a descriptive name for the task using the `name` keyword argument.
 
 `add` also takes in a list of plugins, each an instance of `run.Plugin`. Plugins are used to make changes to the task and executor together, which is useful in some cases - for example, to enable a config option in the task and set an environment variable in the executor related to the config option.
 
 `add` returns a unique id for the task/job. This unique id can be used to define complex dependencies between a group of tasks as follows:
+
 ```python
 with run.Experiment("dag-experiment", log_level="INFO") as exp:
     id1 = exp.add([inline_script, inline_script_sleep], tail_logs=False, name="task-1")
@@ -48,30 +50,31 @@ with run.Experiment("dag-experiment", log_level="INFO") as exp:
    )
 ```
 
-**Launching an Experiment**
----------------------------
+## **Launching an Experiment**
 
 Once you have added all tasks to an experiment, you can launch it using the `run` method. This method takes several optional arguments, including `detach`, `sequential`, and `tail_logs` and `direct`:
 
-* `detach`: If `True`, the experiment will detach from the process executing it. This is useful when launching an experiment on a remote cluster, where you may want to end the process after scheduling the tasks in that experiment.
-* `sequential`: If `True`, all tasks will be executed sequentially. This is only applicable when the individual tasks do not have any dependencies on each other.
-* `tail_logs`: If `True`, logs will be displayed in real-time.
-* `direct`: If `True`, each task in the experiment will be executed directly in the same process on your local machine. This does not support task/job groups.
+- `detach`: If `True`, the experiment will detach from the process executing it. This is useful when launching an experiment on a remote cluster, where you may want to end the process after scheduling the tasks in that experiment.
+- `sequential`: If `True`, all tasks will be executed sequentially. This is only applicable when the individual tasks do not have any dependencies on each other.
+- `tail_logs`: If `True`, logs will be displayed in real-time.
+- `direct`: If `True`, each task in the experiment will be executed directly in the same process on your local machine. This does not support task/job groups.
 
 ```python
 with exp:
     # Add all tasks
     exp.run(detach=True, sequential=False, tail_logs=True, direct=False)
 ```
 
-**Experiment Status**
----------------------
+## **Experiment Status**
 
 You can check the status of an experiment using the `status` method:
+
 ```python
 exp.status()
 ```
+
 This method will display information the status of each task in the experiment. The following is a sample output from the status of experiment in [hello_scripts.py](../../../examples/hello-world/hello_scripts.py):
+
 ```bash
 Experiment Status for experiment_with_scripts_1730761155
 
@@ -94,24 +97,24 @@ Task 2: simple.add.add_object
 - Local Directory: /home/your_user/.nemo_run/experiments/experiment_with_scripts/experiment_with_scripts_1730761155/simple.add.add_object
 ```
 
-**Canceling a Task**
----------------------
+## **Canceling a Task**
 
 You can cancel a task using the `cancel` method:
+
 ```python
 exp.cancel("task_id")
 ```
 
-**Viewing Logs**
------------------
+## **Viewing Logs**
 
 You can view the logs of a task using the `logs` method:
+
 ```python
 exp.logs("task_id")
 ```
 
-**Experiment output**
------------------
+## **Experiment output**
+
 Once an experiment is run, NeMo-Run displays information on ways to inspect and reproduce past experiments. This allows you to check logs, sync artifacts (in the future), cancel running tasks, and rerun an old experiment.
 
 ```python
@@ -129,6 +132,7 @@ nemorun experiment status experiment_with_scripts_1720556256
 nemorun experiment logs experiment_with_scripts_1720556256 0
 nemorun experiment cancel experiment_with_scripts_1720556256 0
 ```
+
 This information is specific to each experiment on how to manage it.
 
-See [this notebook](examples/hello-world/hello_experiments.ipynb) for more details and a playable experience.
+See [this notebook](../../../examples/hello-world/hello_experiments.ipynb) for more details and a playable experience.
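
The two link changes in this diff (in `docs/source/faqs.md` and `docs/source/guides/management.md`) depend on how a relative Markdown link resolves against the directory of the file that contains it. The following is a minimal sketch of that resolution using only the Python standard library. It assumes plain relative-path semantics, which may differ from what a particular Sphinx or MyST setup does, and the `resolve_link` helper is hypothetical: it is not part of NeMo-Run, Sphinx, or MyST.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative Markdown link against the file that contains it.

    Hypothetical helper for illustration only. Any `#anchor` suffix is
    dropped, and paths are treated as repo-relative POSIX paths.
    """
    target = link.split("#", 1)[0]  # strip the heading anchor, if any
    base_dir = posixpath.dirname(source_file)  # directory of the linking file
    return posixpath.normpath(posixpath.join(base_dir, target))


if __name__ == "__main__":
    # Old and new link targets taken from the hunks above.
    cases = [
        ("docs/source/faqs.md", "./execution.md#packagers"),
        ("docs/source/faqs.md", "guides/execution.md#packagers"),
        ("docs/source/guides/management.md",
         "examples/hello-world/hello_experiments.ipynb"),
        ("docs/source/guides/management.md",
         "../../../examples/hello-world/hello_experiments.ipynb"),
    ]
    for source_file, link in cases:
        print(f"{source_file}: {link} -> {resolve_link(source_file, link)}")
```

Under these assumptions, the old `faqs.md` link points at `docs/source/execution.md`, while the new one points into `docs/source/guides/`. Likewise, the old notebook link in `management.md` stays inside `docs/source/guides/`, whereas the new one climbs three levels to the repository-root `examples/` folder.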