| 1 | +# Changelog |
| 2 | + |
| 3 | +<!-- Next changelog --> |
| 4 | +## NVIDIA NeMo Run 0.5.0 |
| 5 | + |
| 6 | + |
| 7 | +- Fix docs warnings [#271](https://github.com/NVIDIA-NeMo/Run/pull/271) |
| 8 | +- Fix docs build [#269](https://github.com/NVIDIA-NeMo/Run/pull/269) |
| 9 | +- Support overlapped srun commands in Slurm Ray [#263](https://github.com/NVIDIA-NeMo/Run/pull/263) |
| 10 | +- Refactor DGXC Lepton data mover: switch to BatchJob with auto cleanup and sleep after every run [#265](https://github.com/NVIDIA-NeMo/Run/pull/265) |
| 11 | +- ci: Fix nemo fw template ref after migrating to new org [#256](https://github.com/NVIDIA-NeMo/Run/pull/256) |
| 12 | +- Enable Nsys GPU device metrics [#257](https://github.com/NVIDIA-NeMo/Run/pull/257) |
| 13 | +- Sync job code in local tunnel for Slurm Ray job [#254](https://github.com/NVIDIA-NeMo/Run/pull/254) |
| 14 | +- Change the create dist job function to support creating a single node [#240](https://github.com/NVIDIA-NeMo/Run/pull/240) |
| 15 | +- Make job names match Run:ai requirements and make errors more descriptive [#255](https://github.com/NVIDIA-NeMo/Run/pull/255) |
| 16 | +- Support for %j in Slurm log retrieval [#252](https://github.com/NVIDIA-NeMo/Run/pull/252) |
| 17 | +- Add KubeRay tests for Ray APIs [#249](https://github.com/NVIDIA-NeMo/Run/pull/249) |
| 18 | +- Upgrade SkyPilot executor to 0.9.2 [#246](https://github.com/NVIDIA-NeMo/Run/pull/246) |
| 19 | +- Add user scoping for k8s backend and log level support for Ray APIs [#247](https://github.com/NVIDIA-NeMo/Run/pull/247) |
| 20 | +- Update to latest Lepton SDK [#248](https://github.com/NVIDIA-NeMo/Run/pull/248) |
| 21 | +- Add storage mount options to LeptonExecutor [#237](https://github.com/NVIDIA-NeMo/Run/pull/237) |
| 22 | +- Import guard k8s import in Ray Cluster and Job [#245](https://github.com/NVIDIA-NeMo/Run/pull/245) |
| 23 | +- Add RayJob and Slurm support for Ray APIs + integration with run.Experiment [#236](https://github.com/NVIDIA-NeMo/Run/pull/236) |
| 24 | +- ci: Enforce coverage [#238](https://github.com/NVIDIA-NeMo/Run/pull/238) |
| 25 | +- Fix bug with a CLI overwrite [#235](https://github.com/NVIDIA-NeMo/Run/pull/235) |
| 26 | +- Add LeptonExecutor support [#224](https://github.com/NVIDIA-NeMo/Run/pull/224) |
| 27 | +- Add cancel to Docker executor [#233](https://github.com/NVIDIA-NeMo/Run/pull/233) |
| 28 | +- Change default log wait timeout to 10s [#232](https://github.com/NVIDIA-NeMo/Run/pull/232) |
| 29 | +- Add RayCluster API with KubeRay support [#222](https://github.com/NVIDIA-NeMo/Run/pull/222) |
| 30 | +- Add sbatch network arg [#230](https://github.com/NVIDIA-NeMo/Run/pull/230) |
| 31 | +- chore: Update package info [#227](https://github.com/NVIDIA-NeMo/Run/pull/227) |
| 32 | +- Add support for job groups for local executor [#220](https://github.com/NVIDIA-NeMo/Run/pull/220) |
| 33 | +- Roll back get_underlying_types change + introduce extract_constituent [#223](https://github.com/NVIDIA-NeMo/Run/pull/223) |
| 34 | +- Fix some bugs for --lazy in CLI [#179](https://github.com/NVIDIA-NeMo/Run/pull/179) |
| 35 | +- Adding support for modern type-hints [#221](https://github.com/NVIDIA-NeMo/Run/pull/221) |
| 36 | +- Fix bug in CLI with calling a factory-fn inside a list [#214](https://github.com/NVIDIA-NeMo/Run/pull/214) |
| 37 | +- Handle more edge cases in --help [#219](https://github.com/NVIDIA-NeMo/Run/pull/219) |
| 38 | +- Add autogenerated API reference content to the documentation [#190](https://github.com/NVIDIA-NeMo/Run/pull/190) |
| 39 | +- Handle Callable in --help to fix nemo llm export --help error [#217](https://github.com/NVIDIA-NeMo/Run/pull/217) |
| 40 | +- Ensure job directory creation for various schedulers [#216](https://github.com/NVIDIA-NeMo/Run/pull/216) |
| 41 | +- Adding support for ForwardRef in CLI [#176](https://github.com/NVIDIA-NeMo/Run/pull/176) |
| 42 | +- Add additional debug to DGXC data mover [#215](https://github.com/NVIDIA-NeMo/Run/pull/215) |
| 43 | +- Handle ctx in entrypoint for experiment [#213](https://github.com/NVIDIA-NeMo/Run/pull/213) |
| 44 | +- zozhang/dgxc executor data mover [#206](https://github.com/NVIDIA-NeMo/Run/pull/206) |
| 45 | +- Add support for YAML, TOML & JSON [#182](https://github.com/NVIDIA-NeMo/Run/pull/182) |
| 46 | +- Add clean mode for experiment to avoid printing any NeMo-Run specific logs [#208](https://github.com/NVIDIA-NeMo/Run/pull/208) |
| 47 | +- Fix seed for torchrun [#209](https://github.com/NVIDIA-NeMo/Run/pull/209) |
| 48 | +- Support torchrun multi node on local executor [#143](https://github.com/NVIDIA-NeMo/Run/pull/143) |
| 49 | +- Add nsys filename param [#205](https://github.com/NVIDIA-NeMo/Run/pull/205) |
| 50 | +- Add DGXCloudExecutor docs and update execution guide [#192](https://github.com/NVIDIA-NeMo/Run/pull/192) |
| 51 | +- Add --cuda-event-trace=false to nsys command [#180](https://github.com/NVIDIA-NeMo/Run/pull/180) |
| 52 | + |
| 53 | + |