Single Run DDP Example Script #9

Open
LeoRoccoBreedt wants to merge 37 commits into main from lb/pytorch_ddp

@LeoRoccoBreedt LeoRoccoBreedt commented Apr 10, 2025

Description

  • Added a script to demonstrate how to create a Neptune Scale run object from rank zero when using DDP for a single run
  • Tests for DDP examples are commented out in the legacy Neptune examples. Since our current tests have no GPU setup, it might be a good idea to do the same here in the meantime.
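
The rank-zero pattern from the first bullet can be sketched roughly as follows. This is a minimal illustration, not the code in train_ddp_single_run.py: `RankZeroLogger`, `make_run`, and `FakeRun` are hypothetical names, and the sketch assumes the launcher (for example torchrun) exports a `RANK` environment variable to each process.

```python
import os


def get_rank(env=None):
    """Return the global rank from the RANK variable, defaulting to 0."""
    env = os.environ if env is None else env
    return int(env.get("RANK", "0"))


class RankZeroLogger:
    """Hypothetical wrapper: creates the run on rank 0 only; other ranks no-op."""

    def __init__(self, make_run, rank):
        # make_run is any zero-argument factory returning an object with
        # log_metrics(data=..., step=...) and close() methods.
        self.run = make_run() if rank == 0 else None

    def log_metrics(self, data, step):
        if self.run is not None:
            self.run.log_metrics(data=data, step=step)

    def close(self):
        if self.run is not None:
            self.run.close()


class FakeRun:
    """Stand-in for a real run object so the sketch runs without credentials."""

    def __init__(self):
        self.logged = []
        self.closed = False

    def log_metrics(self, data, step):
        self.logged.append((step, dict(data)))

    def close(self):
        self.closed = True


if __name__ == "__main__":
    logger = RankZeroLogger(FakeRun, rank=get_rank())
    logger.log_metrics({"train/loss": 0.5}, step=1)
    logger.close()
```

In a real script, `make_run` would presumably construct the Neptune Scale run, so that one run object exists per job rather than one per process.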

Related to: <ClickUp/JIRA task name>

Any expected test failures?

  • RuntimeError: use_libuv was requested but PyTorch was build without libuv support
  • Apparently torch does not support Python 3.13 on some operating systems
  • Distributed training on Windows and macOS may be problematic
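
One way to keep these expected failures out of CI would be a small guard that decides whether to run the DDP example at all. The sketch below is only an illustration: `ddp_skip_reason` is a hypothetical helper, and its conditions are assumptions taken from the list above rather than verified compatibility data.

```python
import sys


def ddp_skip_reason(platform=None, version_info=None):
    """Return a reason to skip the DDP example test, or None to run it.

    The conditions are assumptions drawn from the expected-failure list,
    not verified compatibility data.
    """
    platform = sys.platform if platform is None else platform
    version_info = sys.version_info if version_info is None else version_info

    if tuple(version_info[:2]) >= (3, 13):
        return "torch may not be installable on Python 3.13 for this OS"
    if platform.startswith("win") or platform == "darwin":
        return "distributed training may be problematic on Windows and macOS"
    return None


if __name__ == "__main__":
    print(ddp_skip_reason() or "DDP test would run")
```

A test runner could call the helper once and skip the DDP script whenever it returns a non-empty reason.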

Add a [X] to relevant checklist items

❔ This change

  • adds a new feature
  • fixes breaking code
  • is cosmetic (refactoring/reformatting)

✔️ Pre-merge checklist

  • Refactored code (sourcery)
  • Tested code locally
  • Precommit installed and run before pushing changes
  • Added code to GitHub tests (notebooks, scripts)
  • Updated GitHub README
  • Updated the projects overview page on Notion

🧪 Test Configuration

  • OS: Windows
  • Python version: 3.12
  • Neptune version: 0.11.3
  • Affected libraries with version: neptune-scale, torch

Note

Introduces a DDP training example and aligns docs/CI.

  • Adds how-to-guides/ddp-training/scripts/ with train_ddp_single_run.py (PyTorch DDP MNIST, logs configs/metrics to neptune-scale from rank 0), run_examples.sh (runs via torchrun), and requirements.txt
  • Updates README.md with Neptune Scale-focused examples and links; adds entry for DDP training scripts
  • Updates CI (.github/workflows/pull-request.yml) to ignore **/ddp-training/** when selecting changed script dirs
  • Extends .gitignore to exclude /integrations-and-supported-tools/pytorch/data

Written by Cursor Bugbot for commit b18a853. This will update automatically on new commits.
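
The CI change described in the note (ignoring **/ddp-training/** when selecting changed script directories) can be illustrated with a small filter. This is a sketch of the idea only, not the logic in pull-request.yml: `select_script_dirs` is a hypothetical helper, and the second path in the demo is made up.

```python
from pathlib import PurePosixPath


def select_script_dirs(changed_files, ignored_dir="ddp-training"):
    """Return sorted parent directories of changed files, skipping any path
    that has `ignored_dir` as one of its components."""
    dirs = set()
    for name in changed_files:
        path = PurePosixPath(name)
        if ignored_dir in path.parts:
            continue  # anything under a ddp-training directory is not tested
        dirs.add(str(path.parent))
    return sorted(dirs)


if __name__ == "__main__":
    changed = [
        "how-to-guides/ddp-training/scripts/train_ddp_single_run.py",
        "how-to-guides/hello-neptune/scripts/hello.py",  # hypothetical path
    ]
    print(select_script_dirs(changed))
```

Matching on path components rather than substrings keeps a directory such as `not-ddp-training-docs` from being excluded by accident.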

@LeoRoccoBreedt LeoRoccoBreedt added the enhancement label (New feature or request) Apr 10, 2025
@LeoRoccoBreedt LeoRoccoBreedt marked this pull request as ready for review April 10, 2025 13:57
@LeoRoccoBreedt LeoRoccoBreedt requested a review from a team April 10, 2025 13:57
@SiddhantSadangi SiddhantSadangi requested a review from Copilot April 14, 2025 13:24

Copilot AI left a comment

Copilot reviewed 4 out of 6 changed files in this pull request and generated 1 comment.

Files not reviewed (2)
  • how-to-guides/ddp-training/scripts/requirements.txt: Language not supported
  • how-to-guides/ddp-training/scripts/run_examples.sh: Language not supported
Comments suppressed due to low confidence (2)

how-to-guides/ddp-training/scripts/train_ddp_single_run.py:18

  • The function name 'create_dataloader_minst' appears to have a typo. Consider renaming it to 'create_dataloader_mnist' for consistency with the MNIST dataset.
def create_dataloader_minst(

how-to-guides/ddp-training/scripts/train_ddp_single_run.py:244

  • The tag 'Torch-MINST' seems to be a typo. Consider updating it to 'Torch-MNIST' to correctly reference the MNIST dataset.
run.add_tags(tags=["Torch-MINST", "ddp", "single-node", params["optimizer"]])

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Leo Breedt <101509998+LeoRoccoBreedt@users.noreply.github.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Signed-off-by: Leo Breedt <101509998+LeoRoccoBreedt@users.noreply.github.com>

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Signed-off-by: Leo Breedt <101509998+LeoRoccoBreedt@users.noreply.github.com>

Labels

enhancement New feature or request

2 participants