Contributor

Copilot AI commented Sep 5, 2025

This PR implements comprehensive GitHub Copilot instructions for the NVIDIA NeMo Curator repository, incorporating the complete API design architecture, framework overview, and task-centric patterns for Python 3.12 with optional CUDA 12 GPU environments.

Key Features

Complete API Design Architecture:

  • Task-Centric Architecture: Fault tolerance requirements for Xenna preemption handling with comprehensive design principles
  • Map-style (Data-Parallel) Execution: Flexible input/output task mappings allowing X to Y transformations where both can be Task | list[Task] | None
  • Core Component Interfaces: Complete specifications for Task, ProcessingStage, Pipeline, and BaseExecutor with proper Generic typing
  • Resource Specification: GPU memory allocation, NVDEC/NVENC support, and multi-GPU resource management
  • Backend Implementations: Xenna and Ray Data executors with adapter pattern for backend translation
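
To make the map-style contract above concrete, here is a minimal, self-contained sketch. It reuses the component names from the list (Task, ProcessingStage, Pipeline, BaseExecutor), but it is not the actual nemo_curator API: every signature, plus the `SplitWords`, `Upper`, and `LocalExecutor` classes, is an assumption made for illustration. For brevity, only the output side is modeled as `Task | list[Task] | None`; each stage receives a single input task.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Generic, TypeVar

T = TypeVar("T")  # payload type carried by a task


@dataclass
class Task(Generic[T]):
    """Hypothetical unit of work: an id plus a payload."""
    task_id: str
    data: T


X = TypeVar("X", bound=Task)
Y = TypeVar("Y", bound=Task)


class ProcessingStage(ABC, Generic[X, Y]):
    """Hypothetical stage: maps one task to zero, one, or many tasks."""

    @abstractmethod
    def process(self, task: X) -> Y | list[Y] | None: ...


class SplitWords(ProcessingStage[Task[str], Task[str]]):
    """Fan-out example: one task per word, or None for empty input."""

    def process(self, task):
        words = task.data.split()
        if not words:
            return None  # the input task is dropped
        return [Task(f"{task.task_id}-{i}", w) for i, w in enumerate(words)]


class Upper(ProcessingStage[Task[str], Task[str]]):
    """One-to-one example."""

    def process(self, task):
        return Task(task.task_id, task.data.upper())


@dataclass
class Pipeline:
    stages: list[ProcessingStage] = field(default_factory=list)

    def add_stage(self, stage: ProcessingStage) -> Pipeline:
        self.stages.append(stage)
        return self


class BaseExecutor(ABC):
    @abstractmethod
    def execute(self, pipeline: Pipeline, tasks: list[Task]) -> list[Task]: ...


class LocalExecutor(BaseExecutor):
    """Assumed in-process executor; a real backend would distribute this loop."""

    def execute(self, pipeline, tasks):
        current = list(tasks)
        for stage in pipeline.stages:
            produced = []
            for task in current:
                out = stage.process(task)
                if out is None:
                    continue
                produced.extend(out if isinstance(out, list) else [out])
            current = produced
        return current


pipeline = Pipeline().add_stage(SplitWords()).add_stage(Upper())
results = LocalExecutor().execute(pipeline, [Task("a", "hello world"), Task("b", "   ")])
print([t.data for t in results])  # ['HELLO', 'WORLD']
```

Normalizing every stage result into a flat list inside the executor is one way to let stages return `None`, a single task, or a list without the pipeline needing to know which.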

Framework Overview:

  • 📦 Key Components: Detailed documentation of Tasks, Stages, Backends/Executors, and Pipeline components with specific directory structures
  • 🔄 Execution Flow: Complete 7-step process from pipeline definition to result collection
  • 🎯 Enhanced Design Principles: Backend agnostic, type safety, scalability, extensibility, performance, and composability
  • 💡 Multi-Backend Usage Patterns: Examples showing execution on different Ray backends
  • 🔧 Development Guidelines: Specific guidance for task types, stage implementation, backend development, and testing

Development Environment Setup:

  • Python 3.12 as the recommended and tested version (3.10 and 3.12 are supported; 3.11 is excluded due to RAPIDS compatibility)
  • CUDA 12.x setup for optional GPU acceleration with proper resource specification patterns
  • Complete uv package manager workflow with dependency groups and virtual environment management
  • Technology Stack Documentation: PyTorch, Ray, Transformers, RAPIDS with modality-specific libraries
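
As a rough illustration of the version constraint in the list above, a guard like the following could be run before setting up the environment. The function name `is_supported_python` and the constant `SUPPORTED_MINORS` are hypothetical and do not come from the repository; the allowed set simply mirrors the versions stated above.

```python
import sys

# Assumption taken from the list above: 3.10 and 3.12 are allowed, 3.11 is not.
SUPPORTED_MINORS = {10, 12}


def is_supported_python(version=None):
    """Return True if the (major, minor, ...) tuple is an allowed Python 3 release."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and minor in SUPPORTED_MINORS


if not is_supported_python():
    print(f"Warning: Python {sys.version_info.major}.{sys.version_info.minor} is outside the tested set")
```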

Code Quality and Documentation:

  • Markdown Linting Compliance: Fixed all MD022, MD032, MD031, and MD040 formatting issues
  • Design Principles: Task-centric architecture with fault tolerance and map-style execution patterns
  • Testing Framework: GPU testing support with pytest markers and resource detection
  • Code Quality Standards: Ruff linting configuration and type hint requirements
  • File Structure Conventions: Modality-based organization with complete component directory mapping
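
For readers unfamiliar with MD040 (fenced code blocks should have a language specified), the sketch below shows the idea behind that rule. It is a simplified stand-in, not markdownlint itself; `fences_missing_language` is a hypothetical helper, and it only handles top-level fences indented by at most three spaces.

```python
import re

# Opening or closing fence: up to 3 spaces, then 3+ backticks or tildes, then an optional info string.
FENCE = re.compile(r"^(\s{0,3})(`{3,}|~{3,})(.*)$")


def fences_missing_language(markdown: str) -> list[int]:
    """Return 1-based line numbers of opening fences that have no language tag."""
    missing = []
    open_fence = None  # (fence character, fence length) while inside a block
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        match = FENCE.match(line)
        if not match:
            continue
        marker, info = match.group(2), match.group(3).strip()
        if open_fence is None:
            open_fence = (marker[0], len(marker))
            if not info:
                missing.append(lineno)
        elif marker[0] == open_fence[0] and len(marker) >= open_fence[1] and not info:
            open_fence = None  # matching closing fence
    return missing


fence = "`" * 3  # built at runtime so this example contains no nested fence
sample = "\n".join(["# Title", "", fence, "no language", fence, "", fence + "python", "print(1)", fence])
print(fences_missing_language(sample))  # [3]
```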

Recent Updates:

  • Added PyTorch to text processing libraries section
  • Removed unsupported GPU fallback patterns
  • Updated pipeline examples to follow API design patterns
  • Added comprehensive API design information from api-design.md
  • Removed all Dask references as requested
  • Updated task-centric architecture description to mention Ray foundation
  • Added framework overview with key components and execution flow
  • Enhanced design principles with backend agnostic capabilities
  • Added development guidelines for different component types
  • Updated file structure with complete component directory mapping
  • Fixed all markdown linting issues: Addressed MD022 (headings spacing), MD032 (lists spacing), MD031 (code blocks spacing), and MD040 (code language specification) for full markdownlint compliance

The instructions provide GitHub Copilot with comprehensive context about the repository's task-based architecture and multi-backend framework capabilities, enabling better code suggestions that align with the fault-tolerant, resource-aware design patterns required for heterogeneous computing environments.

Fixes #981.



@copy-pr-bot

copy-pr-bot bot commented Sep 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copilot AI and others added 2 commits September 5, 2025 17:10
Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Copilot AI changed the title from "[WIP] ✨ Set up Copilot instructions" to "✨ Set up Copilot instructions for Python 3.12 and CUDA 12 development environment" Sep 5, 2025
Copilot AI requested a review from VibhuJawa September 5, 2025 17:12
Contributor

@VibhuJawa VibhuJawa left a comment

Please follow the API design for changes (https://github.com/NVIDIA-NeMo/Curator/blob/main/api-design.md) and look at existing modules, such as the ones here: https://github.com/NVIDIA-NeMo/Curator/tree/main/nemo_curator/stages/text

@VibhuJawa
Contributor

@copilot please look at PR Review and make changes

…terns, update API examples

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Contributor Author

Copilot AI commented Sep 5, 2025

@copilot please look at PR Review and make changes

Addressed all review feedback: added PyTorch to text processing libraries, removed unsupported GPU fallback patterns, updated pipeline examples to follow the API design patterns, and replaced configuration examples with actual task-centric workflow examples. (8fff4c5)

Copilot AI changed the title from "✨ Set up Copilot instructions for Python 3.12 and CUDA 12 development environment" to "✨ Set up Copilot instructions following API design patterns with Python 3.12 and CUDA 12" Sep 5, 2025
Copilot AI requested a review from VibhuJawa September 5, 2025 18:06
Contributor

@VibhuJawa VibhuJawa left a comment

Minor feedback to add more stuff from our design doc

@VibhuJawa
Contributor

@copilot , please address the review about api level design

…design.md

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Copilot AI changed the title from "✨ Set up Copilot instructions following API design patterns with Python 3.12 and CUDA 12" to "✨ Set up comprehensive Copilot instructions with API design patterns for Python 3.12 and CUDA 12" Sep 5, 2025
Copilot AI requested a review from VibhuJawa September 5, 2025 18:25
…rom api-design.md

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Contributor Author

Copilot AI commented Sep 5, 2025

@copilot , please address the review about api level design

Added comprehensive API design information from api-design.md including background context explaining the transition from Dask to Ray, design rationale, tradeoffs, and expanded design principles section with current state and examples references. (fa16d70)

Copilot AI changed the title from "✨ Set up comprehensive Copilot instructions with API design patterns for Python 3.12 and CUDA 12" to "✨ Set up comprehensive Copilot instructions with complete API design architecture for Python 3.12 and CUDA 12" Sep 5, 2025
Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Contributor

@VibhuJawa VibhuJawa left a comment

LGTM

Contributor

@praateekmahajan praateekmahajan left a comment

LGTM

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
@VibhuJawa VibhuJawa enabled auto-merge (squash) September 5, 2025 19:54
Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com>
Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
Contributor Author

Copilot AI commented Sep 5, 2025

@copilot, please address the Copilot reviews

Fixed all markdown linting issues in the copilot instructions file. Addressed MD022 (headings spacing), MD032 (lists spacing), MD031 (code blocks spacing), and MD040 (code language specification) violations. The file now passes markdownlint validation with the repository's configuration. (aad5c86)

@VibhuJawa
Contributor

/ok to test 1ef6dd1

Copilot AI requested a review from VibhuJawa September 5, 2025 19:58
@sarahyurick sarahyurick added the r1.0.0 label (Pick this label for auto cherry-picking into r1.0.0) Sep 5, 2025
@VibhuJawa VibhuJawa merged commit 95bd0db into main Sep 5, 2025
14 of 15 checks passed
chtruong814 pushed a commit that referenced this pull request Sep 5, 2025
…architecture and framework overview for Python 3.12 and CUDA 12 (#1031)

* Initial plan

* Add comprehensive GitHub Copilot instructions for NVIDIA NeMo Curator

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Enhance Copilot instructions with explicit Python 3.12 and CUDA 12 focus

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Address review feedback: add PyTorch, remove unsupported fallback patterns, update API examples

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Add comprehensive API design principles and core components from api-design.md

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Expand API design section with comprehensive background and context from api-design.md

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Remove Dask references and background sections per review feedback

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Update task-centric architecture description to mention Ray

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Add comprehensive framework overview with key components, execution flow, and development guidelines

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Addressing PR comments

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

* Update .github/copilot-instructions.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

---------

Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Co-authored-by: Vibhu Jawa <vjawa@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
VibhuJawa added a commit that referenced this pull request Sep 5, 2025
…architecture and framework overview for Python 3.12 and CUDA 12 (#1031) (#1035)

* Initial plan

* Add comprehensive GitHub Copilot instructions for NVIDIA NeMo Curator

* Enhance Copilot instructions with explicit Python 3.12 and CUDA 12 focus

* Address review feedback: add PyTorch, remove unsupported fallback patterns, update API examples

* Add comprehensive API design principles and core components from api-design.md

* Expand API design section with comprehensive background and context from api-design.md

* Remove Dask references and background sections per review feedback

* Update task-centric architecture description to mention Ray

* Add comprehensive framework overview with key components, execution flow, and development guidelines

* Addressing PR comments

* Update .github/copilot-instructions.md

---------

Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Co-authored-by: Vibhu Jawa <vjawa@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
jnke2016 pushed a commit to jnke2016/Curator that referenced this pull request Nov 12, 2025
…architecture and framework overview for Python 3.12 and CUDA 12 (NVIDIA-NeMo#1031)


Labels

community-request, r1.0.0 (Pick this label for auto cherry-picking into r1.0.0)

Development

Successfully merging this pull request may close these issues.

✨ Set up Copilot instructions

5 participants