✨ Set up comprehensive Copilot instructions with complete API design architecture and framework overview for Python 3.12 and CUDA 12 #1031

Copilot · 2025-09-05T17:04:53Z

This PR implements comprehensive GitHub Copilot instructions for the NVIDIA NeMo Curator repository, incorporating the complete API design architecture, framework overview, and task-centric patterns for Python 3.12 with optional CUDA 12 GPU environments.

Key Features

Complete API Design Architecture:

Task-Centric Architecture: Fault tolerance requirements for Xenna preemption handling with comprehensive design principles
Map-style (Data-Parallel) Execution: Flexible input/output task mappings allowing X to Y transformations where both can be Task | list[Task] | None
Core Component Interfaces: Complete specifications for Task, ProcessingStage, Pipeline, and BaseExecutor with proper Generic typing
Resource Specification: GPU memory allocation, NVDEC/NVENC support, and multi-GPU resource management
Backend Implementations: Xenna and Ray Data executors with adapter pattern for backend translation
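
To illustrate the map-style contract described above, here is a minimal sketch, not the repository's actual classes. The `Task` fields, the `process` method name, and the `DropEmpty`, `SplitPerItem`, and `run_stage` helpers are all assumptions made for illustration. For brevity each stage takes a single input task, and only the output side uses the `Task | list[Task] | None` shape.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Generic, TypeVar


@dataclass
class Task:
    """Hypothetical unit of work passed between stages."""

    task_id: str
    data: list[str] = field(default_factory=list)


X = TypeVar("X", bound=Task)
Y = TypeVar("Y", bound=Task)


class ProcessingStage(ABC, Generic[X, Y]):
    """Hypothetical stage: maps one task to zero, one, or many tasks."""

    @abstractmethod
    def process(self, task: X) -> Y | list[Y] | None: ...


class DropEmpty(ProcessingStage[Task, Task]):
    """Filter-style stage: returning None drops the task."""

    def process(self, task: Task) -> Task | None:
        return task if task.data else None


class SplitPerItem(ProcessingStage[Task, Task]):
    """Fan-out stage: one input task becomes one task per item."""

    def process(self, task: Task) -> list[Task]:
        return [Task(f"{task.task_id}-{i}", [item]) for i, item in enumerate(task.data)]


def run_stage(stage: ProcessingStage, tasks: list[Task]) -> list[Task]:
    """Apply a stage to each task independently and flatten the results."""
    out: list[Task] = []
    for task in tasks:
        result = stage.process(task)
        if result is None:
            continue
        out.extend(result if isinstance(result, list) else [result])
    return out
```

Because each task is processed independently, a backend could rerun a single failed or preempted task without touching the others, which is the property the fault-tolerance bullet relies on.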

Framework Overview:

📦 Key Components: Detailed documentation of Tasks, Stages, Backends/Executors, and Pipeline components with specific directory structures
🔄 Execution Flow: Complete 7-step process from pipeline definition to result collection
🎯 Enhanced Design Principles: Backend agnostic, type safety, scalability, extensibility, performance, and composability
💡 Multi-Backend Usage Patterns: Examples showing execution on different Ray backends
🔧 Development Guidelines: Specific guidance for task types, stage implementation, backend development, and testing
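
The backend-agnostic principle can be sketched as follows, assuming a pipeline only holds an ordered list of stages and delegates all execution to an executor. The method names (`add_stage`, `run`, `execute`) and the `LocalExecutor` class are hypothetical stand-ins. A real Xenna or Ray Data executor would translate the same stage list into its own backend instead of looping in-process.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from collections.abc import Callable
from dataclasses import dataclass, field


class BaseExecutor(ABC):
    """Hypothetical executor interface: runs stages over tasks on some backend."""

    @abstractmethod
    def execute(self, stages: list[Callable], tasks: list) -> list: ...


@dataclass
class Pipeline:
    """Hypothetical pipeline: an ordered list of stages, with no backend logic."""

    name: str
    stages: list[Callable] = field(default_factory=list)

    def add_stage(self, stage: Callable) -> Pipeline:
        self.stages.append(stage)
        return self

    def run(self, executor: BaseExecutor, initial_tasks: list) -> list:
        # The pipeline never inspects the backend; it only hands over its stages.
        return executor.execute(self.stages, initial_tasks)


class LocalExecutor(BaseExecutor):
    """In-process stand-in for a distributed backend, for illustration only."""

    def execute(self, stages: list[Callable], tasks: list) -> list:
        current = list(tasks)
        for stage in stages:
            produced = []
            for task in current:
                result = stage(task)
                if result is None:
                    continue  # the stage dropped this task
                produced.extend(result if isinstance(result, list) else [result])
            current = produced
        return current
```

Swapping backends then means passing a different executor to `run`, while the pipeline definition stays unchanged.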

Development Environment Setup:

Python 3.12 as the recommended and tested version (3.10 and 3.12 are supported; 3.11 is excluded due to RAPIDS compatibility)
CUDA 12.x setup for optional GPU acceleration with proper resource specification patterns
Complete uv package manager workflow with dependency groups and virtual environment management
Technology Stack Documentation: PyTorch, Ray, Transformers, RAPIDS with modality-specific libraries
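
As a rough sketch of what a resource specification pattern could look like, the dataclass below models a per-stage request. The field names and the rule that a fractional (memory-based) request and a whole-GPU request are mutually exclusive are assumptions made for illustration, not the repository's confirmed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """Hypothetical per-stage resource request."""

    cpus: float = 1.0
    gpu_memory_gb: float = 0.0  # request part of one GPU, expressed as memory
    gpus: float = 0.0           # request whole GPUs, e.g. for multi-GPU stages
    nvdecs: int = 0             # hardware video decoders
    nvencs: int = 0             # hardware video encoders

    def __post_init__(self) -> None:
        values = (self.cpus, self.gpu_memory_gb, self.gpus, self.nvdecs, self.nvencs)
        if min(values) < 0:
            raise ValueError("resource amounts must be non-negative")
        # Assumed rule: pick one way of asking for GPU capacity, not both.
        if self.gpus > 0 and self.gpu_memory_gb > 0:
            raise ValueError("specify either gpus or gpu_memory_gb, not both")

    @property
    def requires_gpu(self) -> bool:
        return any((self.gpus, self.gpu_memory_gb, self.nvdecs, self.nvencs))
```

A CPU-only stage would keep the defaults, while a GPU stage might declare something like `Resources(cpus=2, gpu_memory_gb=8)`.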

Code Quality and Documentation:

Markdown Linting Compliance: Fixed all MD022, MD032, MD031, and MD040 formatting issues
Design Principles: Task-centric architecture with fault tolerance and map-style execution patterns
Testing Framework: GPU testing support with pytest markers and resource detection
Code Quality Standards: Ruff linting configuration and type hint requirements
File Structure Conventions: Modality-based organization with complete component directory mapping
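
The GPU testing idea can be sketched with the standard library alone. This example swaps in `unittest.skipUnless` where the instructions describe pytest markers, and the detection heuristic (checking `CUDA_VISIBLE_DEVICES` and the presence of `nvidia-smi`) is an assumption, not the repository's actual detection logic.

```python
import os
import shutil
import unittest


def gpu_available(env=None, which=shutil.which) -> bool:
    """Heuristic: a GPU is usable if it is not hidden and nvidia-smi exists."""
    env = os.environ if env is None else env
    # An empty or "-1" CUDA_VISIBLE_DEVICES hides every device from CUDA.
    if env.get("CUDA_VISIBLE_DEVICES") in ("", "-1"):
        return False
    return which("nvidia-smi") is not None


# Hypothetical decorator; unittest.skipUnless stands in for a pytest marker.
requires_gpu = unittest.skipUnless(gpu_available(), "no CUDA GPU detected")


class TestGpuStage(unittest.TestCase):
    @requires_gpu
    def test_runs_on_gpu(self) -> None:
        self.assertTrue(gpu_available())
```

The `env` and `which` parameters exist only so the detection logic can itself be tested on a machine without a GPU.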

Recent Updates:

The instructions provide GitHub Copilot with comprehensive context about the repository's task-based architecture and multi-backend framework capabilities, enabling better code suggestions that align with the fault-tolerant, resource-aware design patterns required for heterogeneous computing environments.

Fixes #981.

copy-pr-bot · 2025-09-05T17:04:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

VibhuJawa

Please follow the API design for changes (https://github.com/NVIDIA-NeMo/Curator/blob/main/api-design.md) and look at existing modules, such as the ones here: https://github.com/NVIDIA-NeMo/Curator/tree/main/nemo_curator/stages/text

.github/copilot-instructions.md

VibhuJawa · 2025-09-05T18:00:01Z

@copilot please look at PR Review and make changes

…terns, update API examples Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

Copilot · 2025-09-05T18:06:02Z

@copilot please look at PR Review and make changes

Addressed all review feedback: added PyTorch to text processing libraries, removed unsupported GPU fallback patterns, updated pipeline examples to follow the API design patterns, and replaced configuration examples with actual task-centric workflow examples. (8fff4c5)

VibhuJawa

Minor feedback to add more stuff from our design doc

.github/copilot-instructions.md

VibhuJawa · 2025-09-05T18:23:01Z

@copilot , please address the review about api level design

…design.md Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

…rom api-design.md Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

Copilot · 2025-09-05T18:28:24Z

@copilot , please address the review about api level design

Added comprehensive API design information from api-design.md including background context explaining the transition from Dask to Ray, design rationale, tradeoffs, and expanded design principles section with current state and examples references. (fa16d70)

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

VibhuJawa

LGTM

praateekmahajan

LGTM

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

.github/copilot-instructions.md

Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

.github/copilot-instructions.md

Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

Copilot · 2025-09-05T19:58:04Z

@copilot , Please address co pilot reviews

Fixed all markdown linting issues in the copilot instructions file. Addressed MD022 (headings spacing), MD032 (lists spacing), MD031 (code blocks spacing), and MD040 (code language specification) violations. The file now passes markdownlint validation with the repository's configuration. (aad5c86)

VibhuJawa · 2025-09-05T19:58:29Z

/ok to test 1ef6dd1

…architecture and framework overview for Python 3.12 and CUDA 12 (#1031)

* Initial plan
* Add comprehensive GitHub Copilot instructions for NVIDIA NeMo Curator
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Enhance Copilot instructions with explicit Python 3.12 and CUDA 12 focus
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Address review feedback: add PyTorch, remove unsupported fallback patterns, update API examples
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Add comprehensive API design principles and core components from api-design.md
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Expand API design section with comprehensive background and context from api-design.md
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Remove Dask references and background sections per review feedback
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Update task-centric architecture description to mention Ray
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Add comprehensive framework overview with key components, execution flow, and development guidelines
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Addressing PR comments
  Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
* Update .github/copilot-instructions.md
  Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
  Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

---------

Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Co-authored-by: Vibhu Jawa <vjawa@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>

…architecture and framework overview for Python 3.12 and CUDA 12 (#1031) (#1035)

* Initial plan
* Add comprehensive GitHub Copilot instructions for NVIDIA NeMo Curator
* Enhance Copilot instructions with explicit Python 3.12 and CUDA 12 focus
* Address review feedback: add PyTorch, remove unsupported fallback patterns, update API examples
* Add comprehensive API design principles and core components from api-design.md
* Expand API design section with comprehensive background and context from api-design.md
* Remove Dask references and background sections per review feedback
* Update task-centric architecture description to mention Ray
* Add comprehensive framework overview with key components, execution flow, and development guidelines
* Addressing PR comments
* Update .github/copilot-instructions.md

---------

Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>
Signed-off-by: NeMo Bot <nemo-bot@nvidia.com>
Co-authored-by: Copilot <198982749+Copilot@users.noreply.github.com>
Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>
Co-authored-by: Vibhu Jawa <vjawa@nvidia.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Initial plan

560b62e

Copilot AI assigned Copilot and VibhuJawa Sep 5, 2025

Copilot started work on behalf of VibhuJawa September 5, 2025 17:05

github-actions bot added the community-request label Sep 5, 2025

Copilot AI and others added 2 commits September 5, 2025 17:10

Add comprehensive GitHub Copilot instructions for NVIDIA NeMo Curator

d53a7ca

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

Enhance Copilot instructions with explicit Python 3.12 and CUDA 12 focus

6d9e461

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

Copilot AI changed the title from "[WIP] ✨ Set up Copilot instructions" to "✨ Set up Copilot instructions for Python 3.12 and CUDA 12 development environment" Sep 5, 2025

Copilot finished work on behalf of VibhuJawa September 5, 2025 17:12

Copilot AI requested a review from VibhuJawa September 5, 2025 17:12

VibhuJawa requested changes Sep 5, 2025

.github/copilot-instructions.md (outdated, resolved)

.github/copilot-instructions.md (outdated, resolved)

.github/copilot-instructions.md (outdated, resolved)

.github/copilot-instructions.md (resolved)

Copilot started work on behalf of VibhuJawa September 5, 2025 18:00

Address review feedback: add PyTorch, remove unsupported fallback patterns, update API examples

8fff4c5

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

Copilot AI changed the title from "✨ Set up Copilot instructions for Python 3.12 and CUDA 12 development environment" to "✨ Set up Copilot instructions following API design patterns with Python 3.12 and CUDA 12" Sep 5, 2025

Copilot AI requested a review from VibhuJawa September 5, 2025 18:06

Copilot finished work on behalf of VibhuJawa September 5, 2025 18:06

VibhuJawa requested changes Sep 5, 2025

.github/copilot-instructions.md (resolved)

Copilot started work on behalf of VibhuJawa September 5, 2025 18:22

Add comprehensive API design principles and core components from api-design.md

df71d98

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

Copilot AI changed the title from "✨ Set up Copilot instructions following API design patterns with Python 3.12 and CUDA 12" to "✨ Set up comprehensive Copilot instructions with API design patterns for Python 3.12 and CUDA 12" Sep 5, 2025

Copilot AI requested a review from VibhuJawa September 5, 2025 18:25

Copilot finished work on behalf of VibhuJawa September 5, 2025 18:25

Copilot started work on behalf of VibhuJawa September 5, 2025 18:26

Expand API design section with comprehensive background and context from api-design.md

d47c07e

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

Copilot AI changed the title from "✨ Set up comprehensive Copilot instructions with API design patterns for Python 3.12 and CUDA 12" to "✨ Set up comprehensive Copilot instructions with complete API design architecture for Python 3.12 and CUDA 12" Sep 5, 2025

Addressing PR comments

29e0001

Co-authored-by: VibhuJawa <4837571+VibhuJawa@users.noreply.github.com>

Copilot AI requested a review from VibhuJawa September 5, 2025 19:40

Copilot finished work on behalf of VibhuJawa September 5, 2025 19:40

Copilot started work on behalf of VibhuJawa September 5, 2025 19:40

Merge branch 'main' into copilot/fix-981-2

02ae7ee

Copilot finished work on behalf of VibhuJawa September 5, 2025 19:48

Copilot started work on behalf of VibhuJawa September 5, 2025 19:48

VibhuJawa approved these changes Sep 5, 2025

praateekmahajan approved these changes Sep 5, 2025

Update .github/copilot-instructions.md

3cad997

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

VibhuJawa enabled auto-merge (squash) September 5, 2025 19:54

sarahyurick reviewed Sep 5, 2025

.github/copilot-instructions.md (outdated, resolved)

.github/copilot-instructions.md (outdated, resolved)

Apply suggestion from @sarahyurick

614029a

Co-authored-by: Sarah Yurick <53962159+sarahyurick@users.noreply.github.com> Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

VibhuJawa reviewed Sep 5, 2025

.github/copilot-instructions.md (outdated, resolved)

.github/copilot-instructions.md (outdated, resolved)

Update .github/copilot-instructions.md

1ef6dd1

Signed-off-by: Vibhu Jawa <vibhujawa@gmail.com>

Copilot AI requested a review from VibhuJawa September 5, 2025 19:58

Copilot finished work on behalf of VibhuJawa September 5, 2025 19:58

copy-pr-bot bot temporarily deployed to test September 5, 2025 19:58 Inactive

copy-pr-bot bot temporarily deployed to nemo-ci September 5, 2025 19:58 Inactive

Copilot started work on behalf of VibhuJawa September 5, 2025 19:58

Copilot finished work on behalf of VibhuJawa September 5, 2025 20:02

copy-pr-bot bot had a problem deploying to nemo-ci September 5, 2025 20:14 Failure

sarahyurick added the r1.0.0 label (Pick this label for auto cherry-picking into r1.0.0) Sep 5, 2025

VibhuJawa merged commit 95bd0db into main Sep 5, 2025
14 of 15 checks passed
