Fastvit integration #93

Open
RanaZay wants to merge 13 commits into EvolvingLMMs-Lab:main from RanaZay:fastvit-integration

Conversation


@RanaZay commented Jan 8, 2026

No description provided.

…sues

- Updated alignment.sh: Added conda activation, set CUDA paths to 12.1, configured for GPU 7 only
- Modified stage_1_alignment_llava_ov_4b.sh: Changed TP from 2 to 1, updated checkpoint path to tp1_pp1
- Fixed Makefile: Added a check to skip recompilation if helpers_cpp .so files already exist
- Updated utils.py: Added a pre-compilation check to avoid unnecessary builds (see the sketch after this list)
- Added print statements for debugging in dataloader_provider.py
- Added print statements in qwen2vl_task_encoder.py for tracing preprocessing
- Added debugging prints in llavaov_1_5_provider.py
- Added print statements in train.py, megatron_trainer.py, and other training files
- These changes were made while exploring the codebase
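
A minimal sketch of the pre-compilation guard described above, assuming a `make`-driven build of the C++ helpers; the function name and directory handling here are illustrative, not the repository's actual code:

```python
import subprocess
from pathlib import Path

def compile_helpers_if_needed(helpers_dir: str) -> None:
    """Run `make` for the C++ helpers only when no compiled .so is present."""
    if list(Path(helpers_dir).glob("*.so")):
        # Compiled extensions already exist; skip the rebuild.
        return
    subprocess.run(["make", "-C", helpers_dir], check=True)
```
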
- Add FastViT model implementation (mobileclip_l_384) in aiak_training_llm/models/fastvit/
- Update LlavaOnevision1_5 model to use FastViT encoder
- Add FastViT preprocessing in qwen2vl_task_encoder.py
- Add --use-fastvit and related command-line arguments (see the argparse sketch after this list)
- Add checkpoint conversion scripts for FastVLM
- Update training configs for 2-GPU setup (TP=2)
- Add .gitignore entries for checkpoints and training outputs
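
A hedged sketch of how the new flags might be registered; only `--use-fastvit` is named in the commit message, so `add_fastvit_args` and the companion `--fastvit-variant` flag are illustrative assumptions:

```python
import argparse

def add_fastvit_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register FastViT-related flags on an existing parser (hypothetical helper)."""
    group = parser.add_argument_group("fastvit")
    group.add_argument("--use-fastvit", action="store_true",
                       help="Use the FastViT vision encoder instead of the default one.")
    # Hypothetical companion flag; the PR's actual argument names may differ.
    group.add_argument("--fastvit-variant", default="mobileclip_l_384",
                       help="FastViT/MobileCLIP variant to load.")
    return parser

parser = add_fastvit_args(argparse.ArgumentParser())
args = parser.parse_args(["--use-fastvit"])
```
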
@yiyexy requested review from anxiangsir and Copilot on January 8, 2026 at 08:53

Copilot AI left a comment


Pull request overview

This PR integrates FastVit functionality into the Apex library by adding extensive CUDA/cuDNN-accelerated operations for convolutional neural networks, batch normalization, gradient clipping, and RNN implementations. The changes introduce new modules for fused operations, optimized convolution paths, and automatic mixed precision training support.

  • Adds cuDNN-accelerated batch normalization and convolution-bias-relu fusion operations
  • Implements gradient clipping utilities with fused CUDA kernels
  • Introduces bottleneck layer implementations with spatial parallelism support
  • Adds a comprehensive AMP (Automatic Mixed Precision) framework with optimizer integration (a usage sketch follows below)
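
For context, typical usage of the `apex.amp` entry points this framework exposes; `amp.initialize` and `amp.scale_loss` are Apex's documented API, while the toy model and optimizer are stand-ins, and the snippet needs Apex and a CUDA device to run:

```python
import torch
from apex import amp  # requires NVIDIA Apex to be installed

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

# O1 patches eligible torch functions to cast inputs to FP16;
# model weights themselves stay in FP32.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

loss = model(torch.randn(8, 1024, device="cuda")).sum()
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()  # backward pass on the loss-scaled graph
optimizer.step()
```
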

Reviewed changes

Copilot reviewed 99 out of 499 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| apex/apex/contrib/csrc/cudnn_gbn/cudnn_gbn.cpp | Implements group batch normalization forward/backward with NCCL peer communication |
| apex/apex/contrib/csrc/conv_bias_relu/conv_bias_relu.cpp | Provides fused convolution-bias-relu operations using the cuDNN frontend API |
| apex/apex/contrib/conv_bias_relu/conv_bias_relu.py | Python wrapper for fused conv-bias-relu autograd functions |
| apex/apex/contrib/conv_bias_relu/__init__.py | Exports conv-bias-relu function variants |
| apex/apex/contrib/clip_grad/clip_grad.py | Implements optimized gradient norm clipping with multi-tensor operations |
| apex/apex/contrib/clip_grad/__init__.py | Exports the gradient clipping function |
| apex/apex/contrib/bottleneck/test.py | Test suite for the bottleneck layer implementation |
| apex/apex/contrib/bottleneck/halo_exchangers.py | Implements halo exchange patterns for spatial parallelism |
| apex/apex/contrib/bottleneck/bottleneck.py | Bottleneck and SpatialBottleneck layer implementations with frozen batch norm (see the frozen batch norm sketch after this table) |
| apex/apex/contrib/bottleneck/__init__.py | Exports bottleneck classes and halo exchangers |
| apex/apex/amp/wrap.py | Provides function wrapping utilities for automatic casting |
| apex/apex/amp/utils.py | Utility functions for tensor type checking and casting |
| apex/apex/amp/scaler.py | Loss scaling implementation for mixed precision training |
| apex/apex/amp/rnn_compat.py | RNN compatibility layer for different PyTorch versions |
| apex/apex/amp/opt.py | Optimizer wrapper for AMP integration |
| apex/apex/amp/lists/torch_overrides.py | Function lists for torch module automatic casting |
| apex/apex/amp/lists/tensor_overrides.py | Function lists for tensor method automatic casting |
| apex/apex/amp/lists/functional_overrides.py | Function lists for torch.nn.functional automatic casting |
| apex/apex/amp/handle.py | AMP handle implementation for managing mixed precision state |
| apex/apex/amp/frontend.py | User-facing API for AMP initialization and configuration |
| apex/apex/amp/compat.py | PyTorch version compatibility utilities |
| apex/apex/amp/amp.py | Core AMP implementation with function patching |
| apex/apex/amp/_process_optimizer.py | Optimizer processing for master weights and gradient scaling |
| apex/apex/amp/_initialize.py | Model and optimizer initialization for AMP |
| apex/apex/amp/_amp_state.py | Global state management for AMP |
| apex/apex/amp/__version__.py | Version information |
| apex/apex/amp/__init__.py | AMP module exports |
| apex/apex/amp/README.md | Documentation for AMP user annotations |
| apex/apex/_autocast_utils.py | Utilities for PyTorch autocast integration |
| apex/apex/__init__.py | Main package initialization with logging and deprecation warnings |
| apex/apex/RNN/models.py | RNN model factory functions (LSTM, GRU, etc.) |
| apex/apex/RNN/cells.py | Custom RNN cell implementations including mLSTM |
| apex/apex/RNN/__init__.py | RNN module exports |
| apex/apex/RNN/RNNBackend.py | Backend implementation for bidirectional and stacked RNNs |
| apex/apex/RNN/README.md | Deprecation notice for the RNN module |
| apex/README.md | Comprehensive documentation for Apex library features |
| apex/LICENSE | BSD 3-Clause license text |
| apex/.gitmodules | Git submodule configuration for cutlass and cudnn-frontend |


```cpp
DEBUG_CUDNN_MSG(log_buf, knobs.begin()->describe());
}

// Createmplacee the requisite engine config
```

Copilot AI Jan 8, 2026


Corrected spelling of 'Createmplacee' to 'Create'.

Suggested change:

```diff
-// Createmplacee the requisite engine config
+// Create the requisite engine config
```

- Add debug prints in FastViT forward pass (mci.py, mobileclip_encoder.py, fastvit_vision_model.py)
- Update GPU configuration to use 2 GPUs (TP=2) in alignment.sh
- Add comprehensive code documentation and comments
- Update training configuration for FastViT image processing
- Add FastViT preprocessing path in qwen2vl_task_encoder.py (see the preprocessing sketch after this list)
- Update megatron_core SOURCES.txt
- Update stage 1 alignment script configuration
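
A hedged sketch of the kind of preprocessing branch these commits describe: the 384-pixel target follows from the `mobileclip_l_384` variant name, but the interpolation mode and normalization statistics below are placeholder assumptions, not values taken from the PR:

```python
from torchvision import transforms

# Assumed 384px input for the mobileclip_l_384 FastViT encoder; the
# normalization statistics here are placeholders, not the PR's values.
FASTVIT_IMAGE_SIZE = 384

fastvit_preprocess = transforms.Compose([
    transforms.Resize((FASTVIT_IMAGE_SIZE, FASTVIT_IMAGE_SIZE),
                      interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.ToTensor(),  # HWC uint8 -> CHW float in [0, 1]
    transforms.Normalize(mean=[0.0, 0.0, 0.0], std=[1.0, 1.0, 1.0]),
])

def preprocess_image(image, use_fastvit: bool, default_preprocess):
    """Route an image through the FastViT path when --use-fastvit is set."""
    if use_fastvit:
        return fastvit_preprocess(image)
    return default_preprocess(image)
```
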