Fastvit integration by RanaZay · Pull Request #93 · EvolvingLMMs-Lab/LLaVA-OneVision-1.5

RanaZay · 2026-01-08T08:24:24Z

No description provided.

…sues
- Updated alignment.sh: Added conda activation, set CUDA paths to 12.1, configured for GPU 7 only
- Modified stage_1_alignment_llava_ov_4b.sh: Changed TP from 2 to 1, updated checkpoint path to tp1_pp1
- Fixed Makefile: Added check to skip recompilation if helpers_cpp .so files already exist
- Updated utils.py: Added pre-compilation check to avoid unnecessary builds
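
The pre-compilation check described in this commit message could look roughly like the sketch below. It is an illustration, not the code in this PR: the function name `needs_build` and the glob pattern are assumptions, and only the `helpers_cpp` stem comes from the commit message.

```python
import tempfile
from pathlib import Path


def needs_build(build_dir, stem="helpers_cpp"):
    """Return True when no compiled extension matching `stem` exists in build_dir.

    Hypothetical helper: the name and signature are not taken from the PR.
    """
    # Compiled extensions usually carry a platform tag, for example
    # helpers_cpp.cpython-310-x86_64-linux-gnu.so, so match with a glob.
    return not any(Path(build_dir).glob(f"{stem}*.so"))


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    before = needs_build(tmp)  # empty directory: a build is required
    (Path(tmp) / "helpers_cpp.cpython-310-x86_64-linux-gnu.so").touch()
    after = needs_build(tmp)   # artifact present: the build can be skipped

print(before, after)  # True False
```

A caller would run the compile step only when `needs_build(...)` returns True. A check like this skips rebuilds even when the sources have changed, so a stale `.so` has to be deleted by hand.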

- Added print statements for debugging in dataloader_provider.py
- Added print statements in qwen2vl_task_encoder.py for tracing preprocessing
- Added debugging prints in llavaov_1_5_provider.py
- Added print statements in train.py, megatron_trainer.py, and other training files
- These changes were made during codebase exploration and understanding

- Add FastViT model implementation (mobileclip_l_384) in aiak_training_llm/models/fastvit/
- Update LlavaOnevision1_5 model to use FastViT encoder
- Add FastViT preprocessing in qwen2vl_task_encoder.py
- Add --use-fastvit and related command-line arguments
- Add checkpoint conversion scripts for FastVLM
- Update training configs for 2-GPU setup (TP=2)
- Add .gitignore entries for checkpoints and training outputs
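
As a rough illustration of the `--use-fastvit` switch mentioned above, the sketch below registers it with `argparse`. Only the flag name `--use-fastvit` and the model name `mobileclip_l_384` appear in the commit message. The helper `add_fastvit_args`, the companion flag `--fastvit-model-name`, and its default are hypothetical stand-ins for the "related" arguments, which this page does not list.

```python
import argparse


def add_fastvit_args(parser):
    """Register FastViT options on an existing parser (hypothetical helper)."""
    group = parser.add_argument_group("fastvit")
    # Flag named in the commit message: switches the vision encoder to FastViT.
    group.add_argument(
        "--use-fastvit",
        action="store_true",
        help="Use the FastViT vision encoder.",
    )
    # Assumed companion flag, not confirmed by the PR.
    group.add_argument(
        "--fastvit-model-name",
        type=str,
        default="mobileclip_l_384",
        help="FastViT variant to build (hypothetical argument).",
    )
    return parser


parser = add_fastvit_args(argparse.ArgumentParser())
args = parser.parse_args(["--use-fastvit"])
print(args.use_fastvit, args.fastvit_model_name)  # True mobileclip_l_384
```

Because the flag uses `store_true`, it defaults to False, so existing launch scripts that omit it keep their current encoder.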

Copilot

Pull request overview

This PR integrates FastVit functionality into the Apex library by adding extensive CUDA/cuDNN-accelerated operations for convolutional neural networks, batch normalization, gradient clipping, and RNN implementations. The changes introduce new modules for fused operations, optimized convolution paths, and automatic mixed precision training support.

- Adds cuDNN-accelerated batch normalization and convolution-bias-relu fusion operations
- Implements gradient clipping utilities with fused CUDA kernels
- Introduces bottleneck layer implementations with spatial parallelism support
- Adds comprehensive AMP (Automatic Mixed Precision) framework with optimizer integration

Reviewed changes

Copilot reviewed 99 out of 499 changed files in this pull request and generated 1 comment.

File	Description
apex/apex/contrib/csrc/cudnn_gbn/cudnn_gbn.cpp	Implements group batch normalization forward/backward with NCCL peer communication
apex/apex/contrib/csrc/conv_bias_relu/conv_bias_relu.cpp	Provides fused convolution-bias-relu operations using cuDNN frontend API
apex/apex/contrib/conv_bias_relu/conv_bias_relu.py	Python wrapper for fused conv-bias-relu autograd functions
apex/apex/contrib/conv_bias_relu/__init__.py	Exports conv-bias-relu function variants
apex/apex/contrib/clip_grad/clip_grad.py	Implements optimized gradient norm clipping with multi-tensor operations
apex/apex/contrib/clip_grad/__init__.py	Exports gradient clipping function
apex/apex/contrib/bottleneck/test.py	Test suite for bottleneck layer implementation
apex/apex/contrib/bottleneck/halo_exchangers.py	Implements halo exchange patterns for spatial parallelism
apex/apex/contrib/bottleneck/bottleneck.py	Bottleneck and SpatialBottleneck layer implementations with frozen batch norm
apex/apex/contrib/bottleneck/__init__.py	Exports bottleneck classes and halo exchangers
apex/apex/amp/wrap.py	Provides function wrapping utilities for automatic casting
apex/apex/amp/utils.py	Utility functions for tensor type checking and casting
apex/apex/amp/scaler.py	Loss scaling implementation for mixed precision training
apex/apex/amp/rnn_compat.py	RNN compatibility layer for different PyTorch versions
apex/apex/amp/opt.py	Optimizer wrapper for AMP integration
apex/apex/amp/lists/torch_overrides.py	Function lists for torch module automatic casting
apex/apex/amp/lists/tensor_overrides.py	Function lists for tensor method automatic casting
apex/apex/amp/lists/functional_overrides.py	Function lists for torch.nn.functional automatic casting
apex/apex/amp/handle.py	AMP handle implementation for managing mixed precision state
apex/apex/amp/frontend.py	User-facing API for AMP initialization and configuration
apex/apex/amp/compat.py	PyTorch version compatibility utilities
apex/apex/amp/amp.py	Core AMP implementation with function patching
apex/apex/amp/_process_optimizer.py	Optimizer processing for master weights and gradient scaling
apex/apex/amp/_initialize.py	Model and optimizer initialization for AMP
apex/apex/amp/_amp_state.py	Global state management for AMP
apex/apex/amp/__version__.py	Version information
apex/apex/amp/__init__.py	AMP module exports
apex/apex/amp/README.md	Documentation for AMP user annotations
apex/apex/_autocast_utils.py	Utilities for PyTorch autocast integration
apex/apex/__init__.py	Main package initialization with logging and deprecation warnings
apex/apex/RNN/models.py	RNN model factory functions (LSTM, GRU, etc.)
apex/apex/RNN/cells.py	Custom RNN cell implementations including mLSTM
apex/apex/RNN/__init__.py	RNN module exports
apex/apex/RNN/RNNBackend.py	Backend implementation for bidirectional and stacked RNNs
apex/apex/RNN/README.md	Deprecation notice for RNN module
apex/README.md	Comprehensive documentation for Apex library features
apex/LICENSE	BSD 3-Clause license text
apex/.gitmodules	Git submodule configuration for cutlass and cudnn-frontend

Copilot · 2026-01-08T08:53:55Z

apex/apex/contrib/csrc/conv_bias_relu/conv_bias_relu.cpp

+      DEBUG_CUDNN_MSG(log_buf, knobs.begin()->describe());
+    }
+
+    // Createmplacee the requisite engine config


Corrected spelling of 'Createmplacee' to 'Create'.

Suggested change

-    // Createmplacee the requisite engine config
+    // Create the requisite engine config

- Add debug prints in FastViT forward pass (mci.py, mobileclip_encoder.py, fastvit_vision_model.py)
- Update GPU configuration to use 2 GPUs (TP=2) in alignment.sh
- Add comprehensive code documentation and comments
- Update training configuration for FastViT image processing
- Add FastViT preprocessing path in qwen2vl_task_encoder.py

RanaZay added 10 commits December 24, 2025 16:22

Save: commit all local changes

fc13f35

Add TE-free attention path and training fixes

8588f2f

Track Stage1/alignment.sh in outer repo (de-nest Stage1)

4692f43

Track apex in outer repo (de-nest apex submodule)

f46815e

Prune old stage_1_alignment logs and tensorboard events; keep latest …

a816816

…artifacts untracked

Add AMD/ROCm alignment launcher (alignment_rocm.sh)

b1429d0

Make ROCm launcher path-relative with conda fallback and quick-start …

fbd311d

…validation

yiyexy requested review from anxiangsir and Copilot January 8, 2026 08:53

Copilot AI reviewed Jan 8, 2026

RanaZay added 3 commits January 9, 2026 13:45

Update configuration and metadata files

46cb9ad

- Update megatron_core SOURCES.txt
- Update stage 1 alignment script configuration

Fix ROCm compatibility: add PYTHONPATH and local_files_only for token…

2da41ab

…izer/processor loading

RanaZay wants to merge 13 commits into EvolvingLMMs-Lab:main from RanaZay:fastvit-integration
