…artifacts untracked
…sues
- Updated alignment.sh: Added conda activation, set CUDA paths to 12.1, configured for GPU 7 only
- Modified stage_1_alignment_llava_ov_4b.sh: Changed TP from 2 to 1, updated checkpoint path to tp1_pp1
- Fixed Makefile: Added check to skip recompilation if helpers_cpp .so files already exist
- Updated utils.py: Added pre-compilation check to avoid unnecessary builds (see the sketch below)
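A minimal sketch of what such a pre-compilation check could look like, assuming the C++ helpers are built by a Makefile next to the sources and compiled to helpers_cpp*.so; the function name and build command are assumptions, not the repo's actual utils.py change.

```python
import subprocess
from pathlib import Path


def maybe_compile_helpers(src_dir: str) -> None:
    """Skip rebuilding the C++ dataset helpers when a compiled .so already exists."""
    existing = sorted(Path(src_dir).glob("helpers_cpp*.so"))
    if existing:
        # Extension already built; avoid an unnecessary (and slow) recompilation.
        print(f"Found {existing[0].name}; skipping recompilation.")
        return
    # No compiled extension yet: fall back to the normal Makefile build.
    subprocess.run(["make", "-C", src_dir], check=True)
```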
- Added print statements for debugging in dataloader_provider.py
- Added print statements in qwen2vl_task_encoder.py for tracing preprocessing
- Added debugging prints in llavaov_1_5_provider.py
- Added print statements in train.py, megatron_trainer.py, and other training files
- These changes were made while exploring and understanding the codebase
- Add FastViT model implementation (mobileclip_l_384) in aiak_training_llm/models/fastvit/
- Update LlavaOnevision1_5 model to use FastViT encoder
- Add FastViT preprocessing in qwen2vl_task_encoder.py
- Add --use-fastvit and related command-line arguments (see the sketch below)
- Add checkpoint conversion scripts for FastVLM
- Update training configs for 2-GPU setup (TP=2)
- Add .gitignore entries for checkpoints and training outputs
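As an illustration of how the new flags might be registered, a hedged argparse sketch follows; only --use-fastvit is named in the commit, so the helper name and the other options are hypothetical.

```python
import argparse


def add_fastvit_args(parser: argparse.ArgumentParser) -> argparse.ArgumentParser:
    """Register FastViT-related command-line flags (hypothetical helper)."""
    group = parser.add_argument_group("FastViT")
    group.add_argument("--use-fastvit", action="store_true",
                       help="Use the FastViT (mobileclip_l_384) vision encoder.")
    # The remaining flags are illustrative examples of "related" arguments.
    group.add_argument("--fastvit-image-size", type=int, default=384,
                       help="Input resolution expected by the FastViT encoder.")
    group.add_argument("--fastvit-checkpoint", type=str, default=None,
                       help="Optional path to a converted FastViT checkpoint.")
    return parser
```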
- Add debug prints in FastViT forward pass (mci.py, mobileclip_encoder.py, fastvit_vision_model.py)
- Update GPU configuration to use 2 GPUs (TP=2) in alignment.sh
- Add comprehensive code documentation and comments
- Update training configuration for FastViT image processing
- Add FastViT preprocessing path in qwen2vl_task_encoder.py
- Update megatron_core SOURCES.txt
- Update stage 1 alignment script configuration
…izer/processor loading
…zations
- Add MobileLLM 140M model architecture and configuration
- Integrate FastViT vision encoder with configurable image sizes
- Optimize training for low-memory GPUs: 1 GPU, reduced batch size, increased recomputation (see the sketch below)
- Add inference script for FastVLM testing
- Configure training for 5-sample test runs to validate the setup
- Update .gitignore to exclude large checkpoint files
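The low-memory settings might translate into extra Megatron-style launch flags along these lines; the flag names follow upstream Megatron-LM, and whether this repo's scripts expose exactly this set is an assumption.

```python
def low_memory_overrides(micro_batch_size: int = 1, global_batch_size: int = 8) -> list:
    """Hypothetical helper returning launch flags for a single-GPU, low-memory run."""
    return [
        "--tensor-model-parallel-size", "1",          # single GPU, no tensor parallelism
        "--micro-batch-size", str(micro_batch_size),  # reduced batch size
        "--global-batch-size", str(global_batch_size),
        "--recompute-granularity", "full",            # trade extra compute for memory
        "--recompute-method", "uniform",
        "--recompute-num-layers", "1",
    ]


if __name__ == "__main__":
    print(" ".join(low_memory_overrides()))
```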
- Add HuggingFace checkpoint (model.safetensors, tokenizer.json)
- Add Megatron checkpoint (TP=2 format)
- Configure Git LFS for large model files
- Add MobileLLM config and layer specs (see the sketch below)
- Update model provider and layer specs to support MobileLLM backbone
- Add stage 1 alignment script for MobileLLM-140M
- Add comprehensive integration documentation
- Update alignment.sh to support MobileLLM training option
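For orientation, a hedged sketch of what the MobileLLM backbone config could look like in Megatron-Core terms; every numeric value is a placeholder to be read from the official config.json, and the GQA/SwiGLU/RMSNorm/QK-LayerNorm switches mirror the architecture described in a later commit.

```python
import torch.nn.functional as F
from megatron.core.transformer.transformer_config import TransformerConfig

# All sizes below are placeholders; take the real values from the official
# MobileLLM-R1-140M config.json.
mobilellm_140m_config = TransformerConfig(
    num_layers=15,             # placeholder
    hidden_size=576,           # placeholder
    ffn_hidden_size=2048,      # placeholder
    num_attention_heads=9,     # placeholder
    num_query_groups=3,        # grouped-query attention (GQA)
    gated_linear_unit=True,    # SwiGLU = gated linear unit + SiLU activation
    activation_func=F.silu,
    normalization="RMSNorm",
    qk_layernorm=True,         # QK LayerNorm enabled (see the fix below)
    add_bias_linear=False,     # placeholder; Llama-style models usually drop biases
)
```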
…LaVA-OneVision-1.5 into mobile-llm-integration
- Implement MobileLLM-R1-140M (140M params) with GQA, SwiGLU, RMSNorm
- Fix QK LayerNorm configuration (was disabled, now enabled)
- Add FastViT/MobileCLIP vision encoder support
- Add local _is_te_min_version() implementation to fix import error (see the sketch below)
- Update training scripts for MobileLLM experiments
- Verify model architecture matches official config.json
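A local fallback along these lines would resolve the import error; it mirrors what megatron.core.utils.is_te_min_version does upstream, though the exact implementation in this repo may differ.

```python
from packaging.version import Version

try:
    import transformer_engine as te
    HAVE_TE = True
except ImportError:
    HAVE_TE = False


def _is_te_min_version(version: str, check_equality: bool = True) -> bool:
    """Return True if the installed Transformer Engine is at least `version`."""
    if not HAVE_TE:
        return False
    installed = Version(te.__version__)
    required = Version(version)
    return installed >= required if check_equality else installed > required
```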
- Add convert_fastvit_hf_to_mcore.sh: Convert FastViT from HF to Megatron format
- Add convert_mobilellm_hf_to_mcore.sh: Convert MobileLLM-R1-140M to Megatron format
- Add merge_mobilellm_fastvit.sh: Merge language and vision checkpoints (see the sketch below)
- Update stage_1_alignment_mobilellm_140m.sh: Training config with merged checkpoint
- Add test_inference_mobilellm.py: Inference script for FastVLM
- Fix nvcc path resolution for multiple CUDA installations
- Update training_utils.py: FastViT checkpoint handling

Successfully tested:
- Checkpoint conversion: 629 vision + 152 language tensors
- Merged checkpoint: 781 keys, 774 MB
- Training verified: loss 11.87, 36.6 tokens/sec/GPU on A100-40GB
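A simplified sketch of the merge step, assuming both inputs are standard Megatron checkpoints that keep their weights under a "model" key and that the merged model prefixes them as language_model. / vision_model.; both naming choices are assumptions about merge_mobilellm_fastvit.sh.

```python
import torch


def merge_checkpoints(language_ckpt: str, vision_ckpt: str, out_path: str) -> None:
    """Merge a language-model and a vision-encoder checkpoint into one file."""
    lang = torch.load(language_ckpt, map_location="cpu")["model"]
    vis = torch.load(vision_ckpt, map_location="cpu")["model"]

    # Namespace each sub-model so the keys do not collide in the merged dict.
    merged = {f"language_model.{k}": v for k, v in lang.items()}
    merged.update({f"vision_model.{k}": v for k, v in vis.items()})

    torch.save({"model": merged}, out_path)
    print(f"Merged {len(vis)} vision + {len(lang)} language tensors "
          f"into {len(merged)} keys at {out_path}")
```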
- Load FastViT config from mobileclip_l.json to get correct architecture values (see the sketch below)
- Print full config objects for language, vision, and adapter
- Add local_files_only parameter for the HF tokenizer when loading from the filesystem
- Fixes the vision config showing incorrect language-model values
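A minimal sketch of the two loading changes; the function name and paths are placeholders, while local_files_only is a real Hugging Face from_pretrained option.

```python
import json

from transformers import AutoTokenizer


def load_vision_config_and_tokenizer(mobileclip_json: str, tokenizer_dir: str):
    """Read the FastViT architecture values and load the tokenizer from disk."""
    # Reading mobileclip_l.json keeps the vision config from inheriting
    # language-model values.
    with open(mobileclip_json) as f:
        vision_cfg = json.load(f)

    # local_files_only avoids a Hub lookup when the tokenizer already lives
    # on the local filesystem.
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir, local_files_only=True)
    return vision_cfg, tokenizer
```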