
Mobile LLM integration #95

Open
RanaZay wants to merge 23 commits into EvolvingLMMs-Lab:main from RanaZay:mobile-llm-integration

Conversation


@RanaZay RanaZay commented Jan 15, 2026

No description provided.

RanaZay and others added 23 commits December 24, 2025 16:22
…sues

- Updated alignment.sh: Added conda activation, set CUDA paths to 12.1, configured for GPU 7 only
- Modified stage_1_alignment_llava_ov_4b.sh: Changed TP from 2 to 1, updated checkpoint path to tp1_pp1
- Fixed Makefile: Added check to skip recompilation if helpers_cpp .so files already exist
- Updated utils.py: Added pre-compilation check to avoid unnecessary builds
- Added print statements for debugging in dataloader_provider.py
- Added print statements in qwen2vl_task_encoder.py for tracing preprocessing
- Added debugging prints in llavaov_1_5_provider.py
- Added print statements in train.py, megatron_trainer.py, and other training files
- These changes were made while exploring and mapping out the codebase
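
The Makefile/utils.py pre-compilation check described above can be sketched as follows (a minimal illustration; the helpers_cpp artifact name and the build invocation are assumptions, not the PR's exact code):

```python
import glob
import os
import subprocess

def build_helpers_if_needed(src_dir: str) -> bool:
    """Compile the C++ helpers only when no .so artifacts exist yet.

    Returns True if a build was triggered, False if it was skipped.
    """
    # Skip recompilation when shared objects are already present.
    if glob.glob(os.path.join(src_dir, "*.so")):
        return False
    # Hypothetical build invocation; the real Makefile target may differ.
    subprocess.run(["make", "-C", src_dir], check=True)
    return True
```
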
- Add FastViT model implementation (mobileclip_l_384) in aiak_training_llm/models/fastvit/
- Update LlavaOnevision1_5 model to use FastViT encoder
- Add FastViT preprocessing in qwen2vl_task_encoder.py
- Add --use-fastvit and related command-line arguments
- Add checkpoint conversion scripts for FastVLM
- Update training configs for 2-GPU setup (TP=2)
- Add .gitignore entries for checkpoints and training outputs
- Add debug prints in FastViT forward pass (mci.py, mobileclip_encoder.py, fastvit_vision_model.py)
- Update GPU configuration to use 2 GPUs (TP=2) in alignment.sh
- Add comprehensive code documentation and comments
- Update training configuration for FastViT image processing
- Add FastViT preprocessing path in qwen2vl_task_encoder.py
- Update megatron_core SOURCES.txt
- Update stage 1 alignment script configuration
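
The FastViT preprocessing path added to qwen2vl_task_encoder.py roughly amounts to resizing and normalizing images to the encoder's fixed input size; a minimal sketch (the 384x384 size matches mobileclip_l_384, but the resampling method and normalization constants here are illustrative, not the encoder's exact ones):

```python
import numpy as np

def preprocess_for_fastvit(image: np.ndarray, size: int = 384) -> np.ndarray:
    """Resize (nearest-neighbor) an HWC uint8 image and scale to CHW float32
    in [0, 1], matching a 384x384 FastViT/MobileCLIP input."""
    h, w, _ = image.shape
    rows = np.arange(size) * h // size        # source row for each output row
    cols = np.arange(size) * w // size        # source column for each output column
    resized = image[rows][:, cols]            # nearest-neighbor resize
    scaled = resized.astype(np.float32) / 255.0
    return scaled.transpose(2, 0, 1)          # HWC -> CHW
```
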
…zations

- Add MobileLLM 140M model architecture and configuration
- Integrate FastViT vision encoder with configurable image sizes
- Optimize training for low-memory GPUs (1 GPU, reduced batch size, increased recomputation)
- Add inference script for FastVLM testing
- Configure training for 5 sample test runs to validate setup
- Update .gitignore to exclude large checkpoint files
- Add HuggingFace checkpoint (model.safetensors, tokenizer.json)
- Add Megatron checkpoint (TP=2 format)
- Configure Git LFS for large model files
- Add MobileLLM config and layer specs
- Update model provider and layer specs to support MobileLLM backbone
- Add stage 1 alignment script for MobileLLM-140M
- Add comprehensive integration documentation
- Update alignment.sh to support MobileLLM training option
- Implement MobileLLM-R1-140M (140M params) with GQA, SwiGLU, RMSNorm
- Fix QK LayerNorm configuration (was disabled, now enabled)
- Add FastViT/MobileCLIP vision encoder support
- Add local _is_te_min_version() implementation to fix import error
- Update training scripts for MobileLLM experiments
- Verify model architecture matches official config.json
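
Two of the MobileLLM building blocks named above, RMSNorm and SwiGLU, can be sketched in plain NumPy (an illustration only; the PR's actual implementation lives in the Megatron layer specs):

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Root-mean-square LayerNorm: scale by 1/RMS, no mean subtraction."""
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return x / rms * weight

def swiglu(x: np.ndarray, w_gate: np.ndarray, w_up: np.ndarray) -> np.ndarray:
    """SwiGLU gate: SiLU(x @ w_gate) elementwise-multiplied with (x @ w_up)."""
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))  # SiLU / swish activation
    return silu * (x @ w_up)
```
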
- Add convert_fastvit_hf_to_mcore.sh: Convert FastViT from HF to Megatron format
- Add convert_mobilellm_hf_to_mcore.sh: Convert MobileLLM-R1-140M to Megatron format
- Add merge_mobilellm_fastvit.sh: Merge language and vision checkpoints
- Update stage_1_alignment_mobilellm_140m.sh: Training config with merged checkpoint
- Add test_inference_mobilellm.py: Inference script for FastVLM
- Fix nvcc path resolution for multiple CUDA installations
- Update training_utils.py: FastViT checkpoint handling

Successfully tested:
- Checkpoint conversion: 629 vision + 152 language tensors
- Merged checkpoint: 781 keys, 774MB
- Training verified: Loss 11.87, 36.6 tokens/sec/GPU on A100-40GB
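
The vision/language checkpoint merge verified above (629 + 152 tensors, 781 merged keys) amounts to combining two state dicts under distinct namespaces; a minimal sketch (the prefix strings are assumptions, not the merge script's actual key layout):

```python
def merge_checkpoints(vision: dict, language: dict,
                      vision_prefix: str = "vision_model.",
                      language_prefix: str = "language_model.") -> dict:
    """Merge two state dicts into one, namespacing each under its own prefix."""
    merged = {vision_prefix + k: v for k, v in vision.items()}
    for k, v in language.items():
        key = language_prefix + k
        if key in merged:
            raise KeyError(f"duplicate key after prefixing: {key}")
        merged[key] = v
    return merged
```
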
- Load FastViT config from mobileclip_l.json to get correct architecture values
- Print full config objects for language, vision, and adapter
- Add local_files_only parameter for HF tokenizer when loading from filesystem
- Fixes vision config showing incorrect language model values
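
Reading the vision architecture from mobileclip_l.json instead of reusing language-model values, as the fix above describes, can be sketched like this (the JSON field names here are hypothetical; the real mobileclip_l.json schema may differ):

```python
import json

def load_fastvit_config(path: str) -> dict:
    """Read FastViT/MobileCLIP architecture values from its own JSON file,
    so the vision config is never populated with language-model defaults."""
    with open(path) as f:
        cfg = json.load(f)
    # Hypothetical keys with fallback defaults for mobileclip_l_384.
    return {
        "image_size": cfg.get("image_cfg", {}).get("image_size", 384),
        "embed_dim": cfg.get("embed_dim", 768),
    }
```
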
