Commit 3c1e9d7
add support for qwen3.5 vl model (#70)
* add support for qwen3.5 vl model
* enable detection of VLM models and allow using non-Hopper GPUs for GPT-OSS
* fix gpt-oss-20b initialization
* add support for more vlms
* adds general vlm support
* support gemma3n
* address coderabbit review comments
- Reorder MODEL_NAME_MAPPINGS for correct substring matching
- Filter pretrained_model_name_or_path from VLM load kwargs
- Move SDPA decision outside flash_attn import block
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
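The reordering of MODEL_NAME_MAPPINGS matters because substring matching returns the first key that matches. A minimal sketch of the idea follows, assuming the mapping is an ordered list of (substring, value) pairs. The keys, values, and the `resolve_model_class` helper are hypothetical and are not taken from the repository.

```python
# Hypothetical entries; the real MODEL_NAME_MAPPINGS contents are not shown
# in this commit message.
EXAMPLE_MAPPINGS = [
    ("foo-vl", "FooVisionModel"),  # more specific key listed first
    ("foo", "FooTextModel"),       # generic key listed last
]


def resolve_model_class(model_name, mappings):
    """Return the value of the first key that is a substring of the name."""
    lowered = model_name.lower()
    for key, value in mappings:
        if key in lowered:
            return value
    return None


# With the specific key first, the vision variant resolves correctly.
print(resolve_model_class("org/Foo-VL-7B", EXAMPLE_MAPPINGS))

# With the order reversed, the generic key shadows the specific one.
print(resolve_model_class("org/Foo-VL-7B", list(reversed(EXAMPLE_MAPPINGS))))
```

Listing longer, more specific keys before their shorter prefixes keeps a first-match lookup from returning the generic entry.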
* fix test: prevent MagicMock auto-creating VLM attributes
Use spec=[] on mock model.model to prevent hasattr from
falsely detecting language_model attribute in wrap_fsdp2 tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
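The `spec=[]` fix can be illustrated with the standard library alone. A plain MagicMock creates any attribute on first access, so `hasattr` always returns True. Passing an empty spec list makes every non-declared attribute raise AttributeError. The attribute name `language_model` comes from the message above; the surrounding variable names are illustrative only.

```python
from unittest.mock import MagicMock

# Default MagicMock: attribute access auto-creates a child mock,
# so hasattr() reports an attribute that was never defined.
loose = MagicMock()
loose_has_attr = hasattr(loose.model, "language_model")

# spec=[] declares an empty attribute list, so unknown attributes
# raise AttributeError and hasattr() returns False.
strict = MagicMock()
strict.model = MagicMock(spec=[])
strict_has_attr = hasattr(strict.model, "language_model")

print(loose_has_attr, strict_has_attr)  # True False
```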
* address remaining review comments
- osft_utils.py: filter pretrained_model_name_or_path from VLM kwargs
in OSFT path to prevent duplicate argument error
- osft_utils.py: add hasattr guard for _can_set_experts_implementation
on non-MoE base classes
- vlm_utils.py: handle RopeParameters objects (not just dicts) in
mrope detection via hasattr fallback
- Fix isort ordering
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
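Two of the fixes above follow common patterns that can be sketched without the library itself. The first helper drops `pretrained_model_name_or_path` from a kwargs dict so it is not passed both positionally and by keyword. The second reads a field from either a dict or an attribute-style object. Both helper names and the `mrope_section` field name are assumptions made for illustration, not the code in osft_utils.py or vlm_utils.py.

```python
from types import SimpleNamespace


def filter_load_kwargs(kwargs):
    """Return a copy of kwargs without the model path key (hypothetical helper)."""
    return {k: v for k, v in kwargs.items() if k != "pretrained_model_name_or_path"}


def rope_field(rope_params, name, default=None):
    """Read a field from a dict or from an object exposing attributes."""
    if isinstance(rope_params, dict):
        return rope_params.get(name, default)
    return getattr(rope_params, name, default)


def fake_loader(pretrained_model_name_or_path, **kwargs):
    """Stand-in for a from_pretrained-style call; returns what it received."""
    return pretrained_model_name_or_path, kwargs


base_kwargs = {"pretrained_model_name_or_path": "some/model", "dtype": "bfloat16"}

# Passing base_kwargs unfiltered would raise TypeError (multiple values for
# the first argument); the filtered copy avoids the clash.
print(fake_loader("some/model", **filter_load_kwargs(base_kwargs)))

# "mrope_section" is an assumed field name used only for this example.
print(rope_field({"mrope_section": [1, 2]}, "mrope_section"))
print(rope_field(SimpleNamespace(mrope_section=[1, 2]), "mrope_section"))
```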
* fix pre-existing test broken by VLM detection
Wrap AutoConfig.from_pretrained in try/except in
_load_model_memory_efficient so mock/dummy model paths
don't crash the VLM detection check.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
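The try/except wrapper described above can be sketched as a detection function that treats a failed config load as "not a VLM". To keep the example self-contained, the loader is passed in as a callable rather than calling AutoConfig.from_pretrained directly. The function name `looks_like_vlm` and the `vision_config` check are assumptions, not the project's actual detection logic.

```python
from types import SimpleNamespace


def looks_like_vlm(model_path, load_config):
    """Return True if the loaded config appears to describe a VLM.

    load_config is any callable that takes a path and returns a config
    object; in real code this could wrap AutoConfig.from_pretrained.
    """
    try:
        config = load_config(model_path)
    except Exception:
        # Mock or dummy paths cannot be resolved; fall back to non-VLM.
        return False
    # Assumed heuristic for this sketch only.
    return hasattr(config, "vision_config")


def broken_loader(path):
    raise OSError(f"cannot resolve {path}")


def vlm_loader(path):
    return SimpleNamespace(vision_config={})


print(looks_like_vlm("dummy/path", broken_loader))  # False
print(looks_like_vlm("some/vlm", vlm_loader))       # True
```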
* fix mamba kernel patching: broaden exception handling, fix comment
- Catch AttributeError in addition to ImportError to prevent partial
patching of _KERNEL_MODULE_MAPPING
- Update comment to accurately describe the compatibility concern
(PyTorch/CUDA ABI mismatch, not C API incompatibility)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
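One way to get the all-or-nothing behavior this fix aims for is to resolve every replacement first and update the mapping only if all lookups succeed. The sketch below shows that pattern with standard-library modules standing in for kernel modules. It is not necessarily how the commit patches _KERNEL_MODULE_MAPPING, and `patch_mapping` is a hypothetical name.

```python
import importlib


def patch_mapping(mapping, targets):
    """Update mapping from targets ({key: (module_name, attr_name)}) atomically.

    Returns True if every target resolved and the mapping was updated,
    False if any import or attribute lookup failed (mapping left untouched).
    """
    resolved = {}
    try:
        for key, (module_name, attr_name) in targets.items():
            module = importlib.import_module(module_name)
            resolved[key] = getattr(module, attr_name)
    except (ImportError, AttributeError):
        # A missing module or a missing symbol aborts the whole patch.
        return False
    mapping.update(resolved)
    return True


mapping = {}
# Second target has a missing attribute, so nothing is applied.
print(patch_mapping(mapping, {"a": ("math", "sqrt"), "b": ("math", "no_such_fn")}))
print(mapping)
# Both targets resolve, so both entries are applied.
print(patch_mapping(mapping, {"a": ("math", "sqrt"), "b": ("json", "dumps")}))
print(sorted(mapping))
```

Catching only ImportError would let a present-but-incompatible module raise AttributeError midway through, which is the partial-patching case the message describes.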
* fix format and lint for all changed files
- Reformat 8 test/source files to match CI ruff version
- Fix UP038: use X | Y in isinstance call
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix VLM OSFT support for direct-loaded models
- Remove unnecessary OSFT guard for direct VLMs (patterns match fine)
- Add _get_text_config() helper for VLM config fallback in
align_model_and_tokenizer (vocab_size, pad/bos/eos_token_id)
- Fix model.config.pad_token_id access in train.py for VLM configs
- Skip activation checkpointing for direct VLM models (M-RoPE layers
produce non-deterministic tensor counts during reentrant recomputation)
- Use dynamic ports (_get_free_port) in model_validation.py to prevent
port conflicts between sequential tests
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
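Two items in this change lend themselves to short sketches: falling back to a nested text config for fields such as vocab_size, and asking the OS for a free port. The helper names mirror `_get_text_config` and `_get_free_port` from the message, but the bodies are assumptions about how such helpers are commonly written, not the repository's code. The `text_config` attribute name is also assumed.

```python
import socket
from types import SimpleNamespace


def get_text_config(config):
    """Return config.text_config if present, else the config itself."""
    text_config = getattr(config, "text_config", None)
    return config if text_config is None else text_config


def get_free_port():
    """Bind to port 0 so the OS picks an unused port, then report it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]


# A text-only config exposes vocab_size directly.
llm_config = SimpleNamespace(vocab_size=32000, pad_token_id=0)
# A VLM-style config nests the same fields under text_config.
vlm_config = SimpleNamespace(text_config=SimpleNamespace(vocab_size=64000, pad_token_id=1))

print(get_text_config(llm_config).vocab_size)
print(get_text_config(vlm_config).vocab_size)
print(get_free_port())
```

The port is released when the socket closes, so another process could claim it before the test binds it again. For sequential tests on one machine this race is usually acceptable.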
* fix ruff format for CI version (0.15.5)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 4d6dc87, commit 3c1e9d7
File tree
12 files changed: +520, -115 lines changed
- src/mini_trainer
- tests
- gpu_tests