@@ -92,7 +92,7 @@ Read technical notes:
 
 ## Features
 
-- **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Qwen2-VL, DeepSeek, Yi, Gemma, ChatGLM, Phi, etc.
+- **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen3, Qwen3-VL, DeepSeek, Gemma, GLM, Phi, etc.
 - **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
 - **Scalable resources**: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
 - **Advanced algorithms**: [GaLore](https://github.com/jiaweizzhao/GaLore), [BAdam](https://github.com/Ledzy/BAdam), [APOLLO](https://github.com/zhuhanqing/APOLLO), [Adam-mini](https://github.com/zyushun/Adam-mini), [Muon](https://github.com/KellerJordan/Muon), [OFT](https://github.com/huggingface/peft/tree/main/src/peft/tuners/oft), DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ and PiSSA.
@@ -280,11 +280,10 @@ Read technical notes:
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
 | [Audio-Flamingo-3](https://huggingface.co/nvidia/audio-flamingo-3-hf) | 8B | audio_flamingo_3 |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
 | [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
 | [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
-| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
+| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie_nothink |
 | [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
@@ -296,7 +295,7 @@ Read technical notes:
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
-| [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
+| [Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
 | [Kimi-VL](https://huggingface.co/moonshotai) | 16B | kimi_vl |
 | [Ling 2.0 (mini/flash)](https://huggingface.co/inclusionAI) | 16B/100B | bailing_v2 |
 | [LFM 2.5 (VL)](https://huggingface.co/LiquidAI) | 1.2B/1.6B | lfm2/lfm2_vl |
@@ -309,18 +308,17 @@ Read technical notes:
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
 | [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
-| [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
+| [MiniCPM 4](https://huggingface.co/openbmb) | 0.5B/8B | cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
 | [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
 | [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
-| [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
+| [Phi-4-mini/Phi-4](https://huggingface.co/microsoft) | 3.8B/14B | phi4_mini/phi4 |
 | [Pixtral](https://huggingface.co/mistralai) | 12B | pixtral |
-| [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
+| [Qwen2 (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
 | [Qwen3 (MoE/Instruct/Thinking/Next)](https://huggingface.co/Qwen) | 0.6B/1.7B/4B/8B/14B/32B/80B/235B | qwen3/qwen3_nothink |
 | [Qwen2-Audio](https://huggingface.co/Qwen) | 7B | qwen2_audio |
 | [Qwen2.5-Omni](https://huggingface.co/Qwen) | 3B/7B | qwen2_omni |
@@ -329,9 +327,6 @@ Read technical notes:
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
-| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Youtu-LLM](https://huggingface.co/tencent/) | 2B | youtu |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |
 
 > [!NOTE]