Commit c87c128

Merge branch 'hiyouga:main' into feature/add-audio-flamingo-3-support
2 parents ea7d494 + df4c45c

10 files changed (+44, -822 lines)

.github/workflows/tests.yml

Lines changed: 1 addition & 7 deletions
@@ -54,6 +54,7 @@ jobs:
 env:
 HF_TOKEN: ${{ secrets.HF_TOKEN }}
 OS_NAME: ${{ matrix.os }}
+UV_NO_SYNC: 1

 steps:
 - name: Checkout
@@ -88,25 +89,18 @@ jobs:
 - name: Check quality
 run: |
 make style && make quality
-env:
-UV_NO_SYNC: 1

 - name: Check license
 run: |
 make license
-env:
-UV_NO_SYNC: 1

 - name: Check build
 run: |
 make build
-env:
-UV_NO_SYNC: 1

 - name: Test with pytest
 run: |
 make test
 env:
-UV_NO_SYNC: 1
 HF_HOME: ${{ runner.temp }}/huggingface
 HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"
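In this workflow, `UV_NO_SYNC: 1` moves out of each step's `env` block and into the job-level `env`, so every step now inherits it. When a variable name is defined at more than one level, GitHub Actions uses the most specific one (step over job over workflow), so once the job-level entry exists the per-step copies add nothing. The Python sketch below models that merge, together with the `cond && 'a' || 'b'` idiom used for `HF_HUB_OFFLINE`. It illustrates the rules and is not GitHub's implementation. The helper names `effective_env` and `gha_ternary`, the `ubuntu-latest` OS value and the masked token are hypothetical placeholders.

```python
from collections.abc import Mapping


def effective_env(
    workflow_env: Mapping[str, str],
    job_env: Mapping[str, str],
    step_env: Mapping[str, str],
) -> dict[str, str]:
    """Return the env a step sees: later (more specific) scopes override earlier ones."""
    merged: dict[str, str] = {}
    for scope in (workflow_env, job_env, step_env):  # least -> most specific
        merged.update(scope)
    return merged


def gha_ternary(cond: bool, if_true: str, if_false: str) -> str:
    """Mimic `${{ cond && if_true || if_false }}` with Python's and/or short-circuiting.

    Like the expression it models, this falls through to `if_false`
    whenever `if_true` is falsy (e.g. an empty string).
    """
    return (cond and if_true) or if_false


# Hypothetical job-level env after the change (values are placeholders).
job_env = {"HF_TOKEN": "***", "OS_NAME": "ubuntu-latest", "UV_NO_SYNC": "1"}

for name, step_env in [
    ("Check quality", {}),
    ("Test with pytest", {"HF_HOME": "/tmp/huggingface"}),
]:
    print(name, effective_env({}, job_env, step_env)["UV_NO_SYNC"])

cache_hit = "true"  # stands in for steps.hf-hub-cache.outputs.cache-hit
print("HF_HUB_OFFLINE =", gha_ternary(cache_hit == "true", "1", "0"))
```

The `&&`/`||` form only acts as a ternary while the middle operand is truthy. Here that operand is the non-empty string `'1'`, so the expression yields `'1'` on a cache hit and `'0'` otherwise.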

.github/workflows/tests_cuda.yml

Lines changed: 2 additions & 0 deletions
@@ -38,6 +38,8 @@ jobs:
 env:
 HF_HOME: "${{ github.workspace }}/../.runner_cache/huggingface"
 UV_CACHE_DIR: "${{ github.workspace }}/../.runner_cache/uv"
+HF_TOKEN: ${{ secrets.HF_TOKEN }}
+OS_NAME: ${{ matrix.os }}
 UV_NO_SYNC: 1

 steps:
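With this change, the CUDA workflow's job-level `env` gains `HF_TOKEN` and `OS_NAME`, which `tests.yml` and `tests_npu.yml` already set at the same level. A drift check such as the hypothetical `missing_shared_keys` below can flag a workflow that falls out of line. Treating `HF_TOKEN`, `OS_NAME` and `UV_NO_SYNC` as a required baseline is an assumption, and the key sets contain only the job-level keys visible in this commit's hunks.

```python
# Assumed baseline: keys every test job is expected to set at job level.
SHARED_KEYS = frozenset({"HF_TOKEN", "OS_NAME", "UV_NO_SYNC"})

# Job-level env keys visible in this commit's hunks (other keys may exist).
BEFORE = {
    "tests.yml": {"HF_TOKEN", "OS_NAME"},
    "tests_cuda.yml": {"HF_HOME", "UV_CACHE_DIR", "UV_NO_SYNC"},
    "tests_npu.yml": {"HF_ENDPOINT", "HF_TOKEN", "OS_NAME"},
}
AFTER = {
    "tests.yml": {"HF_TOKEN", "OS_NAME", "UV_NO_SYNC"},
    "tests_cuda.yml": {"HF_HOME", "UV_CACHE_DIR", "HF_TOKEN", "OS_NAME", "UV_NO_SYNC"},
    "tests_npu.yml": {"HF_ENDPOINT", "HF_TOKEN", "OS_NAME", "UV_NO_SYNC"},
}


def missing_shared_keys(
    job_envs: dict[str, set[str]], required: frozenset[str] = SHARED_KEYS
) -> dict[str, list[str]]:
    """Map each workflow to the required keys its job-level env lacks (complete ones omitted)."""
    report: dict[str, list[str]] = {}
    for workflow, keys in job_envs.items():
        missing = required - keys
        if missing:
            report[workflow] = sorted(missing)
    return report


print(missing_shared_keys(BEFORE))
print(missing_shared_keys(AFTER))  # {}
```

Run against the pre-commit key sets, the same check also flags `tests.yml` and `tests_npu.yml`. In both workflows, `UV_NO_SYNC` was previously set per step rather than at job level.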

.github/workflows/tests_npu.yml

Lines changed: 1 addition & 17 deletions
@@ -43,6 +43,7 @@ jobs:
 HF_ENDPOINT: https://hf-mirror.com
 HF_TOKEN: ${{ secrets.HF_TOKEN }}
 OS_NAME: ${{ matrix.os }}
+UV_NO_SYNC: 1

 steps:
 - name: Checkout
@@ -69,35 +70,18 @@ jobs:
 curl -fsSL https://deb.nodesource.com/setup_20.x | bash -
 apt-get install -y nodejs

-- name: Cache files
-id: hf-hub-cache
-uses: actions/cache@v4
-with:
-path: ${{ runner.temp }}/huggingface
-key: huggingface-${{ matrix.os }}-${{ matrix.python }}-${{ hashFiles('tests/version.txt') }}
-
 - name: Check quality
 run: |
 make style && make quality
-env:
-UV_NO_SYNC: 1

 - name: Check license
 run: |
 make license
-env:
-UV_NO_SYNC: 1

 - name: Check build
 run: |
 make build
-env:
-UV_NO_SYNC: 1

 - name: Test with pytest
 run: |
 make test
-env:
-UV_NO_SYNC: 1
-HF_HOME: /root/.cache/huggingface
-HF_HUB_OFFLINE: "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"
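The NPU workflow drops the `Cache files` step (`id: hf-hub-cache`). In the same change it also drops the `HF_HUB_OFFLINE` entry, whose expression reads `steps.hf-hub-cache.outputs.cache-hit`. Removing the step alone would have left a reference to a step id that no longer exists. The sketch below is a hypothetical lint for that case. It takes steps as already-parsed Python dicts, because the standard library has no YAML parser, and the regex for valid step ids is an assumption.

```python
import re
from collections.abc import Iterator
from typing import Any

# Matches the <id> in `steps.<id>.`; assumes ids use letters, digits, '_' and '-'.
STEP_REF = re.compile(r"\bsteps\.([A-Za-z0-9_-]+)\.")


def _strings(value: Any) -> Iterator[str]:
    """Yield every string value nested inside a step definition."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from _strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from _strings(item)


def dangling_step_refs(steps: list[dict[str, Any]]) -> list[tuple[str, str]]:
    """Return (step name, referenced id) pairs where no earlier step declares that id."""
    seen: set[str] = set()
    problems: list[tuple[str, str]] = []
    for step in steps:
        for text in _strings(step):
            for ref in STEP_REF.findall(text):
                if ref not in seen:
                    problems.append((step.get("name", "<unnamed>"), ref))
        if "id" in step:
            seen.add(step["id"])  # later steps may reference this id
    return problems


OFFLINE = "${{ steps.hf-hub-cache.outputs.cache-hit == 'true' && '1' || '0' }}"
cache_step = {"name": "Cache files", "id": "hf-hub-cache", "uses": "actions/cache@v4"}
pytest_step = {"name": "Test with pytest", "run": "make test", "env": {"HF_HUB_OFFLINE": OFFLINE}}

print(dangling_step_refs([cache_step, pytest_step]))  # []
print(dangling_step_refs([pytest_step]))  # [('Test with pytest', 'hf-hub-cache')]
```

Adding an id to `seen` only after a step's own fields have been scanned means a step that references its own id is also reported.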

README.md

Lines changed: 6 additions & 11 deletions
@@ -92,7 +92,7 @@ Read technical notes:

 ## Features

-- **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen, Qwen2-VL, DeepSeek, Yi, Gemma, ChatGLM, Phi, etc.
+- **Various models**: LLaMA, LLaVA, Mistral, Mixtral-MoE, Qwen3, Qwen3-VL, DeepSeek, Gemma, GLM, Phi, etc.
 - **Integrated methods**: (Continuous) pre-training, (multimodal) supervised fine-tuning, reward modeling, PPO, DPO, KTO, ORPO, etc.
 - **Scalable resources**: 16-bit full-tuning, freeze-tuning, LoRA and 2/3/4/5/6/8-bit QLoRA via AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ.
 - **Advanced algorithms**: [GaLore](https://github.com/jiaweizzhao/GaLore), [BAdam](https://github.com/Ledzy/BAdam), [APOLLO](https://github.com/zhuhanqing/APOLLO), [Adam-mini](https://github.com/zyushun/Adam-mini), [Muon](https://github.com/KellerJordan/Muon), [OFT](https://github.com/huggingface/peft/tree/main/src/peft/tuners/oft), DoRA, LongLoRA, LLaMA Pro, Mixture-of-Depths, LoRA+, LoftQ and PiSSA.
@@ -280,11 +280,10 @@ Read technical notes:
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
 | [Audio-Flamingo-3](https://huggingface.co/nvidia/audio-flamingo-3-hf) | 8B | audio_flamingo_3 |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
 | [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
 | [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
-| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
+| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie_nothink |
 | [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
@@ -296,7 +295,7 @@ Read technical notes:
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
-| [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
+| [Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
 | [Kimi-VL](https://huggingface.co/moonshotai) | 16B | kimi_vl |
 | [Ling 2.0 (mini/flash)](https://huggingface.co/inclusionAI) | 16B/100B | bailing_v2 |
 | [LFM 2.5 (VL)](https://huggingface.co/LiquidAI) | 1.2B/1.6B | lfm2/lfm2_vl |
@@ -309,18 +308,17 @@ Read technical notes:
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
 | [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
-| [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
+| [MiniCPM 4](https://huggingface.co/openbmb) | 0.5B/8B | cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
 | [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
 | [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
-| [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
+| [Phi-4-mini/Phi-4](https://huggingface.co/microsoft) | 3.8B/14B | phi4_mini/phi4 |
 | [Pixtral](https://huggingface.co/mistralai) | 12B | pixtral |
-| [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
+| [Qwen2 (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
 | [Qwen3 (MoE/Instruct/Thinking/Next)](https://huggingface.co/Qwen) | 0.6B/1.7B/4B/8B/14B/32B/80B/235B | qwen3/qwen3_nothink |
 | [Qwen2-Audio](https://huggingface.co/Qwen) | 7B | qwen2_audio |
 | [Qwen2.5-Omni](https://huggingface.co/Qwen) | 3B/7B | qwen2_omni |
@@ -329,9 +327,6 @@ Read technical notes:
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
-| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Youtu-LLM](https://huggingface.co/tencent/) | 2B | youtu |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |

 > [!NOTE]

README_zh.md

Lines changed: 6 additions & 11 deletions
@@ -94,7 +94,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc

 ## 项目特色

-- **多种模型**:LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen、Qwen2-VL、DeepSeek、Yi、Gemma、ChatGLM、Phi 等等。
+- **多种模型**:LLaMA、LLaVA、Mistral、Mixtral-MoE、Qwen3、Qwen3-VL、DeepSeek、Gemma、GLM、Phi 等等。
 - **集成方法**:(增量)预训练、(多模态)指令监督微调、奖励模型训练、PPO 训练、DPO 训练、KTO 训练、ORPO 训练等等。
 - **多种精度**:16 比特全参数微调、冻结微调、LoRA 微调和基于 AQLM/AWQ/GPTQ/LLM.int8/HQQ/EETQ 的 2/3/4/5/6/8 比特 QLoRA 微调。
 - **先进算法**:[GaLore](https://github.com/jiaweizzhao/GaLore)、[BAdam](https://github.com/Ledzy/BAdam)、[APOLLO](https://github.com/zhuhanqing/APOLLO)、[Adam-mini](https://github.com/zyushun/Adam-mini)、[Muon](https://github.com/KellerJordan/Muon)、[OFT](https://github.com/huggingface/peft/tree/main/src/peft/tuners/oft)、DoRA、LongLoRA、LLaMA Pro、Mixture-of-Depths、LoRA+、LoftQ 和 PiSSA。
@@ -282,11 +282,10 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | ----------------------------------------------------------------- | -------------------------------- | -------------------- |
 | [Audio-Flamingo-3](https://huggingface.co/nvidia/audio-flamingo-3-hf) | 8B | audio_flamingo_3 |
 | [BLOOM/BLOOMZ](https://huggingface.co/bigscience) | 560M/1.1B/1.7B/3B/7.1B/176B | - |
-| [Command R](https://huggingface.co/CohereForAI) | 35B/104B | cohere |
 | [DeepSeek (LLM/Code/MoE)](https://huggingface.co/deepseek-ai) | 7B/16B/67B/236B | deepseek |
 | [DeepSeek 3-3.2](https://huggingface.co/deepseek-ai) | 236B/671B | deepseek3 |
 | [DeepSeek R1 (Distill)](https://huggingface.co/deepseek-ai) | 1.5B/7B/8B/14B/32B/70B/671B | deepseekr1 |
-| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie/ernie_nothink |
+| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie_nothink |
 | [Falcon/Falcon H1](https://huggingface.co/tiiuae) | 0.5B/1.5B/3B/7B/11B/34B/40B/180B | falcon/falcon_h1 |
 | [Gemma/Gemma 2/CodeGemma](https://huggingface.co/google) | 2B/7B/9B/27B | gemma/gemma2 |
 | [Gemma 3/Gemma 3n](https://huggingface.co/google) | 270M/1B/4B/6B/8B/12B/27B | gemma3/gemma3n |
@@ -298,7 +297,7 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [Hunyuan (MT)](https://huggingface.co/tencent/) | 7B | hunyuan |
 | [InternLM 2-3](https://huggingface.co/internlm) | 7B/8B/20B | intern2 |
 | [InternVL 2.5-3.5](https://huggingface.co/OpenGVLab) | 1B/2B/4B/8B/14B/30B/38B/78B/241B | intern_vl |
-| [InternLM/Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
+| [Intern-S1-mini](https://huggingface.co/internlm/) | 8B | intern_s1 |
 | [Kimi-VL](https://huggingface.co/moonshotai) | 16B | kimi_vl |
 | [Ling 2.0 (mini/flash)](https://huggingface.co/inclusionAI) | 16B/100B | bailing_v2 |
 | [LFM 2.5 (VL)](https://huggingface.co/LiquidAI) | 1.2B/1.6B | lfm2/lfm2_vl |
@@ -311,18 +310,17 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [LLaVA-NeXT](https://huggingface.co/llava-hf) | 7B/8B/13B/34B/72B/110B | llava_next |
 | [LLaVA-NeXT-Video](https://huggingface.co/llava-hf) | 7B/34B | llava_next_video |
 | [MiMo](https://huggingface.co/XiaomiMiMo) | 7B/309B | mimo/mimo_v2 |
-| [MiniCPM 1-4.1](https://huggingface.co/openbmb) | 0.5B/1B/2B/4B/8B | cpm/cpm3/cpm4 |
+| [MiniCPM 4](https://huggingface.co/openbmb) | 0.5B/8B | cpm4 |
 | [MiniCPM-o-2.6/MiniCPM-V-2.6](https://huggingface.co/openbmb) | 8B | minicpm_o/minicpm_v |
 | [MiniMax-M1/MiniMax-M2](https://huggingface.co/MiniMaxAI/models) | 229B/456B | minimax1/minimax2 |
 | [Ministral 3](https://huggingface.co/mistralai) | 3B/8B/14B | ministral3 |
 | [Mistral/Mixtral](https://huggingface.co/mistralai) | 7B/8x7B/8x22B | mistral |
-| [OLMo](https://huggingface.co/allenai) | 1B/7B | - |
 | [PaliGemma/PaliGemma2](https://huggingface.co/google) | 3B/10B/28B | paligemma |
 | [Phi-3/Phi-3.5](https://huggingface.co/microsoft) | 4B/14B | phi |
 | [Phi-3-small](https://huggingface.co/microsoft) | 7B | phi_small |
-| [Phi-4](https://huggingface.co/microsoft) | 14B | phi4 |
+| [Phi-4-mini/Phi-4](https://huggingface.co/microsoft) | 3.8B/14B | phi4_mini/phi4 |
 | [Pixtral](https://huggingface.co/mistralai) | 12B | pixtral |
-| [Qwen (1-2.5) (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
+| [Qwen2 (Code/Math/MoE/QwQ)](https://huggingface.co/Qwen) | 0.5B/1.5B/3B/7B/14B/32B/72B/110B | qwen |
 | [Qwen3 (MoE/Instruct/Thinking/Next)](https://huggingface.co/Qwen) | 0.6B/1.7B/4B/8B/14B/32B/80B/235B | qwen3/qwen3_nothink |
 | [Qwen2-Audio](https://huggingface.co/Qwen) | 7B | qwen2_audio |
 | [Qwen2.5-Omni](https://huggingface.co/Qwen) | 3B/7B | qwen2_omni |
@@ -331,9 +329,6 @@ https://github.com/user-attachments/assets/43b700c6-a178-41db-b1f8-8190a5d3fcfc
 | [Qwen3-VL](https://huggingface.co/Qwen) | 2B/4B/8B/30B/32B/235B | qwen3_vl |
 | [Seed (OSS/Coder)](https://huggingface.co/ByteDance-Seed) | 8B/36B | seed_oss/seed_coder |
 | [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
-| [VibeThinker-1.5B](https://huggingface.co/WeiboAI) | 1.5B | qwen3 |
-| [Yi/Yi-1.5 (Code)](https://huggingface.co/01-ai) | 1.5B/6B/9B/34B | yi |
-| [Youtu-LLM](https://huggingface.co/tencent/) | 2B | youtu |
 | [Yuan 2](https://huggingface.co/IEITYuan) | 2B/51B/102B | yuan |

 > [!NOTE]
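`README.md` and `README_zh.md` receive the same edits to the supported-models table. For example, ERNIE-4.5 now lists only `ernie_nothink`, and Phi-4-mini joins Phi-4 under `phi4_mini/phi4`. A small check can confirm that the two tables stay in sync. This is a hypothetical sketch: `parse_model_rows` and `table_mismatches` are illustrative names, and rows are assumed to follow the `| [Name](url) | sizes | template(s) |` shape seen in the diff. A `-` cell is treated as listing no template name, and reading the actual files is left out.

```python
import re

ROW = re.compile(
    r"^\|\s*\[(?P<name>[^\]]+)\]\([^)]*\)"  # | [Model name](url)
    r"\s*\|\s*[^|]+?"  # | model sizes (ignored here)
    r"\s*\|\s*(?P<templates>[^|]+?)\s*\|$"  # | template(s) |
)


def parse_model_rows(markdown: str) -> dict[str, list[str]]:
    """Map model name -> template names; a '-' cell maps to an empty list."""
    table: dict[str, list[str]] = {}
    for line in markdown.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, separator, or non-table line
        templates = match["templates"]
        table[match["name"]] = [] if templates == "-" else templates.split("/")
    return table


def table_mismatches(readme_en: str, readme_zh: str) -> list[str]:
    """Model names whose template lists differ, or that appear in only one file."""
    en, zh = parse_model_rows(readme_en), parse_model_rows(readme_zh)
    return sorted(name for name in en.keys() | zh.keys() if en.get(name) != zh.get(name))


en_rows = """\
| [ERNIE-4.5](https://huggingface.co/baidu) | 0.3B/21B/300B | ernie_nothink |
| [Phi-4-mini/Phi-4](https://huggingface.co/microsoft) | 3.8B/14B | phi4_mini/phi4 |
| [StarCoder 2](https://huggingface.co/bigcode) | 3B/7B/15B | - |
"""
# Simulate a translated README that missed one edit.
zh_rows = en_rows.replace("| ernie_nothink |", "| ernie/ernie_nothink |")

print(parse_model_rows(en_rows)["Phi-4-mini/Phi-4"])  # ['phi4_mini', 'phi4']
print(table_mismatches(en_rows, zh_rows))  # ['ERNIE-4.5']
print(table_mismatches(en_rows, en_rows))  # []
```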

pyproject.toml

Lines changed: 0 additions & 1 deletion
@@ -30,7 +30,6 @@ classifiers = [
 "License :: OSI Approved :: Apache Software License",
 "Operating System :: OS Independent",
 "Programming Language :: Python :: 3",
-"Programming Language :: Python :: 3.10",
 "Programming Language :: Python :: 3.11",
 "Programming Language :: Python :: 3.12",
 "Programming Language :: Python :: 3.13",
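The `pyproject.toml` change removes the `Programming Language :: Python :: 3.10` classifier, leaving 3.11 through 3.13. Trove classifiers are descriptive metadata only and do not stop installation on an unlisted interpreter. That is the job of `requires-python`, whose value is not part of this diff. The sketch below assumes a 3.11 floor and checks that the minor-version classifiers agree with it. `classifier_minors` and `check_python_classifiers` are hypothetical helpers.

```python
import re

# Matches 'Programming Language :: Python :: 3.N' and captures N.
PY_MINOR = re.compile(r"^Programming Language :: Python :: 3\.(\d+)$")


def classifier_minors(classifiers: list[str]) -> list[int]:
    """Return the sorted minor versions declared via Python 3.N classifiers."""
    minors: list[int] = []
    for classifier in classifiers:
        match = PY_MINOR.match(classifier)
        if match:
            minors.append(int(match.group(1)))
    return sorted(minors)


def check_python_classifiers(classifiers: list[str], floor_minor: int) -> list[str]:
    """List inconsistencies between the classifiers and an assumed 3.<floor_minor> floor."""
    minors = classifier_minors(classifiers)
    issues = [f"3.{n} is below the 3.{floor_minor} floor" for n in minors if n < floor_minor]
    if floor_minor not in minors:
        issues.append(f"no classifier for 3.{floor_minor}")
    return issues


after = [
    "License :: OSI Approved :: Apache Software License",
    "Operating System :: OS Independent",
    "Programming Language :: Python :: 3",
    "Programming Language :: Python :: 3.11",
    "Programming Language :: Python :: 3.12",
    "Programming Language :: Python :: 3.13",
]
before = after[:3] + ["Programming Language :: Python :: 3.10"] + after[3:]

print(classifier_minors(after))  # [11, 12, 13]
print(check_python_classifiers(after, floor_minor=11))  # []
print(check_python_classifiers(before, floor_minor=11))  # ['3.10 is below the 3.11 floor']
```

The bare `Programming Language :: Python :: 3` classifier has no minor version, so the pattern deliberately skips it.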
