
Commit 368a858

entry readme

1 parent 86c68a8 commit 368a858

7 files changed: +73 additions, -54 deletions

README.md

Lines changed: 32 additions & 23 deletions
@@ -41,6 +41,12 @@
  ## News
+ 🔥🔥 [2024/01/17] We released MFTCoder v0.3.0, mainly for MFTCoder-accelerate. It now supports new models such as Mixtral (MoE), Deepseek-Coder, and ChatGLM3, adds FSDP as an option, and introduces Self-paced Loss to balance convergence across tasks in multitask fine-tuning.
+
+ 🔥🔥 [2024/01/17] [CodeFuse-Deepseek-33B](https://huggingface.co/codefuse-ai/CodeFuse-Deepseek-33B) has been released, achieving a pass@1 (greedy decoding) score of 78.7% on HumanEval and taking the top-1 win rate on the Bigcode Leaderboard.
+
+ 🔥🔥 [2024/01/17] [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8X7B) has been released, achieving a pass@1 (greedy decoding) score of 56.1% on HumanEval.

  🔥🔥 [2023/11/07] The [MFTCoder Paper](https://arxiv.org/abs/2311.02303) has been released on arXiv, disclosing the technical details of multitask fine-tuning.

  🔥🔥 [2023/10/20] [CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) has been released, achieving a pass@1 (greedy decoding) score of 48.8% on HumanEval, a 16% absolute improvement over the base model [Qwen-14b](https://huggingface.co/Qwen/Qwen-14B).
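The Self-paced Loss announced above is described in the MFTCoder paper; as a rough illustration only (this is not the paper's exact formulation), a convergence-balancing multitask loss can weight each task by its current loss share, so slower-converging tasks contribute more to the update:

```python
import torch


def self_paced_weights(task_losses, eps=1e-8):
    # Illustrative only: weight each task by its share of the total loss,
    # so tasks converging more slowly (higher loss) receive more weight.
    losses = torch.stack([l.detach() for l in task_losses])
    return losses / (losses.sum() + eps)


def balanced_multitask_loss(task_losses):
    # Combine per-task losses into a single scalar using the weights above.
    weights = self_paced_weights(task_losses)
    return (weights * torch.stack(task_losses)).sum()
```

In MFTCoder, per-task losses come from the task-id bookkeeping in the collator and loss code; the actual balancing formula is defined in the paper and codebase, not here.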
@@ -56,19 +62,21 @@
  ### HumanEval Performance
  | Model | HumanEval(Pass@1) | Date |
  |:----------------------------|:-----------------:|:-------:|
- | **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
- | **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
- | WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
- | GPT-4(zero-shot) | 67.0% | 2023/03 |
- | PanGu-Coder2 15B | 61.6% | 2023/08 |
- | **CodeFuse-StarCoder-15B** | **54.9%** | 2023/08 |
- | CodeLlama-34b-Python | 53.7% | 2023/08 |
- | **CodeFuse-QWen-14B** | **48.8%** | 2023/10 |
- | CodeLlama-34b | 48.8% | 2023/08 |
- | GPT-3.5(zero-shot) | 48.1% | 2022/11 |
- | OctoCoder | 46.2% | 2023/08 |
- | StarCoder-15B | 33.6% | 2023/05 |
- | QWen-14B | 32.3% | 2023/10 |
+ | **CodeFuse-Deepseek-33B** | **78.7%** | 2024/01 |
+ | **CodeFuse-Mixtral-8x7B** | **56.1%** | 2024/01 |
+ | **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
+ | **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
+ | WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
+ | GPT-4(zero-shot) | 67.0% | 2023/03 |
+ | PanGu-Coder2 15B | 61.6% | 2023/08 |
+ | **CodeFuse-StarCoder-15B** | **54.9%** | 2023/08 |
+ | CodeLlama-34b-Python | 53.7% | 2023/08 |
+ | **CodeFuse-QWen-14B** | **48.8%** | 2023/10 |
+ | CodeLlama-34b | 48.8% | 2023/08 |
+ | GPT-3.5(zero-shot) | 48.1% | 2022/11 |
+ | OctoCoder | 46.2% | 2023/08 |
+ | StarCoder-15B | 33.6% | 2023/05 |
+ | QWen-14B | 32.3% | 2023/10 |

  ## Articles
@@ -88,7 +96,7 @@ In MFTCoder, we released two codebases for finetuning Large Language Models:
  The aim of this project is to foster collaboration and share advancements in large language models, particularly within the domain of code development.

  ### Frameworks
- ![img.png](./assets/img.png)
+ ![img.jpg](./assets/img.jpg)

  ### Highlights
  :white_check_mark: **Multi-task**: Train models on multiple tasks while maintaining a balance between them. The models can even generalize to new, previously unseen tasks.
@@ -133,17 +141,18 @@ If you want to explore some new framework like atorch, you could check:
  ## Models

- We are excited to release the following two CodeLLMs trained by MFTCoder, now available on Hugging Face:
+ We are excited to release the following CodeLLMs trained by MFTCoder, now available on both Hugging Face and ModelScope:

- | Model | Base Model | Num of examples trained | Batch Size | Seq Length |
- |-------|------------|-------------------------|------------|------------|
- | [🔥🔥🔥 CodeFuse-CodeLlama-34B](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600k | 80 | 4096 |
- | [🔥🔥🔥 CodeFuse-CodeLlama-34B-4bits](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
- | [🔥🔥🔥 CodeFuse-StarCoder-15B](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | Starcoder | 600k | 256 | 4096 |
- | [🔥🔥🔥 CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1100k | 256 | 4096 |
- | [🔥 CodeFuse-13B](https://huggingface.co/codefuse-ai/CodeFuse-13B) | CodeFuse-13B | 66k | 64 | 4096 |
+ | Model | HuggingFace Link | ModelScope Link | Base Model | Num of examples trained | Batch Size | Seq Length |
+ |-------|------------------|-----------------|------------|-------------------------|------------|------------|
+ | 🔥🔥 CodeFuse-Deepseek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Deepseek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Deepseek-33B) | Deepseek-coder-33B | 600k | 80 | 4096 |
+ | 🔥🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 600k | 80 | 4096 |
+ | 🔥🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600k | 80 | 4096 |
+ | 🔥🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
+ | 🔥🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 600k | 80 | 4096 |
+ | 🔥🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1100k | 256 | 4096 |
+ | 🔥🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 1100k | 256 | 4096 |

  ## Datasets

README_cn.md

Lines changed: 34 additions & 24 deletions
@@ -39,6 +39,11 @@
  ## News
+ 🔥🔥 [2024/01/17] MFTCoder v0.3.0 released: adds support for Mixtral (MoE), Deepseek, and other models; adds FSDP (Fully Sharded Data Parallel); adds Self-paced Loss for balanced convergence across tasks. Details in the CodeFuse WeChat [article](https://mp.weixin.qq.com/s/PCQPkvbvfxSPzsqjOILCDw).
+
+ 🔥🔥 [2024/01/17] Open-sourced [CodeFuse-Deepseek-33B](https://huggingface.co/codefuse-ai/CodeFuse-Deepseek-33B), reaching 78.7% pass@1 (greedy decoding) on HumanEval. Details in the CodeFuse WeChat [article](https://mp.weixin.qq.com/s/PCQPkvbvfxSPzsqjOILCDw).
+
+ 🔥🔥 [2024/01/17] Open-sourced [CodeFuse-Mixtral-8x7B](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B), reaching 56.1% pass@1 (greedy decoding) on HumanEval. Details in the CodeFuse WeChat [article](https://mp.weixin.qq.com/s/PCQPkvbvfxSPzsqjOILCDw).

  🔥🔥 [2023/11/07] The [MFTCoder paper](https://arxiv.org/abs/2311.02303) was published on arXiv, presenting the technical details of multitask fine-tuning.

@@ -53,21 +58,23 @@
  🔥 [2023/08/26] MFTCoder supports fine-tuning Code Llama, Llama, Llama2, StarCoder, ChatGLM2, CodeGeeX2, Qwen, and GPT-NeoX with LoRA/QLoRA.

  ### HumanEval Performance
- | Model | HumanEval(Pass@1) | Date |
- |:----------------------------|:-----------------:|:-------:|
- | **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
- | **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
- | WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
- | GPT-4(zero-shot) | 67.0% | 2023/03 |
- | PanGu-Coder2 15B | 61.6% | 2023/08 |
- | **CodeFuse-StarCoder-15B** | **54.9%** | 2023/08 |
- | CodeLlama-34b-Python | 53.7% | 2023/08 |
- | **CodeFuse-QWen-14B** | **48.8%** | 2023/10 |
- | CodeLlama-34b | 48.8% | 2023/08 |
- | GPT-3.5(zero-shot) | 48.1% | 2022/11 |
- | OctoCoder | 46.2% | 2023/08 |
- | StarCoder-15B | 33.6% | 2023/05 |
- | QWen-14B | 32.3% | 2023/10 |
+ | Model | HumanEval(Pass@1) | Date |
+ |:---------------------------------|:-----------------:|:-------:|
+ | **CodeFuse-Deepseek-33B** | **78.7%** | 2024/01 |
+ | **CodeFuse-Mixtral-8x7B** | **56.1%** | 2024/01 |
+ | **CodeFuse-CodeLlama-34B** | **74.4%** | 2023/09 |
+ | **CodeFuse-CodeLlama-34B-4bits** | **73.8%** | 2023/09 |
+ | WizardCoder-Python-34B-V1.0 | 73.2% | 2023/08 |
+ | GPT-4(zero-shot) | 67.0% | 2023/03 |
+ | PanGu-Coder2 15B | 61.6% | 2023/08 |
+ | **CodeFuse-StarCoder-15B** | **54.9%** | 2023/08 |
+ | CodeLlama-34b-Python | 53.7% | 2023/08 |
+ | **CodeFuse-QWen-14B** | **48.8%** | 2023/10 |
+ | CodeLlama-34b | 48.8% | 2023/08 |
+ | GPT-3.5(zero-shot) | 48.1% | 2022/11 |
+ | OctoCoder | 46.2% | 2023/08 |
+ | StarCoder-15B | 33.6% | 2023/05 |
+ | QWen-14B | 32.3% | 2023/10 |

  ## Articles
@@ -82,7 +89,7 @@
  **Codefuse-MFTCoder** is an open-source multitask code LLM project covering the models, data, and training of large code models. Through open-sourcing we hope to share and exchange progress on large language models in the code domain.

  ### Framework
- ![img_1.png](./assets/img_1.png)
+ ![img_1.jpg](./assets/img_1.jpg)

  ### Highlights
  :white_check_mark: **Multi-task**: One model supports multiple tasks simultaneously, keeps the tasks balanced, and can even generalize to new, previously unseen tasks.
@@ -125,15 +132,18 @@ sh init_env.sh
  ## Models

- Using this project's training code and the training data above, we trained and open-sourced the following models on Hugging Face.
+ Using this project's training code and the training data above, we trained and open-sourced the following models on Hugging Face and ModelScope.

- | Model | Base Model | Training Data | Batch Size | Seq Length |
- |-------|------------|---------------|------------|------------|
- | [🔥🔥🔥 CodeFuse-CodeLlama-34B](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600k | 80 | 4096 |
- | [🔥🔥🔥 CodeFuse-CodeLlama-34B-4bits](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
- | [🔥🔥🔥 CodeFuse-StarCoder-15B](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | CodeLlama-34b-Python | 600k | 80 | 4096 |
- | [🔥🔥🔥 CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1100k | 256 | 4096 |
- | [🔥 CodeFuse-13B](https://huggingface.co/codefuse-ai/CodeFuse-13B) | CodeFuse-13B-Base | 66k | 64 | 4096 |
+ | Model | HuggingFace Link | ModelScope Link | Base Model | Training Data | Batch Size | Seq Length |
+ |-------|------------------|-----------------|------------|---------------|------------|------------|
+ | 🔥🔥🔥 CodeFuse-Deepseek-33B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Deepseek-33B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Deepseek-33B) | Deepseek-coder-33B | 600k | 80 | 4096 |
+ | 🔥🔥🔥 CodeFuse-Mixtral-8x7B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-Mixtral-8x7B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-Mixtral-8x7B) | Mixtral-8x7B | 600k | 80 | 4096 |
+ | 🔥🔥🔥 CodeFuse-CodeLlama-34B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B) | CodeLlama-34b-Python | 600k | 80 | 4096 |
+ | 🔥🔥🔥 CodeFuse-CodeLlama-34B-4bits | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeLlama-34B-4bits) | CodeLlama-34b-Python | | | 4096 |
+ | 🔥🔥🔥 CodeFuse-StarCoder-15B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-StarCoder-15B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-StarCoder-15B) | StarCoder-15B | 600k | 80 | 4096 |
+ | 🔥🔥🔥 CodeFuse-QWen-14B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B) | Qwen-14b | 1100k | 256 | 4096 |
+ | 🔥🔥🔥 CodeFuse-CodeGeex2-6B | [h-link](https://huggingface.co/codefuse-ai/CodeFuse-CodeGeex2-6B) | [m-link](https://modelscope.cn/models/codefuse-ai/CodeFuse-CodeGeex2-6B) | CodeGeex2-6B | 1100k | 256 | 4096 |


assets/img.jpg: 233 KB (new binary file)

assets/img.png: -134 KB (binary file removed, not shown)

assets/img_1.jpg: 224 KB (new binary file)

assets/img_1.png: -130 KB (binary file removed, not shown)

mftcoder_accelerate/src/pefts/mft_accelerate.py

Lines changed: 7 additions & 7 deletions
@@ -73,8 +73,8 @@ def get_task_mask(args, task_id):
      return task_mask


- def get_ltor_masks_and_position_ids(data):
-     """Build masks and position id for left to right model."""
+ def get_attention_mask_and_position_ids(data):
+     """Build attention masks and position ids for models that need them explicitly."""

      # Extract batch size and sequence length.
      batch_size, seq_length = data.size()
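The commit only shows the renamed signature and docstring. The body below is a sketch of what such a helper typically builds under the Megatron-style convention that the old name (`get_ltor_masks_and_position_ids`) suggests: a lower-triangular causal mask plus 0..seq_len-1 position ids. The shapes and the boolean convention (True = masked out) are assumptions for illustration, not the repo's actual code.

```python
import torch


def get_attention_mask_and_position_ids(data):
    """Build a causal attention mask and position ids for a [batch, seq] id tensor."""
    batch_size, seq_length = data.size()

    # Lower-triangular matrix: position i may attend only to positions <= i.
    attention_mask = torch.tril(
        torch.ones((batch_size, seq_length, seq_length), device=data.device)
    ).view(batch_size, 1, seq_length, seq_length)

    # Position ids are simply 0..seq_length-1, broadcast over the batch.
    position_ids = torch.arange(seq_length, dtype=torch.long, device=data.device)
    position_ids = position_ids.unsqueeze(0).expand_as(data)

    # Convert to boolean, with True meaning "masked out" (Megatron-style).
    attention_mask = attention_mask < 0.5
    return attention_mask, position_ids
```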
@@ -124,11 +124,11 @@ def __call__(self, instances):
          result_batch['labels'] = input_ids[:, 1:max_pos].contiguous()

          # Get the masks and position ids.
-         if self.args.model_type == 'phi':
-             result_batch['attention_mask'], result_batch['position_ids'] = None, None
-         else:
-             result_batch['attention_mask'], result_batch['position_ids'] = get_ltor_masks_and_position_ids(
-                 data=result_batch['input_ids'])
+         # For decoder-only models, attention_mask and position_ids should be None;
+         # transformers will create them internally.
+         result_batch['attention_mask'], result_batch['position_ids'] = None, None
+
+         # To stay compatible with non-causal (non-GPT) models, build them explicitly instead:
+         # result_batch['attention_mask'], result_batch['position_ids'] = get_attention_mask_and_position_ids(data=result_batch['input_ids'])

          if task_id is not None:
              task_id = torch.tensor(np.array(task_id))
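The `labels = input_ids[:, 1:max_pos]` line above implements the standard next-token shift for causal language-model training. A toy illustration (the tensor values are made up; how the input side is truncated in the repo is handled elsewhere in the collator):

```python
import torch

# A toy batch; max_pos marks the end of the usable token range.
input_ids = torch.tensor([[11, 12, 13, 14, 15]])
max_pos = input_ids.size(1)

# Inputs are tokens 0..max_pos-2; labels are the same tokens shifted left by one,
# so the model at position t is trained to predict token t+1.
model_inputs = input_ids[:, : max_pos - 1]
labels = input_ids[:, 1:max_pos]

print(model_inputs.tolist())  # [[11, 12, 13, 14]]
print(labels.tolist())        # [[12, 13, 14, 15]]
```

With `attention_mask=None`, decoder-only transformers models apply their built-in causal mask; padded label positions are typically excluded from the loss via the ignore index instead.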
