Seeking help: I attached a small add-on network to ChatGLM, but the trained model has problems at inference time #336
Replies: 1 comment
-
Please raise this in #253 instead.
-
The custom network is as follows:
```python
import torch.nn as nn
from transformers import AutoTokenizer, AutoConfig, AutoModel

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True)
config.pre_seq_len = 10
config.prefix_projection = True

class ImageGLM(nn.Module):
    def __init__(self, MODEL_PATH, config):
        super().__init__()
        self.GLM = AutoModel.from_pretrained(MODEL_PATH, config=config, trust_remote_code=True)
        # for param in self.GLM.parameters():
        #     param.requires_grad = False
        self.layer0 = nn.Linear(4096, 65024)
        self.layer1 = nn.Linear(65024, 1024)
        self.layer2 = nn.Linear(1024, 65024)
```
trainable params: 1765767088 || all params: 8009351088 || trainable%: 22.04631896640863
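For reference, the three added `nn.Linear` layers account for only part of that ~1.77B trainable count; the remainder presumably comes from the prefix encoder enabled by `prefix_projection = True`. A minimal sketch (plain Python, layer sizes copied from `layer0`/`layer1`/`layer2` above) of how the added layers' share works out:

```python
# An nn.Linear(in_f, out_f) holds in_f * out_f weights plus out_f biases.
def linear_params(in_f, out_f):
    return in_f * out_f + out_f

# Sizes copied from layer0, layer1, layer2 in the class above.
added = (linear_params(4096, 65024)
         + linear_params(65024, 1024)
         + linear_params(1024, 65024))
print(added)  # 399638528, i.e. ~0.4B of the ~1.77B trainable parameters
```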
During training I simply call model(**inputs); after training, the output of model(**test) is unsatisfactory. How should I handle this?