
Support for GLM Models #1701

@ashmalvayani

Description


⚠️ Please check that this feature request hasn't been suggested before.

  • I searched previous Ideas in Discussions and didn't find any similar feature requests.
  • I searched previous Issues and didn't find any similar feature requests.

🔖 Feature description

Could you add support for the recently released GLM models, such as GLM-4-9b-chat, to axolotl?

✔️ Solution

I have tried the following config, but it doesn't run successfully:

base_model: THUDM/glm-4-9b-chat
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer

trust_remote_code: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
    - path: Datasets2.json
      ds_type: json
      type: alpaca
dataset_prepared_path: last_run_prepared/glm
val_set_size: 0
output_dir: ./outputs/glm-lora

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 64
lora_alpha: 32
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project: ArabicGLM-8B
wandb_entity:
wandb_watch:
wandb_name: ArabicGLM-8B
wandb_log_model:

gradient_accumulation_steps: 8
micro_batch_size: 4
num_epochs: 1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: 

warmup_steps: 95
evals_per_epoch: 4
eval_table_size:
eval_max_new_tokens: 128
save_total_limit: 5
save_steps: 200

debug:
deepspeed: deepspeed_configs/zero2.json
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|endoftext|>"
  eos_token: "<|endoftext|>"
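
As a quick sanity check on the special_tokens section above, something like the following can be used to confirm what the GLM-4 tokenizer actually exposes (a minimal sketch, assuming a transformers version that can load the model's custom tokenizer code via trust_remote_code):

from transformers import AutoTokenizer

# Load the GLM-4 tokenizer with its custom code enabled, matching
# trust_remote_code: true in the config above.
tok = AutoTokenizer.from_pretrained("THUDM/glm-4-9b-chat", trust_remote_code=True)

# Confirm the tokens referenced under special_tokens exist on the tokenizer.
print(tok.eos_token, tok.pad_token)

# Inspect which keys the tokenizer returns; if it emits position_ids here,
# that would help explain what the collator later fails to pad.
print(tok("hello world").keys())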

Error Trace:

  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/monkeypatch/data/batch_dataset_fetcher.py", line 32, in fetch
    return self.collate_fn(data)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/utils/collators.py", line 106, in __call__
    features = self.tokenizer.pad(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3355, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 224, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 775, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`position_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
Traceback (most recent call last):
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 759, in convert_to_tensors
    tensor = as_tensor(value)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 721, in as_tensor
    return torch.tensor(value)
ValueError: expected sequence of length 773 at dim 1 (got 438) 

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 59, in <module>
    fire.Fire(do_cli)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
    return do_train(parsed_cfg, parsed_cli_args)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/cli/train.py", line 55, in do_train
    return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/train.py", line 163, in train
    trainer.train(resume_from_checkpoint=resume_from_checkpoint)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 1859, in train
    return inner_training_loop(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/trainer.py", line 2165, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/accelerate/data_loader.py", line 452, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/monkeypatch/data/batch_dataset_fetcher.py", line 32, in fetch
    return self.collate_fn(data)
  File "/mnt/beegfs/fahad.khan/axolotl/src/axolotl/utils/collators.py", line 106, in __call__
    features = self.tokenizer.pad(
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3355, in pad
    return BatchEncoding(batch_outputs, tensor_type=return_tensors)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 224, in __init__
    self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
  File "/home/ashmal.vayani/anaconda3/envs/axolotl/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 775, in convert_to_tensors
    raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`position_ids` in this case) have excessive nesting (inputs type `list` where type `int` is expected).

I think the main error lies in this line, since I've already set padding to true in the config:

ValueError: expected sequence of length 773 at dim 1 (got 438)
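
If the underlying problem is that per-example position_ids of different lengths reach the collator (torch.tensor cannot stack ragged lists, which is exactly what the "expected sequence of length 773 ... (got 438)" message describes), one possible workaround is to pad or drop that feature before tokenizer.pad builds the batch. The sketch below only illustrates the idea and is not axolotl's actual collator; the helper names and the choice to extend the position sequence are assumptions:

def pad_position_ids(features, max_len=None):
    """Right-pad the position_ids of each example to a common length.

    features is assumed to be the list of dicts handed to the data collator.
    If the GLM tokenizer emits nested (2D) position ids, they would instead
    need to be flattened or removed entirely.
    """
    if not features or "position_ids" not in features[0]:
        return features
    target = max_len or max(len(f["position_ids"]) for f in features)
    for f in features:
        pos = list(f["position_ids"])
        # Continue the position sequence over the padded region; zeros would
        # also work, since padded tokens are masked out by the attention mask.
        pos.extend(range(len(pos), target))
        f["position_ids"] = pos
    return features

# Alternatively, dropping the feature sidesteps the padding issue entirely,
# at the cost of letting the model recompute positions itself:
def drop_position_ids(features):
    for f in features:
        f.pop("position_ids", None)
    return features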

❓ Alternatives

📝 Additional Context

Acknowledgements

  • My issue title is concise, descriptive, and in title casing.
  • I have searched the existing issues to make sure this feature has not been requested yet.
  • I have provided enough information for the maintainers to understand and evaluate this request.

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
