Farewell-CK

The latest release of the PaddleFormers library is 0.2.1. If the default install pulls this latest version, training fails with the following error:
```
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:717: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
W0907 09:45:14.021631  3142 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.8, Runtime API Version: 12.6
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:717: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
LAUNCH INFO 2025-09-07 09:45:17,460 -----------  Configuration  ----------------------
LAUNCH INFO 2025-09-07 09:45:17,460 auto_cluster_config: 0
LAUNCH INFO 2025-09-07 09:45:17,460 auto_parallel_config: None
LAUNCH INFO 2025-09-07 09:45:17,460 auto_tuner_json: None
LAUNCH INFO 2025-09-07 09:45:17,460 devices: 0
LAUNCH INFO 2025-09-07 09:45:17,460 elastic_level: -1
LAUNCH INFO 2025-09-07 09:45:17,460 elastic_timeout: 30
LAUNCH INFO 2025-09-07 09:45:17,460 enable_gpu_log: True
LAUNCH INFO 2025-09-07 09:45:17,460 gloo_port: 6767
LAUNCH INFO 2025-09-07 09:45:17,460 host: None
LAUNCH INFO 2025-09-07 09:45:17,460 ips: None
LAUNCH INFO 2025-09-07 09:45:17,460 job_id: default
LAUNCH INFO 2025-09-07 09:45:17,460 legacy: False
LAUNCH INFO 2025-09-07 09:45:17,460 log_dir: erniekit_dist_log
LAUNCH INFO 2025-09-07 09:45:17,460 log_level: INFO
LAUNCH INFO 2025-09-07 09:45:17,460 log_overwrite: False
LAUNCH INFO 2025-09-07 09:45:17,460 master: 127.0.0.1:8080
LAUNCH INFO 2025-09-07 09:45:17,460 max_restart: 3
LAUNCH INFO 2025-09-07 09:45:17,460 nnodes: 1
LAUNCH INFO 2025-09-07 09:45:17,460 nproc_per_node: None
LAUNCH INFO 2025-09-07 09:45:17,460 rank: -1
LAUNCH INFO 2025-09-07 09:45:17,460 run_mode: collective
LAUNCH INFO 2025-09-07 09:45:17,460 server_num: None
LAUNCH INFO 2025-09-07 09:45:17,460 servers: 
LAUNCH INFO 2025-09-07 09:45:17,460 sort_ip: False
LAUNCH INFO 2025-09-07 09:45:17,460 start_port: 6070
LAUNCH INFO 2025-09-07 09:45:17,460 trainer_num: None
LAUNCH INFO 2025-09-07 09:45:17,460 trainers: 
LAUNCH INFO 2025-09-07 09:45:17,461 training_script: /home/aistudio/ERNIE/erniekit/launcher.py
LAUNCH INFO 2025-09-07 09:45:17,461 training_script_args: ['train', '/home/aistudio/configs/ERNIE-4.5-0.3B/sft/run_sft_lora_32k.yaml']
LAUNCH INFO 2025-09-07 09:45:17,461 with_gloo: 1
LAUNCH INFO 2025-09-07 09:45:17,461 --------------------------------------------------
LAUNCH INFO 2025-09-07 09:45:17,461 Job: default, mode collective, replicas 1[1:1], elastic False
LAUNCH INFO 2025-09-07 09:45:17,462 Run Pod: wsqqoa, replicas 1, status ready
LAUNCH INFO 2025-09-07 09:45:17,474 Watching Pod: wsqqoa, replicas 1, status running
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/paddle/utils/cpp_extension/extension_utils.py:717: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
/opt/conda/envs/python35-paddle120-env/lib/python3.10/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(
W0907 09:45:21.091935  3501 gpu_resources.cc:114] Please NOTE: device: 0, GPU Compute Capability: 8.0, Driver API Version: 12.8, Runtime API Version: 12.6
[2025-09-07 09:45:21,425] [    INFO] - user has defined resume_from_checkpoint: None
[2025-09-07 09:45:21,426] [    INFO] - The default value for the training argument `--report_to` will change in v5 (from all installed integrations to none). In v5, you will need to use `--report_to all` to get the same behavior as now. You should start updating your code and make this info disappear :-).
[2025-09-07 09:45:21,426] [    INFO] - reset finetuning arguments global_batch_size to 8
[2025-09-07 09:45:21,426] [ WARNING] - eval_batch_size set to 1
[2025-09-07 09:45:21,457] [    INFO] - Tensor_parallel_degree = 1. Set sequence_parallel to False.
[2025-09-07 09:45:21,458] [   DEBUG] - ============================================================
[2025-09-07 09:45:21,458] [   DEBUG] -      Model Configuration Arguments      
[2025-09-07 09:45:21,458] [   DEBUG] - paddle commit id              : 0cd8342db88f96ef57398e888b050b5077ae6e73
[2025-09-07 09:45:21,458] [   DEBUG] - paddleformers commit id       : c8e46d9f40ac234baac500b0621873f2da7b9d47
[2025-09-07 09:45:21,458] [   DEBUG] - add_tail_layers               : False
[2025-09-07 09:45:21,458] [   DEBUG] - bos_token_id                  : 0
[2025-09-07 09:45:21,458] [   DEBUG] - continue_training             : True
[2025-09-07 09:45:21,458] [   DEBUG] - download_hub                  : None
[2025-09-07 09:45:21,458] [   DEBUG] - eos_token_id                  : 1
[2025-09-07 09:45:21,459] [   DEBUG] - fine_tuning                   : LoRA
[2025-09-07 09:45:21,459] [   DEBUG] - fuse_gate_detach_matmul       : True
[2025-09-07 09:45:21,459] [   DEBUG] - fuse_linear                   : False
[2025-09-07 09:45:21,459] [   DEBUG] - fuse_rms_norm                 : True
[2025-09-07 09:45:21,459] [   DEBUG] - fuse_rope                     : True
[2025-09-07 09:45:21,459] [   DEBUG] - fuse_softmax_mask             : False
[2025-09-07 09:45:21,459] [   DEBUG] - fuse_swiglu                   : True
[2025-09-07 09:45:21,459] [   DEBUG] - lora                          : True
[2025-09-07 09:45:21,459] [   DEBUG] - lora_alpha                    : -1
[2025-09-07 09:45:21,459] [   DEBUG] - lora_path                     : None
[2025-09-07 09:45:21,459] [   DEBUG] - lora_plus_scale               : 1.0
[2025-09-07 09:45:21,459] [   DEBUG] - lora_rank                     : 32
[2025-09-07 09:45:21,459] [   DEBUG] - loss_subbatch_seqlen          : 32768
[2025-09-07 09:45:21,459] [   DEBUG] - max_position_embeddings       : 4096
[2025-09-07 09:45:21,459] [   DEBUG] - model_name_or_path            : /home/aistudio/model/ERNIE-4.5-0.3B-Paddle
[2025-09-07 09:45:21,459] [   DEBUG] - moe_aux_loss_lambda           : 1e-05
[2025-09-07 09:45:21,459] [   DEBUG] - moe_gate                      : top2_fused
[2025-09-07 09:45:21,459] [   DEBUG] - moe_group                     : dummy
[2025-09-07 09:45:21,459] [   DEBUG] - moe_group_experts             : False
[2025-09-07 09:45:21,459] [   DEBUG] - moe_multimodal_dispatch_use_allgather: v2-alltoall-unpad
[2025-09-07 09:45:21,459] [   DEBUG] - moe_orthogonal_loss_lambda    : 0.0
[2025-09-07 09:45:21,459] [   DEBUG] - moe_use_aux_free              : None
[2025-09-07 09:45:21,460] [   DEBUG] - moe_use_hard_gate             : False
[2025-09-07 09:45:21,460] [   DEBUG] - moe_with_send_router_loss     : False
[2025-09-07 09:45:21,460] [   DEBUG] - moe_z_loss_lambda             : 0.0
[2025-09-07 09:45:21,460] [   DEBUG] - no_recompute_layers           : None
[2025-09-07 09:45:21,460] [   DEBUG] - num_nextn_predict_layers      : 0
[2025-09-07 09:45:21,460] [   DEBUG] - offload_recompute_inputs      : False
[2025-09-07 09:45:21,460] [   DEBUG] - pp_seg_method                 : layer:Ernie4_5_DecoderLayer|ErnieDecoderLayer|EmptyLayer
[2025-09-07 09:45:21,460] [   DEBUG] - recompute_granularity         : full
[2025-09-07 09:45:21,460] [   DEBUG] - recompute_use_reentrant       : True
[2025-09-07 09:45:21,460] [   DEBUG] - rope_3d                       : True
[2025-09-07 09:45:21,460] [   DEBUG] - rslora                        : False
[2025-09-07 09:45:21,460] [   DEBUG] - rslora_plus                   : False
[2025-09-07 09:45:21,460] [   DEBUG] - stage                         : SFT
[2025-09-07 09:45:21,460] [   DEBUG] - tensor_parallel_output        : True
[2025-09-07 09:45:21,460] [   DEBUG] - use_attn_mask_start_row_indices: True
[2025-09-07 09:45:21,460] [   DEBUG] - use_flash_attention           : True
[2025-09-07 09:45:21,460] [   DEBUG] - use_flash_attn_with_mask      : True
[2025-09-07 09:45:21,460] [   DEBUG] - use_fused_head_and_loss_fn    : False
[2025-09-07 09:45:21,460] [   DEBUG] - use_mem_eff_attn              : True
[2025-09-07 09:45:21,460] [   DEBUG] - use_recompute_loss_fn         : True
[2025-09-07 09:45:21,460] [   DEBUG] - use_recompute_moe             : False
[2025-09-07 09:45:21,460] [   DEBUG] - use_sparse_flash_attn         : True
[2025-09-07 09:45:21,460] [   DEBUG] - use_sparse_head_and_loss_fn   : True
[2025-09-07 09:45:21,461] [   DEBUG] - virtual_pp_degree             : 1
[2025-09-07 09:45:21,461] [   DEBUG] - vision_config                 : VisionArguments(attn_implementation='eager', attn_sep=True, depth=32, embed_dim=1280, hidden_act='quick_gelu', hidden_size=1280, in_channels=3, in_chans=3, mlp_ratio=4, model_type='DFNRope_vision_transformer', num_heads=16, patch_size=14, spatial_merge_size=2, spatial_patch_size=14, tensor_parallel_degree=4, use_recompute=True, vit_num_recompute_layers=10000)
[2025-09-07 09:45:21,461] [   DEBUG] - 
[2025-09-07 09:45:21,461] [   DEBUG] - ============================================================
[2025-09-07 09:45:21,461] [   DEBUG] -       Data Configuration Arguments      
[2025-09-07 09:45:21,461] [   DEBUG] - paddle commit id              : 0cd8342db88f96ef57398e888b050b5077ae6e73
[2025-09-07 09:45:21,461] [   DEBUG] - paddleformers commit id       : c8e46d9f40ac234baac500b0621873f2da7b9d47
[2025-09-07 09:45:21,461] [   DEBUG] - buffer_size                   : 500
[2025-09-07 09:45:21,461] [   DEBUG] - dataset_name                  : KnowledgeBasedSFTReader
[2025-09-07 09:45:21,461] [   DEBUG] - dataset_type                  : iterable
[2025-09-07 09:45:21,461] [   DEBUG] - eval_dataset_path             : /home/aistudio/datasets/val.jsonl
[2025-09-07 09:45:21,461] [   DEBUG] - eval_dataset_prob             : 1.0
[2025-09-07 09:45:21,461] [   DEBUG] - eval_dataset_type             : erniekit
[2025-09-07 09:45:21,461] [   DEBUG] - greedy_intokens               : True
[2025-09-07 09:45:21,461] [   DEBUG] - in_tokens_batching            : True
[2025-09-07 09:45:21,461] [   DEBUG] - mask_out_eos_token            : True
[2025-09-07 09:45:21,461] [   DEBUG] - max_prompt_len                : 2048
[2025-09-07 09:45:21,461] [   DEBUG] - max_seq_len                   : 32768
[2025-09-07 09:45:21,461] [   DEBUG] - num_comparisons               : 6
[2025-09-07 09:45:21,461] [   DEBUG] - num_samples_each_epoch        : 6000000
[2025-09-07 09:45:21,461] [   DEBUG] - offline_dataset_path          : None
[2025-09-07 09:45:21,462] [   DEBUG] - random_shuffle                : True
[2025-09-07 09:45:21,462] [   DEBUG] - text_dataset_path             : None
[2025-09-07 09:45:21,462] [   DEBUG] - text_dataset_prob             : None
[2025-09-07 09:45:21,462] [   DEBUG] - train_dataset_path            : /home/aistudio/datasets/train.jsonl
[2025-09-07 09:45:21,462] [   DEBUG] - train_dataset_prob            : 1.0
[2025-09-07 09:45:21,462] [   DEBUG] - train_dataset_type            : erniekit
[2025-09-07 09:45:21,462] [   DEBUG] - use_cls                       : True
[2025-09-07 09:45:21,462] [   DEBUG] - 
[2025-09-07 09:45:21,462] [    INFO] - The global seed is set to 23, local seed is set to 24 and random seed is set to 23.
[2025-09-07 09:45:21,462] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: True
[2025-09-07 09:45:21,462] [    INFO] - Start to load model ...
[2025-09-07 09:45:21,463] [    INFO] - Using download source: huggingface
[2025-09-07 09:45:21,463] [    INFO] - Loading configuration file /home/aistudio/model/ERNIE-4.5-0.3B-Paddle/config.json
[2025-09-07 09:45:21,463] [ WARNING] - You are using a model of type ernie4_5 to instantiate a model of type ernie4_5_moe. This is not supported for all configurations of models and can yield errors.
[2025-09-07 09:45:21,464] [    INFO] - Loading weights file /home/aistudio/model/ERNIE-4.5-0.3B-Paddle/model.safetensors
[2025-09-07 09:45:21,870] [    INFO] - Loaded weights file from disk, setting weights to model.
Traceback (most recent call last):
  File "/home/aistudio/ERNIE/erniekit/launcher.py", line 46, in <module>
    launch()
  File "/home/aistudio/ERNIE/erniekit/launcher.py", line 34, in launch
    run_tuner()
  File "/home/aistudio/ERNIE/erniekit/train/tuner.py", line 76, in run_tuner
    _training_function(config={"args": args})
  File "/home/aistudio/ERNIE/erniekit/train/tuner.py", line 55, in _training_function
    run_sft(model_args, data_args, generating_args, finetuning_args)
  File "/home/aistudio/ERNIE/erniekit/train/sft/workflow.py", line 362, in run_sft
    model = model_class.from_pretrained(
  File "/home/aistudio/external-libraries/lib/python3.10/site-packages/paddleformers/transformers/model_utils.py", line 2665, in from_pretrained
    model = cls(config, *init_args, **model_kwargs)
  File "/home/aistudio/external-libraries/lib/python3.10/site-packages/paddleformers/transformers/utils.py", line 290, in __impl__
    init_func(self, *args, **kwargs)
TypeError: Ernie4_5_MoeForCausalLM.__init__() got an unexpected keyword argument 'convert_from_hf'
LAUNCH INFO 2025-09-07 09:45:22,480 Pod failed
LAUNCH ERROR 2025-09-07 09:45:22,480 Container failed !!!
Container rank 0 status failed cmd ['/opt/conda/envs/python35-paddle120-env/bin/python', '-u', '/home/aistudio/ERNIE/erniekit/launcher.py', 'train', '/home/aistudio/configs/ERNIE-4.5-0.3B/sft/run_sft_lora_32k.yaml'] code 1 log erniekit_dist_log/workerlog.0
LAUNCH INFO 2025-09-07 09:45:22,480 ------------------------- ERROR LOG DETAIL -------------------------
[2025-09-07 09:45:21,461] [   DEBUG] - num_samples_each_epoch        : 6000000
[2025-09-07 09:45:21,461] [   DEBUG] - offline_dataset_path          : None
[2025-09-07 09:45:21,462] [   DEBUG] - random_shuffle                : True
[2025-09-07 09:45:21,462] [   DEBUG] - text_dataset_path             : None
[2025-09-07 09:45:21,462] [   DEBUG] - text_dataset_prob             : None
[2025-09-07 09:45:21,462] [   DEBUG] - train_dataset_path            : /home/aistudio/datasets/train.jsonl
[2025-09-07 09:45:21,462] [   DEBUG] - train_dataset_prob            : 1.0
[2025-09-07 09:45:21,462] [   DEBUG] - train_dataset_type            : erniekit
[2025-09-07 09:45:21,462] [   DEBUG] - use_cls                       : True
[2025-09-07 09:45:21,462] [   DEBUG] - 
[2025-09-07 09:45:21,462] [    INFO] - The global seed is set to 23, local seed is set to 24 and random seed is set to 23.
[2025-09-07 09:45:21,462] [ WARNING] - Process rank: -1, device: gpu, world_size: 1, distributed training: False, 16-bits training: True
[2025-09-07 09:45:21,462] [    INFO] - Start to load model ...
[2025-09-07 09:45:21,463] [    INFO] - Using download source: huggingface
[2025-09-07 09:45:21,463] [    INFO] - Loading configuration file /home/aistudio/model/ERNIE-4.5-0.3B-Paddle/config.json
[2025-09-07 09:45:21,463] [ WARNING] - You are using a model of type ernie4_5 to instantiate a model of type ernie4_5_moe. This is not supported for all configurations of models and can yield errors.
[2025-09-07 09:45:21,464] [    INFO] - Loading weights file /home/aistudio/model/ERNIE-4.5-0.3B-Paddle/model.safetensors
[2025-09-07 09:45:21,870] [    INFO] - Loaded weights file from disk, setting weights to model.
Traceback (most recent call last):
  File "/home/aistudio/ERNIE/erniekit/launcher.py", line 46, in <module>
    launch()
  File "/home/aistudio/ERNIE/erniekit/launcher.py", line 34, in launch
    run_tuner()
  File "/home/aistudio/ERNIE/erniekit/train/tuner.py", line 76, in run_tuner
    _training_function(config={"args": args})
  File "/home/aistudio/ERNIE/erniekit/train/tuner.py", line 55, in _training_function
    run_sft(model_args, data_args, generating_args, finetuning_args)
  File "/home/aistudio/ERNIE/erniekit/train/sft/workflow.py", line 362, in run_sft
    model = model_class.from_pretrained(
  File "/home/aistudio/external-libraries/lib/python3.10/site-packages/paddleformers/transformers/model_utils.py", line 2665, in from_pretrained
    model = cls(config, *init_args, **model_kwargs)
  File "/home/aistudio/external-libraries/lib/python3.10/site-packages/paddleformers/transformers/utils.py", line 290, in __impl__
    init_func(self, *args, **kwargs)
TypeError: Ernie4_5_MoeForCausalLM.__init__() got an unexpected keyword argument 'convert_from_hf'
LAUNCH INFO 2025-09-07 09:45:22,480 Exit code 1
```
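
To catch this before launching a training run, a pre-flight check along these lines might help. It is only a sketch: the distribution name `paddleformers` and the helper names `parse_version`, `is_known_bad`, and `check_paddleformers` are assumptions for illustration, and 0.2.1 is listed as incompatible solely because it is the version that produced the log above. Which versions actually work is not established here.

```python
from importlib import metadata

# Assumption: 0.2.1 is the only release shown (by the log above) to fail.
KNOWN_BAD = {(0, 2, 1)}


def parse_version(text):
    """Turn '0.2.1' into (0, 2, 1); trailing non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '0.3' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_known_bad(text):
    """Return True if the version string matches a release listed in KNOWN_BAD."""
    return parse_version(text) in KNOWN_BAD


def check_paddleformers():
    """Print a warning if the installed release is listed in KNOWN_BAD.

    Returns the installed version string, or None if the package is absent.
    """
    try:
        # 'paddleformers' is the assumed distribution name.
        installed = metadata.version("paddleformers")
    except metadata.PackageNotFoundError:
        return None
    if is_known_bad(installed):
        print(
            f"paddleformers {installed} is installed; this release raised "
            "TypeError: unexpected keyword argument 'convert_from_hf' "
            "when loading Ernie4_5_MoeForCausalLM."
        )
    return installed


if __name__ == "__main__":
    check_paddleformers()
```

Running the script before `erniekit train` would print a warning when the installed release matches the one in the log; it does not change the environment itself.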