
Finetuned checkpoint generating junk outputs #945

@NITHISHM2410

Description

Describe the bug

  • I used the config examples/llm_finetune/qwen/qwen3_moe_30b_te_deepep.yaml.
  • I set checkpoint.enabled to True, changed checkpoint.checkpoint_dir to my desired path, and started training with automodel finetune llm -c qwen3_moe_30b_te_deepep.yaml.
  • Training works as expected and the loss decreases smoothly.
  • I then used vLLM to generate outputs from the finetuned checkpoint, and the outputs are complete junk.

Truncated example output:

" \r\n and and and and and and.\n and and and and and and and and and and and and \n and and and and and and and and and plant, to \n on and with and and and and this and and and and and and and in and and and and,,and and and \n and and and \r\n and \r\n\r\n and at and plant and.\r\n and and and and \ufffd and \n
  • The model fails to follow any instruction, and the output contains invalid Unicode characters. It seems the model has lost its vocabulary and produces nonsensical tokens.

Steps/Code to reproduce bug

  1. Enable checkpointing in examples/llm_finetune/qwen/qwen3_moe_30b_te_deepep.yaml and set checkpoint.checkpoint_dir to a path of your choice.
  2. Run automodel finetune llm -c qwen3_moe_30b_te_deepep.yaml.
  3. Pass the consolidated checkpoint path to vLLM's chat API. Below is the code snippet:
from vllm import LLM, SamplingParams

if __name__ == '__main__':
    # Path to the consolidated checkpoint, e.g. /workspace/ckpts/epoch_1_step_47/model/consolidated
    load_path = ""

    model = LLM(model=load_path, dtype="bfloat16", tensor_parallel_size=8)
    params = SamplingParams(max_tokens=1024, temperature=1.0, top_k=100)
    test_queries = [
        "Then, the man writes over the snow covering the window of a car, and a woman wearing winter clothes smiles. then",
    ]

    # Wrap each query as a single-turn chat conversation.
    test_prompts = [
        [{'role': 'user', 'content': p}]
        for p in test_queries
    ]

    responses = model.chat(test_prompts, params)
    print(responses[0].outputs[0].text)

Setup:

  • nemo-automodel version: 0.2.0
  • vLLM version: 0.10.2
  • GPU: 8× NVIDIA H100 80GB HBM3
  • Inference dtype: bf16
  • CUDA Version: 12.2

Expected behavior:

The current output (shown above) is pure junk with no relation to the prompt. I expect the finetuned model to produce coherent text.

Additional context:

A)
I observed that the consolidated checkpoint is much smaller than the original model's checkpoint.
Original model from HF: https://huggingface.co/Qwen/Qwen3-30B-A3B
Checkpoint size of the original model in HF: 61.1 GB
Size of the consolidated training checkpoint created by the nemo-automodel trainer: ~10 GB
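
One way to trace the size gap is to compare the tensor names stored in the consolidated checkpoint's *.safetensors shards with those in the original Qwen/Qwen3-30B-A3B download. This is only a rough diagnostic sketch, assuming both checkpoints are stored as safetensors shards; the paths below are placeholders.

# Rough diagnostic sketch (assumption: both checkpoints are stored as
# *.safetensors shards). Lists tensor names and total bytes on disk so the
# ~10 GB vs. 61.1 GB gap can be traced to missing tensors.
import glob
import os

from safetensors import safe_open


def summarize_checkpoint(ckpt_dir):
    tensor_names = set()
    total_bytes = 0
    for shard in sorted(glob.glob(os.path.join(ckpt_dir, "*.safetensors"))):
        total_bytes += os.path.getsize(shard)
        with safe_open(shard, framework="pt") as f:
            tensor_names.update(f.keys())
    print(f"{ckpt_dir}: {len(tensor_names)} tensors, {total_bytes / 1e9:.1f} GB on disk")
    return tensor_names


if __name__ == "__main__":
    # Placeholder paths; replace with your own.
    consolidated = summarize_checkpoint("/workspace/ckpts/epoch_1_step_47/model/consolidated")
    original = summarize_checkpoint("/path/to/hf/Qwen3-30B-A3B")
    missing = original - consolidated
    print(f"{len(missing)} tensors present in the original checkpoint but missing from the consolidated one")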

B)
As suggested in docs/guides/checkpointing.md, I also tried loading the consolidated checkpoint with Hugging Face Transformers and generating outputs, and they are still junk.

C)
While loading the consolidated checkpoint with Hugging Face Transformers, I got a warning saying that many of the model's weights were not initialized from the checkpoint and were newly initialized.

I used this code for B and C; I only replaced model_path with the path to the consolidated checkpoint and changed the prompt for my use case. The loading is roughly of the shape sketched below.
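
This is a minimal sketch of that kind of Transformers loading code, not the exact script; model_path, the prompt, and the generation settings are placeholders.

# Minimal sketch of loading the consolidated checkpoint with Hugging Face
# Transformers (assumption: a standard AutoModelForCausalLM load works for the
# consolidated format). model_path and the prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = ""  # path to the consolidated checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me a short introduction to large language models."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))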
