Describe the bug
- I used the config `examples/llm_finetune/qwen/qwen3_moe_30b_te_deepep.yaml` (see the YAML sketch after this list).
- I set `checkpoint.enabled` to `True`, changed `checkpoint.checkpoint_dir` to my desired path, and started training with `automodel finetune llm -c qwen3_moe_30b_te_deepep.yaml`.
- Training works as expected and the loss decreases smoothly.
- I then used vLLM to generate outputs from the trained checkpoint and found them to be complete junk. Truncated example output:

```
 \r\n and and and and and and.\n and and and and and and and and and and and and \n and and and and and and and and and plant, to \n on and with and and and and this and and and and and and and in and and and and,,and and and \n and and and \r\n and \r\n\r\n and at and plant and.\r\n and and and and \ufffd and \n
```

- The model fails to follow any instruction, and the output contains invalid Unicode characters. It seems the model has lost its vocabulary and produces nonsensical tokens.
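For context, the checkpoint settings I changed look roughly like this (a sketch only; the exact key nesting in `qwen3_moe_30b_te_deepep.yaml` may differ, and the directory is an example path):

```yaml
# Sketch of the relevant section of qwen3_moe_30b_te_deepep.yaml.
# Key nesting is assumed from the option names above; the path is an example.
checkpoint:
  enabled: true
  checkpoint_dir: /workspace/ckpts
```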
Steps/Code to reproduce bug
- Enable checkpointing in `examples/llm_finetune/qwen/qwen3_moe_30b_te_deepep.yaml` and change `checkpoint.checkpoint_dir` as appropriate for your setup.
- Run `automodel finetune llm -c qwen3_moe_30b_te_deepep.yaml`.
- Pass the consolidated checkpoint path to a vLLM chat function. Below is the code snippet:
```python
from vllm import LLM, SamplingParams

if __name__ == '__main__':
    # Your consolidated checkpoint path, e.g. `/workspace/ckpts/epoch_1_step_47/model/consolidated`
    load_path = ""
    model = LLM(model=load_path, dtype="bfloat16", tensor_parallel_size=8)
    params = SamplingParams(max_tokens=1024, temperature=1.0, top_k=100)
    test_queries = [
        "Then, the man writes over the snow covering the window of a car, and a woman wearing winter clothes smiles. then",
    ]
    test_prompts = [
        [{'role': 'user', 'content': p}]
        for p in test_queries
    ]
    response = model.chat(test_prompts, params)
    print(response[0].outputs[0].text)
```
Setup:
- nemo-automodel version: 0.2.0
- vLLM version: 0.10.2
- GPU: 8× NVIDIA H100 80GB HBM3
- Inference dtype: bf16
- CUDA Version: 12.2
Expected behavior:
The current output (shown above) is random junk; I expect the fine-tuned checkpoint to produce coherent, sensible tokens.
Additional context:
A)
I observed that the consolidated checkpoint is much smaller than the original model's checkpoint.
Original model from HF: https://huggingface.co/Qwen/Qwen3-30B-A3B
Checkpoint size of the original model on HF: 61.1 GB
Size of the consolidated training checkpoint created by the nemo-automodel trainer: ~10 GB
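As a rough sanity check (my own back-of-the-envelope arithmetic): Qwen3-30B-A3B has about 30.5B parameters, and at 2 bytes per bf16 parameter that is roughly 61 GB, which matches the HF size; ~10 GB corresponds to only about 5B parameters, which suggests most of the weights never made it into the consolidated checkpoint. A quick way to measure the on-disk weight size (a sketch; the path is an example):

```python
# Sketch: sum the on-disk size of all .safetensors shards under a
# checkpoint directory, in GB. The path below is an example.
from pathlib import Path

def weights_size_gb(ckpt_dir: str) -> float:
    return sum(f.stat().st_size for f in Path(ckpt_dir).rglob("*.safetensors")) / 1e9

print(weights_size_gb("/workspace/ckpts/epoch_1_step_47/model/consolidated"))
```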
B)
As suggested in docs/guides/checkpointing.md, I also tried loading the consolidated checkpoint with Hugging Face Transformers and generating outputs; they are still junk.
C)
While loading the consolidated checkpoint with Hugging Face Transformers, I got a warning saying that many of the model's weights were not initialized from the checkpoint and were newly initialized.
I used this code for B) and C), replacing model_path with the path to the consolidated checkpoint and changing the prompt for my use case.
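For reference, a minimal sketch of HF loading and generation along those lines, assuming standard Transformers APIs (`model_path` and the prompt are placeholders):

```python
# Minimal sketch, assuming standard Hugging Face Transformers APIs.
# model_path and the prompt are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/workspace/ckpts/epoch_1_step_47/model/consolidated"  # example path
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer("Give me a short introduction to large language models.",
                   return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```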