
litgpt model responses using a simple "out-of-box" code example become incoherent/repetitive after a few hundred tokens #2145

@drwslacy47

Bug description

I’m using litgpt (Version: 0.5.11) to chat with the following models:

checkpoints/google/gemma-3-4b-it
checkpoints/meta-llama/Llama-3.2-3B-Instruct

I deliberately prompt both models to generate a long response and set the max_new_tokens generation parameter to 1500 tokens. In response to this single prompt, both models start out giving a coherent response, but after a few hundred generated words they begin emitting either gibberish or repeated instances of the same sentence/word/phrase. I've been able to reproduce this behavior consistently with different random seeds.

I realize this may not be a bug in litgpt itself (perhaps a misconfiguration), but I see the problem when I use litgpt "out of the box" with code (see attached file runlitgpt.py) that closely follows the minimal example in the litgpt API documentation. I am also using the temperature, top_p, and top_k values recommended for each model (Gemma 3 and Llama 3.2). Even if there is a misconfiguration somewhere, it seems odd that the models would be this badly misconfigured out of the box.

Clearly, this behavior isn't expected, since these models were trained for long contexts and I haven't seen it when running the same models in frameworks other than Lightning/litgpt. For example, I've successfully run the same prompts with the same generation parameters (temperature, top_p, top_k) using Google's PyTorch inference implementation of Gemma 3 and the Hugging Face Transformers implementation of Llama 3.2.
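For reference, my Transformers control run looked roughly like the sketch below (the model path and prompt here are illustrative placeholders; the sampling parameters match the llama file dumps further down):

```python
# Sketch of the control experiment: same prompt and sampling parameters,
# but run through Hugging Face Transformers instead of litgpt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # a local checkpoint path also works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

torch.manual_seed(0)  # seed varied across runs
messages = [{"role": "user", "content": "Write a detailed essay about ..."}]  # prompt elided
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Same sampling parameters as in the litgpt llama runs.
output = model.generate(
    inputs,
    max_new_tokens=1500,
    do_sample=True,
    temperature=0.6,
    top_k=64,
    top_p=0.9,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With this setup the full 1500-token output stays coherent, which is what makes the litgpt behavior stand out.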

I'm including text file dumps of both prompts and model responses to illustrate the problem. The file names encode the model used, the random seed, the prompt index, and the generation arguments supplied to llm.generate(): max_new_tokens, temperature, top_p, and top_k.

gemma_seed_0_promptindex_0_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
gemma_seed_0_promptindex_1_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
gemma_seed_0_promptindex_2_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
llama_seed_0_promptindex_0_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt
llama_seed_0_promptindex_1_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt
llama_seed_0_promptindex_2_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt

Here is a simple Python script that can be used to reproduce the problem:

runlitgpt.py
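In case the attachment doesn't open, the core of the script is roughly the following sketch (prompt text elided; the attached runlitgpt.py is authoritative):

```python
# Rough sketch of runlitgpt.py, based on the minimal example in the
# litgpt API documentation. Model path, seed, and prompt are placeholders.
import torch
from litgpt import LLM

torch.manual_seed(0)  # seed varied across runs

llm = LLM.load("checkpoints/meta-llama/Llama-3.2-3B-Instruct")

prompt = "Write a detailed essay about ..."  # long-response prompt elided

# Sampling parameters recommended for Llama 3.2; for the Gemma 3 runs
# I use temperature=1.0, top_k=64, top_p=0.95 instead.
response = llm.generate(
    prompt,
    max_new_tokens=1500,
    temperature=0.6,
    top_k=64,
    top_p=0.9,
)
print(response)
```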

Reproduced in studio

No response

What operating system are you using?

Linux

LitGPT Version

Output of pip show litgpt | grep Version:

Version: 0.5.11

Python file I am running on my Linux system, which can be used to reproduce the problem:

runlitgpt.py

Labels: bug