
litgpt model responses using a simple "out-of-box" code example become incoherent/repetitive after a few hundred tokens #2145

@drwslacy47

Bug description

I’m using litgpt (Version: 0.5.11) to chat with the following models:

checkpoints/google/gemma-3-4b-it
checkpoints/meta-llama/Llama-3.2-3B-Instruct

I deliberately prompt both models to generate a long response and set the max_new_tokens generation parameter to 1500 tokens. In response to this single prompt, both models start out giving a coherent response, but after a few hundred generated words they begin emitting either gibberish or repeated instances of the same sentence/word/phrase. I've been able to reproduce this behavior consistently with different random seeds.

I realize this may not be a bug in litgpt itself (perhaps a misconfiguration), but I see the problem when I use litgpt "out of the box" with code (see attached file runlitgpt.py) that closely follows the minimal example in the litgpt API documentation. I am also using the temperature, top_p, and top_k values recommended for each model (Gemma 3 and Llama 3.2). Even if there is a misconfiguration somewhere, it seems odd that the models would be this badly misconfigured out of the box.

Clearly, this behavior isn't expected, since these models were trained for long contexts and I haven't seen it when running the same models in frameworks other than Lightning/litgpt. For example, I've successfully run the same prompts with the same generation parameters (temperature, top_p, top_k) using Google's PyTorch inference implementation of Gemma 3 and the Hugging Face Transformers implementation of Llama 3.2.
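For reference, my Transformers control run looked roughly like the sketch below (the model path and prompt here are illustrative placeholders; the sampling parameters match the llama file dumps further down):

```python
# Sketch of the control experiment: same prompt and sampling parameters,
# but run through Hugging Face Transformers instead of litgpt.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # a local checkpoint path also works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

torch.manual_seed(0)  # seed varied across runs
messages = [{"role": "user", "content": "Write a detailed essay about ..."}]  # prompt elided
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Same sampling parameters as in the litgpt llama runs.
output = model.generate(
    inputs,
    max_new_tokens=1500,
    do_sample=True,
    temperature=0.6,
    top_k=64,
    top_p=0.9,
)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```

With this setup the full 1500-token output stays coherent, which is what makes the litgpt behavior stand out.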

I'm including text file dumps of both prompts and model responses to illustrate the problem. The file names encode the model used, the random seed, the prompt index, and the generation arguments supplied to llm.generate(): max_new_tokens, temperature, top_p, and top_k.

gemma_seed_0_promptindex_0_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
gemma_seed_0_promptindex_1_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
gemma_seed_0_promptindex_2_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
llama_seed_0_promptindex_0_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt
llama_seed_0_promptindex_1_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt
llama_seed_0_promptindex_2_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt

Here is a simple Python script that can be used to reproduce the problem:

runlitgpt.py
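In case the attachment doesn't open, the core of the script is roughly the following sketch (prompt text elided; the attached runlitgpt.py is authoritative):

```python
# Rough sketch of runlitgpt.py, based on the minimal example in the
# litgpt API documentation. Model path, seed, and prompt are placeholders.
import torch
from litgpt import LLM

torch.manual_seed(0)  # seed varied across runs

llm = LLM.load("checkpoints/meta-llama/Llama-3.2-3B-Instruct")

prompt = "Write a detailed essay about ..."  # long-response prompt elided

# Sampling parameters recommended for Llama 3.2; for the Gemma 3 runs
# I use temperature=1.0, top_k=64, top_p=0.95 instead.
response = llm.generate(
    prompt,
    max_new_tokens=1500,
    temperature=0.6,
    top_k=64,
    top_p=0.9,
)
print(response)
```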

Reproduced in studio

No response

What operating system are you using?

Linux

LitGPT Version

Output of pip show litgpt | grep Version:

Version: 0.5.11

Python file I am running on my Linux system, which can be used to reproduce the problem:

runlitgpt.py

Labels: bug