Description
Bug description
I’m using litgpt (Version: 0.5.11) to chat with the following models:
checkpoints/google/gemma-3-4b-it
checkpoints/meta-llama/Llama-3.2-3B-Instruct
I deliberately prompt both models to generate a long response and set the max_new_tokens generation parameter to 1500 tokens. In response to this single prompt, both models start out giving a coherent response, but after a few hundred generated words, both begin producing either gibberish or repeated instances of the same sentence/word/phrase. I've been able to reproduce this behavior consistently across different random seeds.
I realize this may not be a bug in litgpt itself (perhaps a misconfiguration), but I see this problem when I use litgpt "out of the box" with code (see attached file runlitgpt.py) that closely follows the minimal example in the litgpt API documentation. I am also using the values for temperature, top_p, and top_k recommended for each model (gemma3 and llama-3.2). Even if there is a misconfiguration, it seems odd that the models would be this badly misconfigured out of the box.
Clearly, this behavior isn't expected: these models were trained for long contexts, and I haven't seen this behavior when running the same models in frameworks other than Lightning/litgpt. For example, I've successfully run the same prompts with the same generation parameters (temperature, top_p, top_k) using Google's PyTorch inference implementation of Gemma3 and the Hugging Face transformers implementation of Llama3.2.
I'm including some text file dumps of both prompts and model responses to illustrate the problem. The file names encode the model used, the random seed, and the generation parameter arguments supplied in calls to llm.generate(): max_new_tokens, temperature, top_p, top_k.
gemma_seed_0_promptindex_0_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
gemma_seed_0_promptindex_1_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
gemma_seed_0_promptindex_2_maxtokens_1500_temp_1.0_topk_64_topp_0.95.txt
llama_seed_0_promptindex_0_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt
llama_seed_0_promptindex_1_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt
llama_seed_0_promptindex_2_maxtokens_1500_temp_0.6_topk_64_topp_0.9.txt
Here is a simple Python script that can be used to reproduce the problem:
Reproduced in studio
No response
What operating system are you using?
Linux
LitGPT Version
Output of pip show litgpt | grep Version: Version: 0.5.11
Python file I am running on my Linux system which can be used to reproduce the problem:
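The attached runlitgpt.py is not reproduced inline, but a minimal sketch in the spirit of the litgpt Python API docs would look like the following. The prompt string and the reproduce() helper are placeholders of mine, not the actual attachment; the sampling settings are taken from the dump file names above (gemma: temp 1.0, top_k 64, top_p 0.95; llama: temp 0.6, top_k 64, top_p 0.9).

```python
# Minimal repro sketch, assuming the litgpt Python API (LLM.load / llm.generate).
# The prompt below is a placeholder; the real prompts are in the attached dumps.

# Sampling settings encoded in the dump file names above.
GEMMA_SETTINGS = {"max_new_tokens": 1500, "temperature": 1.0, "top_k": 64, "top_p": 0.95}
LLAMA_SETTINGS = {"max_new_tokens": 1500, "temperature": 0.6, "top_k": 64, "top_p": 0.9}

PROMPT = "Write a long, detailed essay about ..."  # placeholder prompt


def reproduce(checkpoint: str, settings: dict, seed: int = 0) -> str:
    """Load a checkpoint and generate one long response with the given settings."""
    import torch
    from litgpt import LLM  # requires `pip install litgpt` and downloaded checkpoints

    torch.manual_seed(seed)
    llm = LLM.load(checkpoint)
    return llm.generate(PROMPT, **settings)


# Example usage (after `litgpt download meta-llama/Llama-3.2-3B-Instruct`):
# print(reproduce("meta-llama/Llama-3.2-3B-Instruct", LLAMA_SETTINGS, seed=0))
```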