Inference with a LoRA-finetuned model in Hugging Face format

Hello, 
I used [this script](https://github.com/Lightning-AI/lit-llama/blob/main/scripts/convert_lora_weights.py) to merge the LoRA weights into the base model. Then I used [this script](https://github.com/Lightning-AI/lit-llama/pull/435) to convert the merged model to Hugging Face format.
But when I run inference on the model in Hugging Face, it never outputs the end token; it behaves like a pretrained model rather than a finetuned one.
Here is my inference pipeline:

```python
response = generation_pipeline(
    prompt,
    pad_token_id=tokenizer.eos_token_id,
    do_sample=False,
    num_beams=4,
    max_length=500,
    top_p=0.1,
    top_k=20,
    repetition_penalty=3.0,
    no_repeat_ngram_size=3,
)[0]['generated_text']
```
I'm not sure whether this inference pipeline matches the one in this repository.
I want to run inference in Hugging Face because I'm facing an issue with the generate script here and I want to use beam search.
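
For comparison, here is a minimal sketch of how the same call could be set up for plain beam search. It assumes the `transformers` text-generation pipeline forwards these keyword arguments to `generate()`. The helper name `beam_search_kwargs` and its default values are hypothetical and not taken from this repository. Since `top_p` and `top_k` are sampling settings, the sketch leaves them out when `do_sample=False`, and it also leaves out `repetition_penalty` so that fewer settings interfere while debugging. It passes `eos_token_id` explicitly so that generation can stop at the end token.

```python
def beam_search_kwargs(eos_token_id, pad_token_id=None, num_beams=4, max_new_tokens=256):
    """Build keyword arguments for deterministic beam search (hypothetical helper)."""
    if num_beams < 1:
        raise ValueError("num_beams must be at least 1")
    return {
        "do_sample": False,              # beam search without sampling
        "num_beams": num_beams,
        "early_stopping": True,          # stop once enough finished beams exist
        "max_new_tokens": max_new_tokens,  # counts only generated tokens, not the prompt
        "no_repeat_ngram_size": 3,
        "eos_token_id": eos_token_id,
        # Fall back to EOS for padding when the tokenizer defines no pad token.
        "pad_token_id": eos_token_id if pad_token_id is None else pad_token_id,
    }


# Usage (assumes `generation_pipeline`, `tokenizer` and `prompt` already exist):
# kwargs = beam_search_kwargs(tokenizer.eos_token_id)
# response = generation_pipeline(prompt, **kwargs)[0]["generated_text"]
```

If the output still never ends with this reduced configuration, the cause is more likely in the merged weights or the prompt format than in the generation settings.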
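
One way to check the end-token symptom described above is to look at the generated token ids instead of the decoded text, because decoding may hide special tokens. The sketch below is an assumption about how such a check could look; the helper name `eos_position` is hypothetical, and the ids are expected to be a plain list of integers (for example from `model.generate(...)[0].tolist()`).

```python
def eos_position(generated_ids, prompt_length, eos_token_id):
    """Return the index of the first EOS among newly generated tokens, or None.

    generated_ids: full sequence (prompt + continuation) as a list of ints.
    prompt_length: number of tokens that belong to the prompt.
    """
    for index in range(prompt_length, len(generated_ids)):
        if generated_ids[index] == eos_token_id:
            return index
    return None


# Usage (assumes `model`, `tokenizer` and `prompt` already exist):
# inputs = tokenizer(prompt, return_tensors="pt")
# ids = model.generate(**inputs, num_beams=4, max_new_tokens=256)[0].tolist()
# print(eos_position(ids, inputs["input_ids"].shape[1], tokenizer.eos_token_id))
```

A result of `None` means the model produced no end token within the generation limit, which would match the behaviour reported here.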

I appreciate your help.

inference finetuned model using LoRa in Huggingface format #442
