Add Unsupported Languages to Base Model #3636

@pourmand1376

Description

Yesterday, I was talking to @andreaskoepf on Discord about how to add a new language to a base LLM.

Today I saw this comment from @somerandomguyontheweb:

Hi @pourmand1376, sorry for a slightly off-topic question: could you please share any details on how your friend managed to fine-tune LLaMA on a text-only dataset, without instructions? I'm interested in doing the same thing with Belarusian Wikipedia, but so far I've only seen tutorials on how to instruct-tune LLaMA, and Wikipedia articles as such don't contain clearly delimited prompts and responses. Could you please briefly describe the approach?
Thanks in advance for any comments.

It seems that there are others like me who would like to fine-tune LLMs for unsupported languages like Persian.

This issue can be the place to discuss it. As for the question above, I only know that he used this repository as the base and changed a lot of things to make it work. I will ask him for further details.

However, I think this repo could also serve as a codebase for training base LLMs.

I think we need a clear guide for people like me on how to do this. From what I've seen so far, the Open-Assistant team has done a great job on SFT fine-tuning, but there seems to be no code for fine-tuning base LLMs on other languages.
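
To make the question concrete, here is a minimal sketch of how plain text without prompts and responses is commonly prepared for a causal language-modeling objective: tokenize each document, join everything into one token stream with an end-of-document marker, and cut the stream into fixed-length blocks whose labels are a copy of the inputs. This is not necessarily what my friend did, and it is not code from this repository. `toy_tokenize`, `pack_into_blocks`, and `EOS_ID` are hypothetical names; a real setup would use the model's own tokenizer and its real EOS token id.

```python
from typing import Dict, List

EOS_ID = 0  # hypothetical end-of-document token id (assumption, not a real model's id)


def toy_tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Stand-in tokenizer: maps whitespace-separated words to integer ids."""
    ids = []
    for word in text.split():
        if word not in vocab:
            vocab[word] = len(vocab) + 1  # ids start at 1; 0 is reserved for EOS
        ids.append(vocab[word])
    return ids


def pack_into_blocks(docs: List[str], block_size: int) -> List[Dict[str, List[int]]]:
    """Concatenate tokenized documents and split them into fixed-length blocks."""
    vocab: Dict[str, int] = {}
    stream: List[int] = []
    for doc in docs:
        stream.extend(toy_tokenize(doc, vocab))
        stream.append(EOS_ID)  # mark the document boundary

    # Drop the incomplete tail so that every block has exactly block_size tokens.
    n_full = (len(stream) // block_size) * block_size

    examples = []
    for start in range(0, n_full, block_size):
        block = stream[start:start + block_size]
        # For causal LM training the labels are the inputs themselves;
        # the next-token shift is usually applied inside the model or loss.
        examples.append({"input_ids": block, "labels": list(block)})
    return examples


if __name__ == "__main__":
    for example in pack_into_blocks(["a b c", "d e"], block_size=3):
        print(example)
```

With this kind of packing, Wikipedia articles need no prompt/response delimiters at all: every token is simply a prediction target for the tokens that precede it.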
