Add Unsupported Languages to Base Model

Yesterday, I was talking to @andreaskoepf on Discord about how to add a new language to a base LLM.

Today I saw [this comment](https://github.com/LAION-AI/Open-Assistant/pull/3629#issuecomment-1666717757) from @somerandomguyontheweb:

> Hi @pourmand1376, sorry for a slightly off-topic question: could you please share any details on how your friend managed to fine-tune LLaMA on text-only dataset, without instructions? I'm interested in doing the same thing with Belarusian Wikipedia, but so far I've only seen tutorials on how to instruct-tune LLaMA, and Wikipedia articles as such don't contain clearly delimited prompts and responses. Could you please briefly describe the approach?
> Thanks in advance for any comments.

It seems that there are others like me who would like to fine-tune LLMs for unsupported languages like Persian. 

This issue can be the place to discuss it. As for the question above, I only know that my friend used [this repository](https://github.com/ymcui/Chinese-LLaMA-Alpaca/) as the base and changed a lot of things to make it work. I will ask him for further details.

However, I think this repo could also serve as a place for training base LLMs.

I think we need a clear guide for people like me on how to do this. From what I've seen so far, the Open-Assistant team has done a great job on SFT fine-tuning, but there seems to be no code for fine-tuning base LLMs on other languages.
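
To make the quoted question a bit more concrete: plain text such as Wikipedia articles does not need prompts and responses if the model is trained with the usual causal language modeling objective. The documents are tokenized, concatenated into one stream, and cut into fixed-length blocks, and the labels are simply a copy of the inputs. Below is a minimal sketch of that data preparation step only. It assumes a toy whitespace tokenizer and a made-up `<eos>` marker; a real setup would use the model's own tokenizer (possibly with an extended vocabulary). The function names are hypothetical, and this illustrates the general idea rather than the exact setup mentioned above.

```python
from itertools import chain

EOS = "<eos>"  # hypothetical end-of-document marker


def toy_tokenize(text):
    """Stand-in for a real tokenizer: split on whitespace."""
    return text.split()


def pack_into_blocks(documents, block_size):
    """Concatenate tokenized documents and cut them into equal-length blocks."""
    # One long token stream, with an EOS marker after every document.
    stream = list(chain.from_iterable(toy_tokenize(d) + [EOS] for d in documents))
    # Drop the ragged tail so every block has exactly block_size tokens.
    usable = (len(stream) // block_size) * block_size
    blocks = [stream[i:i + block_size] for i in range(0, usable, block_size)]
    # For causal LM training the labels are a copy of the inputs;
    # the one-position shift is typically applied inside the model or loss.
    return [{"input_ids": b, "labels": list(b)} for b in blocks]


articles = [
    "Minsk is the capital of Belarus",
    "Wikipedia articles have no prompts",
]
examples = pack_into_blocks(articles, block_size=4)
for ex in examples:
    print(ex["input_ids"])
```

Note that a block may span two articles; the end-of-document marker is what tells the model where one text stops and the next begins.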

Add Unsupported Languages to Base Model #3636
