This repository contains code and data for LLMs that translate words used in China into the words used in Taiwan. The main technique is instruction fine-tuning.
Example:
- Input: 這個軟件的質量真高啊
- Output: 這個軟體的品質真高啊
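
To make the task concrete, here is a minimal sketch of the word-level substitution shown in the example. It is a naive dictionary lookup, not this repository's method (which fine-tunes an LLM). The `CN_TO_TW` table and the `naive_translate` function are hypothetical names, and the table holds only the two word pairs from the example.

```python
# Hypothetical two-entry lookup table built from the example above.
CN_TO_TW = {
    "軟件": "軟體",
    "質量": "品質",
}


def naive_translate(text: str, table: dict[str, str] = CN_TO_TW) -> str:
    """Replace each China-usage word with its Taiwan-usage counterpart."""
    # Apply longer entries first, so a longer entry is replaced
    # before any shorter entry it happens to contain.
    for source in sorted(table, key=len, reverse=True):
        text = text.replace(source, table[source])
    return text


if __name__ == "__main__":
    print(naive_translate("這個軟件的質量真高啊"))
```

A fixed table like this ignores context. For instance, 質量 also means "mass" in physics, where Taiwan usage keeps 質量, so blind replacement would be wrong there. That kind of ambiguity is one reason to use a model instead.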
😍😍 See the model card and try it out 😍😍
- Install Miniconda or Anaconda.
- Create a Conda environment named tw_word.
    conda create --name tw_word python=3.10
- Activate the environment.
    conda activate tw_word
- Install PyTorch-related packages.
    # GPU
    pip install torch==2.2.0 torchvision==0.17.0 --index-url https://download.pytorch.org/whl/cu118
    # or, CPU-only (this may be very slow)
    pip install torch==2.2.0 torchvision==0.17.0
- Install the required packages.
    pip install -r requirements.txt
- (Optional) Set up your OpenAI API key if you want to use the OpenAI-related functions.
    export OPENAI_API_KEY=${YOUR_OPENAI_API_KEY}

To run translation with the Llama translator, type the following command in your terminal:
    python inf.py "這個軟件的質量真高啊" llama --model "feabries/TaiwanWordTranslator-v0.1"
For the OpenAI translator:
    python inf.py "這個軟件的質量真高啊" openai

To evaluate the Llama translator on the test set:
    python eval.py llama --model "feabries/TaiwanWordTranslator-v0.1"
For the OpenAI translator:
    python eval.py openai

To train the Llama model on the training set:
    python train.py

The current dataset is collected from MBZUAI/Bactrian-X and automatically labeled by 繁化姬 (Fanhuaji).
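
Since the approach is instruction fine-tuning, each training example presumably pairs an instruction with a source sentence and its converted target. The sketch below shows one plausible shape for such a record. The field names (`instruction`, `input`, `output`), the instruction wording, the prompt layout, and the `build_example` and `to_prompt` helpers are all assumptions for illustration, not the format `train.py` actually uses.

```python
import json

# Assumed instruction text; the repository's actual prompt may differ.
INSTRUCTION = "Translate the China-usage words in the sentence into Taiwan-usage words."


def build_example(source: str, target: str) -> dict[str, str]:
    """Pair a source sentence with its labeled target in an instruction-style record."""
    return {"instruction": INSTRUCTION, "input": source, "output": target}


def to_prompt(example: dict[str, str]) -> str:
    """Render the record as the text a model would be asked to complete."""
    return (
        f"### Instruction:\n{example['instruction']}\n\n"
        f"### Input:\n{example['input']}\n\n"
        f"### Output:\n"
    )


if __name__ == "__main__":
    record = build_example("這個軟件的質量真高啊", "這個軟體的品質真高啊")
    # ensure_ascii=False keeps the Chinese text readable in the JSON line.
    print(json.dumps(record, ensure_ascii=False))
    # During training, the target text would follow the prompt.
    print(to_prompt(record) + record["output"])
```

Keeping the record and the prompt rendering separate means the same labeled pairs could be reused with a different prompt template.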