Revealing the Inherent Instructability of Pre-Trained Language Models

The official repository for the paper "Revealing the Inherent Instructability of Pre-Trained Language Models" (Findings of EMNLP 2025) by Seokhyun An, Minji Kim and Hyounghun Kim.

If you find this work useful, please cite:

@misc{an2025revealing,
      title={Revealing the Inherent Instructability of Pre-Trained Language Models}, 
      author={Seokhyun An and Minji Kim and Hyounghun Kim},
      year={2025},
      eprint={2410.02465},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2410.02465}, 
}

Overview

We propose Response Tuning (RT) to verify our hypothesis that the ability to process instructions can be developed in the pre-training stage. Unlike instruction tuning, RT does not condition the response tokens on the paired instruction, which precludes the model from learning to generate responses according to instructions. Rather, it focuses on learning the response distribution.

Prerequisites

conda create -n rt python=3.11
conda activate rt
pip install -r requirements.txt
./prepare_data.sh

Note on JustEval Benchmark

For the JustEval Benchmark, we found that its dependencies conflict with our experimental environment. Therefore, please create a separate environment to run JustEval on the trained model:

conda create -n just_eval python=3.11
conda activate just_eval
pip install git+https://github.com/Re-Align/just-eval.git

Training Data Preparation

We currently support the Alpaca, Dolly, and LIMA datasets. Reformat and save the training datasets using the following script:

For Instruction Tuning (IT) and RT:

python3 generate_training_data.py --dataset [lima/alpaca/dolly]

To reproduce our results from the response refinement experiments, refine the datasets using the script below and train the model using the refined dataset:

CUDA_VISIBLE_DEVICES=[GPU_IDS] python3 refine_responses.py \
    --dataset_path [REFINEMENT_TARGET_DATASET_PATH] \
    --refiner [HF_MODEL_PATH]

Example

CUDA_VISIBLE_DEVICES=0,1,2,3 python3 refine_responses.py \
    --dataset_path "datasets/train/lima.jsonl" \
    --refiner "meta-llama/Meta-Llama-3.1-70B-Instruct"

To reproduce our refusal experiments, create an IT dataset mixed with refusal examples, and train the model using this dataset:

python3 generate_refusal_mixture.py \
    --target_dataset [TRAINING_DATASET_PATH] \
    --num_safety_examples [NUMBER_OF_SAFETY_EXAMPLES_TO_MIX]

Example

python3 generate_refusal_mixture.py \
    --target_dataset "datasets/train/lima.jsonl" \
    --num_safety_examples 200

Model Training (QLoRA)

You can perform both IT and RT using the following scripts.

For IT:

./scripts/train/train_it.sh [GPU_IDS] [HF_MODEL_PATH] [DATASET_PATH]

Example

./scripts/train/train_it.sh 0 "meta-llama/Llama-3.1-8B" "datasets/train/lima.jsonl"

For RT:

./scripts/train/train_rt.sh [GPU_IDS] [HF_MODEL_PATH] [DATASET_PATH]

Example

./scripts/train/train_rt.sh 0 "meta-llama/Llama-3.1-8B" "datasets/train/lima.jsonl"

After training, you can interact with the trained model:

CUDA_VISIBLE_DEVICES=[GPU_IDS] python3 chat.py --model_id [MODEL_PATH]

Instructability Evaluation

You can perform AlpacaEval, JustEval, and core capabilities evaluations (MMLU, OpenBookQA, HellaSwag, ARC, GSM8K, and PIQA) using the following scripts. After running the evaluation, you can find the results in [ckpt_path]/evals/[benchmark_name].

For AlpacaEval:

Run the following script and then execute run_eval.sh generated in the [ckpt_path]/evals/alpaca_eval directory.

./scripts/eval/alpaca_eval.sh [GPU_IDS] [MODEL_PATH] [REFERENCE_OUTPUT_PATH (optional)]

For JustEval:

Run the following script and then execute run_eval.sh generated in the [ckpt_path]/evals/justeval directory. After running the evaluation script, compute the metrics.

./scripts/eval/just_eval/just_eval.sh [GPU_IDS] [MODEL_PATH]

# After running run_eval.sh
scripts/eval/just_eval/just_eval_compute_metric.sh [MODEL_PATH]

For core capabilities evaluation:

./scripts/eval/core.sh [GPU_IDS] [MODEL_PATH]

Refusal Evaluation

Refusal evaluation requires running an evaluator LLM. Since vLLM currently does not natively support running multiple models in a single Python script, you need to first host the Llama-3.1-70B-Instruct model on your local server.

To host the 70B model (requires at least 4 GPUs such as A6000 48GB or A100 40/80GB), you can use the following command:

./scripts/eval/serve_llama.sh [GPU_IDS]

To run the refusal evaluations, execute:

./scripts/eval/refusal.sh [GPU_IDS] [MODEL_PATH]

In-Context Response Learning

You can reproduce our in-context learning experiment results using the following command:

./scripts/eval/just_eval/just_eval_urial.sh [GPU_IDS] [HF_MODEL_PATH] [urial/urial_r/0shot]

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
eval		eval
prompts		prompts
scripts		scripts
training_configs		training_configs
.gitignore		.gitignore
README.md		README.md
chat.py		chat.py
generate_refusal_mixture.py		generate_refusal_mixture.py
generate_training_data.py		generate_training_data.py
hf_ft.py		hf_ft.py
llm.py		llm.py
lora_merge.py		lora_merge.py
prepare_data.sh		prepare_data.sh
refine_responses.py		refine_responses.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Revealing the Inherent Instructability of Pre-Trained Language Models

Overview

Prerequisites

Training Data Preparation

Model Training (QLoRA)

Instructability Evaluation

Refusal Evaluation

In-Context Response Learning

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Languages

seokhyunan/response-tuning

Folders and files

Latest commit

History

Repository files navigation

Revealing the Inherent Instructability of Pre-Trained Language Models

Overview

Prerequisites

Training Data Preparation

Model Training (QLoRA)

Instructability Evaluation

Refusal Evaluation

In-Context Response Learning

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Languages

Packages