Example Scripts

This directory contains scripts to reproduce the main experiments from the paper.

Scripts

| Script | Method | Expected Results |
|---|---|---|
| `run_full_kd.sh` | Full KD baseline | 64.4% acc, 7.3 PPL |
| `run_sekd.sh` | SE-KD (position selection) | 64.8% acc, 6.9 PPL |
| `run_sekd3x.sh` | SE-KD₃ₓ (3-axis selection) | 64.4% acc, 7.3 PPL + 70% faster |

Running

# Make scripts executable
chmod +x examples/*.sh

# Run SE-KD (recommended starting point)
./examples/run_sekd.sh

Hardware Requirements

2× GPUs with ≥24GB VRAM (e.g., RTX 3090, A100)
The teacher (Qwen3-8B) runs on one GPU and the student (Qwen3-1.7B) on the other
Training uses ~80M tokens from FineWeb-Edu and takes roughly 4-8 hours on 2× RTX 3090
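
Before launching a multi-hour run, it may help to confirm that the machine meets the GPU requirement above. The following is a minimal sketch, not part of this repository: the function names and the 24,000 MiB threshold are illustrative assumptions (some nominal 24GB cards report slightly less than 24,576 MiB), and it relies only on the standard `nvidia-smi --query-gpu` interface.

```python
import subprocess


def parse_gpu_memory_mib(csv_text):
    """Parse one integer (total memory in MiB) per non-empty line."""
    return [int(line.strip()) for line in csv_text.splitlines() if line.strip()]


def meets_requirements(memory_mib, min_gpus=2, min_mib=24_000):
    """Return True if at least `min_gpus` GPUs have `min_mib` or more memory.

    Both defaults are assumptions derived from the requirement listed above.
    """
    big_enough = [m for m in memory_mib if m >= min_mib]
    return len(big_enough) >= min_gpus


def query_gpu_memory_mib():
    """Ask nvidia-smi for the total memory of every visible GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_gpu_memory_mib(result.stdout)
```

Calling `meets_requirements(query_gpu_memory_mib())` on the training machine returns `True` when two or more sufficiently large GPUs are visible.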

Custom Configurations

Different position selection metrics

# Teacher entropy (baseline)
--topk_tok_selection_metric teacher_entropy

# KL divergence
--topk_tok_selection_metric kl

# CE ratio (best perplexity)
--topk_tok_selection_metric ce_ratio
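
To make the three metric names concrete, here is a minimal pure-Python sketch of how a per-position score could be computed from teacher and student logits. It is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the KL direction is assumed to be KL(teacher ‖ student), and `ce_ratio` is assumed to mean student cross-entropy divided by teacher cross-entropy on the ground-truth token.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def teacher_entropy(teacher_logits):
    """Entropy of the teacher's next-token distribution at one position."""
    p = softmax(teacher_logits)
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) at one position (direction is an assumption)."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def ce_ratio(teacher_logits, student_logits, target):
    """Student CE divided by teacher CE on the target token (assumed definition)."""
    ce_teacher = -math.log(softmax(teacher_logits)[target])
    ce_student = -math.log(softmax(student_logits)[target])
    return ce_student / ce_teacher
```

Under these definitions, a position scores high on `teacher_entropy` when the teacher itself is uncertain, and high on `kl` or `ce_ratio` when the student lags behind the teacher.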

Different selection budgets

# More aggressive selection (top 10%)
--k_percent 10

# Conservative selection (top 50%)
--k_percent 50
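
The effect of the budget can be sketched as follows: given one score per token position, keep only the highest-scoring `k_percent` of positions. This is a hedged illustration, not the repository's code. The function name is hypothetical, and rounding up with at least one position kept is an assumption.

```python
import math


def select_top_k_percent(scores, k_percent):
    """Return the indices of the top `k_percent` positions, in sequence order."""
    if not 0 < k_percent <= 100:
        raise ValueError("k_percent must be in (0, 100]")
    # Assumption: round up, and always keep at least one position.
    n_keep = max(1, math.ceil(len(scores) * k_percent / 100))
    # Rank positions from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7, 0.4, 0.6, 0.5, 0.0]
aggressive = select_top_k_percent(scores, 10)    # keeps 1 of 10 positions
conservative = select_top_k_percent(scores, 50)  # keeps 5 of 10 positions
```

A smaller `k_percent` concentrates the distillation loss on fewer positions, which is why the 10% setting is described as the more aggressive one.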

Memory-efficient mode

Enable selective LM heads for reduced peak memory:

--teacher_selective_lm_head \
--student_selective_lm_head
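
One plausible reading of these flags, stated here as an assumption rather than a description of the actual implementation, is that the output projection is applied only at the selected positions, so the logits tensor shrinks from `[T, V]` to `[k, V]`. The pure-Python sketch below illustrates that idea with hypothetical helper names and also estimates the size of the logits buffer.

```python
def lm_head(hidden_rows, weight):
    """Project hidden states [n, d] through an output matrix [V, d] to logits [n, V]."""
    return [
        [sum(h * w for h, w in zip(row, w_row)) for w_row in weight]
        for row in hidden_rows
    ]


def selective_lm_head(hidden, weight, selected):
    """Compute logits only for the selected positions (assumed behaviour)."""
    return lm_head([hidden[i] for i in selected], weight)


def logits_bytes(n_positions, vocab_size, bytes_per_value=2):
    """Size of a logits buffer; 2 bytes per value assumes 16-bit floats."""
    return n_positions * vocab_size * bytes_per_value


hidden = [[1, 0], [0, 1], [1, 1]]      # 3 positions, hidden size 2
weight = [[1, 2], [3, 4], [5, 6]]      # toy vocabulary of 3 tokens
full = lm_head(hidden, weight)
partial = selective_lm_head(hidden, weight, [0, 2])
```

The rows of `partial` match the corresponding rows of `full`, so the loss at the selected positions is unchanged while the buffer scales with the number of selected positions instead of the full sequence length.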
