Models used:
Qwen/Qwen3-0.6B-Base, banghua/Qwen3-0.6B-SFT, Qwen/Qwen2.5-0.5B-Instruct, banghua/Qwen2.5-0.5B-DPO, HuggingFaceTB/SmolLM2-135M-Instruct
Datasets used:
banghua/DL-SFT-Dataset, mrfakename/identity, banghua/DL-DPO-Dataset, openai/gsm8k
Note: Because of limited GPU and compute budget, the results may not be great, but they can be further optimized and improved. An inference library other than the one Hugging Face provides can also be used.