caiyuchen-ustc/AlphaRL


On Predictability of Reinforcement Learning Dynamics for Large Language Models


Installation

# clone codebase
git clone https://github.com/xxxx/Alpha-RL.git && cd Alpha-RL

# prepare environment
conda create -y -n AlphaRL python=3.11
conda activate AlphaRL

# install dependencies
pip install -r requirements.txt

Download Model

You can access the checkpoint at the following link: Hugging Face - xxxx

# run
cd eval
sh download_hf.sh

Singular Value Decomposition

sh svd.sh  # Compute the SVD of each matrix in the model
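
For readers unfamiliar with the step, here is a minimal sketch of what "the SVD of each matrix in a model" can look like. It is not the contents of svd.sh: the function name `svd_of_each_matrix`, the toy parameter names, and the use of NumPy instead of a real checkpoint loader are all assumptions made for illustration.

```python
import numpy as np


def svd_of_each_matrix(weights):
    """Return {name: (U, S, Vt)} for every 2-D array in `weights`.

    `weights` is a hypothetical stand-in for a model state dict:
    a plain mapping from parameter name to NumPy array.
    """
    factors = {}
    for name, w in weights.items():
        if w.ndim != 2:
            continue  # skip biases and other non-matrix parameters
        # Thin SVD: w == U @ diag(S) @ Vt, singular values in descending order
        U, S, Vt = np.linalg.svd(w, full_matrices=False)
        factors[name] = (U, S, Vt)
    return factors


# Toy "model" with one matrix and one bias vector (names are made up)
rng = np.random.default_rng(0)
toy_weights = {
    "layer0.q_proj": rng.standard_normal((8, 6)),
    "layer0.bias": rng.standard_normal(8),
}
factors = svd_of_each_matrix(toy_weights)
U, S, Vt = factors["layer0.q_proj"]
```

The thin SVD (`full_matrices=False`) keeps only as many singular vectors as the smaller matrix dimension, which is usually all that a later low-rank truncation needs.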

Obtain a Rank-k Model

sh upd_rank.sh 
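
As a rough illustration of a rank-k model, the sketch below truncates the SVD of a matrix to its top k components. One plausible reading, which this README does not confirm, is that the truncation is applied to the update between a base and a trained matrix and then added back to the base; the names `rank_k_approx`, `w_base`, and `w_tuned` are hypothetical.

```python
import numpy as np


def rank_k_approx(m, k):
    """Best rank-k approximation of `m` in the Frobenius norm (truncated SVD)."""
    U, S, Vt = np.linalg.svd(m, full_matrices=False)
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    return (U[:, :k] * S[:k]) @ Vt[:k, :]


# Toy base and trained matrices (random data, for illustration only)
rng = np.random.default_rng(0)
w_base = rng.standard_normal((8, 6))
w_tuned = w_base + 0.1 * rng.standard_normal((8, 6))

# Assumption: keep only the rank-1 part of the update and add it to the base
delta = w_tuned - w_base
w_rank1 = w_base + rank_k_approx(delta, 1)
```

By the Eckart-Young theorem, the Frobenius error of the truncation equals the root of the sum of the squared discarded singular values, which gives a quick way to check how much of the update a given k retains.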

Model Evaluation

sh reasoning_eval.sh

t-SNE Visualization of Training Trajectories

cd analysis  # located at eval/analysis
sh extract_rank1_u.sh  # Extract U[:,1], the rank-1 left singular vector
sh visualize_rank1_u_tsne.sh

PLS (Partial Least Squares) Trajectory Fitting

sh AlphaPLS.sh
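
For context, partial least squares regresses a response on a few latent directions of the inputs that covary most with that response. The sketch below is a generic single-response PLS (PLS1, NIPALS-style) in NumPy, not the code behind AlphaPLS.sh; the function names and the synthetic data are assumptions, and what the repository actually uses as inputs and targets is not stated in this README.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit single-response PLS; return (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean  # work on centered copies
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                # direction of maximal covariance with y
        w /= np.linalg.norm(w)
        t = Xk @ w                   # latent scores
        tt = t @ t
        p = Xk.T @ t / tt            # X loadings
        c = yk @ t / tt              # y loading
        Xk = Xk - np.outer(t, p)     # deflate X
        yk = yk - c * t              # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    # Regression coefficients in the original (centered) feature space
    coef = W @ np.linalg.solve(P.T @ W, q)
    return coef, x_mean, y_mean


def pls1_predict(X, coef, x_mean, y_mean):
    """Predict the response for rows of X using a fitted PLS1 model."""
    return (X - x_mean) @ coef + y_mean


# Synthetic example: a noise-free linear response on three features
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.3
coef, x_mean, y_mean = pls1_fit(X, y, n_components=3)
pred = pls1_predict(X, coef, x_mean, y_mean)
```

With as many components as features, PLS1 coincides with ordinary least squares, so the fit above reproduces `y`; fewer components trade accuracy for a lower-dimensional description of the trajectory.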

AlphaRL Predict

sh AlphaPredVector.sh
sh AlphaRLBuildPredictModel.sh

This repository provides an evaluation framework inspired by LIMO, which can be found here.

If you find this project interesting, feel free to ⭐ star the repository or open an issue for discussion!
