This repository is my attempt at converting a pretrained LLM like Qwen3-0.6B into a Recurrent Transformer that loops in the center blocks, similar to Huginn.
Recurrent Transformers offer several architectural advantages, such as lower memory usage (traded off against inference-time compute) and the ability to learn iterative fixed-point algorithms. However, there are almost no pretrained Recurrent Transformers at the scale of dense LLMs (Huginn being the only one). So I thought: let's try converting a pretrained dense LLM into a Recurrent Transformer.
With Qwen3-0.6B, I converted the model by treating layers 8-20 as a new recurrent block. I added an adapter that takes the output of the 20th layer, merges it with the 7th layer's output, and feeds the result back into the 8th layer, repeating for as many loops as desired. The adapter is initialized to pass the 7th layer's output through unchanged while zeroing out the 20th layer's contribution, so the converted model starts with the same output distribution as the original. Weights outside layers 8-20 were frozen, on one hand to minimize unintended harm to the base model, on the other to reduce training memory.
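To make the setup concrete, here is a minimal PyTorch sketch of the idea. All names (`RecurrentAdapter`, `recurrent_forward`) and the concatenate-then-project adapter design are my assumptions for illustration, not necessarily the repo's actual implementation; the key property shown is the identity-preserving initialization, which makes the looped model match the base model at step zero.

```python
import torch
import torch.nn as nn

class RecurrentAdapter(nn.Module):
    """Hypothetical adapter: merges the recurrent block's output (layer 20)
    with the pre-loop skip state (layer 7 output) before re-entering layer 8."""
    def __init__(self, hidden_size: int):
        super().__init__()
        # One linear map over the concatenation [h_loop ; h_skip].
        self.proj = nn.Linear(2 * hidden_size, hidden_size, bias=False)
        # Identity-preserving init: pass h_skip through unchanged and
        # zero out h_loop's contribution, so the converted model initially
        # computes the same function as the frozen base model.
        with torch.no_grad():
            self.proj.weight.zero_()
            self.proj.weight[:, hidden_size:] = torch.eye(hidden_size)

    def forward(self, h_loop: torch.Tensor, h_skip: torch.Tensor) -> torch.Tensor:
        return self.proj(torch.cat([h_loop, h_skip], dim=-1))

def recurrent_forward(blocks, adapter, h, num_loops: int = 4):
    """Run a simplified layer stack with layers 8-20 (0-indexed 7:20) looped.
    `blocks` stands in for the decoder layers; attention masks, positions,
    and residual details are omitted for clarity."""
    for blk in blocks[:7]:           # layers 1-7: run once
        h = blk(h)
    h_skip = h                       # 7th-layer output, reused every loop
    for i in range(num_loops):
        if i > 0:
            # merge the previous loop's layer-20 output back with the skip state
            h = adapter(h, h_skip)
        for blk in blocks[7:20]:     # layers 8-20: the recurrent block
            h = blk(h)
    for blk in blocks[20:]:          # remaining layers: run once
        h = blk(h)
    return h
```

With this initialization, every loop iteration initially sees the same input (`h_skip`), so the output for any `num_loops` matches a single pass through the block, i.e. the original model; training then learns to make the loop output actually contribute.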
Unfortunately, I was not able to get any meaningful conversion. The converted model functions normally, but it doesn't learn anything special: when trained on the same data, it behaves mostly the same as the base model. For example, consider this loss curve,
While the converted recurrent model outperforms the baseline with the same frozen weights, the difference is about what you'd expect from the extra FLOPs spent training it. And it underperforms the baseline with no frozen weights.
In short, there is no free lunch: you get as much performance as you train for, just like with any normal model.
Anyways, if you are reading this README, hopefully it provides you with some inspiration :p