🚀 5-dollar-llm: Setup & Speedrun Guide

Welcome to the 5-dollar-llm repository! This project is dedicated to pushing the limits of training efficiency for a 88M parameter model on 1 billion tokens (GPT-1 level model).

Step by step instructions

Step 1: Select a GPU to train on.

If you don't have a GPU, you may use a cloud GPU.

Free GPUs:

Lightning AI: You can use the free L4 GPU.
Google Colab: Use the free T4 or paid A100. Click here to open
Tip: If the model doesn't fit in your GPU memory, you can reduce the model size (e.g., reduce batch_size, n_layer, or n_embd in configs/llm_config.py).

Paid GPUs:

You may rent a GPU affordably at Salad | Novita (or use our affiliate to help us get more compute ❤️) | VastAI - A lot of GPU providers give 50% off on spot billing.

You may watch our tutorial on the AI Research Setup.

🛠️ 1. Environment Setup

We recommend using Python 3.10+.

Clone the Repository

git clone https://github.com/Open-Superintelligence-Lab/5-dollar-llm
cd 5-dollar-llm

Install Dependencies

pip install -r requirements.txt

Option A: Quick Start (40M Tokens) - Most of the time you will just need to download this dataset

python3 -c "
from datasets import load_dataset
import os
print('Downloading 40M Token Subset...')
ds = load_dataset('vukrosic/blueberry-1B-pretrain', split='train[:20000]')
os.makedirs('processed_data/speedrun_40M', exist_ok=True)
ds.save_to_disk('processed_data/speedrun_40M')
print('✅ Speedrun Data Ready!')
"

Option B: If you train on 100M or 1B tokens, first read below

python3 -c "
from datasets import load_dataset
import os
print('Downloading 1B Pretraining Data...')
ds = load_dataset('vukrosic/blueberry-1B-pretrain')
os.makedirs('processed_data/pretrain_1B', exist_ok=True)
ds.save_to_disk('processed_data/pretrain_1B')
print('✅ Full Data Ready!')
"

Step 2: Measure the baseline

You need to know how our (current) code performs on your hardware before changing it, so you can measure the impact of your changes.

This is done by simply running python train_llm.py
After it finishes running, please run it again.
Keep note of Training Time (⏱️ Speedrun): and final Final Val Loss from the second run.
You may notice that these 2 runs give different training time, even though they execute the exact same code. This is normal, and it is because the first run will build / compile the model, the second run is what you need to beat. If you can solve this issue so it compiles graphs if needed with just a single run, and doesn't add that to the training time, please make a pull request.

Step 3:

Now that you have the exact time you need to beat, you can start making changes.

If you ran python train_llm.py as mentioned above, you trained the model on 8 million tokens (default).

Currently we have 4 benchmarks:

8,000,000 Tokens
20,000,000 Tokens
100,000,000 Tokens
1,000,000,000 Tokens

Just an improvement on 1 benchmark is enough to submit, but you may measure multiple.

Step 4 (optional):

If you wish to try 20M tokens, please run python train_llm.py --train_tokens 20000000.

W are not yet sure if you need to rerun it 2 times after you have already built the graphs with 8M tokens. We are working on this. As a safe bet, we recommend running the baseline on 20M 2 times as well and checking the last results.

Same goes for 100M and 1B tokens but make sure you have the full 1 billion token dataset downloaded.

Step 5:

Add your code changes.

Only make a single change at a time and train the model to measure the impact of it. If the resulting time is a lot slower than the baseline, your changes may have broken the torch graph so you will have to run it a second time to get the real results.
Do not combine multiple experiments into one (eg. learning rate, fused adam, attention heads, etc.) because you will not know what caused improvement and what caused regression.

Step 6:

Confirm that your changes ourperform baseline - check the Training Time (⏱️ Speedrun): & final Final Val Loss.

Create a pull request on GitHub into main branch.

Once you submit your changes, we will mesures them ourselves, and if they improve performance, we will add you to the leaderboard - you can leave your X / LinkedIn / GitHub / etc. in the pull request.

📊 Iterating & Research

Configs: Modify configs/llm_config.py to change configs (keep the parameter size around 88M), learning rates, or optimization schedules.
Model: Edit models/llm.py to experiment with new attention mechanisms or layer types.
GPU Memory: If the model doesn't fit on your GPU, you can reduce the model size (e.g., batch_size or n_layer) for faster local iteration.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🚀 5-dollar-llm: Setup & Speedrun Guide

Step by step instructions

Step 1: Select a GPU to train on.

Free GPUs:

Paid GPUs:

🛠️ 1. Environment Setup

Clone the Repository

Install Dependencies

Option A: Quick Start (40M Tokens) - Most of the time you will just need to download this dataset

Option B: If you train on 100M or 1B tokens, first read below

Step 2: Measure the baseline

Step 3:

Step 4 (optional):

Step 5:

Step 6:

📊 Iterating & Research

FilesExpand file tree

SETUP_INSTRUCTIONS.md

Latest commit

History

SETUP_INSTRUCTIONS.md

File metadata and controls

🚀 5-dollar-llm: Setup & Speedrun Guide

Step by step instructions

Step 1: Select a GPU to train on.

Free GPUs:

Paid GPUs:

🛠️ 1. Environment Setup

Clone the Repository

Install Dependencies

Option A: Quick Start (40M Tokens) - Most of the time you will just need to download this dataset

Option B: If you train on 100M or 1B tokens, first read below

Step 2: Measure the baseline

Step 3:

Step 4 (optional):

Step 5:

Step 6:

📊 Iterating & Research