19 commits
dc0629b
initial consolidation + p12
firekind Feb 27, 2026
3a44857
added 1b model and current kernels
firekind Feb 27, 2026
affbf4c
consolidated dataset, checkpointing and related aws code
firekind Feb 27, 2026
5db668c
consolidated latest dataloaders, kernels and 1B model
firekind Feb 27, 2026
99e0b84
consolidated experiment's train.py and main.py
firekind Feb 27, 2026
fa0fbb1
consolidated configs, scripts used in experiments
firekind Feb 27, 2026
887c649
added back fla dep
firekind Feb 27, 2026
069e000
fixed scripts, added relative path resolver for more config fields
firekind Feb 27, 2026
cb561ef
added triton dep for linux
firekind Feb 28, 2026
fcbfb58
updated ci
firekind Mar 2, 2026
265601c
feat: Implement background prefetching from S3 for BinIdx dataloader …
hemanth346 Mar 4, 2026
df0f39f
pre-training pipeline rework - initial
firekind Mar 5, 2026
4380db2
removed unnecessary results folder
firekind Mar 5, 2026
2d43fff
Committed initial implementation of loss spike detection / handling me…
sualehqureshi-tomtom Mar 5, 2026
576e0e1
Included grad norm in loss spiking action mechanism. Also added embed…
sualehqureshi-tomtom Mar 6, 2026
3f56a62
Updated loss spike action mechanism, considering the distributive tra…
sualehqureshi-tomtom Mar 6, 2026
8872277
Added reset logic to reset the sliding window of Loss spike tracking
sualehqureshi-tomtom Mar 6, 2026
c494a71
Added Read Me file to explain the approach, and list down the changes…
sualehqureshi-tomtom Mar 6, 2026
6edf975
Added local_spike_recovery_test.py file to test loss spike detection …
sualehqureshi-tomtom Mar 6, 2026
2 changes: 2 additions & 0 deletions .dockerignore
@@ -0,0 +1,2 @@
.venv
node_modules
20 changes: 20 additions & 0 deletions .github/workflows/precommit.yml → .github/workflows/ci.yaml
@@ -17,6 +17,18 @@ jobs:
        with:
          python-version: "3.12"

      - uses: astral-sh/setup-uv@v5

      - name: Install dependencies
        run: |
          uv sync --group dev
        working-directory: llm

      - name: Install dependencies
        run: |
          uv sync --group dev
        working-directory: dashboard/backend

      - name: Install pre-commit
        run: pip install pre-commit

@@ -29,3 +41,11 @@ jobs:

      - name: Run hooks
        run: pre-commit run --all-files --show-diff-on-failure

      - name: Run basedpyright (llm)
        run: uv run basedpyright
        working-directory: llm

      - name: Run basedpyright (dashboard/backend)
        run: uv run basedpyright
        working-directory: dashboard/backend
105 changes: 90 additions & 15 deletions .gitignore
@@ -1,3 +1,55 @@
### Project ###
_data

### Linux ###
*~

# temporary files which can be created if a process still has a handle open of a deleted file
.fuse_hidden*

# KDE directory preferences
.directory

# Linux trash folder which might appear on any partition or disk
.Trash-*

# .nfs files are created when an open file is removed but is still being accessed
.nfs*

### macOS ###
# General
.DS_Store
.AppleDouble
.LSOverride

# Icon must end with two \r
Icon


# Thumbnails
._*

# Files that might appear in the root of a volume
.DocumentRevisions-V100
.fseventsd
.Spotlight-V100
.TemporaryItems
.Trashes
.VolumeIcon.icns
.com.apple.timemachine.donotpresent

# Directories potentially created on remote AFP share
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk

### macOS Patch ###
# iCloud generated files
*.icloud

### Python ###
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
@@ -51,9 +103,6 @@ coverage.xml
.pytest_cache/
cover/

# Ruff
.ruff_cache/

# Translations
*.mo
*.pot
@@ -85,8 +134,6 @@ target/
profile_default/
ipython_config.py

.pdm.toml

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

@@ -130,17 +177,45 @@ dmypy.json
# Cython debug symbols
cython_debug/

# macOS
.DS_Store
# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/

### Python Patch ###
# Poetry local configuration file - https://python-poetry.org/docs/configuration/#local-configuration
poetry.toml

# ruff
.ruff_cache/

# LSP config files
pyrightconfig.json

### Windows ###
# Windows thumbnail cache files
Thumbs.db
Thumbs.db:encryptable
ehthumbs.db
ehthumbs_vista.db

# Dump file
*.stackdump

# Folder config file
[Dd]esktop.ini

logs/
/data
/outputs
/logs
# Recycle Bin used on file shares
$RECYCLE.BIN/

*.pt
*.pth
# Windows Installer files
*.cab
*.msi
*.msix
*.msm
*.msp

# macOS specific files
*.DS_Store
# Windows shortcuts
*.lnk
4 changes: 4 additions & 0 deletions .gitleaks.toml
@@ -0,0 +1,4 @@
[allowlist]
paths = [
'''\.env\.example''',
]
20 changes: 10 additions & 10 deletions .pre-commit-config.yaml
@@ -1,34 +1,34 @@
minimum_pre_commit_version: 2.20.0

repos:
- repo: https://github.com/psf/black
rev: 24.8.0
rev: 26.1.0
hooks:
- id: black

- repo: https://github.com/pycqa/isort
rev: 5.13.2
rev: 8.0.0
hooks:
- id: isort
args: ["--profile=black"]

- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.5.0
rev: v0.15.4
hooks:
- id: ruff

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.6.0
rev: v6.0.0
hooks:
- id: check-json

- id: check-added-large-files
args: ["--maxkb=1024"]

- repo: https://github.com/adrienverge/yamllint
rev: v1.35.1
rev: v1.38.0
hooks:
- id: yamllint
args: [-c, .yamllint]

- repo: https://github.com/gitleaks/gitleaks
rev: v8.18.4
rev: v8.30.0
hooks:
- id: gitleaks
- id: gitleaks
87 changes: 0 additions & 87 deletions CODEOWNERS

This file was deleted.

11 changes: 0 additions & 11 deletions README.md
@@ -1,11 +0,0 @@
# LLM

Lightning Language Models

## Contribution Guidelines

See [contribution guidelines](https://github.com/The-School-of-AI/LLM/tree/main/experiments/19_reproducibility_provenance_and_experiment_tracking/contribution.md)

## Rebase with Staging

See [rebase with staging](https://github.com/The-School-of-AI/LLM/tree/main/experiments/19_reproducibility_provenance_and_experiment_tracking/rebase_with_stage.md)