
Commit f729e28

Merge feature/tinytorch-core: fix notebook filename convention in docs and Binder
2 parents 3a633df + 850a91a commit f729e28

File tree

28 files changed: +2731 additions, −530 deletions


binder/postBuild

Lines changed: 3 additions & 1 deletion
@@ -18,14 +18,16 @@ echo "📓 Generating student notebooks from source..."
 for module_dir in src/*/; do
     module_name=$(basename "$module_dir")
     py_file="$module_dir/${module_name}.py"
+    # Strip numeric prefix for notebook name (e.g., "01_tensor" -> "tensor")
+    short_name="${module_name#*_}"
 
     if [ -f "$py_file" ]; then
         # Create output directory
         mkdir -p "modules/$module_name"
 
         # Convert .py to .ipynb using jupytext
         echo "  📝 Converting $module_name..."
-        jupytext --to notebook "$py_file" --output "modules/$module_name/${module_name}.ipynb" 2>/dev/null || {
+        jupytext --to notebook "$py_file" --output "modules/$module_name/${short_name}.ipynb" 2>/dev/null || {
             echo "  ⚠️ Warning: Could not convert $module_name"
         }
     fi
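The rename in this hook hinges on Bash's `${var#pattern}` expansion, which deletes the shortest leading match of `pattern`. A minimal sketch of the behavior (the variable values here are illustrative, not taken from the build):

```shell
# "#*_" removes the shortest prefix ending in "_" -- i.e. the numeric prefix.
module_name="01_tensor"
short_name="${module_name#*_}"
echo "$short_name"        # tensor

# A name with no underscore is returned unchanged.
plain="tensor"
echo "${plain#*_}"        # tensor

# "##*_" would strip the LONGEST matching prefix instead.
deep="01_extra_tensor"
echo "${deep#*_}"         # extra_tensor
echo "${deep##*_}"        # tensor
```

One consequence worth noting: two hypothetical modules sharing a suffix (say `01_tensor` and `11_tensor`) would both produce `tensor.ipynb`, which is harmless here because each notebook is written into its own `modules/$module_name/` directory.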

tinytorch/README.md

Lines changed: 1 addition & 1 deletion
@@ -193,7 +193,7 @@ TinyTorch/
 
 ├── modules/                  # 📓 Generated notebooks (learners work here)
 │   ├── 01_tensor/            # Auto-generated from src/
-│   │   ├── 01_tensor.ipynb   # Jupyter notebook for learning
+│   │   ├── tensor.ipynb      # Jupyter notebook for learning
 │   │   ├── README.md         # Practical implementation guide
 │   │   └── tensor.py         # Your implementation
 │   └── ...                   # (20 module directories)

tinytorch/binder/postBuild

Lines changed: 3 additions & 1 deletion
@@ -15,14 +15,16 @@ echo "📓 Generating student notebooks from source..."
 for module_dir in src/*/; do
     module_name=$(basename "$module_dir")
     py_file="$module_dir/${module_name}.py"
+    # Strip numeric prefix for notebook name (e.g., "01_tensor" -> "tensor")
+    short_name="${module_name#*_}"
 
     if [ -f "$py_file" ]; then
         # Create output directory
         mkdir -p "modules/$module_name"
 
         # Convert .py to .ipynb using jupytext
         echo "  📝 Converting $module_name..."
-        jupytext --to notebook "$py_file" --output "modules/$module_name/${module_name}.ipynb" 2>/dev/null || {
+        jupytext --to notebook "$py_file" --output "modules/$module_name/${short_name}.ipynb" 2>/dev/null || {
             echo "  ⚠️ Warning: Could not convert $module_name"
         }
     fi

tinytorch/site/Makefile

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ clean:
 install:
 	@echo "📦 Installing dependencies..."
 	pip install -U pip
-	pip install "jupyter-book<1.0"
+	pip install "jupyter-book>=1.0.0,<2.0.0"
 	pip install -r requirements.txt
 
 test:
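The new specifier pins jupyter-book to the 1.x series instead of pre-1.0 releases. pip interprets `>=1.0.0,<2.0.0` under PEP 440; as a rough illustration only, a naive numeric check (a hypothetical helper, not pip's resolver) accepts and rejects like this:

```python
def in_1x_series(version: str) -> bool:
    """Naive stand-in for the pip specifier ">=1.0.0,<2.0.0".

    Real resolution uses PEP 440 (via the 'packaging' library), which
    also handles pre-releases, epochs, and local version segments.
    """
    parts = tuple(int(p) for p in version.split("."))
    return (1, 0, 0) <= parts < (2, 0, 0)

print(in_1x_series("1.0.4"))   # True  - allowed by the new pin
print(in_1x_series("0.15.1"))  # False - only the old "<1.0" pin allowed this
print(in_1x_series("2.0.0"))   # False - the next major release is excluded
```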

tinytorch/site/_config_pdf.yml

Lines changed: 30 additions & 5 deletions
@@ -13,10 +13,10 @@ description: >-
   Learn by implementing your own PyTorch-style framework with hands-on coding,
   real datasets, and production-ready practices.
 
-# Execution settings - disable for PDF
+# Execution settings - cache mode enables {glue} computed values in ABOUT.md files
 execute:
-  execute_notebooks: "off"
-  allow_errors: false
+  execute_notebooks: "cache"
+  allow_errors: true
   timeout: 300
 
 # Exclude patterns
@@ -57,8 +57,9 @@ sphinx:
   # --pdfFit scales PDF to fit the diagram (not full page)
   # --scale 1.0 keeps diagrams at natural size (1.5 was too large for tall diagrams)
   mermaid_output_format: "pdf"
-  # Width 800 constrains diagram width; scale must be integer (1 = natural size)
-  mermaid_params: ['--pdfFit', '--scale', '1', '--width', '800', '--backgroundColor', 'white']
+  # Width 600 constrains diagram viewport; scale 1 = natural size
+  # Smaller viewport + pdfcrop produces tighter diagrams that don't stretch to full page width
+  mermaid_params: ['--pdfFit', '--scale', '1', '--width', '600', '--backgroundColor', 'white']
   # Use pdfcrop to trim whitespace from mermaid PDFs
   mermaid_pdfcrop: "pdfcrop"
   # Use professional sans-serif font for mermaid diagrams to match document
@@ -91,6 +92,9 @@ sphinx:
     papersize: 'letterpaper'
     pointsize: '10pt'
     figure_align: 'H'
+    # Pass 'export' option to adjustbox before Sphinx loads it (avoids option clash).
+    # This enables max width/height keys in \includegraphics for mermaid figure capping.
+    passoptionstopackages: '\PassOptionsToPackage{export}{adjustbox}'
     fontpkg: |
       % Professional academic font stack (TeX Gyre - available in TeX Live)
       \usepackage{fontspec}
@@ -111,6 +115,27 @@ sphinx:
       \usepackage{hyperref}
       \usepackage{float}
 
+      % Cap Mermaid diagram width at 75% of text width.
+      % sphinxcontrib-mermaid hardcodes width=\linewidth for all diagrams,
+      % which stretches small flowcharts to full page width. This override
+      % intercepts \includegraphics and uses adjustbox's max width for
+      % mermaid-*.pdf files while passing other images through unchanged.
+      % Note: adjustbox 'export' option passed via passoptionstopackages above.
+      \let\OrigIncludeGraphics\includegraphics
+      \makeatletter
+      \renewcommand{\includegraphics}[2][]{%
+        \begingroup
+        \def\@mermaidtest{mermaid-}%
+        \@expandtwoargs\in@{\@mermaidtest}{#2}%
+        \ifin@
+          \OrigIncludeGraphics[max width=0.75\linewidth,max height=0.45\textheight,keepaspectratio]{#2}%
+        \else
+          \OrigIncludeGraphics[#1]{#2}%
+        \fi
+        \endgroup
+      }
+      \makeatother
+
       % Better figure placement - keep figures inline with text
       \renewcommand{\topfraction}{0.9}
       \renewcommand{\bottomfraction}{0.9}
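The preamble addition follows a standard LaTeX interception pattern: save the original command with `\let`, then branch on the argument. As a stripped-down, standalone sketch (file names are placeholders; the real commit applies this inside Jupyter Book's `preamble`):

```latex
\documentclass{article}
% 'export' must reach adjustbox before it loads, mirroring
% passoptionstopackages in the commit; it exposes keys such as
% "max width" inside \includegraphics.
\PassOptionsToPackage{export}{adjustbox}
\usepackage{graphicx}
\usepackage{adjustbox}

\let\OrigIncludeGraphics\includegraphics
\makeatletter
\renewcommand{\includegraphics}[2][]{%
  \begingroup
  \def\@mermaidtest{mermaid-}%
  % \in@ sets the \ifin@ switch if "mermaid-" occurs in the filename #2
  \@expandtwoargs\in@{\@mermaidtest}{#2}%
  \ifin@
    \OrigIncludeGraphics[max width=0.75\linewidth,keepaspectratio]{#2}%
  \else
    \OrigIncludeGraphics[#1]{#2}%
  \fi
  \endgroup
}
\makeatother

\begin{document}
\includegraphics{mermaid-flowchart.pdf} % capped at 75\% of the line width
\includegraphics[width=3cm]{photo.png}  % passed through unchanged
\end{document}
```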

tinytorch/site/getting-started.md

Lines changed: 2 additions & 2 deletions
@@ -104,10 +104,10 @@ This opens the module notebook and tracks your progress.
 
 ### Work in the notebook
 
-Edit `modules/01_tensor/01_tensor.ipynb` in Jupyter:
+Edit `modules/01_tensor/tensor.ipynb` in Jupyter:
 
 ```bash
-jupyter lab modules/01_tensor/01_tensor.ipynb
+jupyter lab modules/01_tensor/tensor.ipynb
 ```
 
 You'll implement:

tinytorch/site/tito/modules.md

Lines changed: 3 additions & 3 deletions
@@ -409,11 +409,11 @@ src/                       ← Developer source code
 
 modules/                   ← Generated notebooks (students use)
 ├── 01_tensor/
-│   └── 01_tensor.ipynb        ← AUTO-GENERATED for students
+│   └── tensor.ipynb           ← AUTO-GENERATED for students
 ├── 02_activations/
-│   └── 02_activations.ipynb   ← AUTO-GENERATED for students
+│   └── activations.ipynb      ← AUTO-GENERATED for students
 └── 03_layers/
-    └── 03_layers.ipynb        ← AUTO-GENERATED for students
+    └── layers.ipynb           ← AUTO-GENERATED for students
 ```
 
 ### Where Code Exports

tinytorch/site/tito/troubleshooting.md

Lines changed: 3 additions & 3 deletions
@@ -455,19 +455,19 @@ File → Save File (or Cmd/Ctrl + S)
 
 **Step 2: Check file permissions**:
 ```bash
-ls -la modules/01_tensor/01_tensor.ipynb
+ls -la modules/01_tensor/tensor.ipynb
 # Should be writable (not read-only)
 ```
 
 **Step 3: If read-only, fix permissions**:
 ```bash
-chmod u+w modules/01_tensor/01_tensor.ipynb
+chmod u+w modules/01_tensor/tensor.ipynb
 ```
 
 **Step 4: Verify changes saved**:
 ```bash
 # Check the notebook was updated
-ls -l modules/01_tensor/01_tensor.ipynb
+ls -l modules/01_tensor/tensor.ipynb
 ```
 
 </div>
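Steps 2–4 can be rehearsed safely on a throwaway file before touching the real notebook (the temp file here is hypothetical; substitute the notebook path in practice):

```shell
# Simulate a read-only notebook, then apply the Step 3 fix.
tmp=$(mktemp)
chmod 444 "$tmp"                       # make it read-only, like the symptom
before=$(ls -l "$tmp" | cut -c1-10)    # e.g. -r--r--r--
chmod u+w "$tmp"                       # Step 3: restore the owner's write bit
after=$(ls -l "$tmp" | cut -c1-10)     # e.g. -rw-r--r--
echo "$before -> $after"
rm -f "$tmp"
```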

tinytorch/src/01_tensor/ABOUT.md

Lines changed: 56 additions & 8 deletions
@@ -1,3 +1,9 @@
+---
+file_format: mystnb
+kernelspec:
+  name: python3
+---
+
 # Module 01: Tensor
 
 :::{admonition} Module Info
@@ -30,7 +36,7 @@ Listen to an AI-generated overview.
 
 Run interactively in your browser.
 
-<a href="https://mybinder.org/v2/gh/harvard-edge/cs249r_book/main?labpath=tinytorch%2Fmodules%2F01_tensor%2F01_tensor.ipynb" target="_blank" style="display: flex; align-items: center; justify-content: center; width: 100%; height: 54px; margin-top: auto; background: #f97316; color: white; text-align: center; text-decoration: none; border-radius: 27px; font-size: 14px; box-sizing: border-box;">Open in Binder →</a>
+<a href="https://mybinder.org/v2/gh/harvard-edge/cs249r_book/main?labpath=tinytorch%2Fmodules%2F01_tensor%2Ftensor.ipynb" target="_blank" style="display: flex; align-items: center; justify-content: center; width: 100%; height: 54px; margin-top: auto; background: #f97316; color: white; text-align: center; text-decoration: none; border-radius: 27px; font-size: 14px; box-sizing: border-box;">Open in Binder →</a>
 ```
 
 ```{grid-item-card} 📄 View Source
@@ -502,7 +508,20 @@ The rules are simpler than they look. Compare shapes from right to left. At each
 | `(3, 4)` | `(3,)` | Error | ✗ (3 ≠ 4) |
 | `(2, 3, 4)` | `(3, 4)` | `(2, 3, 4)` | ✓ |
 
-The memory savings are dramatic. Adding a `(768,)` vector to a `(32, 512, 768)` tensor would require copying the vector 32×512 times without broadcasting, allocating 50 MB of redundant data (12.5 million float32 numbers). With broadcasting, you store just the original 3 KB vector.
+```{code-cell} python3
+:tags: [remove-input, remove-output]
+from myst_nb import glue
+
+# Broadcasting memory comparison
+broadcast_full_elements = 32 * 512 * 768
+broadcast_full_bytes = broadcast_full_elements * 4
+broadcast_vec_bytes = 768 * 4
+glue("bcast_mb", f"{broadcast_full_bytes / 1024**2:.0f} MB")
+glue("bcast_elements", f"{broadcast_full_elements / 1e6:.1f} million")
+glue("bcast_vec_kb", f"{broadcast_vec_bytes / 1024:.0f} KB")
+```
+
+The memory savings are dramatic. Adding a `(768,)` vector to a `(32, 512, 768)` tensor would require copying the vector 32×512 times without broadcasting, allocating {glue:text}`bcast_mb` of redundant data ({glue:text}`bcast_elements` float32 numbers). With broadcasting, you store just the original {glue:text}`bcast_vec_kb` vector.
 
 ### Views vs. Copies
 
@@ -803,16 +822,45 @@ Broadcasting rules, shape semantics, and API design patterns. When you debug PyT
 
 ### Why Tensors Matter at Scale
 
+```{code-cell} python3
+:tags: [remove-input, remove-output]
+
+# LLM parameter storage (fp16 = 2 bytes per param)
+llm_params = 175_000_000_000
+llm_bytes = llm_params * 2
+glue("llm_gb", f"{llm_bytes / 1024**3:.0f} GB")
+
+# Batch of images (float32)
+batch_128_bytes = 128 * 3 * 224 * 224 * 4
+glue("batch128_mb", f"{batch_128_bytes / 1024**2:.1f} MB")
+```
+
 To appreciate why tensor operations matter, consider the scale of modern ML systems:
 
-- **Large language models**: 175 billion numbers stored as tensors = **350 GB** (like storing 70,000 full-resolution photos)
-- **Image processing**: A batch of 128 images = **77 MB** of tensor data
+- **Large language models**: 175 billion numbers stored as tensors = **{glue:text}`llm_gb`** (like storing 70,000 full-resolution photos)
+- **Image processing**: A batch of 128 images = **{glue:text}`batch128_mb`** of tensor data
 - **Self-driving cars**: Process tensor operations at **36 FPS** across multiple cameras (each frame = millions of operations in 28 milliseconds)
 
 A single matrix multiplication can consume **90% of computation time** in neural networks. Understanding tensor operations isn't just academic; it's essential for building and debugging real ML systems.
 
 ## Check Your Understanding
 
+```{code-cell} python3
+:tags: [remove-input, remove-output]
+
+# Q1: Batch memory
+q1_bytes = 32 * 3 * 224 * 224 * 4
+glue("q1_bytes", f"{q1_bytes:,}")
+glue("q1_mb", f"{q1_bytes / 1024**2:.1f} MB")
+
+# Q2: Broadcasting
+q2_full_bytes = 32 * 512 * 768 * 4
+q2_vec_bytes = 768 * 4
+glue("q2_full_mb", f"{q2_full_bytes / 1024**2:.1f} MB")
+glue("q2_vec_kb", f"{q2_vec_bytes / 1024:.0f} KB")
+glue("q2_savings_mb", f"~{q2_full_bytes / 1024**2:.0f} MB")
+```
+
 Test yourself with these systems thinking questions. They're designed to build intuition for the performance characteristics you'll encounter in production ML.
 
 **Q1: Memory Calculation**
@@ -822,7 +870,7 @@ A batch of 32 RGB images (224×224 pixels) stored as float32. How much memory?
 ```{admonition} Answer
 :class: dropdown
 
-32 × 3 × 224 × 224 × 4 = **19,267,584 bytes ≈ 19.3 MB**
+32 × 3 × 224 × 224 × 4 = **{glue:text}`q1_bytes` bytes ≈ {glue:text}`q1_mb`**
 
 This is why batch size matters - double the batch, double the memory!
 ```
@@ -834,11 +882,11 @@ Adding a vector `(768,)` to a 3D tensor `(32, 512, 768)`. How much memory does broadcasting save?
 ```{admonition} Answer
 :class: dropdown
 
-Without broadcasting: 32 × 512 × 768 × 4 = **50.3 MB**
+Without broadcasting: 32 × 512 × 768 × 4 = **{glue:text}`q2_full_mb`**
 
-With broadcasting: 768 × 4 = **3 KB**
+With broadcasting: 768 × 4 = **{glue:text}`q2_vec_kb`**
 
-Savings: **~50 MB per operation** - this adds up across hundreds of operations in a neural network!
+Savings: **{glue:text}`q2_savings_mb` per operation** - this adds up across hundreds of operations in a neural network!
 ```
 
 **Q3: Matmul Scaling**
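The values these `glue` cells compute are plain arithmetic, so they can be sanity-checked outside the notebook. A standalone recheck (mirroring the cells in this diff; not part of the commit):

```python
# float32 = 4 bytes per element throughout.
full_bytes = 32 * 512 * 768 * 4       # broadcast result, fully materialized
vec_bytes = 768 * 4                   # just the (768,) vector
batch_bytes = 32 * 3 * 224 * 224 * 4  # Q1: 32 RGB images at 224x224

print(full_bytes)    # 50331648
print(vec_bytes)     # 3072
print(batch_bytes)   # 19267584
```

One wrinkle the glue cells inherit: dividing by `1024**2` yields MiB (48.0 for the broadcast case), while the replaced prose quoted decimal megabytes (50.3 MB); the two conventions differ by roughly 5%.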

tinytorch/src/02_activations/ABOUT.md

Lines changed: 45 additions & 8 deletions
@@ -1,3 +1,9 @@
+---
+file_format: mystnb
+kernelspec:
+  name: python3
+---
+
 # Module 02: Activations
 
 :::{admonition} Module Info
@@ -30,7 +36,7 @@ Listen to an AI-generated overview.
 
 Run interactively in your browser.
 
-<a href="https://mybinder.org/v2/gh/harvard-edge/cs249r_book/main?labpath=tinytorch%2Fmodules%2F02_activations%2F02_activations.ipynb" target="_blank" style="display: flex; align-items: center; justify-content: center; width: 100%; height: 54px; margin-top: auto; background: #f97316; color: white; text-align: center; text-decoration: none; border-radius: 27px; font-size: 14px; box-sizing: border-box;">Open in Binder →</a>
+<a href="https://mybinder.org/v2/gh/harvard-edge/cs249r_book/main?labpath=tinytorch%2Fmodules%2F02_activations%2Factivations.ipynb" target="_blank" style="display: flex; align-items: center; justify-content: center; width: 100%; height: 54px; margin-top: auto; background: #f97316; color: white; text-align: center; text-decoration: none; border-radius: 27px; font-size: 14px; box-sizing: border-box;">Open in Binder →</a>
 ```
 
 ```{grid-item-card} 📄 View Source
@@ -693,16 +699,47 @@ Let's walk through the key similarities and differences:
 Mathematical functions, numerical stability techniques (max subtraction in softmax), and the concept of element-wise transformations. When you debug PyTorch activation issues, you'll understand exactly what's happening because you implemented the same logic.
 ```
 
+```{code-cell} python3
+:tags: [remove-input, remove-output]
+from myst_nb import glue
+
+# Prose: "Why Activations Matter at Scale"
+prose_gelu_ops = 96 * 2
+glue("prose_gelu_ops", f"{prose_gelu_ops:,}")
+
+prose_daily_activations = 1000 * 86400
+glue("prose_daily_activations", f"{prose_daily_activations / 1e6:.0f} million")
+```
+
 ### Why Activations Matter at Scale
 
 To appreciate why activation choice matters, consider the scale of modern ML systems:
 
-- **Large language models**: GPT-3 has 96 transformer layers, each with 2 GELU activations. That's **192 GELU operations per forward pass** on billions of parameters.
+- **Large language models**: GPT-3 has 96 transformer layers, each with 2 GELU activations. That's **{glue:text}`prose_gelu_ops` GELU operations per forward pass** on billions of parameters.
 - **Image classification**: ResNet-50 has 49 convolutional layers, each followed by ReLU. Processing a batch of 256 images at 224×224 resolution means **12 billion ReLU operations** per batch.
-- **Production serving**: A model serving 1000 requests per second performs **86 million activation computations per day**. A 20% speedup from ReLU vs GELU saves hours of compute time.
+- **Production serving**: A model serving 1000 requests per second performs **{glue:text}`prose_daily_activations` activation computations per day**. A 20% speedup from ReLU vs GELU saves hours of compute time.
 
 Activation functions account for **5-15% of total training time** in typical networks (the rest is matrix multiplication). But in transformer models with many layers and small matrix sizes, activations can account for **20-30% of compute time**. This is why GELU vs ReLU is a real trade-off: slower computation but potentially better accuracy.
 
+```{code-cell} python3
+:tags: [remove-input, remove-output]
+from myst_nb import glue
+
+# Q1: Memory calculation
+q1_bytes = 32 * 4096 * 4
+glue("q1_bytes", f"{q1_bytes:,}")
+glue("q1_kb", f"{q1_bytes / 1024:.0f} KB")
+
+q1_100layer_kb = 100 * (q1_bytes / 1024)
+glue("q1_100layer_mb", f"{q1_100layer_kb / 1024:.0f} MB")
+
+# Q4: Sparsity analysis
+q4_total = 128 * 1024
+q4_zeros = q4_total // 2
+glue("q4_total", f"{q4_total:,}")
+glue("q4_zeros", f"≈ {q4_zeros:,}")
+```
+
 ## Check Your Understanding
 
 Test yourself with these systems thinking questions. They're designed to build intuition for how activations behave in real neural networks.
@@ -714,9 +751,9 @@ A batch of 32 samples passes through a hidden layer with 4096 neurons and ReLU a
 ```{admonition} Answer
 :class: dropdown
 
-32 × 4096 × 4 bytes = **524,288 bytes ≈ 512 KB**
+32 × 4096 × 4 bytes = **{glue:text}`q1_bytes` bytes ≈ {glue:text}`q1_kb`**
 
-This is the activation memory for ONE layer. A 100-layer network needs 50 MB just to store activations for one forward pass. This is why activation memory dominates training memory usage — activations must be cached for backpropagation.
+This is the activation memory for ONE layer. A 100-layer network needs {glue:text}`q1_100layer_mb` just to store activations for one forward pass. This is why activation memory dominates training memory usage — activations must be cached for backpropagation.
 ```
 
 **Q2: Computational Cost**
@@ -764,8 +801,8 @@ For a standard normal distribution N(0, 1), approximately **50% of values are negative**
 
 ReLU zeros all negative values, so approximately **50% of outputs will be exactly zero**.
 
-Total elements: 128 × 1024 = 131,072
-Zeros: ≈ 65,536
+Total elements: 128 × 1024 = {glue:text}`q4_total`
+Zeros: {glue:text}`q4_zeros`
 
 This sparsity has major implications:
 - **Speed**: Multiplying by zero is free, so downstream computations can skip ~50% of operations
@@ -839,7 +876,7 @@ Implement Linear layers that combine your Tensor operations with your activation
 
 ```{tip} Interactive Options
 
-- **[Launch Binder](https://mybinder.org/v2/gh/harvard-edge/cs249r_book/main?urlpath=lab/tree/tinytorch/modules/02_activations/02_activations.ipynb)** - Run interactively in browser, no setup required
+- **[Launch Binder](https://mybinder.org/v2/gh/harvard-edge/cs249r_book/main?urlpath=lab/tree/tinytorch/modules/02_activations/activations.ipynb)** - Run interactively in browser, no setup required
 - **[View Source](https://github.com/harvard-edge/cs249r_book/blob/main/tinytorch/src/02_activations/02_activations.py)** - Browse the implementation code
 ```
