Paper 18: Relational RNN Implementation
Task: P2-T3 - Implement training utilities and loss functions
Status: COMPLETED ✓
Size: 1,074 lines of code
Dependencies: NumPy only
Implemented utilities:
- ✓ `cross_entropy_loss()` - Numerically stable cross-entropy for classification (sketched after this list)
- ✓ `mse_loss()` - Mean squared error for regression tasks
- ✓ `softmax()` - Stable softmax computation
- ✓ `accuracy()` - Classification accuracy metric
- ✓ `compute_numerical_gradient()` - Element-wise finite differences
- ✓ `compute_numerical_gradient_fast()` - Vectorized gradient estimation
- ✓ `clip_gradients()` - Global-norm gradient clipping
- ✓ `learning_rate_schedule()` - Exponential decay scheduling
- ✓ `EarlyStopping` class - Prevents overfitting via a patience counter
- ✓ `train_step()` - Single gradient descent step
- ✓ `evaluate()` - Model evaluation without gradient updates
- ✓ `create_batches()` - Batch creation with shuffling
- ✓ `train_model()` - Full training loop with all features
- ✓ `plot_training_curves()` - Comprehensive training visualization
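To make the stability claims concrete, here is a minimal sketch of how `softmax()` and `cross_entropy_loss()` can be written with the max-shift / log-sum-exp trick. The names match the list above, but the bodies and signatures are illustrative, not the exact implementation in `training_utils.py`:

```python
import numpy as np

def softmax(logits):
    # Shift by the row-wise max before exponentiating so exp() cannot
    # overflow; softmax is invariant to a constant shift per row.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cross_entropy_loss(logits, targets):
    # Log-sum-exp trick: log p = (x - m) - log(sum(exp(x - m))), with m the
    # row max, so exp() of a large positive number is never formed.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    n = logits.shape[0]
    return -log_probs[np.arange(n), targets].mean()  # targets: int class labels
```

As a quick sanity check, uniform logits over three classes give a loss of ln 3 ≈ 1.0986, consistent with the near-chance initial losses (around 1.10) reported in the demonstrations below.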
All 21 tests passed:
✓ Loss Functions (6 tests)
- Cross-entropy with perfect predictions
- Cross-entropy with random predictions
- Cross-entropy with one-hot targets (equivalence check)
- MSE with perfect predictions
- MSE with known values
- Accuracy computation
✓ Optimization Utilities (4 tests)
- Gradient clipping with small gradients
- Gradient clipping with large gradients
- Learning rate schedule
- Early stopping behavior
✓ Training Loop (5 tests)
- Dataset creation
- Model initialization
- Single training step
- Evaluation
- Full training loop
Fast sanity check of all core functions:
- All 6 component tests passed
- Execution time: <5 seconds
- Validates integration between components
Four comprehensive demonstrations:

1. Basic LSTM Training (20 epochs)
   - Loss: 1.1038 → 1.0906 (train)
   - Accuracy: 0.363 → 0.399 (train)
   - Test accuracy: 0.420

2. Early Stopping Detection (28 epochs, stopped early; the patience logic is sketched after this list)
   - Patience: 5 epochs
   - Best validation loss: 1.1142
   - Successfully prevented overfitting

3. Learning Rate Schedule (15 epochs)
   - Initial LR: 0.050
   - Final LR: 0.033 (34% reduction)
   - Smooth exponential decay

4. Gradient Clipping (10 epochs)
   - Max gradient norm: 0.720
   - Avg gradient norm: 0.594
   - All gradients within bounds (clipping available when needed)
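The patience behavior in demonstration 2 follows the standard pattern: training stops once the validation loss has gone `patience` consecutive epochs without improving. A minimal sketch of such a class (illustrative; the actual `EarlyStopping` in `training_utils.py` may differ in method names and details):

```python
class EarlyStopping:
    """Stop training when validation loss stops improving (illustrative)."""

    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience    # epochs to wait after the last improvement
        self.min_delta = min_delta  # minimum decrease that counts as improvement
        self.best_loss = float('inf')
        self.counter = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improved: remember it, reset counter
            self.counter = 0
        else:
            self.counter += 1          # no improvement this epoch
        return self.counter >= self.patience
```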
Numerical stability and training robustness (the clipping and decay logic is sketched below):
- Log-sum-exp trick for cross-entropy
- Stable softmax implementation
- Prevents NaN/Inf in loss computation
- Gradient clipping by global norm (prevents exploding gradients)
- Early stopping (prevents overfitting)
- Learning rate decay (enables fine-tuning)
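The clipping and decay items boil down to a few lines each: global-norm clipping rescales all gradients jointly when their combined L2 norm exceeds a threshold, and exponential decay multiplies the learning rate by a fixed factor at regular intervals. Minimal sketches under those standard definitions (the signatures are assumptions, not the exact API):

```python
import numpy as np

def clip_gradients(grads, max_norm):
    # grads: dict of parameter name -> gradient array.
    # Global norm: L2 norm over all gradient elements taken together.
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads.values()))
    if total_norm > max_norm:
        scale = max_norm / (total_norm + 1e-8)
        grads = {name: g * scale for name, g in grads.items()}
    return grads, total_norm

def learning_rate_schedule(initial_lr, epoch, decay=0.95, decay_every=10):
    # Exponential decay: multiply by `decay` once every `decay_every` epochs.
    return initial_lr * decay ** (epoch // decay_every)
```

In this sketch the pre-clip norm is returned alongside the gradients so the training loop can log it, matching the gradient-norm tracking described below.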
Works with any model implementing:

```python
def forward(X, return_sequences=False): ...
def get_params(): ...
def set_params(params): ...
```

Currently compatible:
- LSTM (from `lstm_baseline.py`)
- Future: Relational RNN
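Since the training loop only touches a model through these three methods, any NumPy model that implements them should work. A hypothetical minimal example (a linear classifier over the last timestep, not part of the codebase) to illustrate the contract:

```python
import numpy as np

class TinyModel:
    """Hypothetical minimal model satisfying the training-utils interface."""

    def __init__(self, input_size, output_size):
        rng = np.random.default_rng(0)
        self.W = rng.normal(0.0, 0.1, size=(input_size, output_size))
        self.b = np.zeros(output_size)

    def forward(self, X, return_sequences=False):
        # X: (samples, seq_len, features); classify from the last timestep only.
        return X[:, -1, :] @ self.W + self.b  # logits: (samples, output_size)

    def get_params(self):
        return {'W': self.W, 'b': self.b}

    def set_params(self, params):
        self.W, self.b = params['W'], params['b']
```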
Training history tracks:
- Training loss and metric per epoch
- Validation loss and metric per epoch
- Learning rates used
- Gradient norms (for stability monitoring)
Supported task types:
- Classification (cross-entropy loss + accuracy metric)
- Regression (MSE loss, with negative loss reported as the metric so that higher is better)
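Assuming the returned `history` is a dict of per-epoch lists, the tracked quantities can be read off directly after training. The key names below are hypothetical, so consult TRAINING_UTILS_README.md for the exact ones:

```python
# Hypothetical key names - consult TRAINING_UTILS_README.md for the real ones.
final_train_loss = history['train_loss'][-1]
final_val_loss = history['val_loss'][-1]
peak_grad_norm = max(history['grad_norm'])  # a spike here signals instability
print(final_train_loss, final_val_loss, peak_grad_norm)
```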
Gradient computation choice: numerical gradients (finite differences)
Pros:
- Simple to implement and understand
- No risk of backpropagation bugs
- Educational value for understanding gradients
- Works with any model (black-box)
Cons:
- Slow: O(parameters) forward passes per update step
- Approximate: truncation error on the order of ε² (central differences)
- Not suitable for large models
Justification:
- For educational implementation and prototyping
- NumPy-only constraint makes BPTT complex
- Easy to swap in analytical gradients later
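For reference, the finite-difference idea is to perturb each parameter element in turn and measure the resulting change in loss. A minimal central-difference sketch (illustrative; `compute_numerical_gradient()` may differ in signature and detail):

```python
import numpy as np

def numerical_gradient(loss_fn, params, eps=1e-5):
    """Central differences: g_i ~ (L(theta + eps*e_i) - L(theta - eps*e_i)) / (2*eps)."""
    grads = {}
    for name, theta in params.items():
        grad = np.zeros_like(theta)
        it = np.nditer(theta, flags=['multi_index'])
        while not it.finished:
            idx = it.multi_index
            original = theta[idx]
            theta[idx] = original + eps   # perturb one element upward
            loss_plus = loss_fn()
            theta[idx] = original - eps   # and downward
            loss_minus = loss_fn()
            theta[idx] = original         # restore before moving on
            grad[idx] = (loss_plus - loss_minus) / (2 * eps)
            it.iternext()
        grads[name] = grad
    return grads
```

Here `loss_fn` is a zero-argument closure that runs a forward pass on the current batch and returns the scalar loss; since `get_params()` is assumed to return references to the live parameter arrays, perturbing them in place changes the next forward pass. Each element costs two forward passes, which is why runtime grows linearly with parameter count.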
Optimizer choice: plain stochastic gradient descent only
Justification:
- Clean, understandable implementation
- Foundation for more advanced optimizers
- Easy to extend (Adam, momentum, etc.)
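Under this choice, the parameter update inside `train_step()` reduces to the plain SGD rule θ ← θ − lr · grad. A sketch (illustrative):

```python
def sgd_update(params, grads, learning_rate):
    # Plain SGD, applied in place per parameter array.
    for name, g in grads.items():
        params[name] -= learning_rate * g
```

Momentum or Adam would slot in by keeping extra per-parameter state (velocity or moment estimates) without changing the surrounding loop, which is what makes the extension straightforward.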
Implementation choice: pure NumPy, sequential processing
Justification:
- Project requirement (NumPy only)
- Focus on algorithmic correctness
- Easier to debug and understand
Performance characteristics:
- Speed: small models (<10K parameters) train at roughly 1-2 seconds/epoch; medium models (10K-50K parameters) at roughly 5-10 seconds/epoch. Runtime is dominated by numerical gradient computation and scales with batch size and model size.
- Memory: no gradient accumulation or caching; minimal overhead beyond the model parameters themselves.
- Suitable for: educational use, prototyping, small experiments
- Not suitable for: large-scale training, production deployments
Usage example:

```python
import numpy as np

from lstm_baseline import LSTM
from training_utils import train_model, evaluate, plot_training_curves

# Create model
model = LSTM(input_size=10, hidden_size=32, output_size=3)

# Prepare data
X_train = np.random.randn(500, 20, 10)       # (samples, seq_len, features)
y_train = np.random.randint(0, 3, size=500)  # class labels
X_val = np.random.randn(100, 20, 10)
y_val = np.random.randint(0, 3, size=100)
X_test = np.random.randn(100, 20, 10)
y_test = np.random.randint(0, 3, size=100)

# Train with all features
history = train_model(
    model,
    train_data=(X_train, y_train),
    val_data=(X_val, y_val),
    epochs=50,
    batch_size=32,
    learning_rate=0.01,
    lr_decay=0.95,
    lr_decay_every=10,
    clip_norm=5.0,
    patience=10,
    task='classification',
    verbose=True,
)

# Evaluate
test_loss, test_acc = evaluate(model, X_test, y_test)
print(f"Test accuracy: {test_acc:.4f}")

# Visualize
plot_training_curves(history, save_path='training.png')
```
Files delivered:

- `training_utils.py` (1,074 lines) - Main implementation with all utilities
  - Comprehensive docstrings
  - Built-in test suite
- `training_demo.py` (300+ lines) - Four demonstration scenarios
  - Shows all features in action
  - Generates realistic training curves
- `test_training_utils_quick.py` (150+ lines) - Fast sanity check
  - Tests all core functions
  - Validates integration
- `TRAINING_UTILS_README.md` (500+ lines) - Complete documentation
  - API reference
  - Usage examples
  - Integration guide
- `TASK_P2_T3_SUMMARY.md` (this file) - Task completion summary
  - Test results
  - Design decisions
These utilities are ready for immediate use with the Relational RNN model:
```python
from relational_rnn import RelationalRNN  # when implemented
from training_utils import train_model

# Same interface as the LSTM baseline
model = RelationalRNN(input_size=10, hidden_size=32, output_size=3)
history = train_model(
    model,
    train_data=(X_train, y_train),
    val_data=(X_val, y_val),
    epochs=50,
)
```

Requirements for the Relational RNN:
- Implement `forward(X, return_sequences=False)`
- Implement `get_params()` returning a dict of parameters
- Implement `set_params(params)` to update parameters
Acceptance checklist:
- Cross-entropy loss implemented and tested
- MSE loss implemented and tested
- Accuracy metric working
- Gradient clipping functional
- Learning rate schedule working
- Early stopping prevents overfitting
- Single training step updates parameters correctly
- Evaluation works without updating parameters
- Full training loop tracks all metrics
- Visualization generates plots (or text fallback)
- All tests pass
- Demo shows realistic training scenarios
- Documentation complete
- Compatible with existing LSTM model
- Ready for Relational RNN integration
Task P2-T3 is COMPLETE. All required training utilities have been implemented, tested, and documented. The implementation is:
- ✓ Fully functional with LSTM baseline
- ✓ Ready for Relational RNN integration
- ✓ Well-tested (21+ unit tests)
- ✓ Comprehensively documented
- ✓ NumPy-only (no external ML frameworks)
- ✓ Educational and easy to understand
The training utilities provide a complete infrastructure for training and evaluating both LSTM and Relational RNN models on classification and regression tasks.
Note: As requested, no git commit was created. Files are ready for review and integration.