Commit 21901f7

release
1 parent ade768d commit 21901f7

11 files changed

+2906
-213
lines changed

docs/Machine-Learning/Normalization Regularisation.md

Lines changed: 66 additions & 65 deletions
Large diffs are not rendered by default.

docs/Machine-Learning/Overfitting, Underfitting.md

Lines changed: 33 additions & 33 deletions
@@ -4,13 +4,13 @@ description: Comprehensive guide to Overfitting and Underfitting with mathematic
44
comments: true
55
---
66

7-
# Overfitting and Underfitting
7+
# 🎯 Overfitting and Underfitting
88

99
Overfitting and Underfitting are fundamental concepts in machine learning that describe how well a model generalizes to unseen data - the central challenge in building reliable predictive models.
1010

1111
**Resources:** [Scikit-learn Model Selection](https://scikit-learn.org/stable/modules/model_evaluation.html) | [ESL Chapter 7](https://web.stanford.edu/~hastie/ElemStatLearn/) | [Bias-Variance Tradeoff Paper](https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote12.html)
1212

13-
## Summary
13+
## 📊 Summary
1414

1515
**Overfitting** occurs when a model learns the training data too well, capturing noise and specific patterns that don't generalize to new data. **Underfitting** happens when a model is too simple to capture the underlying patterns in the data.
1616

@@ -42,7 +42,7 @@ Overfitting and Underfitting are fundamental concepts in machine learning that d
4242
- **Regularization**: Primary technique to prevent overfitting
4343
- **Cross-Validation**: Method to detect and measure fitting issues
4444

45-
## Intuition
45+
## 🧠 Intuition
4646

4747
### How Overfitting and Underfitting Work
4848

@@ -86,7 +86,7 @@ Training and validation error as functions of:
8686
- **Sample size**: $\text{Error}(n)$
8787
- **Model complexity**: $\text{Error}(\lambda)$ where $\lambda$ controls complexity
8888

89-
## =" Implementation using Libraries
89+
## 🛠️ Implementation using Libraries
9090

9191
### Scikit-learn Implementation
9292

@@ -148,7 +148,7 @@ for i, (name, model) in enumerate(models.items(), 1):
148148
plt.scatter(X_train, y_train, alpha=0.6, label='Training Data')
149149
plt.scatter(X_test, y_test, alpha=0.6, label='Test Data')
150150
plt.plot(X_plot, results[name]['predictions'], 'r-', linewidth=2)
151-
plt.title(f'{name}\nTrain R²: {train_score:.3f}, Test R²: {test_score:.3f}')
151+
plt.title(f'{name}\nTrain R²: {train_score:.3f}, Test R²: {test_score:.3f}')
152152
plt.legend()
153153
plt.xlabel('X')
154154
plt.ylabel('y')
@@ -160,7 +160,7 @@ plt.show()
160160
print("Model Performance Comparison:")
161161
print("-" * 50)
162162
for name, result in results.items():
163-
print(f"{name:25s} | Train R²: {result['train_score']:.3f} | Test R²: {result['test_score']:.3f}")
163+
print(f"{name:25s} | Train R²: {result['train_score']:.3f} | Test R²: {result['test_score']:.3f}")
164164
```
165165

166166
### Learning Curves Analysis
@@ -233,7 +233,7 @@ alpha_range = np.logspace(-4, 2, 20)
233233
plot_validation_curve(ridge_model, X, y, 'ridge__alpha', alpha_range, 'Ridge Alpha')
234234
```
235235

236-
## ™ From Scratch Implementation
236+
## 🔧 From Scratch Implementation
237237

238238
### Simple Overfitting Detection Framework
239239

@@ -290,7 +290,7 @@ class FittingAnalyzer:
290290
I = np.eye(X.shape[1])
291291
I[0, 0] = 0 # Don't regularize intercept
292292

293-
# Ridge regression solution: (X^T X + ±I)^(-1) X^T y
293+
# Ridge regression solution: (X^T X + λI)^(-1) X^T y
294294
coefficients = np.linalg.solve(X.T @ X + alpha * I, X.T @ y)
295295

296296
return coefficients
@@ -304,7 +304,7 @@ class FittingAnalyzer:
304304
return np.mean((y_true - y_pred) ** 2)
305305

306306
def r2_score(self, y_true: np.ndarray, y_pred: np.ndarray) -> float:
307-
"""Calculate R² score"""
307+
"""Calculate R² score"""
308308
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
309309
ss_res = np.sum((y_true - y_pred) ** 2)
310310
return 1 - (ss_res / ss_tot)
@@ -387,14 +387,14 @@ class FittingAnalyzer:
387387
ax1.legend()
388388
ax1.grid(True, alpha=0.3)
389389

390-
# Plot R² vs complexity
390+
# Plot R² vs complexity
391391
ax2.plot(results['degrees'], results['train_r2s'], 'o-',
392-
label='Training R²', color='blue')
392+
label='Training R²', color='blue')
393393
ax2.plot(results['degrees'], results['test_r2s'], 'o-',
394-
label='Validation R²', color='red')
394+
label='Validation R²', color='red')
395395
ax2.set_xlabel('Polynomial Degree (Model Complexity)')
396-
ax2.set_ylabel('R² Score')
397-
ax2.set_title('R² vs Model Complexity')
396+
ax2.set_ylabel('R² Score')
397+
ax2.set_title('R² vs Model Complexity')
398398
ax2.legend()
399399
ax2.grid(True, alpha=0.3)
400400

@@ -433,7 +433,7 @@ if __name__ == "__main__":
433433
optimal_degree = results['degrees'][optimal_idx]
434434

435435
print(f"\nOptimal polynomial degree: {optimal_degree}")
436-
print(f"Test R² at optimal complexity: {results['test_r2s'][optimal_idx]:.3f}")
436+
print(f"Test R² at optimal complexity: {results['test_r2s'][optimal_idx]:.3f}")
437437

438438
# Detect fitting issues for different complexities
439439
for i, degree in enumerate([1, optimal_degree, 15]):
@@ -445,7 +445,7 @@ if __name__ == "__main__":
445445
print(f"Degree {degree}: {status}")
446446
```
447447

448-
##   Assumptions and Limitations
448+
## ⚠️ Assumptions and Limitations
449449

450450
### Overfitting Assumptions and Limitations
451451

@@ -492,7 +492,7 @@ if __name__ == "__main__":
492492
- **Advantages**: Interpretable individual models
493493
- **Disadvantages**: May still overfit collectively
494494

495-
## Interview Questions
495+
## Interview Questions
496496

497497
??? question "1. What is the fundamental difference between overfitting and underfitting? How do they relate to the bias-variance tradeoff?"
498498
**Answer:**
@@ -502,7 +502,7 @@ if __name__ == "__main__":
502502
- Overfitting: Low training error, high test error (high variance)
503503
- Underfitting: High training error, high test error (high bias)
504504
- Optimal model: Balance between bias and variance
505-
- **Total Error** = Bias² + Variance + Irreducible Error
505+
- **Total Error** = Bias² + Variance + Irreducible Error
506506
- **Goal**: Find the sweet spot that minimizes total expected error
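The decomposition in the answer above can be checked numerically. The following is a minimal sketch, not part of the original guide: the ground-truth function `sin(2πx)`, the noise level, the evaluation point `x0`, and the helper name `bias_variance_at_point` are all illustrative assumptions. It refits a polynomial on many resampled training sets and estimates bias² and variance of the prediction at one point.

```python
import numpy as np

def bias_variance_at_point(degree, x0=0.25, n_datasets=200, n_samples=30,
                           noise_std=0.3, seed=0):
    """Estimate bias^2 and variance of a polynomial fit at a single point x0."""
    rng = np.random.default_rng(seed)
    true_f = lambda x: np.sin(2 * np.pi * x)  # assumed ground-truth function
    preds = np.empty(n_datasets)
    for i in range(n_datasets):
        # Draw a fresh noisy training set and refit the model each time
        x = rng.uniform(0, 1, n_samples)
        y = true_f(x) + rng.normal(0, noise_std, n_samples)
        coefs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coefs, x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2      # systematic error
    variance = preds.var()                          # sensitivity to the training set
    mse_vs_truth = np.mean((preds - true_f(x0)) ** 2)
    return bias_sq, variance, mse_vs_truth

for degree in (1, 3, 9):
    b2, var, mse = bias_variance_at_point(degree)
    print(f"degree={degree}: bias^2={b2:.4f}, variance={var:.4f}, "
          f"bias^2+variance={b2 + var:.4f}, mse={mse:.4f}")
```

Here the error is measured against the noise-free function, so bias² and variance add up to it exactly; the irreducible error term would be added on top when comparing against noisy targets.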
507507

508508
??? question "2. How would you detect overfitting in a machine learning model? Provide multiple approaches."
@@ -592,8 +592,8 @@ if __name__ == "__main__":
592592
- Flexibility of the model to fit different patterns
593593
- Measured by VC dimension, degrees of freedom, etc.
594594
- **Relationship to Overfitting**:
595-
- Higher complexity ’ Higher risk of overfitting
596-
- Lower complexity ’ Higher risk of underfitting
595+
- Higher complexity → Higher risk of overfitting
596+
- Lower complexity → Higher risk of underfitting
597597
- Sweet spot depends on data size and problem complexity
598598
- **Choosing Right Complexity**:
599599
- **Validation curves**: Plot performance vs complexity parameter
@@ -697,7 +697,7 @@ if __name__ == "__main__":
697697
- For deep learning, often need much more
698698
- Consider domain complexity when sizing models
699699

700-
## Examples
700+
## 📝 Examples
701701

702702
### Real-World Example: House Price Prediction
703703

@@ -740,10 +740,10 @@ train_score_simple = simple_model.score(X_train[['Size_sqft']], y_train)
740740
test_score_simple = simple_model.score(X_test[['Size_sqft']], y_test)
741741

742742
print(f"Simple Model (Size only):")
743-
print(f"Training R²: {train_score_simple:.3f}")
744-
print(f"Test R²: {test_score_simple:.3f}")
743+
print(f"Training R²: {train_score_simple:.3f}")
744+
print(f"Test R²: {test_score_simple:.3f}")
745745
print(f"Performance Gap: {abs(train_score_simple - test_score_simple):.3f}")
746-
print("Analysis: Both scores are low ’ UNDERFITTING")
746+
print("Analysis: Both scores are low → UNDERFITTING")
747747

748748
# Model 2: Good fit
749749
print("\n2. GOOD FIT EXAMPLE:")
@@ -756,10 +756,10 @@ train_score_good = good_model.score(X_train, y_train)
756756
test_score_good = good_model.score(X_test, y_test)
757757

758758
print(f"Ridge Model (All features):")
759-
print(f"Training R²: {train_score_good:.3f}")
760-
print(f"Test R²: {test_score_good:.3f}")
759+
print(f"Training R²: {train_score_good:.3f}")
760+
print(f"Test R²: {test_score_good:.3f}")
761761
print(f"Performance Gap: {abs(train_score_good - test_score_good):.3f}")
762-
print("Analysis: Both scores reasonable, small gap ’ GOOD FIT")
762+
print("Analysis: Both scores reasonable, small gap → GOOD FIT")
763763

764764
# Model 3: Overfitting (too complex)
765765
print("\n3. OVERFITTING EXAMPLE:")
@@ -777,10 +777,10 @@ train_score_overfit = overfit_model.score(X_train, y_train)
777777
test_score_overfit = overfit_model.score(X_test, y_test)
778778

779779
print(f"Polynomial Model (degree=8):")
780-
print(f"Training R²: {train_score_overfit:.3f}")
781-
print(f"Test R²: {test_score_overfit:.3f}")
780+
print(f"Training R²: {train_score_overfit:.3f}")
781+
print(f"Test R²: {test_score_overfit:.3f}")
782782
print(f"Performance Gap: {abs(train_score_overfit - test_score_overfit):.3f}")
783-
print("Analysis: High training score, low test score ’ OVERFITTING")
783+
print("Analysis: High training score, low test score → OVERFITTING")
784784

785785
# Cross-validation analysis
786786
print("\n4. CROSS-VALIDATION ANALYSIS:")
@@ -798,7 +798,7 @@ for name, model in models.items():
798798
else:
799799
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='r2')
800800

801-
print(f"{name:12s}: Mean CV R² = {cv_scores.mean():.3f}{cv_scores.std():.3f})")
801+
print(f"{name:12s}: Mean CV R² = {cv_scores.mean():.3f} (±{cv_scores.std():.3f})")
802802

803803
# Learning curves visualization
804804
def plot_learning_curve_example():
@@ -871,12 +871,12 @@ print("" Feature engineering might help more than complex models")
871871
- **Overfitting**: High-degree polynomial model shows perfect training performance but poor test performance
872872
- **Learning Curves**: Reveal the characteristic patterns of each fitting scenario
873873

874-
## References
874+
## 📚 References
875875

876876
1. **Books:**
877877
- [The Elements of Statistical Learning - Hastie, Tibshirani, Friedman](https://web.stanford.edu/~hastie/ElemStatLearn/)
878878
- [Pattern Recognition and Machine Learning - Bishop](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf)
879-
- [Hands-On Machine Learning - Aurélien Géron](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)
879+
- [Hands-On Machine Learning - Aurélien Géron](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)
880880

881881
2. **Papers:**
882882
- [A Few Useful Things to Know About Machine Learning - Domingos](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)

docs/Machine-Learning/PCA.md

Lines changed: 20 additions & 20 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ description: Comprehensive guide to Principal Component Analysis with mathematic
44
comments: true
55
---
66

7-
# Principal Component Analysis (PCA)
7+
# 🎯 Principal Component Analysis (PCA)
88

99
PCA is a fundamental dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving maximum variance, making it invaluable for data visualization, noise reduction, and feature extraction.
1010

@@ -38,7 +38,7 @@ Principal Component Analysis (PCA) is an unsupervised linear dimensionality redu
3838
- **Sparse PCA**: Incorporates sparsity constraints on components
3939
- **Incremental PCA**: For large datasets that don't fit in memory
4040

41-
## >à Intuition
41+
## Intuition
4242

4343
### How PCA Works
4444

@@ -242,7 +242,7 @@ def compare_with_without_pca(X, y, n_components=2):
242242
compare_with_without_pca(X_scaled, y, n_components=2)
243243
```
244244

245-
## ™ From Scratch Implementation
245+
## From Scratch Implementation
246246

247247
```python
248248
import numpy as np
@@ -458,7 +458,7 @@ for comp, error in zip(components, errors):
458458
print(f"{comp} components: {error:.6f}")
459459
```
460460

461-
##   Assumptions and Limitations
461+
## Assumptions and Limitations
462462

463463
### Key Assumptions
464464

@@ -506,15 +506,15 @@ for comp, error in zip(components, errors):
506506
- With very sparse data (consider specialized sparse PCA)
507507
- When you need exactly interpretable features for regulatory compliance
508508

509-
## Interview Questions
509+
## Interview Questions
510510

511511
??? question "What is the mathematical intuition behind PCA and how does it work?"
512512

513513
**Answer:** PCA finds the directions (principal components) in the data that capture the maximum variance. Mathematically, it performs eigenvalue decomposition on the covariance matrix:
514514

515515
1. **Center the data**: Subtract the mean from each feature
516516
2. **Compute covariance matrix**: C = (X^T * X) / (n-1)
517-
3. **Find eigenvalues and eigenvectors**: C*v = »*v
517+
3. **Find eigenvalues and eigenvectors**: C*v = λ*v
518518
4. **Sort by eigenvalues**: Largest eigenvalues correspond to directions with most variance
519519
5. **Project data**: Transform original data onto selected eigenvectors
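The five steps above can be sketched with NumPy as follows. This is an illustrative sketch rather than the guide's own implementation; the function name `pca_eig` and the synthetic data are assumptions made for the example.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA via eigendecomposition of the covariance matrix."""
    # 1. Center the data
    X_centered = X - X.mean(axis=0)
    n = X_centered.shape[0]
    # 2. Covariance matrix C = X^T X / (n - 1)
    cov = (X_centered.T @ X_centered) / (n - 1)
    # 3. Eigenvalues and eigenvectors (eigh returns them in ascending order)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # 4. Sort by eigenvalue, largest first
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 5. Project the centered data onto the leading eigenvectors
    components = eigvecs[:, :n_components]
    return X_centered @ components, eigvals[:n_components], components

# Synthetic correlated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

Z, variances, W = pca_eig(X, n_components=2)
print("Variance captured by each component:", variances)
print("Variance of projected data:         ", Z.var(axis=0, ddof=1))
```

The two printed vectors should agree, because the variance of the data along each principal direction equals the corresponding eigenvalue.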
520520

@@ -530,7 +530,7 @@ for comp, error in zip(components, errors):
530530

531531
**Example**: Without standardization, if you have height (cm, ~170) and weight (kg, ~70), height will dominate simply due to larger numerical values, not because it's more important.
532532

533-
**Solution**: Use z-score standardization: (x - ¼) / Ã for each feature.
533+
**Solution**: Use z-score standardization: (x - μ) / σ for each feature.
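A small sketch of the effect, using made-up height and weight data (the distributions, the correlation between the two, and the helper `first_pc` are assumptions for illustration). What drives the domination is the spread of each feature in its raw units, so the example gives height the larger variance.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: height in cm with a wide spread, weight in kg with a narrow one
height_cm = rng.normal(170, 10, 500)
weight_kg = 70 + 0.2 * (height_cm - 170) + rng.normal(0, 2.5, 500)
X = np.column_stack([height_cm, weight_kg])

def first_pc(data):
    """Return the unit-length direction of maximum variance."""
    cov = np.cov(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    return eigvecs[:, -1]

# Z-score standardization: (x - mean) / std, applied per feature
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print("PC1 loadings, raw data:         ", np.abs(first_pc(X)))
print("PC1 loadings, standardized data:", np.abs(first_pc(X_std)))
```

On the raw data the first component points almost entirely along height; after standardization both features contribute equally to it.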
534534

535535
??? question "How do you choose the optimal number of principal components?"
536536

@@ -583,20 +583,20 @@ for comp, error in zip(components, errors):
583583

584584
**Example interpretation**:
585585
```
586-
PC1 loadings: [0.8 height, 0.7 weight, 0.1 age] ’ "Physical size factor"
587-
PC2 loadings: [0.2 height, -0.1 weight, 0.9 age] ’ "Age factor"
586+
PC1 loadings: [0.8 height, 0.7 weight, 0.1 age] → "Physical size factor"
587+
PC2 loadings: [0.2 height, -0.1 weight, 0.9 age] → "Age factor"
588588
```
589589

590590
??? question "What are the limitations of PCA and when should you not use it?"
591591

592592
**Answer:** Major limitations and alternatives:
593593

594594
**Limitations**:
595-
1. **Linear only**: Cannot capture non-linear relationships ’ Use Kernel PCA, t-SNE
596-
2. **Variance ` Importance**: High variance doesn't always mean importance ’ Use domain knowledge
597-
3. **Loss of interpretability**: PCs are combinations of original features ’ Use Sparse PCA, Factor Analysis
598-
4. **Outlier sensitive**: Outliers can skew components ’ Use Robust PCA
599-
5. **No class consideration**: Doesn't consider target variable ’ Use LDA for classification
595+
1. **Linear only**: Cannot capture non-linear relationships → Use Kernel PCA, t-SNE
596+
2. **Variance ≠ Importance**: High variance doesn't always mean importance → Use domain knowledge
597+
3. **Loss of interpretability**: PCs are combinations of original features → Use Sparse PCA, Factor Analysis
598+
4. **Outlier sensitive**: Outliers can skew components → Use Robust PCA
599+
5. **No class consideration**: Doesn't consider target variable → Use LDA for classification
600600

601601
**When NOT to use PCA**:
602602
- Categorical data without proper encoding
@@ -642,17 +642,17 @@ for comp, error in zip(components, errors):
642642

643643
**SVD decomposition** of centered data matrix X:
644644
```
645-
X = U * £ * V^T
645+
X = U * Σ * V^T
646646
```
647647
Where:
648648
- U: Left singular vectors
649-
- £: Singular values (diagonal matrix)
649+
- Σ: Singular values (diagonal matrix)
650650
- V: Right singular vectors
651651

652652
**Connection to PCA**:
653653
- **Principal components** = columns of V
654-
- **Explained variance** = (singular values)² / (n-1)
655-
- **Transformed data** = U * £
654+
- **Explained variance** = (singular values)² / (n-1)
655+
- **Transformed data** = U * Σ
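The connection listed above can be verified numerically. This is a minimal sketch on random data (the data and variable names are assumptions for illustration): it compares the SVD route against the covariance-matrix route.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))  # hypothetical data
Xc = X - X.mean(axis=0)                                  # SVD is applied to centered data
n = Xc.shape[0]

# Thin SVD: Xc = U @ diag(S) @ Vt, singular values in descending order
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

explained_variance = S**2 / (n - 1)  # (singular values)^2 / (n - 1)
scores = U * S                       # same as U @ np.diag(S)

# Reference values from the covariance matrix, sorted largest first
eigvals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

print("Explained variance via SVD:", explained_variance)
print("Covariance eigenvalues:    ", eigvals)
print("Scores match projection onto V:", np.allclose(scores, Xc @ Vt.T))
```

The SVD route never forms the covariance matrix explicitly, which is one reason it is usually considered the more numerically stable option.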
656656

657657
**Advantages of SVD approach**:
658658
1. More numerically stable
@@ -705,7 +705,7 @@ for comp, error in zip(components, errors):
705705
-  Components are interpretable
706706
-  Downstream performance maintained
707707

708-
## >à Examples
708+
## Examples
709709

710710
### Real-world Example: Image Compression with PCA
711711

@@ -917,7 +917,7 @@ for i, exposure in enumerate(factor_exposures):
917917
print(f" PC{i+1}: {exposure:.3f}")
918918
```
919919

920-
## References
920+
## 📚 References
921921

922922
- **Books:**
923923
- [The Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/) by Hastie, Tibshirani, and Friedman - Chapter 14
