datascienceinterviews
diff --git a/docs/Machine-Learning/Normalization Regularisation.md b/docs/Machine-Learning/Normalization Regularisation.md
Lines changed: 66 additions & 65 deletions
diff --git a/docs/Machine-Learning/Overfitting, Underfitting.md b/docs/Machine-Learning/Overfitting, Underfitting.md
Lines changed: 33 additions & 33 deletions
diff --git a/docs/Machine-Learning/PCA.md b/docs/Machine-Learning/PCA.md
Lines changed: 20 additions & 20 deletions
@@ -4,13 +4,13 @@ description: Comprehensive guide to Overfitting and Underfitting with mathematic
 comments: true
 ---
 
-# =Ø Overfitting and Underfitting
+# 🎯 Overfitting and Underfitting
 
 Overfitting and Underfitting are fundamental concepts in machine learning that describe how well a model generalizes to unseen data - the central challenge in building reliable predictive models.
 
 **Resources:** [Scikit-learn Model Selection](https://scikit-learn.org/stable/modules/model_evaluation.html) | [ESL Chapter 7](https://web.stanford.edu/~hastie/ElemStatLearn/) | [Bias-Variance Tradeoff Paper](https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote12.html)
 
-##  Summary
+## 📊 Summary
 
 **Overfitting** occurs when a model learns the training data too well, capturing noise and specific patterns that don't generalize to new data. **Underfitting** happens when a model is too simple to capture the underlying patterns in the data.
 
@@ -42,7 +42,7 @@ Overfitting and Underfitting are fundamental concepts in machine learning that d
 - **Regularization**: Primary technique to prevent overfitting
 - **Cross-Validation**: Method to detect and measure fitting issues
 
-## >à Intuition
+## 🧠 Intuition
 
 ### How Overfitting and Underfitting Work
 
@@ -86,7 +86,7 @@ Training and validation error as functions of:
 - **Sample size**: $\text{Error}(n)$
 - **Model complexity**: $\text{Error}(\lambda)$ where $\lambda$ controls complexity
 
-## =" Implementation using Libraries
+## 🛠️ Implementation using Libraries
 
 ### Scikit-learn Implementation
 
@@ -148,7 +148,7 @@ for i, (name, model) in enumerate(models.items(), 1):
     plt.scatter(X_train, y_train, alpha=0.6, label='Training Data')
     plt.scatter(X_test, y_test, alpha=0.6, label='Test Data')
     plt.plot(X_plot, results[name]['predictions'], 'r-', linewidth=2)
-    plt.title(f'{name}\nTrain R²: {train_score:.3f}, Test R²: {test_score:.3f}')
+    plt.title(f'{name}\nTrain R²: {train_score:.3f}, Test R²: {test_score:.3f}')
     plt.legend()
     plt.xlabel('X')
     plt.ylabel('y')
@@ -160,7 +160,7 @@ plt.show()
 print("Model Performance Comparison:")
 print("-" * 50)
 for name, result in results.items():
-    print(f"{name:25s} | Train R²: {result['train_score']:.3f} | Test R²: {result['test_score']:.3f}")
+    print(f"{name:25s} | Train R²: {result['train_score']:.3f} | Test R²: {result['test_score']:.3f}")
 ```
 
 ### Learning Curves Analysis
@@ -233,7 +233,7 @@ alpha_range = np.logspace(-4, 2, 20)
 plot_validation_curve(ridge_model, X, y, 'ridge__alpha', alpha_range, 'Ridge Alpha')
 ```
 
-##  From Scratch Implementation
+## 🔧 From Scratch Implementation
 
 ### Simple Overfitting Detection Framework
 
@@ -290,7 +290,7 @@ class FittingAnalyzer:
         I = np.eye(X.shape[1])
         I[0, 0] = 0  # Don't regularize intercept
 
-        # Ridge regression solution: (X^T X + ±I)^(-1) X^T y
+        # Ridge regression solution: (X^T X + αI)^(-1) X^T y
         coefficients = np.linalg.solve(X.T @ X + alpha * I, X.T @ y)
 
         return coefficients
@@ -304,7 +304,7 @@ class FittingAnalyzer:
         return np.mean((y_true - y_pred) ** 2)
 
     def r2_score(self, y_true: np.ndarray, y_pred: np.ndarray) -> float:
-        """Calculate R² score"""
+        """Calculate R² score"""
         ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
         ss_res = np.sum((y_true - y_pred) ** 2)
         return 1 - (ss_res / ss_tot)
@@ -387,14 +387,14 @@ class FittingAnalyzer:
         ax1.legend()
         ax1.grid(True, alpha=0.3)
 
-        # Plot R² vs complexity
+        # Plot R² vs complexity
         ax2.plot(results['degrees'], results['train_r2s'], 'o-', 
-                label='Training R²', color='blue')
+                label='Training R²', color='blue')
         ax2.plot(results['degrees'], results['test_r2s'], 'o-', 
-                label='Validation R²', color='red')
+                label='Validation R²', color='red')
         ax2.set_xlabel('Polynomial Degree (Model Complexity)')
-        ax2.set_ylabel('R² Score')
-        ax2.set_title('R² vs Model Complexity')
+        ax2.set_ylabel('R² Score')
+        ax2.set_title('R² vs Model Complexity')
         ax2.legend()
         ax2.grid(True, alpha=0.3)
 
@@ -433,7 +433,7 @@ if __name__ == "__main__":
     optimal_degree = results['degrees'][optimal_idx]
 
     print(f"\nOptimal polynomial degree: {optimal_degree}")
-    print(f"Test R² at optimal complexity: {results['test_r2s'][optimal_idx]:.3f}")
+    print(f"Test R² at optimal complexity: {results['test_r2s'][optimal_idx]:.3f}")
 
     # Detect fitting issues for different complexities
     for i, degree in enumerate([1, optimal_degree, 15]):
@@ -445,7 +445,7 @@ if __name__ == "__main__":
             print(f"Degree {degree}: {status}")
 ```
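
As a rough, standalone illustration of the closed-form ridge step used in the framework above, the sketch below solves the same linear system with NumPy and shows how the penalty strength `alpha` shrinks the non-intercept coefficients. The helper name `ridge_closed_form` and the synthetic data are assumptions made for this example, not part of the original class.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    """Solve (X^T X + alpha * I) w = X^T y, leaving the intercept unpenalized.

    Assumes the first column of X is the intercept column of ones.
    """
    I = np.eye(X.shape[1])
    I[0, 0] = 0  # do not regularize the intercept
    return np.linalg.solve(X.T @ X + alpha * I, X.T @ y)

# Hypothetical data: a noisy linear trend fitted with degree-5 polynomial features
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 40)
X = np.vander(x, 6, increasing=True)  # columns: 1, x, x^2, ..., x^5
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, 40)

for alpha in (0.0, 1.0, 100.0):
    w = ridge_closed_form(X, y, alpha)
    print(f"alpha={alpha:6.1f} | norm of penalized coefficients: {np.linalg.norm(w[1:]):.3f}")
```

Larger values of `alpha` pull the penalized coefficients toward zero, which is the mechanism that trades a little bias for lower variance.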
 
-##   Assumptions and Limitations
+## ⚠️ Assumptions and Limitations
 
 ### Overfitting Assumptions and Limitations
 
@@ -492,7 +492,7 @@ if __name__ == "__main__":
 - **Advantages**: Interpretable individual models
 - **Disadvantages**: May still overfit collectively
 
-## =¡ Interview Questions
+## ❓ Interview Questions
 
 ??? question "1. What is the fundamental difference between overfitting and underfitting? How do they relate to the bias-variance tradeoff?"
     **Answer:**
@@ -502,7 +502,7 @@ if __name__ == "__main__":
       - Overfitting: Low training error, high test error (high variance)
       - Underfitting: High training error, high test error (high bias)
       - Optimal model: Balance between bias and variance
-    - **Total Error** = Bias² + Variance + Irreducible Error
+    - **Total Error** = Bias² + Variance + Irreducible Error
     - **Goal**: Find the sweet spot that minimizes total expected error
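
The decomposition above can be checked numerically. The following is a minimal Monte Carlo sketch, assuming a sine curve as the true function and polynomial fits of increasing degree; the function name `bias_variance`, the noise level, and the sample sizes are all illustrative choices rather than anything prescribed by the guide.

```python
import numpy as np

def bias_variance(degree, n_trials=200, n_train=30, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit by resampling training sets."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    f_test = np.sin(2 * np.pi * x_test)  # assumed true function
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Draw a fresh noisy training set and refit the model
        x = rng.uniform(0, 1, n_train)
        y = np.sin(2 * np.pi * x) + rng.normal(0, noise_sd, n_train)
        coefs = np.polyfit(x, y, degree)
        preds[t] = np.polyval(coefs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = np.mean((mean_pred - f_test) ** 2)  # average squared bias over test points
    variance = np.mean(preds.var(axis=0))         # average prediction variance over test points
    return bias_sq, variance, noise_sd ** 2       # last term is the irreducible error

for degree in (1, 3, 9):
    b, v, n = bias_variance(degree)
    print(f"degree={degree} | bias^2={b:.4f} | variance={v:.4f} | irreducible={n:.4f}")
```

Under these assumptions, the low-degree fit should show the larger bias term and the high-degree fit the larger variance term, matching the underfitting and overfitting descriptions above.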
 
 ??? question "2. How would you detect overfitting in a machine learning model? Provide multiple approaches."
@@ -592,8 +592,8 @@ if __name__ == "__main__":
       - Flexibility of the model to fit different patterns
       - Measured by VC dimension, degrees of freedom, etc.
     - **Relationship to Overfitting**:
-      - Higher complexity  Higher risk of overfitting
-      - Lower complexity  Higher risk of underfitting
+      - Higher complexity → Higher risk of overfitting
+      - Lower complexity → Higher risk of underfitting
       - Sweet spot depends on data size and problem complexity
     - **Choosing Right Complexity**:
       - **Validation curves**: Plot performance vs complexity parameter
@@ -697,7 +697,7 @@ if __name__ == "__main__":
       - For deep learning, often need much more
       - Consider domain complexity when sizing models
 
-## >à Examples
+## 📝 Examples
 
 ### Real-World Example: House Price Prediction
 
@@ -740,10 +740,10 @@ train_score_simple = simple_model.score(X_train[['Size_sqft']], y_train)
 test_score_simple = simple_model.score(X_test[['Size_sqft']], y_test)
 
 print(f"Simple Model (Size only):")
-print(f"Training R²: {train_score_simple:.3f}")
-print(f"Test R²: {test_score_simple:.3f}")
+print(f"Training R²: {train_score_simple:.3f}")
+print(f"Test R²: {test_score_simple:.3f}")
 print(f"Performance Gap: {abs(train_score_simple - test_score_simple):.3f}")
-print("Analysis: Both scores are low  UNDERFITTING")
+print("Analysis: Both scores are low → UNDERFITTING")
 
 # Model 2: Good fit
 print("\n2. GOOD FIT EXAMPLE:")
@@ -756,10 +756,10 @@ train_score_good = good_model.score(X_train, y_train)
 test_score_good = good_model.score(X_test, y_test)
 
 print(f"Ridge Model (All features):")
-print(f"Training R²: {train_score_good:.3f}")
-print(f"Test R²: {test_score_good:.3f}")
+print(f"Training R²: {train_score_good:.3f}")
+print(f"Test R²: {test_score_good:.3f}")
 print(f"Performance Gap: {abs(train_score_good - test_score_good):.3f}")
-print("Analysis: Both scores reasonable, small gap  GOOD FIT")
+print("Analysis: Both scores reasonable, small gap → GOOD FIT")
 
 # Model 3: Overfitting (too complex)
 print("\n3. OVERFITTING EXAMPLE:")
@@ -777,10 +777,10 @@ train_score_overfit = overfit_model.score(X_train, y_train)
 test_score_overfit = overfit_model.score(X_test, y_test)
 
 print(f"Polynomial Model (degree=8):")
-print(f"Training R²: {train_score_overfit:.3f}")
-print(f"Test R²: {test_score_overfit:.3f}")
+print(f"Training R²: {train_score_overfit:.3f}")
+print(f"Test R²: {test_score_overfit:.3f}")
 print(f"Performance Gap: {abs(train_score_overfit - test_score_overfit):.3f}")
-print("Analysis: High training score, low test score  OVERFITTING")
+print("Analysis: High training score, low test score → OVERFITTING")
 
 # Cross-validation analysis
 print("\n4. CROSS-VALIDATION ANALYSIS:")
@@ -798,7 +798,7 @@ for name, model in models.items():
     else:
         cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring='r2')
 
-    print(f"{name:12s}: Mean CV R² = {cv_scores.mean():.3f} (±{cv_scores.std():.3f})")
+    print(f"{name:12s}: Mean CV R² = {cv_scores.mean():.3f} (±{cv_scores.std():.3f})")
 
 # Learning curves visualization
 def plot_learning_curve_example():
@@ -871,12 +871,12 @@ print("" Feature engineering might help more than complex models")
 - **Overfitting**: High-degree polynomial model shows perfect training performance but poor test performance
 - **Learning Curves**: Reveal the characteristic patterns of each fitting scenario
 
-## =Ú References
+## 📚 References
 
 1. **Books:**
    - [The Elements of Statistical Learning - Hastie, Tibshirani, Friedman](https://web.stanford.edu/~hastie/ElemStatLearn/)
    - [Pattern Recognition and Machine Learning - Bishop](https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf)
-   - [Hands-On Machine Learning - Aurélien Géron](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)
+   - [Hands-On Machine Learning - Aurélien Géron](https://www.oreilly.com/library/view/hands-on-machine-learning/9781492032632/)
 
 2. **Papers:**
    - [A Few Useful Things to Know About Machine Learning - Domingos](https://homes.cs.washington.edu/~pedrod/papers/cacm12.pdf)
 
@@ -4,7 +4,7 @@ description: Comprehensive guide to Principal Component Analysis with mathematic
 comments: true
 ---
 
-# =Ø Principal Component Analysis (PCA)
+# 🎯 Principal Component Analysis (PCA)
 
 PCA is a fundamental dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving maximum variance, making it invaluable for data visualization, noise reduction, and feature extraction.
 
@@ -38,7 +38,7 @@ Principal Component Analysis (PCA) is an unsupervised linear dimensionality redu
 - **Sparse PCA**: Incorporates sparsity constraints on components
 - **Incremental PCA**: For large datasets that don't fit in memory
 
-## >à Intuition
+## 🧠 Intuition
 
 ### How PCA Works
 
@@ -242,7 +242,7 @@ def compare_with_without_pca(X, y, n_components=2):
 compare_with_without_pca(X_scaled, y, n_components=2)
 ```
 
-##  From Scratch Implementation
+## 🔧 From Scratch Implementation
 
 ```python
 import numpy as np
@@ -458,7 +458,7 @@ for comp, error in zip(components, errors):
     print(f"{comp} components: {error:.6f}")
 ```
 
-##   Assumptions and Limitations
+## ⚠️ Assumptions and Limitations
 
 ### Key Assumptions
 
@@ -506,15 +506,15 @@ for comp, error in zip(components, errors):
 - With very sparse data (consider specialized sparse PCA)
 - When you need exactly interpretable features for regulatory compliance
 
-## =¡ Interview Questions
+## ❓ Interview Questions
 
 ??? question "What is the mathematical intuition behind PCA and how does it work?"
 
     **Answer:** PCA finds the directions (principal components) in the data that capture the maximum variance. Mathematically, it performs eigenvalue decomposition on the covariance matrix:
 
     1. **Center the data**: Subtract the mean from each feature
     2. **Compute covariance matrix**: C = (X^T * X) / (n-1)
-    3. **Find eigenvalues and eigenvectors**: C*v = »*v
+    3. **Find eigenvalues and eigenvectors**: C*v = λ*v
     4. **Sort by eigenvalues**: Largest eigenvalues correspond to directions with most variance
     5. **Project data**: Transform original data onto selected eigenvectors
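
The five steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not the guide's own implementation: the function name `pca_eig` and the synthetic data are assumptions, and `np.linalg.eigh` is used because the covariance matrix is symmetric.

```python
import numpy as np

def pca_eig(X, n_components):
    """PCA via eigendecomposition of the covariance matrix."""
    X_centered = X - X.mean(axis=0)                        # 1. center the data
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)     # 2. covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)                 # 3. eigenpairs (ascending order)
    order = np.argsort(eigvals)[::-1]                      # 4. sort by eigenvalue, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    return X_centered @ components, eigvals, components    # 5. project onto selected eigenvectors

# Hypothetical correlated data with one dominant direction
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 0.2]])
X = rng.normal(size=(200, 3)) @ mixing

Z, eigvals, W = pca_eig(X, n_components=2)
print("Explained variance ratio:", eigvals / eigvals.sum())
```

The sample variance of each projected column equals the corresponding eigenvalue, which is why sorting by eigenvalue ranks directions by the variance they capture.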
 
@@ -530,7 +530,7 @@ for comp, error in zip(components, errors):
 
     **Example**: Without standardization, if you have height (cm, ~170) and weight (kg, ~70), the feature with the larger spread in its raw units (typically height here) will dominate the first component because of its scale, not because it's more important.
 
-    **Solution**: Use z-score standardization: (x - ¼) / Ã for each feature.
+    **Solution**: Use z-score standardization: (x - μ) / σ for each feature.
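
To make the effect of scaling concrete, here is a small sketch comparing the first principal component before and after z-scoring. The height and weight distributions are invented for illustration (height is given the larger variance), and `first_pc` is a hypothetical helper.

```python
import numpy as np

# Hypothetical data: height in cm and a correlated weight in kg
rng = np.random.default_rng(42)
height_cm = rng.normal(170, 10, 500)
weight_kg = 0.5 * (height_cm - 170) + rng.normal(70, 4, 500)
X = np.column_stack([height_cm, weight_kg])

def first_pc(X):
    """Return the first principal direction of X (sign is arbitrary)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[0]

# z-score standardization: subtract the mean, divide by the standard deviation
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

print("PC1 on raw data:         ", first_pc(X))
print("PC1 on standardized data:", first_pc(X_std))
```

On the raw data the height loading is larger in magnitude; after standardization both features contribute equally to the first component.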
 
 ??? question "How do you choose the optimal number of principal components?"
 
@@ -583,20 +583,20 @@ for comp, error in zip(components, errors):
 
     **Example interpretation**:
     ```
-    PC1 loadings: [0.8 height, 0.7 weight, 0.1 age]  "Physical size factor"
-    PC2 loadings: [0.2 height, -0.1 weight, 0.9 age]  "Age factor"
+    PC1 loadings: [0.8 height, 0.7 weight, 0.1 age] → "Physical size factor"
+    PC2 loadings: [0.2 height, -0.1 weight, 0.9 age] → "Age factor"
     ```
 
 ??? question "What are the limitations of PCA and when should you not use it?"
 
     **Answer:** Major limitations and alternatives:
 
     **Limitations**:
-    1. **Linear only**: Cannot capture non-linear relationships  Use Kernel PCA, t-SNE
-    2. **Variance ` Importance**: High variance doesn't always mean importance  Use domain knowledge
-    3. **Loss of interpretability**: PCs are combinations of original features  Use Sparse PCA, Factor Analysis
-    4. **Outlier sensitive**: Outliers can skew components  Use Robust PCA
-    5. **No class consideration**: Doesn't consider target variable  Use LDA for classification
+    1. **Linear only**: Cannot capture non-linear relationships → Use Kernel PCA, t-SNE
+    2. **Variance ≠ Importance**: High variance doesn't always mean importance → Use domain knowledge
+    3. **Loss of interpretability**: PCs are combinations of original features → Use Sparse PCA, Factor Analysis
+    4. **Outlier sensitive**: Outliers can skew components → Use Robust PCA
+    5. **No class consideration**: Doesn't consider target variable → Use LDA for classification
 
     **When NOT to use PCA**:
     - Categorical data without proper encoding
@@ -642,17 +642,17 @@ for comp, error in zip(components, errors):
 
     **SVD decomposition** of centered data matrix X:
     ```
-    X = U * £ * V^T
+    X = U * Σ * V^T
     ```
     Where:
     - U: Left singular vectors
-    - £: Singular values (diagonal matrix)
+    - Σ: Singular values (diagonal matrix)
     - V: Right singular vectors
 
     **Connection to PCA**:
     - **Principal components** = columns of V
-    - **Explained variance** = (singular values)² / (n-1)
-    - **Transformed data** = U * £
+    - **Explained variance** = (singular values)² / (n-1)
+    - **Transformed data** = U * Σ
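
The three identities above can be verified numerically. The sketch below uses random data as a stand-in (an assumption for illustration) and compares the SVD route against the covariance eigendecomposition.

```python
import numpy as np

# Hypothetical data matrix, centered column-wise
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# SVD route: rows of Vt are the principal components
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained_variance = S ** 2 / (n - 1)
scores_svd = U * S  # equivalent to U @ np.diag(S)

# Eigendecomposition route on the covariance matrix
cov = Xc.T @ Xc / (n - 1)
eigvals = np.linalg.eigvalsh(cov)[::-1]  # eigvalsh returns ascending order

print("Variances match:", np.allclose(explained_variance, eigvals))
print("Scores match:   ", np.allclose(scores_svd, Xc @ Vt.T))
```

Working from the SVD avoids forming the covariance matrix explicitly, which is one reason it tends to behave better numerically.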
 
     **Advantages of SVD approach**:
     1. More numerically stable
@@ -705,7 +705,7 @@ for comp, error in zip(components, errors):
     -  Components are interpretable
     -  Downstream performance maintained
 
-## >à Examples
+## 📝 Examples
 
 ### Real-world Example: Image Compression with PCA
 
@@ -917,7 +917,7 @@ for i, exposure in enumerate(factor_exposures):
     print(f"  PC{i+1}: {exposure:.3f}")
 ```
 
-## =Ú References
+## 📚 References
 
 - **Books:**
   - [The Elements of Statistical Learning](https://web.stanford.edu/~hastie/ElemStatLearn/) by Hastie, Tibshirani, and Friedman - Chapter 14