35 changes: 35 additions & 0 deletions 169. AdamW Optimizer step.py
@@ -0,0 +1,35 @@
import numpy as np

def adamw_update(w, g, m, v, t, lr, beta1, beta2, epsilon, weight_decay):
"""
Perform one AdamW optimizer step.
Args:
w: parameter vector (np.ndarray)
g: gradient vector (np.ndarray)
m: first moment vector (np.ndarray)
v: second moment vector (np.ndarray)
t: integer, current time step
lr: float, learning rate
beta1: float, beta1 parameter
beta2: float, beta2 parameter
epsilon: float, small constant
weight_decay: float, weight decay coefficient
Returns:
w_new, m_new, v_new
"""

# Apply weight decay (decoupled from gradient)
w = w - lr * weight_decay * w

# Update biased first and second moments
m = beta1 * m + (1 - beta1) * g
v = beta2 * v + (1 - beta2) * (g ** 2)

# Compute bias-corrected estimates
m_hat = m / (1 - beta1 ** t)
v_hat = v / (1 - beta2 ** t)

# Update parameters
w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)

return w, m, v
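For reference, here is a minimal usage sketch of `adamw_update` on a toy quadratic objective; the objective, hyperparameter values, and step count are illustrative assumptions, not part of the submission.

```python
import numpy as np

# Toy objective (assumed for illustration): f(w) = 0.5 * ||w||^2, so the gradient is g = w.
w = np.array([1.0, -2.0, 3.0])
m = np.zeros_like(w)
v = np.zeros_like(w)

for t in range(1, 101):  # t starts at 1 so the bias correction is well defined
    g = w  # gradient of 0.5 * ||w||^2
    w, m, v = adamw_update(
        w, g, m, v, t,
        lr=0.1, beta1=0.9, beta2=0.999, epsilon=1e-8, weight_decay=0.01,
    )

print(np.round(w, 4))  # parameters have been driven toward zero
```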
46 changes: 46 additions & 0 deletions build/186.json
@@ -0,0 +1,46 @@
{
"id": "186",
"title": "Gaussian Process for Regression",
"difficulty": "medium",
"category": "Machine Learning",
"video": "",
"likes": "0",
"dislikes": "0",
"contributor": [
{
"profile_link": "https://github.com/Coder1010ayush",
"name": "Ayush"
}
],
"description": "## Problem\n\nProblem Statement: Task is to implement GaussianProcessRegression class which is a guassian process model for prediction regression problems.",
"learn_section": "# **Gaussian Processes (GP): From-Scratch Regression Example**\n\n## **1. What’s a Gaussian Process?**\n\nA **Gaussian Process** defines a distribution over functions $f(\\cdot)$.\nFor any finite set of inputs $( X = {x_i}_{i=1}^n )$, the function values $f(X)$ follow a multivariate normal:\n\n$$\nf(X) \\sim \\mathcal{N}\\big(0,; K(X,X)\\big)\n$$\n\nwhere ( K ) is a **kernel** (covariance) function encoding similarity between inputs.\nWith noisy targets $( y = f(X) + \\varepsilon, \\varepsilon \\sim \\mathcal{N}(0,\\sigma_n^2 I) )$,\nGP regression yields a closed-form posterior predictive mean and variance at new points $( X_* )$.\n\n---\n\n## **2. The Implementation at a Glance**\n\nThe provided code builds a minimal yet complete GP regression stack:\n\n* **Kernels implemented**\n\n * Radial Basis Function (RBF / Squared Exponential)\n * Matérn $(( \\nu = 0.5, 1.5, 2.5 ), or general ( \\nu ))$\n * Periodic\n * Linear\n * Rational Quadratic\n\n* **Core GP classes**\n\n * `_GaussianProcessBase`: kernel selection & covariance matrix computation\n * `GaussianProcessRegression`:\n\n * `fit`: $builds ( K )$, does **Cholesky decomposition**, $solves ( \\alpha )$\n * `predict`: returns posterior mean & variance\n * `log_marginal_likelihood`: computes GP evidence\n * `optimize_hyperparameters`: basic optimizer (for RBF hyperparams)\n\n---\n\n## **3. Kernel Cheat-Sheet**\n\nLet $( x, x' \\in \\mathbb{R}^d ), ( r = \\lVert x - x' \\rVert )$.\n\n* **RBF (SE):**\n $$\n k_{\\text{RBF}}(x,x') = \\sigma^2 \\exp!\\left(-\\tfrac{1}{2}\\tfrac{r^2}{\\ell^2}\\right)\n $$\n\n* **Matérn (( \\nu = 1.5 )):**\n $$\n k(x,x') = \\Big(1 + \\tfrac{\\sqrt{3},r}{\\ell}\\Big)\\exp!\\Big(-\\tfrac{\\sqrt{3},r}{\\ell}\\Big)\n $$\n\n* **Periodic:**\n $$\n k(x,x') = \\sigma^2 \\exp!\\left(-\\tfrac{2}{\\ell^2}\\sin^2!\\Big(\\tfrac{\\pi r}{p}\\Big)\\right)\n $$\n\n* **Linear:**\n $$\n k(x,x') = \\sigma_b^2 + \\sigma_v^2,x^\\top x'\n $$\n\n* **Rational Quadratic:**\n $$\n k(x,x') = \\sigma^2\\Big(1 + \\tfrac{r^2}{2\\alpha \\ell^2}\\Big)^{-\\alpha}\n $$\n\n---\n\n## **4. GP Regression Mechanics**\n\n### **Training**\n\n1. Build covariance:\n $$\n K = K(X,X) + \\sigma_n^2 I\n $$\n\n2. Cholesky factorization:\n $$\n K = L L^\\top\n $$\n\n3. Solve ( \\alpha ):\n $$\n L L^\\top \\alpha = y\n $$\n\n### **Prediction**\n\nAt new inputs ( X_* ):\n\n* $( K_* = K(X, X_*) ), ( K_{**} = K(X_*, X_*) )$\n\n* **Mean:**\n $$\n \\mu_* = K_*^\\top \\alpha\n $$\n\n* **Covariance:**\n $$\n \\Sigma_* = K_{**} - V^\\top V, \\quad V = L^{-1} K_*\n $$\n\n### **Model Selection**\n\n* **Log Marginal Likelihood (LML):**\n $$\n \\log p(y \\mid X) = -\\tfrac{1}{2} y^\\top \\alpha - \\sum\\nolimits_i \\log L_{ii} - \\tfrac{n}{2}\\log(2\\pi)\n $$\n\n---\n\n## **5. Worked Example (Linear Kernel)**\n\n```python\nimport numpy as np\ngp = GaussianProcessRegression(kernel='linear',\n kernel_params={'sigma_b': 0.0, 'sigma_v': 1.0},\n noise=1e-8)\n\nX_train = np.array([[1], [2], [4]])\ny_train = np.array([3, 5, 9]) # y = 2x + 1\ngp.fit(X_train, y_train)\n\nX_test = np.array([[3.0]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\") # -> 7.0000\n```\n\n---\n\n## **6. When to Use GP Regression**\n\n* **Small-to-medium datasets** where uncertainty estimates are valuable\n* Cases requiring **predictive intervals** (not just point predictions)\n* **Nonparametric modeling** with kernel priors\n* Automatic hyperparameter tuning via **marginal likelihood**\n\n---\n\n## **7. 
Practical Tips**\n\n* Always add **jitter** $10^{-6}$ to the diagonal for numerical stability\n* **Standardize inputs/outputs** before training\n* Be aware: Exact GP has complexity **$\\mathcal{O}(n^3)$** in time and **$\\mathcal{O}(n^2)$** in memory\n* Choose kernels to match problem structure:\n\n * **RBF:** smooth functions\n * **Matérn:** rougher functions\n * **Periodic:** seasonal/cyclical data\n * **Linear:** global linear trends",
"starter_code": "import math # ---------------------------------------- utf-8 encoding ---------------------------------\n\n# This file contains Gaussian Process implementation.\nimport numpy as np\nimport math\n\n\ndef matern_kernel(x: np.ndarray, x_prime: np.ndarray, length_scale=1.0, nu=1.5):\n pass\n\n\ndef rbf_kernel(x: np.ndarray, x_prime, sigma=1.0, length_scale=1.0):\n pass\n\n\ndef periodic_kernel(\n x: np.ndarray, x_prime: np.ndarray, sigma=1.0, length_scale=1.0, period=1.0\n):\n pass\n\n\ndef linear_kernel(x: np.ndarray, x_prime: np.ndarray, sigma_b=1.0, sigma_v=1.0):\n pass\n\n\ndef rational_quadratic_kernel(\n x: np.ndarray, x_prime: np.ndarray, sigma=1.0, length_scale=1.0, alpha=1.0\n):\n pass\n\n\n# --- BASE CLASS -------------------------------------------------------------\n\n\nclass _GaussianProcessBase:\n def __init__(self, kernel=\"rbf\", noise=1e-5, kernel_params=None):\n pass\n\n def _select_kernel(self, x1, x2):\n \"\"\"Selects and computes the kernel value for two single data points.\"\"\"\n pass\n\n def _compute_covariance(self, X1, X2):\n \"\"\"\n Computes the covariance matrix between two sets of points.\n This method fixes the vectorization bug from the original code.\n \"\"\"\n pass\n\n\n# --- REGRESSION MODEL -------------------------------------------------------\nclass GaussianProcessRegression(_GaussianProcessBase):\n def fit(self, X, y):\n pass\n\n def predict(self, X_test, return_std=False):\n pass\n\n def log_marginal_likelihood(self):\n pass\n\n def optimize_hyperparameters(self):\n pass",
"solution": "# ---------------------------------------- utf-8 encoding ---------------------------------\n# This file contains Gaussian Process implementation.\nimport numpy as np\nimport math\nfrom scipy.spatial.distance import euclidean\nfrom scipy.special import kv as bessel_kv\nfrom scipy.special import gamma\nfrom scipy.linalg import cholesky, solve_triangular\nfrom scipy.optimize import minimize\nfrom scipy.special import expit, softmax\n\n\n# --- KERNEL FUNCTIONS --------------------------------------------------------\ndef matern_kernel(x: np.ndarray, x_prime: np.ndarray, length_scale=1.0, nu=1.5):\n d = euclidean(x, x_prime)\n if d == 0:\n return 1.0 # Covariance with self is 1 before scaling\n if nu == 0.5:\n return np.exp(-d / length_scale)\n elif nu == 1.5:\n return (1 + np.sqrt(3) * d / length_scale) * np.exp(\n -np.sqrt(3) * d / length_scale\n )\n elif nu == 2.5:\n return (\n 1 + np.sqrt(5) * d / length_scale + 5 * d**2 / (3 * length_scale**2)\n ) * np.exp(-np.sqrt(5) * d / length_scale)\n else:\n factor = (2 ** (1 - nu)) / gamma(nu)\n scaled_d = np.sqrt(2 * nu) * d / length_scale\n return factor * (scaled_d**nu) * bessel_kv(nu, scaled_d)\n\n\ndef rbf_kernel(x: np.ndarray, x_prime, sigma=1.0, length_scale=1.0):\n # This is a squared exponential kernel\n\n # Calculate the squared euclidean distance\n sq_norm = np.linalg.norm(x - x_prime) ** 2\n\n # Correctly implement the formula\n return sigma**2 * np.exp(-sq_norm / (2 * length_scale**2))\n\n\ndef periodic_kernel(\n x: np.ndarray, x_prime: np.ndarray, sigma=1.0, length_scale=1.0, period=1.0\n):\n return sigma**2 * np.exp(\n -2 * np.sin(np.pi * np.linalg.norm(x - x_prime) / period) ** 2 / length_scale**2\n )\n\n\ndef linear_kernel(x: np.ndarray, x_prime: np.ndarray, sigma_b=1.0, sigma_v=1.0):\n return sigma_b**2 + sigma_v**2 * np.dot(x, x_prime)\n\n\ndef rational_quadratic_kernel(\n x: np.ndarray, x_prime: np.ndarray, sigma=1.0, length_scale=1.0, alpha=1.0\n):\n return sigma**2 * (\n 1 + np.linalg.norm(x - x_prime) ** 2 / (2 * alpha * length_scale**2)\n ) ** (-alpha)\n\n\n# --- BASE CLASS -------------------------------------------------------------\n\n\nclass _GaussianProcessBase:\n def __init__(self, kernel=\"rbf\", noise=1e-5, kernel_params=None):\n self.kernel_name = kernel\n self.noise = noise\n self.kernel_params = kernel_params if kernel_params else {}\n self.X_train = None\n self.y_train = None\n self.K = None\n\n def _select_kernel(self, x1, x2):\n \"\"\"Selects and computes the kernel value for two single data points.\"\"\"\n if self.kernel_name == \"rbf\":\n return rbf_kernel(x1, x2, **self.kernel_params)\n elif self.kernel_name == \"matern\":\n return matern_kernel(x1, x2, **self.kernel_params)\n elif self.kernel_name == \"periodic\":\n return periodic_kernel(x1, x2, **self.kernel_params)\n elif self.kernel_name == \"linear\":\n return linear_kernel(x1, x2, **self.kernel_params)\n elif self.kernel_name == \"rational_quadratic\":\n return rational_quadratic_kernel(x1, x2, **self.kernel_params)\n else:\n raise ValueError(\n \"Unsupported kernel. 
Choose from ['rbf', 'matern', 'periodic', 'linear', 'rational_quadratic'].\"\n )\n\n def _compute_covariance(self, X1, X2):\n \"\"\"\n Computes the covariance matrix between two sets of points.\n This method fixes the vectorization bug from the original code.\n \"\"\"\n # Ensuring X1 and X2 are 2D arrays\n X1 = np.atleast_2d(X1)\n X2 = np.atleast_2d(X2)\n\n n1, _ = X1.shape\n n2, _ = X2.shape\n K = np.zeros((n1, n2))\n for i in range(n1):\n for j in range(n2):\n K[i, j] = self._select_kernel(X1[i], X2[j])\n return K\n\n\n# --- REGRESSION MODEL -------------------------------------------------------\nclass GaussianProcessRegression(_GaussianProcessBase):\n def fit(self, X, y):\n self.X_train = np.asarray(X)\n self.y_train = np.asarray(y)\n self.K = self._compute_covariance(\n self.X_train, self.X_train\n ) + self.noise * np.eye(len(self.X_train))\n\n # Compute Cholesky decomposition for stable inversion\n self.L = cholesky(self.K, lower=True)\n # alpha = K_inv * y\n self.alpha = solve_triangular(\n self.L.T, solve_triangular(self.L, self.y_train, lower=True)\n )\n\n def predict(self, X_test, return_std=False):\n X_test = np.atleast_2d(X_test)\n K_s = self._compute_covariance(self.X_train, X_test)\n K_ss = self._compute_covariance(X_test, X_test)\n\n # Compute predictive mean\n mu = K_s.T @ self.alpha\n\n # Compute predictive variance\n v = solve_triangular(self.L, K_s, lower=True)\n cov = K_ss - v.T @ v\n\n if return_std:\n return mu, np.sqrt(np.diag(cov))\n return mu\n\n def log_marginal_likelihood(self):\n return (\n -0.5 * (self.y_train.T @ self.alpha)\n - np.sum(np.log(np.diag(self.L)))\n - len(self.X_train) / 2 * np.log(2 * np.pi)\n )\n\n def optimize_hyperparameters(self):\n # NOTE: This is a simplified optimizer for 'rbf' kernel's params.\n def objective(params):\n self.kernel_params = {\n \"length_scale\": np.exp(params[0]),\n \"sigma\": np.exp(params[1]),\n }\n self.fit(self.X_train, self.y_train)\n return -self.log_marginal_likelihood()\n\n init_params = np.log(\n [\n self.kernel_params.get(\"length_scale\", 1.0),\n self.kernel_params.get(\"sigma\", 1.0),\n ]\n )\n res = minimize(\n objective, init_params, method=\"L-BFGS-B\", bounds=[(-5, 5), (-5, 5)]\n )\n\n self.kernel_params = {\n \"length_scale\": np.exp(res.x[0]),\n \"sigma\": np.exp(res.x[1]),\n }\n # Re-fit with optimal hyperparameters\n self.fit(self.X_train, self.y_train)\n\n\nif __name__ == \"__main__\":\n gp = GaussianProcessRegression(\n kernel=\"linear\", kernel_params={\"sigma_b\": 0.0, \"sigma_v\": 1.0}, noise=1e-8\n )\n X_train = np.array([[1], [2], [4]])\n y_train = np.array([3, 5, 9])\n gp.fit(X_train, y_train)\n X_test = np.array([[3.0]])\n mu = gp.predict(X_test)",
"example": {
"input": "import numpy as np\ngp = GaussianProcessRegression(kernel='linear', kernel_params={'sigma_b': 0.0, 'sigma_v': 1.0}, noise=1e-8)\nX_train = np.array([[1], [2], [4]])\ny_train = np.array([3, 5, 9])\ngp.fit(X_train, y_train)\nX_test = np.array([[3.0]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\")",
"output": "7.0000",
"reasoning": "A Gaussian Process with a linear kernel is trained on perfectly linear data that follows the function y = 2x + 1. When asked to predict the value at x=3, the model perfectly interpolates the linear function it has learned, resulting in a prediction of 2*3 + 1 = 7. The near-zero noise ensures the prediction is exact."
},
"test_cases": [
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='rbf', kernel_params={'sigma': 1.0, 'length_scale': 1.0}, noise=1e-8)\nX_train = np.array([[0], [2.5], [5.0], [7.5], [10.0]])\ny_train = np.sin(X_train).ravel()\ngp.fit(X_train, y_train)\nX_test = np.array([[1.25]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\")",
"expected_output": "0.2814"
},
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='rbf', kernel_params={'sigma': 1.0, 'length_scale': 1.0}, noise=1e-8)\nX_train = np.array([[0], [2.5], [5.0], [7.5], [10.0]])\ny_train = np.sin(X_train).ravel()\ngp.fit(X_train, y_train)\nX_test = np.array([[1.25]])\nmu, std = gp.predict(X_test, return_std=True)\nprint(f\"mu={mu[0]:.4f}, std={std[0]:.4f}\")",
"expected_output": "mu=0.2814, std=0.7734"
},
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='rbf', kernel_params={'sigma': 1.0, 'length_scale': 1.0}, noise=1e-8)\nX_train = np.array([[0], [2.5], [5.0]])\ny_train = np.array([1.0, 3.0, 1.5])\ngp.fit(X_train, y_train)\nX_test = np.array([[2.5]])\nmu, std = gp.predict(X_test, return_std=True)\nprint(f\"mu={mu[0]:.4f}, std={std[0]:.4f}\")",
"expected_output": "mu=3.0000, std=0.0001"
},
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='linear', kernel_params={'sigma_b': 0.1, 'sigma_v': 1.0}, noise=1e-8)\nX_train = np.array([[1], [2], [4]])\ny_train = np.array([3, 5, 9])\ngp.fit(X_train, y_train)\nX_test = np.array([[3.0]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\")",
"expected_output": "7.0000"
},
{
"test": "import numpy as np\ngp = GaussianProcessRegression(kernel='rbf', kernel_params={'sigma': 1.0, 'length_scale': 1.5}, noise=1e-8)\nX_train = np.array([[1, 2], [3, 4], [5, 1]])\ny_train = np.sum(X_train, axis=1)\ngp.fit(X_train, y_train)\nX_test = np.array([[2, 3]])\nmu = gp.predict(X_test)\nprint(f\"{mu[0]:.4f}\")",
"expected_output": "5.5553"
}
]
}
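As a sanity check on the prediction mechanics from the learn section, the linear-kernel worked example can be reproduced with plain NumPy via the same Cholesky route. This is a self-contained sketch with illustrative variable names, independent of the classes above.

```python
import numpy as np

# Linear kernel with sigma_b = 0, sigma_v = 1: k(x, x') = x . x'
X = np.array([[1.0], [2.0], [4.0]])
y = np.array([3.0, 5.0, 9.0])  # y = 2x + 1
X_star = np.array([[3.0]])
noise = 1e-8

K = X @ X.T + noise * np.eye(len(X))  # K(X, X) + sigma_n^2 * I
K_s = X @ X_star.T                    # K(X, X_*)
K_ss = X_star @ X_star.T              # K(X_*, X_*)

L = np.linalg.cholesky(K)             # K = L L^T
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # solves L L^T alpha = y

mu = K_s.T @ alpha                    # posterior mean
v = np.linalg.solve(L, K_s)
cov = K_ss - v.T @ v                  # posterior covariance

print(f"{mu[0]:.4f}")  # -> 7.0000
```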
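The reference solution fills covariance matrices with an explicit double loop, which is easy to follow but slow for larger datasets. For the RBF kernel specifically, the same matrix can be computed in vectorized form; the helper below is a hypothetical alternative that assumes SciPy is available.

```python
import numpy as np
from scipy.spatial.distance import cdist


def rbf_covariance(X1, X2, sigma=1.0, length_scale=1.0):
    """Vectorized RBF covariance: K[i, j] = sigma^2 * exp(-||x_i - x'_j||^2 / (2 l^2))."""
    X1 = np.atleast_2d(X1)
    X2 = np.atleast_2d(X2)
    sq_dists = cdist(X1, X2, metric="sqeuclidean")  # all pairwise squared distances
    return sigma**2 * np.exp(-sq_dists / (2.0 * length_scale**2))


# Should match the per-pair rbf_kernel from the solution, entry by entry.
X = np.array([[0.0], [2.5], [5.0]])
print(rbf_covariance(X, X).round(4))
```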