
Commit 57e1293
Added Q: Linear Regression using OLS
1 parent b903254 commit 57e1293

File tree

7 files changed

+142
-0
lines changed

Lines changed: 15 additions & 0 deletions

### Problem

Implement simple linear regression using Ordinary Least Squares (OLS). Given 1D inputs `X` and targets `y`, compute the slope `m`, intercept `b`, and use them to predict on a provided test input.

You should implement the closed-form OLS solution:

$$
m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},\quad
b = \bar{y} - m\,\bar{x}.
$$

Then, given `X_test`, output predictions `y_pred = m * X_test + b`.

Return values: `m`, `b`, and `y_pred`.
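As an illustrative sanity check (a minimal sketch, not part of the required interface), the closed-form $(m, b)$ should achieve a sum of squared errors no larger than any nearby perturbed line on the sample data:

```python
# Illustration only: verify that the closed-form (m, b) minimizes
# the sum of squared errors on the sample data.
X = [1.0, 2.0, 3.0]
y = [2.0, 2.5, 3.5]

x_bar = sum(X) / len(X)
y_bar = sum(y) / len(y)
m = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, y)) / sum((xi - x_bar) ** 2 for xi in X)
b = y_bar - m * x_bar

def sse(slope, intercept):
    return sum((slope * xi + intercept - yi) ** 2 for xi, yi in zip(X, y))

# The OLS line beats every small perturbation of (m, b).
for dm in (-0.1, 0.0, 0.1):
    for db in (-0.1, 0.0, 0.1):
        assert sse(m, b) <= sse(m + dm, b + db) + 1e-12

print(round(m, 6), round(b, 6))  # 0.75 1.166667
```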
Lines changed: 7 additions & 0 deletions

{
  "input": "X_train = [1, 2, 3]; y_train = [2, 2.5, 3.5]; X_test = [4]",
  "output": "m = 0.75, b = 1.166667, y_pred = [4.166667]",
  "reasoning": "Using OLS: m = Cov(X,Y)/Var(X) = 1.5/2 = 0.75 and b = y_bar - m*x_bar = (8/3) - 0.75*2 = 1.166667. Prediction for X_test=[4] is 0.75*4 + 1.166667 = 4.166667."
}
Lines changed: 39 additions & 0 deletions

## Learning: Ordinary Least Squares for Simple Linear Regression

### Idea and formula

- **Goal**: Fit a line $y = m x + b$ that minimizes the sum of squared errors.
- **Closed-form OLS solution** for 1D features:

$$
m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2},\quad
b = \bar{y} - m\,\bar{x}
$$

### Intuition

- The numerator is the sample covariance between $x$ and $y$; the denominator is the sample variance of $x$.
- So $m = \operatorname{Cov}(x,y) / \operatorname{Var}(x)$ measures how much $y$ changes per unit change in $x$.
- The intercept $b$ anchors the best-fit line so it passes through the mean point $(\bar{x},\bar{y})$.
### Algorithm steps

1. Compute $\bar{x}$ and $\bar{y}$.
2. Accumulate the numerator $\sum_i (x_i-\bar{x})(y_i-\bar{y})$ and the denominator $\sum_i (x_i-\bar{x})^2$.
3. Compute $m = \text{numerator}/\text{denominator}$ (guard against a zero denominator).
4. Compute $b = \bar{y} - m\,\bar{x}$.
5. Predict: $\hat{y} = m\,x + b$ for any new $x$.

### Edge cases and tips

- If all $x_i$ are identical, $\operatorname{Var}(x)=0$ and the slope is undefined. In practice, return $m=0$ and $b=\bar{y}$ or raise an error.
- Centering the data helps numerical stability but is not required for the closed form.
- Outliers can strongly influence OLS; consider robust alternatives if needed.
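The zero-variance fallback can be sketched as follows (using a hypothetical `ols_fit` helper for illustration, not the repo's interface):

```python
def ols_fit(xs, ys):
    # Hypothetical helper illustrating the zero-variance fallback described above.
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    den = sum((x - x_bar) ** 2 for x in xs)
    if den == 0:  # all x identical: slope undefined, fall back to m=0, b=mean(y)
        return 0.0, y_bar
    m = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / den
    return m, y_bar - m * x_bar

print(ols_fit([2, 2, 2], [1, 4, 7]))  # (0.0, 4.0): constant x triggers the fallback
print(ols_fit([0, 1, 2], [5, 2, -1]))  # (-3.0, 5.0): a normal fit
```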
### Worked example

Given $X = [1,2,3]$ and $y = [2,2.5,3.5]$:

- $\bar{x} = 2$, $\bar{y} = 8/3$.
- $\sum_i (x_i-\bar{x})(y_i-\bar{y}) = (1-2)(2-8/3) + (2-2)(2.5-8/3) + (3-2)(3.5-8/3) = 1.5$
- $\sum_i (x_i-\bar{x})^2 = (1-2)^2 + (2-2)^2 + (3-2)^2 = 2$
- $m = 1.5/2 = 0.75$
- $b = \bar{y} - m\,\bar{x} = 8/3 - 0.75\cdot 2 = 1.166666\ldots$

Prediction for $X_{\text{test}} = [4]$: $y_{\text{pred}} = 0.75\cdot 4 + 1.1666\ldots = 4.1666\ldots$
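The worked example can be reproduced exactly with rational arithmetic (a sketch using `fractions.Fraction`, which avoids the repeating decimals above):

```python
from fractions import Fraction as F

x = [F(1), F(2), F(3)]
y = [F(2), F(5, 2), F(7, 2)]       # 2, 2.5, 3.5 as exact rationals

x_bar = sum(x) / 3                 # 2
y_bar = sum(y) / 3                 # 8/3

num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))  # 3/2
den = sum((xi - x_bar) ** 2 for xi in x)                        # 2

m = num / den                      # 3/4 = 0.75
b = y_bar - m * x_bar              # 8/3 - 3/2 = 7/6 = 1.1666...

print(m, b, m * 4 + b)  # 3/4 7/6 25/6
```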
Lines changed: 16 additions & 0 deletions

{
  "id": "186",
  "title": "Linear Regression via Ordinary Least Squares (OLS)",
  "difficulty": "hard",
  "category": "Machine Learning",
  "video": "",
  "likes": "0",
  "dislikes": "0",
  "contributor": [
    {
      "profile_link": "https://github.com/Jeet009",
      "name": "Jeet Mukherjee"
    }
  ]
}
Lines changed: 23 additions & 0 deletions

from typing import List, Tuple


def fit_and_predict(X_train: List[float], y_train: List[float], X_test: List[float]) -> Tuple[float, float, List[float]]:
    n = len(X_train)
    x_mean = sum(X_train) / n
    y_mean = sum(y_train) / n

    # Accumulate the OLS numerator (covariance sum) and denominator (variance sum).
    num = 0.0
    den = 0.0
    for i in range(n):
        dx = X_train[i] - x_mean
        dy = y_train[i] - y_mean
        num += dx * dy
        den += dx * dx

    # Guard against zero variance (all x identical): fall back to m = 0.
    m = num / den if den != 0 else 0.0
    b = y_mean - m * x_mean

    y_pred = [m * x + b for x in X_test]
    return m, b, y_pred
Lines changed: 14 additions & 0 deletions

from typing import List, Tuple


def fit_and_predict(X_train: List[float], y_train: List[float], X_test: List[float]) -> Tuple[float, float, List[float]]:
    """
    Implement simple linear regression (OLS) to compute slope m, intercept b,
    and predictions on X_test.

    Returns (m, b, y_pred).
    """
    # Your code here
    pass
Lines changed: 28 additions & 0 deletions

[
  {
    "test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([1,2,3],[2,2.5,3.5],[4]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
    "expected_output": "0.75 1.166667 [4.166667]"
  },
  {
    "test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([0,1,2,3],[1,3,5,7],[4,5]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
    "expected_output": "2.0 1.0 [9.0, 11.0]"
  },
  {
    "test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([0,1,2],[5,2,-1],[3]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
    "expected_output": "-3.0 5.0 [-4.0]"
  },
  {
    "test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([2,2,2],[1,4,7],[10]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
    "expected_output": "0.0 4.0 [4.0]"
  },
  {
    "test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([1,2,3,4],[1.1,1.9,3.05,3.9],[5]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
    "expected_output": "0.955 0.1 [4.875]"
  },
  {
    "test": "from questions.186_linear_regression_ordinary_least_squares.solution import fit_and_predict; m,b,y=fit_and_predict([3],[7],[10]); print(round(m,6), round(b,6), [round(v,6) for v in y])",
    "expected_output": "0.0 7.0 [7.0]"
  }
]