
Commit 12c00ff

Edits for pausing content creation
1 parent b0dd401 commit 12c00ff

10 files changed: +101 -23 lines changed

_includes/footer/custom.html

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 <!-- start custom footer snippets -->
 
-<p>Open to Work <i class="fa fa-briefcase" aria-hidden="true"></i></p>
+<p>Open to Networking <i class="fa fa-briefcase" aria-hidden="true"></i></p>
 <!-- end custom footer snippets -->

_posts/2025-02-22-newsvendor.md

Lines changed: 1 addition & 0 deletions
@@ -87,6 +87,7 @@ If the quantity ordered is also continuous, like 3.42 gallons, then there is a p
 
 $$F(q^*)=\frac{c_s-c_p}{c_s+c_h}$$
 
+To prove the critical ratio is the same, you would set the derivative of the cost function equal to zero; however, computing this derivative will require Leibniz's Rule.
 
 ## Example
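
A quick numerical illustration of the critical ratio above (a sketch only: it assumes normally distributed demand, made-up cost values, and uses scipy for the quantile):

```python
import scipy.stats as stats

# Hypothetical costs: selling price c_s, purchase cost c_p, holding cost c_h
c_s, c_p, c_h = 10.0, 4.0, 1.0
critical_ratio = (c_s - c_p) / (c_s + c_h)   # F(q*) = (c_s - c_p) / (c_s + c_h)

# Assume demand ~ Normal(mu=100, sigma=20); q* is the demand quantile at the critical ratio
q_star = stats.norm.ppf(critical_ratio, loc=100, scale=20)
print(f"critical ratio = {critical_ratio:.3f}, q* = {q_star:.1f}")
```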

_posts/2025-03-05-estimates.md

Lines changed: 13 additions & 15 deletions
@@ -146,33 +146,31 @@ $$ f(x; \lambda) = \lambda e^{-\lambda x}, \quad x \geq 0 $$
 
 The loglikelihood function and derivative follow.
 
-$$l(\lambda) = \sum_{i=1}^{n} \log(\lambda e^{-\lambda x}) = n \log \lambda - \lambda \sum_{i=1}^n x_i$$
+$$l(\lambda) = \sum_{i=1}^{n} \log(\lambda e^{-\lambda x_i}) = n \log \lambda - \lambda \sum_{i=1}^n x_i$$ \\
 $$ \frac{dl}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i $$
 
-The MLE has derivative equal to zero
-$$\frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 $$
-$$ \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} $$
+Setting the derivative equal to zero gives the MLE \\
+$$\frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 $$ \\
+$$ \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} x_i} $$ \\
 
 ## Normal
 
 For a normal distribution with mean $\mu$ and variance $\sigma^2$, the pdf is
 $$ f(x; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x - \mu)^2}{2\sigma^2}} $$
 
-The likelihood is
+The likelihood is \\
 $$ L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x_i - \mu)^2}{2\sigma^2}} $$
 
-The loglikelihood is
+The loglikelihood is \\
 $$ l(\mu, \sigma^2) = -\frac{n}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - \mu)^2 $$
 
-To find the MLEs, we take the partial derivatives of $l$ with respect to $\mu$ and $\sigma^2$ and set them equal to zero. For $\mu$
+To find the MLEs, we take the partial derivatives of $l$ with respect to $\mu$ and $\sigma^2$ and set them equal to zero. For $\mu$ \\
+$$ \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 $$ \\
+$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i $$ \\
 
-$$ \frac{\partial l}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (x_i - \mu) = 0 $$
-$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i $$
-
-For $\sigma^2$
-
-$$ \frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 $$
-$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 $$
+For $\sigma^2$ \\
+$$ \frac{\partial l}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (x_i - \mu)^2 = 0 $$ \\
+$$ \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 $$ \\
 
 For Exponential and Normal, the maximum likelihood estimates are the same as the method of moments estimates.
 That's a good illustration of why method of moments is a good rule of thumb for simple distributions.
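
A quick numerical check of these closed forms (a sketch only, using simulated data with made-up true parameters):

```python
import numpy as np

rng = np.random.default_rng(0)

# Exponential: MLE is n / sum(x_i), the reciprocal of the sample mean (true lambda = 2.5)
x = rng.exponential(scale=1 / 2.5, size=100_000)
lambda_hat = len(x) / x.sum()
print(lambda_hat)                        # close to 2.5

# Normal: MLEs are the sample mean and the 1/n (biased) sample variance (true mu = 3, sigma^2 = 4)
y = rng.normal(loc=3.0, scale=2.0, size=100_000)
mu_hat = y.mean()
sigma2_hat = ((y - mu_hat) ** 2).mean()  # same as y.var(ddof=0)
print(mu_hat, sigma2_hat)                # close to 3.0 and 4.0
```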
@@ -230,7 +228,7 @@ The Fisher Information quantifies the amount of information that an observable r
 Let $X=(X_1,X_2,...,X_N)$ be a vector of i.i.d. random variables, $T(X)$ be a transformation of $X$, and $x$ denote an outcome/value of $X$. Then $X_i$ has pmf/pdf $f(x_i;\theta)$ with parameter $\theta$ and $X$ has joint pmf/pdf $f(x;\theta) $ $=$ $f(x_1,x_2,...,x_n; \theta) $ $ =$ $\prod_{i=1}^n f(x_i; \theta)$.
 An estimate for $\theta$ calculated from a data sample is not necessarily a sufficient statistic. However, any efficient estimate for $\theta$ must be a function of a sufficient statistic $T(X)$. Definition: $T(X)$ is a sufficient statistic if the probability distribution of $X$ given a value for $T(X)$ is constant with respect to $\theta$. In math notation, $f(x \| T(X)=T(x))$ is not a function of $\theta$. In other words, the statistic $T(X)$ provides as much information about $\theta$ as the entire data sample $X$.
 
-The Fisher-Neyman Factorization Theorem provides a shortcut to prove that a statistic is sufficient. The theorem states $T(X)$ is a sufficient statistic for $\theta$ if joint pmf/pdf $f(x; \theta)$ $=$ $g(T(x), \theta) h(x)$ for some $g$ that is a function of $T(X)$ and the parameters and some $h$ that is any function of $X$ and the parameters besides $\theta$.
+The Fisher-Neyman Factorization Theorem provides a shortcut to prove that a statistic is sufficient. The theorem states $T(X)$ is a sufficient statistic for $\theta$ if joint pmf/pdf $f(x; \theta)$ $=$ $g(T(x), \theta) h(x)$ for some $g$ that is a function of $T(X)$ and the parameters and some $h$ that is any function of $X$ and the parameters besides $\theta$. In other words, $g$ is an expression in which $x$ appears only inside $T(x)$, and $h$ is an expression without $\theta$.
 
 For example, let $X_1, X_2, ..., X_n \sim \text{Poisson} (\lambda)$. The maximum likelihood estimate for $\lambda$ is the sample mean $\frac{1}{n} \sum_{i=1}^n x_i$.
 This is a function of $T(x)$ $ =$ $\sum_{i=1}^n x_i$, which is a sufficient statistic for $\lambda$. The sufficiency can be shown by the definition or theorem.
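
To make the Poisson example concrete, the factorization can be written out directly from the Poisson pmf:

$$ f(x; \lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \underbrace{\lambda^{\sum_{i=1}^{n} x_i}\, e^{-n\lambda}}_{g(T(x),\, \lambda)} \cdot \underbrace{\prod_{i=1}^{n} \frac{1}{x_i!}}_{h(x)} $$

so with $T(x) = \sum_{i=1}^n x_i$, the factorization theorem confirms $T(X)$ is sufficient for $\lambda$.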

_posts/2026-01-01-regression1.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+layout: single
+title: "Multivariable Linear Regression"
+excerpt: ""
+categories:
+- Intro-Data-Science
+tags:
+- regression
+toc: true
+toc_label: "Table of Contents"
+# toc_icon:
+# header:
+# image:
+# teaser:
+---
Lines changed: 16 additions & 2 deletions
@@ -1,6 +1,6 @@
 ---
 layout: single
-title: "Model Selection in Linear Regression"
+title: "Model Selection for Linear Regression"
 excerpt: ""
 categories:
 - Intro-Data-Science
@@ -14,7 +14,21 @@ toc_label: "Table of Contents"
 # teaser:
 ---
 
-# Var
+# Train-Test Split
+The **train-test split** is a method used to evaluate the performance of a machine learning model. The dataset is divided into two subsets: the training set and the testing set. The model is trained on the training set and evaluated on the testing set.
+
+
+## Train-Test-Validate
+The **train-test-validate** approach involves splitting the dataset into three subsets: training, validation, and testing. The model is trained on the training set, tuned on the validation set, and evaluated on the testing set. The test data is reserved until after models are selected and compared, providing an unseen dataset to check whether the model selection process and the final model generalize to new data.
+
+
+## Cross-fold Validation
+**Cross-fold validation** (also called $k$-fold cross-validation or CV) is a technique where the dataset is divided into $k$ subsets, or folds. The model is trained and evaluated $k$ times, each time using a different fold as the testing set and the remaining folds as the training set. The final performance is the average of the $k$ evaluations.
+
+
+##
+
+# Variable Selection
 
 ## Forward Selection
 
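A minimal scikit-learn sketch of the three evaluation strategies added above (the synthetic data and the plain linear model are placeholder assumptions, not part of the post):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=200)

# Train-test split: hold out 20% of the data for the final evaluation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Train-test-validate: carve a validation set out of the training data for tuning
X_tr, X_val, y_tr, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# k-fold cross-validation (k = 5) on the training data only
cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(LinearRegression(), X_train, y_train, cv=cv)
print("mean CV R^2:", cv_scores.mean())

# Only after a model is chosen: fit on all training data and score once on the held-out test set
final_model = LinearRegression().fit(X_train, y_train)
print("test R^2:", final_model.score(X_test, y_test))
```
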
_posts/2026-01-01-regression3.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+layout: single
+title: "Gradient Descent for Linear Regression"
+excerpt: ""
+categories:
+- Intro-Data-Science
+tags:
+- regression
+toc: true
+toc_label: "Table of Contents"
+# toc_icon:
+# header:
+# image:
+# teaser:
+---

_posts/2026-01-01-regression4.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+layout: single
+title: "Regularization for Linear Regression"
+excerpt: ""
+categories:
+- Intro-Data-Science
+tags:
+- regression
+toc: true
+toc_label: "Table of Contents"
+# toc_icon:
+# header:
+# image:
+# teaser:
+---

_posts/2026-01-01-regression5.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+layout: single
+title: "Object-Oriented Programming for Linear Regression"
+excerpt: ""
+categories:
+- Intro-Data-Science
+tags:
+- regression
+toc: true
+toc_label: "Table of Contents"
+# toc_icon:
+# header:
+# image:
+# teaser:
+---

code/regression_classification.py

Lines changed: 8 additions & 3 deletions
@@ -39,7 +39,7 @@ def __init__(self):
 
     def predict(self, X: np.array) -> np.array:
        self.check_num_features(X)
-        X_b = np.c_[np.ones((X.shape[0], 1)), X]
+        X_b = self.add_bias(X)
         return X_b @ self.coefficients
 
     def fit(self, X: np.array, y: np.array):
@@ -168,8 +168,13 @@ def fit(self, X: np.array, y: np.array, learning_rate=1, epochs=1000, sgd=False)
         self.beta = np.zeros((self.num_features + 1, self.num_classes))
         self.loss_history = self.accuracy_history = []
         for _ in range(epochs):
-            probs = self.softmax(X_b @ self.beta)
-            grad = -X_b.T @ (y_enc - probs) / X_b.shape[0]
+            if sgd:
+                i = np.random.randint(X_b.shape[0])
+                probs = self.softmax(X_b[i:i+1] @ self.beta)
+                grad = -X_b[i:i+1].T @ (y_enc[i:i+1] - probs)
+            else:
+                probs = self.softmax(X_b @ self.beta)
+                grad = -X_b.T @ (y_enc - probs) / X_b.shape[0]
             self.beta -= learning_rate * grad
             self.loss_history.append(self.loss(probs, y_enc))
             self.accuracy_history.append(accuracy(y, self.predict(X)))

index.html

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@
 
 <p>Welcome to my site!</p>
 
-<p> I am a part-time student at Georgia Tech graduating May 3rd <i class="fa fa-graduation-cap" aria-hidden="true"></i>.
-I am currently seeking new opportunities to solve the hardest problems in data science <i class="fa fa-puzzle-piece" aria-hidden="true"></i>
+<p> I am a student at Georgia Tech graduating May 3rd <i class="fa fa-graduation-cap" aria-hidden="true"></i>.
+I am a Data Scientist at FIS seeking to solve the hardest machine learning problems <i class="fa fa-puzzle-piece" aria-hidden="true"></i>
 and continue to deploy use cases of large language models <i class="fa fa-language" aria-hidden="true"></i>. </p>
 
 <iframe src="assets/files/Charles_Bauer_Resume.pdf" width="100%" height="400px"></iframe>
