Commit f61ae02

done probability model

1 parent 995ae78 commit f61ae02

10 files changed, +842 -0 lines changed

Lines changed: 91 additions & 0 deletions
@@ -0,0 +1,91 @@
---
title: "Basics of Probability"
sidebar_label: Probability Basics
description: "An intuitive introduction to probability theory, sample spaces, events, and the fundamental axioms that govern uncertainty in Machine Learning."
tags: [probability, mathematics-for-ml, sample-space, axioms, statistics]
---

In Machine Learning, we never have perfect information. Data is noisy, sensors are imperfect, and the future is uncertain. **Probability** is the mathematical framework we use to quantify this uncertainty.

## 1. Key Terminology

Before we calculate anything, we must define the "world" we are looking at.

```mermaid
mindmap
  root((Probability Experiment))
    Sample Space
      All possible outcomes
      Denoted by S or Omega
    Event
      A subset of the Sample Space
      The outcome we care about
    Random Variable
      Mapping outcomes to numbers
```

* **Experiment:** An action with an uncertain outcome (e.g., classifying an image).
* **Sample Space ($S$):** The set of all possible outcomes. For a coin flip, $S = \{\text{Heads}, \text{Tails}\}$.
* **Event ($A$):** A specific outcome or set of outcomes. For a die roll, an event could be "rolling an even number" ($A = \{2, 4, 6\}$).

## 2. The Three Axioms of Probability

To ensure our probability system is consistent, it must follow these three rules defined by Kolmogorov:

1. **Non-negativity:** The probability of any event $A$ is at least 0.
   $P(A) \ge 0$
2. **Certainty:** The probability of the entire sample space $S$ is exactly 1.
   $P(S) = 1$
3. **Additivity:** For mutually exclusive events (events that cannot happen at the same time), the probability of their union is the sum of their probabilities.
   $P(A \cup B) = P(A) + P(B)$
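
To see the axioms in action, here is a minimal Python sketch (the fair-die distribution is just an illustrative choice) that checks all three rules for a discrete distribution:

```python
# Minimal sketch: checking Kolmogorov's axioms for a fair six-sided die.
die = {face: 1 / 6 for face in range(1, 7)}  # P(X = face) for each outcome

# Axiom 1: non-negativity
assert all(p >= 0 for p in die.values())

# Axiom 2: the whole sample space has probability 1
assert abs(sum(die.values()) - 1.0) < 1e-12

# Axiom 3: additivity for mutually exclusive events, e.g. {1, 2} and {5, 6}
p_union = sum(die[f] for f in {1, 2} | {5, 6})
p_separate = sum(die[f] for f in {1, 2}) + sum(die[f] for f in {5, 6})
assert abs(p_union - p_separate) < 1e-12

print("All three axioms hold for the fair-die distribution.")
```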

## 3. Calculating Probability

In the simplest case (where every outcome is equally likely), probability is a simple counting ratio:

$$
P(A) = \frac{\text{Number of favorable outcomes}}{\text{Total number of outcomes in } S}
$$

### Complement Rule

The probability that an event **does not** occur is 1 minus the probability that it does.

$$
P(A^c) = 1 - P(A)
$$
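
Both rules take only a couple of lines of Python; this sketch reuses the die example from above (event: "roll an even number"):

```python
# Classical (counting) probability for a fair die.
sample_space = {1, 2, 3, 4, 5, 6}
event_even = {2, 4, 6}

p_even = len(event_even) / len(sample_space)   # favorable / total = 0.5
p_not_even = 1 - p_even                        # complement rule

print(f"P(even) = {p_even}, P(not even) = {p_not_even}")
```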

## 4. Types of Probability

<br />

```mermaid
sankey-beta
%% source,target,value
Probability,Joint Probability,20
Probability,Marginal Probability,20
Probability,Conditional Probability,40
Joint Probability,P(A and B),20
Marginal Probability,P(A),20
Conditional Probability,P(A | B),40
```

<br />

* **Marginal Probability:** The probability of an event occurring ($P(A)$), regardless of other variables.
* **Joint Probability:** The probability of two events occurring at the same time ($P(A \cap B)$).
* **Conditional Probability:** The probability of event $A$ occurring **given** that $B$ has already occurred ($P(A|B)$).
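
A short Python sketch that computes all three from a toy joint table; the rain/traffic numbers below are invented purely for illustration:

```python
# Joint distribution over two binary events (illustrative numbers only):
# A = "it rains", B = "traffic jam". Values are P(A, B) and sum to 1.
joint = {
    ("rain", "jam"): 0.20,
    ("rain", "no jam"): 0.10,
    ("no rain", "jam"): 0.15,
    ("no rain", "no jam"): 0.55,
}

p_rain = sum(p for (a, b), p in joint.items() if a == "rain")   # marginal P(A)
p_rain_and_jam = joint[("rain", "jam")]                         # joint P(A and B)
p_jam = sum(p for (a, b), p in joint.items() if b == "jam")
p_rain_given_jam = p_rain_and_jam / p_jam                       # conditional P(A | B)

print(p_rain, p_rain_and_jam, round(p_rain_given_jam, 3))       # 0.3 0.2 0.571
```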

## 5. Why Probability is the "Heart" of ML

Machine Learning models are essentially **probabilistic estimators**.

* **Classification:** When a model says an image is a "cat," it is actually saying: $P(\text{Class} = \text{Cat} \mid \text{Pixels}) = 0.94$.
* **Generative AI:** Large Language Models (LLMs) like GPT predict the "next token" by calculating the probability distribution of all possible words.
* **Anomaly Detection:** We flag data points that have a very low probability of occurring based on the training distribution.

---

Knowing the basics is just the start. In ML, we often need to update our beliefs as new data comes in. This brings us to one of the most famous formulas in all of mathematics.
Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
---
title: "Bayes' Theorem"
sidebar_label: "Bayes' Theorem"
description: "A deep dive into Bayes' Theorem: the formula for updating probabilities based on new evidence, and its massive impact on Machine Learning."
tags: [probability, bayes-theorem, inference, mathematics-for-ml, naive-bayes]
---

**Bayes' Theorem** is more than just a formula; it is a philosophy of how to learn. It describes the probability of an event based on prior knowledge of conditions that might be related to the event. In Machine Learning, it is the engine behind **Bayesian Inference** and the **Naive Bayes** classifier.

## 1. The Formula

Bayes' Theorem allows us to find $P(A|B)$ if we already know $P(B|A)$.

$$
P(A|B) = \frac{P(B|A) \cdot P(A)}{P(B)}
$$

### Breaking Down the Terms

* **$P(A|B)$ (Posterior):** The probability of our hypothesis $A$ *after* seeing the evidence $B$.
* **$P(B|A)$ (Likelihood):** The probability of the evidence $B$ appearing *given* that hypothesis $A$ is true.
* **$P(A)$ (Prior):** Our initial belief about hypothesis $A$ *before* seeing any evidence.
* **$P(B)$ (Evidence/Marginal Likelihood):** The total probability of seeing evidence $B$ under all possible hypotheses.

## 2. The Logic of Bayesian Updating

Bayesian logic is iterative. Today's **Posterior** becomes tomorrow's **Prior**.

<br />

```mermaid
flowchart LR
    A[Initial Belief: Prior] --> B[New Evidence: Likelihood]
    B --> C[Updated Belief: Posterior]
    C -->|New Data Arrives| A
```

<br />

```mermaid
sankey-beta
%% source,target,value
Prior_Knowledge,Posterior_Probability,50
New_Evidence,Posterior_Probability,50
Posterior_Probability,Final_Prediction,100
```

<br />
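
To make this loop concrete, here is a minimal Python sketch of Bayesian updating for a coin's bias, assuming a Beta prior (conjugate to the Bernoulli likelihood); the flip counts are made up for illustration:

```python
# Minimal sketch: Bayesian updating of a coin's bias with a Beta prior.
# Beta(a, b) is conjugate to the Bernoulli likelihood, so each update just
# adds the observed counts -- yesterday's posterior becomes today's prior.

a, b = 1.0, 1.0  # Beta(1, 1) = uniform prior over the bias ("no initial opinion")

batches = [(7, 3), (4, 6), (9, 1)]  # (heads, tails) from three sessions (made-up data)

for heads, tails in batches:
    a, b = a + heads, b + tails        # posterior parameters...
    posterior_mean = a / (a + b)       # ...which act as the prior for the next batch
    print(f"After {heads}H/{tails}T: posterior mean bias = {posterior_mean:.3f}")
```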

## 3. A Practical Example: Medical Testing

Suppose a disease affects **1%** of the population (Prior). A test for this disease is **99%** accurate (Likelihood). If a patient tests positive, what is the probability they actually have the disease?

1. $P(\text{Disease}) = 0.01$
2. $P(\text{Pos} | \text{Disease}) = 0.99$
3. $P(\text{Pos} | \text{No Disease}) = 0.01$ (false positive rate)

### Using Bayes' Theorem:
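
Expanding the denominator $P(\text{Pos})$ over both hypotheses and plugging in the numbers:

$$
P(\text{Disease} \mid \text{Pos}) = \frac{P(\text{Pos} \mid \text{Disease}) \, P(\text{Disease})}{P(\text{Pos})} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = \frac{0.0099}{0.0198} = 0.5
$$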

Even with a 99% accurate test, the probability of having the disease given a positive result is only **50%**. This is because the disease is so rare (low Prior) that the number of false positives equals the number of true positives.

## 4. Bayes' Theorem in Machine Learning

### A. Naive Bayes Classifier

Naive Bayes is a popular algorithm for text classification (like spam detection). It assumes that every feature (word) is independent of every other feature (the "Naive" part) and uses Bayes' Theorem to calculate the probability of a category:

$$
P(\text{Spam} | \text{Words}) \propto P(\text{Words} | \text{Spam}) P(\text{Spam})
$$
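
A minimal sketch with scikit-learn (assuming `scikit-learn` is installed; the four toy messages are invented for illustration, not a real dataset):

```python
# Tiny spam-vs-ham Naive Bayes sketch (illustrative toy data only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

messages = [
    "win a free prize now",       # spam
    "claim your free reward",     # spam
    "meeting moved to tuesday",   # ham
    "lunch at noon tomorrow",     # ham
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)

# predict_proba returns the model's posterior estimate P(class | words)
print(model.classes_)
print(model.predict_proba(["free prize tomorrow"]))
```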

### B. Bayesian Neural Networks

Unlike standard neural networks that have fixed weights, Bayesian Neural Networks represent weights as **probability distributions**. This allows the model to express **uncertainty**: it can say "I think this is a cat, but I'm only 60% sure."

### C. Hyperparameter Optimization

**Bayesian Optimization** is a strategy used to find the best hyperparameters for a model. It builds a probability model of the objective function and uses it to select the most promising hyperparameters to evaluate next.
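
As a rough sketch of the idea (the library choice here is an assumption, not something used elsewhere in this tutorial), scikit-optimize's `gp_minimize` can tune a single hyperparameter against a stand-in objective:

```python
# Bayesian optimization sketch with scikit-optimize (pip install scikit-optimize).
from skopt import gp_minimize

def validation_loss(params):
    """Stand-in objective: pretend the best learning rate is 0.1."""
    (learning_rate,) = params
    return (learning_rate - 0.1) ** 2

result = gp_minimize(
    validation_loss,
    dimensions=[(0.001, 1.0)],  # search range for the learning rate
    n_calls=20,                 # total evaluations of the objective
    random_state=0,
)
print("best learning rate:", result.x[0], "loss:", result.fun)
```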

## 5. Summary Table

| Concept | Traditional (Frequentist) | Bayesian |
| --- | --- | --- |
| **View of Probability** | Long-run frequency of events. | Measure of "degree of belief." |
| **Parameters** | Fixed, unknown constants. | Random variables with distributions. |
| **New Data** | Used to refine the estimate. | Used to update the entire belief (Prior → Posterior). |

---

Now that we can update our beliefs using Bayes' Theorem, we need to understand how these probabilities are spread across different outcomes. This brings us to Random Variables and Probability Distributions.
Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
---
title: "Conditional Probability"
sidebar_label: Conditional Probability
description: "Understanding how the probability of an event changes given the occurrence of another event, and its role in predictive modeling."
tags: [probability, conditional-probability, dependency, mathematics-for-ml, bayes-rule]
---

In the real world, events are rarely isolated. The probability of it raining is higher **given** that it is cloudy. The probability of a user clicking an ad is higher **given** their past search history. This "given" is the essence of **Conditional Probability**.

## 1. The Definition

Conditional probability is the probability of an event $A$ occurring, given that another event $B$ has already occurred. It is denoted as $P(A|B)$.

The formula is:

$$
P(A|B) = \frac{P(A \cap B)}{P(B)}
$$

Where:
* $P(A \cap B)$ is the **Joint Probability** (both $A$ and $B$ happen).
* $P(B)$ is the probability of the condition (the "new universe").
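
For a quick worked example with a fair die, let $A$ be "roll a 6" and $B$ be "roll an even number":

$$
P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}
$$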

## 2. Intuition: Shrinking the Universe

Think of probability as a "Universe" of possibilities. When we say "given $B$," we are throwing away every part of the universe where $B$ did not happen. Our new total area is just $B$.

<br />

```mermaid
sankey-beta
%% source,target,value
OriginalUniverse,EventB_Happens,60
OriginalUniverse,EventB_DoesNotHappen,40
EventB_Happens,EventA_Happens_GivenB,20
EventB_Happens,EventA_DoesNotHappen_GivenB,40
```

<br />

## 3. Independent vs. Dependent Events

How do we know if one event affects another? We look at their conditional probabilities.

### A. Independent Events

Events $A$ and $B$ are independent if the occurrence of $B$ provides **zero** new information about $A$.

* **Mathematical Check:** $P(A|B) = P(A)$
* **Example:** Rolling a 6 on a die given that you ate an apple for breakfast.

### B. Dependent Events

Events $A$ and $B$ are dependent if knowing $B$ happened changes the likelihood of $A$.

* **Mathematical Check:** $P(A|B) \neq P(A)$
* **Example:** Having a cough ($A$) given that you have a cold ($B$).
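
Both checks are easy to run by brute force. The sketch below enumerates the 36 equally likely outcomes of two fair dice (the particular events are chosen only for illustration):

```python
# Checking (in)dependence by enumerating the 36 equally likely outcomes of two dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # the full sample space (36 outcomes)

def prob(event):
    return len(event) / len(outcomes)

def cond_prob(event_a, event_b):
    return len(event_a & event_b) / len(event_b)

first_is_6 = {o for o in outcomes if o[0] == 6}
second_is_3 = {o for o in outcomes if o[1] == 3}
sum_at_least_10 = {o for o in outcomes if sum(o) >= 10}

# Independent: the second die tells us nothing about the first.
print(prob(first_is_6), cond_prob(first_is_6, second_is_3))      # 0.167 and 0.167
# Dependent: a high total makes a 6 on the first die much more likely.
print(prob(first_is_6), cond_prob(first_is_6, sum_at_least_10))  # 0.167 and 0.5
```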

## 4. The Multiplication Rule

We can rearrange the conditional probability formula to find the probability of both events happening:

$$
P(A \cap B) = P(A|B) \cdot P(B)
$$

This is the foundation for the **Chain Rule of Probability**, which allows ML models to calculate the probability of a long sequence of events (like a sentence in an LLM).
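
Applied repeatedly, the multiplication rule yields the chain rule that lets a language model score an entire token sequence $w_1, \dots, w_n$:

$$
P(w_1, w_2, \dots, w_n) = P(w_1) \cdot P(w_2 \mid w_1) \cdot P(w_3 \mid w_1, w_2) \cdots P(w_n \mid w_1, \dots, w_{n-1})
$$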

## 5. Application: Predictive Modeling

In Machine Learning, almost every prediction is a conditional probability.

```mermaid
flowchart LR
    Input[Data Features X] --> Model[ML Model]
    Model --> Output["P(Y | X)"]
    style Output fill:#f9f,stroke:#333,color:#333,stroke-width:2px
```

* **Medical Diagnosis:** $P(\text{Disease} \mid \text{Symptoms})$
* **Spam Filter:** $P(\text{Spam} \mid \text{Words in Email})$
* **Self-Driving Cars:** $P(\text{Pedestrian crosses} \mid \text{Camera Image})$

---

If we flip the question, knowing $P(A|B)$ but needing $P(B|A)$, we turn to the most powerful tool in probability theory.
Lines changed: 102 additions & 0 deletions
@@ -0,0 +1,102 @@
---
title: "PMF vs. PDF"
sidebar_label: PMF & PDF
description: "A deep dive into Probability Mass Functions (PMF) for discrete data and Probability Density Functions (PDF) for continuous data."
tags: [probability, pmf, pdf, statistics, mathematics-for-ml, distributions]
---

To work with data in Machine Learning, we need a mathematical way to describe how likely different values are to occur. Depending on whether our data is **Discrete** (countable) or **Continuous** (measurable), we use either a **PMF** or a **PDF**.

## 1. Probability Mass Function (PMF)

The **PMF** is used for discrete random variables. It gives the probability that a discrete random variable is exactly equal to some value.

### Key Mathematical Properties:
1. **Direct Probability:** $P(X = x) = f(x)$. The "height" of the bar is the actual probability.
2. **Summation:** All individual probabilities must sum to 1.
   $$
   \sum_{i} P(X = x_i) = 1
   $$
3. **Range:** $0 \le P(X = x) \le 1$.

<img className="rounded p-4" src="/tutorial/img/tutorials/ml/probability-mass-function.jpg" alt="Probability Mass Function plot for a Binomial Distribution" />

**Example:** If you roll a fair die, the PMF is $1/6$ for each value $\{1, 2, 3, 4, 5, 6\}$. There is no "1.5" or "2.7"; the probability exists only at specific points.
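
A minimal Python sketch of this die PMF (a plain dictionary is enough, no libraries required):

```python
# PMF of a fair six-sided die: probability lives only at the six integer outcomes.
pmf = {face: 1 / 6 for face in range(1, 7)}

print(pmf[3])                          # P(X = 3) -- the "height" of the bar at 3
print(pmf.get(2.7, 0))                 # no mass between the integers
print(round(sum(pmf.values()), 10))    # all heights sum to 1.0
```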

## 2. Probability Density Function (PDF)

The **PDF** is used for continuous random variables. Unlike the PMF, the "height" of a PDF curve does **not** represent probability; it represents **density**.

### The "Zero Probability" Paradox
In a continuous world (like height or time), the probability of a variable being *exactly* a specific number (e.g., exactly $175.00000...$ cm) is **0**.

Instead, we find the probability over an **interval** by calculating the **area under the curve**.

### Key Mathematical Properties:
1. **Area is Probability:** The probability that $X$ falls between $a$ and $b$ is the integral of the PDF:
   $$
   P(a \le X \le b) = \int_{a}^{b} f(x) \, dx
   $$
2. **Total Area:** The total area under the entire curve must equal 1.
   $$
   \int_{-\infty}^{\infty} f(x) \, dx = 1
   $$
3. **Density vs. Probability:** $f(x)$ can be greater than 1, as long as the total area remains 1.
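
A short sketch with `scipy.stats` (assuming SciPy is available) that demonstrates both points: density values can exceed 1, while probabilities always come from areas:

```python
# Density vs. probability for a narrow Normal distribution (mean 0, std 0.1).
from scipy.stats import norm

narrow = norm(loc=0.0, scale=0.1)

print(narrow.pdf(0.0))                      # ~3.99 -- a *density*, happily greater than 1
print(narrow.cdf(0.1) - narrow.cdf(-0.1))   # ~0.683 -- area between -0.1 and 0.1, a probability
print(narrow.cdf(float("inf")))             # 1.0 -- total area under the curve
```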

## 3. Comparison at a Glance

```mermaid
graph LR
    Data[Data Type] --> Disc[Discrete]
    Data --> Cont[Continuous]

    Disc --> PMF["PMF: $$P(X=x)$$"]
    Cont --> PDF["PDF: $$f(x)$$"]

    PMF --> P_Sum["$$\sum P(x) = 1$$"]
    PDF --> P_Int["$$\int f(x)dx = 1$$"]

    PMF --> P_Val["Height = Probability"]
    PDF --> P_Area["Area = Probability"]
```

| Feature | PMF (Discrete) | PDF (Continuous) |
| --- | --- | --- |
| **Variable Type** | Countable (Integers) | Measurable (Real Numbers) |
| **Probability at a point** | $P(X=x) = \text{Height}$ | $P(X=x) = 0$ |
| **Probability over range** | Sum of heights | Area under the curve (Integral) |
| **Visualization** | Bar chart / Stem plot | Smooth curve |

---

## 4. The Bridge: Cumulative Distribution Function (CDF)

The **CDF** is the "running total" of probability. It tells you the probability that a variable is **less than or equal to** $x$.

* **For PMF:** It is a step function (it jumps at every discrete value).
* **For PDF:** It is a smooth S-shaped curve.

$$
F(x) = P(X \le x)
$$

```mermaid
graph LR
    PDF["PDF (Density) <br/> $$f(x)$$"] -- " Integrate: <br/> $$\int_{-\infty}^{x} f(t) dt$$ " --> CDF["CDF (Cumulative) <br/> $$F(x)$$"]
    CDF -- " Differentiate: <br/> $$\frac{d}{dx} F(x)$$ " --> PDF

    style PDF fill:#fdf,stroke:#333,color:#333
    style CDF fill:#def,stroke:#333,color:#333
```
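
The same bridge in code, again assuming SciPy: interval probabilities come from differences of the CDF, and numerically differentiating the CDF recovers the density:

```python
# The CDF as a "running total": interval probability and the CDF/PDF bridge.
from scipy.stats import norm

X = norm(loc=0.0, scale=1.0)          # standard Normal

a, b = -1.0, 1.0
print(X.cdf(b) - X.cdf(a))            # P(a <= X <= b) ~ 0.683, via F(b) - F(a)

h = 1e-5
numeric_pdf = (X.cdf(0.5 + h) - X.cdf(0.5 - h)) / (2 * h)
print(numeric_pdf, X.pdf(0.5))        # d/dx F(x) ~ f(x): both ~ 0.352
```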

## 5. Why this matters in Machine Learning

1. **Likelihood Functions:** When training models (like Logistic Regression), we maximize the **Likelihood**. For discrete labels, this uses the PMF; for continuous targets, it uses the PDF.
2. **Anomaly Detection:** We often flag a data point as an outlier if its PDF value (density) is below a certain threshold.
3. **Generative Models:** VAEs and GANs attempt to learn the underlying **PDF** of a dataset so they can sample new points from high-density regions (creating realistic images or text).

---

Now that you understand how we describe probability at a point or over an area, it's time to meet the most important distribution in all of data science.
