From 940af6dbdcc2912849d040192060a3caf80787ef Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Wed, 30 Jul 2025 20:45:59 +0200 Subject: [PATCH 01/48] Create matrices.md --- notes/matrices.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 notes/matrices.md diff --git a/notes/matrices.md b/notes/matrices.md new file mode 100644 index 0000000..d25cd52 --- /dev/null +++ b/notes/matrices.md @@ -0,0 +1 @@ +rotations From d9306c3f153e35321a1d179847632931f984aa85 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 2 Aug 2025 12:22:01 +0200 Subject: [PATCH 02/48] Update graphs.md --- notes/graphs.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/notes/graphs.md b/notes/graphs.md index 1b1aa5f..ce094e4 100644 --- a/notes/graphs.md +++ b/notes/graphs.md @@ -1,3 +1,6 @@ +TODO: +- topological sort + ## Graphs In many areas of life, we come across systems where elements are deeply interconnected—whether through physical routes, digital networks, or abstract relationships. Graphs offer a flexible way to represent and make sense of these connections. From 5003ccbc44d634c55fac40e55d02ac61e0d80460 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 2 Aug 2025 12:22:44 +0200 Subject: [PATCH 03/48] Update brain_teasers.md --- notes/brain_teasers.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/notes/brain_teasers.md b/notes/brain_teasers.md index 1ee7a4e..a9136c8 100644 --- a/notes/brain_teasers.md +++ b/notes/brain_teasers.md @@ -1,3 +1,9 @@ +todo: + +- bisect +- heaps +- fast and slow pointer for lists + ## Solving Programming Brain Teasers Programming puzzles and brain teasers are excellent tools for testing and enhancing your coding abilities and problem-solving skills. They are frequently used in technical interviews to evaluate a candidate's logical thinking, analytical prowess, and ability to devise efficient algorithms. To excel in these scenarios, it is recommended to master effective strategies for approaching and solving these problems. From 0d1cb6263b2160c636b5837691a49a6e1fdbd611 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 2 Aug 2025 17:46:39 +0200 Subject: [PATCH 04/48] Create greedy_algorithms.md --- notes/greedy_algorithms.md | 1 + 1 file changed, 1 insertion(+) create mode 100644 notes/greedy_algorithms.md diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md new file mode 100644 index 0000000..8b13789 --- /dev/null +++ b/notes/greedy_algorithms.md @@ -0,0 +1 @@ + From fbc8fe06f061e379a3346506a61e580f6f8063cc Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 2 Aug 2025 20:54:46 +0200 Subject: [PATCH 05/48] Create searching.md --- notes/searching.md | 61 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 notes/searching.md diff --git a/notes/searching.md b/notes/searching.md new file mode 100644 index 0000000..5f69f29 --- /dev/null +++ b/notes/searching.md @@ -0,0 +1,61 @@ +### 1. **Linear & Sequential Search** +- **Linear Search (Sequential Search)** + - Checks each element one by one. +- **Sentinel Linear Search** + - Uses a sentinel value to reduce comparisons. + +### 2. **Divide & Conquer Search** +- **Binary Search** + - Efficient on sorted arrays (O(log n)). +- **Ternary Search** + - Divides array into three parts instead of two. 
+- **Jump Search** + - Jumps ahead by fixed steps, then does linear search. +- **Exponential Search** + - Finds range with exponential jumps, then does binary search. +- **Interpolation Search** + - Estimates position based on value distribution. + +### 3. **Tree-based Search** +- **Binary Search Tree (BST) Search** + - Search in a binary search tree. +- **AVL Tree Search / Red-Black Tree Search** + - Balanced BSTs for faster search. +- **B-Tree / B+ Tree Search** + - Used in databases and file systems. +- **Trie (Prefix Tree) Search** + - Efficient for searching words/prefixes. + +### 4. **Hash-based Search** +- **Hash Table Search** + - Uses hash functions for constant time lookups. +- **Open Addressing (Linear/Quadratic Probing, Double Hashing)** + - Methods for collision resolution. +- **Separate Chaining** + - Uses linked lists for collisions. +- **Cuckoo Hashing** + - Multiple hash functions to resolve collisions. + +### 5. **Probabilistic & Approximate Search** +- **Bloom Filter** + - Probabilistic; fast membership test with false positives. +- **Counting Bloom Filter** + - Supports deletion. +- **Cuckoo Filter** + - Similar to Bloom filters but supports deletion. + +### 6. **Graph-based Search Algorithms** +- **Breadth-First Search (BFS)** + - Explores neighbors first in unweighted graphs. +- **Depth-First Search (DFS)** + - Explores as far as possible along branches. +- **A* Search** + - Heuristic-based best-first search. +- **Bidirectional Search** + - Runs two simultaneous searches from source and target. + +### 7. **String Search Algorithms** +- **Naive String Search** +- **Knuth-Morris-Pratt (KMP)** +- **Boyer-Moore** +- **Rabin-Karp** From fdb271e29fcc8086c371dd993a652169b9604be6 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Mon, 4 Aug 2025 10:51:43 +0200 Subject: [PATCH 06/48] Update backtracking.md --- notes/backtracking.md | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/notes/backtracking.md b/notes/backtracking.md index b50d1f9..26481f1 100644 --- a/notes/backtracking.md +++ b/notes/backtracking.md @@ -209,6 +209,49 @@ Main Idea: 6. If the partial solution is complete and valid, record or output it. 7. If all options are exhausted at a level, remove the last component and backtrack to the previous level. +General Template (pseudocode) + +``` +function backtrack(partial): + if is_complete(partial): + handle_solution(partial) + return // or continue if looking for all solutions + + for candidate in generate_candidates(partial): + if is_valid(candidate, partial): + place(candidate, partial) // extend partial with candidate + backtrack(partial) + unplace(candidate, partial) // undo extension (backtrack) +``` + +Pieces you supply per problem: + +* `is_complete`: does `partial` represent a full solution? +* `handle_solution`: record/output the solution. +* `generate_candidates`: possible next choices given current partial. +* `is_valid`: pruning test to reject infeasible choices early. +* `place` / `unplace`: apply and revert the choice. 
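To make those pieces concrete before the generic framework below, here is a small self-contained sketch that specializes the template to enumerating permutations. It is illustrative only: the function and variable names (`permutations`, `partial`, `used`) are mine, not part of the notes, and the comments map each step back to the template's callbacks.

```python
# Illustrative instantiation of the backtracking template above.
# All names are ad hoc; only the structure mirrors the pseudocode.

def permutations(items):
    solutions = []
    partial = []                      # the growing partial solution
    used = [False] * len(items)

    def backtrack():
        if len(partial) == len(items):        # is_complete
            solutions.append(partial.copy())  # handle_solution
            return
        for i, item in enumerate(items):      # generate_candidates
            if used[i]:                       # is_valid: reject items already placed
                continue
            used[i] = True                    # place
            partial.append(item)
            backtrack()
            partial.pop()                     # unplace (undo the extension)
            used[i] = False

    backtrack()
    return solutions


print(permutations([1, 2, 3]))   # 3! = 6 orderings
```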
+ +Python-ish Generic Framework + +```python +def backtrack(partial, is_complete, generate_candidates, is_valid, handle_solution): + if is_complete(partial): + handle_solution(partial) + return + + for candidate in generate_candidates(partial): + if not is_valid(candidate, partial): + continue + # make move + partial.append(candidate) + backtrack(partial, is_complete, generate_candidates, is_valid, handle_solution) + # undo move + partial.pop() +``` + +You can wrap those callbacks into a class or closures for stateful problems. + #### N-Queens Problem The N-Queens problem is a classic puzzle in which the goal is to place $N$ queens on an $N \times N$ chessboard such that no two queens threaten each other. In chess, a queen can move any number of squares along a row, column, or diagonal. Therefore, no two queens can share the same row, column, or diagonal. From 3d1ade21f6aa85a928a2ee8a8cf3b7865324becf Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Mon, 4 Aug 2025 13:47:16 +0200 Subject: [PATCH 07/48] Update basic_concepts.md --- notes/basic_concepts.md | 152 +++++++++++++++++++++++++++++++++------- 1 file changed, 126 insertions(+), 26 deletions(-) diff --git a/notes/basic_concepts.md b/notes/basic_concepts.md index fd5d9a8..faf0d5f 100644 --- a/notes/basic_concepts.md +++ b/notes/basic_concepts.md @@ -10,12 +10,73 @@ Data structures and algorithms are fundamental concepts in computer science that A **data structure** organizes and stores data in a way that allows efficient access, modification, and processing. The choice of the appropriate data structure depends on the specific use case and can significantly impact the performance of an application. Here are some common data structures: -1. Imagine an **array** as a row of lockers, each labeled with a number and capable of holding one item of the same type. Technically, arrays are blocks of memory storing elements sequentially, allowing quick access using an index. However, arrays have a fixed size, which limits their flexibility when you need to add or remove items. -2. Think of a **stack** like stacking plates: you always add new plates on top (push), and remove them from the top as well (pop). This structure follows the Last-In, First-Out (LIFO) approach, meaning the most recently added item is removed first. Stacks are particularly helpful in managing function calls (like in the call stack of a program) or enabling "undo" operations in applications. -3. A **queue** is similar to a line at the grocery store checkout. People join at the end (enqueue) and leave from the front (dequeue), adhering to the First-In, First-Out (FIFO) principle. This ensures the first person (or item) that arrives is also the first to leave. Queues work great for handling tasks or events in the exact order they occur, like scheduling print jobs or processing messages. -4. You can picture a **linked list** as a treasure hunt, where each clue leads you to the next one. Each clue, or node, holds data and a pointer directing you to the next node. Because nodes can be added or removed without shifting other elements around, linked lists offer dynamic and flexible management of data at any position. -5. A **tree** resembles a family tree, starting from one ancestor (the root) and branching out into multiple descendants (nodes), each of which can have their own children. Formally, trees are hierarchical structures organized across various levels. 
They’re excellent for showing hierarchical relationships, such as organizing files on your computer or visualizing company structures. -6. Consider a **graph** like a network of cities connected by roads. Each city represents a node, and the roads connecting them are edges, which can either be one-way (directed) or two-way (undirected). Graphs effectively illustrate complex relationships and networks, such as social media connections, website link structures, or even mapping transportation routes. +**I. Array** + +Imagine an **array** as a row of lockers, each labeled with a number and capable of holding one item of the same type. Technically, arrays are blocks of memory storing elements sequentially, allowing quick access using an index. However, arrays have a fixed size, which limits their flexibility when you need to add or remove items. + +``` +Indices: 0 1 2 3 +Array: [A] [B] [C] [D] +``` + +**II. Stack** + +Think of a **stack** like stacking plates: you always add new plates on top (push), and remove them from the top as well (pop). This structure follows the Last-In, First-Out (LIFO) approach, meaning the most recently added item is removed first. Stacks are particularly helpful in managing function calls (like in the call stack of a program) or enabling "undo" operations in applications. + +``` +Top + ┌───┐ + │ C │ ← most recent (pop/push here) + ├───┤ + │ B │ + ├───┤ + │ A │ + └───┘ +Bottom +``` + +**III. Queue** + +A **queue** is similar to a line at the grocery store checkout. People join at the end (enqueue) and leave from the front (dequeue), adhering to the First-In, First-Out (FIFO) principle. This ensures the first person (or item) that arrives is also the first to leave. Queues work great for handling tasks or events in the exact order they occur, like scheduling print jobs or processing messages. + +``` +Front → [A] → [B] → [C] → [D] ← Rear +(dequeue) (enqueue) +``` + +**IV. Linked List** + +You can picture a **linked list** as a treasure hunt, where each clue leads you to the next one. Each clue, or node, holds data and a pointer directing you to the next node. Because nodes can be added or removed without shifting other elements around, linked lists offer dynamic and flexible management of data at any position. + +``` +Head -> [A] -> [B] -> [C] -> NULL +``` + +**V. Tree** + +A **tree** resembles a family tree, starting from one ancestor (the root) and branching out into multiple descendants (nodes), each of which can have their own children. Formally, trees are hierarchical structures organized across various levels. They’re excellent for showing hierarchical relationships, such as organizing files on your computer or visualizing company structures. + +``` +# Tree + (Root) + / \ + (L) (R) + / \ \ + (LL) (LR) (RR) +``` + +**VI. Graph** + +Consider a **graph** like a network of cities connected by roads. Each city represents a node, and the roads connecting them are edges, which can either be one-way (directed) or two-way (undirected). Graphs effectively illustrate complex relationships and networks, such as social media connections, website link structures, or even mapping transportation routes. + +``` +(A) ↔ (B) + | \ +(C) ---> (D) +``` + +(↔ undirected edge, ---> directed edge) + ![image](https://github.com/user-attachments/assets/f1962de7-aa28-4348-9933-07e49c737cd9) @@ -88,10 +149,7 @@ sum = num1 + num2 print("The sum is", sum) ``` -To recap: - -- Algorithms are abstract instructions designed to terminate after a finite number of steps. 
-- Programs are concrete implementations, which may sometimes run indefinitely or until an external action stops them. For instance, an operating system is a program designed to run continuously until explicitly terminated. +Programs may sometimes run indefinitely or until an external action stops them. For instance, an operating system is a program designed to run continuously until explicitly terminated. #### Types of Algorithms @@ -101,13 +159,14 @@ I. **Sorting Algorithms** arrange data in a specific order, such as ascending or Example: Bubble Sort -``` -Initial Array: [5, 3, 8, 4, 2] +Initial Array: `[5, 3, 8, 4, 2]` Steps: + 1. Compare adjacent elements and swap if needed. 2. Repeat for all elements. +``` After 1st Pass: [3, 5, 4, 2, 8] After 2nd Pass: [3, 4, 2, 5, 8] After 3rd Pass: [3, 2, 4, 5, 8] @@ -118,16 +177,17 @@ II. **Search Algorithms** are designed to find a specific item or value within a Example: Binary Search -``` -Searching 33 in Sorted Array: [1, 3, 5, 7, 9, 11, 33, 45, 77, 89] +Searching 33 in Sorted Array: `[1, 3, 5, 7, 9, 11, 33, 45, 77, 89]` Steps: + 1. Start with the middle element. 2. If the middle element is the target, return it. 3. If the target is greater, ignore the left half. 4. If the target is smaller, ignore the right half. 5. Repeat until the target is found or the subarray is empty. +``` Mid element at start: 9 33 > 9, so discard left half New mid element: 45 @@ -137,64 +197,98 @@ New mid element: 11 The remaining element is 33, which is the target. ``` -**Graph Algorithms** address problems related to graphs, such as finding the shortest path between nodes or determining if a graph is connected. Examples include Dijkstra's algorithm and the Floyd-Warshall algorithm. +III. **Graph Algorithms** address problems related to graphs, such as finding the shortest path between nodes or determining if a graph is connected. Examples include Dijkstra's algorithm and the Floyd-Warshall algorithm. Example: Dijkstra's Algorithm -``` Given a graph with weighted edges, find the shortest path from a starting node to all other nodes. Steps: + 1. Initialize the starting node with a distance of 0 and all other nodes with infinity. 2. Visit the unvisited node with the smallest known distance. 3. Update the distances of its neighboring nodes. 4. Repeat until all nodes have been visited. Example Graph: + +``` A -> B (1) A -> C (4) B -> C (2) B -> D (5) C -> D (1) +``` + +Trace Table + +| Iter | Extracted Node (u) | PQ before extraction | dist[A,B,C,D] | prev[A,B,C,D] | Visited | Comments / Updates | +| ---- | ------------------ | ---------------------------------- | -------------- | -------------- | --------- | --------------------------------------------------------------------------------------- | +| 0 | — (initial) | (0, A) | [0, ∞, ∞, ∞] | [-, -, -, -] | {} | Initialization: A=0, others ∞ | +| 1 | A (0) | (0, A) | [0, 1, 4, ∞] | [-, A, A, -] | {A} | Relax A→B (1), A→C (4); push (1,B), (4,C) | +| 2 | B (1) | (1, B), (4, C) | [0, 1, 3, 6] | [-, A, B, B] | {A, B} | Relax B→C: alt=3 <4 ⇒ update C; B→D: dist[D]=6; push (3,C), (6,D). (4,C) becomes stale | +| 3 | C (3) | (3, C), (4, C) stale, (6, D) | [0, 1, 3, 4] | [-, A, B, C] | {A, B, C} | Relax C→D: alt=4 <6 ⇒ update D; push (4,D). 
(6,D) becomes stale | +| 4 | D (4) | (4, D), (4, C) stale, (6, D) stale | [0, 1, 3, 4] | [-, A, B, C] | {A,B,C,D} | No outgoing improvements; done | + +Legend: + +* `dist[X]`: current best known distance from A to X +* `prev[X]`: predecessor of X on that best path +* PQ: min-heap of (tentative distance, node); stale entries (superseded by better distance) are shown in parentheses +* Visited: nodes whose shortest distance is finalized Starting from A: + - Shortest path to B: A -> B (1) - Shortest path to C: A -> B -> C (3) - Shortest path to D: A -> B -> C -> D (4) -``` -**String Algorithms** deal with problems related to strings, such as finding patterns or matching sequences. Examples include the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore algorithm. +IV. **String Algorithms** deal with problems related to strings, such as finding patterns or matching sequences. Examples include the Knuth-Morris-Pratt (KMP) algorithm and the Boyer-Moore algorithm. Example: Boyer-Moore Algorithm ``` Text: "ABABDABACDABABCABAB" Pattern: "ABABCABAB" +``` Steps: + 1. Compare the pattern from right to left. 2. If a mismatch occurs, use the bad character and good suffix heuristics to skip alignments. 3. Repeat until the pattern is found or the text is exhausted. +| Iter | Start | Text window | Mismatch (pattern vs text) | Shift applied | Next Start | Result | +| ---- | ----- | ----------- | ----------------------------------------- | -------------------------------------------------- | ---------- | --------------- | +| 1 | 0 | `ABABDABAC` | pattern[8]=B vs text[8]=C | bad char C → last in pattern at idx4 ⇒ 8−4 = **4** | 4 | no match | +| 2 | 4 | `DABACDABA` | pattern[8]=B vs text[12]=A | bad char A → last at idx7 ⇒ 8−7 = **1** | 5 | no match | +| 3 | 5 | `ABACDABAB` | pattern[4]=C vs text[9]=D | D not in pattern ⇒ 4−(−1)= **5** | 10 | no match | +| 4 | 10 | `ABABCABAB` | full right-to-left comparison → **match** | — | — | **found** at 10 | + Pattern matched starting at index 10 in the text. -``` #### Important Algorithms for Software Engineers -- As a software engineer, it is not necessary to **master every algorithm**. Instead, knowing how to effectively use libraries and packages that implement widely-used algorithms is more practical. +- As a software engineer, it is not necessary to **master every algorithm**. Instead, knowing how to use libraries and packages that implement widely-used algorithms is more practical. - The important skill is the ability to **select the right algorithm** for a task by considering factors such as its efficiency, the problem’s requirements, and any specific constraints. - Learning **algorithms** during the early stages of programming enhances problem-solving skills. It builds a solid foundation in logical thinking, introduces various problem-solving strategies, and helps in understanding how to approach complex issues. - Once the **fundamentals of algorithms** are understood, the focus often shifts to utilizing pre-built libraries and tools for solving real-world problems, as writing algorithms from scratch is rarely needed in practice. +Real Life Story: + +``` +When Zara landed her first job at a logistics-tech startup, her assignment was to route delivery vans through a sprawling city in under a second—something she’d never tackled before. 
She remembered the semester she’d wrestled with graph theory and Dijkstra’s algorithm purely for practice, so instead of hand-coding the logic she opened the company’s Python stack and pulled in NetworkX, benchmarking its built-in shortest-path routines against the map’s size and the firm’s latency budget. The initial results were sluggish, so she compared A* with Dijkstra, toggling heuristics until the run time dipped below 500 ms, well under the one-second target. Her teammates were impressed not because she reinvented an algorithm, but because she knew which one to choose, how to reason about its complexity, and where to find a rock-solid library implementation. Later, in a sprint retrospective, Zara admitted that mastering algorithms in college hadn’t been about memorizing code—it had trained her to dissect problems, weigh trade-offs, and plug in the right tool when every millisecond and memory block counted. +``` + ### Understanding Algorithmic Complexity Algorithmic complexity helps us understand the computational resources (time or space) an algorithm needs as the input size increases. Here’s a breakdown of different types of complexity: -* *Best-case complexity* describes how quickly or efficiently an algorithm runs under the most favorable conditions. For example, an algorithm with a best-case complexity of O(1) performs its task instantly, regardless of how much data it processes. -* *Average-case complexity* reflects the typical performance of an algorithm across all possible inputs. Determining this can be complex, as it involves analyzing how often different inputs occur and how each one influences the algorithm's overall performance. -* *Worst-case complexity* defines the maximum amount of time or resources an algorithm could consume when faced with the most difficult or demanding inputs. Understanding the worst-case scenario is crucial because it sets an upper limit on performance, ensuring predictable and reliable behavior. -* *Space complexity* refers to how much memory an algorithm needs relative to the amount of data it processes. It's an important consideration when memory availability is limited or when optimizing an algorithm to be resource-efficient. -* *Time complexity* indicates how the execution time of an algorithm increases as the input size grows. Typically, this is the primary focus when evaluating algorithm efficiency because faster algorithms are generally more practical and user-friendly. +* In an ideal input scenario, *best-case complexity* shows the minimum work an algorithm will do; include it to set expectations for quick interactions, omit it and you may overlook fast paths that are useful for user experience, as when insertion sort finishes almost immediately on a nearly sorted list. +* When you ask what to expect most of the time, *average-case complexity* estimates typical running time; include it to make useful forecasts under normal workloads, omit it and designs can seem fine in tests but lag on common inputs, as with randomly ordered customer IDs that need $O(n log n)$ sorting. +* By establishing an upper bound, *worst-case complexity* tells you the maximum time or space an algorithm might need; include it to ensure predictable behavior, omit it and peak loads can surprise you, as when quicksort degrades to $O(n^2)$ on already sorted input without careful pivot selection. 
+* On memory-limited devices, *space complexity* measures how much extra storage an algorithm requires; include it to fit within available RAM, omit it and an otherwise fast solution may crash or swap, as when merge sort’s $O(n)$ auxiliary array overwhelms a phone with little free memory. +* As your dataset scales, *time complexity* describes how running time expands with input size; include it to choose faster approaches, omit it and performance can degrade sharply, as when an $O(n^2)$ deduplication routine turns a minute-long job into hours after a customer list doubles. #### Analyzing Algorithm Growth Rates @@ -208,6 +302,8 @@ If we designate $f(n)$ as the actual complexity and $g(n)$ as the function in Bi For instance, if an algorithm has a time complexity of $O(n)$, it signifies that the algorithm's running time does not grow more rapidly than a linear function of the input size, in the worst-case scenario. +0902bace-952d-4c80-9533-5706e28ef3e9 + ##### Big Omega Notation (Ω-notation) The Big Omega notation provides an asymptotic lower bound that expresses the best-case scenario for the time or space complexity of an algorithm. @@ -216,13 +312,17 @@ If $f(n) = Ω(g(n))$, this means that $f(n)$ grows at a rate that is at least as For example, if an algorithm has a time complexity of $Ω(n)$, it implies that the running time is at the bare minimum proportional to the input size in the best-case scenario. +d189ece7-e9c2-4797-8e0d-720336c4ba4a + ##### Theta Notation (Θ-notation) Theta notation offers a representation of the average-case scenario for an algorithm's time or space complexity. It sets an asymptotically tight bound, implying that the function grows neither more rapidly nor slower than the bound. Stating $f(n) = Θ(g(n))$ signifies that $f(n)$ grows at the same rate as $g(n)$ under average circumstances. This indicates the time or space complexity is both at most and at least a linear function of the input size. -Remember, these notations primarily address the growth rate as the input size becomes significantly large. While they offer a high-level comprehension of an algorithm's performance, the actual running time in practice can differ based on various factors, such as the specific input data, the hardware or environment where the algorithm is operating, and the precise way the algorithm is implemented in the code. +ef39373a-8e6a-4e5b-832f-698b4dde7c7e + +These notations primarily address the growth rate as the input size becomes significantly large. While they offer a high-level comprehension of an algorithm's performance, the actual running time in practice can differ based on various factors, such as the specific input data, the hardware or environment where the algorithm is operating, and the precise way the algorithm is implemented in the code. 
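A tiny numerical check can make the bounding idea feel less abstract. The sketch below is an illustration of my own (the repository's `resources/time_complexity.py`, added in a later patch, plots the same relationship): it prints a few sample values showing that $f(n)=n\log_2 n$ stays above the linear function $n$ and below the quadratic $n^2$ as $n$ grows, the same sandwich expressed by $\Omega(n)$ and $O(n^2)$.

```python
import math

# Quick numerical sanity check (illustrative only): n*log2(n) is sandwiched
# between a linear lower bound and a quadratic upper bound as n grows,
# mirroring f(n) = Omega(n) and f(n) = O(n^2).
for n in [10, 100, 1_000, 10_000, 100_000]:
    f = n * math.log2(n)
    print(f"n={n:>7}   linear n={n:>9,}   n*log2(n)={f:>13,.0f}   n^2={n*n:>14,}")
```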
#### Diving into Big O Notation Examples From b02d8eb5beb91d360e948cadce4a165ca04ceb71 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Mon, 4 Aug 2025 13:47:50 +0200 Subject: [PATCH 08/48] Create time_complexity.py --- resources/time_complexity.py | 50 ++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) create mode 100644 resources/time_complexity.py diff --git a/resources/time_complexity.py b/resources/time_complexity.py new file mode 100644 index 0000000..0191eed --- /dev/null +++ b/resources/time_complexity.py @@ -0,0 +1,50 @@ +import numpy as np +import matplotlib.pyplot as plt + +# Data range +n = np.arange(2, 101) + +# Big O example: f(n) = n log n, upper bound g(n) = n^2 (showing f(n) = O(n^2)) +f_big_o = n * np.log2(n) +upper_bound_big_o = n ** 2 + +plt.figure() +plt.scatter(n, f_big_o, label=r"$f(n) = n \log_2 n$ (data points)", s=10) +plt.plot(n, upper_bound_big_o, label=r"Upper bound $g(n) = n^2$", linewidth=1.5) +plt.title("Big O Notation: $f(n) = O(n^2)$") +plt.xlabel("n") +plt.ylabel("Time / Growth") +plt.legend() +plt.grid(True) + +# Big Omega example: f(n) = n log n, lower bound h(n) = n (showing f(n) = Ω(n)) +f_big_omega = n * np.log2(n) +lower_bound_big_omega = n + +plt.figure() +plt.scatter(n, f_big_omega, label=r"$f(n) = n \log_2 n$ (data points)", s=10) +plt.plot(n, lower_bound_big_omega, label=r"Lower bound $h(n) = n$", linewidth=1.5) +plt.title("Big Omega Notation: $f(n) = \Omega(n)$") +plt.xlabel("n") +plt.ylabel("Time / Growth") +plt.legend() +plt.grid(True) + +# Theta example: noisy f(n) around n log n, bounds 0.8*n log n and 1.2*n log n +base_theta = n * np.log2(n) +np.random.seed(42) +f_theta = base_theta * (1 + np.random.uniform(-0.15, 0.15, size=n.shape)) +lower_theta = 0.8 * base_theta +upper_theta = 1.2 * base_theta + +plt.figure() +plt.scatter(n, f_theta, label=r"Noisy $f(n) \approx n \log_2 n$", s=10) +plt.plot(n, lower_theta, label=r"Lower tight bound $0.8 \cdot n \log_2 n$", linewidth=1.5) +plt.plot(n, upper_theta, label=r"Upper tight bound $1.2 \cdot n \log_2 n$", linewidth=1.5) +plt.title("Theta Notation: $f(n) = \Theta(n \log n)$") +plt.xlabel("n") +plt.ylabel("Time / Growth") +plt.legend() +plt.grid(True) + +plt.show() From 190c56215732520316caf298e1dc8b42b286dd0f Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Mon, 4 Aug 2025 13:54:33 +0200 Subject: [PATCH 09/48] Update dynamic_programming.md --- notes/dynamic_programming.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notes/dynamic_programming.md b/notes/dynamic_programming.md index da180c0..9233e1c 100644 --- a/notes/dynamic_programming.md +++ b/notes/dynamic_programming.md @@ -2,9 +2,9 @@ Dynamic Programming (DP) is a way to solve complex problems by breaking them into smaller, easier problems. Instead of solving the same small problems again and again, DP **stores their solutions** in a structure like an array, table, or map. This avoids wasting time on repeated calculations and makes the process much faster and more efficient. -DP works best for problems that have two key features. The first is **optimal substructure**, which means you can build the solution to a big problem from the solutions to smaller problems. The second is **overlapping subproblems**, where the same smaller problems show up multiple times during the process. By focusing on these features, DP ensures that each part of the problem is solved only once. 
+DP works best for problems that have two features. The first is **optimal substructure**, which means you can build the solution to a big problem from the solutions to smaller problems. The second is **overlapping subproblems**, where the same smaller problems show up multiple times during the process. By focusing on these features, DP ensures that each part of the problem is solved only once. -This method was introduced by Richard Bellman in the 1950s and has become a valuable tool in areas like computer science, economics, and operations research. It has been used to solve problems that would otherwise take too long by turning slow, exponential-time algorithms into much faster polynomial-time solutions. DP is practical and powerful for tackling real-world optimization challenges. +This method was introduced by Richard Bellman in the 1950s and has become a valuable tool in areas like computer science, economics, and operations research. It has been used to solve problems that would otherwise take too long by turning slow, exponential-time algorithms into much faster polynomial-time solutions. DP is used in practice for tackling real-world optimization challenges. ### Principles From e909df1dcd60867e01beb2564953a2200663c13a Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Fri, 8 Aug 2025 21:44:23 +0200 Subject: [PATCH 10/48] Update matrices.md --- notes/matrices.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/notes/matrices.md b/notes/matrices.md index d25cd52..72167a3 100644 --- a/notes/matrices.md +++ b/notes/matrices.md @@ -1 +1,4 @@ rotations + + +backtracking dfs search From 417e1d6bc0b517b0f2fbb479ffcf03d69ee3d8f9 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Fri, 8 Aug 2025 21:52:31 +0200 Subject: [PATCH 11/48] Update greedy_algorithms.md --- notes/greedy_algorithms.md | 230 +++++++++++++++++++++++++++++++++++++ 1 file changed, 230 insertions(+) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index 8b13789..46870be 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -1 +1,231 @@ +## What are greedy algorithms? +Greedy methods construct a solution piece by piece, always choosing the currently best-looking option according to a simple rule. The subtlety is not the rule itself but the proof that local optimality extends to global optimality. Two proof tools do most of the work: exchange arguments (you can swap an optimal solution’s first “deviation” back to the greedy choice without harm) and loop invariants (you maintain a statement that pins down exactly what your partial solution guarantees at each step). + +Formally, consider a finite ground set $E$, a family of feasible subsets $\mathcal{F}\subseteq 2^E$, and a weight function $w:E\to \mathbb{R}$. A generic greedy scheme orders elements of $E$ by a key $\kappa(e)$ and scans them, adding $e$ to the building solution $S$ if $S\cup\{e\}\in\mathcal{F}$. Correctness means + +$$ +\text{Greedy}(E,\mathcal{F},w,\kappa)\in\arg\max\{\,w(S):S\in\mathcal{F}\,\}. +$$ + +The nice, crisp setting where this always works is the theory of matroids. Outside that, correctness must be argued problem-by-problem. + +``` +scan order: e1 e2 e3 e4 e5 ... +feasible? Y N Y Y N +solution S: {e1, e3, e4} +``` + +## The greedy-choice principle and exchange arguments + +A problem exhibits the greedy-choice principle if there exists an optimal solution that begins with a greedy step. 
Once you show that, you can peel off that step and repeat on the residual subproblem. Exchange arguments are a tight way to prove it. + +Let $G$ be the greedy solution and $O$ an optimal solution. Suppose the first element where they differ is $g\in G$ and $o\in O$. If replacing $o$ by $g$ in $O$ keeps feasibility and does not reduce the objective, then there exists an optimal solution that agrees with greedy at that position. Repeating this swap inductively transforms $O$ into $G$ with no loss, hence greedy is optimal. + +Visually, you keep “pushing” the optimal solution toward the greedy one: + +``` +O: o ? ? ? → swap o→g at first difference +G: g g g g +``` + +## Matroids + +Matroids capture exactly when a simple weight-ordered greedy works for all weights. + +A matroid is a pair $(E,\mathcal{I})$ where $\mathcal{I}\subseteq 2^E$ satisfies three axioms: non-emptiness $\varnothing\in\mathcal{I}$, heredity $A\in\mathcal{I}$ and $B\subseteq A\Rightarrow B\in\mathcal{I}$, and augmentation if $A,B\in\mathcal{I}$ with $|A|<|B|$ then there exists $x\in B\setminus A$ such that $A\cup\{x\}\in\mathcal{I}$. + +Given nonnegative weights $w:E\to\mathbb{R}_{\ge 0}$, the greedy algorithm that scans in order of decreasing $w$ and keeps an element whenever independence is preserved returns a maximum-weight independent set. The proof is a one-line exchange: if greedy kept $g$ and an optimal $O$ did not, augmentation in the matroid allows swapping in $g$ for some element of $O$ without losing feasibility or weight. + +Two takeaways rise above the formalism. When your feasibility system has an augmentation flavor, greedy tends to shine. When augmentation fails, greedy can fail spectacularly and you should be suspicious. + +## Loop invariants + +Some greedy scans are best understood with invariants that certify what your partial state already guarantees. + +A linear reachability scan on positions $0,1,\dots,n-1$ with “hop budgets” $a_i\ge 0$ uses the invariant + +$$ +F_i=\max\{\, j+a_j : 0\le j\le i,\ j\le F_{j}\ \text{when processed}\,\}, +$$ + +where $F_i$ is the furthest index known reachable after processing index $i$. If at some step $i>F_{i-1}$, progress is impossible, because every candidate that could have extended the frontier was already considered. + +``` +indices: 0 1 2 3 4 5 +a[i]: 3 1 0 2 4 1 +frontier after i=0: F=3 +after i=1: F=max(3,1+1)=3 +after i=2: F=max(3,2+0)=3 +stuck at i=4 because 4>F +``` + +The invariant is monotone and essentially proves itself: no future step can invent a larger $j+a_j$ from a $j$ you skipped as unreachable. + +## Minimum spanning trees + +Two greedy algorithms produce minimum spanning trees in a connected weighted graph $G=(V,E,w)$. + +Kruskal’s rule sorts edges by nondecreasing weight and adds an edge if it connects two different components of the partial forest. Prim’s rule grows a single tree, always adding the lightest edge that leaves the current tree. Both succeed due to the cut and cycle properties. + +For any partition $(S,V\setminus S)$, the lightest edge crossing the cut belongs to some minimum spanning tree. For any cycle $C$, the heaviest edge on $C$ does not belong to any minimum spanning tree. The first property justifies “always add the cheapest safe edge,” while the second justifies “never add an edge that would create a cycle unless it is not the heaviest on that cycle.” + +A crisp exchange proves the cut property. If a minimum tree $T$ does not use the cut’s lightest edge $e$, add $e$ to $T$ to form a cycle. 
That cycle must cross the cut at least twice; remove a heavier cross-edge from the cycle to get a strictly cheaper spanning tree, a contradiction. + +``` +cut S | V\S + \ | / + \ | / ← add the cheapest cross-edge +-----\|/----- +``` + +Union–find makes Kruskal’s rule nearly linear; a binary heap makes Prim’s rule nearly linear as well. + +## Shortest paths with nonnegative weights + +Dijkstra’s method picks an unsettled vertex with minimum tentative distance and settles it forever. The loop invariant is the heart: when a vertex $u$ is settled, its label $d(u)$ equals the true shortest-path distance $\delta(s,u)$. The proof uses nonnegativity. If there were a shorter route leaving the settled set, it would have to traverse an edge of nonnegative weight to an as-yet unsettled vertex v, which cannot reduce the label below the minimum label currently available to settle. Formally, an exchange of the “last edge that leaves the settled set” gives the contradiction. + +It helps to picture the labels as a wavefront expanding through the graph: + +``` +settled: #### +frontier: ..... +unseen: ooooo +labels grow outward like ripples in a pond +``` + +Negative edge weights break the monotonicity that makes “settle and forget” safe. + +## Prefix sums and the maximum contiguous sum + +Consider a finite sequence $x_1,\dots,x_n$. Define prefix sums $S_0=0$ and $S_j=\sum_{k=1}^j x_k$. Any contiguous sum equals $S_j-S_i$ for some $0\le id_j$, swapping them does not increase any lateness and strictly helps one of them; repeatedly fixing inversions yields the sorted order with no worse objective. + +``` +time → +kept: [---) [--) [----) [---) +others: [-----) [------) [--------) +ending earlier opens more future room +``` + +Weighted versions of these problems typically need dynamic programming rather than greedy rules. + +## Huffman coding + +Given symbol frequencies $f_i>0$ with $\sum_i f_i=1$, a prefix code assigns codeword lengths $L_i$ satisfying the Kraft inequality $\sum_i 2^{-L_i}\le 1$. The expected codeword length is + +$$ +\mathbb{E}[L]=\sum_i f_i L_i. +$$ + +Huffman’s algorithm repeatedly merges the two least frequent symbols. The exchange idea is simple. In an optimal tree, the two deepest leaves must be siblings and must be the two least frequent symbols; otherwise, swapping their positions with the two least frequent decreases expected length by at least $f_{\text{heavy}}-f_{\text{light}}>0$. Merging these two leaves into a single pseudo-symbol of frequency $f_a+f_b$ reduces the problem size while preserving optimality, leading to optimality by induction. + +The code tree literally grows from the bottom: + +``` + * + / \ + * c + / \ + a b merge a,b first if f_a ≤ f_b ≤ f_c ≤ ... +``` + +The cost identity $\mathbb{E}[L]=\sum_{\text{internal nodes}} \text{weight}$ turns the greedy step into a visible decrement of objective value at every merge. + +## When greedy fails (and how to quantify “not too bad”) + +The $0\text{–}1$ knapsack with arbitrary weights defeats the obvious density-based rule. A small, dense item can block space needed for a medium-density item that pairs perfectly with a third, leading to a globally superior pack. Weighted interval scheduling similarly breaks the “earliest finish” rule; taking a long, heavy meeting can beat two short light ones that finish earlier. + +Approximation guarantees rescue several hard problems with principled greedy performance. 
For set cover on a universe $U$ with $|U|=n$, the greedy rule that repeatedly picks the set covering the largest number of uncovered elements achieves an $H_n$ approximation: + +$$ +\text{cost}_{\text{greedy}} \le H_n\cdot \text{OPT},\qquad H_n=\sum_{k=1}^n \frac{1}{k}\le \ln n+1. +$$ + +A tight charging argument proves it: each time you cover new elements, charge them equally; no element is charged more than the harmonic sum relative to the optimum’s coverage. + +Maximizing a nondecreasing submodular set function $f:2^E\to\mathbb{R}_{\ge 0}$ under a cardinality constraint $|S|\le k$ is a crown jewel. Submodularity means diminishing returns: + +$$ +A\subseteq B,\ x\notin B \ \Rightarrow\ f(A\cup\{x\})-f(A)\ \ge\ f(B\cup\{x\})-f(B). +$$ + +The greedy algorithm that adds the element with largest marginal gain at each step satisfies the celebrated bound + +$$ +f(S_k)\ \ge\ \Bigl(1-\frac{1}{e}\Bigr)\,f(S^\star), +$$ + +where $S^\star$ is an optimal size-$k$ set. The proof tracks the residual gap $g_i=f(S^\star)-f(S_i)$ and shows + +$$ +g_{i+1}\ \le\ \Bigl(1-\frac{1}{k}\Bigr)g_i, +$$ + +hence $g_k\le e^{-k/k}g_0=e^{-1}g_0$. Diminishing returns is exactly what makes the greedy increments add up to a constant-factor slice of the unreachable optimum. + +## Sweep lines and event counts: a greedy counting lens + +Many timeline problems reduce to counting the maximum load. Turn each interval $[s,e)$ into an arrival at $s$ and a departure at $e$. Sort all events and scan from left to right, increasing a counter on arrivals and decreasing it on departures. The answer is the peak value of the counter: + +$$ +\max_t C(t). +$$ + +Ties are processed with departures before arrivals, which matches the half-open convention and prevents phantom conflicts when one interval ends exactly where the next begins. + +``` +time → 1 2 3 4 5 6 7 +A: [-----) +B: [---) +C: [-----) +load: 1 2 3 2 2 1 0 +peak: 3 +``` + +While this is not “optimization by selection,” it is greedy in spirit: you never need to look back, and the loop invariant (counter equals number of currently active intervals) makes the peak exact. + +## Anatomy of a greedy proof + +It pays to recognize the tiny handful of templates that keep recurring. + +* Exchange template for selection: assume the first divergence between a greedy solution and an optimal solution. Show the greedy choice weakly dominates the optimal one with respect to the future, swap, and iterate. +* Cut template for graphs: argue that the cheapest edge crossing a cut is always safe to add, or that the heaviest edge on any cycle is always safe to discard. +* Potential or invariant template for scans: identify a monotone quantity that only moves one way; once it passes a threshold, later steps cannot undo it. + +These are the same ideas with different clothes. + +## Pitfalls, boundary choices, and complexity + +Monotonicity assumptions are not decoration. Dijkstra needs nonnegative edges; the proof breaks the moment a negative edge lets a later step undercut a settled label. Interval boundaries love the half-open convention $[s,e)$ to make “end at $t$, start at $t$” compatible and to simplify event-ordering in sweeps. + +Complexity usually splits into a sort plus a scan. Sorting $n$ items costs $O(n\log n)$. A scan with constant-time updates costs $O(n)$. Minimum spanning trees achieve $O(m\alpha(n))$ with union–find for Kruskal and $O(m\log n)$ for Prim with a heap, where $\alpha$ is the inverse Ackermann function. 
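The sweep-line count above fits in a few lines of code. This is a minimal sketch under the conventions stated in this section (half-open intervals $[s,e)$, departures processed before arrivals at equal times); the function name `peak_load` and the sample intervals are my own. It also shows the "sort plus scan" cost profile noted just above: one $O(n\log n)$ sort followed by a single $O(n)$ pass.

```python
# Minimal sketch of the sweep-line peak-load count described above.
# Conventions: intervals are half-open [s, e); at equal times, departures
# (-1) sort before arrivals (+1), so touching intervals never conflict.
# The function name and the example data are illustrative, not from the notes.
def peak_load(intervals):
    events = []
    for s, e in intervals:
        events.append((s, +1))   # arrival
        events.append((e, -1))   # departure
    events.sort(key=lambda ev: (ev[0], ev[1]))  # by time, then -1 before +1
    active = best = 0
    for _, delta in events:
        active += delta
        best = max(best, active)
    return best


print(peak_load([(1, 6), (2, 4), (3, 8)]))  # 3: all three overlap around t=3
print(peak_load([(1, 3), (3, 5)]))          # 1: [1,3) and [3,5) only touch
```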
+ +## A compact design checklist you can actually use + +* Identify a key order that seems to make future choices easier. Finishing times, smallest weights, or largest marginal gains are usual suspects. +* Propose a local rule that never looks back, then write down the loop invariant that must be true if the rule is right. +* Decide whether an exchange argument, a cut/cycle argument, or a potential function is the appropriate proof lens. +* Stress-test the rule on a crafted counterexample. If one appears, consider whether the structure is missing a matroid-like augmentation or a monotonicity prerequisite. +* If exact optimality is out of reach, look for submodularity or harmonic charging to get a clean approximation guarantee. From a44c8ceb16314e45ecb77fbc5820b27a4c2224a2 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Fri, 8 Aug 2025 22:39:19 +0200 Subject: [PATCH 12/48] Update greedy_algorithms.md --- notes/greedy_algorithms.md | 664 +++++++++++++++++++++++++++++++------ 1 file changed, 554 insertions(+), 110 deletions(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index 46870be..9cae8c4 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -16,216 +16,660 @@ feasible? Y N Y Y N solution S: {e1, e3, e4} ``` -## The greedy-choice principle and exchange arguments +### The greedy-choice principle and exchange arguments -A problem exhibits the greedy-choice principle if there exists an optimal solution that begins with a greedy step. Once you show that, you can peel off that step and repeat on the residual subproblem. Exchange arguments are a tight way to prove it. +Greedy methods feel simple on the surface—always take the best-looking move right now—but the proof that this is globally safe is subtle. The core idea is to show that at the first moment an optimal solution “disagrees” with your greedy choice, you can surgically swap in the greedy move without making things worse. Do that repeatedly and you literally transform some optimal solution into the greedy one. That’s the exchange argument. -Let $G$ be the greedy solution and $O$ an optimal solution. Suppose the first element where they differ is $g\in G$ and $o\in O$. If replacing $o$ by $g$ in $O$ keeps feasibility and does not reduce the objective, then there exists an optimal solution that agrees with greedy at that position. Repeating this swap inductively transforms $O$ into $G$ with no loss, hence greedy is optimal. +Let $E$ be a finite ground set of “atoms.” Feasible solutions are subsets $S\subseteq E$ belonging to a family $\mathcal{F}\subseteq 2^E$. The objective is additive: -Visually, you keep “pushing” the optimal solution toward the greedy one: +$$ +\text{maximize } w(S)=\sum_{e\in S} w(e)\quad\text{subject to } S\in\mathcal{F},\qquad w:E\to\mathbb{R}. +$$ -``` -O: o ? ? ? → swap o→g at first difference -G: g g g g -``` +A generic greedy algorithm fixes an order $e_1,e_2,\dots,e_m$ determined by a key $\kappa$ (for example, sort by nonincreasing $w$ or by earliest finishing time), then scans the elements and keeps $e_i$ whenever $S\cup\{e_i\}\in\mathcal{F}$. -## Matroids +Two structural properties make the exchange proof go through. -Matroids capture exactly when a simple weight-ordered greedy works for all weights. +1. Feasibility exchange. Whenever $A,B\in\mathcal{F}$ with $|A|<|B|$, there exists $x\in B\setminus A$ such that $A\cup\{x\}\in\mathcal{F}$. 
This “augmentation flavor” is what lets you replace a non-greedy element by a greedy one while staying feasible. -A matroid is a pair $(E,\mathcal{I})$ where $\mathcal{I}\subseteq 2^E$ satisfies three axioms: non-emptiness $\varnothing\in\mathcal{I}$, heredity $A\in\mathcal{I}$ and $B\subseteq A\Rightarrow B\in\mathcal{I}$, and augmentation if $A,B\in\mathcal{I}$ with $|A|<|B|$ then there exists $x\in B\setminus A$ such that $A\cup\{x\}\in\mathcal{I}$. +2. Local dominance. At the first position where greedy would keep $g$ but some optimal $O$ keeps $o\neq g$, you can drop some element $x\in O\setminus A$ and insert $g$ so that -Given nonnegative weights $w:E\to\mathbb{R}_{\ge 0}$, the greedy algorithm that scans in order of decreasing $w$ and keeps an element whenever independence is preserved returns a maximum-weight independent set. The proof is a one-line exchange: if greedy kept $g$ and an optimal $O$ did not, augmentation in the matroid allows swapping in $g$ for some element of $O$ without losing feasibility or weight. +$$ +A\cup\{g\}\cup\bigl(O\setminus\{x\}\bigr)\in\mathcal{F} +\quad\text{and}\quad +w(g)\ge w(x), +$$ -Two takeaways rise above the formalism. When your feasibility system has an augmentation flavor, greedy tends to shine. When augmentation fails, greedy can fail spectacularly and you should be suspicious. +where $A$ is the common prefix chosen by both up to that point. The inequality ensures the objective does not decrease during the swap. -## Loop invariants +When $(E,\mathcal{F})$ is a matroid, the feasibility exchange always holds; if you also order by nonincreasing $w$, local dominance holds trivially with $x$ chosen by the matroid’s augmentation. Many everyday problems satisfy these two properties even without full matroid machinery. -Some greedy scans are best understood with invariants that certify what your partial state already guarantees. +Write the greedy picks as a sequence $G=(g_1,g_2,\dots,g_k)$, in the order chosen. The following lemma is the workhorse. -A linear reachability scan on positions $0,1,\dots,n-1$ with “hop budgets” $a_i\ge 0$ uses the invariant +**Lemma (first-difference exchange).** Suppose there exists an optimal solution $O$ whose first $t-1$ elements agree with greedy, meaning $g_1,\dots,g_{t-1}\in O$. If $g_t\in O$ as well, continue. Otherwise there exists $x\in O\setminus\{g_1,\dots,g_{t-1}\}$ such that $$ -F_i=\max\{\, j+a_j : 0\le j\le i,\ j\le F_{j}\ \text{when processed}\,\}, +O' \;=\;\bigl(O\setminus\{x\}\bigr)\cup\{g_t\}\in\mathcal{F} +\quad\text{and}\quad +w(O')\ge w(O). $$ -where $F_i$ is the furthest index known reachable after processing index $i$. If at some step $i>F_{i-1}$, progress is impossible, because every candidate that could have extended the frontier was already considered. +Hence there is an optimal solution that agrees with greedy on the first $t$ positions. + +*Proof sketch.* Let $A_{t-1}=\{g_1,\dots,g_{t-1}\}$. Because greedy considered $g_t$ before any element in $O\setminus A_{t-1}$ that it skipped, local dominance says some $x\in O\setminus A_{t-1}$ can be traded for $g_t$ without breaking feasibility and without decreasing weight. This creates $O'$ optimal and consistent with greedy for one more step. Apply the same reasoning inductively. + +Induction on $t$ yields the main theorem: there exists an optimal solution that agrees with greedy everywhere, hence greedy is optimal. + +It helps to picture the two solutions aligned in the greedy order. 
The top row is the greedy decision at each position; the bottom row is some optimal solution, possibly disagreeing. At the first disagreement, one swap pushes the optimal line upward to match greedy, and the objective value does not drop. ``` -indices: 0 1 2 3 4 5 -a[i]: 3 1 0 2 4 1 -frontier after i=0: F=3 -after i=1: F=max(3,1+1)=3 -after i=2: F=max(3,2+0)=3 -stuck at i=4 because 4>F +positions → 1 2 3 4 5 6 7 +greedy G: [g1] [g2] [g3] [g4] [g5] [g6] [g7] +optimal O: [g1] [g2] [ o ] [ ? ] [ ? ] [ ? ] [ ? ] + +exchange at position 3: +drop some x from O beyond position 2 and insert g3 + +after swap: +optimal O': [g1] [g2] [g3] [ ? ] [ ? ] [ ? ] [ ? ] ``` -The invariant is monotone and essentially proves itself: no future step can invent a larger $j+a_j$ from a $j$ you skipped as unreachable. +The key is not the letter symbols but the invariants. Up to position $t-1$, both solutions coincide. The swap keeps feasibility and weight, so you have a new optimal that also matches at position $t$. Repeat, and the bottom row becomes the top row. + +### Matroids + +Greedy methods don’t usually get ironclad guarantees, but there is a beautiful class of feasibility systems where they do. That class is the matroids. Once your constraints form a matroid, the simplest weight-ordered greedy scan is not a heuristic anymore; it is provably optimal for every nonnegative weight assignment. + +A matroid is a pair $(E,\mathcal{I})$ with $E$ a finite ground set and $\mathcal{I}\subseteq 2^E$ the “independent” subsets. Three axioms hold. + +* Non-emptiness says $\varnothing\in\mathcal{I}$. +* Heredity says independence is downward-closed: if $A\in\mathcal{I}$ and $B\subseteq A$, then $B\in\mathcal{I}$. +* Augmentation says independence grows smoothly: if $A,B\in\mathcal{I}$ with $|A|<|B|$, then some $x\in B\setminus A$ exists with $A\cup\{x\}\in\mathcal{I}$. + +The last axiom is the heart. It forbids “dead ends” where a smaller feasible set cannot absorb a single element from any larger feasible set. That smoothness is exactly what greedy needs to keep repairing early choices. -## Minimum spanning trees +### Reachability on a line -Two greedy algorithms produce minimum spanning trees in a connected weighted graph $G=(V,E,w)$. +You’re standing on square 0 of a line of squares $0,1,\dots,n-1$. +Each square $i$ tells you how far you’re allowed to jump forward from there: a number $a[i]$. From $i$, you can jump to any square $i+1, i+2, \dots, i+a[i]$. The goal is to decide whether you can ever reach the last square, and, if not, what the furthest square is that you can reach. -Kruskal’s rule sorts edges by nondecreasing weight and adds an edge if it connects two different components of the partial forest. Prim’s rule grows a single tree, always adding the lightest edge that leaves the current tree. Both succeed due to the cut and cycle properties. +#### Example input and the expected output -For any partition $(S,V\setminus S)$, the lightest edge crossing the cut belongs to some minimum spanning tree. For any cycle $C$, the heaviest edge on $C$ does not belong to any minimum spanning tree. The first property justifies “always add the cheapest safe edge,” while the second justifies “never add an edge that would create a cycle unless it is not the heaviest on that cycle.” +Input array: `a = [3, 1, 0, 0, 4, 1]` +There are 6 squares (0 through 5). +Correct output: you cannot reach the last square; the furthest you can get is square `3`. -A crisp exchange proves the cut property. 
If a minimum tree $T$ does not use the cut’s lightest edge $e$, add $e$ to $T$ to form a cycle. That cycle must cross the cut at least twice; remove a heavier cross-edge from the cycle to get a strictly cheaper spanning tree, a contradiction. +#### A slow but obvious approach + +Think “paint everything I can reach, one wave at a time.” + +1. Start with square 0 marked “reachable.” +2. For every square already marked, paint all squares it can jump to. +3. Keep doing this until no new squares get painted. + +This is correct because you literally try every allowed jump from every spot you know is reachable. It can be wasteful, though, because the same squares get reconsidered over and over in dense cases. + +Walking the example: ``` -cut S | V\S - \ | / - \ | / ← add the cheapest cross-edge ------\|/----- +start: reachable = {0} +from 0: can reach {1,2,3} → reachable = {0,1,2,3} +from 1: can reach {2} → no change +from 2: can reach {} → no change (a[2]=0) +from 3: can reach {} → no change (a[3]=0) +done: no new squares → furthest is 3, last is unreachable ``` -Union–find makes Kruskal’s rule nearly linear; a binary heap makes Prim’s rule nearly linear as well. +#### A clean, fast greedy scan -## Shortest paths with nonnegative weights +Carry one number as you sweep left to right: `F`, the furthest square you can reach **so far**. +Rule of thumb: -Dijkstra’s method picks an unsettled vertex with minimum tentative distance and settles it forever. The loop invariant is the heart: when a vertex $u$ is settled, its label $d(u)$ equals the true shortest-path distance $\delta(s,u)$. The proof uses nonnegativity. If there were a shorter route leaving the settled set, it would have to traverse an edge of nonnegative weight to an as-yet unsettled vertex v, which cannot reduce the label below the minimum label currently available to settle. Formally, an exchange of the “last edge that leaves the settled set” gives the contradiction. +* If you’re looking at square `i` and `i` is beyond `F`, you’re stuck forever. +* Otherwise, extend the frontier with `F = max(F, i + a[i])` and move on. -It helps to picture the labels as a wavefront expanding through the graph: +That’s it—one pass, no backtracking. + +Why this is safe in a sentence: `F` always summarizes “the best jump end we have discovered from any square we truly reached,” and it never goes backward; if you hit a gap where `i > F`, then no earlier jump can help because its effect was already folded into `F`. + +Plugging in the same numbers ``` -settled: #### -frontier: ..... -unseen: ooooo -labels grow outward like ripples in a pond +a = [3, 1, 0, 0, 4, 1] +n = 6 +F = 0 # we start at square 0 (we’ll extend immediately at i=0) + +i=0: 0 ≤ F → F = max(0, 0+3) = 3 +i=1: 1 ≤ F → F = max(3, 1+1) = 3 +i=2: 2 ≤ F → F = max(3, 2+0) = 3 +i=3: 3 ≤ F → F = max(3, 3+0) = 3 +i=4: 4 > F → stuck here ``` -Negative edge weights break the monotonicity that makes “settle and forget” safe. +Final state: `F = 3`, which means the furthest reachable square is 3. Since `F < n-1 = 5`, the last square is not reachable. + +### Minimum spanning trees -## Prefix sums and the maximum contiguous sum +You’ve got a connected weighted graph and you want the cheapest way to connect **all** its vertices without any cycles—that’s a minimum spanning tree (MST). Think “one network of cables that touches every building, with the total cost as small as possible.” -Consider a finite sequence $x_1,\dots,x_n$. Define prefix sums $S_0=0$ and $S_j=\sum_{k=1}^j x_k$. 
Any contiguous sum equals $S_j-S_i$ for some $0\le id_j$, swapping them does not increase any lateness and strictly helps one of them; repeatedly fixing inversions yields the sorted order with no worse objective. + 6 -5 -1 -1 0 5 (still 3..5) + update: M = min(-1,-1) = -1 + 7 3 2 -1 3 5 (still 3..5) + update: M = min(-1,2) = -1 ``` -time → -kept: [---) [--) [----) [---) -others: [-----) [------) [--------) -ending earlier opens more future room + +Final answer: maximum sum $=5$, achieved by indices $3..5$ (that’s $[4,-1,2]$). + +You can picture $S_j$ as a hilly skyline and $M$ as the lowest ground you’ve touched. The best block is the tallest vertical gap between the skyline and any earlier ground level. + +``` +prefix S: 0 → 2 → -1 → 3 → 2 → 4 → -1 → 2 +ground M: 0 0 -1 -1 -1 -1 -1 -1 +gap S-M: 0 2 0 4 3 5 0 3 + ^ peak gap = 5 here ``` -Weighted versions of these problems typically need dynamic programming rather than greedy rules. +#### Edge cases + +When all numbers are negative, the best block is the **least negative single element**. The scan handles this automatically because $M$ keeps dropping with every step, so the maximum of $S_j-M$ happens when you take just the largest entry. + +Empty-block conventions matter. If you define the answer to be strictly nonempty, initialize $\text{best}$ with $x_1$ and $E=x_1$ in the incremental form; if you allow empty blocks with sum $0$, initialize $\text{best}=0$ and $M=0$. Either way, the one-pass logic doesn’t change. + +### Scheduling themes -## Huffman coding +Two everyday scheduling goals keep popping up. One tries to pack as many non-overlapping intervals as possible, like booking the most meetings in a single room. The other tries to keep lateness under control when jobs have deadlines, like finishing homework so the worst overrun is as small as possible. Both have crisp greedy rules, and both are easy to run by hand once you see them. -Given symbol frequencies $f_i>0$ with $\sum_i f_i=1$, a prefix code assigns codeword lengths $L_i$ satisfying the Kraft inequality $\sum_i 2^{-L_i}\le 1$. The expected codeword length is +Imagine you have time intervals on a single line, and you can keep an interval only if it doesn’t overlap anything you already kept. The aim is to keep as many as possible. + +#### Example input and the desired output + +Intervals (start, finish): + +* $(1,3)$, $(2,5)$, $(4,7)$, $(6,9)$, $(8,10)$, $(9,11)$ + +A best answer keeps four intervals, for instance $(1,3),(4,7),(8,10),(10,11)$. I wrote $(10,11)$ for clarity even though the original end was $11$; think half-open $[s,e)$ if you want “touching” to be allowed. + +#### A slow baseline + +Try all subsets and keep the largest that has no overlaps. That’s conceptually simple and always correct, but it’s exponential in the number of intervals, which is a non-starter for anything but tiny inputs. + +#### The greedy rule + +Sort by finishing time, then walk once from earliest finisher to latest. Keep an interval if its start is at least the end time of the last one you kept. Ending earlier leaves more room for the future, and that is the whole intuition. + +Sorted by finish: $$ -\mathbb{E}[L]=\sum_i f_i L_i. +(1,3),\ (2,5),\ (4,7),\ (6,9),\ (8,10),\ (9,11) $$ -Huffman’s algorithm repeatedly merges the two least frequent symbols. The exchange idea is simple. 
In an optimal tree, the two deepest leaves must be siblings and must be the two least frequent symbols; otherwise, swapping their positions with the two least frequent decreases expected length by at least $f_{\text{heavy}}-f_{\text{light}}>0$. Merging these two leaves into a single pseudo-symbol of frequency $f_a+f_b$ reduces the problem size while preserving optimality, leading to optimality by induction. +Run the scan and track the end of the last kept interval. + +``` +last_end = -∞ +(1,3): 1 ≥ -∞ → keep → last_end = 3 +(2,5): 2 < 3 → skip +(4,7): 4 ≥ 3 → keep → last_end = 7 +(6,9): 6 < 7 → skip +(8,10): 8 ≥ 7 → keep → last_end = 10 +(9,11): 9 < 10 → skip +``` + +Kept intervals: $(1,3),(4,7),(8,10)$. If we allow a meeting that starts exactly at $10$, we can also keep $(10,11)$ if it exists. Four kept, which matches the claim. -The code tree literally grows from the bottom: +A tiny picture helps the “finish early” idea feel natural: ``` - * - / \ - * c - / \ - a b merge a,b first if f_a ≤ f_b ≤ f_c ≤ ... +time → +kept: [1──3) [4───7) [8─10) +skip: [2────5) [6────9) [9───11) +ending earlier leaves more open space to the right ``` -The cost identity $\mathbb{E}[L]=\sum_{\text{internal nodes}} \text{weight}$ turns the greedy step into a visible decrement of objective value at every merge. +Why this works in one sentence: at the first place an optimal schedule would choose a later-finishing interval, swapping in the earlier finisher cannot reduce what still fits afterward, so you can push the optimal schedule to match greedy without losing size. -## When greedy fails (and how to quantify “not too bad”) +### Minimize the maximum lateness -The $0\text{–}1$ knapsack with arbitrary weights defeats the obvious density-based rule. A small, dense item can block space needed for a medium-density item that pairs perfectly with a third, leading to a globally superior pack. Weighted interval scheduling similarly breaks the “earliest finish” rule; taking a long, heavy meeting can beat two short light ones that finish earlier. +Now think of $n$ jobs, all taking the same amount of time (say one unit). Each job $i$ has a deadline $d_i$. When you run them in some order, the completion time of the $k$-th job is $C_k=k$ (since each takes one unit), and its lateness is -Approximation guarantees rescue several hard problems with principled greedy performance. For set cover on a universe $U$ with $|U|=n$, the greedy rule that repeatedly picks the set covering the largest number of uncovered elements achieves an $H_n$ approximation: +$$ +L_i = C_i - d_i. +$$ + +Negative values mean you finished early; the quantity to control is the worst lateness $L_{\max}=\max_i L_i$. The goal is to order the jobs so $L_{\max}$ is as small as possible. + +#### Example input and the desired output + +Jobs and deadlines: + +* $J_1: d_1=3$ +* $J_2: d_2=1$ +* $J_3: d_3=4$ +* $J_4: d_4=2$ + +An optimal schedule is $J_2,J_4, J_1, J_3$. The maximum lateness there is $0$. + +#### A slow baseline + +Try all $n!$ orders, compute every job’s completion time and lateness, and take the order with the smallest $L_{\max}$. This explodes even for modest $n$. + +#### The greedy rule + +Order jobs by nondecreasing deadlines (earliest due date first, often called EDD). Fixing any “inversion” where a later deadline comes before an earlier one can only help the maximum lateness, so sorting by deadlines is safe. + +Deadlines in increasing order: $$ -\text{cost}_{\text{greedy}} \le H_n\cdot \text{OPT},\qquad H_n=\sum_{k=1}^n \frac{1}{k}\le \ln n+1. 
+J_2(d=1),\ J_4(d=2),\ J_1(d=3),\ J_3(d=4) $$ -A tight charging argument proves it: each time you cover new elements, charge them equally; no element is charged more than the harmonic sum relative to the optimum’s coverage. +Run them one by one and compute completion times and lateness. -Maximizing a nondecreasing submodular set function $f:2^E\to\mathbb{R}_{\ge 0}$ under a cardinality constraint $|S|\le k$ is a crown jewel. Submodularity means diminishing returns: +``` +slot 1: J2 finishes at C=1 → L2 = 1 - d2(=1) = 0 +slot 2: J4 finishes at C=2 → L4 = 2 - d4(=2) = 0 +slot 3: J1 finishes at C=3 → L1 = 3 - d1(=3) = 0 +slot 4: J3 finishes at C=4 → L3 = 4 - d3(=4) = 0 +L_max = 0 +``` + +If you scramble the order, the worst lateness jumps. For example, $J_1,J_2,J_3,J_4$ gives + +``` +slot 1: J1 → L1 = 1 - 3 = -2 +slot 2: J2 → L2 = 2 - 1 = 1 +slot 3: J3 → L3 = 3 - 4 = -1 +slot 4: J4 → L4 = 4 - 2 = 2 +L_max = 2 (worse) +``` + +A quick timeline sketch shows how EDD keeps you out of trouble: + +``` +time → 1 2 3 4 +EDD: [J2][J4][J1][J3] deadlines: 1 2 3 4 +late? 0 0 0 0 → max lateness 0 +``` + +Why this works in one sentence: if two adjacent jobs are out of deadline order, swapping them never increases any completion time relative to its own deadline, and strictly improves at least one, so repeatedly fixing these inversions leads to the sorted-by-deadline order with no worse maximum lateness. + +### Huffman coding + +You have symbols that occur with known frequencies $f_i>0$ and $\sum_i f_i=1$. The goal is to assign each symbol a binary codeword so that no codeword is a prefix of another (a prefix code), and the average length $$ -A\subseteq B,\ x\notin B \ \Rightarrow\ f(A\cup\{x\})-f(A)\ \ge\ f(B\cup\{x\})-f(B). +\mathbb{E}[L]=\sum_i f_i\,L_i $$ -The greedy algorithm that adds the element with largest marginal gain at each step satisfies the celebrated bound +is as small as possible. Prefix codes exactly correspond to full binary trees whose leaves are the symbols and whose leaf depths are the codeword lengths $L_i$. The Kraft inequality $\sum_i 2^{-L_i}\le 1$ is the feasibility condition; equality holds for full trees. + +#### Example input and the target output + +Frequencies: $$ -f(S_k)\ \ge\ \Bigl(1-\frac{1}{e}\Bigr)\,f(S^\star), +A:0.40,\quad B:0.20,\quad C:0.20,\quad D:0.10,\quad E:0.10. $$ -where $S^\star$ is an optimal size-$k$ set. The proof tracks the residual gap $g_i=f(S^\star)-f(S_i)$ and shows +A valid optimal answer will be a prefix code with expected length as small as possible. We will compute the exact minimum and one optimal set of lengths $L_A,\dots,L_E$, plus a concrete codebook. + +### A naive way to think about it + +One conceptual baseline is to enumerate all full binary trees with five labeled leaves and pick the one minimizing $\sum f_i\,L_i$. That is correct but explodes combinatorially as the number of symbols grows. A simpler but usually suboptimal baseline is to give every symbol the same length $\lceil \log_2 5\rceil=3$. That fixed-length code has $\mathbb{E}[L]=3$. + +#### The greedy method that is actually optimal + +Huffman’s rule repeats one tiny step: always merge the two least frequent items. When you merge two “symbols” with weights $p$ and $q$, you create a parent of weight $p+q$. The act of merging adds exactly $p+q$ to the objective $\mathbb{E}[L]$ because every leaf inside those two subtrees becomes one level deeper. 
Summing over all merges yields the final cost: $$ -g_{i+1}\ \le\ \Bigl(1-\frac{1}{k}\Bigr)g_i, +\mathbb{E}[L]=\sum_{\text{merges}} (p+q)=\sum_{\text{internal nodes}} \text{weight}. $$ -hence $g_k\le e^{-k/k}g_0=e^{-1}g_0$. Diminishing returns is exactly what makes the greedy increments add up to a constant-factor slice of the unreachable optimum. +The greedy choice is safe because in an optimal tree the two deepest leaves must be siblings and must be the two least frequent symbols; otherwise swapping depths strictly reduces the cost by at least $f_{\text{heavy}}-f_{\text{light}}>0$. Collapsing those siblings into one pseudo-symbol reduces the problem size without changing optimality, so induction finishes the proof. + +Start with the multiset $\{0.40, 0.20, 0.20, 0.10, 0.10\}$. At each line, merge the two smallest weights and add their sum to the running cost. + +``` +1) merge 0.10 + 0.10 → 0.20 cost += 0.20 (total 0.20) + multiset becomes {0.20, 0.20, 0.20, 0.40} + +2) merge 0.20 + 0.20 → 0.40 cost += 0.40 (total 0.60) + multiset becomes {0.20, 0.40, 0.40} + +3) merge 0.20 + 0.40 → 0.60 cost += 0.60 (total 1.20) + multiset becomes {0.40, 0.60} + +4) merge 0.40 + 0.60 → 1.00 cost += 1.00 (total 2.20) + multiset becomes {1.00} (done) +``` + +So the optimal expected length is $\boxed{\mathbb{E}[L]=2.20}$ bits per symbol. This already beats the naive fixed-length baseline $3$. It also matches the information-theoretic bound $H(f)\le \mathbb{E}[L] Date: Sat, 9 Aug 2025 22:15:11 +0200 Subject: [PATCH 13/48] Update brain_teasers.md --- notes/brain_teasers.md | 51 +++++++++++++++++++++++++----------------- 1 file changed, 30 insertions(+), 21 deletions(-) diff --git a/notes/brain_teasers.md b/notes/brain_teasers.md index a9136c8..bc7eed4 100644 --- a/notes/brain_teasers.md +++ b/notes/brain_teasers.md @@ -1,8 +1,8 @@ todo: -- bisect - heaps - fast and slow pointer for lists +- tree traversal in order, post oreder etc. ## Solving Programming Brain Teasers @@ -12,16 +12,14 @@ Programming puzzles and brain teasers are excellent tools for testing and enhanc When tackling programming puzzles, consider the following strategies: -- Starting with a **simple solution** can help you understand the problem better and identify challenges. This initial approach often highlights areas where optimization is needed later. -- Writing **unit tests** ensures your solution works for a variety of input scenarios. These tests are invaluable for catching logical errors and handling edge cases, and they allow for safe updates through regression testing. -- Analyzing the **time and space complexity** of your algorithm helps you measure its efficiency. Aim for the best possible complexity, such as $O(n)$, while avoiding unnecessary memory usage. -- Choosing the **appropriate data structure** is important for achieving better performance. Knowing when to use structures like arrays, linked lists, stacks, or trees can greatly enhance your solution. -- **Hash tables** are ideal for problems that require fast lookups, such as counting elements, detecting duplicates, or associating keys with values, as they offer average-case $O(1)$ complexity. -- Implementing **memoization or dynamic programming** can optimize problems with overlapping subproblems by storing and reusing previously computed results to save time. -- Breaking a problem into **smaller subproblems** often simplifies the process. Solving these subproblems individually makes it easier to manage and integrate the solutions. 
-- Considering both **recursive and iterative approaches** allows flexibility. Recursion can simplify the logic for certain problems, while iteration may be more efficient and avoid stack overflow risks. -- Paying attention to **edge cases and constraints** helps ensure robustness. Examples include handling empty inputs, very large or very small values, and duplicate data correctly. -- While optimizing too early can complicate development, **targeted optimization** at the right time focuses on the most resource-intensive parts of the code, improving performance without reducing clarity or maintainability. +* Starting with a *simple solution* can be helpful in understanding the problem and revealing areas that may need further optimization later on. +* Writing *unit tests* is useful for ensuring that your solution works correctly across a range of input scenarios, including edge cases. +* Analyzing *time and space complexity* of your algorithm is important for assessing its efficiency and striving for an optimal *performance*. +* Choosing the *appropriate data structure*, such as an array or tree, is beneficial for improving the speed and clarity of your solution. +* Breaking down the problem into *smaller parts* can make the overall task more manageable and easier to solve. +* Considering both *recursive* and *iterative* approaches gives you flexibility in selecting the method that best suits the problem’s needs. +* Paying attention to *edge cases* and *constraints* ensures your solution handles unusual or extreme inputs gracefully. +* *Targeted optimization*, when applied at the right time, can improve performance in specific areas without sacrificing clarity. ### Data Structures @@ -29,16 +27,27 @@ Understanding and effectively using data structures is fundamental in programmin #### Working with Arrays -Arrays are fundamental data structures that store elements in contiguous memory locations, allowing efficient random access. Here are strategies for working with arrays: - -- **Sorting** an array can simplify many problems. Algorithms like Quick Sort and Merge Sort are efficient with $O(n \log n)$ time complexity. For nearly sorted or small arrays, **Insertion Sort** may be a better option due to its simplicity and efficiency in those cases. -- In **sorted arrays**, binary search provides a fast way to find elements or their positions, working in $O(\log n)$. Be cautious with **mid-point calculations** in languages prone to integer overflow due to fixed-size integer types. -- The **two-pointer technique** uses two indices, often starting from opposite ends of the array, to solve problems involving pairs or triplets, like finding two numbers that add up to a target sum. It helps optimize time and space. -- The **sliding window technique** is effective for subarray or substring problems, such as finding the longest substring without repeating characters. It keeps a dynamic subset of the array while iterating, improving efficiency. -- **Prefix sums** enable quick range sum queries after preprocessing the array in $O(n)$. Similarly, **difference arrays** allow efficient range updates without modifying individual elements one by one. -- **In-place operations** modify the array directly without using extra memory. This approach saves space but requires careful handling to avoid unintended side effects on other parts of the program. -- When dealing with **duplicates**, it’s important to adjust the algorithm to handle them correctly. 
For example, in the two-pointer technique, duplicates may need to be skipped to prevent redundant results or errors. -- **Memory usage** is a important consideration with large arrays, as they can consume significant space. Be mindful of space complexity in constrained environments to prevent excessive memory usage. +Arrays are basic data structures that store elements in a continuous block of memory, making it easy to access any element quickly. Here are some tips for working with arrays: + +* Sorting an array can often simplify many problems, with algorithms like Quick Sort and Merge Sort offering efficient $O(n \log n)$ time complexity. For nearly sorted or small arrays, *Insertion Sort* might be a better option due to its simplicity and efficiency in such cases. +* In sorted arrays, *binary search* provides a fast way to find elements or their positions, working in $O(\log n)$. Be cautious with mid-point calculations in languages that may experience integer overflow due to fixed-size integer types. +* The *two-pointer* technique uses two indices, typically starting from opposite ends of the array, to solve problems involving pairs or triplets, like finding two numbers that sum to a target. It helps optimize both time and space efficiency. +* The *sliding window* technique is effective for solving subarray or substring problems, such as finding the longest substring without repeating characters. It maintains a dynamic subset of the array while iterating, improving overall efficiency. +* *Prefix sums* enable fast range sum queries after preprocessing the array in $O(n)$. Likewise, difference arrays allow efficient range updates without the need to modify individual elements one by one. +* In-place operations modify the array directly without using extra memory. This method saves space but requires careful handling to avoid unintended side effects on other parts of the program. +* When dealing with duplicates, it’s important to adjust the algorithm to handle them appropriately. For example, the two-pointer technique may need to skip duplicates to prevent redundant results or errors. +* When working with large arrays, it’s important to be mindful of memory usage, as they can consume a lot of space. To optimize, try to minimize the space complexity by using more memory-efficient data structures or algorithms. For instance, instead of storing a full array of values, consider using a *sliding window* or *in-place modifications* to avoid extra memory allocation. Additionally, analyze the space complexity of your solution and check for operations that create large intermediate data structures, which can lead to excessive memory consumption. In constrained environments, tools like memory profiling or checking the space usage of your program (e.g., using Python’s `sys.getsizeof()`) can help you identify areas for improvement. +* When using dynamic arrays, it’s helpful to allow automatic resizing, which lets the array expand or shrink based on the data size. This avoids the need for manual memory management and improves flexibility. +* Resizing arrays frequently can be costly in terms of time complexity. A more efficient approach is to resize the array exponentially, such as doubling its size, rather than resizing it by a fixed amount each time. +* To avoid unnecessary memory usage, it's important to pass arrays by reference (or using pointers in some languages) when possible, instead of copying the entire array for each function call. 
+* For arrays with many zero or null values, using sparse arrays or hash maps can be useful. This allows you to store only non-zero values, saving memory when dealing with large arrays that contain mostly empty data. +* When dealing with multi-dimensional arrays, flattening them into a one-dimensional array can make it easier to perform operations, but be aware that this can temporarily increase memory usage. +* To improve performance, accessing memory in contiguous blocks is important. Random access patterns may lead to cache misses, which can slow down operations, so try to access array elements sequentially when possible. +* The `bisect` module helps maintain sorted order in a list by finding the appropriate index for inserting an element or by performing binary searches. +* Use `bisect.insort()` to insert elements into a sorted list while keeping it ordered. +* Use `bisect.bisect_left()` or `bisect.bisect_right()` to find the index where an element should be inserted. +* Don’t use on unsorted lists or when frequent updates are needed, as maintaining order can be inefficient. +* Binary search operations like `bisect_left()` are `O(log n)`, but `insort()` can be `O(n)` due to shifting elements. #### Working with Strings From b3fbabbcdfdbfc75873565fce4fa58fdce0be1f2 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 9 Aug 2025 22:15:47 +0200 Subject: [PATCH 14/48] Update brain_teasers.md --- notes/brain_teasers.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notes/brain_teasers.md b/notes/brain_teasers.md index bc7eed4..4b65655 100644 --- a/notes/brain_teasers.md +++ b/notes/brain_teasers.md @@ -6,7 +6,7 @@ todo: ## Solving Programming Brain Teasers -Programming puzzles and brain teasers are excellent tools for testing and enhancing your coding abilities and problem-solving skills. They are frequently used in technical interviews to evaluate a candidate's logical thinking, analytical prowess, and ability to devise efficient algorithms. To excel in these scenarios, it is recommended to master effective strategies for approaching and solving these problems. +Programming puzzles and brain teasers are great ways to improve your coding and problem-solving skills. They're commonly used in technical interviews to assess a candidate's logical thinking, analytical ability, and skill in creating efficient solutions. To do well in these situations, it's important to learn and apply effective strategies for solving these problems. ### General Strategies From f51eafe4010fb019382cb4a4f055c476792e6837 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Fri, 29 Aug 2025 17:43:49 +0200 Subject: [PATCH 15/48] Update searching.md --- notes/searching.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/notes/searching.md b/notes/searching.md index 5f69f29..2a31fcb 100644 --- a/notes/searching.md +++ b/notes/searching.md @@ -1,3 +1,7 @@ +## Searching + +Searching refers to the process of finding the location of a specific element within a collection of data, such as an array, list, tree, or graph. It underpins many applications, from databases and information retrieval to routing and artificial intelligence. Depending on the organization of the data, different search techniques are used—such as linear search for unsorted data, binary search for sorted data, and more advanced approaches like hash-based lookup or tree traversals for hierarchical structures. 
Efficient searching is important because it directly impacts the performance and scalability of software systems. + ### 1. **Linear & Sequential Search** - **Linear Search (Sequential Search)** - Checks each element one by one. From 3a79bd22eda383cb724946c3ee01efc10e695d25 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Fri, 29 Aug 2025 19:36:24 +0200 Subject: [PATCH 16/48] Update searching.md --- notes/searching.md | 940 ++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 879 insertions(+), 61 deletions(-) diff --git a/notes/searching.md b/notes/searching.md index 2a31fcb..df2398a 100644 --- a/notes/searching.md +++ b/notes/searching.md @@ -2,64 +2,882 @@ Searching refers to the process of finding the location of a specific element within a collection of data, such as an array, list, tree, or graph. It underpins many applications, from databases and information retrieval to routing and artificial intelligence. Depending on the organization of the data, different search techniques are used—such as linear search for unsorted data, binary search for sorted data, and more advanced approaches like hash-based lookup or tree traversals for hierarchical structures. Efficient searching is important because it directly impacts the performance and scalability of software systems. -### 1. **Linear & Sequential Search** -- **Linear Search (Sequential Search)** - - Checks each element one by one. -- **Sentinel Linear Search** - - Uses a sentinel value to reduce comparisons. - -### 2. **Divide & Conquer Search** -- **Binary Search** - - Efficient on sorted arrays (O(log n)). -- **Ternary Search** - - Divides array into three parts instead of two. -- **Jump Search** - - Jumps ahead by fixed steps, then does linear search. -- **Exponential Search** - - Finds range with exponential jumps, then does binary search. -- **Interpolation Search** - - Estimates position based on value distribution. - -### 3. **Tree-based Search** -- **Binary Search Tree (BST) Search** - - Search in a binary search tree. -- **AVL Tree Search / Red-Black Tree Search** - - Balanced BSTs for faster search. -- **B-Tree / B+ Tree Search** - - Used in databases and file systems. -- **Trie (Prefix Tree) Search** - - Efficient for searching words/prefixes. - -### 4. **Hash-based Search** -- **Hash Table Search** - - Uses hash functions for constant time lookups. -- **Open Addressing (Linear/Quadratic Probing, Double Hashing)** - - Methods for collision resolution. -- **Separate Chaining** - - Uses linked lists for collisions. -- **Cuckoo Hashing** - - Multiple hash functions to resolve collisions. - -### 5. **Probabilistic & Approximate Search** -- **Bloom Filter** - - Probabilistic; fast membership test with false positives. -- **Counting Bloom Filter** - - Supports deletion. -- **Cuckoo Filter** - - Similar to Bloom filters but supports deletion. - -### 6. **Graph-based Search Algorithms** -- **Breadth-First Search (BFS)** - - Explores neighbors first in unweighted graphs. -- **Depth-First Search (DFS)** - - Explores as far as possible along branches. -- **A* Search** - - Heuristic-based best-first search. -- **Bidirectional Search** - - Runs two simultaneous searches from source and target. - -### 7. 
**String Search Algorithms** -- **Naive String Search** -- **Knuth-Morris-Pratt (KMP)** -- **Boyer-Moore** -- **Rabin-Karp** +### Linear & Sequential Search + +#### Linear Search + +Scan the list from left to right, comparing the target with each element until you either find a match (return its index) or finish the list (report “not found”). + +**Example inputs and outputs** + +* Input: list = \[7, 3, 5, 2, 9], target = 5 → Output: index = 2 +* Input: list = \[4, 4, 4], target = 4 → Output: index = 0 (first match) +* Input: list = \[10, 20, 30], target = 25 → Output: not found + +**How it works** + +Start at index 0, compare, move right; stop on first equal or after the last element. + +``` +Indexes: 0 1 2 3 4 +List: [ 7 ][ 3 ][ 5 ][ 2 ][ 9 ] +Target: 5 + +Pass 1: pointer at 0 → compare 7 vs 5 → no + v +Indexes: 0 1 2 3 4 + | +List: 7 3 5 2 9 + +Pass 2: pointer at 1 → compare 3 vs 5 → no + v +Indexes: 0 1 2 3 4 + | +List: 7 3 5 2 9 + +Pass 3: pointer at 2 → compare 5 vs 5 → YES → return 2 + v +Indexes: 0 1 2 3 4 + | +List: 7 3 5 2 9 +``` + +**Worst case (not found):** you compare every element and then stop. + +``` +Indexes: 0 1 2 +List: [ 1 ][ 2 ][ 3 ] +Target: 9 + +Checks: (1≠9) → (2≠9) → (3≠9) → end → not found +``` + +* Works on any list; no sorting or structure required. +* Returns the first index containing the target; if absent, reports “not found.” +* Time: O(n) comparisons on average and in the worst case; best case O(1) if the first element matches. +* Space: O(1) extra memory. +* Naturally finds the earliest occurrence when duplicates exist. +* Simple and dependable for short or unsorted data. +* Assumes 0-based indexing in these notes. + +### Sentinel Linear Search + +Place one copy of the target at the very end as a “sentinel” so the scan can run without checking bounds each step; afterward, decide whether the match was inside the original list or only at the sentinel position. + +**Example inputs and outputs** + +* Input: list = \[12, 8, 6, 15], target = 6 → Output: index = 2 +* Input: list = \[2, 4, 6, 8], target = 5 → Output: not found (only the sentinel matched) + +**How it works** + +Put the target at one extra slot at the end so the loop is guaranteed to stop on a match; afterward, check whether the match was inside the original range. + +``` +Original length n = 5 +Before: [ 4 ][ 9 ][ 1 ][ 7 ][ 6 ] +Target: 11 + +Add sentinel (extra slot): + [ 4 ][ 9 ][ 1 ][ 7 ][ 6 ][ 11 ] +Indexes: 0 1 2 3 4 5 ← sentinel position + +Scan left→right until you see 11: + +Step 1: 4 ≠ 11 + ^ +Step 2: 9 ≠ 11 + ^ +Step 3: 1 ≠ 11 + ^ +Step 4: 7 ≠ 11 + ^ +Step 5: 6 ≠ 11 + ^ +Step 6: 11 (match at index 5, which is the sentinel) + +Because the first match is at index 5 (the sentinel position), the target was not in the original indexes 0..4 → report “not found”. +``` + +**When the target exists inside the list:** + +``` +List: [ 12 ][ 8 ][ 6 ][ 15 ] n = 4 +Target: 6 +With sentinel: [ 12 ][ 8 ][ 6 ][ 15 ][ 6 ] + +Scan: 12≠6 → 8≠6 → 6=6 (index 2 < n) → real match at 2 +``` + +* Removes the per-iteration “have we reached the end?” check; the sentinel guarantees termination. +* Same O(n) time in big-O terms, but slightly fewer comparisons in tight loops. +* Space: needs one extra slot; if you cannot append, you can temporarily overwrite the last element (store it, write the target, then restore it). +* After scanning, decide by index: if the first match index < original length, it’s a real match; otherwise, it’s only the sentinel. 
+* Use when micro-optimizing linear scans over arrays where bounds checks are costly. +* Behavior with duplicates: still returns the first occurrence within the original range. +* Be careful to restore any overwritten last element if you used the in-place variant. + +### Divide & Conquer Search + +#### Binary Search + +On a sorted array, repeatedly halve the search interval by comparing the target to the middle element until found or the interval is empty. + +**Example inputs and outputs** + +* Input: A = \[2, 5, 8, 12, 16, 23, 38], target = 16 → Output: index = 4 +* Input: A = \[1, 3, 3, 3, 9], target = 3 → Output: index = 2 (any valid match; first/last requires a slight variant) +* Input: A = \[10, 20, 30, 40], target = 35 → Output: not found + +**How it works** + +``` +Sorted A: [ 2 ][ 5 ][ 8 ][12 ][16 ][23 ][38 ] +Indexes: 0 1 2 3 4 5 6 +Target: 16 + +1) low=0, high=6 → mid=(0+6)//2=3 + A[3]=12 < 16 → discard left half up to mid, keep [mid+1..high] + [ 2 ][ 5 ][ 8 ] |[12 ]| [16 ][23 ][38 ] + low=4 high=6 + +2) low=4, high=6 → mid=(4+6)//2=5 + A[5]=23 > 16 → discard right half after mid, keep [low..mid-1] + [16 ][23 ]|[38 ] + low=4 high=4 + +3) low=4, high=4 → mid=4 + A[4]=16 = target → FOUND at index 4 +``` + +* Requires a sorted array (assume ascending here). +* Time: O(log n); Space: O(1) iterative. +* Returns any one matching index by default; “first/last occurrence” is a small, common refinement. +* Robust, cache-friendly, and a building block for many higher-level searches. +* Beware of off-by-one errors when shrinking bounds. + +#### Ternary Search +Like binary, but splits the current interval into three parts using two midpoints; used mainly for unimodal functions or very specific array cases. + +**Example inputs and outputs** + +* Input: A = \[1, 4, 7, 9, 12, 15], target = 9 → Output: index = 3 +* Input: A = \[2, 6, 10, 14], target = 5 → Output: not found + +**How it works** + +``` +Sorted A: [ 1 ][ 4 ][ 7 ][ 9 ][12 ][15 ] +Indexes: 0 1 2 3 4 5 +Target: 9 + +1) low=0, high=5 + m1 = low + (high-low)//3 = 0 + 5//3 = 1 + m2 = high - (high-low)//3 = 5 - 5//3 = 3 + + Compare A[m1]=4, A[m2]=9 with target=9: + + A[m2]=9 = target → FOUND at index 3 + +(If no immediate match:) +- If target < A[m1], keep [low..m1-1] +- Else if target > A[m2], keep [m2+1..high] +- Else keep [m1+1..m2-1] and repeat +``` + +* Also assumes a sorted array. +* For discrete sorted arrays, it does **not** beat binary search asymptotically; it performs more comparisons per step. +* Most valuable for searching the extremum of a **unimodal function** on a continuous domain; for arrays, prefer binary search. +* Complexity: O(log n) steps but with larger constant factors than binary search. + +#### Jump Search +On a sorted array, jump ahead in fixed block sizes to find the block that may contain the target, then do a linear scan inside that block. 
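Before the worked example, here is a minimal Python sketch of the block-jumping idea, assuming a sorted list and a block size of roughly √n; the function name `jump_search` and the √n block choice are illustrative, not part of the original notes.

```python
import math

def jump_search(a, target):
    """Return the index of target in the sorted list a, or -1 if absent."""
    n = len(a)
    if n == 0:
        return -1
    step = max(1, math.isqrt(n))   # block size ≈ √n
    prev = 0
    # Probe the last element of each block until it reaches (or passes) the target.
    while prev < n and a[min(prev + step, n) - 1] < target:
        prev += step
    # Linear scan inside the single candidate block.
    for i in range(prev, min(prev + step, n)):
        if a[i] == target:
            return i
    return -1

print(jump_search([1, 4, 9, 16, 25, 36, 49], 25))  # 4
print(jump_search([3, 8, 15, 20, 22, 27], 21))     # -1
```

The worked example that follows traces the same probes by hand.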
+ +**Example inputs and outputs** + +* Input: A = \[1, 4, 9, 16, 25, 36, 49], target = 25, jump = ⌊√7⌋=2 → Output: index = 4 +* Input: A = \[3, 8, 15, 20, 22, 27], target = 21, jump = 2 → Output: not found + +**How it works** + +``` +Sorted A: [ 1 ][ 4 ][ 9 ][16 ][25 ][36 ][49 ] +Indexes: 0 1 2 3 4 5 6 +Target: 25 +Choose block size ≈ √n → here n=7 → jump=2 + +Jumps (probe at 0,2,4,6 until A[probe] ≥ target): +- probe=0 → A[0]=1 (<25) → next probe=2 +- probe=2 → A[2]=9 (<25) → next probe=4 +- probe=4 → A[4]=25 (≥25) → target must be in block (2..4] + +Linear scan inside last block (indexes 3..4): +- i=3 → A[3]=16 (<25) +- i=4 → A[4]=25 (=) FOUND at index 4 +``` + +* Works on sorted arrays; pick jump ≈ √n for good balance. +* Time: O(√n) comparisons on average; Space: O(1). +* Useful when random access is cheap but full binary search isn’t desirable (e.g., limited CPU branch prediction, or when scanning in blocks is cache-friendly). +* Degrades gracefully to “scan block then stop.” + +#### Exponential Search + +On a sorted array, grow the right boundary exponentially (1, 2, 4, 8, …) to find a containing range, then finish with binary search in that range. + +**Example inputs and outputs** + +* Input: A = \[2, 3, 5, 7, 11, 13, 17, 19, 23], target = 19 → Output: index = 7 +* Input: A = \[10, 20, 30, 40, 50], target = 12 → Output: not found + +**How it works** + +``` +Sorted A: [ 2 ][ 3 ][ 5 ][ 7 ][11 ][13 ][17 ][19 ][23 ] +Indexes: 0 1 2 3 4 5 6 7 8 +Target: 19 + +1) Find range by exponential jumps (check A[1], A[2], A[4], A[8], ...): + - A[1]=3 ≤ 19 + - A[2]=5 ≤ 19 + - A[4]=11 ≤ 19 + - A[8]=23 > 19 → stop; range is (prev_power_of_two..8] → (4..8] + +2) Do binary search on A[5..8]: + Subarray: [13 ][17 ][19 ][23 ] + Indices: 5 6 7 8 + Binary search finds A[7]=19 → FOUND at index 7 +``` + +* Great when the target is likely to be near the beginning or when the array is **unbounded**/**stream-like** but sorted (you can probe indices safely). +* Time: O(log p) to find the range where p is the final bound, plus O(log p) for binary search → overall O(log p). +* Space: O(1). +* Often paired with data sources where you can test “is index i valid?” while doubling i. + +#### Interpolation Search + +On a sorted (roughly uniformly distributed) array, estimate the likely position using the values themselves and probe there; repeat on the narrowed side. + +**Example inputs and outputs** + +* Input: A = \[10, 20, 30, 40, 50, 60, 70], target = 55 → Output: not found (probes near index 4–5) +* Input: A = \[5, 15, 25, 35, 45, 55, 65], target = 45 → Output: index = 4 +* Input: A = \[1, 1000, 1001, 1002], target = 2 → Output: not found (bad distribution for interpolation) + +**How it works** + +``` +Assumes A is sorted and values are roughly uniform. + +Idea: "Guess" the likely index by linearly interpolating the target’s value +between A[low] and A[high]: + +Estimated position: +pos ≈ low + (high - low) * (target - A[low]) / (A[high] - A[low]) + +Example: +A = [10, 20, 30, 40, 50, 60, 70], target = 45 +low=0 (A[0]=10), high=6 (A[6]=70) + +pos ≈ 0 + (6-0) * (45-10)/(70-10) = 6 * 35/60 ≈ 3.5 → probe index 3 or 4 + +Probe at 3: A[3]=40 (<45) → new low=4 +Probe at 4: A[4]=50 (>45) → new high=3 +Now low>high → not found +``` + +* Best on **uniformly distributed** sorted data; expected time O(log log n). +* Worst case can degrade to O(n), especially on skewed or clustered values. +* Space: O(1). +* Very fast when value-to-index mapping is close to linear (e.g., near-uniform numeric keys). 
+* Requires careful handling when A\[high] = A\[low] (avoid division by zero); also sensitive to integer rounding in discrete arrays. + +### Hash-based Search +* **Separate chaining:** Easiest deletions, steady O(1) with α≈1; good when memory fragmentation isn’t a concern. +* **Open addressing (double hashing):** Best probe quality among OA variants; great cache locality; keep α < 0.8. +* **Open addressing (linear/quadratic):** Simple and fast at low α; watch clustering and tombstones. +* **Cuckoo hashing:** Tiny and predictable lookup cost; inserts costlier and may rehash; great for read-heavy workloads. +* In all cases: pick strong hash functions and resize early to keep α healthy. + +#### Hash Table Search +Map a key to an array index with a hash function; look at that bucket to find the key, giving expected O(1) lookups under a good hash and healthy load factor. + +**Example inputs and outputs** + +* Table size m = 7; keys stored = {10, 24, 31}; target = 24 → Output: “found (bucket 3)” +* Same table; target = 18 → Output: “not found” + +**How it works** + +``` +Concept: +key --hash--> index in array --search/compare--> match? + +Array (buckets/indexes 0..6): +Idx: 0 1 2 3 4 5 6 + [ ][ ][ ][ ][ ][ ][ ] + +Example mapping with h(k)=k mod 7, stored keys {10, 24, 31}: +10 -> 3 +24 -> 3 (collides with 10; resolved by the chosen strategy) +31 -> 3 (collides again) + +Search(24): +1) Compute index = h(24) = 3 +2) Inspect bucket 3 (and possibly its collision path) +3) If 24 is found along that path → found; otherwise → not found +``` + +* Quality hash + low load factor (α = n/m) ⇒ expected O(1) search/insert/delete. +* Collisions are inevitable; the collision strategy (open addressing vs. chaining vs. cuckoo) dictates actual steps. +* Rehashing (growing and re-inserting) is used to keep α under control. +* Uniform hashing assumption underpins the O(1) expectation; adversarial keys or poor hashes can degrade performance. + +#### Open Addressing — Linear Probing + +Keep everything in one array; on collision, probe alternative positions in a deterministic sequence until an empty slot or the key is found. + +**Example inputs and outputs** + +* m = 10; stored = {12, 22, 32}; target = 22 → Output: “found (index 3)” +* Same table; target = 42 → Output: “not found” + +**How it works** + +``` +h(k) = k mod 10, probe sequence: i, i+1, i+2, ... (wrap around) + +Insertions already done: +12 -> h=2 → put at 2 +22 -> h=2 (occupied) → try 3 → put at 3 +32 -> h=2 (occupied), 3 (occupied) → try 4 → put at 4 + +Array: +Idx: 0 1 2 3 4 5 6 7 8 9 + [ ][ ][12][22][32][ ][ ][ ][ ][ ] + +Search(22): +- Start at h(22)=2 → 12 ≠ 22 +- Next 3 → 22 = 22 → FOUND at index 3 + +Search(42): +- Start at 2 → 12 ≠ 42 +- 3 → 22 ≠ 42 +- 4 → 32 ≠ 42 +- 5 → empty → stop → NOT FOUND +``` + +* Simple and cache-friendly; clusters form (“primary clustering”) which can slow probes. +* Deletion uses **tombstones** to keep probe chains intact. +* Performance depends sharply on load factor; keep α well below 1 (e.g., α ≤ 0.7). +* Expected search \~ O(1) at low α; degrades as clusters grow. + +#### Open Addressing — Quadratic Probing + +**Example inputs and outputs** + +* m = 11 (prime); stored = {22, 33, 44}; target = 33 → Output: “found (index 4)” +* Same table; target = 55 → Output: “not found” + +**How it works** + +``` +h(k) = k mod 11 +Probe offsets: +1^2, +2^2, +3^2, ... (i.e., +1, +4, +9, +16≡+5, +25≡+3, ... mod 11) + +Insert: +22 -> h=0 → put at 0 +33 -> h=0 (occupied) → 0+1^2=1 → put at 1? 
(showing a typical sequence) +(For clarity we'll place 33 at the first free among 0,1,4,9,... Suppose 1 is free.) +44 -> h=0 (occupied) → try 1 (occupied) → try 4 → put at 4 + +Array (one possible state): +Idx: 0 1 2 3 4 5 6 7 8 9 10 + [22][33][ ][ ][44][ ][ ][ ][ ][ ][ ] + +Search(33): +- h=0 → 22 ≠ 33 +- 0+1^2=1 → 33 = 33 → FOUND at index 1 + +Search(55): +- h=0 → 22 ≠ 55 +- +1^2=1 → 33 ≠ 55 +- +2^2=4 → 44 ≠ 55 +- +3^2=9 → empty → NOT FOUND +``` + +* Reduces primary clustering but can exhibit **secondary clustering** (keys with same h(k) follow same probe squares). +* Table size choice matters (often prime); ensure the probe sequence can reach many slots. +* Keep α modest; deletion still needs tombstones. +* Expected O(1) at healthy α; simpler than double hashing. + +#### Open Addressing — Double Hashing + +**Example inputs and outputs** + +* m = 11; h₁(k) = k mod 11; h₂(k) = 1 + (k mod 10) +* Stored = {22, 33, 44}; target = 33 → Output: “found (index 4)” +* Same table; target = 55 → Output: “not found” + +**How it works** + +``` +Probe sequence: i, i+h₂, i+2·h₂, i+3·h₂, ... (all mod m) + +Insert: +22: h₁=0 → put at 0 +33: h₁=0 (occupied), h₂=1+(33 mod 10)=4 + Probes: 0, 4 → put at 4 +44: h₁=0 (occupied), h₂=1+(44 mod 10)=5 + Probes: 0, 5 → put at 5 + +Array: +Idx: 0 1 2 3 4 5 6 7 8 9 10 + [22][ ][ ][ ][33][44][ ][ ][ ][ ][ ] + +Search(33): +- Start 0 → 22 ≠ 33 +- Next 0+4=4 → 33 → FOUND + +Search(55): +- h₁=0, h₂=1+(55 mod 10)=6 +- Probes: 0 (22), 6 (empty) → NOT FOUND +``` + +* Minimizes clustering; probe steps depend on the key. +* Choose h₂ so it’s **non-zero** and relatively prime to m, ensuring a full cycle. +* Excellent performance at higher α than linear/quadratic, but still sensitive if α → 1. +* Deletion needs tombstones; implementation slightly more complex. + +#### Separate Chaining + +Each array cell holds a small container (e.g., a linked list); colliding keys live together in that bucket. + +**Example inputs and outputs** + +* m = 5; buckets hold lists +* Stored = {12, 22, 7, 3, 14}; target = 22 → Output: “found (bucket 2, position 2)” +* Same table; target = 9 → Output: “not found” + +**How it works** + +``` +h(k) = k mod 5 +Buckets store small lists (linked lists or dynamic arrays) + +Idx: 0 1 2 3 4 + [ ] [ ] [ 12 → 22 → 7 ] [ 3 ] [ 14 ] + +Search(22): +- Compute bucket b = h(22) = 2 +- Linearly scan bucket 2 → find 22 + +Search(9): +- b = h(9) = 4 +- Bucket 4: [14] → 9 not present → NOT FOUND +``` + +* Simple deletes (remove from a bucket) and no tombstones. +* Expected O(1 + α) time; with good hashing and α kept near/below 1, bucket lengths stay tiny. +* Memory overhead for bucket nodes; cache locality worse than open addressing. +* Buckets can use **ordered lists** or **small vectors** to accelerate scans. +* Rehashing still needed as n grows; α = n/m controls performance. + +#### Cuckoo Hashing +Keep two (or more) hash positions per key; insert by “kicking out” occupants to their alternate home so lookups check only a couple of places. 
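Before the worked trace, here is a minimal Python sketch of the two-table scheme, reusing the toy hashes from the example below (h₁(k) = k mod 5, h₂(k) = 1 + (k mod 4)); the class and method names are illustrative, and a real implementation would use stronger hash functions and rehash or resize when it detects an eviction cycle.

```python
class CuckooHash:
    """Two-table cuckoo hashing for small integer keys (illustrative sketch)."""

    def __init__(self, m=5, max_kicks=8):
        self.m = m
        self.max_kicks = max_kicks
        self.t1 = [None] * m
        self.t2 = [None] * m

    def _h1(self, key):
        return key % self.m            # h1(k) = k mod 5 in the trace below

    def _h2(self, key):
        return 1 + key % (self.m - 1)  # h2(k) = 1 + (k mod 4) in the trace below

    def contains(self, key):
        # Lookups probe at most two slots: one per table.
        return self.t1[self._h1(key)] == key or self.t2[self._h2(key)] == key

    def insert(self, key):
        if self.contains(key):
            return True
        cur, table = key, 1
        for _ in range(self.max_kicks):
            if table == 1:
                i = self._h1(cur)
                self.t1[i], cur = cur, self.t1[i]   # place cur, evict any old occupant
            else:
                i = self._h2(cur)
                self.t2[i], cur = cur, self.t2[i]
            if cur is None:                         # the slot was free: done
                return True
            table = 2 if table == 1 else 1          # reinsert the evictee in the other table
        return False   # likely eviction cycle: a full implementation would rehash/resize


ch = CuckooHash()
for key in (10, 15, 20, 25):
    ch.insert(key)
print(ch.contains(15), ch.contains(7))   # True False
```

Inserting 10, 15, 20, 25 with this sketch ends in the same slots as the trace below, and `contains` never probes more than two positions.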
+ +**Example inputs and outputs** + +* Two tables T₁ and T₂ (same size m = 5) with two hashes h₁, h₂ +* Inserted keys produce relocations; target = 15 → Output: “found in T₂ at index 4” +* If insertion loops (cycle), rebuild with new hash functions (rehash) + +**How it works** + +``` +Example hashes: +h₁(k) = k mod 5 +h₂(k) = 1 + (k mod 4) + +Start empty T₁ and T₂ (indexes 0..4): + +T₁: [ ][ ][ ][ ][ ] +T₂: [ ][ ][ ][ ][ ] + +Insert 10: +- Place at T₁[h₁(10)=0] = 0 + +T₁: [10 ][ ][ ][ ][ ] +T₂: [ ][ ][ ][ ][ ] + +Insert 15: +- T₁[h₁(15)=0] occupied by 10 → cuckoo step: + Evict 10; put 15 at T₁[0] + Reinsert evicted 10 at its alternate home T₂[h₂(10)=1+(10 mod 4)=3] + +T₁: [15 ][ ][ ][ ][ ] +T₂: [ ][ ][ ][10 ][ ] + +Insert 20: +- T₁[h₁(20)=0] occupied by 15 → evict 15; place 20 at T₁[0] + Reinsert 15 at T₂[h₂(15)=1+(15 mod 4)=4] + +T₁: [20 ][ ][ ][ ][ ] +T₂: [ ][ ][ ][10 ][15 ] + +Insert 25: +- T₁[h₁(25)=0] occupied by 20 → evict 20; place 25 at T₁[0] + Reinsert 20 at T₂[h₂(20)=1+(20 mod 4)=1] + +T₁: [25 ][ ][ ][ ][ ] +T₂: [ ][20 ][ ][10 ][15 ] + +Search(15): +- Check T₁[h₁(15)=0] → 25 ≠ 15 +- Check T₂[h₂(15)=4] → 15 → FOUND +``` + +* Lookups probe at **most two places** (with two hashes) → excellent constant factors. +* Inserts may trigger a chain of evictions; detect cycles and **rehash** with new functions. +* High load factors achievable (e.g., \~0.5–0.9 depending on variant and number of hashes/tables). +* Deletions are easy (remove key); no tombstones, but ensure invariants remain. +* Sensitive to hash quality; poor hashes increase cycle risk. + +### Probabilistic & Approximate Search + +#### Bloom Filter +Space-efficient structure for fast membership tests; answers **“maybe present”** or **“definitely not present”** with a tunable false-positive rate and no false negatives (if built correctly, without deletions). + +**Example inputs and outputs** + +* Setup: m = 16 bits, k = 3 hash functions (h₁, h₂, h₃). +* Insert: {"cat", "dog"} +* `contains("cat")` → **maybe present** (actual member) +* `contains("cow")` → **definitely not present** (one probed bit is 0) +* `contains("eel")` → **maybe present** (all probed bits happen to be 1 → **false positive**) + +**How it works** + +``` +Bit array (m = 16), initially all zeros: + +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 + [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] + +INSERT "cat": +h1(cat)=3, h2(cat)=7, h3(cat)=12 → set those bits to 1 + + [0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0] + ^ ^ ^ + 3 7 12 + +INSERT "dog": +h1(dog)=1, h2(dog)=7, h3(dog)=9 → set 1,7,9 to 1 (7 already 1) + + [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0] + ^ ^ ^ + +QUERY "cow": +h1(cow)=1 (bit=1), h2(cow)=3 (bit=1), h3(cow)=6 (bit=0) → at least one zero +→ Result: DEFINITELY NOT PRESENT + +QUERY "eel": +h1(eel)=7 (1), h2(eel)=9 (1), h3(eel)=12 (1) → all ones +→ Result: MAYBE PRESENT (could be a FALSE POSITIVE) +``` + +* Answers: **maybe present** / **definitely not present**; never false negatives (without deletions). +* False-positive rate is tunable via bit-array size **m**, number of hashes **k**, and items **n**; more space & good **k** → lower FPR. +* Time: O(k) per insert/lookup; Space: \~m bits. +* No deletions in the basic form; duplicates are harmless (idempotent sets). +* Union = bitwise OR; intersection = bitwise AND (for same m,k,hashes). +* Choose independent, well-mixed hash functions to avoid correlated bits. + +#### Counting Bloom Filter +Bloom filter variant that keeps a small counter per bit so you can **delete** by decrementing; still probabilistic and may have false positives. 
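Here is a minimal Python sketch of the counter-based variant, shown before the worked example; deriving the k probe indices by salting Python's built-in `hash` is only a stand-in for real, independent hash functions.

```python
class CountingBloomFilter:
    """m small counters, k probe indices per item (illustrative sketch)."""

    def __init__(self, m=12, k=3):
        self.m = m
        self.k = k
        self.counters = [0] * m

    def _probes(self, item):
        # Stand-in hash scheme: salt Python's built-in hash with the probe number.
        # (Per-run randomization of str hashes is fine for a demo, not for production.)
        return [hash((salt, item)) % self.m for salt in range(self.k)]

    def add(self, item):
        for idx in self._probes(item):
            self.counters[idx] += 1        # real designs use small, saturating counters

    def remove(self, item):
        # Only safe for items that were actually added; guard against underflow.
        if self.contains(item):
            for idx in self._probes(item):
                self.counters[idx] -= 1

    def contains(self, item):
        # "Maybe present" when every probed counter is non-zero; a zero anywhere
        # means "definitely not present". False positives remain possible.
        return all(self.counters[idx] > 0 for idx in self._probes(item))


cbf = CountingBloomFilter()
cbf.add("alpha")
cbf.add("beta")
cbf.remove("alpha")
print(cbf.contains("alpha"), cbf.contains("beta"))   # typically: False True
```

As with the plain Bloom filter, `contains` can still report a false positive; deletion is safe only for items that were genuinely added.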
+ +**Example inputs and outputs** + +* Setup: m = 12 counters (each 2–4 bits), k = 3 hash functions. +* Insert: {"alpha", "beta"} +* Delete: remove "alpha". +* `contains("alpha")` after deletion → **definitely not present** (one counter back to 0) +* `contains("beta")` → **maybe present** +* `contains("gamma")` → **definitely not present** (some counter = 0) + +**How it works** + +``` +Counters (not bits). Each cell stores a small integer (0..15 if 4-bit). + +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 + [0 0 0 0 0 0 0 0 0 0 0 0] + +INSERT "alpha": h: {2, 5, 9} → increment those counters + [0 0 1 0 0 1 0 0 0 1 0 0] + +INSERT "beta": h: {3, 5, 11} → increment + [0 0 1 1 0 2 0 0 0 1 0 1] + +LOOKUP "beta": counters at {3,5,11} = {1,2,1} > 0 → MAYBE PRESENT + +DELETE "alpha": decrement {2,5,9} + [0 0 0 1 0 1 0 0 0 0 0 1] + +LOOKUP "alpha": counters at {2,5,9} = {0,1,0} +→ has a zero → DEFINITELY NOT PRESENT +``` + +* Supports **deletion** by decrementing counters; insertion increments. +* Still probabilistic: may return false positives; avoids false negatives **if counters never underflow** and hashes are consistent. +* Space: more than Bloom (a few bits per counter instead of 1). +* Watch for counter **saturation** (caps at max value) and **underflow** (don’t decrement below 0). +* Good for dynamic sets with frequent inserts and deletes. + +##### Cuckoo Filter +Hash-table–style filter that stores short **fingerprints** in two possible buckets; supports **insert, lookup, delete** with low false-positive rates and high load factors. + +**Example inputs and outputs** + +* Setup: bucket array with **b = 8** buckets, **bucket size = 2**, **fingerprint = 8 bits**. +* Insert: {"cat", "dog", "eel"} (each stored as short fingerprints). +* Query: + + * `contains("cat")` → **maybe present** (fingerprint found in one of its two buckets) + * `contains("fox")` → **definitely not present** (fingerprint absent from both) +* Delete: `remove("dog")` → fingerprint removed from its bucket. + +**How it works** + +``` +Each key x → short fingerprint f = FP(x) +Two candidate buckets: +i1 = H(x) mod b +i2 = i1 XOR H(f) mod b (so moving f between i1 and i2 preserves alternation) + +Buckets (capacity 2 each), showing fingerprints as hex bytes: + +Start (empty): +[0]: [ -- , -- ] [1]: [ -- , -- ] [2]: [ -- , -- ] [3]: [ -- , -- ] +[4]: [ -- , -- ] [5]: [ -- , -- ] [6]: [ -- , -- ] [7]: [ -- , -- ] + +INSERT "cat": f=0xA7, i1=1, i2=1 XOR H(0xA7)=5 +- Bucket 1 has space → place 0xA7 in [1] + +[1]: [ A7 , -- ] + +INSERT "dog": f=0x3C, i1=5, i2=5 XOR H(0x3C)=2 +- Bucket 5 has space → place 0x3C in [5] + +[5]: [ 3C , -- ] + +INSERT "eel": f=0xD2, i1=1, i2=1 XOR H(0xD2)=4 +- Bucket 1 has one free slot → place 0xD2 in [1] + +[1]: [ A7 , D2 ] + +LOOKUP "cat": +- Compute f=0xA7, check buckets 1 and 5 → found in bucket 1 → MAYBE PRESENT + +LOOKUP "fox": +- Compute f=0x9B, buckets say 0 and 7 → fingerprint not in [0] or [7] +→ DEFINITELY NOT PRESENT + +If an insertion finds both buckets full: +- Evict one resident fingerprint (“cuckoo kick”), move it to its alternate bucket, + possibly triggering a chain; if a loop is detected, resize/rehash. +``` + +* Stores **fingerprints**, not full keys; answers **maybe present** / **definitely not present**. +* Supports **deletion** by removing a matching fingerprint from either bucket. +* Very high load factors (often 90%+ with small buckets) and excellent cache locality. +* False-positive rate controlled by fingerprint length (more bits → lower FPR). 
+* Insertions can trigger **eviction chains**; worst case requires a **rehash/resize**. +* Two buckets per item (or more in variants); lookups check a tiny, fixed set of places. + +### String Search Algorithms + +* **KMP:** Best all-rounder for guaranteed **O(n + m)** and tiny memory. +* **Boyer–Moore:** Fastest in practice on long patterns / large alphabets due to big skips. +* **Rabin–Karp:** Great for **many patterns** or streaming; hashing enables batched checks. +* **Naive:** Fine for tiny inputs or as a baseline; simplest to reason about. + +#### Naive String Search + +Slide the pattern one position at a time over the text; at each shift compare characters left-to-right until a mismatch or a full match. + +**Example inputs and outputs** + +* Text: `"abracadabra"`, Pattern: `"abra"` → Output: matches at indices **0** and **7** +* Text: `"aaaaa"`, Pattern: `"aaa"` → Output: matches at indices **0**, **1**, **2** + +**How it works** + +``` +Text (index): 0 1 2 3 4 5 6 7 8 9 10 + a b r a c a d a b r a +Pattern: a b r a + +Shift 0: +a b r a +a b r a ← all match → REPORT 0 + +Shift 1: + a b r a + b r a c ← mismatch at first char → advance by 1 + +Shift 2: + a b r a + r a c a ← mismatch → advance + +Shift 3: + a b r a + a c a d ← mismatch → advance + +Shift 4: + a b r a + c a d a ← mismatch → advance + +Shift 5: + a b r a + a d a b ← mismatch → advance + +Shift 6: + a b r a + d a b r ← mismatch → advance + +Shift 7: + a b r a + a b r a ← all match → REPORT 7 +``` + +* Works anywhere; no preprocessing. +* Time: worst/average **O(n·m)** (text length n, pattern length m). +* Space: **O(1)**. +* Good for very short patterns or tiny inputs; otherwise use KMP/BM/RK. + +#### Knuth–Morris–Pratt (KMP) + +Precompute a table (LPS / prefix-function) for the pattern so that on a mismatch you “jump” the pattern to the longest proper prefix that is also a suffix, avoiding rechecks. + +**Example inputs and outputs** + +* Text: `"ababcabcabababd"`, Pattern: `"ababd"` → Output: match at index **10** +* Text: `"aaaaab"`, Pattern: `"aaab"` → Output: match at index **2** + +**How it works** +``` +1) Precompute LPS (Longest Proper Prefix that is also Suffix) for the pattern. + +Pattern: a b a b d +Index: 0 1 2 3 4 +LPS: 0 0 1 2 0 + +Meaning: at each position, how far can we "fall back" within the pattern itself +to avoid rechecking text characters. + +2) Scan the text with two pointers i (text), j (pattern): + +Text: a b a b c a b c a b a b a b d +Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 +Pattern: a b a b d + +Walkthrough (only key steps shown): + +- i=0..3 match "abab" (j=4 points to 'd'), then + i=4 is 'c' vs pattern[j]='d' → mismatch + → set j = LPS[j-1] = LPS[3] = 2 (jump pattern back to "ab") + (i stays at 4; we do NOT recheck earlier text chars) + +- Continue matching; eventually at i=14, j advances to 5 (pattern length) + → FULL MATCH ends at i=14 → start index = 14 - 5 + 1 = 10 → REPORT 10 +``` + +* Time: **O(n + m)** (preprocessing + scan). +* Space: **O(m)** for LPS table. +* Never moves i backward; avoids redundant comparisons. +* Ideal for repeated searches with the same pattern. +* LPS is also called prefix-function / failure-function. + +#### Boyer–Moore (BM) + +Compare the pattern right-to-left; on a mismatch, skip ahead using bad-character and good-suffix rules so many text characters are never touched. 
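A minimal Python sketch using only the bad-character rule, the same simplification the walkthrough below makes; the good-suffix rule and Galil's optimization are omitted, so this version keeps the typical skips but not the stronger worst-case bound.

```python
def boyer_moore_bad_char(text, pattern):
    """Return all match start indices, using only the bad-character rule."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    last = {ch: i for i, ch in enumerate(pattern)}   # last index of each pattern char
    matches = []
    s = 0                                            # current alignment of the pattern
    while s <= n - m:
        j = m - 1
        while j >= 0 and pattern[j] == text[s + j]:  # compare right-to-left
            j -= 1
        if j < 0:
            matches.append(s)
            s += 1                                   # the good-suffix rule would shift further
        else:
            bad = text[s + j]
            # Align the mismatching text character with its last occurrence in the
            # pattern, or jump past it entirely if it does not occur at all.
            s += max(1, j - last.get(bad, -1))
    return matches


print(boyer_moore_bad_char("HERE IS A SIMPLE EXAMPLE", "EXAMPLE"))  # [17]
print(boyer_moore_bad_char("NEEDLE IN A HAYSTACK", "STACK"))        # [15]
```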
+ +**Example inputs and outputs** + +* Text: `"HERE IS A SIMPLE EXAMPLE"`, Pattern: `"EXAMPLE"` → Output: match at index **17** +* Text: `"NEEDLE IN A HAYSTACK"`, Pattern: `"STACK"` → Output: match at index **15** + +**How it works** + +``` +Idea: align pattern under text, compare RIGHT→LEFT. +On mismatch, shift by the MAX of: + - Bad-character shift: move so the mismatching text char lines up with its last + occurrence in the pattern (or skip past if absent). + - Good-suffix shift: if a suffix matched, align another occurrence of that suffix + (or a prefix) with the text. + +Example (bad-character only shown for brevity): +Text (0..): H E R E _ I S _ A _ S I M P L E _ E X A M P L E +Pattern: E X A M P L E + ↑ compare from here (rightmost) + +1) Align at text index 10..16 ("SIMPLE"): + compare L→R? No, BM compares R→L: + E vs E (ok), L vs L (ok), P vs P (ok), M vs M (ok), A vs A (ok), X vs I (mismatch) + Bad char = 'I' in text; last 'I' in pattern? none → shift pattern PAST 'I' + +2) After shifts, eventually align under "... E X A M P L E": + Compare from right: + E=E, L=L, P=P, M=M, A=A, X=X, E=E → FULL MATCH at index 17 +``` + +* Average case sublinear (often skips large chunks of text). +* Worst case can be **O(n·m)**; with both rules + Galil’s optimization, comparisons can be bounded **O(n + m)**. +* Space: **O(σ + m)** for tables (σ = alphabet size). +* Shines on long patterns over large alphabets (e.g., English text, logs). +* Careful table prep (bad-character & good-suffix) is crucial. + +#### Rabin–Karp (RK) + +Compare rolling hashes of the current text window and the pattern; only if hashes match do a direct character check (to rule out collisions). + +**Example inputs and outputs** + +* Text: `"ABCDABCABCD"`, Pattern: `"ABC"` → Output: matches at indices **0**, **4**, **7** +* Text: `"ABCDE"`, Pattern: `"FG"` → Output: **no match** + +**How it works** + +``` +Pick a base B and modulus M. Compute: +- pattern hash H(P) +- rolling window hash H(T[i..i+m-1]) for each window of length m + +Example windows (conceptual; showing only positions, not numbers): + +Text: A B C D A B C A B C D +Index: 0 1 2 3 4 5 6 7 8 9 10 +Pat: A B C (m = 3) + +Windows & hashes: +[0..2] ABC → hash h0 +[1..3] BCD → hash h1 (derived from h0 by removing 'A', adding 'D') +[2..4] CDA → hash h2 +[3..5] DAB → hash h3 +[4..6] ABC → hash h4 (equals H(P) → verify chars → MATCH at 4) +[5..7] BCA → hash h5 +[6..8] CAB → hash h6 +[7..9] ABC → hash h7 (equals H(P) → verify → MATCH at 7) + +Rolling update (conceptually): +h_next = (B*(h_curr - value(left_char)*B^(m-1)) + value(new_char)) mod M +Only on hash equality do we compare characters to avoid false positives. +``` + +* Expected time **O(n + m)** with a good modulus and low collision rate; worst case **O(n·m)** if many collisions. +* Space: **O(1)** beyond the text/pattern and precomputed powers. +* Excellent for multi-pattern search (compute many pattern hashes, reuse rolling windows). +* Choose modulus to reduce collisions; verify on hash hits to ensure correctness. +* Works naturally on streams/very large texts since it needs only the current window. 
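To close the section, here is a minimal Python sketch of the rolling-hash scan described above; the base and modulus are arbitrary illustrative choices, and characters are re-verified on every hash hit exactly as the notes recommend.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return all match start indices using a rolling hash; verify on every hash hit."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    high = pow(base, m - 1, mod)       # weight of the window's leftmost character
    p_hash = w_hash = 0
    for i in range(m):                 # hash the pattern and the first window
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        w_hash = (w_hash * base + ord(text[i])) % mod
    matches = []
    for i in range(n - m + 1):
        # Compare characters only when the hashes agree, to rule out collisions.
        if w_hash == p_hash and text[i:i + m] == pattern:
            matches.append(i)
        if i < n - m:
            # Roll the window: drop text[i] on the left, append text[i+m] on the right.
            w_hash = ((w_hash - ord(text[i]) * high) * base + ord(text[i + m])) % mod
    return matches


print(rabin_karp("ABCDABCABCD", "ABC"))  # [0, 4, 7]
print(rabin_karp("ABCDE", "FG"))         # []
```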
From 331a7fb67fb0f4311724f517c443e783904828d8 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Fri, 29 Aug 2025 19:59:36 +0200 Subject: [PATCH 17/48] Update searching.md --- notes/searching.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notes/searching.md b/notes/searching.md index df2398a..b0c0f48 100644 --- a/notes/searching.md +++ b/notes/searching.md @@ -60,7 +60,7 @@ Checks: (1≠9) → (2≠9) → (3≠9) → end → not found * Simple and dependable for short or unsorted data. * Assumes 0-based indexing in these notes. -### Sentinel Linear Search +#### Sentinel Linear Search Place one copy of the target at the very end as a “sentinel” so the scan can run without checking bounds each step; afterward, decide whether the match was inside the original list or only at the sentinel position. @@ -634,7 +634,7 @@ LOOKUP "alpha": counters at {2,5,9} = {0,1,0} * Watch for counter **saturation** (caps at max value) and **underflow** (don’t decrement below 0). * Good for dynamic sets with frequent inserts and deletes. -##### Cuckoo Filter +#### Cuckoo Filter Hash-table–style filter that stores short **fingerprints** in two possible buckets; supports **insert, lookup, delete** with low false-positive rates and high load factors. **Example inputs and outputs** From 35f67bf2e88d7349ee84dfc8c172f10f75fedbb6 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 30 Aug 2025 14:07:31 +0200 Subject: [PATCH 18/48] Refactor examples in searching.md to LaTeX format Updated examples in the searching documentation to use LaTeX formatting for inputs and outputs. --- notes/searching.md | 707 +++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 650 insertions(+), 57 deletions(-) diff --git a/notes/searching.md b/notes/searching.md index b0c0f48..f667cea 100644 --- a/notes/searching.md +++ b/notes/searching.md @@ -10,9 +10,35 @@ Scan the list from left to right, comparing the target with each element until y **Example inputs and outputs** -* Input: list = \[7, 3, 5, 2, 9], target = 5 → Output: index = 2 -* Input: list = \[4, 4, 4], target = 4 → Output: index = 0 (first match) -* Input: list = \[10, 20, 30], target = 25 → Output: not found +*Example 1* + +$$ +\text{Input: } [7, 3, 5, 2, 9], \quad \text{target} = 5 +$$ + +$$ +\text{Output: } \text{index} = 2 +$$ + +*Example 2* + +$$ +\text{Input: } [4, 4, 4], \quad \text{target} = 4 +$$ + +$$ +\text{Output: } \text{index} = 0 \; (\text{first match}) +$$ + +*Example 3* + +$$ +\text{Input: } [10, 20, 30], \quad \text{target} = 25 +$$ + +$$ +\text{Output: } \text{not found} +$$ **How it works** @@ -66,8 +92,25 @@ Place one copy of the target at the very end as a “sentinel” so the scan can **Example inputs and outputs** -* Input: list = \[12, 8, 6, 15], target = 6 → Output: index = 2 -* Input: list = \[2, 4, 6, 8], target = 5 → Output: not found (only the sentinel matched) +*Example 1* + +$$ +\text{Input: } [12, 8, 6, 15], \quad \text{target} = 6 +$$ + +$$ +\text{Output: } \text{index} = 2 +$$ + +*Example 2* + +$$ +\text{Input: } [2, 4, 6, 8], \quad \text{target} = 5 +$$ + +$$ +\text{Output: } \text{not found } \; (\text{only the sentinel matched}) +$$ **How it works** @@ -125,9 +168,36 @@ On a sorted array, repeatedly halve the search interval by comparing the target **Example inputs and outputs** -* Input: A = \[2, 5, 8, 12, 16, 23, 38], target = 16 → Output: index = 4 -* Input: A = \[1, 3, 3, 3, 9], target = 3 → Output: index = 2 (any valid match; 
first/last requires a slight variant) -* Input: A = \[10, 20, 30, 40], target = 35 → Output: not found +*Example 1* + +$$ +\text{Input: } A = [2, 5, 8, 12, 16, 23, 38], \quad \text{target} = 16 +$$ + +$$ +\text{Output: } \text{index} = 4 +$$ + +*Example 2* + +$$ +\text{Input: } A = [1, 3, 3, 3, 9], \quad \text{target} = 3 +$$ + +$$ +\text{Output: } \text{index} = 2 \quad (\text{any valid match; first/last needs a variant}) +$$ + +*Example 3* + +$$ +\text{Input: } A = [10, 20, 30, 40], \quad \text{target} = 35 +$$ + +$$ +\text{Output: } \text{not found} +$$ + **How it works** @@ -161,8 +231,25 @@ Like binary, but splits the current interval into three parts using two midpoint **Example inputs and outputs** -* Input: A = \[1, 4, 7, 9, 12, 15], target = 9 → Output: index = 3 -* Input: A = \[2, 6, 10, 14], target = 5 → Output: not found +*Example 1* + +$$ +\text{Input: } A = [1, 4, 7, 9, 12, 15], \quad \text{target} = 9 +$$ + +$$ +\text{Output: } \text{index} = 3 +$$ + +*Example 2* + +$$ +\text{Input: } A = [2, 6, 10, 14], \quad \text{target} = 5 +$$ + +$$ +\text{Output: } \text{not found} +$$ **How it works** @@ -195,8 +282,25 @@ On a sorted array, jump ahead in fixed block sizes to find the block that may co **Example inputs and outputs** -* Input: A = \[1, 4, 9, 16, 25, 36, 49], target = 25, jump = ⌊√7⌋=2 → Output: index = 4 -* Input: A = \[3, 8, 15, 20, 22, 27], target = 21, jump = 2 → Output: not found +*Example 1* + +$$ +\text{Input: } A = [1, 4, 9, 16, 25, 36, 49], \quad \text{target} = 25, \quad \text{jump} = \lfloor \sqrt{7} \rfloor = 2 +$$ + +$$ +\text{Output: } \text{index} = 4 +$$ + +*Example 2* + +$$ +\text{Input: } A = [3, 8, 15, 20, 22, 27], \quad \text{target} = 21, \quad \text{jump} = 2 +$$ + +$$ +\text{Output: } \text{not found} +$$ **How it works** @@ -227,8 +331,25 @@ On a sorted array, grow the right boundary exponentially (1, 2, 4, 8, …) to fi **Example inputs and outputs** -* Input: A = \[2, 3, 5, 7, 11, 13, 17, 19, 23], target = 19 → Output: index = 7 -* Input: A = \[10, 20, 30, 40, 50], target = 12 → Output: not found +*Example 1* + +$$ +\text{Input: } A = [2, 3, 5, 7, 11, 13, 17, 19, 23], \quad \text{target} = 19 +$$ + +$$ +\text{Output: } \text{index} = 7 +$$ + +*Example 2* + +$$ +\text{Input: } A = [10, 20, 30, 40, 50], \quad \text{target} = 12 +$$ + +$$ +\text{Output: } \text{not found} +$$ **How it works** @@ -260,9 +381,35 @@ On a sorted (roughly uniformly distributed) array, estimate the likely position **Example inputs and outputs** -* Input: A = \[10, 20, 30, 40, 50, 60, 70], target = 55 → Output: not found (probes near index 4–5) -* Input: A = \[5, 15, 25, 35, 45, 55, 65], target = 45 → Output: index = 4 -* Input: A = \[1, 1000, 1001, 1002], target = 2 → Output: not found (bad distribution for interpolation) +*Example 1* + +$$ +\text{Input: } A = [10, 20, 30, 40, 50, 60, 70], \quad \text{target} = 55 +$$ + +$$ +\text{Output: } \text{not found } \; (\text{probes near indices } 4\text{–}5) +$$ + +*Example 2* + +$$ +\text{Input: } A = [5, 15, 25, 35, 45, 55, 65], \quad \text{target} = 45 +$$ + +$$ +\text{Output: } \text{index} = 4 +$$ + +*Example 3* + +$$ +\text{Input: } A = [1, 1000, 1001, 1002], \quad \text{target} = 2 +$$ + +$$ +\text{Output: } \text{not found } \; (\text{bad distribution for interpolation}) +$$ **How it works** @@ -304,8 +451,29 @@ Map a key to an array index with a hash function; look at that bucket to find th **Example inputs and outputs** -* Table size m = 7; keys stored = {10, 24, 31}; target = 24 → Output: “found (bucket 3)” -* Same 
table; target = 18 → Output: “not found” +*Example 1* + +$$ +\text{Table size: } m = 7, +\quad \text{Keys stored: } \{10, 24, 31\}, +\quad \text{Target: } 24 +$$ + +$$ +\text{Output: } \text{found (bucket 3)} +$$ + +*Example 2* + +$$ +\text{Table size: } m = 7, +\quad \text{Keys stored: } \{10, 24, 31\}, +\quad \text{Target: } 18 +$$ + +$$ +\text{Output: } \text{not found} +$$ **How it works** @@ -339,8 +507,29 @@ Keep everything in one array; on collision, probe alternative positions in a det **Example inputs and outputs** -* m = 10; stored = {12, 22, 32}; target = 22 → Output: “found (index 3)” -* Same table; target = 42 → Output: “not found” +*Example 1* + +$$ +m = 10, +\quad \text{Stored keys: } \{12, 22, 32\}, +\quad \text{Target: } 22 +$$ + +$$ +\text{Output: } \text{found (index 3)} +$$ + +*Example 2* + +$$ +m = 10, +\quad \text{Stored keys: } \{12, 22, 32\}, +\quad \text{Target: } 42 +$$ + +$$ +\text{Output: } \text{not found} +$$ **How it works** @@ -376,8 +565,37 @@ Search(42): **Example inputs and outputs** -* m = 11 (prime); stored = {22, 33, 44}; target = 33 → Output: “found (index 4)” -* Same table; target = 55 → Output: “not found” +*Example 1* + +$$ +m = 11 \; (\text{prime}), +\quad \text{Stored keys: } \{22, 33, 44\}, +\quad \text{Target: } 33 +$$ + +$$ +h(k) = k \bmod m, \quad h(33) = 33 \bmod 11 = 0 +$$ + +$$ +\text{Output: found (index 4)} +$$ + +*Example 2* + +$$ +m = 11 \; (\text{prime}), +\quad \text{Stored keys: } \{22, 33, 44\}, +\quad \text{Target: } 55 +$$ + +$$ +h(55) = 55 \bmod 11 = 0 +$$ + +$$ +\text{Output: not found} +$$ **How it works** @@ -415,9 +633,74 @@ Search(55): **Example inputs and outputs** -* m = 11; h₁(k) = k mod 11; h₂(k) = 1 + (k mod 10) -* Stored = {22, 33, 44}; target = 33 → Output: “found (index 4)” -* Same table; target = 55 → Output: “not found” +*Hash functions* + +$$ +h_{1}(k) = k \bmod 11, +\quad h_{2}(k) = 1 + (k \bmod 10) +$$ + +Probing sequence: + +$$ +h(k,i) = \big(h_{1}(k) + i \cdot h_{2}(k)\big) \bmod 11 +$$ + +*Example 1* + +$$ +m = 11, +\quad \text{Stored keys: } \{22, 33, 44\}, +\quad \text{Target: } 33 +$$ + +* For $k = 33$: + +$$ +h_{1}(33) = 33 \bmod 11 = 0, +\quad h_{2}(33) = 1 + (33 \bmod 10) = 1 + 3 = 4 +$$ + +So probe sequence is + +$$ +h(33,0) = 0,\; +h(33,1) = (0 + 1\cdot 4) \bmod 11 = 4,\; +h(33,2) = (0 + 2\cdot 4) \bmod 11 = 8,\; \dots +$$ + +Since the stored layout places $33$ at index $4$, the search succeeds. + +$$ +\text{Output: found (index 4)} +$$ + +*Example 2* + +$$ +m = 11, +\quad \text{Stored keys: } \{22, 33, 44\}, +\quad \text{Target: } 55 +$$ + +* For $k = 55$: + +$$ +h_{1}(55) = 55 \bmod 11 = 0, +\quad h_{2}(55) = 1 + (55 \bmod 10) = 1 + 5 = 6 +$$ + +Probing sequence: + +$$ +0, \; (0+6)\bmod 11 = 6,\; (0+2\cdot 6)\bmod 11 = 1,\; (0+3\cdot 6)\bmod 11 = 7,\; \dots +$$ + +No slot matches $55$. 
+ +$$ +\text{Output: not found} +$$ **How it works** @@ -455,9 +738,62 @@ Each array cell holds a small container (e.g., a linked list); colliding keys li **Example inputs and outputs** -* m = 5; buckets hold lists -* Stored = {12, 22, 7, 3, 14}; target = 22 → Output: “found (bucket 2, position 2)” -* Same table; target = 9 → Output: “not found” +*Setup* + +$$ +m = 5, \quad h(k) = k \bmod 5, \quad \text{buckets hold linked lists} +$$ + +Keys stored: + +$$ +\{12, 22, 7, 3, 14\} +$$ + +Bucket contents after hashing: + +$$ +\begin{aligned} +h(12) &= 12 \bmod 5 = 2 &\;\;\Rightarrow& \;\; \text{bucket 2: } [12] \\[6pt] +h(22) &= 22 \bmod 5 = 2 &\;\;\Rightarrow& \;\; \text{bucket 2: } [12, 22] \\[6pt] +h(7) &= 7 \bmod 5 = 2 &\;\;\Rightarrow& \;\; \text{bucket 2: } [12, 22, 7] \\[6pt] +h(3) &= 3 \bmod 5 = 3 &\;\;\Rightarrow& \;\; \text{bucket 3: } [3] \\[6pt] +h(14) &= 14 \bmod 5 = 4 &\;\;\Rightarrow& \;\; \text{bucket 4: } [14] +\end{aligned} +$$ + +*Example 1* + +$$ +\text{Target: } 22 +$$ + +$$ +h(22) = 2 \;\;\Rightarrow\;\; \text{bucket 2} = [12, 22, 7] +$$ + +Found at **position 2** in the list. + +$$ +\text{Output: found (bucket 2, position 2)} +$$ + +*Example 2* + +$$ +\text{Target: } 9 +$$ + +$$ +h(9) = 9 \bmod 5 = 4 \;\;\Rightarrow\;\; \text{bucket 4} = [14] +$$ + +No match. + +$$ +\text{Output: not found} +$$ + **How it works** @@ -488,9 +824,57 @@ Keep two (or more) hash positions per key; insert by “kicking out” occupants **Example inputs and outputs** -* Two tables T₁ and T₂ (same size m = 5) with two hashes h₁, h₂ -* Inserted keys produce relocations; target = 15 → Output: “found in T₂ at index 4” -* If insertion loops (cycle), rebuild with new hash functions (rehash) +*Setup* +Two hash tables $T_{1}$ and $T_{2}$, each of size + +$$ +m = 5 +$$ + +Two independent hash functions: + +$$ +h_{1}(k), \quad h_{2}(k) +$$ + +Cuckoo hashing invariant: + +* Each key is stored either in $T_{1}[h_{1}(k)]$ or $T_{2}[h_{2}(k)]$. +* On insertion, if a spot is occupied, the existing key is **kicked out** and reinserted into the other table. +* If relocations form a cycle, the table is **rebuilt (rehash)** with new hash functions. + +*Example 1* + +$$ +\text{Target: } 15 +$$ + +Lookup procedure: + +1. Check $T_{1}[h_{1}(15)]$. +2. If not found, check $T_{2}[h_{2}(15)]$. + +Result: + +$$ +\text{found in } T_{2} \text{ at index } 4 +$$ + +$$ +\text{Output: found (T₂, index 4)} +$$ + +*Example 2* + +If insertion causes repeated displacements and eventually loops: + +$$ +\text{Cycle detected } \;\;\Rightarrow\;\; \text{rehash with new } h_{1}, h_{2} +$$ + +$$ +\text{Output: rebuild / rehash required} +$$ **How it works** @@ -550,11 +934,54 @@ Space-efficient structure for fast membership tests; answers **“maybe present **Example inputs and outputs** -* Setup: m = 16 bits, k = 3 hash functions (h₁, h₂, h₃). -* Insert: {"cat", "dog"} -* `contains("cat")` → **maybe present** (actual member) -* `contains("cow")` → **definitely not present** (one probed bit is 0) -* `contains("eel")` → **maybe present** (all probed bits happen to be 1 → **false positive**) +*Setup* + +$$ +m = 16 \; \text{bits}, +\quad k = 3 \; \text{hash functions } (h_{1}, h_{2}, h_{3}) +$$ + +Inserted set: + +$$ +\{"cat", "dog"\} +$$ + +*Example 1* + +$$ +\text{Query: contains("cat")} +$$ + +All $h_{i}(\text{"cat"})$ bits are set → actual member. + +$$ +\text{Output: maybe present (true positive)} +$$ + +*Example 2* + +$$ +\text{Query: contains("cow")} +$$ + +One probed bit = 0 → cannot be present. 
+ +$$ +\text{Output: definitely not present} +$$ + +*Example 3* + +$$ +\text{Query: contains("eel")} +$$ + +All $h_{i}(\text{"eel"})$ bits happen to be set, even though "eel" was never inserted. + +$$ +\text{Output: maybe present (false positive)} +$$ **How it works** @@ -598,12 +1025,56 @@ Bloom filter variant that keeps a small counter per bit so you can **delete** by **Example inputs and outputs** -* Setup: m = 12 counters (each 2–4 bits), k = 3 hash functions. -* Insert: {"alpha", "beta"} -* Delete: remove "alpha". -* `contains("alpha")` after deletion → **definitely not present** (one counter back to 0) -* `contains("beta")` → **maybe present** -* `contains("gamma")` → **definitely not present** (some counter = 0) +*Setup* + +$$ +m = 12 \;\; \text{counters (each 2–4 bits)}, +\quad k = 3 \;\; \text{hash functions} +$$ + +Inserted set: + +$$ +\{\text{"alpha"}, \; \text{"beta"}\} +$$ + +Then delete `"alpha"`. + +*Example 1* + +$$ +\text{Query: contains("alpha")} +$$ + +Counters for `"alpha"` decremented; at least one probed counter is now $0$. + +$$ +\text{Output: definitely not present} +$$ + +*Example 2* + +$$ +\text{Query: contains("beta")} +$$ + +All three counters for `"beta"` remain $>0$. + +$$ +\text{Output: maybe present} +$$ + +*Example 3* + +$$ +\text{Query: contains("gamma")} +$$ + +At least one probed counter is $0$. + +$$ +\text{Output: definitely not present} +$$ **How it works** @@ -639,13 +1110,57 @@ Hash-table–style filter that stores short **fingerprints** in two possible buc **Example inputs and outputs** -* Setup: bucket array with **b = 8** buckets, **bucket size = 2**, **fingerprint = 8 bits**. -* Insert: {"cat", "dog", "eel"} (each stored as short fingerprints). -* Query: +*Setup* + +$$ +b = 8 \;\; \text{buckets}, +\quad \text{bucket size} = 2, +\quad \text{fingerprint size} = 8 \; \text{bits} +$$ + +Inserted set: + +$$ +\{\text{"cat"}, \; \text{"dog"}, \; \text{"eel"}\} +$$ + +Each element is stored as a short fingerprint in one of two candidate buckets. + +*Example 1* + +$$ +\text{Query: contains("cat")} +$$ + +Fingerprint for `"cat"` is present in one of its candidate buckets. + +$$ +\text{Output: maybe present (true positive)} +$$ + +*Example 2* + +$$ +\text{Query: contains("fox")} +$$ + +Fingerprint for `"fox"` is absent from both candidate buckets. + +$$ +\text{Output: definitely not present} +$$ + +*Example 3 (Deletion)* + +$$ +\text{Operation: remove("dog")} +$$ + +Fingerprint for `"dog"` is removed from its bucket. - * `contains("cat")` → **maybe present** (fingerprint found in one of its two buckets) - * `contains("fox")` → **definitely not present** (fingerprint absent from both) -* Delete: `remove("dog")` → fingerprint removed from its bucket. 
+$$ +\text{Result: deletion supported directly by removing the fingerprint} +$$ **How it works** @@ -708,8 +1223,27 @@ Slide the pattern one position at a time over the text; at each shift compare ch **Example inputs and outputs** -* Text: `"abracadabra"`, Pattern: `"abra"` → Output: matches at indices **0** and **7** -* Text: `"aaaaa"`, Pattern: `"aaa"` → Output: matches at indices **0**, **1**, **2** +*Example 1* + +$$ +\text{Text: } "abracadabra", +\quad \text{Pattern: } "abra" +$$ + +$$ +\text{Output: matches at indices } \; 0 \;\; \text{and} \;\; 7 +$$ + +*Example 2* + +$$ +\text{Text: } "aaaaa", +\quad \text{Pattern: } "aaa" +$$ + +$$ +\text{Output: matches at indices } \; 0, \; 1, \; 2 +$$ **How it works** @@ -762,8 +1296,27 @@ Precompute a table (LPS / prefix-function) for the pattern so that on a mismatch **Example inputs and outputs** -* Text: `"ababcabcabababd"`, Pattern: `"ababd"` → Output: match at index **10** -* Text: `"aaaaab"`, Pattern: `"aaab"` → Output: match at index **2** +*Example 1* + +$$ +\text{Text: } "ababcabcabababd", +\quad \text{Pattern: } "ababd" +$$ + +$$ +\text{Output: match at index } 10 +$$ + +*Example 2* + +$$ +\text{Text: } "aaaaab", +\quad \text{Pattern: } "aaab" +$$ + +$$ +\text{Output: match at index } 2 +$$ **How it works** ``` @@ -805,8 +1358,28 @@ Compare the pattern right-to-left; on a mismatch, skip ahead using bad-character **Example inputs and outputs** -* Text: `"HERE IS A SIMPLE EXAMPLE"`, Pattern: `"EXAMPLE"` → Output: match at index **17** -* Text: `"NEEDLE IN A HAYSTACK"`, Pattern: `"STACK"` → Output: match at index **15** +*Example 1* + +$$ +\text{Text: } "HERE IS A SIMPLE EXAMPLE", +\quad \text{Pattern: } "EXAMPLE" +$$ + +$$ +\text{Output: match at index } 17 +$$ + +*Example 2* + +$$ +\text{Text: } "NEEDLE IN A HAYSTACK", +\quad \text{Pattern: } "STACK" +$$ + +$$ +\text{Output: match at index } 15 +$$ + **How it works** @@ -845,8 +1418,28 @@ Compare rolling hashes of the current text window and the pattern; only if hashe **Example inputs and outputs** -* Text: `"ABCDABCABCD"`, Pattern: `"ABC"` → Output: matches at indices **0**, **4**, **7** -* Text: `"ABCDE"`, Pattern: `"FG"` → Output: **no match** +*Example 1* + +$$ +\text{Text: } "ABCDABCABCD", +\quad \text{Pattern: } "ABC" +$$ + +$$ +\text{Output: matches at indices } 0, \; 4, \; 7 +$$ + +*Example 2* + +$$ +\text{Text: } "ABCDE", +\quad \text{Pattern: } "FG" +$$ + +$$ +\text{Output: no match} +$$ + **How it works** From a2656bd15a8b66dfcbfa89faef266735390541b0 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 30 Aug 2025 14:37:44 +0200 Subject: [PATCH 19/48] Enhance searching algorithms documentation Updated explanations and formatting for linear search, sentinel search, binary search, ternary search, and jump search in the notes. --- notes/searching.md | 265 +++++++++++++++++++++++++++++++-------------- 1 file changed, 185 insertions(+), 80 deletions(-) diff --git a/notes/searching.md b/notes/searching.md index f667cea..33c9219 100644 --- a/notes/searching.md +++ b/notes/searching.md @@ -40,42 +40,66 @@ $$ \text{Output: } \text{not found} $$ -**How it works** +**How Linear Search Works** + +We start at index `0`, compare the value with the target, and keep moving right until we either **find it** or reach the **end**. -Start at index 0, compare, move right; stop on first equal or after the last element. 
+Target **5** in `[7, 3, 5, 2, 9]` ``` -Indexes: 0 1 2 3 4 -List: [ 7 ][ 3 ][ 5 ][ 2 ][ 9 ] +Indexes: 0 1 2 3 4 +List: [7] [3] [5] [2] [9] Target: 5 +``` -Pass 1: pointer at 0 → compare 7 vs 5 → no - v -Indexes: 0 1 2 3 4 - | -List: 7 3 5 2 9 +*Step 1:* pointer at index 0 -Pass 2: pointer at 1 → compare 3 vs 5 → no - v -Indexes: 0 1 2 3 4 - | -List: 7 3 5 2 9 +``` +| +v +7 3 5 2 9 -Pass 3: pointer at 2 → compare 5 vs 5 → YES → return 2 - v -Indexes: 0 1 2 3 4 - | -List: 7 3 5 2 9 +→ compare 7 vs 5 → no ``` -**Worst case (not found):** you compare every element and then stop. +*Step 2:* pointer moves to index 1 + +``` + | + v +7 3 5 2 9 +→ compare 3 vs 5 → no ``` -Indexes: 0 1 2 -List: [ 1 ][ 2 ][ 3 ] + +*Step 3:* pointer moves to index 2 + +``` + | + v +7 3 5 2 9 + +→ compare 5 vs 5 → YES ✅ → return index 2 +``` + +**Worst Case (Not Found)** + +Target **9** in `[1, 2, 3]` + +``` +Indexes: 0 1 2 +List: [1] [2] [3] Target: 9 +``` -Checks: (1≠9) → (2≠9) → (3≠9) → end → not found +Checks: + +``` +→ 1 ≠ 9 +→ 2 ≠ 9 +→ 3 ≠ 9 +→ end +→ not found ❌ ``` * Works on any list; no sorting or structure required. @@ -116,40 +140,55 @@ $$ Put the target at one extra slot at the end so the loop is guaranteed to stop on a match; afterward, check whether the match was inside the original range. +Target **11** not in the list + ``` -Original length n = 5 -Before: [ 4 ][ 9 ][ 1 ][ 7 ][ 6 ] +Original list (n=5): +[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ] Target: 11 +``` Add sentinel (extra slot): - [ 4 ][ 9 ][ 1 ][ 7 ][ 6 ][ 11 ] -Indexes: 0 1 2 3 4 5 ← sentinel position -Scan left→right until you see 11: +``` +[ 4 ][ 9 ][ 1 ][ 7 ][ 6 ][ 11 ] + 0 1 2 3 4 5 ← sentinel +``` -Step 1: 4 ≠ 11 - ^ -Step 2: 9 ≠ 11 - ^ -Step 3: 1 ≠ 11 - ^ -Step 4: 7 ≠ 11 - ^ -Step 5: 6 ≠ 11 - ^ -Step 6: 11 (match at index 5, which is the sentinel) +Scan step by step: -Because the first match is at index 5 (the sentinel position), the target was not in the original indexes 0..4 → report “not found”. ``` +4 ≠ 11 → pointer at 0 +9 ≠ 11 → pointer at 1 +1 ≠ 11 → pointer at 2 +7 ≠ 11 → pointer at 3 +6 ≠ 11 → pointer at 4 +11 = 11 → pointer at 5 (sentinel) +``` + +Therefore, **not found** in original list. -**When the target exists inside the list:** +Target **6** inside the list ``` -List: [ 12 ][ 8 ][ 6 ][ 15 ] n = 4 +Original list (n=4): +[ 12 ][ 8 ][ 6 ][ 15 ] Target: 6 -With sentinel: [ 12 ][ 8 ][ 6 ][ 15 ][ 6 ] +``` + +Add sentinel: + +``` +[ 12 ][ 8 ][ 6 ][ 15 ][ 6 ] + 0 1 2 3 4 +``` + +Scan: -Scan: 12≠6 → 8≠6 → 6=6 (index 2 < n) → real match at 2 +``` +12 ≠ 6 → index 0 + 8 ≠ 6 → index 1 + 6 = 6 → index 2 ✅ ``` * Removes the per-iteration “have we reached the end?” check; the sentinel guarantees termination. @@ -198,28 +237,61 @@ $$ \text{Output: } \text{not found} $$ - **How it works** +We repeatedly check the **middle** element, and then discard half the list based on comparison. 
+ +Find **16** in: + +``` +A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ] +i = 0 1 2 3 4 5 6 ``` -Sorted A: [ 2 ][ 5 ][ 8 ][12 ][16 ][23 ][38 ] -Indexes: 0 1 2 3 4 5 6 -Target: 16 -1) low=0, high=6 → mid=(0+6)//2=3 - A[3]=12 < 16 → discard left half up to mid, keep [mid+1..high] - [ 2 ][ 5 ][ 8 ] |[12 ]| [16 ][23 ][38 ] - low=4 high=6 +*Step 1* -2) low=4, high=6 → mid=(4+6)//2=5 - A[5]=23 > 16 → discard right half after mid, keep [low..mid-1] - [16 ][23 ]|[38 ] - low=4 high=4 +``` +low = 0, high = 6 +mid = (0+6)//2 = 3 +A[3] = 12 < 16 → target is to the RIGHT → new low = mid + 1 = 4 + +A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ] +i = 0 1 2 3 4 5 6 + ↑L ↑M ↑H + 0 3 6 +Active range: indices 0..6 +``` -3) low=4, high=4 → mid=4 - A[4]=16 = target → FOUND at index 4 +*Step 2* + +``` +low = 4, high = 6 +mid = (4+6)//2 = 5 +A[5] = 23 > 16 → target is to the LEFT → new high = mid - 1 = 4 + +A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ] +i = 0 1 2 3 4 5 6 + ↑L ↑M ↑H + 4 5 6 +Active range: indices 4..6 ``` +*Step 3* + +``` +low = 4, high = 4 +mid = 4 +A[4] = 16 == target ✅ + +A = [ 2 ][ 5 ][ 8 ][ 12 ][ 16 ][ 23 ][ 38 ] +i = 0 1 2 3 4 5 6 + ↑LMH + 4 +Active range: indices 4..4 +``` + +FOUND at index 4 + * Requires a sorted array (assume ascending here). * Time: O(log n); Space: O(1) iterative. * Returns any one matching index by default; “first/last occurrence” is a small, common refinement. @@ -253,25 +325,37 @@ $$ **How it works** +We divide the array into **three parts** using two midpoints `m1` and `m2`. + +* If `target < A[m1]` → search $[low .. m1-1]$ +* Else if `target > A[m2]` → search $[m2+1 .. high]$ +* Else → search $[m1+1 .. m2-1]$ + ``` -Sorted A: [ 1 ][ 4 ][ 7 ][ 9 ][12 ][15 ] -Indexes: 0 1 2 3 4 5 +A = [ 1 ][ 4 ][ 7 ][ 9 ][ 12 ][ 15 ] +i = 0 1 2 3 4 5 Target: 9 +``` -1) low=0, high=5 - m1 = low + (high-low)//3 = 0 + 5//3 = 1 - m2 = high - (high-low)//3 = 5 - 5//3 = 3 +*Step 1* - Compare A[m1]=4, A[m2]=9 with target=9: +``` +low = 0, high = 5 + +m1 = low + (high - low)//3 = 0 + (5)//3 = 1 +m2 = high - (high - low)//3 = 5 - (5)//3 = 3 - A[m2]=9 = target → FOUND at index 3 +A[m1] = 4 +A[m2] = 9 -(If no immediate match:) -- If target < A[m1], keep [low..m1-1] -- Else if target > A[m2], keep [m2+1..high] -- Else keep [m1+1..m2-1] and repeat +A = [ 1 ][ 4 ][ 7 ][ 9 ][ 12 ][ 15 ] +i = 0 1 2 3 4 5 + ↑L ↑m1 ↑m2 ↑H + 0 1 3 5 ``` +FOUND at index 3 + * Also assumes a sorted array. * For discrete sorted arrays, it does **not** beat binary search asymptotically; it performs more comparisons per step. * Most valuable for searching the extremum of a **unimodal function** on a continuous domain; for arrays, prefer binary search. @@ -304,21 +388,42 @@ $$ **How it works** +Perfect — that’s a **jump search trace**. Let me reformat and polish it so the steps are crystal clear and the “jump + linear scan” pattern pops visually: + +We’re applying **jump search** to find $25$ in + +$$ +A = [1, 4, 9, 16, 25, 36, 49] +$$ + +with $n=7$, block size $\approx \sqrt{7} \approx 2$, so **jump=2**. + +We probe every 2nd index: + +* probe = 0 → $A[0] = 1 < 25$ → jump to 2 +* probe = 2 → $A[2] = 9 < 25$ → jump to 4 +* probe = 4 → $A[4] = 25 \geq 25$ → stop + +So target is in block $(2..4]$. 
+ ``` -Sorted A: [ 1 ][ 4 ][ 9 ][16 ][25 ][36 ][49 ] -Indexes: 0 1 2 3 4 5 6 -Target: 25 -Choose block size ≈ √n → here n=7 → jump=2 +[ 1 ][ 4 ] | [ 9 ][16 ] | [25 ][36 ] | [49 ] + ^ ^ ^ ^ + probe=0 probe=2 probe=4 probe=6 +``` + +Linear Scan in block (indexes 3..4) -Jumps (probe at 0,2,4,6 until A[probe] ≥ target): -- probe=0 → A[0]=1 (<25) → next probe=2 -- probe=2 → A[2]=9 (<25) → next probe=4 -- probe=4 → A[4]=25 (≥25) → target must be in block (2..4] +* i = 3 → $A[3] = 16 < 25$ +* i = 4 → $A[4] = 25 = 25$ ✅ FOUND -Linear scan inside last block (indexes 3..4): -- i=3 → A[3]=16 (<25) -- i=4 → A[4]=25 (=) FOUND at index 4 ``` +Block [16 ][25 ] + ^ ^ + i=3 i=4 (found!) +``` + +The element $25$ is found at **index 4**. * Works on sorted arrays; pick jump ≈ √n for good balance. * Time: O(√n) comparisons on average; Space: O(1). From 5ce060c360e4a3baccc29d730ea885bfee1782f7 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 30 Aug 2025 17:45:33 +0200 Subject: [PATCH 20/48] Refine LaTeX formatting and explanations in searching.md Updated LaTeX formatting for hash functions and examples in the searching notes. Improved clarity in the explanation of cuckoo hashing and added details for the KMP algorithm. --- notes/searching.md | 642 +++++++++++++++++++++++++++++++++------------ 1 file changed, 473 insertions(+), 169 deletions(-) diff --git a/notes/searching.md b/notes/searching.md index 33c9219..8efc9f2 100644 --- a/notes/searching.md +++ b/notes/searching.md @@ -859,11 +859,11 @@ Bucket contents after hashing: $$ \begin{aligned} -h(12) &= 12 \bmod 5 = 2 &\;\;\Rightarrow& \;\; \text{bucket 2: } [12] \\[6pt] -h(22) &= 22 \bmod 5 = 2 &\;\;\Rightarrow& \;\; \text{bucket 2: } [12, 22] \\[6pt] -h(7) &= 7 \bmod 5 = 2 &\;\;\Rightarrow& \;\; \text{bucket 2: } [12, 22, 7] \\[6pt] -h(3) &= 3 \bmod 5 = 3 &\;\;\Rightarrow& \;\; \text{bucket 3: } [3] \\[6pt] -h(14) &= 14 \bmod 5 = 4 &\;\;\Rightarrow& \;\; \text{bucket 4: } [14] +h(12) &= 12 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12] [6pt] +h(22) &= 22 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12, 22] [6pt] +h(7) &= 7 \bmod 5 = 2 \Rightarrow& \text{bucket 2: } [12, 22, 7] [6pt] +h(3) &= 3 \bmod 5 = 3 \Rightarrow& \text{bucket 3: } [3] [6pt] +h(14) &= 14 \bmod 5 = 4 \Rightarrow& \text{bucket 4: } [14] \end{aligned} $$ @@ -874,7 +874,7 @@ $$ $$ $$ -h(22) = 2 \;\;\Rightarrow\;\; \text{bucket 2} = [12, 22, 7] +h(22) = 2 \Rightarrow \text{bucket 2} = [12, 22, 7] $$ Found at **position 2** in the list. @@ -890,7 +890,7 @@ $$ $$ $$ -h(9) = 9 \bmod 5 = 4 \;\;\Rightarrow\;\; \text{bucket 4} = [14] +h(9) = 9 \bmod 5 = 4 \Rightarrow \text{bucket 4} = [14] $$ No match. @@ -974,58 +974,80 @@ $$ If insertion causes repeated displacements and eventually loops: $$ -\text{Cycle detected } \;\;\Rightarrow\;\; \text{rehash with new } h_{1}, h_{2} +\text{Cycle detected } \Rightarrow \text{rehash with new } h_{1}, h_{2} $$ $$ \text{Output: rebuild / rehash required} $$ -**How it works** +**How it works** -``` -Example hashes: -h₁(k) = k mod 5 -h₂(k) = 1 + (k mod 4) +We keep **two hash tables (T₁, T₂)**, each with its own hash function. Every key can live in **exactly one of two possible slots**: + +Hash functions: + +$$ +h_1(k) = k \bmod 5, \quad h_2(k) = 1 + (k \bmod 4) +$$ + +Every key can live in **exactly one of two slots**: $T_1[h_1(k)]$ or $T_2[h_2(k)]$. +If a slot is occupied, we **evict** the old occupant and reinsert it at its alternate location. 
-Start empty T₁ and T₂ (indexes 0..4): +*Start empty:* +``` T₁: [ ][ ][ ][ ][ ] T₂: [ ][ ][ ][ ][ ] +``` -Insert 10: -- Place at T₁[h₁(10)=0] = 0 +*Insert 10* → goes to $T_1[h_1(10)=0]$: +``` T₁: [10 ][ ][ ][ ][ ] T₂: [ ][ ][ ][ ][ ] +``` + +*Insert 15* -Insert 15: -- T₁[h₁(15)=0] occupied by 10 → cuckoo step: - Evict 10; put 15 at T₁[0] - Reinsert evicted 10 at its alternate home T₂[h₂(10)=1+(10 mod 4)=3] +* $T_1[0]$ already has 10 → evict 10 +* Place 15 at $T_1[0]$ +* Reinsert evicted 10 at $T_2[h_2(10)=3]$: +``` T₁: [15 ][ ][ ][ ][ ] T₂: [ ][ ][ ][10 ][ ] +``` + +*Insert 20* -Insert 20: -- T₁[h₁(20)=0] occupied by 15 → evict 15; place 20 at T₁[0] - Reinsert 15 at T₂[h₂(15)=1+(15 mod 4)=4] +* $T_1[0]$ has 15 → evict 15 +* Place 20 at $T_1[0]$ +* Reinsert 15 at $T_2[h_2(15)=4]$: +``` T₁: [20 ][ ][ ][ ][ ] T₂: [ ][ ][ ][10 ][15 ] +``` + +*Insert 25* -Insert 25: -- T₁[h₁(25)=0] occupied by 20 → evict 20; place 25 at T₁[0] - Reinsert 20 at T₂[h₂(20)=1+(20 mod 4)=1] +* $T_1[0]$ has 20 → evict 20 +* Place 25 at $T_1[0]$ +* Reinsert 20 at $T_2[h_2(20)=1]$: +``` T₁: [25 ][ ][ ][ ][ ] T₂: [ ][20 ][ ][10 ][15 ] - -Search(15): -- Check T₁[h₁(15)=0] → 25 ≠ 15 -- Check T₂[h₂(15)=4] → 15 → FOUND ``` +🔎 *Search(15)* + +* $T_1[h_1(15)=0] \to 25 \neq 15$ +* $T_2[h_2(15)=4] \to 15$ ✅ FOUND + +**FOUND in T₂ at index 4** + * Lookups probe at **most two places** (with two hashes) → excellent constant factors. * Inserts may trigger a chain of evictions; detect cycles and **rehash** with new functions. * High load factors achievable (e.g., \~0.5–0.9 depending on variant and number of hashes/tables). @@ -1090,33 +1112,64 @@ $$ **How it works** +*Initial state* (all zeros): + +``` +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 +A = [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] +``` + +Insert `"cat"` + ``` -Bit array (m = 16), initially all zeros: +h1(cat) = 3, h2(cat) = 7, h3(cat) = 12 +→ Set bits at 3, 7, 12 -Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0] +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 +A = [0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0] + ^ ^ ^ + 3 7 12 +``` -INSERT "cat": -h1(cat)=3, h2(cat)=7, h3(cat)=12 → set those bits to 1 +Insert `"dog"` - [0 0 0 1 0 0 0 1 0 0 0 0 1 0 0 0] - ^ ^ ^ - 3 7 12 +``` +h1(dog) = 1, h2(dog) = 7, h3(dog) = 9 +→ Set bits at 1, 7, 9 (7 already set) -INSERT "dog": -h1(dog)=1, h2(dog)=7, h3(dog)=9 → set 1,7,9 to 1 (7 already 1) +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 +A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0] + ^ ^ ^ + 1 7 9 +``` - [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0] - ^ ^ ^ +Query `"cow"` -QUERY "cow": -h1(cow)=1 (bit=1), h2(cow)=3 (bit=1), h3(cow)=6 (bit=0) → at least one zero -→ Result: DEFINITELY NOT PRESENT +``` +h1(cow) = 1 → bit[1] = 1 +h2(cow) = 3 → bit[3] = 1 +h3(cow) = 6 → bit[6] = 0 ❌ + +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 +A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0] + ✓ ✓ ✗ +``` + +At least one zero → **DEFINITELY NOT PRESENT** + +Query `"eel"` -QUERY "eel": -h1(eel)=7 (1), h2(eel)=9 (1), h3(eel)=12 (1) → all ones -→ Result: MAYBE PRESENT (could be a FALSE POSITIVE) ``` +h1(eel) = 7 → bit[7] = 1 +h2(eel) = 9 → bit[9] = 1 +h3(eel) = 12 → bit[12] = 1 + +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 +A = [0 1 0 1 0 0 0 1 0 1 0 0 1 0 0 0] + ✓ ✓ ✓ +``` + +All ones → **MAYBE PRESENT** (could be a **false positive**) * Answers: **maybe present** / **definitely not present**; never false negatives (without deletions). * False-positive rate is tunable via bit-array size **m**, number of hashes **k**, and items **n**; more space & good **k** → lower FPR. 
@@ -1133,8 +1186,8 @@ Bloom filter variant that keeps a small counter per bit so you can **delete** by *Setup* $$ -m = 12 \;\; \text{counters (each 2–4 bits)}, -\quad k = 3 \;\; \text{hash functions} +m = 12 \text{counters (each 2–4 bits)}, +\quad k = 3 \text{hash functions} $$ Inserted set: @@ -1183,26 +1236,77 @@ $$ **How it works** +Each cell is a **small counter** (e.g. 4-bits, range 0..15). +This allows **deletions**: increment on insert, decrement on delete. + +Initial state + +``` +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 +A = [0 0 0 0 0 0 0 0 0 0 0 0] +``` + +Insert `"alpha"` + +``` +Hashes: {2, 5, 9} +→ Increment those counters + +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 +A = [0 0 1 0 0 1 0 0 0 1 0 0] + ↑ ↑ ↑ + 2 5 9 +``` + +Insert `"beta"` + ``` -Counters (not bits). Each cell stores a small integer (0..15 if 4-bit). +Hashes: {3, 5, 11} +→ Increment those counters -Idx: 0 1 2 3 4 5 6 7 8 9 10 11 - [0 0 0 0 0 0 0 0 0 0 0 0] +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 +A = [0 0 1 1 0 2 0 0 0 1 0 1] + ↑ ↑ ↑ + 3 5 11 +``` -INSERT "alpha": h: {2, 5, 9} → increment those counters - [0 0 1 0 0 1 0 0 0 1 0 0] +Lookup `"beta"` -INSERT "beta": h: {3, 5, 11} → increment - [0 0 1 1 0 2 0 0 0 1 0 1] +``` +Hashes: {3, 5, 11} +Counters = {1, 2, 1} → all > 0 +→ Result: MAYBE PRESENT -LOOKUP "beta": counters at {3,5,11} = {1,2,1} > 0 → MAYBE PRESENT +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 +A = [0 0 1 1 0 2 0 0 0 1 0 1] + ✓ ✓ ✓ +``` -DELETE "alpha": decrement {2,5,9} - [0 0 0 1 0 1 0 0 0 0 0 1] +Delete `"alpha"` -LOOKUP "alpha": counters at {2,5,9} = {0,1,0} -→ has a zero → DEFINITELY NOT PRESENT ``` +Hashes: {2, 5, 9} +→ Decrement those counters + +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 +A = [0 0 0 1 0 1 0 0 0 0 0 1] + ↓ ↓ ↓ + 2 5 9 +``` + +Lookup `"alpha"` + +``` +Hashes: {2, 5, 9} +Counters = {0, 1, 0} +→ At least one zero +→ Result: DEFINITELY NOT PRESENT + +Idx: 0 1 2 3 4 5 6 7 8 9 10 11 +A = [0 0 0 1 0 1 0 0 0 0 0 1] + ✗ ✓ ✗ +``` + * Supports **deletion** by decrementing counters; insertion increments. * Still probabilistic: may return false positives; avoids false negatives **if counters never underflow** and hashes are consistent. @@ -1218,7 +1322,7 @@ Hash-table–style filter that stores short **fingerprints** in two possible buc *Setup* $$ -b = 8 \;\; \text{buckets}, +b = 8 \text{buckets}, \quad \text{bucket size} = 2, \quad \text{fingerprint size} = 8 \; \text{bits} $$ @@ -1269,45 +1373,78 @@ $$ **How it works** -``` -Each key x → short fingerprint f = FP(x) +Each key `x` → short **fingerprint** `f = FP(x)` Two candidate buckets: -i1 = H(x) mod b -i2 = i1 XOR H(f) mod b (so moving f between i1 and i2 preserves alternation) -Buckets (capacity 2 each), showing fingerprints as hex bytes: +* `i1 = H(x) mod b` +* `i2 = i1 XOR H(f) mod b` + (`f` can be stored in either bucket; moving between buckets preserves the invariant.) 
+ +Start (empty) + +``` +[0]: [ -- , -- ] [1]: [ -- , -- ] [2]: [ -- , -- ] [3]: [ -- , -- ] +[4]: [ -- , -- ] [5]: [ -- , -- ] [6]: [ -- , -- ] [7]: [ -- , -- ] +``` + +Insert `"cat"` -Start (empty): -[0]: [ -- , -- ] [1]: [ -- , -- ] [2]: [ -- , -- ] [3]: [ -- , -- ] -[4]: [ -- , -- ] [5]: [ -- , -- ] [6]: [ -- , -- ] [7]: [ -- , -- ] +``` +f = 0xA7 +i1 = 1 +i2 = 1 XOR H(0xA7) = 5 -INSERT "cat": f=0xA7, i1=1, i2=1 XOR H(0xA7)=5 -- Bucket 1 has space → place 0xA7 in [1] +Bucket 1 has free slot → place 0xA7 in [1] [1]: [ A7 , -- ] +``` + +Insert `"dog"` + +``` +f = 0x3C +i1 = 5 +i2 = 5 XOR H(0x3C) = 2 + +Bucket 5 has free slot → place 0x3C in [5] -INSERT "dog": f=0x3C, i1=5, i2=5 XOR H(0x3C)=2 -- Bucket 5 has space → place 0x3C in [5] +[1]: [ A7 , -- ] [5]: [ 3C , -- ] +``` -[5]: [ 3C , -- ] +Insert `"eel"` -INSERT "eel": f=0xD2, i1=1, i2=1 XOR H(0xD2)=4 -- Bucket 1 has one free slot → place 0xD2 in [1] +``` +f = 0xD2 +i1 = 1 +i2 = 1 XOR H(0xD2) = 4 -[1]: [ A7 , D2 ] +Bucket 1 has one free slot → place 0xD2 in [1] -LOOKUP "cat": -- Compute f=0xA7, check buckets 1 and 5 → found in bucket 1 → MAYBE PRESENT +[1]: [ A7 , D2 ] [5]: [ 3C , -- ] +``` -LOOKUP "fox": -- Compute f=0x9B, buckets say 0 and 7 → fingerprint not in [0] or [7] -→ DEFINITELY NOT PRESENT +Lookup `"cat"` -If an insertion finds both buckets full: -- Evict one resident fingerprint (“cuckoo kick”), move it to its alternate bucket, - possibly triggering a chain; if a loop is detected, resize/rehash. +``` +f = 0xA7 +Buckets: i1 = 1, i2 = 5 +Check: bucket[1] has A7 → found ``` +Result: MAYBE PRESENT + +Lookup `"fox"` + +``` +f = 0x9B +i1 = 0 +i2 = 0 XOR H(0x9B) = 7 + +Check buckets 0 and 7 → fingerprint not found +``` + +Result: DEFINITELY NOT PRESENT + * Stores **fingerprints**, not full keys; answers **maybe present** / **definitely not present**. * Supports **deletion** by removing a matching fingerprint from either bucket. * Very high load factors (often 90%+ with small buckets) and excellent cache locality. 
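
A toy Python sketch of the two-bucket fingerprint scheme above (bucket count, slot count, and the SHA-256-based hashing are illustrative assumptions; a production filter would resize rather than give up):

```python
import hashlib
import random

B, SLOTS = 8, 2                      # buckets and slots per bucket (toy sizes)
buckets = [[] for _ in range(B)]

def _h(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def _fingerprint(item: str) -> int:
    return (_h(item.encode()) & 0xFF) or 1       # 8-bit fingerprint, never 0

def _indices(item: str):
    f = _fingerprint(item)
    i1 = _h(item.encode()) % B
    i2 = (i1 ^ _h(bytes([f]))) % B               # alternate bucket derived from the fingerprint
    return f, i1, i2

def insert(item: str, max_kicks: int = 32) -> bool:
    f, i1, i2 = _indices(item)
    for i in (i1, i2):
        if len(buckets[i]) < SLOTS:
            buckets[i].append(f)
            return True
    i = i1                                       # both full: start evicting ("cuckoo kick")
    for _ in range(max_kicks):
        j = random.randrange(len(buckets[i]))
        f, buckets[i][j] = buckets[i][j], f      # swap our fingerprint with a resident
        i = (i ^ _h(bytes([f]))) % B             # alternate bucket of the evicted fingerprint
        if len(buckets[i]) < SLOTS:
            buckets[i].append(f)
            return True
    return False                                 # a real filter would resize/rehash here

def lookup(item: str) -> bool:
    f, i1, i2 = _indices(item)
    return f in buckets[i1] or f in buckets[i2]  # "maybe present" vs "definitely not present"

def delete(item: str) -> bool:
    f, i1, i2 = _indices(item)
    for i in (i1, i2):
        if f in buckets[i]:
            buckets[i].remove(f)
            return True
    return False

for w in ("cat", "dog", "eel"):
    insert(w)
print(lookup("cat"), lookup("fox"))              # True, almost certainly False
delete("dog")
```
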
@@ -1336,7 +1473,7 @@ $$ $$ $$ -\text{Output: matches at indices } \; 0 \;\; \text{and} \;\; 7 +\text{Output: matches at indices } \; 0 \text{and} 7 $$ *Example 2* @@ -1352,43 +1489,90 @@ $$ **How it works** +*Text* (length 11): + +``` +Text: a b r a c a d a b r a +Idx: 0 1 2 3 4 5 6 7 8 9 10 +``` + +*Pattern* (length 4): + +``` +Pattern: a b r a +``` + +*Shift 0* + +``` +Text: a b r a +Pattern: a b r a +``` + +✅ All match → **REPORT at index 0** + +*Shift 1* + +``` +Text: b r a c +Pattern: a b r a ``` -Text (index): 0 1 2 3 4 5 6 7 8 9 10 - a b r a c a d a b r a -Pattern: a b r a -Shift 0: -a b r a -a b r a ← all match → REPORT 0 +❌ Mismatch at first char → advance -Shift 1: - a b r a - b r a c ← mismatch at first char → advance by 1 +*Shift 2* -Shift 2: - a b r a - r a c a ← mismatch → advance +``` +Text: r a c a +Pattern: a b r a +``` -Shift 3: - a b r a - a c a d ← mismatch → advance +❌ Mismatch → advance -Shift 4: - a b r a - c a d a ← mismatch → advance +*Shift 3* + +``` +Text: a c a d +Pattern: a b r a +``` -Shift 5: - a b r a - a d a b ← mismatch → advance +❌ Mismatch → advance -Shift 6: - a b r a - d a b r ← mismatch → advance +*Shift 4* -Shift 7: - a b r a - a b r a ← all match → REPORT 7 ``` +Text: c a d a +Pattern: a b r a +``` + +❌ Mismatch → advance + +*Shift 5* + +``` +Text: a d a b +Pattern: a b r a +``` + +❌ Mismatch → advance + +*Shift 6* + +``` +Text: d a b r +Pattern: a b r a +``` + +❌ Mismatch → advance + +*Shift 7* + +``` +Text: a b r a +Pattern: a b r a +``` + +✅ All match → **REPORT at index 7** * Works anywhere; no preprocessing. * Time: worst/average **O(n·m)** (text length n, pattern length m). @@ -1424,33 +1608,79 @@ $$ $$ **How it works** + +We want to find the pattern `"ababd"` in the text `"ababcabca babab d"`. + +*1) Precompute LPS (Longest Proper Prefix that is also a Suffix)* + +Pattern: + +``` +a b a b d +0 1 2 3 4 ← index ``` -1) Precompute LPS (Longest Proper Prefix that is also Suffix) for the pattern. -Pattern: a b a b d -Index: 0 1 2 3 4 -LPS: 0 0 1 2 0 +LPS array: -Meaning: at each position, how far can we "fall back" within the pattern itself -to avoid rechecking text characters. +``` +0 0 1 2 0 +``` -2) Scan the text with two pointers i (text), j (pattern): +Meaning: -Text: a b a b c a b c a b a b a b d -Index: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 +* At each position, how many chars can we “fall back” within the pattern itself if a mismatch happens. +* Example: at index 3 (pattern `"abab"`), LPS=2 means if mismatch occurs, restart comparison from `"ab"` inside the pattern. + +*2) Scan Text with Two Pointers* + +* `i` = text index +* `j` = pattern index + +Text: + +``` +a b a b c a b c a b a b a b d +0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Pattern: a b a b d +``` + +*Step A: Initial matches* + +``` +i=0..3: "abab" matched → j=4 points to 'd' +``` + +*Step B: Mismatch at i=4* + +``` +text[i=4] = 'c' +pattern[j=4] = 'd' → mismatch +``` + +Instead of restarting, use LPS: + +``` +j = LPS[j-1] = LPS[3] = 2 +``` -Walkthrough (only key steps shown): +So pattern jumps back to `"ab"` (no wasted text comparisons). +i stays at 4. 
-- i=0..3 match "abab" (j=4 points to 'd'), then - i=4 is 'c' vs pattern[j]='d' → mismatch - → set j = LPS[j-1] = LPS[3] = 2 (jump pattern back to "ab") - (i stays at 4; we do NOT recheck earlier text chars) +*Step C: Continue scanning* -- Continue matching; eventually at i=14, j advances to 5 (pattern length) - → FULL MATCH ends at i=14 → start index = 14 - 5 + 1 = 10 → REPORT 10 +The algorithm keeps moving forward, reusing LPS whenever mismatches occur. + +*Step D: Full match found* + +At `i=14`, j advances to 5 (pattern length). + +``` +→ FULL MATCH found! +Start index = i - m + 1 = 14 - 5 + 1 = 10 ``` +✅ Pattern `"ababd"` occurs in the text starting at **index 10**. + * Time: **O(n + m)** (preprocessing + scan). * Space: **O(m)** for LPS table. * Never moves i backward; avoids redundant comparisons. @@ -1485,32 +1715,57 @@ $$ \text{Output: match at index } 15 $$ - **How it works** +* Align the pattern under the text. +* Compare **right → left**. +* On mismatch, shift the pattern by the **max** of: +* **Bad-character rule**: align the mismatched text char with its last occurrence in the pattern (or skip it if absent). +* **Good-suffix rule**: if a suffix matched, align another occurrence of it (or a prefix). + +*Text* (with spaces shown as `_`): + +``` +0 10 20 30 +H E R E _ I S _ A _ S I M P L E _ E X A M P L E ``` -Idea: align pattern under text, compare RIGHT→LEFT. -On mismatch, shift by the MAX of: - - Bad-character shift: move so the mismatching text char lines up with its last - occurrence in the pattern (or skip past if absent). - - Good-suffix shift: if a suffix matched, align another occurrence of that suffix - (or a prefix) with the text. -Example (bad-character only shown for brevity): -Text (0..): H E R E _ I S _ A _ S I M P L E _ E X A M P L E -Pattern: E X A M P L E - ↑ compare from here (rightmost) +**Pattern**: `"EXAMPLE"` (length = 7) -1) Align at text index 10..16 ("SIMPLE"): - compare L→R? No, BM compares R→L: - E vs E (ok), L vs L (ok), P vs P (ok), M vs M (ok), A vs A (ok), X vs I (mismatch) - Bad char = 'I' in text; last 'I' in pattern? none → shift pattern PAST 'I' +*Step 1: Align pattern at text\[10..16] = `"SIMPLE"`* -2) After shifts, eventually align under "... E X A M P L E": - Compare from right: - E=E, L=L, P=P, M=M, A=A, X=X, E=E → FULL MATCH at index 17 +``` +Text: ... S I M P L E ... +Pattern: E X A M P L E + ↑ (start comparing right → left) ``` +Compare right-to-left: + +``` +E=E, L=L, P=P, M=M, A=A, +X vs I → mismatch +``` + +* Bad character = `I` (from text). +* Does pattern contain `I`? → ❌ no. +* → Shift pattern **past `I`**. + +*Step 2: Shift until pattern under `"EXAMPLE"`* + +``` +Text: ... E X A M P L E +Pattern: E X A M P L E +``` + +Compare right-to-left: + +``` +E=E, L=L, P=P, M=M, A=A, X=X, E=E +``` + +✅ **Full match** found at **index 17**. + * Average case sublinear (often skips large chunks of text). * Worst case can be **O(n·m)**; with both rules + Galil’s optimization, comparisons can be bounded **O(n + m)**. * Space: **O(σ + m)** for tables (σ = alphabet size). 
@@ -1545,34 +1800,83 @@ $$ \text{Output: no match} $$ - **How it works** +We’ll use the classic choices: + +* Base **B = 256** +* Modulus **M = 101** (prime) +* Character value `val(c) = ASCII(c)` (e.g., `A=65, B=66, C=67, D=68`) +* Pattern **P = "ABC"** (length **m = 3**) +* Text **T = "ABCDABCABCD"** (length 11) + +``` +Text: A B C D A B C A B C D +Index: 0 1 2 3 4 5 6 7 8 9 10 +Pattern: A B C (m = 3) +``` + +*Precompute* + +``` +pow = B^(m-1) mod M = 256^2 mod 101 = 88 +HP = H(P) = H("ABC") ``` -Pick a base B and modulus M. Compute: -- pattern hash H(P) -- rolling window hash H(T[i..i+m-1]) for each window of length m -Example windows (conceptual; showing only positions, not numbers): +Start `h=0`, then for each char `h = (B*h + val) mod M`: + +* After 'A': `(256*0 + 65) % 101 = 65` +* After 'B': `(256*65 + 66) % 101 = 41` +* After 'C': `(256*41 + 67) % 101 = 59` + +So **HP = 59**. + +*Rolling all windows* + +Initial window `T[0..2]="ABC"`: + +`h0 = 59` (matches HP → verify chars → ✅ match at 0) + +For rolling: -Text: A B C D A B C A B C D -Index: 0 1 2 3 4 5 6 7 8 9 10 -Pat: A B C (m = 3) +`h_next = ( B * (h_curr − val(left) * pow) + val(new) ) mod M` -Windows & hashes: -[0..2] ABC → hash h0 -[1..3] BCD → hash h1 (derived from h0 by removing 'A', adding 'D') -[2..4] CDA → hash h2 -[3..5] DAB → hash h3 -[4..6] ABC → hash h4 (equals H(P) → verify chars → MATCH at 4) -[5..7] BCA → hash h5 -[6..8] CAB → hash h6 -[7..9] ABC → hash h7 (equals H(P) → verify → MATCH at 7) +(If the inner term is negative, add `M` before multiplying.) + +*First two rolls* + +From $[0..2]$ "ABC" $(h0=59)$ → $[1..3]$ "BCD": + +``` +left='A'(65), new='D'(68) +inner = (59 − 65*88) mod 101 = (59 − 5720) mod 101 = 96 +h1 = (256*96 + 68) mod 101 = 24644 mod 101 = 0 +``` + +From $[3..5]$ "DAB" $(h3=66)$ → $[4..6]$ "ABC": -Rolling update (conceptually): -h_next = (B*(h_curr - value(left_char)*B^(m-1)) + value(new_char)) mod M -Only on hash equality do we compare characters to avoid false positives. ``` +left='D'(68), new='C'(67) +inner = (66 − 68*88) mod 101 = (66 − 5984) mod 101 = 41 +h4 = (256*41 + 67) mod 101 = 10563 mod 101 = 59 (= HP) +``` + +*All windows (start index s)* + +``` +s window hash =HP? +0 ABC 59 ✓ → verify → ✅ MATCH at 0 +1 BCD 0 +2 CDA 38 +3 DAB 66 +4 ABC 59 ✓ → verify → ✅ MATCH at 4 +5 BCA 98 +6 CAB 79 +7 ABC 59 ✓ → verify → ✅ MATCH at 7 +8 BCD 0 +``` + +Matches at indices: **0, 4, 7**. * Expected time **O(n + m)** with a good modulus and low collision rate; worst case **O(n·m)** if many collisions. * Space: **O(1)** beyond the text/pattern and precomputed powers. 
From 5c9be6c8e553bd500f1fcfb3984838908001970c Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 30 Aug 2025 23:28:29 +0200 Subject: [PATCH 21/48] Update searching.md --- notes/searching.md | 447 ++++++++++++++++++++++++++++++++++----------- 1 file changed, 345 insertions(+), 102 deletions(-) diff --git a/notes/searching.md b/notes/searching.md index 8efc9f2..5d4c6d1 100644 --- a/notes/searching.md +++ b/notes/searching.md @@ -27,7 +27,7 @@ $$ $$ $$ -\text{Output: } \text{index} = 0 \; (\text{first match}) +\text{Output: } \text{index} = 0 (\text{first match}) $$ *Example 3* @@ -133,7 +133,7 @@ $$ $$ $$ -\text{Output: } \text{not found } \; (\text{only the sentinel matched}) +\text{Output: } \text{not found } (\text{only the sentinel matched}) $$ **How it works** @@ -459,21 +459,84 @@ $$ **How it works** ``` -Sorted A: [ 2 ][ 3 ][ 5 ][ 7 ][11 ][13 ][17 ][19 ][23 ] -Indexes: 0 1 2 3 4 5 6 7 8 -Target: 19 +A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ] +i = 0 1 2 3 4 5 6 7 8 +Target = 19 +``` + +Find range by exponential jumps + +*Start* at `i=1`, double each step until `A[i] ≥ target` (or end). + +*Jump 1:* `i=1` + +``` +A[i]=3 ≤ 19 → continue +A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ] +i = 0 1 2 3 4 5 6 7 8 + ↑ +``` + +*Jump 2:* `i=2` + +``` +A[i]=5 ≤ 19 → continue +A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ] +i = 0 1 2 3 4 5 6 7 8 + ↑ +``` -1) Find range by exponential jumps (check A[1], A[2], A[4], A[8], ...): - - A[1]=3 ≤ 19 - - A[2]=5 ≤ 19 - - A[4]=11 ≤ 19 - - A[8]=23 > 19 → stop; range is (prev_power_of_two..8] → (4..8] +*Jump 3:* `i=4` -2) Do binary search on A[5..8]: - Subarray: [13 ][17 ][19 ][23 ] - Indices: 5 6 7 8 - Binary search finds A[7]=19 → FOUND at index 7 ``` +A[i]=11 ≤ 19 → continue +A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ] +i = 0 1 2 3 4 5 6 7 8 + ↑ +``` + +*Jump 4:* `i=8` + +``` +A[i]=23 > 19 → stop +Range is (previous power of two .. i] = (4 .. 8] → search indices 5..8 +A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ] +i = 0 1 2 3 4 5 6 7 8 + ↑ +``` + +*Range for binary search:* `low=5, high=8`. + +Binary search on $A[5..8]$ + +``` +Subarray: [ 13 ][ 17 ][ 19 ][ 23 ] +Indices : 5 6 7 8 +``` + +*Step 1* + +``` +low=5, high=8 → mid=(5+8)//2=6 +A[6]=17 < 19 → move right → low=7 +A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ] +i = 0 1 2 3 4 5 6 7 8 + ↑L ↑M ↑H + 5 6 8 +``` + +*Step 2* + +``` +low=7, high=8 → mid=(7+8)//2=7 +A[7]=19 == target ✅ → FOUND +A = [ 2 ][ 3 ][ 5 ][ 7 ][ 11 ][ 13 ][ 17 ][ 19 ][ 23 ] +i = 0 1 2 3 4 5 6 7 8 + ↑L M H + 7 +``` + +Found at **index 7**. * Great when the target is likely to be near the beginning or when the array is **unbounded**/**stream-like** but sorted (you can probe indices safely). * Time: O(log p) to find the range where p is the final bound, plus O(log p) for binary search → overall O(log p). @@ -493,7 +556,7 @@ $$ $$ $$ -\text{Output: } \text{not found } \; (\text{probes near indices } 4\text{–}5) +\text{Output: } \text{not found } (\text{probes near indices } 4\text{–}5) $$ *Example 2* @@ -513,31 +576,57 @@ $$ $$ $$ -\text{Output: } \text{not found } \; (\text{bad distribution for interpolation}) +\text{Output: } \text{not found } (\text{bad distribution for interpolation}) $$ **How it works** -``` -Assumes A is sorted and values are roughly uniform. +* Guard against division by zero: if `A[high] == A[low]`, stop (or binary-search fallback). +* Clamp the computed `pos` to `[low, high]` before probing. 
+* Works best when values are **uniformly distributed**; otherwise it can degrade toward linear time. +* Assumes `A` is sorted and values are uniform. -Idea: "Guess" the likely index by linearly interpolating the target’s value -between A[low] and A[high]: +Probe formula: -Estimated position: +``` pos ≈ low + (high - low) * (target - A[low]) / (A[high] - A[low]) +``` -Example: -A = [10, 20, 30, 40, 50, 60, 70], target = 45 +Let say we have following array and target: + +``` +A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ] +i = 0 1 2 3 4 5 6 +target = 45 +``` + +*Step 1 — initial probe* + +``` low=0 (A[0]=10), high=6 (A[6]=70) -pos ≈ 0 + (6-0) * (45-10)/(70-10) = 6 * 35/60 ≈ 3.5 → probe index 3 or 4 +pos ≈ 0 + (6-0) * (45-10)/(70-10) + ≈ 6 * 35/60 + ≈ 3.5 → probe near 3.5 + +A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ] +i = 0 1 2 3 4 5 6 + ↑L ↑H + ↑pos≈3.5 → choose ⌊pos⌋=3 (or ⌈pos⌉=4) +``` + +Probe **index 3**: `A[3]=40 < 45` → set `low = 3 + 1 = 4` + +*Step 2 — after moving low* -Probe at 3: A[3]=40 (<45) → new low=4 -Probe at 4: A[4]=50 (>45) → new high=3 -Now low>high → not found +``` +A = [ 10 ][ 20 ][ 30 ][ 40 ][ 50 ][ 60 ][ 70 ] +i = 0 1 2 3 4 5 6 + ↑L ↑H ``` +At this point, an **early-stop check** already tells us `target (45) < A[low] (50)` → cannot exist in `A[4..6]` → **not found**. + * Best on **uniformly distributed** sorted data; expected time O(log log n). * Worst case can degrade to O(n), especially on skewed or clustered values. * Space: O(1). @@ -583,22 +672,76 @@ $$ **How it works** ``` -Concept: -key --hash--> index in array --search/compare--> match? ++-----+ hash +-----------------+ search/compare +--------+ +| key | --------------> | index in array | ----------------------> | match? | ++-----+ +-----------------+ +--------+ +``` +* With chaining, the “collision path” is the **list inside one bucket**. +* With linear probing, the “collision path” is the **probe sequence** across buckets (3 → 4 → 5 → …). +* Both keep your original flow: hash → inspect bucket (and collision path) → match? + +``` Array (buckets/indexes 0..6): -Idx: 0 1 2 3 4 5 6 - [ ][ ][ ][ ][ ][ ][ ] -Example mapping with h(k)=k mod 7, stored keys {10, 24, 31}: +Idx: 0 1 2 3 4 5 6 + +---+-----+-----+-----+-----+-----+-----+ + | | | | | | | | + +---+-----+-----+-----+-----+-----+-----+ +``` + +**Example mapping with** `h(k) = k mod 7`, **stored keys** `{10, 24, 31}` all hash to index `3`. 
+ +*Strategy A — Separate Chaining (linked list per bucket)* + +Insertions + +``` 10 -> 3 -24 -> 3 (collides with 10; resolved by the chosen strategy) -31 -> 3 (collides again) +24 -> 3 (collides with 10; append to bucket[3] list) +31 -> 3 (collides again; append to bucket[3] list) + +Idx: 0 1 2 3 4 5 6 + +---+-----+-----+-----+-----+-----+-----+ + | | | | • | | | | + +---+-----+-----+-----+-----+-----+-----+ + +bucket[3] chain: [10] → [24] → [31] → ∅ +``` -Search(24): +*Search(24)* + +``` 1) Compute index = h(24) = 3 -2) Inspect bucket 3 (and possibly its collision path) -3) If 24 is found along that path → found; otherwise → not found +2) Inspect bucket 3's chain: + [10] → [24] → [31] + ↑ found here +3) Return FOUND (bucket 3) +``` + +*Strategy B — Open Addressing (Linear Probing)* + +Insertions + +``` +10 -> 3 place at 3 +24 -> 3 (occupied) → probe 4 → place at 4 +31 -> 3 (occ) → 4 (occ) → probe 5 → place at 5 + +Idx: 0 1 2 3 4 5 6 + +---+-----+-----+-----+-----+-----+-----+ + | | | | 10 | 24 | 31 | | + +---+-----+-----+-----+-----+-----+-----+ +``` + +*Search(24)* + +``` +1) Compute index = h(24) = 3 +2) Probe sequence: + 3: 10 ≠ 24 → continue + 4: 24 = target → FOUND at index 4 + (If not found, continue probing until an empty slot or wrap limit.) ``` * Quality hash + low load factor (α = n/m) ⇒ expected O(1) search/insert/delete. @@ -638,27 +781,52 @@ $$ **How it works** +*Hash function:* + +``` +h(k) = k mod 10 +Probe sequence: i, i+1, i+2, ... (wrap around) ``` -h(k) = k mod 10, probe sequence: i, i+1, i+2, ... (wrap around) -Insertions already done: -12 -> h=2 → put at 2 -22 -> h=2 (occupied) → try 3 → put at 3 -32 -> h=2 (occupied), 3 (occupied) → try 4 → put at 4 +*Insertions* -Array: -Idx: 0 1 2 3 4 5 6 7 8 9 - [ ][ ][12][22][32][ ][ ][ ][ ][ ] +* Insert 12 → `h(12)=2` → place at index 2 +* Insert 22 → `h(22)=2` occupied → probe 3 → place at 3 +* Insert 32 → `h(32)=2` occupied → probe 3 (occupied) → probe 4 → place at 4 -Search(22): -- Start at h(22)=2 → 12 ≠ 22 -- Next 3 → 22 = 22 → FOUND at index 3 +Resulting table (indexes 0..9): + +``` +Index: 0 1 2 3 4 5 6 7 8 9 + +---+---+----+----+----+---+---+---+---+---+ +Value: | | | 12 | 22 | 32 | | | | | | + +---+---+----+----+----+---+---+---+---+---+ +``` + +*Search(22)* + +* Start at `h(22)=2` +* index 2 → 12 ≠ 22 → probe → +* index 3 → 22 ✅ FOUND + +Path followed: -Search(42): -- Start at 2 → 12 ≠ 42 -- 3 → 22 ≠ 42 -- 4 → 32 ≠ 42 -- 5 → empty → stop → NOT FOUND +``` +2 → 3 +``` + +*Search(42)* + +* Start at `h(42)=2` +* index 2 → 12 ≠ 42 → probe → +* index 3 → 22 ≠ 42 → probe → +* index 4 → 32 ≠ 42 → probe → +* index 5 → empty slot → stop → ❌ NOT FOUND + +Path followed: + +``` +2 → 3 → 4 → 5 (∅) ``` * Simple and cache-friendly; clusters form (“primary clustering”) which can slow probes. @@ -673,7 +841,7 @@ Search(42): *Example 1* $$ -m = 11 \; (\text{prime}), +m = 11 (\text{prime}), \quad \text{Stored keys: } \{22, 33, 44\}, \quad \text{Target: } 33 $$ @@ -689,7 +857,7 @@ $$ *Example 2* $$ -m = 11 \; (\text{prime}), +m = 11 (\text{prime}), \quad \text{Stored keys: } \{22, 33, 44\}, \quad \text{Target: } 55 $$ @@ -704,29 +872,62 @@ $$ **How it works** +*Hash function:* + ``` h(k) = k mod 11 -Probe offsets: +1^2, +2^2, +3^2, ... (i.e., +1, +4, +9, +16≡+5, +25≡+3, ... mod 11) +``` -Insert: -22 -> h=0 → put at 0 -33 -> h=0 (occupied) → 0+1^2=1 → put at 1? (showing a typical sequence) -(For clarity we'll place 33 at the first free among 0,1,4,9,... Suppose 1 is free.) 
-44 -> h=0 (occupied) → try 1 (occupied) → try 4 → put at 4 +*Probe sequence (relative offsets):* -Array (one possible state): -Idx: 0 1 2 3 4 5 6 7 8 9 10 - [22][33][ ][ ][44][ ][ ][ ][ ][ ][ ] +``` ++1², +2², +3², ... mod 11 += +1, +4, +9, +5, +3, +3²… (wrapping around table size) +``` -Search(33): -- h=0 → 22 ≠ 33 -- 0+1^2=1 → 33 = 33 → FOUND at index 1 +So from `h(k)`, we try slots in this order: -Search(55): -- h=0 → 22 ≠ 55 -- +1^2=1 → 33 ≠ 55 -- +2^2=4 → 44 ≠ 55 -- +3^2=9 → empty → NOT FOUND +``` +h, h+1, h+4, h+9, h+5, h+3, ... (all mod 11) +``` + +*Insertions* + +* Insert **22** → `h(22)=0` → place at index 0 +* Insert **33** → `h(33)=0` occupied → try `0+1²=1` → index 1 free → place at 1 +* Insert **44** → `h(44)=0` occupied → probe 1 (occupied) → probe `0+4=4` → place at 4 + +Resulting table: + +``` +Idx: 0 1 2 3 4 5 6 7 8 9 10 + +----+----+---+---+----+---+--+--+--+---+ +Val: | 22 | 33 | | | 44 | | | | | | | + +----+----+---+---+----+---+--+--+--+--+---+ +``` + +*Search(33)* + +* Start `h(33)=0` → slot 0 = 22 ≠ 33 +* Probe `0+1²=1` → slot 1 = 33 ✅ FOUND + +Path: + +``` +0 → 1 +``` + +*Search(55)* + +* Start `h(55)=0` → slot 0 = 22 ≠ 55 +* Probe `0+1²=1` → slot 1 = 33 ≠ 55 +* Probe `0+2²=4` → slot 4 = 44 ≠ 55 +* Probe `0+3²=9` → slot 9 = empty → stop → ❌ NOT FOUND + +Path: + +``` +0 → 1 → 4 → 9 (∅) ``` * Reduces primary clustering but can exhibit **secondary clustering** (keys with same h(k) follow same probe squares). @@ -759,7 +960,7 @@ m = 11, \quad \text{Target: } 33 $$ -* For $k = 33$: +For $k = 33$: $$ h_{1}(33) = 33 \bmod 11 = 0, @@ -769,9 +970,9 @@ $$ So probe sequence is $$ -h(33,0) = 0,\; -h(33,1) = (0 + 1\cdot 4) \bmod 11 = 4,\; -h(33,2) = (0 + 2\cdot 4) \bmod 11 = 8,\; \dots +h(33,0) = 0, +h(33,1) = (0 + 1\cdot 4) \bmod 11 = 4, +h(33,2) = (0 + 2\cdot 4) \bmod 11 = 8, \dots $$ Since the stored layout places $33$ at index $4$, the search succeeds. @@ -788,7 +989,7 @@ m = 11, \quad \text{Target: } 55 $$ -* For $k = 55$: +For $k = 55$: $$ h_{1}(55) = 55 \bmod 11 = 0, @@ -798,7 +999,7 @@ $$ Probing sequence: $$ -0, \; (0+6)\bmod 11 = 6,\; (0+2\cdot 6)\bmod 11 = 1,\; (0+3\cdot 6)\bmod 11 = 7,\; \dots +0, (0+6)\bmod 11 = 6, (0+2\cdot 6)\bmod 11 = 1, (0+3\cdot 6)\bmod 11 = 7, \dots $$ No slot matches $55$. @@ -809,27 +1010,69 @@ $$ **How it works** +We use **two hash functions**: + +``` +h₁(k) = k mod m +h₂(k) = 1 + (k mod 10) +``` + +*Probe sequence:* + +``` +i, i + h₂, i + 2·h₂, i + 3·h₂, ... (all mod m) +``` + +This ensures fewer clustering issues compared to linear or quadratic probing. + +*Insertions (m = 11)* + +Insert **22** + +* `h₁(22)=0` → place at index 0 + +Insert **33** + +* `h₁(33)=0` (occupied) +* `h₂(33)=1+(33 mod 10)=4` +* Probe sequence: 0, 4 → place at index 4 + +Insert **44** + +* `h₁(44)=0` (occupied) +* `h₂(44)=1+(44 mod 10)=5` +* Probe sequence: 0, 5 → place at index 5 + +*Table State* + ``` -Probe sequence: i, i+h₂, i+2·h₂, i+3·h₂, ... 
(all mod m) +Idx: 0 1 2 3 4 5 6 7 8 9 10 + +---+---+---+---+---+---+---+---+---+---+ +Val: |22 | | | |33 |44 | | | | | | + +---+---+---+---+---+---+---+---+---+---+---+ +``` + +*Search(33)* + +* Start at `h₁(33)=0` → slot 0 = 22 ≠ 33 +* Next: `0+1·h₂(33)=0+4=4` → slot 4 = 33 ✅ FOUND + +Path: -Insert: -22: h₁=0 → put at 0 -33: h₁=0 (occupied), h₂=1+(33 mod 10)=4 - Probes: 0, 4 → put at 4 -44: h₁=0 (occupied), h₂=1+(44 mod 10)=5 - Probes: 0, 5 → put at 5 +``` +0 → 4 +``` -Array: -Idx: 0 1 2 3 4 5 6 7 8 9 10 - [22][ ][ ][ ][33][44][ ][ ][ ][ ][ ] +*Search(55)* -Search(33): -- Start 0 → 22 ≠ 33 -- Next 0+4=4 → 33 → FOUND +* `h₁(55)=0`, `h₂(55)=1+(55 mod 10)=6` +* slot 0 = 22 ≠ 55 +* slot 6 = empty → stop → ❌ NOT FOUND -Search(55): -- h₁=0, h₂=1+(55 mod 10)=6 -- Probes: 0 (22), 6 (empty) → NOT FOUND +Path: + +``` +0 → 6 (∅) ``` * Minimizes clustering; probe steps depend on the key. @@ -1064,8 +1307,8 @@ Space-efficient structure for fast membership tests; answers **“maybe present *Setup* $$ -m = 16 \; \text{bits}, -\quad k = 3 \; \text{hash functions } (h_{1}, h_{2}, h_{3}) +m = 16 \text{bits}, +\quad k = 3 \text{hash functions } (h_{1}, h_{2}, h_{3}) $$ Inserted set: @@ -1193,7 +1436,7 @@ $$ Inserted set: $$ -\{\text{"alpha"}, \; \text{"beta"}\} +\{\text{"alpha"}, \text{"beta"}\} $$ Then delete `"alpha"`. @@ -1324,13 +1567,13 @@ Hash-table–style filter that stores short **fingerprints** in two possible buc $$ b = 8 \text{buckets}, \quad \text{bucket size} = 2, -\quad \text{fingerprint size} = 8 \; \text{bits} +\quad \text{fingerprint size} = 8 \text{bits} $$ Inserted set: $$ -\{\text{"cat"}, \; \text{"dog"}, \; \text{"eel"}\} +\{\text{"cat"}, \text{"dog"}, \text{"eel"}\} $$ Each element is stored as a short fingerprint in one of two candidate buckets. @@ -1473,7 +1716,7 @@ $$ $$ $$ -\text{Output: matches at indices } \; 0 \text{and} 7 +\text{Output: matches at indices } 0 \text{and} 7 $$ *Example 2* @@ -1484,7 +1727,7 @@ $$ $$ $$ -\text{Output: matches at indices } \; 0, \; 1, \; 2 +\text{Output: matches at indices } 0, 1, 2 $$ **How it works** @@ -1786,7 +2029,7 @@ $$ $$ $$ -\text{Output: matches at indices } 0, \; 4, \; 7 +\text{Output: matches at indices } 0, 4, 7 $$ *Example 2* From b1b83b33cbf8e6bb47df40ff209ca90d6f0cde57 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sat, 30 Aug 2025 22:08:35 +0000 Subject: [PATCH 22/48] temp --- notes/greedy_algorithms.md | 101 ++++++++++--- notes/matrices.md | 303 ++++++++++++++++++++++++++++++++++++- notes/searching.md | 56 +++---- 3 files changed, 408 insertions(+), 52 deletions(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index 9cae8c4..0db6be9 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -1,4 +1,4 @@ -## What are greedy algorithms? +## Greedy Algorithms Greedy methods construct a solution piece by piece, always choosing the currently best-looking option according to a simple rule. The subtlety is not the rule itself but the proof that local optimality extends to global optimality. Two proof tools do most of the work: exchange arguments (you can swap an optimal solution’s first “deviation” back to the greedy choice without harm) and loop invariants (you maintain a statement that pins down exactly what your partial solution guarantees at each step). @@ -93,13 +93,13 @@ The last axiom is the heart. It forbids “dead ends” where a smaller feasible You’re standing on square 0 of a line of squares $0,1,\dots,n-1$. 
Each square $i$ tells you how far you’re allowed to jump forward from there: a number $a[i]$. From $i$, you can jump to any square $i+1, i+2, \dots, i+a[i]$. The goal is to decide whether you can ever reach the last square, and, if not, what the furthest square is that you can reach. -#### Example input and the expected output +**Example inputs and outputs** Input array: `a = [3, 1, 0, 0, 4, 1]` There are 6 squares (0 through 5). Correct output: you cannot reach the last square; the furthest you can get is square `3`. -#### A slow but obvious approach +Baseline (slow) Think “paint everything I can reach, one wave at a time.” @@ -120,7 +120,7 @@ from 3: can reach {} → no change (a[3]=0) done: no new squares → furthest is 3, last is unreachable ``` -#### A clean, fast greedy scan +**How it works** Carry one number as you sweep left to right: `F`, the furthest square you can reach **so far**. Rule of thumb: @@ -148,11 +148,16 @@ i=4: 4 > F → stuck here Final state: `F = 3`, which means the furthest reachable square is 3. Since `F < n-1 = 5`, the last square is not reachable. +Summary + +* Time: $O(n)$ (single left-to-right pass) +* Space: $O(1)$ + ### Minimum spanning trees You’ve got a connected weighted graph and you want the cheapest way to connect **all** its vertices without any cycles—that’s a minimum spanning tree (MST). Think “one network of cables that touches every building, with the total cost as small as possible.” -#### Example input → expected output +**Example inputs and outputs** Vertices: $V=\{A,B,C,D,E\}$ @@ -173,11 +178,11 @@ Total weight $=1+2+3+4=10$. You can’t do better: any cheaper set of 4 edges would either miss a vertex or create a cycle. -#### A slow, baseline way (what you’d do if time didn’t matter) +Baseline (slow) Enumerate every spanning tree and pick the one with the smallest total weight. That’s conceptually simple—“try all combinations of $n-1$ edges that connect everything and have no cycles”—but it explodes combinatorially. Even medium graphs have an astronomical number of spanning trees, so this approach is only good as a thought experiment. -#### Two fast greedy methods that always work +**How it works** Both fast methods rely on two facts: @@ -185,6 +190,17 @@ Both fast methods rely on two facts: * **Cycle rule (safe to skip):** in any cycle, the most expensive edge is never in an MST. Intuition: if you already have a loop, drop the priciest link and you’ll still be connected but strictly cheaper. #### Kruskal’s method + +**Example inputs and outputs** + +Use the same graph as above. A valid MST is + +$$ +\{A\!-\!B(1),\ B\!-\!D(2),\ C\!-\!E(3),\ B\!-\!C(4)\}\quad\Rightarrow\quad \text{total} = 10. +$$ + +**How it works** + Sort edges from lightest to heaviest; walk down that list and keep an edge if it connects two **different** components. Stop when you have $n-1$ edges. Sorted edges by weight: @@ -218,8 +234,19 @@ We’ll keep a running view of the components; initially each vertex is alone. Edges kept: $A\!-\!B(1), B\!-\!D(2), C\!-\!E(3), B\!-\!C(4)$. Total $=10$. Every later edge would create a cycle and is skipped by the cycle rule. +Complexity + +* Time: $O(E \log E)$ to sort edges + near-constant $\alpha(V)$ for DSU unions; often written $O(E \log V)$ since $E\le V^2$. +* Space: $O(V)$ for disjoint-set structure. + ### Prim's method +**Example inputs and outputs** + +Same graph and target: produce any MST of total weight $10$. 
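
The step-by-step walkthrough follows under “How it works”; before it, here is a minimal Python sketch of the same rule (always take the lightest edge leaving the current tree), assuming an adjacency-list `dict`. The function and variable names are illustrative, and the input below repeats only the four MST edges of the example, so the expected total is 10.

```python
import heapq

def prim_mst_weight(adj, start):
    """Grow a tree from `start`; repeatedly take the lightest edge leaving it."""
    visited = {start}
    heap = [(w, v) for v, w in adj[start]]   # (weight, vertex) edges crossing the cut
    heapq.heapify(heap)
    total = 0
    while heap and len(visited) < len(adj):
        w, v = heapq.heappop(heap)
        if v in visited:
            continue                          # stale entry: v already joined via a lighter edge
        visited.add(v)
        total += w
        for u, wu in adj[v]:
            if u not in visited:
                heapq.heappush(heap, (wu, u))
    return total

edges = [("A", "B", 1), ("B", "D", 2), ("C", "E", 3), ("B", "C", 4)]  # MST edges of the example only
adj = {v: [] for v in "ABCDE"}
for a, b, w in edges:
    adj[a].append((b, w))
    adj[b].append((a, w))

print(prim_mst_weight(adj, "A"))  # 10
```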
+ +**How it works** + Start from any vertex; repeatedly add the lightest edge that leaves the current tree to bring in a new vertex. Stop when all vertices are in. Let’s start from $A$. The “tree” grows one cheapest boundary edge at a time. @@ -242,11 +269,16 @@ Edges chosen: exactly the same four as Kruskal, total $=10$. Why did step 4 grab a weight-3 edge after we already took a 4? Because earlier that 3 wasn’t **available**—it didn’t cross from the tree to the outside until $C$ joined the tree. Prim never regrets earlier picks because of the cut rule: at each moment it adds the cheapest bridge from “inside” to “outside,” and that’s always safe. +Complexity + +* Time: $O(E \log V)$ with a binary heap and adjacency lists; $O(E + V\log V)$ with a Fibonacci heap. +* Space: $O(V)$ for keys/parents and visited set. + ### Shortest paths with non-negative weights You’ve got a weighted graph and a starting node $s$. Every edge has a cost $\ge 0$. The task is to find the cheapest cost to reach every node from $s$, and a cheapest route for each if you want it. -#### Example input → expected output +**Example inputs and outputs** Nodes: $A,B,C,D,E$ @@ -267,11 +299,11 @@ Correct shortest-path costs from $A$: * $d(D)=4$ via $A\!\to\!B\!\to\!D$ * $d(E)=4$ via $A\!\to\!B\!\to\!C\!\to\!E$ -#### A slow baseline (what you’d do without the greedy insight) +Baseline (slow) One safe—but slower—approach is to relax all edges repeatedly until nothing improves. Think of it as “try to shorten paths by one edge at a time, do that $|V|-1$ rounds.” This eventually converges to the true shortest costs, but it touches every edge many times, so its work is about $|V|\cdot|E|$. It also handles negative edges, which is why it has to be cautious and keep looping. -#### The greedy method: Dijkstra’s idea +**How it works** Carry two sets and a distance label for each node. @@ -343,18 +375,23 @@ Recovering paths by remembering “who improved whom” gives: * $D$ from $B$ * $E$ from $C$ +Complexity + +* Time: $O((V+E)\log V)$ with a binary heap (often written $O(E \log V)$ when $E\ge V$). +* Space: $O(V)$ for distances, parent pointers, and heap entries. + ### Maximum contiguous sum You’re given a list of numbers laid out in a line. You may pick one **contiguous** block, and you want that block’s sum to be as large as possible. -### Example input and the expected output +**Example inputs and outputs** Take $x = [\,2,\,-3,\,4,\,-1,\,2,\,-5,\,3\,]$. A best block is $[\,4,\,-1,\,2\,]$. Its sum is $5$. So the correct output is “maximum sum $=5$” and one optimal segment is positions $3$ through $5$ (1-based). -#### A slow, obvious baseline +Baseline (slow) Try every possible block and keep the best total. To sum any block $i..j$ quickly, precompute **prefix sums** $S_0=0$ and $S_j=\sum_{k=1}^j x_k$. Then @@ -364,7 +401,7 @@ $$ Loop over all $j$ and all $i\le j$, compute $S_j-S_{i-1}$, and take the maximum. This is easy to reason about and always correct, but it does $O(n^2)$ block checks. -#### A clean one-pass greedy scan +**How it works** Walk left to right once and carry two simple numbers. @@ -434,13 +471,18 @@ When all numbers are negative, the best block is the **least negative single ele Empty-block conventions matter. If you define the answer to be strictly nonempty, initialize $\text{best}$ with $x_1$ and $E=x_1$ in the incremental form; if you allow empty blocks with sum $0$, initialize $\text{best}=0$ and $M=0$. Either way, the one-pass logic doesn’t change. 
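
For reference, the same one-pass scan in Python (nonempty-block convention), with variable names mirroring the prose: `S` is the running prefix sum, `M` the smallest prefix seen so far, and `best` the answer. For the example $x = [2,-3,4,-1,2,-5,3]$ it returns sum $5$ with the 1-based range $(3,5)$, matching the walkthrough above.

```python
def max_contiguous_sum(x):
    """One pass: best block ending at j equals S_j minus the smallest earlier prefix."""
    best = float("-inf")        # best nonempty block sum so far
    best_range = None           # 1-based (start, end) of that block
    S = 0                       # running prefix sum S_j
    M = 0                       # minimum prefix sum seen so far (S_0 = 0)
    t = 0                       # index where M was achieved; best block starts at t + 1
    for j, value in enumerate(x, start=1):
        S += value
        if S - M > best:
            best = S - M
            best_range = (t + 1, j)
        if S < M:
            M = S
            t = j
    return best, best_range

print(max_contiguous_sum([2, -3, 4, -1, 2, -5, 3]))  # (5, (3, 5))
```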
+Summary + +* Time: $O(n)$ +* Space: $O(1)$ + ### Scheduling themes Two everyday scheduling goals keep popping up. One tries to pack as many non-overlapping intervals as possible, like booking the most meetings in a single room. The other tries to keep lateness under control when jobs have deadlines, like finishing homework so the worst overrun is as small as possible. Both have crisp greedy rules, and both are easy to run by hand once you see them. Imagine you have time intervals on a single line, and you can keep an interval only if it doesn’t overlap anything you already kept. The aim is to keep as many as possible. -#### Example input and the desired output +**Example inputs and outputs** Intervals (start, finish): @@ -448,11 +490,11 @@ Intervals (start, finish): A best answer keeps four intervals, for instance $(1,3),(4,7),(8,10),(10,11)$. I wrote $(10,11)$ for clarity even though the original end was $11$; think half-open $[s,e)$ if you want “touching” to be allowed. -#### A slow baseline +Baseline (slow) Try all subsets and keep the largest that has no overlaps. That’s conceptually simple and always correct, but it’s exponential in the number of intervals, which is a non-starter for anything but tiny inputs. -#### The greedy rule +**How it works** Sort by finishing time, then walk once from earliest finisher to latest. Keep an interval if its start is at least the end time of the last one you kept. Ending earlier leaves more room for the future, and that is the whole intuition. @@ -487,6 +529,11 @@ ending earlier leaves more open space to the right Why this works in one sentence: at the first place an optimal schedule would choose a later-finishing interval, swapping in the earlier finisher cannot reduce what still fits afterward, so you can push the optimal schedule to match greedy without losing size. +Complexity + +* Time: $O(n \log n)$ to sort by finishing time; $O(n)$ scan. +* Space: $O(1)$ (beyond input storage). + ### Minimize the maximum lateness Now think of $n$ jobs, all taking the same amount of time (say one unit). Each job $i$ has a deadline $d_i$. When you run them in some order, the completion time of the $k$-th job is $C_k=k$ (since each takes one unit), and its lateness is @@ -497,7 +544,7 @@ $$ Negative values mean you finished early; the quantity to control is the worst lateness $L_{\max}=\max_i L_i$. The goal is to order the jobs so $L_{\max}$ is as small as possible. -#### Example input and the desired output +**Example inputs and outputs** Jobs and deadlines: @@ -508,11 +555,11 @@ Jobs and deadlines: An optimal schedule is $J_2,J_4, J_1, J_3$. The maximum lateness there is $0$. -#### A slow baseline +Baseline (slow) Try all $n!$ orders, compute every job’s completion time and lateness, and take the order with the smallest $L_{\max}$. This explodes even for modest $n$. -#### The greedy rule +**How it works** Order jobs by nondecreasing deadlines (earliest due date first, often called EDD). Fixing any “inversion” where a later deadline comes before an earlier one can only help the maximum lateness, so sorting by deadlines is safe. @@ -552,6 +599,11 @@ late? 0 0 0 0 → max lateness 0 Why this works in one sentence: if two adjacent jobs are out of deadline order, swapping them never increases any completion time relative to its own deadline, and strictly improves at least one, so repeatedly fixing these inversions leads to the sorted-by-deadline order with no worse maximum lateness. +Complexity + +* Time: $O(n \log n)$ to sort by deadlines; $O(n)$ evaluation. 
+* Space: $O(1)$. + ### Huffman coding You have symbols that occur with known frequencies $f_i>0$ and $\sum_i f_i=1$. The goal is to assign each symbol a binary codeword so that no codeword is a prefix of another (a prefix code), and the average length @@ -562,7 +614,7 @@ $$ is as small as possible. Prefix codes exactly correspond to full binary trees whose leaves are the symbols and whose leaf depths are the codeword lengths $L_i$. The Kraft inequality $\sum_i 2^{-L_i}\le 1$ is the feasibility condition; equality holds for full trees. -#### Example input and the target output +**Example inputs and outputs** Frequencies: @@ -572,11 +624,11 @@ $$ A valid optimal answer will be a prefix code with expected length as small as possible. We will compute the exact minimum and one optimal set of lengths $L_A,\dots,L_E$, plus a concrete codebook. -### A naive way to think about it +Baseline (slow) One conceptual baseline is to enumerate all full binary trees with five labeled leaves and pick the one minimizing $\sum f_i\,L_i$. That is correct but explodes combinatorially as the number of symbols grows. A simpler but usually suboptimal baseline is to give every symbol the same length $\lceil \log_2 5\rceil=3$. That fixed-length code has $\mathbb{E}[L]=3$. -#### The greedy method that is actually optimal +**How it works** Huffman’s rule repeats one tiny step: always merge the two least frequent items. When you merge two “symbols” with weights $p$ and $q$, you create a parent of weight $p+q$. The act of merging adds exactly $p+q$ to the objective $\mathbb{E}[L]$ because every leaf inside those two subtrees becomes one level deeper. Summing over all merges yields the final cost: @@ -641,6 +693,11 @@ One concrete codebook arises by reading left edges as 0 and right edges as 1: You can verify the prefix property immediately and recompute $\mathbb{E}[L]$ from these lengths to get $2.20$ again. +Complexity + +* Time: $O(k \log k)$ using a min-heap over $k$ symbol frequencies. +* Space: $O(k)$ for the heap and $O(k)$ for the resulting tree. + ### When greedy fails (and how to quantify “not too bad”) The $0\text{–}1$ knapsack with arbitrary weights defeats the obvious density-based rule. A small, dense item can block space needed for a medium-density item that pairs perfectly with a third, leading to a globally superior pack. Weighted interval scheduling similarly breaks the “earliest finish” rule; taking a long, heavy meeting can beat two short light ones that finish earlier. diff --git a/notes/matrices.md b/notes/matrices.md index 72167a3..4aefb08 100644 --- a/notes/matrices.md +++ b/notes/matrices.md @@ -1,4 +1,303 @@ -rotations +## Matrices and 2D Grids +Matrices represent images, game boards, and maps. Many classic problems reduce to transforming matrices, traversing them, or treating grids as graphs for search. This note mirrors the structure used in the Searching notes: each topic includes Example inputs and outputs, How it works, and a compact summary with $O(\cdot)$. + +### Conventions + +* Rows indexed 0..R−1, columns 0..C−1; cell (r, c). +* Neighborhoods: 4-dir Δ = {(-1,0),(1,0),(0,-1),(0,1)}, or 8-dir adds diagonals. + +--- + +### Basic Operations (Building Blocks) + +#### Transpose + +Swap across the main diagonal: $A[r][c] \leftrightarrow A[c][r]$ (square). For non-square, result shape is $C\times R$. 
+ +**Example inputs and outputs** + +*Example 1 (square)* + +$$ + ext{Input: } A = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +$$ + +$$ + ext{Output: } A^{T} = \begin{bmatrix}1 & 4 & 7\\2 & 5 & 8\\3 & 6 & 9\end{bmatrix} +$$ + +*Example 2 (rectangular)* + +$$ + ext{Input: } A = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix}\ (2\times3) +$$ + +$$ + ext{Output: } A^{T} = \begin{bmatrix}1 & 4\\2 & 5\\3 & 6\end{bmatrix}\ (3\times2) +$$ + +**How it works** + +Iterate pairs once and swap. For square matrices, can be in-place by visiting only $c>r$. + +* Time: $O(R\cdot C)$; Space: $O(1)$ in-place (square), else $O(R\cdot C)$ to allocate. + +#### Reverse Rows (Horizontal Flip) + +Reverse each row left↔right. + +**Example inputs and outputs** + +*Example* + +$$ + ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix} +\quad\Rightarrow\quad + ext{Output: } \begin{bmatrix}3 & 2 & 1\\6 & 5 & 4\end{bmatrix} +$$ + +* Time: $O(R\cdot C)$; Space: $O(1)$. + +#### Reverse Columns (Vertical Flip) + +Reverse each column top↔bottom. + +**Example inputs and outputs** + +*Example* + +$$ + ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +\quad\Rightarrow\quad + ext{Output: } \begin{bmatrix}7 & 8 & 9\\4 & 5 & 6\\1 & 2 & 3\end{bmatrix} +$$ + +* Time: $O(R\cdot C)$; Space: $O(1)$. + +--- + +### Rotations (Composed from Basics) + +Use transpose + reversals for square in-place rotations; rectangular rotations produce new shape $(R\times C)\to(C\times R)$. + +#### 90° Clockwise (CW) + +Transpose, then reverse each row. + +**Example inputs and outputs** + +*Example 1 (3×3)* + +$$ + ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +\quad\Rightarrow\quad + ext{Output: } \begin{bmatrix}7 & 4 & 1\\8 & 5 & 2\\9 & 6 & 3\end{bmatrix} +$$ + +*Example 2 (2×3 → 3×2)* + +$$ + ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix} +\quad\Rightarrow\quad + ext{Output: } \begin{bmatrix}4 & 1\\5 & 2\\6 & 3\end{bmatrix} +$$ + +**How it works** + +Transpose swaps axes; reversing each row aligns columns to rows of the rotated image. + +* Time: $O(R\cdot C)$; Space: $O(1)$ in-place for square, else $O(R\cdot C)$ new. + +#### 90° Counterclockwise (CCW) + +Transpose, then reverse each column (or reverse rows, then transpose). + +**Example inputs and outputs** + +*Example* + +$$ + ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +\quad\Rightarrow\quad + ext{Output: } \begin{bmatrix}3 & 6 & 9\\2 & 5 & 8\\1 & 4 & 7\end{bmatrix} +$$ + +**How it works** + +Transpose, then flip vertically to complete the counterclockwise rotation. + +* Time: $O(R\cdot C)$; Space: $O(1)$ (square) or $O(R\cdot C)$. + +#### 180° Rotation + +Equivalent to reversing rows, then reversing columns (or two 90° rotations). + +**Example inputs and outputs** + +*Example* + +$$ + ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +\quad\Rightarrow\quad + ext{Output: } \begin{bmatrix}9 & 8 & 7\\6 & 5 & 4\\3 & 2 & 1\end{bmatrix} +$$ + +**How it works** + +Horizontal + vertical flips relocate each element to $(R-1-r,\ C-1-c)$. + +* Time: $O(R\cdot C)$; Space: $O(1)$ (square) or $O(R\cdot C)$. + +#### 270° Rotation + +270° CW = 90° CCW; 270° CCW = 90° CW. Reuse the 90° procedures. + +#### Layer-by-Layer (Square) 90° CW + +Rotate each ring by cycling 4 positions. 
+ +**How it works** + +For layer $\ell$ with bounds $[\ell..n-1-\ell]$, for each offset move: + +``` +top ← left, left ← bottom, bottom ← right, right ← top +``` + +* Time: $O(n^{2})$; Space: $O(1)$. + +--- + +### Traversal Patterns + +#### Spiral Order + +Read outer layer, then shrink bounds. + +**Example inputs and outputs** + +*Example* + +$$ + ext{Input: } \begin{bmatrix}1 & 2 & 3 & 4\\5 & 6 & 7 & 8\\9 & 10 & 11 & 12\end{bmatrix} +$$ + +$$ + ext{Output sequence: } 1,2,3,4,8,12,11,10,9,5,6,7 +$$ + +**How it works** + +Maintain top, bottom, left, right. Walk edges in order; after each edge, move the corresponding bound inward. + +* Time: $O(R\cdot C)$; Space: $O(1)$ beyond output. + +#### Diagonal Order (r+c layers) + +Visit cells grouped by $s=r+c$; alternate direction per diagonal to keep locality if desired. + +**Example inputs and outputs** + +*Example* + +$$ + ext{Input: } \begin{bmatrix}a & b & c\\d & e & f\end{bmatrix} +\quad\Rightarrow\quad + ext{One order: } a, b,d, e,c, f +$$ + +* Time: $O(R\cdot C)$; Space: $O(1)$. + +--- + +### Grids as Graphs + +Each cell is a node; edges connect neighboring walkable cells. + +#### BFS Shortest Path (Unweighted) + +Find the minimum steps from S to T. + +**Example inputs and outputs** + +*Example* + +$$ + ext{Grid (0=open, 1=wall), S=(0,0), T=(2,3)}\\ +\begin{bmatrix}S & 0 & 1 & 0\\0 & 0 & 0 & 0\\1 & 1 & 0 & T\end{bmatrix} +\quad\Rightarrow\quad + ext{Output: distance } = 5 +$$ + +**How it works** + +Push S to a queue, expand in 4-dir layers, track distance/visited; stop when T is dequeued. + +* Time: $O(R\cdot C)$; Space: $O(R\cdot C)$. + +#### Connected Components (Islands) + +Count regions of ‘1’s via DFS/BFS. + +**Example inputs and outputs** + +*Example* + +$$ + ext{Input: } \begin{bmatrix}1 & 1 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix} +\quad\Rightarrow\quad + ext{Output: } 2\ \text{islands} +$$ + +**How it works** + +Scan cells; when an unvisited ‘1’ is found, flood it (DFS/BFS) to mark the whole island. + +* Time: $O(R\cdot C)$; Space: $O(R\cdot C)$ worst-case. + +--- + +### Backtracking on Grids + +#### Word Search (Single Word) + +Find a word by moving to adjacent cells (4-dir), using each cell once per path. + +**Example inputs and outputs** + +*Example* + +$$ + ext{Board: } \begin{bmatrix}A & B & C & E\\S & F & C & S\\A & D & E & E\end{bmatrix},\ \text{Word: } "ABCCED" +\quad\Rightarrow\quad + ext{Output: } \text{true} +$$ + +**How it works** + +From each starting match, DFS to next char; mark visited (temporarily), backtrack on failure. + +* Time: up to $O(R\cdot C\cdot b^{L})$ (branching $b\in[3,4]$, word length $L$); Space: $O(L)$. + +Pruning: early letter mismatch; frequency precheck; prefix trie when searching many words. + +#### Crossword-style Fill (Multiple Words) + +Place words to slots with crossings; verify consistency at intersections. + +**How it works** + +Backtrack over slot assignments; use a trie for prefix feasibility; order by most constrained slot first. + +* Time: exponential in slots; strong pruning and good heuristics are crucial. + +--- + +### Summary of Complexities + +* Full traversal: $O(R\cdot C)$ time; $O(1)$ space (no visited) or $O(R\cdot C)$ with visited. +* Rotations/transpose: $O(R\cdot C)$ time; $O(1)$ in-place (square) or $O(R\cdot C)$ extra. +* BFS/DFS on grids: $O(R\cdot C)$ time; $O(R\cdot C)$ space. +* Word search backtracking: up to $O(R\cdot C\cdot b^{L})$ time; $O(L)$ space. 
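
A minimal Python sketch of two building blocks from this note, under the stated conventions (0-indexed cells, 4-direction moves, 0 = open and 1 = wall): 90° clockwise rotation as transpose plus row reverse, and BFS distance on a grid. Helper names are illustrative; the inputs reuse the examples above.

```python
from collections import deque

def rotate90_cw(a):
    """Transpose, then reverse each row; also works for rectangular R x C input."""
    return [list(reversed(col)) for col in zip(*a)]

def bfs_distance(grid, start, target):
    """Minimum number of 4-direction steps from start to target; -1 if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == target:
            return dist[(r, c)]
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return -1

print(rotate90_cw([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]

grid = [[0, 0, 1, 0],
        [0, 0, 0, 0],
        [1, 1, 0, 0]]                                  # S = (0, 0), T = (2, 3)
print(bfs_distance(grid, (0, 0), (2, 3)))              # 5
```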
-backtracking dfs search diff --git a/notes/searching.md b/notes/searching.md index 5d4c6d1..523307f 100644 --- a/notes/searching.md +++ b/notes/searching.md @@ -104,8 +104,8 @@ Checks: * Works on any list; no sorting or structure required. * Returns the first index containing the target; if absent, reports “not found.” -* Time: O(n) comparisons on average and in the worst case; best case O(1) if the first element matches. -* Space: O(1) extra memory. +* Time: $O(n)$ comparisons on average and in the worst case; best case $O(1)$ if the first element matches. +* Space: $O(1)$ extra memory. * Naturally finds the earliest occurrence when duplicates exist. * Simple and dependable for short or unsorted data. * Assumes 0-based indexing in these notes. @@ -192,7 +192,7 @@ Scan: ``` * Removes the per-iteration “have we reached the end?” check; the sentinel guarantees termination. -* Same O(n) time in big-O terms, but slightly fewer comparisons in tight loops. +* Same $O(n)$ time in big-O terms, but slightly fewer comparisons in tight loops. * Space: needs one extra slot; if you cannot append, you can temporarily overwrite the last element (store it, write the target, then restore it). * After scanning, decide by index: if the first match index < original length, it’s a real match; otherwise, it’s only the sentinel. * Use when micro-optimizing linear scans over arrays where bounds checks are costly. @@ -293,7 +293,7 @@ Active range: indices 4..4 FOUND at index 4 * Requires a sorted array (assume ascending here). -* Time: O(log n); Space: O(1) iterative. +* Time: $O(log n)$; Space: $O(1)$ iterative. * Returns any one matching index by default; “first/last occurrence” is a small, common refinement. * Robust, cache-friendly, and a building block for many higher-level searches. * Beware of off-by-one errors when shrinking bounds. @@ -359,7 +359,7 @@ FOUND at index 3 * Also assumes a sorted array. * For discrete sorted arrays, it does **not** beat binary search asymptotically; it performs more comparisons per step. * Most valuable for searching the extremum of a **unimodal function** on a continuous domain; for arrays, prefer binary search. -* Complexity: O(log n) steps but with larger constant factors than binary search. +* Complexity: $O(log n)$ steps but with larger constant factors than binary search. #### Jump Search On a sorted array, jump ahead in fixed block sizes to find the block that may contain the target, then do a linear scan inside that block. @@ -426,7 +426,7 @@ Block [16 ][25 ] The element $25$ is found at **index 4**. * Works on sorted arrays; pick jump ≈ √n for good balance. -* Time: O(√n) comparisons on average; Space: O(1). +* Time: $O(√n)$ comparisons on average; Space: $O(1)$. * Useful when random access is cheap but full binary search isn’t desirable (e.g., limited CPU branch prediction, or when scanning in blocks is cache-friendly). * Degrades gracefully to “scan block then stop.” @@ -539,8 +539,8 @@ i = 0 1 2 3 4 5 6 7 8 Found at **index 7**. * Great when the target is likely to be near the beginning or when the array is **unbounded**/**stream-like** but sorted (you can probe indices safely). -* Time: O(log p) to find the range where p is the final bound, plus O(log p) for binary search → overall O(log p). -* Space: O(1). +* Time: $O(log p)$ to find the range where p is the final bound, plus $O(log p)$ for binary search → overall $O(log p)$. +* Space: $O(1)$. * Often paired with data sources where you can test “is index i valid?” while doubling i. 
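
A small Python sketch of this doubling-then-binary-search pattern, assuming an ascending array; `bisect_left` plays the role of the final binary search over the bracketed range, and the sample array is illustrative.

```python
from bisect import bisect_left

def exponential_search(a, target):
    """Grow the probe index 1, 2, 4, ... until it passes target, then binary-search that range."""
    n = len(a)
    if n == 0:
        return -1
    if a[0] == target:
        return 0
    hi = 1
    while hi < n and a[hi] < target:
        hi *= 2
    lo = hi // 2                                    # target, if present, lies within a[lo..hi]
    i = bisect_left(a, target, lo, min(hi + 1, n))  # binary search inside the bracket
    return i if i < n and a[i] == target else -1

a = [2, 3, 5, 8, 13, 21, 34, 55, 89]   # illustrative sorted input
print(exponential_search(a, 55))       # 7
print(exponential_search(a, 4))        # -1 (not present)
```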
#### Interpolation Search @@ -627,21 +627,21 @@ i = 0 1 2 3 4 5 6 At this point, an **early-stop check** already tells us `target (45) < A[low] (50)` → cannot exist in `A[4..6]` → **not found**. -* Best on **uniformly distributed** sorted data; expected time O(log log n). -* Worst case can degrade to O(n), especially on skewed or clustered values. -* Space: O(1). +* Best on **uniformly distributed** sorted data; expected time $O(log log n)$. +* Worst case can degrade to $O(n)$, especially on skewed or clustered values. +* Space: $O(1)$. * Very fast when value-to-index mapping is close to linear (e.g., near-uniform numeric keys). * Requires careful handling when A\[high] = A\[low] (avoid division by zero); also sensitive to integer rounding in discrete arrays. ### Hash-based Search -* **Separate chaining:** Easiest deletions, steady O(1) with α≈1; good when memory fragmentation isn’t a concern. + * **Separate chaining:** Easiest deletions, steady $O(1)$ with α≈1; good when memory fragmentation isn’t a concern. * **Open addressing (double hashing):** Best probe quality among OA variants; great cache locality; keep α < 0.8. * **Open addressing (linear/quadratic):** Simple and fast at low α; watch clustering and tombstones. * **Cuckoo hashing:** Tiny and predictable lookup cost; inserts costlier and may rehash; great for read-heavy workloads. * In all cases: pick strong hash functions and resize early to keep α healthy. #### Hash Table Search -Map a key to an array index with a hash function; look at that bucket to find the key, giving expected O(1) lookups under a good hash and healthy load factor. +Map a key to an array index with a hash function; look at that bucket to find the key, giving expected $O(1)$ lookups under a good hash and healthy load factor. **Example inputs and outputs** @@ -744,10 +744,10 @@ Idx: 0 1 2 3 4 5 6 (If not found, continue probing until an empty slot or wrap limit.) ``` -* Quality hash + low load factor (α = n/m) ⇒ expected O(1) search/insert/delete. +* Quality hash + low load factor (α = n/m) ⇒ expected $O(1)$ search/insert/delete. * Collisions are inevitable; the collision strategy (open addressing vs. chaining vs. cuckoo) dictates actual steps. * Rehashing (growing and re-inserting) is used to keep α under control. -* Uniform hashing assumption underpins the O(1) expectation; adversarial keys or poor hashes can degrade performance. +* Uniform hashing assumption underpins the $O(1)$ expectation; adversarial keys or poor hashes can degrade performance. #### Open Addressing — Linear Probing @@ -832,7 +832,7 @@ Path followed: * Simple and cache-friendly; clusters form (“primary clustering”) which can slow probes. * Deletion uses **tombstones** to keep probe chains intact. * Performance depends sharply on load factor; keep α well below 1 (e.g., α ≤ 0.7). -* Expected search \~ O(1) at low α; degrades as clusters grow. +* Expected search \~ $O(1)$ at low α; degrades as clusters grow. #### Open Addressing — Quadratic Probing @@ -933,7 +933,7 @@ Path: * Reduces primary clustering but can exhibit **secondary clustering** (keys with same h(k) follow same probe squares). * Table size choice matters (often prime); ensure the probe sequence can reach many slots. * Keep α modest; deletion still needs tombstones. -* Expected O(1) at healthy α; simpler than double hashing. +* Expected $O(1)$ at healthy α; simpler than double hashing. #### Open Addressing — Double Hashing @@ -1162,7 +1162,7 @@ Search(9): ``` * Simple deletes (remove from a bucket) and no tombstones. 
-* Expected O(1 + α) time; with good hashing and α kept near/below 1, bucket lengths stay tiny. +* Expected $O(1 + α)$ time; with good hashing and α kept near/below 1, bucket lengths stay tiny. * Memory overhead for bucket nodes; cache locality worse than open addressing. * Buckets can use **ordered lists** or **small vectors** to accelerate scans. * Rehashing still needed as n grows; α = n/m controls performance. @@ -1416,7 +1416,7 @@ All ones → **MAYBE PRESENT** (could be a **false positive**) * Answers: **maybe present** / **definitely not present**; never false negatives (without deletions). * False-positive rate is tunable via bit-array size **m**, number of hashes **k**, and items **n**; more space & good **k** → lower FPR. -* Time: O(k) per insert/lookup; Space: \~m bits. +* Time: $O(k)$ per insert/lookup; Space: \~m bits. * No deletions in the basic form; duplicates are harmless (idempotent sets). * Union = bitwise OR; intersection = bitwise AND (for same m,k,hashes). * Choose independent, well-mixed hash functions to avoid correlated bits. @@ -1697,7 +1697,7 @@ Result: DEFINITELY NOT PRESENT ### String Search Algorithms -* **KMP:** Best all-rounder for guaranteed **O(n + m)** and tiny memory. +* **KMP:** Best all-rounder for guaranteed $O(n + m)$ and tiny memory. * **Boyer–Moore:** Fastest in practice on long patterns / large alphabets due to big skips. * **Rabin–Karp:** Great for **many patterns** or streaming; hashing enables batched checks. * **Naive:** Fine for tiny inputs or as a baseline; simplest to reason about. @@ -1818,8 +1818,8 @@ Pattern: a b r a ✅ All match → **REPORT at index 7** * Works anywhere; no preprocessing. -* Time: worst/average **O(n·m)** (text length n, pattern length m). -* Space: **O(1)**. +* Time: worst/average $O(n·m)$ (text length n, pattern length m). +* Space: $O(1)$. * Good for very short patterns or tiny inputs; otherwise use KMP/BM/RK. #### Knuth–Morris–Pratt (KMP) @@ -1924,8 +1924,8 @@ Start index = i - m + 1 = 14 - 5 + 1 = 10 ✅ Pattern `"ababd"` occurs in the text starting at **index 10**. -* Time: **O(n + m)** (preprocessing + scan). -* Space: **O(m)** for LPS table. +* Time: $O(n + m)$ (preprocessing + scan). +* Space: $O(m)$ for LPS table. * Never moves i backward; avoids redundant comparisons. * Ideal for repeated searches with the same pattern. * LPS is also called prefix-function / failure-function. @@ -2010,8 +2010,8 @@ E=E, L=L, P=P, M=M, A=A, X=X, E=E ✅ **Full match** found at **index 17**. * Average case sublinear (often skips large chunks of text). -* Worst case can be **O(n·m)**; with both rules + Galil’s optimization, comparisons can be bounded **O(n + m)**. -* Space: **O(σ + m)** for tables (σ = alphabet size). +* Worst case can be $O(n·m)$; with both rules + Galil’s optimization, comparisons can be bounded $O(n + m)$. +* Space: $O(σ + m)$ for tables (σ = alphabet size). * Shines on long patterns over large alphabets (e.g., English text, logs). * Careful table prep (bad-character & good-suffix) is crucial. @@ -2121,8 +2121,8 @@ s window hash =HP? Matches at indices: **0, 4, 7**. -* Expected time **O(n + m)** with a good modulus and low collision rate; worst case **O(n·m)** if many collisions. -* Space: **O(1)** beyond the text/pattern and precomputed powers. +* Expected time $O(n + m)$ with a good modulus and low collision rate; worst case $O(n·m)$ if many collisions. +* Space: $O(1)$ beyond the text/pattern and precomputed powers. * Excellent for multi-pattern search (compute many pattern hashes, reuse rolling windows). 
* Choose modulus to reduce collisions; verify on hash hits to ensure correctness. * Works naturally on streams/very large texts since it needs only the current window. From d5acaa8e13b7ef981f618c7a594ea30d96f3e586 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 00:15:03 +0200 Subject: [PATCH 23/48] Update greedy_algorithms.md --- notes/greedy_algorithms.md | 197 ++++++++++++++++++------------------- 1 file changed, 95 insertions(+), 102 deletions(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index 0db6be9..5567dee 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -1,157 +1,150 @@ -## Greedy Algorithms +## Greedy algorithms -Greedy methods construct a solution piece by piece, always choosing the currently best-looking option according to a simple rule. The subtlety is not the rule itself but the proof that local optimality extends to global optimality. Two proof tools do most of the work: exchange arguments (you can swap an optimal solution’s first “deviation” back to the greedy choice without harm) and loop invariants (you maintain a statement that pins down exactly what your partial solution guarantees at each step). - -Formally, consider a finite ground set $E$, a family of feasible subsets $\mathcal{F}\subseteq 2^E$, and a weight function $w:E\to \mathbb{R}$. A generic greedy scheme orders elements of $E$ by a key $\kappa(e)$ and scans them, adding $e$ to the building solution $S$ if $S\cup\{e\}\in\mathcal{F}$. Correctness means - -$$ -\text{Greedy}(E,\mathcal{F},w,\kappa)\in\arg\max\{\,w(S):S\in\mathcal{F}\,\}. -$$ - -The nice, crisp setting where this always works is the theory of matroids. Outside that, correctness must be argued problem-by-problem. +Greedy algorithms build a solution one step at a time. At each step, grab the option that looks best *right now* by some simple rule (highest value, earliest finish, shortest length, etc.). Keep it if it doesn’t break the rules of the problem. ``` -scan order: e1 e2 e3 e4 e5 ... -feasible? Y N Y Y N -solution S: {e1, e3, e4} +1) Sort by your rule (the “key”). +2) Scan items in that order. +3) If adding this item keeps the partial answer valid, keep it. +4) Otherwise skip it. ``` -### The greedy-choice principle and exchange arguments - -Greedy methods feel simple on the surface—always take the best-looking move right now—but the proof that this is globally safe is subtle. The core idea is to show that at the first moment an optimal solution “disagrees” with your greedy choice, you can surgically swap in the greedy move without making things worse. Do that repeatedly and you literally transform some optimal solution into the greedy one. That’s the exchange argument. - -Let $E$ be a finite ground set of “atoms.” Feasible solutions are subsets $S\subseteq E$ belonging to a family $\mathcal{F}\subseteq 2^E$. The objective is additive: - -$$ -\text{maximize } w(S)=\sum_{e\in S} w(e)\quad\text{subject to } S\in\mathcal{F},\qquad w:E\to\mathbb{R}. -$$ +Picking the best “now” doesn’t obviously give the best “overall.” The real work is showing that these local choices still lead to a globally best answer. -A generic greedy algorithm fixes an order $e_1,e_2,\dots,e_m$ determined by a key $\kappa$ (for example, sort by nonincreasing $w$ or by earliest finishing time), then scans the elements and keeps $e_i$ whenever $S\cup\{e_i\}\in\mathcal{F}$. 
+**Two proof tricks you’ll see a lot:** -Two structural properties make the exchange proof go through. +* *Exchange argument.* Take any optimal solution that disagrees with greedy at the first point. Show you can “swap in” the greedy choice there without making the solution worse or breaking feasibility. Do this repeatedly and you morph some optimal solution into the greedy one—so greedy must be optimal. +* *Loop invariant.* Write down a sentence that’s true after every step of the scan (e.g., “the current set is feasible and as good as any other set built from the items we’ve seen”). Prove it stays true as you process the next item; at the end, that sentence implies optimality. -1. Feasibility exchange. Whenever $A,B\in\mathcal{F}$ with $|A|<|B|$, there exists $x\in B\setminus A$ such that $A\cup\{x\}\in\mathcal{F}$. This “augmentation flavor” is what lets you replace a non-greedy element by a greedy one while staying feasible. +*Picture it like this:* -2. Local dominance. At the first position where greedy would keep $g$ but some optimal $O$ keeps $o\neq g$, you can drop some element $x\in O\setminus A$ and insert $g$ so that +``` +position → 1 2 3 4 5 +greedy: [✓] [✗] [✓] [✓] [✗] +some optimal: + ✓ ✓ ✗ ? ? +First mismatch at 3 → swap in greedy’s pick without harm. +Repeat until both rows match → greedy is optimal. +``` -$$ -A\cup\{g\}\cup\bigl(O\setminus\{x\}\bigr)\in\mathcal{F} -\quad\text{and}\quad -w(g)\ge w(x), -$$ +**Where greedy shines automatically: matroids (nice constraint systems).** +There’s a tidy setting where greedy is *always* right (for nonnegative weights): when your “what’s allowed” rules form a **matroid**. You don’t need the symbols—just the vibe: -where $A$ is the common prefix chosen by both up to that point. The inequality ensures the objective does not decrease during the swap. +1. **You can start from empty.** +2. **Throwing things out never hurts.** If a set is allowed, any subset is allowed. +3. **Smooth growth (augmentation).** If one allowed set is smaller than another, you can always add *something* from the bigger one to the smaller and stay allowed. -When $(E,\mathcal{F})$ is a matroid, the feasibility exchange always holds; if you also order by nonincreasing $w$, local dominance holds trivially with $x$ chosen by the matroid’s augmentation. Many everyday problems satisfy these two properties even without full matroid machinery. +That third rule prevents dead ends and is exactly what exchange arguments rely on. In matroids, the simple “sort by weight and take what fits” greedy is guaranteed optimal. Outside matroids, greedy can still work—but you must justify it for the specific problem using exchange/invariants. -Write the greedy picks as a sequence $G=(g_1,g_2,\dots,g_k)$, in the order chosen. The following lemma is the workhorse. -**Lemma (first-difference exchange).** Suppose there exists an optimal solution $O$ whose first $t-1$ elements agree with greedy, meaning $g_1,\dots,g_{t-1}\in O$. If $g_t\in O$ as well, continue. Otherwise there exists $x\in O\setminus\{g_1,\dots,g_{t-1}\}$ such that +### Reachability on a line -$$ -O' \;=\;\bigl(O\setminus\{x\}\bigr)\cup\{g_t\}\in\mathcal{F} -\quad\text{and}\quad -w(O')\ge w(O). -$$ +- You stand at square \$0\$ on squares \$0,1,\dots,n-1\$. +- Each square \$i\$ has a jump power \$a\[i]\$. From \$i\$ you may land on any of \$i+1, i+2, \dots, i+a\[i]\$. +- Goal: decide if you can reach \$n-1\$; if not, report the furthest reachable square. 
-Hence there is an optimal solution that agrees with greedy on the first $t$ positions. +Example -*Proof sketch.* Let $A_{t-1}=\{g_1,\dots,g_{t-1}\}$. Because greedy considered $g_t$ before any element in $O\setminus A_{t-1}$ that it skipped, local dominance says some $x\in O\setminus A_{t-1}$ can be traded for $g_t$ without breaking feasibility and without decreasing weight. This creates $O'$ optimal and consistent with greedy for one more step. Apply the same reasoning inductively. +* Input: \$a=\[3,1,0,0,4,1]\$, so \$n=6\$ (squares \$0..5\$). -Induction on $t$ yields the main theorem: there exists an optimal solution that agrees with greedy everywhere, hence greedy is optimal. +``` +indices: 0 1 2 3 4 5 +a[i] : 3 1 0 0 4 1 +reach : ^ start at 0 +``` -It helps to picture the two solutions aligned in the greedy order. The top row is the greedy decision at each position; the bottom row is some optimal solution, possibly disagreeing. At the first disagreement, one swap pushes the optimal line upward to match greedy, and the objective value does not drop. +From any \$i\$, the allowed landings are a range: ``` -positions → 1 2 3 4 5 6 7 -greedy G: [g1] [g2] [g3] [g4] [g5] [g6] [g7] -optimal O: [g1] [g2] [ o ] [ ? ] [ ? ] [ ? ] [ ? ] - -exchange at position 3: -drop some x from O beyond position 2 and insert g3 - -after swap: -optimal O': [g1] [g2] [g3] [ ? ] [ ? ] [ ? ] [ ? ] +i=0 (a[0]=3): 1..3 +i=1 (a[1]=1): 2 +i=2 (a[2]=0): — +i=3 (a[3]=0): — +i=4 (a[4]=4): 5..8 (board ends at 5) ``` -The key is not the letter symbols but the invariants. Up to position $t-1$, both solutions coincide. The swap keeps feasibility and weight, so you have a new optimal that also matches at position $t$. Repeat, and the bottom row becomes the top row. +--- -### Matroids +Baseline idea (waves) -Greedy methods don’t usually get ironclad guarantees, but there is a beautiful class of feasibility systems where they do. That class is the matroids. Once your constraints form a matroid, the simplest weight-ordered greedy scan is not a heuristic anymore; it is provably optimal for every nonnegative weight assignment. +“Paint everything reachable, one wave at a time.” -A matroid is a pair $(E,\mathcal{I})$ with $E$ a finite ground set and $\mathcal{I}\subseteq 2^E$ the “independent” subsets. Three axioms hold. +1. Start with \${0}\$ reachable. +2. For each already-reachable \$i\$, add all \$i+1..i+a\[i]\$. +3. Stop when nothing new appears. -* Non-emptiness says $\varnothing\in\mathcal{I}$. -* Heredity says independence is downward-closed: if $A\in\mathcal{I}$ and $B\subseteq A$, then $B\in\mathcal{I}$. -* Augmentation says independence grows smoothly: if $A,B\in\mathcal{I}$ with $|A|<|B|$, then some $x\in B\setminus A$ exists with $A\cup\{x\}\in\mathcal{I}$. +Walk on the example: -The last axiom is the heart. It forbids “dead ends” where a smaller feasible set cannot absorb a single element from any larger feasible set. That smoothness is exactly what greedy needs to keep repairing early choices. +``` +start: reachable = {0} +from 0: add {1,2,3} → reachable = {0,1,2,3} +from 1: add {2} → no change +from 2: add {} → a[2]=0 +from 3: add {} → a[3]=0 +stop: no new squares → furthest = 3; last (5) unreachable +``` -### Reachability on a line +Correct, but can reprocess many squares. -You’re standing on square 0 of a line of squares $0,1,\dots,n-1$. -Each square $i$ tells you how far you’re allowed to jump forward from there: a number $a[i]$. From $i$, you can jump to any square $i+1, i+2, \dots, i+a[i]$. 
The goal is to decide whether you can ever reach the last square, and, if not, what the furthest square is that you can reach. +--- -**Example inputs and outputs** +One-pass trick (frontier) -Input array: `a = [3, 1, 0, 0, 4, 1]` -There are 6 squares (0 through 5). -Correct output: you cannot reach the last square; the furthest you can get is square `3`. +Carry one number while scanning left→right: the furthest frontier \$F\$ seen so far. -Baseline (slow) +Rules: -Think “paint everything I can reach, one wave at a time.” +* If you are at \$i\$ with \$i>F\$, you hit a gap → stuck forever. +* Otherwise, extend \$F \leftarrow \max(F,\ i+a\[i])\$ and continue. -1. Start with square 0 marked “reachable.” -2. For every square already marked, paint all squares it can jump to. -3. Keep doing this until no new squares get painted. +At the end: -This is correct because you literally try every allowed jump from every spot you know is reachable. It can be wasteful, though, because the same squares get reconsidered over and over in dense cases. +* Can reach last iff \$F \ge n-1\$. +* Furthest reachable square is \$F\$ (capped by \$n-1\$). -Walking the example: +Pseudocode ``` -start: reachable = {0} -from 0: can reach {1,2,3} → reachable = {0,1,2,3} -from 1: can reach {2} → no change -from 2: can reach {} → no change (a[2]=0) -from 3: can reach {} → no change (a[3]=0) -done: no new squares → furthest is 3, last is unreachable +F = 0 +for i in 0..n-1: + if i > F: break + F = max(F, i + a[i]) + +can_reach_last = (F >= n-1) +furthest = min(F, n-1) ``` -**How it works** +Why this is safe (one line): \$F\$ always equals “best jump end discovered from any truly-reachable square \$\le i\$,” and never decreases; if \$i>F\$, no earlier jump can help because its effect was already folded into \$F\$. -Carry one number as you sweep left to right: `F`, the furthest square you can reach **so far**. -Rule of thumb: + walkthrough on the example -* If you’re looking at square `i` and `i` is beyond `F`, you’re stuck forever. -* Otherwise, extend the frontier with `F = max(F, i + a[i])` and move on. +We draw the frontier as a bracket reaching to \$F\$. -That’s it—one pass, no backtracking. +Step \$i=0\$ (inside frontier since \$0\le F\$); update \$F=\max(0,0+3)=3\$. + +``` +indices: 0 1 2 3 4 5 + [===============F] + 0 1 2 3 +F=3 +``` -Why this is safe in a sentence: `F` always summarizes “the best jump end we have discovered from any square we truly reached,” and it never goes backward; if you hit a gap where `i > F`, then no earlier jump can help because its effect was already folded into `F`. +Step \$i=1\$: still \$i\le F\$. Update \$F=\max(3,1+1)=3\$ (no change). +Step \$i=2\$: \$F=\max(3,2+0)=3\$ (no change). +Step \$i=3\$: \$F=\max(3,3+0)=3\$ (no change). -Plugging in the same numbers +Now \$i=4\$ but \$4>F(=3)\$ → gap → stuck. ``` -a = [3, 1, 0, 0, 4, 1] -n = 6 -F = 0 # we start at square 0 (we’ll extend immediately at i=0) - -i=0: 0 ≤ F → F = max(0, 0+3) = 3 -i=1: 1 ≤ F → F = max(3, 1+1) = 3 -i=2: 2 ≤ F → F = max(3, 2+0) = 3 -i=3: 3 ≤ F → F = max(3, 3+0) = 3 -i=4: 4 > F → stuck here +indices: 0 1 2 3 4 5 + [===============F] x (i=4 is outside) +F=3 ``` -Final state: `F = 3`, which means the furthest reachable square is 3. Since `F < n-1 = 5`, the last square is not reachable. +Final: \$F=3\$. 
Since \$F\ Date: Sun, 31 Aug 2025 11:42:00 +0200 Subject: [PATCH 24/48] Update greedy_algorithms.md --- notes/greedy_algorithms.md | 393 +++++++++++++++++++++---------------- 1 file changed, 222 insertions(+), 171 deletions(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index 5567dee..5239c3e 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -2,12 +2,10 @@ Greedy algorithms build a solution one step at a time. At each step, grab the option that looks best *right now* by some simple rule (highest value, earliest finish, shortest length, etc.). Keep it if it doesn’t break the rules of the problem. -``` -1) Sort by your rule (the “key”). -2) Scan items in that order. -3) If adding this item keeps the partial answer valid, keep it. -4) Otherwise skip it. -``` +1. Sort by your rule (the “key”). +2. Scan items in that order. +3. If adding this item keeps the partial answer valid, keep it. +4. Otherwise skip it. Picking the best “now” doesn’t obviously give the best “overall.” The real work is showing that these local choices still lead to a globally best answer. @@ -36,24 +34,23 @@ There’s a tidy setting where greedy is *always* right (for nonnegative weights That third rule prevents dead ends and is exactly what exchange arguments rely on. In matroids, the simple “sort by weight and take what fits” greedy is guaranteed optimal. Outside matroids, greedy can still work—but you must justify it for the specific problem using exchange/invariants. - ### Reachability on a line -- You stand at square \$0\$ on squares \$0,1,\dots,n-1\$. -- Each square \$i\$ has a jump power \$a\[i]\$. From \$i\$ you may land on any of \$i+1, i+2, \dots, i+a\[i]\$. -- Goal: decide if you can reach \$n-1\$; if not, report the furthest reachable square. +- You stand at square $0$ on squares $0,1,dots,n-1$. +- Each square $i$ has a jump power $a\[i]$. From $i$ you may land on any of $i+1, i+2, \dots, i+a\[i]$. +- Goal: decide if you can reach $n-1$; if not, report the furthest reachable square. -Example +**Example** -* Input: \$a=\[3,1,0,0,4,1]\$, so \$n=6\$ (squares \$0..5\$). +Input: $a=\[3,1,0,0,4,1]$, so $n=6$ (squares $0..5$). ``` indices: 0 1 2 3 4 5 a[i] : 3 1 0 0 4 1 -reach : ^ start at 0 +reach : ^ start at 0 ``` -From any \$i\$, the allowed landings are a range: +From any $i$, the allowed landings are a range: ``` i=0 (a[0]=3): 1..3 @@ -63,17 +60,15 @@ i=3 (a[3]=0): — i=4 (a[4]=4): 5..8 (board ends at 5) ``` ---- - -Baseline idea (waves) +**Baseline idea** “Paint everything reachable, one wave at a time.” -1. Start with \${0}\$ reachable. -2. For each already-reachable \$i\$, add all \$i+1..i+a\[i]\$. +1. Start with ${0}$ reachable. +2. For each already-reachable $i$, add all $i+1..i+a\[i]$. 3. Stop when nothing new appears. -Walk on the example: +*Walkthrough:* ``` start: reachable = {0} @@ -86,23 +81,21 @@ stop: no new squares → furthest = 3; last (5) unreachable Correct, but can reprocess many squares. ---- - -One-pass trick (frontier) +**One-pass trick** -Carry one number while scanning left→right: the furthest frontier \$F\$ seen so far. +Carry one number while scanning left→right: the furthest frontier $F$ seen so far. Rules: -* If you are at \$i\$ with \$i>F\$, you hit a gap → stuck forever. -* Otherwise, extend \$F \leftarrow \max(F,\ i+a\[i])\$ and continue. +* If you are at $i$ with $i>F$, you hit a gap → stuck forever. +* Otherwise, extend $F \leftarrow \max(F, i+a\[i])$ and continue. At the end: -* Can reach last iff \$F \ge n-1\$. 
-* Furthest reachable square is \$F\$ (capped by \$n-1\$). +* Can reach last iff $F \ge n-1$. +* Furthest reachable square is $F$ (capped by $n-1$). -Pseudocode +*Pseudocode* ``` F = 0 @@ -114,13 +107,13 @@ can_reach_last = (F >= n-1) furthest = min(F, n-1) ``` -Why this is safe (one line): \$F\$ always equals “best jump end discovered from any truly-reachable square \$\le i\$,” and never decreases; if \$i>F\$, no earlier jump can help because its effect was already folded into \$F\$. +Why this is safe (one line): $F$ always equals “best jump end discovered from any truly-reachable square $\le i$,” and never decreases; if $i>F$, no earlier jump can help because its effect was already folded into $F$. - walkthrough on the example +*Walkthrough:* -We draw the frontier as a bracket reaching to \$F\$. +We draw the frontier as a bracket reaching to $F$. -Step \$i=0\$ (inside frontier since \$0\le F\$); update \$F=\max(0,0+3)=3\$. +Step $i=0$ (inside frontier since $0\le F$); update $F=\max(0,0+3)=3$. ``` indices: 0 1 2 3 4 5 @@ -129,11 +122,11 @@ indices: 0 1 2 3 4 5 F=3 ``` -Step \$i=1\$: still \$i\le F\$. Update \$F=\max(3,1+1)=3\$ (no change). -Step \$i=2\$: \$F=\max(3,2+0)=3\$ (no change). -Step \$i=3\$: \$F=\max(3,3+0)=3\$ (no change). +Step $i=1$: still $i\le F$. Update $F=\max(3,1+1)=3$ (no change). +Step $i=2$: $F=\max(3,2+0)=3$ (no change). +Step $i=3$: $F=\max(3,3+0)=3$ (no change). -Now \$i=4\$ but \$4>F(=3)\$ → gap → stuck. +Now $i=4$ but $4>F(=3)$ → gap → stuck. ``` indices: 0 1 2 3 4 5 @@ -141,10 +134,9 @@ indices: 0 1 2 3 4 5 F=3 ``` -Final: \$F=3\$. Since \$F\ d[u] + w: + d[v] = d[u] + w + π[v] = u + push (d[v], v) into H +``` -#### Plugging the numbers +Time $O((|V|+|E|)\log|V|)$; space $O(|V|)$. -We’ll keep a tiny table each round. “S” means settled. Ties can be broken arbitrarily. +*Walkthrough* -Start: +Legend: “S” = settled, “π\[x]” = parent of $x$. Ties break arbitrarily. -* Labels: $d(A)=0$; $d(B)=d(C)=d(D)=d(E)=\infty$ -* Settled: $\varnothing$ +Round 0 (init) -Round 1 -Pick min unsettled → $A$ (0). Settle $A$. Relax its neighbors. +``` +S = ∅ +d: A:0 B:∞ C:∞ D:∞ E:∞ +π: A:- B:- C:- D:- E:- +``` -* Update via $A$: $d(B)=2$, $d(C)=5$ -* Labels now: $A:0\text{ (S)},\ B:2,\ C:5,\ D:\infty,\ E:\infty$ +Round 1 — pick min unsettled → A(0); relax neighbors -Round 2 -Pick min unsettled → $B$ (2). Settle $B$. Relax neighbors of $B$. +``` +S = {A} +relax A-B (2): d[B]=2 π[B]=A +relax A-C (5): d[C]=5 π[C]=A +d: A:0S B:2 C:5 D:∞ E:∞ +π: A:- B:A C:A D:- E:- +``` -* $C$ via $B$: $2+1=3 < 5$ → $d(C)=3$ -* $D$ via $B$: $2+2=4$ → $d(D)=4$ -* $E$ via $B$: $2+7=9$ → $d(E)=9$ -* Labels: $A:0\text{ (S)},\ B:2\text{ (S)},\ C:3,\ D:4,\ E:9$ +Round 2 — pick B(2); relax -Round 3 -Pick min unsettled → $C$ (3). Settle $C$. Relax neighbors of $C$. +``` +S = {A,B} +B→C (1): 2+1=3 <5 → d[C]=3 π[C]=B +B→D (2): 2+2=4 → d[D]=4 π[D]=B +B→E (7): 2+7=9 → d[E]=9 π[E]=B +d: A:0S B:2S C:3 D:4 E:9 +π: A:- B:A C:B D:B E:B +``` -* $D$ via $C$: $3+3=6$ (worse than 4) → no change -* $E$ via $C$: $3+1=4 < 9$ → $d(E)=4$ -* Labels: $A:0\text{ (S)},\ B:2\text{ (S)},\ C:3\text{ (S)},\ D:4,\ E:4$ +Round 3 — pick C(3); relax -Round 4 -Pick min unsettled → tie $D$ or $E$ at 4. Take $D$. Settle $D$. Relax neighbors. 
+``` +S = {A,B,C} +C→D (3): 3+3=6 (no improv; keep 4) +C→E (1): 3+1=4 <9 → d[E]=4 π[E]=C +d: A:0S B:2S C:3S D:4 E:4 +π: A:- B:A C:B D:B E:C +``` -* $E$ via $D$: $4+2=6$ (worse than 4) → no change -* Labels: $A:0\text{ (S)},\ B:2\text{ (S)},\ C:3\text{ (S)},\ D:4\text{ (S)},\ E:4$ +Round 4 — pick D(4); relax -Round 5 -Pick $E$ (4). Settle $E$. No better updates. Done: all settled. +``` +S = {A,B,C,D} +D→E (2): 4+2=6 (no improv; keep 4) +d: A:0S B:2S C:3S D:4S E:4 +``` -Final labels: +Round 5 — pick E(4); done -* $d(A)=0$ -* $d(B)=2$ -* $d(C)=3$ -* $d(D)=4$ -* $d(E)=4$ +``` +S = {A,B,C,D,E} (all settled) +Final d: A:0 B:2 C:3 D:4 E:4 +Parents π: B←A, C←B, D←B, E←C +``` -Recovering paths by remembering “who improved whom” gives: +Reconstruct routes by following parents backward: -* $B$ from $A$ -* $C$ from $B$ -* $D$ from $B$ -* $E$ from $C$ +* $B$: $A\to B$ +* $C$: $A\to B\to C$ +* $D$: $A\to B\to D$ +* $E$: $A\to B\to C\to E$ Complexity @@ -379,17 +428,19 @@ You’re given a list of numbers laid out in a line. You may pick one **contiguo **Example inputs and outputs** -Take $x = [\,2,\,-3,\,4,\,-1,\,2,\,-5,\,3\,]$. +``` +x = [ 2, -3, 4, -1, 2, -5, 3 ] +best block = [ 4, -1, 2 ] → sum = 5 +``` -A best block is $[\,4,\,-1,\,2\,]$. Its sum is $5$. So the correct output is “maximum sum $=5$” and one optimal segment is positions $3$ through $5$ (1-based). -Baseline (slow) +*Baseline* Try every possible block and keep the best total. To sum any block $i..j$ quickly, precompute **prefix sums** $S_0=0$ and $S_j=\sum_{k=1}^j x_k$. Then $$ -\sum_{k=i}^j x_k \;=\; S_j - S_{i-1}. +\sum_{k=i}^j x_k = S_j - S_{i-1} $$ Loop over all $j$ and all $i\le j$, compute $S_j-S_{i-1}$, and take the maximum. This is easy to reason about and always correct, but it does $O(n^2)$ block checks. @@ -404,7 +455,7 @@ Walk left to right once and carry two simple numbers. At each step $j$, the best block **ending at** $j$ is “current prefix minus the smallest older prefix”: $$ -\text{best\_ending\_at\ }j \;=\; S_j - \min_{0\le t0$. Collapsing those siblings into one pseudo-symbol reduces the problem size without changing optimality, so induction finishes the proof. @@ -659,7 +710,7 @@ Now assign actual lengths. Record who merged with whom: Depths follow directly: $$ -L_A=2,\quad L_B=L_C=2,\quad L_D=L_E=3. +L_A=2,quad L_B=L_C=2,quad L_D=L_E=3. $$ Check the Kraft sum $3\cdot 2^{-2}+2\cdot 2^{-3}=3/4+1/4=1$ and the cost $0.4\cdot2+0.2\cdot2+0.2\cdot2+0.1\cdot3+0.1\cdot3=2.2$. @@ -698,7 +749,7 @@ The $0\text{–}1$ knapsack with arbitrary weights defeats the obvious density-b Approximation guarantees rescue several hard problems with principled greedy performance. For set cover on a universe $U$ with $|U|=n$, the greedy rule that repeatedly picks the set covering the largest number of uncovered elements achieves an $H_n$ approximation: $$ -\text{cost}_{\text{greedy}} \le H_n\cdot \text{OPT},\qquad H_n=\sum_{k=1}^n \frac{1}{k}\le \ln n+1. +\text{cost}_{\text{greedy}} \le H_n\cdot \text{OPT},qquad H_n=\sum_{k=1}^n \frac{1}{k}\le \ln n+1. $$ A tight charging argument proves it: each time you cover new elements, charge them equally; no element is charged more than the harmonic sum relative to the optimum’s coverage. @@ -706,7 +757,7 @@ A tight charging argument proves it: each time you cover new elements, charge th Maximizing a nondecreasing submodular set function $f:2^E\to\mathbb{R}_{\ge 0}$ under a cardinality constraint $|S|\le k$ is a crown jewel. 
Submodularity means diminishing returns: $$ -A\subseteq B,\ x\notin B \ \Rightarrow\ f(A\cup\{x\})-f(A)\ \ge\ f(B\cup\{x\})-f(B). +A\subseteq B, x\notin B \ \Rightarrow\ f(A\cup\{x\})-f(A)\ \ge\ f(B\cup\{x\})-f(B). $$ The greedy algorithm that adds the element with largest marginal gain at each step satisfies the celebrated bound From c9f1cfc81fc72bc11b76922731b1012f3c532dca Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 16:19:39 +0200 Subject: [PATCH 25/48] Update greedy_algorithms.md --- notes/greedy_algorithms.md | 212 +++++++++++++++++++++---------------- 1 file changed, 120 insertions(+), 92 deletions(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index 5239c3e..a8c9c93 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -455,7 +455,7 @@ Walk left to right once and carry two simple numbers. At each step $j$, the best block **ending at** $j$ is “current prefix minus the smallest older prefix”: $$ -\text{best\_ending\_at\ }j = S_j - \min_{0\le t best: + best = S - M + best_i = t + 1 + best_j = j + if S < M: + M = S + t = j + +return best, (best_i, best_j) +``` + +*Edge cases* When all numbers are negative, the best block is the **least negative single element**. The scan handles this automatically because $M$ keeps dropping with every step, so the maximum of $S_j-M$ happens when you take just the largest entry. Empty-block conventions matter. If you define the answer to be strictly nonempty, initialize $\text{best}$ with $x_1$ and $E=x_1$ in the incremental form; if you allow empty blocks with sum $0$, initialize $\text{best}=0$ and $M=0$. Either way, the one-pass logic doesn’t change. -Summary +*Complexity* * Time: $O(n)$ * Space: $O(1)$ ### Scheduling themes -Two everyday scheduling goals keep popping up. One tries to pack as many non-overlapping intervals as possible, like booking the most meetings in a single room. The other tries to keep lateness under control when jobs have deadlines, like finishing homework so the worst overrun is as small as possible. Both have crisp greedy rules, and both are easy to run by hand once you see them. +Two classics: + +- Pick as many non-overlapping intervals as possible (one room, max meetings). +- Keep maximum lateness small when jobs have deadlines. + +They’re both greedy—and both easy to run by hand. Imagine you have time intervals on a single line, and you can keep an interval only if it doesn’t overlap anything you already kept. The aim is to keep as many as possible. @@ -530,17 +558,22 @@ Imagine you have time intervals on a single line, and you can keep an interval o Intervals (start, finish): -* $(1,3)$, $(2,5)$, $(4,7)$, $(6,9)$, $(8,10)$, $(9,11)$ +``` +(1,3) (2,5) (4,7) (6,9) (8,10) (9,11) +``` -A best answer keeps four intervals, for instance $(1,3),(4,7),(8,10),(10,11)$. I wrote $(10,11)$ for clarity even though the original end was $11$; think half-open $[s,e)$ if you want “touching” to be allowed. +A best answer keeps four intervals, for instance $(1,3),(4,7),(8,10),(10,11)$. -Baseline (slow) +**Baseline (slow)** Try all subsets and keep the largest that has no overlaps. That’s conceptually simple and always correct, but it’s exponential in the number of intervals, which is a non-starter for anything but tiny inputs. -**How it works** +**Greedy rule:** -Sort by finishing time, then walk once from earliest finisher to latest. Keep an interval if its start is at least the end time of the last one you kept. 
Ending earlier leaves more room for the future, and that is the whole intuition. +Sort by finish time and take what fits. + +- Scan from earliest finisher to latest. +- Keep $(s,e)$ iff $s \ge \text{last_end}$; then set $\text{last_end}\leftarrow e$. Sorted by finish: @@ -566,14 +599,27 @@ A tiny picture helps the “finish early” idea feel natural: ``` time → -kept: [1──3) [4───7) [8─10) -skip: [2────5) [6────9) [9───11) +kept: [1────3) [4─────7) [8────10) +skip: [2────5) [6──────9)[9─────11) ending earlier leaves more open space to the right ``` -Why this works in one sentence: at the first place an optimal schedule would choose a later-finishing interval, swapping in the earlier finisher cannot reduce what still fits afterward, so you can push the optimal schedule to match greedy without losing size. +Why this works: at the first place an optimal schedule would choose a later-finishing interval, swapping in the earlier finisher cannot reduce what still fits afterward, so you can push the optimal schedule to match greedy without losing size. -Complexity +Handy pseudocode + +```python +# Interval scheduling (max cardinality) +sort intervals by end time +last_end = -∞ +keep = [] +for (s,e) in intervals: + if s >= last_end: + keep.append((s,e)) + last_end = e +``` + +*Complexity* * Time: $O(n \log n)$ to sort by finishing time; $O(n)$ scan. * Space: $O(1)$ (beyond input storage). @@ -599,11 +645,11 @@ Jobs and deadlines: An optimal schedule is $J_2,J_4, J_1, J_3$. The maximum lateness there is $0$. -Baseline (slow) +**Baseline (slow)** Try all $n!$ orders, compute every job’s completion time and lateness, and take the order with the smallest $L_{\max}$. This explodes even for modest $n$. -**How it works** +**Greedy rule** Order jobs by nondecreasing deadlines (earliest due date first, often called EDD). Fixing any “inversion” where a later deadline comes before an earlier one can only help the maximum lateness, so sorting by deadlines is safe. @@ -641,48 +687,61 @@ EDD: [J2][J4][J1][J3] deadlines: 1 2 3 4 late? 0 0 0 0 → max lateness 0 ``` -Why this works in one sentence: if two adjacent jobs are out of deadline order, swapping them never increases any completion time relative to its own deadline, and strictly improves at least one, so repeatedly fixing these inversions leads to the sorted-by-deadline order with no worse maximum lateness. +Why this works: if two adjacent jobs are out of deadline order, swapping them never increases any completion time relative to its own deadline, and strictly improves at least one, so repeatedly fixing these inversions leads to the sorted-by-deadline order with no worse maximum lateness. -Complexity +Pseudocode + +``` +# Minimize L_max (EDD) +sort jobs by increasing deadline d_j +t = 0; Lmax = -∞ +for job j in order: + t += p_j # completion time C_j + L = t - d_j + Lmax = max(Lmax, L) +return order, Lmax +``` + +*Complexity* * Time: $O(n \log n)$ to sort by deadlines; $O(n)$ evaluation. * Space: $O(1)$. ### Huffman coding -You have symbols that occur with known frequencies $f_i>0$ and $\sum_i f_i=1$. The goal is to assign each symbol a binary codeword so that no codeword is a prefix of another (a prefix code), and the average length +You have symbols that occur with known frequencies \$f\_i>0\$ and \$\sum\_i f\_i=1\$ (if you start with counts, first normalize by their total). 
The goal is to assign each symbol a binary codeword so that no codeword is a prefix of another (a **prefix code**, i.e., uniquely decodable without separators), and the average length $$ \mathbb{E}[L]=\sum_i f_i\,L_i $$ -is as small as possible. Prefix codes exactly correspond to full binary trees whose leaves are the symbols and whose leaf depths are the codeword lengths $L_i$. The Kraft inequality $\sum_i 2^{-L_i}\le 1$ is the feasibility condition; equality holds for full trees. +is as small as possible. Prefix codes correspond exactly to **full binary trees** (every internal node has two children) whose leaves are the symbols and whose leaf depths equal the codeword lengths \$L\_i\$. The **Kraft inequality** \$\sum\_i 2^{-L\_i}\le 1\$ characterizes feasibility; equality holds for full trees (so an optimal prefix code “fills” the inequality). **Example inputs and outputs** Frequencies: $$ -A:0.40,quad B:0.20,quad C:0.20,quad D:0.10,quad E:0.10. +A:0.40,\quad B:0.20,\quad C:0.20,\quad D:0.10,\quad E:0.10. $$ -A valid optimal answer will be a prefix code with expected length as small as possible. We will compute the exact minimum and one optimal set of lengths $L_A,dots,L_E$, plus a concrete codebook. +A valid optimal answer will be a prefix code with expected length as small as possible. We will compute the exact minimum and one optimal set of lengths \$L\_A,\dots,L\_E\$, plus a concrete codebook. (There can be multiple optimal codebooks when there are ties in frequencies; their **lengths** agree, though the exact bitstrings may differ.) -Baseline (slow) +**Baseline** -One conceptual baseline is to enumerate all full binary trees with five labeled leaves and pick the one minimizing $\sum f_i\,L_i$. That is correct but explodes combinatorially as the number of symbols grows. A simpler but usually suboptimal baseline is to give every symbol the same length $\lceil \log_2 5\rceil=3$. That fixed-length code has $\mathbb{E}[L]=3$. +One conceptual baseline is to enumerate all full binary trees with five labeled leaves and pick the one minimizing \$\sum f\_i,L\_i\$. That is correct but explodes combinatorially as the number of symbols grows. A simpler but usually suboptimal baseline is to give every symbol the same length \$\lceil \log\_2 5\rceil=3\$. That fixed-length code has \$\mathbb{E}\[L]=3\$. -**How it works** +**Greedy Approach** -Huffman’s rule repeats one tiny step: always merge the two least frequent items. When you merge two “symbols” with weights $p$ and $q$, you create a parent of weight $p+q$. The act of merging adds exactly $p+q$ to the objective $\mathbb{E}[L]$ because every leaf inside those two subtrees becomes one level deeper. Summing over all merges yields the final cost: +Huffman’s rule repeats one tiny step: always merge the two least frequent items. When you merge two “symbols” with weights \$p\$ and \$q\$, you create a parent of weight \$p+q\$. **Why does this change the objective by exactly \$p+q\$?** Every leaf in those two subtrees increases its depth (and thus its code length) by \$1\$, so the total increase in \$\sum f\_i L\_i\$ is \$\sum\_{\ell\in\text{subtrees}} f\_\ell\cdot 1=(p+q)\$ by definition of \$p\$ and \$q\$. Summing over all merges yields the final cost: $$ -\mathbb{E}[L]=\sum_{\text{merges}} (p+q)=\sum_{\text{internal nodes}} \text{weight} +\mathbb{E}[L]=\sum_{\text{merges}} (p+q)=\sum_{\text{internal nodes}} \text{weight}. 
$$ -The greedy choice is safe because in an optimal tree the two deepest leaves must be siblings and must be the two least frequent symbols; otherwise swapping depths strictly reduces the cost by at least $f_{\text{heavy}}-f_{\text{light}}>0$. Collapsing those siblings into one pseudo-symbol reduces the problem size without changing optimality, so induction finishes the proof. +**Why is the greedy choice optimal?** In an optimal tree the two deepest leaves must be siblings; if not, pairing them to be siblings never increases any other depth and strictly reduces cost whenever a heavier symbol is deeper than a lighter one (an **exchange argument**: swapping depths changes the cost by \$f\_{\text{heavy}}-f\_{\text{light}}>0\$ in our favor). Collapsing those siblings into a single pseudo-symbol reduces the problem size without changing optimality, so induction finishes the proof. (Ties can be broken arbitrarily; all tie-breaks achieve the same minimum \$\mathbb{E}\[L]\$.) -Start with the multiset $\{0.40, 0.20, 0.20, 0.10, 0.10\}$. At each line, merge the two smallest weights and add their sum to the running cost. +Start with the multiset \${0.40, 0.20, 0.20, 0.10, 0.10}\$. At each line, merge the two smallest weights and add their sum to the running cost. ``` 1) merge 0.10 + 0.10 → 0.20 cost += 0.20 (total 0.20) @@ -698,79 +757,48 @@ Start with the multiset $\{0.40, 0.20, 0.20, 0.10, 0.10\}$. At each line, merge multiset becomes {1.00} (done) ``` -So the optimal expected length is $\boxed{\mathbb{E}[L]=2.20}$ bits per symbol. This already beats the naive fixed-length baseline $3$. It also matches the information-theoretic bound $H(f)\le \mathbb{E}[L] [0.60] +| +--0--> A(0.40) +| `--1--> [0.20] +| +--0--> D(0.10) +| `--1--> E(0.10) +`--1--> [0.40] + +--0--> B(0.20) + `--1--> C(0.20) ``` -One concrete codebook arises by reading left edges as 0 and right edges as 1: - -* $A \mapsto 00$ -* $B \mapsto 10$ -* $C \mapsto 11$ -* $D \mapsto 010$ -* $E \mapsto 011$ - -You can verify the prefix property immediately and recompute $\mathbb{E}[L]$ from these lengths to get $2.20$ again. - -Complexity - -* Time: $O(k \log k)$ using a min-heap over $k$ symbol frequencies. -* Space: $O(k)$ for the heap and $O(k)$ for the resulting tree. - -### When greedy fails (and how to quantify “not too bad”) - -The $0\text{–}1$ knapsack with arbitrary weights defeats the obvious density-based rule. A small, dense item can block space needed for a medium-density item that pairs perfectly with a third, leading to a globally superior pack. Weighted interval scheduling similarly breaks the “earliest finish” rule; taking a long, heavy meeting can beat two short light ones that finish earlier. - -Approximation guarantees rescue several hard problems with principled greedy performance. For set cover on a universe $U$ with $|U|=n$, the greedy rule that repeatedly picks the set covering the largest number of uncovered elements achieves an $H_n$ approximation: - -$$ -\text{cost}_{\text{greedy}} \le H_n\cdot \text{OPT},qquad H_n=\sum_{k=1}^n \frac{1}{k}\le \ln n+1. -$$ - -A tight charging argument proves it: each time you cover new elements, charge them equally; no element is charged more than the harmonic sum relative to the optimum’s coverage. - -Maximizing a nondecreasing submodular set function $f:2^E\to\mathbb{R}_{\ge 0}$ under a cardinality constraint $|S|\le k$ is a crown jewel. Submodularity means diminishing returns: - -$$ -A\subseteq B, x\notin B \ \Rightarrow\ f(A\cup\{x\})-f(A)\ \ge\ f(B\cup\{x\})-f(B). 
-$$ - -The greedy algorithm that adds the element with largest marginal gain at each step satisfies the celebrated bound - -$$ -f(S_k)\ \ge\ \Bigl(1-\frac{1}{e}\Bigr)\,f(S^\star), -$$ +One concrete codebook arises by reading left edges as 0 and right edges as 1 (the left/right choice is arbitrary; flipping all bits in a subtree yields an equivalent optimal code): -where $S^\star$ is an optimal size-$k$ set. The proof tracks the residual gap $g_i=f(S^\star)-f(S_i)$ and shows +* \$A \mapsto 00\$ +* \$B \mapsto 10\$ +* \$C \mapsto 11\$ +* \$D \mapsto 010\$ +* \$E \mapsto 011\$ -$$ -g_{i+1}\ \le\ \Bigl(1-\frac{1}{k}\Bigr)g_i, -$$ +You can verify the prefix property immediately and recompute \$\mathbb{E}\[L]\$ from these lengths to get \$2.20\$ again. (From these lengths you can also construct the **canonical Huffman code**, which orders codewords lexicographically—useful for compactly storing the codebook.) -hence $g_k\le e^{-k/k}g_0=e^{-1}g_0$. Diminishing returns is exactly what makes the greedy increments add up to a constant-factor slice of the unreachable optimum. +*Complexity* +* Time: \$O(k \log k)\$ using a min-heap over \$k\$ symbol frequencies (each of the \$k-1\$ merges performs two extractions and one insertion). +* Space: \$O(k)\$ for the heap and \$O(k)\$ for the resulting tree (plus \$O(k)\$ for an optional map from symbols to codewords). From 474627354f9fffc66d5baf7ddc364f0700f07634 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 16:22:10 +0200 Subject: [PATCH 26/48] Update greedy_algorithms.md --- notes/greedy_algorithms.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index a8c9c93..cda4c6d 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -573,7 +573,7 @@ Try all subsets and keep the largest that has no overlaps. That’s conceptually Sort by finish time and take what fits. - Scan from earliest finisher to latest. -- Keep $(s,e)$ iff $s \ge \text{last_end}$; then set $\text{last_end}\leftarrow e$. +- Keep $(s,e)$ iff $s \ge \text{last end}$; then set $\text{last end}\leftarrow e$. Sorted by finish: From 9ba91d9e9371800773b98fb518555d2782811aaf Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 16:59:39 +0200 Subject: [PATCH 27/48] Update matrices.md --- notes/matrices.md | 344 +++++++++++++++++++++++++++++++++++++--------- 1 file changed, 277 insertions(+), 67 deletions(-) diff --git a/notes/matrices.md b/notes/matrices.md index 4aefb08..c0c2b78 100644 --- a/notes/matrices.md +++ b/notes/matrices.md @@ -1,81 +1,196 @@ ## Matrices and 2D Grids -Matrices represent images, game boards, and maps. Many classic problems reduce to transforming matrices, traversing them, or treating grids as graphs for search. This note mirrors the structure used in the Searching notes: each topic includes Example inputs and outputs, How it works, and a compact summary with $O(\cdot)$. +Matrices represent images, game boards, and maps. Many classic problems reduce to transforming matrices, traversing them, or treating grids as graphs for search. ### Conventions -* Rows indexed 0..R−1, columns 0..C−1; cell (r, c). -* Neighborhoods: 4-dir Δ = {(-1,0),(1,0),(0,-1),(0,1)}, or 8-dir adds diagonals. +**Rows indexed \$0..R-1\$, columns \$0..C-1\$; cell \$(r,c)\$.** ---- +Rows increase **down**, columns increase **right**. Think “top-left is \$(0,0)\$”, not a Cartesian origin. 
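
For readers who prefer code to pictures, the indexing and neighborhood conventions that the diagrams and bullets below illustrate can be collected into a few small helpers. This is only a sketch — the names `to_id`, `from_id`, `in_bounds`, and `neighbors` are illustrative, not taken from any particular library.

```python
# Sketch of the grid conventions used in this note: 0-based (r, c),
# row-major linearization, and 4-/8-direction neighborhoods with bounds checks.
R, C = 6, 8  # example grid size, matching the index map below

def to_id(r, c):
    """Row-major linear index of cell (r, c)."""
    return r * C + c

def from_id(i):
    """Inverse of to_id: returns (r, c)."""
    return divmod(i, C)

def in_bounds(r, c):
    return 0 <= r < R and 0 <= c < C

DIRS4 = [(-1, 0), (1, 0), (0, -1), (0, 1)]
DIRS8 = DIRS4 + [(-1, -1), (-1, 1), (1, -1), (1, 1)]

def neighbors(r, c, dirs=DIRS4):
    """Yield only the in-bounds neighbors of (r, c)."""
    for dr, dc in dirs:
        nr, nc = r + dr, c + dc
        if in_bounds(nr, nc):
            yield nr, nc

assert from_id(to_id(3, 5)) == (3, 5)
assert sorted(neighbors(0, 0)) == [(0, 1), (1, 0)]  # corner cell: 2 neighbors (4-dir)
```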
+ +Visual index map (example \$R=6\$, \$C=8\$; each cell labeled \$rc\$): + +``` + c → 0 1 2 3 4 5 6 7 +r ↓ +----+----+----+----+----+----+----+----+ +0 | 00 | 01 | 02 | 03 | 04 | 05 | 06 | 07 | + +----+----+----+----+----+----+----+----+ +1 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | + +----+----+----+----+----+----+----+----+ +2 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | + +----+----+----+----+----+----+----+----+ +3 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | + +----+----+----+----+----+----+----+----+ +4 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | + +----+----+----+----+----+----+----+----+ +5 | 50 | 51 | 52 | 53 | 54 | 55 | 56 | 57 | + +----+----+----+----+----+----+----+----+ +``` + +Handy conversions (for linearization / array-of-arrays): + +* Linear index: \$\text{id}=r\cdot C+c\$. +* From id: \$r=\lfloor \text{id}/C \rfloor\$, \$c=\text{id}\bmod C\$. +* Row-major scan order (common in problems): for \$r\$ in \$0..R-1\$, for \$c\$ in \$0..C-1\$. + +**Row-major vs column-major arrows (same \$3\times 6\$ grid):** + +``` +Row-major (r, then c): Column-major (c, then r): +→ → → → → → ↓ ↓ ↓ +↓ ↓ ↓ ↓ ↓ +← ← ← ← ← ← ↓ ↓ ↓ +↓ ↓ ↓ ↓ ↓ +→ → → → → → ↓ ↓ ↓ +``` + +**Neighborhoods: \$\mathbf{4}\$-dir \$\Delta={(-1,0),(1,0),(0,-1),(0,1)}\$; \$\mathbf{8}\$-dir adds diagonals.** + +The offsets \$(\Delta r,\Delta c)\$ are applied as \$(r+\Delta r,\ c+\Delta c)\$. + +**4-neighborhood (“+”):** + +``` + (r-1,c) + ↑ + (r,c-1) ← (r,c) → (r,c+1) + ↓ + (r+1,c) +``` + +**8-neighborhood (“×” adds diagonals):** + +``` + (r-1,c-1) (r-1,c) (r-1,c+1) + \ ↑ / + \ │ / + (r,c-1) ←——— (r,c) ———→ (r,c+1) + / │ \ + / ↓ \ + (r+1,c-1) (r+1,c) (r+1,c+1) +``` + +Typical direction arrays (keep them consistent to avoid bugs): + +``` +// 4-dir +dr = [-1, 1, 0, 0] +dc = [ 0, 0, -1, 1] + +// 8-dir +dr8 = [-1,-1,-1, 0, 0, 1, 1, 1] +dc8 = [-1, 0, 1,-1, 1,-1, 0, 1] +``` + +**Boundary checks** (always guard neighbors): + +``` +0 ≤ nr < R and 0 ≤ nc < C +``` + +**Edge/inside intuition:** + +``` + out of bounds + ┌─────────────────┐ + │ · · · · · · · · │ + │ · +---+---+---+ │ + │ · | a | b | c | │ ← valid cells + │ · +---+---+---+ │ + │ · | d | e | f | │ + │ · +---+---+---+ │ + │ · · · · · · · · │ + └─────────────────┘ +``` + +Here’s a cleaned-up, MathJax-friendly version you can paste in: ### Basic Operations (Building Blocks) #### Transpose -Swap across the main diagonal: $A[r][c] \leftrightarrow A[c][r]$ (square). For non-square, result shape is $C\times R$. +Swap across the main diagonal: $A_{r,c} \leftrightarrow A_{c,r}$ (square). For non-square, result shape is $C\times R$. **Example inputs and outputs** *Example 1 (square)* $$ - ext{Input: } A = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} -$$ - -$$ - ext{Output: } A^{T} = \begin{bmatrix}1 & 4 & 7\\2 & 5 & 8\\3 & 6 & 9\end{bmatrix} +A = \begin{bmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 \\ +7 & 8 & 9 +\end{bmatrix} +\quad\Rightarrow\quad +A^{\mathsf{T}} = +\begin{bmatrix} +1 & 4 & 7 \\ +2 & 5 & 8 \\ +3 & 6 & 9 +\end{bmatrix} $$ *Example 2 (rectangular)* $$ - ext{Input: } A = \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix}\ (2\times3) +\text{Input: } \quad +A = \begin{bmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 +\end{bmatrix} +\ (2 \times 3) $$ $$ - ext{Output: } A^{T} = \begin{bmatrix}1 & 4\\2 & 5\\3 & 6\end{bmatrix}\ (3\times2) +\text{Output: } \quad +A^{\mathsf{T}} = \begin{bmatrix} +1 & 4 \\ +2 & 5 \\ +3 & 6 +\end{bmatrix} +\ (3 \times 2) $$ **How it works** Iterate pairs once and swap. For square matrices, can be in-place by visiting only $c>r$. 
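
A minimal Python sketch of both variants (function names are illustrative): the square case swaps in place over the upper triangle only, the rectangular case allocates a new $C\times R$ matrix.

```python
def transpose_square_inplace(a):
    """In-place transpose of a square matrix: swap only pairs with c > r."""
    n = len(a)
    for r in range(n):
        for c in range(r + 1, n):
            a[r][c], a[c][r] = a[c][r], a[r][c]
    return a

def transpose(a):
    """Return the C x R transpose of an R x C matrix (new allocation)."""
    R, C = len(a), len(a[0])
    return [[a[r][c] for r in range(R)] for c in range(C)]

assert transpose([[1, 2, 3], [4, 5, 6]]) == [[1, 4], [2, 5], [3, 6]]
assert transpose_square_inplace([[1, 2], [3, 4]]) == [[1, 3], [2, 4]]
```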
-* Time: $O(R\cdot C)$; Space: $O(1)$ in-place (square), else $O(R\cdot C)$ to allocate. +* Time: $O(R\cdot C)$ +* Space: $O(1)$ in-place (square), else $O(R\cdot C)$ to allocate #### Reverse Rows (Horizontal Flip) -Reverse each row left↔right. +Reverse each row left $\leftrightarrow$ right. **Example inputs and outputs** *Example* $$ - ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix} +\text{Input: }\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix} \quad\Rightarrow\quad - ext{Output: } \begin{bmatrix}3 & 2 & 1\\6 & 5 & 4\end{bmatrix} +\text{Output: }\begin{bmatrix}3&2&1\\6&5&4\end{bmatrix} $$ -* Time: $O(R\cdot C)$; Space: $O(1)$. +* Time: $O(R\cdot C)$ +* Space: $O(1)$ #### Reverse Columns (Vertical Flip) -Reverse each column top↔bottom. +Reverse each column top $\leftrightarrow$ bottom. **Example inputs and outputs** *Example* $$ - ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +\text{Input: }\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix} \quad\Rightarrow\quad - ext{Output: } \begin{bmatrix}7 & 8 & 9\\4 & 5 & 6\\1 & 2 & 3\end{bmatrix} +\text{Output: }\begin{bmatrix}7&8&9\\4&5&6\\1&2&3\end{bmatrix} $$ -* Time: $O(R\cdot C)$; Space: $O(1)$. - ---- +* Time: $O(R\cdot C)$ +* Space: $O(1)$ ### Rotations (Composed from Basics) @@ -90,24 +205,44 @@ Transpose, then reverse each row. *Example 1 (3×3)* $$ - ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +\text{Input: } +\begin{bmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 \\ +7 & 8 & 9 +\end{bmatrix} \quad\Rightarrow\quad - ext{Output: } \begin{bmatrix}7 & 4 & 1\\8 & 5 & 2\\9 & 6 & 3\end{bmatrix} +\text{Output: } +\begin{bmatrix} +7 & 4 & 1 \\ +8 & 5 & 2 \\ +9 & 6 & 3 +\end{bmatrix} $$ *Example 2 (2×3 → 3×2)* $$ - ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\end{bmatrix} +\text{Input: } +\begin{bmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 +\end{bmatrix} \quad\Rightarrow\quad - ext{Output: } \begin{bmatrix}4 & 1\\5 & 2\\6 & 3\end{bmatrix} +\text{Output: } +\begin{bmatrix} +4 & 1 \\ +5 & 2 \\ +6 & 3 +\end{bmatrix} $$ **How it works** Transpose swaps axes; reversing each row aligns columns to rows of the rotated image. -* Time: $O(R\cdot C)$; Space: $O(1)$ in-place for square, else $O(R\cdot C)$ new. +* Time: $O(R\cdot C)$ +* Space: $O(1)$ in-place for square, else $O(R\cdot C)$ new #### 90° Counterclockwise (CCW) @@ -118,16 +253,27 @@ Transpose, then reverse each column (or reverse rows, then transpose). *Example* $$ - ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +\text{Input: } +\begin{bmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 \\ +7 & 8 & 9 +\end{bmatrix} \quad\Rightarrow\quad - ext{Output: } \begin{bmatrix}3 & 6 & 9\\2 & 5 & 8\\1 & 4 & 7\end{bmatrix} +\text{Output: } +\begin{bmatrix} +3 & 6 & 9 \\ +2 & 5 & 8 \\ +1 & 4 & 7 +\end{bmatrix} $$ **How it works** Transpose, then flip vertically to complete the counterclockwise rotation. -* Time: $O(R\cdot C)$; Space: $O(1)$ (square) or $O(R\cdot C)$. +* Time: $O(R\cdot C)$ +* Space: $O(1)$ (square) or $O(R\cdot C)$ #### 180° Rotation @@ -138,16 +284,27 @@ Equivalent to reversing rows, then reversing columns (or two 90° rotations). 
*Example* $$ - ext{Input: } \begin{bmatrix}1 & 2 & 3\\4 & 5 & 6\\7 & 8 & 9\end{bmatrix} +\text{Input: } +\begin{bmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 \\ +7 & 8 & 9 +\end{bmatrix} \quad\Rightarrow\quad - ext{Output: } \begin{bmatrix}9 & 8 & 7\\6 & 5 & 4\\3 & 2 & 1\end{bmatrix} +\text{Output: } +\begin{bmatrix} +9 & 8 & 7 \\ +6 & 5 & 4 \\ +3 & 2 & 1 +\end{bmatrix} $$ **How it works** Horizontal + vertical flips relocate each element to $(R-1-r,\ C-1-c)$. -* Time: $O(R\cdot C)$; Space: $O(1)$ (square) or $O(R\cdot C)$. +* Time: $O(R\cdot C)$ +* Space: $O(1)$ (square) or $O(R\cdot C)$ #### 270° Rotation @@ -165,9 +322,8 @@ For layer $\ell$ with bounds $[\ell..n-1-\ell]$, for each offset move: top ← left, left ← bottom, bottom ← right, right ← top ``` -* Time: $O(n^{2})$; Space: $O(1)$. - ---- +* Time: $O(n^{2})$ +* Space: $O(1)$ ### Traversal Patterns @@ -180,11 +336,16 @@ Read outer layer, then shrink bounds. *Example* $$ - ext{Input: } \begin{bmatrix}1 & 2 & 3 & 4\\5 & 6 & 7 & 8\\9 & 10 & 11 & 12\end{bmatrix} +\text{Input: } +\begin{bmatrix} +1 & 2 & 3 & 4 \\ +5 & 6 & 7 & 8 \\ +9 & 10 & 11 & 12 +\end{bmatrix} $$ $$ - ext{Output sequence: } 1,2,3,4,8,12,11,10,9,5,6,7 +\text{Output sequence: } 1,\,2,\,3,\,4,\,8,\,12,\,11,\,10,\,9,\,5,\,6,\,7 $$ **How it works** @@ -202,19 +363,62 @@ Visit cells grouped by $s=r+c$; alternate direction per diagonal to keep localit *Example* $$ - ext{Input: } \begin{bmatrix}a & b & c\\d & e & f\end{bmatrix} +\text{Input: } +\begin{bmatrix} +a & b & c \\ +d & e & f +\end{bmatrix} \quad\Rightarrow\quad - ext{One order: } a, b,d, e,c, f +\text{One order: } a,\, b,\, d,\, e,\, c,\, f $$ * Time: $O(R\cdot C)$; Space: $O(1)$. ---- - ### Grids as Graphs Each cell is a node; edges connect neighboring walkable cells. +**Grid-as-graph view (4-dir edges).** Each cell is a node; edges connect neighbors that are “passable”. Great for BFS shortest paths on unweighted grids. + +**Example map (walls `#`, free `.`, start `S`, target `T`).** +Left: the map. Right: BFS distances (4-dir) from `S` until `T` is reached. + +``` +Original Map: +##################### +#S..#....#....#.....# +#.#.#.##.#.##.#.##..# +#.#...#..#.......#.T# +#...###.....###.....# +##################### + +BFS layers (distance mod 10): +##################### +#012#8901#9012#45678# +#1#3#7##2#8##1#3##89# +#2#456#43#7890123#7X# +#345###54567###34567# +##################### + +Legend: walls (#), goal reached (X) +``` + +BFS explores in **expanding “rings”**; with 4-dir edges, each step increases Manhattan distance by 1 (unless blocked). Time \$O(RC)\$, space \$O(RC)\$ with a visited matrix/queue. + +**Obstacles / costs / diagonals.** + +* Obstacles: skip neighbors that are `#` (or where cost is \$\infty\$). +* Weighted grids: Dijkstra / 0-1 BFS on the same neighbor structure. +* 8-dir with Euclidean costs: use \$1\$ for orthogonal moves and \$\sqrt{2}\$ for diagonals (A\* often pairs well here with an admissible heuristic). + +**Common symbols:** + +``` +. = free cell # = wall/obstacle +S = start T = target/goal +V = visited * = on current path / frontier +``` + #### BFS Shortest Path (Unweighted) Find the minimum steps from S to T. @@ -224,10 +428,17 @@ Find the minimum steps from S to T. 
*Example* $$ - ext{Grid (0=open, 1=wall), S=(0,0), T=(2,3)}\\ -\begin{bmatrix}S & 0 & 1 & 0\\0 & 0 & 0 & 0\\1 & 1 & 0 & T\end{bmatrix} +\text{Grid (0 = open, 1 = wall), } S = (0,0),\; T = (2,3) +$$ + +$$ +\begin{bmatrix} +S & 0 & 1 & 0 \\ +0 & 0 & 0 & 0 \\ +1 & 1 & 0 & T +\end{bmatrix} \quad\Rightarrow\quad - ext{Output: distance } = 5 +\text{Output: distance } = 5 $$ **How it works** @@ -242,21 +453,23 @@ Count regions of ‘1’s via DFS/BFS. **Example inputs and outputs** -*Example* - $$ - ext{Input: } \begin{bmatrix}1 & 1 & 0\\0 & 1 & 0\\0 & 0 & 1\end{bmatrix} +\text{Input: } +\begin{bmatrix} +1 & 1 & 0 \\ +0 & 1 & 0 \\ +0 & 0 & 1 +\end{bmatrix} \quad\Rightarrow\quad - ext{Output: } 2\ \text{islands} +\text{Output: } 2 \ \text{islands} $$ **How it works** Scan cells; when an unvisited ‘1’ is found, flood it (DFS/BFS) to mark the whole island. -* Time: $O(R\cdot C)$; Space: $O(R\cdot C)$ worst-case. - ---- +* Time: $O(R\cdot C)$ +* Space: $O(R\cdot C)$ worst-case ### Backtracking on Grids @@ -266,19 +479,25 @@ Find a word by moving to adjacent cells (4-dir), using each cell once per path. **Example inputs and outputs** -*Example* - $$ - ext{Board: } \begin{bmatrix}A & B & C & E\\S & F & C & S\\A & D & E & E\end{bmatrix},\ \text{Word: } "ABCCED" +\text{Board: } +\begin{bmatrix} +A & B & C & E \\ +S & F & C & S \\ +A & D & E & E +\end{bmatrix}, +\quad +\text{Word: } "ABCCED" \quad\Rightarrow\quad - ext{Output: } \text{true} +\text{Output: true} $$ **How it works** From each starting match, DFS to next char; mark visited (temporarily), backtrack on failure. -* Time: up to $O(R\cdot C\cdot b^{L})$ (branching $b\in[3,4]$, word length $L$); Space: $O(L)$. +* Time: up to $O(R\cdot C\cdot b^{L})$ (branching $b\in[3,4]$, word length $L$) +* Space: $O(L)$ Pruning: early letter mismatch; frequency precheck; prefix trie when searching many words. @@ -292,12 +511,3 @@ Backtrack over slot assignments; use a trie for prefix feasibility; order by mos * Time: exponential in slots; strong pruning and good heuristics are crucial. ---- - -### Summary of Complexities - -* Full traversal: $O(R\cdot C)$ time; $O(1)$ space (no visited) or $O(R\cdot C)$ with visited. -* Rotations/transpose: $O(R\cdot C)$ time; $O(1)$ in-place (square) or $O(R\cdot C)$ extra. -* BFS/DFS on grids: $O(R\cdot C)$ time; $O(R\cdot C)$ space. -* Word search backtracking: up to $O(R\cdot C\cdot b^{L})$ time; $O(L)$ space. - From 3c1f5573053af5ed98874ce2500ac2b73fb52da4 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 18:21:52 +0200 Subject: [PATCH 28/48] Fix Markdown formatting in matrices.md --- notes/matrices.md | 30 +++++++++++++++--------------- 1 file changed, 15 insertions(+), 15 deletions(-) diff --git a/notes/matrices.md b/notes/matrices.md index c0c2b78..9da11db 100644 --- a/notes/matrices.md +++ b/notes/matrices.md @@ -4,11 +4,11 @@ Matrices represent images, game boards, and maps. Many classic problems reduce t ### Conventions -**Rows indexed \$0..R-1\$, columns \$0..C-1\$; cell \$(r,c)\$.** +**Rows indexed $0..R-1$, columns $0..C-1$; cell $(r,c)$.** -Rows increase **down**, columns increase **right**. Think “top-left is \$(0,0)\$”, not a Cartesian origin. +Rows increase **down**, columns increase **right**. Think “top-left is $(0,0)$”, not a Cartesian origin. 
-Visual index map (example \$R=6\$, \$C=8\$; each cell labeled \$rc\$): +Visual index map (example $R=6$, $C=8$; each cell labeled $rc$): ``` c → 0 1 2 3 4 5 6 7 @@ -29,11 +29,11 @@ r ↓ +----+----+----+----+----+----+----+----+ Handy conversions (for linearization / array-of-arrays): -* Linear index: \$\text{id}=r\cdot C+c\$. -* From id: \$r=\lfloor \text{id}/C \rfloor\$, \$c=\text{id}\bmod C\$. -* Row-major scan order (common in problems): for \$r\$ in \$0..R-1\$, for \$c\$ in \$0..C-1\$. +* Linear index: $\text{id}=r\cdot C+c$. +* From id: $r=\lfloor \text{id}/C \rfloor$, $c=\text{id}\bmod C$. +* Row-major scan order (common in problems): for $r$ in $0..R-1$, for $c$ in $0..C-1$. -**Row-major vs column-major arrows (same \$3\times 6\$ grid):** +**Row-major vs column-major arrows (same $3\times 6$ grid):** ``` Row-major (r, then c): Column-major (c, then r): @@ -44,9 +44,9 @@ Row-major (r, then c): Column-major (c, then r): → → → → → → ↓ ↓ ↓ ``` -**Neighborhoods: \$\mathbf{4}\$-dir \$\Delta={(-1,0),(1,0),(0,-1),(0,1)}\$; \$\mathbf{8}\$-dir adds diagonals.** +**Neighborhoods: $\mathbf{4}$-dir $\Delta={(-1,0),(1,0),(0,-1),(0,1)}$; $\mathbf{8}$-dir adds diagonals.** -The offsets \$(\Delta r,\Delta c)\$ are applied as \$(r+\Delta r,\ c+\Delta c)\$. +The offsets $(\Delta r,\Delta c)$ are applied as $(r+\Delta r,\ c+\Delta c)$. **4-neighborhood (“+”):** @@ -345,7 +345,7 @@ $$ $$ $$ -\text{Output sequence: } 1,\,2,\,3,\,4,\,8,\,12,\,11,\,10,\,9,\,5,\,6,\,7 +\text{Output sequence: } 1,2,3,4,8,12,11,10,9,5,6,7 $$ **How it works** @@ -369,7 +369,7 @@ a & b & c \\ d & e & f \end{bmatrix} \quad\Rightarrow\quad -\text{One order: } a,\, b,\, d,\, e,\, c,\, f +\text{One order: } a, b, d, e, c, f $$ * Time: $O(R\cdot C)$; Space: $O(1)$. @@ -403,13 +403,13 @@ BFS layers (distance mod 10): Legend: walls (#), goal reached (X) ``` -BFS explores in **expanding “rings”**; with 4-dir edges, each step increases Manhattan distance by 1 (unless blocked). Time \$O(RC)\$, space \$O(RC)\$ with a visited matrix/queue. +BFS explores in **expanding “rings”**; with 4-dir edges, each step increases Manhattan distance by 1 (unless blocked). Time $O(RC)$, space $O(RC)$ with a visited matrix/queue. **Obstacles / costs / diagonals.** -* Obstacles: skip neighbors that are `#` (or where cost is \$\infty\$). +* Obstacles: skip neighbors that are `#` (or where cost is $\infty$). * Weighted grids: Dijkstra / 0-1 BFS on the same neighbor structure. -* 8-dir with Euclidean costs: use \$1\$ for orthogonal moves and \$\sqrt{2}\$ for diagonals (A\* often pairs well here with an admissible heuristic). +* 8-dir with Euclidean costs: use $1$ for orthogonal moves and $\sqrt{2}$ for diagonals (A\* often pairs well here with an admissible heuristic). **Common symbols:** @@ -428,7 +428,7 @@ Find the minimum steps from S to T. *Example* $$ -\text{Grid (0 = open, 1 = wall), } S = (0,0),\; T = (2,3) +\text{Grid (0 = open, 1 = wall), } S = (0,0), T = (2,3) $$ $$ From 358ca73cff1122e7031b9c4a7bbd226dcf95d5d3 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 18:24:16 +0200 Subject: [PATCH 29/48] Format matrix examples for better readability --- notes/matrices.md | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/notes/matrices.md b/notes/matrices.md index 9da11db..d0f34cd 100644 --- a/notes/matrices.md +++ b/notes/matrices.md @@ -167,9 +167,17 @@ Reverse each row left $\leftrightarrow$ right. 
*Example* $$ -\text{Input: }\begin{bmatrix}1&2&3\\4&5&6\end{bmatrix} +\text{Input: } +\begin{bmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 +\end{bmatrix} \quad\Rightarrow\quad -\text{Output: }\begin{bmatrix}3&2&1\\6&5&4\end{bmatrix} +\text{Output: } +\begin{bmatrix} +3 & 2 & 1 \\ +6 & 5 & 4 +\end{bmatrix} $$ * Time: $O(R\cdot C)$ @@ -184,9 +192,19 @@ Reverse each column top $\leftrightarrow$ bottom. *Example* $$ -\text{Input: }\begin{bmatrix}1&2&3\\4&5&6\\7&8&9\end{bmatrix} +\text{Input: } +\begin{bmatrix} +1 & 2 & 3 \\ +4 & 5 & 6 \\ +7 & 8 & 9 +\end{bmatrix} \quad\Rightarrow\quad -\text{Output: }\begin{bmatrix}7&8&9\\4&5&6\\1&2&3\end{bmatrix} +\text{Output: } +\begin{bmatrix} +7 & 8 & 9 \\ +4 & 5 & 6 \\ +1 & 2 & 3 +\end{bmatrix} $$ * Time: $O(R\cdot C)$ From 83fc65d14aa68f0afbe12dffcc92048928b676b3 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 19:00:16 +0200 Subject: [PATCH 30/48] Update matrices.md --- notes/matrices.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/notes/matrices.md b/notes/matrices.md index d0f34cd..dee57a8 100644 --- a/notes/matrices.md +++ b/notes/matrices.md @@ -103,8 +103,6 @@ dc8 = [-1, 0, 1,-1, 1,-1, 0, 1] └─────────────────┘ ``` -Here’s a cleaned-up, MathJax-friendly version you can paste in: - ### Basic Operations (Building Blocks) #### Transpose From 1d682e20d9e255fe78d25d290e1c00d411d22f9c Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 19:22:00 +0200 Subject: [PATCH 31/48] Update sorting.md --- notes/sorting.md | 917 +++++++++++++++++++++++++++++++++++++---------- 1 file changed, 731 insertions(+), 186 deletions(-) diff --git a/notes/sorting.md b/notes/sorting.md index c697ba0..70345f8 100644 --- a/notes/sorting.md +++ b/notes/sorting.md @@ -72,343 +72,888 @@ If you then did a second pass (say, sorting by rank or battle-honors) you’d on ### Bubble Sort -Bubble sort, one of the simplest sorting algorithms, is often a go-to choice for teaching the foundational concepts of sorting due to its intuitive nature. The name "bubble sort" stems from the way larger elements "bubble up" towards the end of the array, much like how bubbles rise in a liquid. +Bubble sort is one of the simplest sorting algorithms. It is often used as an **introductory algorithm** because it is easy to understand, even though it is not efficient for large datasets. -#### Conceptual Overview +The name comes from the way **larger elements "bubble up"** to the top (end of the list), just as bubbles rise in water. -Imagine a sequence of numbers. Starting from the beginning of the sequence, we compare each pair of adjacent numbers and swap them if they are out of order. As a result, at the end of the first pass, the largest number will have "bubbled up" to the last position. Each subsequent pass ensures that the next largest number finds its correct position, and this continues until the whole array is sorted. +The basic idea: -#### Steps +* Compare **adjacent elements**. +* Swap them if they are in the wrong order. +* Repeat until no swaps are needed. -1. Start from the first item and compare it with its neighbor to the right. -2. If the items are out of order (i.e., the left item is greater than the right), swap them. -3. Move to the next item and repeat the above steps until the end of the array. -4. After the first pass, the largest item will be at the last position. On the next pass, you can ignore the last item and consider the rest of the array. -5. 
Continue this process for `n-1` passes to ensure the array is completely sorted. +**Step-by-Step Walkthrough** +1. Start from the **first element**. +2. Compare it with its **neighbor to the right**. +3. If the left is greater, **swap** them. +4. Move to the next pair and repeat until the end of the list. +5. After the **first pass**, the largest element is at the end. +6. On each new pass, ignore the elements already in their correct place. +7. Continue until the list is sorted. + +**Example Run** + +We will sort the array: + +``` +[ 5 ][ 1 ][ 4 ][ 2 ][ 8 ] +``` + +**Pass 1** + +Compare adjacent pairs and push the largest to the end. + +``` +Initial: [ 5 ][ 1 ][ 4 ][ 2 ][ 8 ] + +Compare 5 and 1 → swap + [ 1 ][ 5 ][ 4 ][ 2 ][ 8 ] + +Compare 5 and 4 → swap + [ 1 ][ 4 ][ 5 ][ 2 ][ 8 ] + +Compare 5 and 2 → swap + [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ] + +Compare 5 and 8 → no swap + [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ] +``` + +✔ Largest element **8** has bubbled to the end. + +**Pass 2** + +Now we only need to check the first 4 elements. + +``` +Start: [ 1 ][ 4 ][ 2 ][ 5 ] [8] + +Compare 1 and 4 → no swap + [ 1 ][ 4 ][ 2 ][ 5 ] [8] + +Compare 4 and 2 → swap + [ 1 ][ 2 ][ 4 ][ 5 ] [8] + +Compare 4 and 5 → no swap + [ 1 ][ 2 ][ 4 ][ 5 ] [8] ``` -Start: [ 5 ][ 1 ][ 4 ][ 2 ][ 8 ] -Pass 1: - [ 5 ][ 1 ][ 4 ][ 2 ][ 8 ] → swap(5,1) → [ 1 ][ 5 ][ 4 ][ 2 ][ 8 ] - [ 1 ][ 5 ][ 4 ][ 2 ][ 8 ] → swap(5,4) → [ 1 ][ 4 ][ 5 ][ 2 ][ 8 ] - [ 1 ][ 4 ][ 5 ][ 2 ][ 8 ] → swap(5,2) → [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ] - [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ] → no swap → [ 1 ][ 4 ][ 2 ][ 5 ][ 8 ] +✔ Second largest element **5** is now in place. + +**Pass 3** + +Check only the first 3 elements. + +``` +Start: [ 1 ][ 2 ][ 4 ] [5][8] + +Compare 1 and 2 → no swap + [ 1 ][ 2 ][ 4 ] [5][8] + +Compare 2 and 4 → no swap + [ 1 ][ 2 ][ 4 ] [5][8] +``` + +✔ Sorted order is now reached. + +**Final Result** + +``` +[ 1 ][ 2 ][ 4 ][ 5 ][ 8 ] +``` -Pass 2: - [ 1 ][ 4 ][ 2 ][ 5 ] [8] → no swap → [ 1 ][ 4 ][ 2 ][ 5 ] [8] - [ 1 ][ 4 ][ 2 ][ 5 ] [8] → swap(4,2) → [ 1 ][ 2 ][ 4 ][ 5 ] [8] - [ 1 ][ 2 ][ 4 ][ 5 ] [8] → no swap → [ 1 ][ 2 ][ 4 ][ 5 ] [8] +**Visual Illustration of Bubble Effect** -Pass 3: - [ 1 ][ 2 ][ 4 ] [5,8] → all comparisons OK +Here’s how the **largest values "bubble up"** to the right after each pass: -Result: [ 1 ][ 2 ][ 4 ][ 5 ][ 8 ] ``` +Pass 1: [ 5 1 4 2 8 ] → [ 1 4 2 5 8 ] +Pass 2: [ 1 4 2 5 ] → [ 1 2 4 5 ] [8] +Pass 3: [ 1 2 4 ] → [ 1 2 4 ] [5 8] +``` + +Sorted! ✅ + +**Optimizations** -#### Optimizations +* **Early Exit**: If in a full pass **no swaps occur**, the array is already sorted, and the algorithm can terminate early. +* This makes Bubble Sort’s **best case** much faster (\$O(n)\$). -An important optimization for bubble sort is to keep track of whether any swaps were made during a pass. If a pass completes without any swaps, it means the array is already sorted, and there's no need to continue further iterations. +**Stability** -#### Stability +Bubble sort is **stable**. -Bubble sort is stable. This means that two objects with equal keys will retain their relative order after sorting. Thus, if you had records sorted by name and then sorted them using bubble sort based on age, records with the same age would still maintain the name order. +* If two elements have the same value, they remain in the same order relative to each other after sorting. +* This is important when sorting complex records where a secondary key matters. 
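
A minimal Python sketch of the passes described above, including the early-exit check (the repository links under *Implementation* are the reference versions; this is only an inline illustration):

```python
def bubble_sort(a):
    """Sort the list a in place and return it; stop early if a pass makes no swap."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        # after i passes, the last i elements are already in their final places
        for j in range(n - 1 - i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:  # no swaps → already sorted → best case O(n)
            break
    return a

assert bubble_sort([5, 1, 4, 2, 8]) == [1, 2, 4, 5, 8]
```

Because it only swaps when the left element is strictly greater, equal keys never pass each other, which is exactly the stability property noted above.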
-#### Time Complexity +**Complexity** -- In the **worst-case** scenario, the time complexity of bubble sort is $O(n^2)$, which occurs when the array is in reverse order. -- The **average-case** time complexity is also $O(n^2)$, as bubble sort generally requires quadratic time for typical unsorted arrays. -- In the **best-case** scenario, the time complexity is $O(n)$, which happens when the array is already sorted, especially if an optimization like early exit is implemented. +| Case | Time Complexity | Notes | +|------------------|-----------------|----------------------------------------| +| **Worst Case** | $O(n^2)$ | Array in reverse order | +| **Average Case** | $O(n^2)$ | Typically quadratic comparisons | +| **Best Case** | $O(n)$ | Already sorted + early exit optimization | +| **Space** | $O(1)$ | In-place, requires no extra memory | -#### Space Complexity +**Implementation** -$(O(1))$ - It sorts in place, so it doesn't require any additional memory beyond the input array. +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/selection_sort/src/bubble_sort.cpp) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/selection_sort/src/bubble_sort.py) ### Selection Sort -Selection sort is another intuitive algorithm, widely taught in computer science curricula due to its straightforward mechanism. The crux of selection sort lies in repeatedly selecting the smallest (or largest, depending on the desired order) element from the unsorted section of the array and swapping it with the first unsorted element. +Selection sort is another simple sorting algorithm, often introduced right after bubble sort because it is equally easy to understand. + +Instead of repeatedly "bubbling" elements, **selection sort works by repeatedly selecting the smallest (or largest) element** from the unsorted portion of the array and placing it into its correct position. + +Think of it like arranging books: -#### Conceptual Overview +* Look through all the books, find the smallest one, and put it first. +* Then, look through the rest, find the next smallest, and put it second. +* Repeat until the shelf is sorted. -Consider an array of numbers. The algorithm divides the array into two parts: a sorted subarray and an unsorted subarray. Initially, the sorted subarray is empty, while the entire array is unsorted. During each pass, the smallest element from the unsorted subarray is identified and then swapped with the first unsorted element. As a result, the sorted subarray grows by one element after each pass. +**Step-by-Step Walkthrough** -#### Steps +1. Start at the **first position**. +2. Search the **entire unsorted region** to find the smallest element. +3. Swap it with the element in the current position. +4. Move the boundary of the sorted region one step forward. +5. Repeat until all elements are sorted. -1. Assume the first element is the smallest. -2. Traverse the unsorted subarray and find the smallest element. -3. Swap the found smallest element with the first element of the unsorted subarray. -4. Move the boundary of the sorted and unsorted subarrays one element to the right. -5. Repeat steps 1-4 until the entire array is sorted. 
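
Before the example run, a minimal Python sketch of the same procedure (the classic swap-based variant, which is also why it is not stable):

```python
def selection_sort(a):
    """Sort the list a in place and return it; performs at most n-1 swaps."""
    n = len(a)
    for i in range(n - 1):
        m = i
        for j in range(i + 1, n):  # scan the unsorted tail for the minimum
            if a[j] < a[m]:
                m = j
        if m != i:                 # at most one swap per pass
            a[i], a[m] = a[m], a[i]
    return a

assert selection_sort([64, 25, 12, 22, 11]) == [11, 12, 22, 25, 64]
```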
+**Example Run** + +We will sort the array: ``` -Start: [ 64 ][ 25 ][ 12 ][ 22 ][ 11 ] +``` -Pass 1: find min(64,25,12,22,11)=11, swap with first element -[ 11 ][ 25 ][ 12 ][ 22 ][ 64 ] +**Pass 1** -Pass 2: find min(25,12,22,64)=12, swap with second element -[ 11 ][ 12 ][ 25 ][ 22 ][ 64 ] +Find the smallest element in the entire array and put it in the first position. -Pass 3: find min(25,22,64)=22, swap with third element -[ 11 ][ 12 ][ 22 ][ 25 ][ 64 ] +``` +Initial: [ 64 ][ 25 ][ 12 ][ 22 ][ 11 ] -Pass 4: find min(25,64)=25, swap with fourth element (self-swap) -[ 11 ][ 12 ][ 22 ][ 25 ][ 64 ] +Smallest = 11 +Swap 64 ↔ 11 -Pass 5: only one element remains, already in place -[ 11 ][ 12 ][ 22 ][ 25 ][ 64 ] +Result: [ 11 ][ 25 ][ 12 ][ 22 ][ 64 ] +``` + +✔ The first element is now in its correct place. + +**Pass 2** + +Find the smallest element in the remaining unsorted region. + +``` +Start: [ 11 ][ 25 ][ 12 ][ 22 ][ 64 ] + +Smallest in [25,12,22,64] = 12 +Swap 25 ↔ 12 + +Result: [ 11 ][ 12 ][ 25 ][ 22 ][ 64 ] +``` + +✔ The second element is now in place. + +**Pass 3** + +Repeat for the next unsorted region. + +``` +Start: [ 11 ][ 12 ][ 25 ][ 22 ][ 64 ] + +Smallest in [25,22,64] = 22 +Swap 25 ↔ 22 + +Result: [ 11 ][ 12 ][ 22 ][ 25 ][ 64 ] +``` + +✔ The third element is now in place. + +**Pass 4** + +Finally, sort the last two. + +``` +Start: [ 11 ][ 12 ][ 22 ][ 25 ][ 64 ] + +Smallest in [25,64] = 25 +Already in correct place → no swap -Result: +Result: [ 11 ][ 12 ][ 22 ][ 25 ][ 64 ] +``` + +✔ Array fully sorted. + +**Final Result** + +``` [ 11 ][ 12 ][ 22 ][ 25 ][ 64 ] ``` -#### Stability +**Visual Illustration of Selection** + +Here’s how the **sorted region expands** from left to right: + +``` +Pass 1: [ 64 25 12 22 11 ] → [ 11 ] [ 25 12 22 64 ] +Pass 2: [ 11 ][ 25 12 22 64 ] → [ 11 12 ] [ 25 22 64 ] +Pass 3: [ 11 12 ][ 25 22 64 ] → [ 11 12 22 ] [ 25 64 ] +Pass 4: [ 11 12 22 ][ 25 64 ] → [ 11 12 22 25 ] [ 64 ] +``` -Selection sort is inherently unstable. When two elements have equal keys, their relative order might change post-sorting. This can be problematic in scenarios where stability is crucial. +At each step: -#### Time Complexity +* The **left region is sorted** ✅ +* The **right region is unsorted** 🔄 -- In the **worst-case**, the time complexity is $O(n^2)$, as even if the array is already sorted, the algorithm still iterates through every element to find the smallest. -- The **average-case** time complexity is also $O(n^2)$, since the algorithm's performance generally remains quadratic regardless of input arrangement. -- In the **best-case**, the time complexity is still $O(n^2)$, unlike other algorithms, because selection sort always performs the same number of comparisons, regardless of the input's initial order. +**Optimizations** -#### Space Complexity +* Unlike bubble sort, **early exit is not possible** because selection sort always scans the entire unsorted region to find the minimum. +* But it does fewer swaps: **at most (n-1) swaps**, compared to potentially many in bubble sort. -$(O(1))$ - The algorithm sorts in-place, meaning it doesn't use any extra space beyond what's needed for the input. +**Stability** -#### Implementation +* **Selection sort is NOT stable** in its classic form. +* If two elements are equal, their order may change due to swapping. +* Stability can be achieved by inserting instead of swapping, but this makes the algorithm more complex. 
+ +**Complexity** + +| Case | Time Complexity | Notes | +|------------------|-----------------|--------------------------------------------| +| **Worst Case** | $O(n^2)$ | Scanning full unsorted region every pass | +| **Average Case** | $O(n^2)$ | Quadratic comparisons | +| **Best Case** | $O(n^2)$ | No improvement, still must scan every pass | +| **Space** | $O(1)$ | In-place sorting | + +**Implementation** * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/selection_sort/src/selection_sort.cpp) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/selection_sort/src/selection_sort.py) ### Insertion Sort -Insertion sort works much like how one might sort a hand of playing cards. It builds a sorted array (or list) one element at a time by repeatedly taking one element from the input and inserting it into the correct position in the already-sorted section of the array. Its simplicity makes it a common choice for teaching the basics of algorithm design. +Insertion sort is a simple, intuitive sorting algorithm that works the way people often sort playing cards in their hands. + +It builds the **sorted portion one element at a time**, by repeatedly taking the next element from the unsorted portion and inserting it into its correct position among the already sorted elements. + +The basic idea: + +1. Start with the **second element** (the first element by itself is trivially sorted). +2. Compare it with elements to its **left**. +3. Shift larger elements one position to the right. +4. Insert the element into the correct spot. +5. Repeat until all elements are processed. + +**Example Run** + +We will sort the array: + +``` +[ 12 ][ 11 ][ 13 ][ 5 ][ 6 ] +``` + +**Pass 1: Insert 11** + +Compare 11 with 12 → shift 12 right → insert 11 before it. + +``` +Before: [ 12 ][ 11 ][ 13 ][ 5 ][ 6 ] +Action: Insert 11 before 12 +After: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ] +``` + +✔ Sorted portion: $[11, 12]$ + +**Pass 2: Insert 13** + +Compare 13 with 12 → already greater → stays in place. + +``` +Before: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ] +After: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ] +``` + +✔ Sorted portion: \[11, 12, 13] + +**Pass 3: Insert 5** + +Compare 5 with 13 → shift 13 +Compare 5 with 12 → shift 12 +Compare 5 with 11 → shift 11 +Insert 5 at start. -#### Conceptual Overview +``` +Before: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ] +Action: Move 13 → Move 12 → Move 11 → Insert 5 +After: [ 5 ][ 11 ][ 12 ][ 13 ][ 6 ] +``` -Imagine you have a series of numbers. The algorithm begins with the second element (assuming the first element on its own is already sorted) and inserts it into the correct position relative to the first. With each subsequent iteration, the algorithm takes the next unsorted element and scans through the sorted subarray, finding the appropriate position to insert the new element. +✔ Sorted portion: [5, 11, 12, 13] -#### Steps +**Pass 4: Insert 6** -1. Start at the second element (index 1) assuming the element at index 0 is sorted. -2. Compare the current element with the previous elements. -3. If the current element is smaller than the previous element, compare it with the elements before until you reach an element smaller or until you reach the start of the array. -4. Insert the current element into the correct position so that the elements before are all smaller. -5. Repeat steps 2-4 for each element in the array. +Compare 6 with 13 → shift 13 +Compare 6 with 12 → shift 12 +Compare 6 with 11 → shift 11 +Insert 6 after 5. 
``` -Start: -[ 12 ][ 11 ][ 13 ][ 5 ][ 6 ] +Before: [ 5 ][ 11 ][ 12 ][ 13 ][ 6 ] +Action: Move 13 → Move 12 → Move 11 → Insert 6 +After: [ 5 ][ 6 ][ 11 ][ 12 ][ 13 ] +``` -Pass 1: key = 11, insert into [12] -[ 11 ][ 12 ][ 13 ][ 5 ][ 6 ] +✔ Sorted! -Pass 2: key = 13, stays in place -[ 11 ][ 12 ][ 13 ][ 5 ][ 6 ] +**Final Result** -Pass 3: key = 5, insert into [11,12,13] -[ 5 ][ 11 ][ 12 ][ 13 ][ 6 ] +``` +[ 5 ][ 6 ][ 11 ][ 12 ][ 13 ] +``` -Pass 4: key = 6, insert into [5,11,12,13] -[ 5 ][ 6 ][ 11 ][ 12 ][ 13 ] +**Visual Growth of Sorted Region** -Result: -[ 5 ][ 6 ][ 11 ][ 12 ][ 13 ] ``` +Start: [ 12 | 11 13 5 6 ] +Pass 1: [ 11 12 | 13 5 6 ] +Pass 2: [ 11 12 13 | 5 6 ] +Pass 3: [ 5 11 12 13 | 6 ] +Pass 4: [ 5 6 11 12 13 ] +``` + +✔ The **bar ( | )** shows the boundary between **sorted** and **unsorted**. -#### Stability +**Optimizations** -Insertion sort is stable. When two elements have equal keys, their relative order remains unchanged post-sorting. This stability is preserved since the algorithm only swaps elements if they are out of order, ensuring that equal elements never overtake each other. +* Efficient for **small arrays**. +* Useful as a **helper inside more complex sorts** (e.g., Quick Sort or Merge Sort) for small subarrays. +* Can be optimized with **binary search** to find insertion positions faster (but shifting still takes linear time). -#### Time Complexity +**Stability** -- In the **worst-case**, the time complexity is $O(n^2)$, which happens when the array is in reverse order, requiring every element to be compared with every other element. -- The **average-case** time complexity is $O(n^2)$, as elements generally need to be compared with others, leading to quadratic performance. -- In the **best-case**, the time complexity is $O(n)$, occurring when the array is already sorted, allowing the algorithm to simply pass through the array once without making any swaps. +Insertion sort is **stable** (equal elements keep their relative order). -#### Space Complexity +**Complexity** -$(O(1))$ - This in-place sorting algorithm doesn't need any additional storage beyond the input array. +| Case | Time Complexity | Notes | +|------------------|-----------------|---------------------------------------------------| +| **Worst Case** | $O(n^2)$ | Reverse-sorted input | +| **Average Case** | $O(n^2)$ | | +| **Best Case** | $O(n)$ | Already sorted input — only comparisons, no shifts | +| **Space** | $O(1)$ | In-place | -#### Implementation +**Implementation** * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/insertion_sort/src/insertion_sort.cpp) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/insertion_sort/src/insertion_sort.py) ### Quick Sort -Quick Sort, often simply referred to as "quicksort", is a divide-and-conquer algorithm that's renowned for its efficiency and is widely used in practice. Its name stems from its ability to sort large datasets quickly. The core idea behind quicksort is selecting a 'pivot' element and partitioning the other elements into two sub-arrays according to whether they are less than or greater than the pivot. The process is then recursively applied to the sub-arrays. +Quick Sort is a **divide-and-conquer** algorithm. Unlike bubble sort or selection sort, which work by repeatedly scanning the whole array, Quick Sort works by **partitioning** the array into smaller sections around a "pivot" element and then sorting those sections independently. 
+ +It is one of the **fastest sorting algorithms in practice**, widely used in libraries and systems. + +The basic idea: + +1. Choose a **pivot element** (commonly the last, first, middle, or random element). +2. Rearrange (partition) the array so that: +* All elements **smaller than the pivot** come before it. +* All elements **larger than the pivot** come after it. +3. The pivot is now in its **final sorted position**. +4. Recursively apply Quick Sort to the **left subarray** and **right subarray**. + +**Example Run** + +We will sort the array: + +``` +[ 10 ][ 80 ][ 30 ][ 90 ][ 40 ][ 50 ][ 70 ] +``` + +**Step 1: Choose Pivot (last element = 70)** + +Partition around 70. + +``` +Initial: [ 10 ][ 80 ][ 30 ][ 90 ][ 40 ][ 50 ][ 70 ] + +→ Elements < 70: [ 10, 30, 40, 50 ] +→ Pivot (70) goes here ↓ +Sorted split: [ 10 ][ 30 ][ 40 ][ 50 ][ 70 ][ 90 ][ 80 ] +``` + +*(ordering of right side may vary during partition; only pivot’s position is guaranteed)* + +✔ Pivot (70) is in correct place. + +**Step 2: Left Subarray [10, 30, 40, 50]** + +Choose pivot = 50. + +``` +[ 10 ][ 30 ][ 40 ][ 50 ] → pivot = 50 + +→ Elements < 50: [10, 30, 40] +→ Pivot at correct place + +Result: [ 10 ][ 30 ][ 40 ][ 50 ] +``` + +✔ Pivot (50) fixed. + +**Step 3: Left Subarray of Left [10, 30, 40]** + +Choose pivot = 40. + +``` +[ 10 ][ 30 ][ 40 ] → pivot = 40 -#### Conceptual Overview +→ Elements < 40: [10, 30] +→ Pivot at correct place -1. The first step is to **choose a pivot** from the array, which is the element used to partition the array. The pivot selection method can vary, such as picking the first element, the middle element, a random element, or using a more advanced approach like the median-of-three. -2. During **partitioning**, the elements in the array are rearranged so that all elements less than or equal to the pivot are placed before it, and all elements greater than the pivot are placed after it. At this point, the pivot reaches its final sorted position. -3. Finally, **recursion** is applied by repeating the same process for the two sub-arrays: one containing elements less than the pivot and the other containing elements greater than the pivot. +Result: [ 10 ][ 30 ][ 40 ] +``` -#### Steps +✔ Pivot (40) fixed. -1. Choose a 'pivot' from the array. -2. Partition the array around the pivot, ensuring all elements on the left are less than the pivot and all elements on the right are greater than it. -3. Recursively apply steps 1 and 2 to the left and right partitions. -4. Repeat until base case: the partition has only one or zero elements. +**Step 4: [10, 30]** +Choose pivot = 30. ``` -Start: -[ 10 ][ 7 ][ 8 ][ 9 ][ 1 ][ 5 ] +[ 10 ][ 30 ] → pivot = 30 -Partition around pivot = 5: - • Compare and swap ↓ - [ 1 ][ 7 ][ 8 ][ 9 ][ 10 ][ 5 ] - • Place pivot in correct spot ↓ - [ 1 ][ 5 ][ 8 ][ 9 ][ 10 ][ 7 ] +→ Elements < 30: [10] -Recurse on left [1] → already sorted -Recurse on right [8, 9, 10, 7]: +Result: [ 10 ][ 30 ] +``` - Partition around pivot = 7: - [ 7 ][ 9 ][ 10 ][ 8 ] - Recurse left [] → [] - Recurse right [9, 10, 8]: +✔ Sorted. 
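
The whole procedure sketched in these steps fits in a few lines. Below is a hedged illustration using a Lomuto-style partition around the last element (the same pivot choice as in this walkthrough); the repository links under *Implementation* remain the reference versions. The final result of the example follows right after.

```python
def quick_sort(a, lo=0, hi=None):
    """In-place quick sort; pivot = last element of the current range."""
    if hi is None:
        hi = len(a) - 1
    if lo < hi:
        p = partition(a, lo, hi)   # pivot lands at its final index p
        quick_sort(a, lo, p - 1)
        quick_sort(a, p + 1, hi)
    return a

def partition(a, lo, hi):
    pivot = a[hi]
    i = lo                          # next slot for an element <= pivot
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]       # place the pivot in its final position
    return i

assert quick_sort([10, 80, 30, 90, 40, 50, 70]) == [10, 30, 40, 50, 70, 80, 90]
```

On the example array, the first call to `partition` produces exactly the split shown in Step 1 — `[10, 30, 40, 50] 70 [90, 80]` — with 70 fixed at its final position.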
- Partition around pivot = 8: - [ 8 ][ 10 ][ 9 ] - Recurse left [] → [] - Recurse right [10, 9]: - Partition pivot = 9: - [ 9 ][ 10 ] - → both sides sorted +**Final Result** - → merge [8] + [9, 10] → [ 8 ][ 9 ][ 10 ] +``` +[ 10 ][ 30 ][ 40 ][ 50 ][ 70 ][ 80 ][ 90 ] +``` + +**Visual Partition Illustration** + +Here’s how the array gets partitioned step by step: + +``` +Pass 1: [ 10 80 30 90 40 50 | 70 ] + ↓ pivot = 70 + [ 10 30 40 50 | 70 | 90 80 ] - → merge [7] + [8, 9, 10] → [ 7 ][ 8 ][ 9 ][ 10 ] +Pass 2: [ 10 30 40 | 50 ] [70] [90 80] + ↓ pivot = 50 + [ 10 30 40 | 50 ] [70] [90 80] -→ merge [1, 5] + [7, 8, 9, 10] → [ 1 ][ 5 ][ 7 ][ 8 ][ 9 ][ 10 ] +Pass 3: [ 10 30 | 40 ] [50] [70] [90 80] + ↓ pivot = 40 + [ 10 30 | 40 ] [50] [70] [90 80] -Result: -[ 1 ][ 5 ][ 7 ][ 8 ][ 9 ][ 10 ] +Pass 4: [ 10 | 30 ] [40] [50] [70] [90 80] + ↓ pivot = 30 + [ 10 | 30 ] [40] [50] [70] [90 80] ``` -#### Stability +✔ Each pivot splits the problem smaller and smaller until fully sorted. -Quick sort is inherently unstable due to the long-distance exchanges of values. However, with specific modifications, it can be made stable, although this is not commonly done. +**Optimizations** -#### Time Complexity +* **Pivot Choice:** Choosing a good pivot (e.g., median or random) improves performance. +* **Small Subarrays:** For very small partitions, switch to Insertion Sort for efficiency. +* **Tail Recursion:** Can optimize recursion depth. -- In the **worst-case**, the time complexity is $O(n^2)$, which can occur when the pivot is the smallest or largest element, resulting in highly unbalanced partitions. However, with effective pivot selection strategies, this scenario is rare in practice. -- The **average-case** time complexity is $O(n \log n)$, which is expected when using a good pivot selection method that balances the partitions reasonably well. -- In the **best-case**, the time complexity is also $O(n \log n)$, occurring when each pivot divides the array into two roughly equal-sized parts, leading to optimal partitioning. +**Stability** -#### Space Complexity +* Quick Sort is **not stable** by default (equal elements may be reordered). +* Stable versions exist, but require modifications. -$(O(\log n))$ - Though quicksort sorts in place, it requires stack space for recursion, which in the best case is logarithmic. +**Complexity** -#### Implementation +| Case | Time Complexity | Notes | +|------------------|-----------------|----------------------------------------------------------------------| +| **Worst Case** | $O(n^2)$ | Poor pivot choices (e.g., always smallest/largest in sorted array) | +| **Average Case** | $O(n \log n)$ | Expected performance, very fast in practice | +| **Best Case** | $O(n \log n)$ | Balanced partitions | +| **Space** | $O(\log n)$ | Due to recursion stack | + +**Implementation** * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/quick_sort/src/quick_sort.cpp) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/quick_sort/src/quick_sort.py) ### Heap sort -Heap Sort is a comparison-based sorting technique performed on a binary heap data structure. It leverages the properties of a heap to efficiently sort a dataset. The essential idea is to build a heap from the input data, then continuously extract the maximum element from the heap and reconstruct the heap until it's empty. The result is a sorted list. 
+Heap Sort is a **comparison-based sorting algorithm** that uses a special data structure called a **binary heap**. +It is efficient, with guaranteed \$O(n \log n)\$ performance, and sorts **in-place** (no extra array needed). + +The basic idea: -#### Conceptual Overview +1. **Build a max heap** from the input array. +* In a max heap, every parent is greater than its children. +* This ensures the **largest element is at the root** (first index). +2. Swap the **root (largest element)** with the **last element** of the heap. +3. Reduce the heap size by 1 (ignore the last element, which is now in place). +4. **Heapify** (restore heap property). +5. Repeat until all elements are sorted. -1. The first step is to **build a max heap**, which involves transforming the list into a max heap (a complete binary tree where each node is greater than or equal to its children). This is typically achieved using a bottom-up approach to ensure the heap property is satisfied. *(Building the heap with Floyd’s bottom-up procedure costs Θ(*n*) time—lower than Θ(*n log n*)—so it never dominates the overall running time.)* +**Example Run** -2. During **sorting**, the maximum element (the root of the heap) is swapped with the last element of the unsorted portion of the array, placing the largest element in its final position. **After each swap, the newly “fixed” maximum stays at the end of the *same* array; the active heap is simply the prefix that remains unsorted.** The heap size is then reduced by one, and the unsorted portion is restructured into a max heap. This process continues until the heap size is reduced to one, completing the sort. +We will sort the array: + +``` +[ 4 ][ 10 ][ 3 ][ 5 ][ 1 ] +``` -#### Steps +**Step 1: Build Max Heap** -1. Construct a max heap from the given data. This will place the largest element at the root. -2. Swap the root (maximum value) with the last element of the heap. This element is now considered sorted. -3. Decrease the heap size by one (to exclude the sorted elements). -4. "Heapify" the root of the tree, i.e., ensure the heap property is maintained. -5. Repeat steps 2-4 until the size of the heap is one. +Binary tree view: ``` -Initial array (size n = 5) index: 0 1 2 3 4 - 4 [4,10,3,5,1] + 4 / \ 10 3 / \ 5 1 +``` + +Heapify → Largest at top: -↓ BUILD MAX-HEAP (Θ(n)) —> heapSize = 5 - 10 [10,5,3,4,1] - / \ - 5 3 +``` + 10 + / \ + 5 3 / \ 4 1 + +Array: [ 10 ][ 5 ][ 3 ][ 4 ][ 1 ] ``` -**Pass 1 extract-max** +**Step 2: Swap Root with Last** +Swap 10 ↔ 1 → largest (10) moves to correct final place. + +``` +[ 1 ][ 5 ][ 3 ][ 4 ][ 10 ] ``` -swap 10 ↔ 1 [1,5,3,4 | 10] heapSize = 4 - ↑ live heap ↑ ↑fixed↑ -heapify (1↔5, 1↔4) → [5,4,3,1 | 10] - 5 - / \ - 4 3 +Heapify the reduced heap \[1,5,3,4]: + +``` + 5 + / \ + 4 3 / 1 + +Array: [ 5 ][ 4 ][ 3 ][ 1 ][ 10 ] ``` -**Pass 2 extract-max** +**Step 3: Swap Root with Last** +Swap 5 ↔ 1. + +``` +[ 1 ][ 4 ][ 3 ][ 5 ][ 10 ] ``` -swap 5 ↔ 1 [1,4,3 | 5,10] heapSize = 3 -heapify (1↔4) → [4,1,3 | 5,10] - 4 - / \ - 1 3 +Heapify reduced heap \[1,4,3]: + +``` + 4 + / \ + 1 3 + +Array: [ 4 ][ 1 ][ 3 ][ 5 ][ 10 ] +``` + +**Step 4: Swap Root with Last** + +Swap 4 ↔ 3. + +``` +[ 3 ][ 1 ][ 4 ][ 5 ][ 10 ] ``` -**Pass 3 extract-max** +Heapify reduced heap \[3,1]: ``` -swap 4 ↔ 3 [3,1 | 4,5,10] heapSize = 2 -(no heapify needed – root already ≥ child) + 3 + / + 1 - 3 - / - 1 +Array: [ 3 ][ 1 ][ 4 ][ 5 ][ 10 ] ``` -**Pass 4 extract-max** +**Step 5: Swap Root with Last** + +Swap 3 ↔ 1. 
``` -swap 3 ↔ 1 [1 | 3,4,5,10] heapSize = 1 -(heap of size 1 is trivially a heap) +[ 1 ][ 3 ][ 4 ][ 5 ][ 10 ] ``` -**Pass 5 extract-max** +✔ Sorted array achieved. + +**Final Result** ``` -Done – heapSize = 0 -Sorted array: [1,3,4,5,10] +[ 1 ][ 3 ][ 4 ][ 5 ][ 10 ] ``` -#### Stability +**Visual Progress** + +``` +Initial: [ 4 10 3 5 1 ] +Heapify: [ 10 5 3 4 1 ] +Step 1: [ 5 4 3 1 | 10 ] +Step 2: [ 4 1 3 | 5 10 ] +Step 3: [ 3 1 | 4 5 10 ] +Step 4: [ 1 | 3 4 5 10 ] +Sorted: [ 1 3 4 5 10 ] +``` -Heap sort is inherently unstable. Similar to quicksort, the relative order of equal items is not preserved because of the long-distance exchanges. +✔ Each step places the largest element into its correct final position. -#### Time Complexity +**Optimizations** -- In the **worst-case**, the time complexity is $O(n \log n)$, regardless of the arrangement of the input data. -- The **average-case** time complexity is also $O(n \log n)$, as the algorithm's structure ensures consistent performance. -- In the **best-case**, the time complexity remains $O(n \log n)$, since building and deconstructing the heap is still necessary, even if the input is already partially sorted. +* Building the heap can be done in **O(n)** time using bottom-up heapify. +* After building, each extract-max + heapify takes **O(log n)**. -#### Space Complexity +**Stability** -$O(1)$ – The sorting is done in-place, requiring only a constant amount of auxiliary space. **This assumes an *iterative* `siftDown/heapify`; a recursive version would add an \$O(\log n)\$ call stack.** +Heap sort is **not stable**. Equal elements may not preserve their original order because of swaps. -#### Implementation +**Complexity** + +| Case | Time Complexity | Notes | +|------------------|-----------------|--------------------------------| +| **Worst Case** | $O(n \log n)$ | | +| **Average Case** | $O(n \log n)$ | | +| **Best Case** | $O(n \log n)$ | No early exit possible | +| **Space** | $O(1)$ | In-place | + +**Implementation** * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/heap_sort/src/heap_sort.cpp) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/heap_sort/src/heap_sort.py) + +### Radix Sort + +Radix Sort is a **non-comparison-based sorting algorithm**. +Instead of comparing elements directly, it processes numbers digit by digit, from either the **least significant digit (LSD)** or the **most significant digit (MSD)**, using a stable intermediate sorting algorithm (commonly **Counting Sort**). + +Because it avoids comparisons, Radix Sort can achieve **linear time complexity** in many cases. + +The basic idea: + +1. Pick a **digit position** (units, tens, hundreds, etc.). +2. Sort the array by that digit using a **stable sorting algorithm**. +3. Move to the next digit. +4. Repeat until all digits are processed. 
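+
+Before the worked example, here is a minimal LSD sketch in Python. It uses per-digit bucket queues as a simple stand-in for the stable counting pass described above and assumes non-negative integers; the function name is illustrative:
+
+```
+def radix_sort_lsd(arr, base=10):
+    # Returns a new list, sorted least-significant digit first.
+    if not arr:
+        return []
+    exp = 1
+    while max(arr) // exp > 0:
+        buckets = [[] for _ in range(base)]
+        for value in arr:
+            buckets[(value // exp) % base].append(value)   # stable: keeps arrival order
+        arr = [value for bucket in buckets for value in bucket]
+        exp *= base
+    return arr
+
+print(radix_sort_lsd([170, 45, 75, 90, 802, 24, 2, 66]))
+# [2, 24, 45, 66, 75, 90, 170, 802]
+```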
+ +**Example Run (LSD Radix Sort)** + +We will sort the array: + +``` +[ 170 ][ 45 ][ 75 ][ 90 ][ 802 ][ 24 ][ 2 ][ 66 ] +``` + +**Step 1: Sort by 1s place (units digit)** + +``` +Original: [170, 45, 75, 90, 802, 24, 2, 66] + +By 1s digit: +[170][90] (0) +[802][2] (2) +[24] (4) +[45][75] (5) +[66] (6) + +Result: [170][90][802][2][24][45][75][66] +``` + +**Step 2: Sort by 10s place** + +``` +[170][90][802][2][24][45][75][66] + +By 10s digit: +[802][2] (0) +[24] (2) +[45] (4) +[66] (6) +[170][75] (7) +[90] (9) + +Result: [802][2][24][45][66][170][75][90] +``` + +**Step 3: Sort by 100s place** + +``` +[802][2][24][45][66][170][75][90] + +By 100s digit: +[2][24][45][66][75][90] (0) +[170] (1) +[802] (8) + +Result: [2][24][45][66][75][90][170][802] +``` + +**Final Result** + +``` +[ 2 ][ 24 ][ 45 ][ 66 ][ 75 ][ 90 ][ 170 ][ 802 ] +``` + +**Visual Process** + +``` +Step 1 (1s): [170 90 802 2 24 45 75 66] +Step 2 (10s): [802 2 24 45 66 170 75 90] +Step 3 (100s): [2 24 45 66 75 90 170 802] +``` + +✔ Each pass groups by digit → final sorted order. + +**LSD vs MSD** + +* **LSD (Least Significant Digit first):** Process digits from right (units) to left (hundreds). Most common, simpler. +* **MSD (Most Significant Digit first):** Process from left to right, useful for variable-length data like strings. + +**Stability** + +* Radix Sort **is stable**, because it relies on a stable intermediate sort (like Counting Sort). +* Equal elements remain in the same order across passes. + +**Complexity** + +* **Time Complexity:** \$O(n \cdot k)\$ + + * \$n\$ = number of elements + * \$k\$ = number of digits (or max digit length) + +* **Space Complexity:** \$O(n + k)\$ (depends on the stable sorting method used, e.g., Counting Sort). + +* For integers with fixed number of digits, Radix Sort can be considered **linear time**. + +**Implementation** + +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/heap_sort/src/radix_sort.cpp) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/heap_sort/src/radix_sort.py) + +### Counting Sort + +Counting Sort is a **non-comparison-based sorting algorithm** that works by **counting occurrences** of each distinct element and then calculating their positions in the output array. + +It is especially efficient when: + +* The input values are integers. +* The **range of values (k)** is not significantly larger than the number of elements (n). + +The basic idea: + +1. Find the **range** of the input (min to max). +2. Create a **count array** to store the frequency of each number. +3. Modify the count array to store **prefix sums** (cumulative counts). +* This gives the final position of each element. +4. Place elements into the output array in order, using the count array. + +**Example Run** + +We will sort the array: + +``` +[ 4 ][ 2 ][ 2 ][ 8 ][ 3 ][ 3 ][ 1 ] +``` + +**Step 1: Count Frequencies** + +``` +Elements: 1 2 3 4 5 6 7 8 +Counts: 1 2 2 1 0 0 0 1 +``` + +**Step 2: Prefix Sums** + +``` +Elements: 1 2 3 4 5 6 7 8 +Counts: 1 3 5 6 6 6 6 7 +``` + +✔ Now each number tells us the **last index position** where that value should go. + +**Step 3: Place Elements** + +Process input from right → left (for stability). 
+ +``` +Input: [4,2,2,8,3,3,1] + +Place 1 → index 0 +Place 3 → index 4 +Place 3 → index 3 +Place 8 → index 6 +Place 2 → index 2 +Place 2 → index 1 +Place 4 → index 5 +``` + +**Final Result** + +``` +[ 1 ][ 2 ][ 2 ][ 3 ][ 3 ][ 4 ][ 8 ] +``` + +**Visual Process** + +``` +Step 1 Count: [0,1,2,2,1,0,0,0,1] +Step 2 Prefix: [0,1,3,5,6,6,6,6,7] +Step 3 Output: [1,2,2,3,3,4,8] +``` + +✔ Linear-time sorting by counting positions. + +**Stability** + +Counting Sort is **stable** if we place elements **from right to left** into the output array. + +**Complexity** + +| Case | Time Complexity | Notes | +|------------------|-----------------|------------------------------------------| +| **Overall** | $O(n + k)$ | $n$ = number of elements, $k$ = value range | +| **Space** | $O(n + k)$ | Extra array for counts + output | + +**Implementation** + +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/heap_sort/src/counting_sort.cpp) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/heap_sort/src/counting_sort.py) + +### Comparison Table + +Below is a consolidated **side-by-side comparison** of all the sorts we’ve covered so far: + +| Algorithm | Best Case | Average | Worst Case | Space | Stable? | Notes | +|----------------|------------|-------------|-------------|-------------|---------|------------------------| +| **Bubble Sort** | O(n) | O(n²) | O(n²) | O(1) | Yes | Simple, slow | +| **Selection Sort** | O(n²) | O(n²) | O(n²) | O(1) | No | Few swaps | +| **Insertion Sort** | O(n) | O(n²) | O(n²) | O(1) | Yes | Good for small inputs | +| **Quick Sort** | O(n log n) | O(n log n) | O(n²) | O(log n) | No | Very fast in practice | +| **Heap Sort** | O(n log n) | O(n log n) | O(n log n) | O(1) | No | Guaranteed performance | +| **Counting Sort** | O(n + k) | O(n + k) | O(n + k) | O(n + k) | Yes | Integers only | +| **Radix Sort** | O(nk) | O(nk) | O(nk) | O(n + k) | Yes | Uses Counting Sort | From 7db352440ec680f3f419b7724e94a71e446acce6 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 19:57:25 +0200 Subject: [PATCH 32/48] Update graphs.md --- notes/graphs.md | 1536 ++++++++++++++++++++++++++++++++--------------- 1 file changed, 1065 insertions(+), 471 deletions(-) diff --git a/notes/graphs.md b/notes/graphs.md index ce094e4..912de5d 100644 --- a/notes/graphs.md +++ b/notes/graphs.md @@ -1,6 +1,3 @@ -TODO: -- topological sort - ## Graphs In many areas of life, we come across systems where elements are deeply interconnected—whether through physical routes, digital networks, or abstract relationships. Graphs offer a flexible way to represent and make sense of these connections. @@ -197,7 +194,18 @@ Attempting to avoid one crossing in $K_5$ inevitably forces another crossing els ### Traversals -- When we **traverse** a graph, we visit its vertices in an organized way to make sure we don’t miss any vertices or edges. +What does it mean to traverse a graph? + +Graph traversal **can** be done in a way that visits *all* vertices and edges (like a full DFS/BFS), but it doesn’t *have to*. + +* If you start DFS or BFS from a single source vertex, you’ll only reach the **connected component** containing that vertex. Any vertices in other components won’t be visited. +* Some algorithms (like shortest path searches, A\*, or even partial DFS) intentionally stop early, meaning not all vertices or edges are visited. 
+* In weighted or directed graphs, you may also skip certain edges depending on the traversal rules. + +So the precise way to answer that question is: + +> **Graph traversal is a systematic way of exploring vertices and edges, often ensuring complete coverage of the reachable part of the graph — but whether all vertices/edges are visited depends on the algorithm and stopping conditions.** + - Graphs, unlike **trees**, don’t have a single starting point like a root. This means we either need to be given a starting vertex or pick one randomly. - Let’s say we start from a specific vertex, like **$i$**. From there, the traversal explores all connected vertices according to the rules of the chosen method. - In both **breadth-first search (BFS)** and **depth-first search (DFS)**, the order of visiting vertices depends on how the algorithm is implemented. @@ -206,525 +214,870 @@ Attempting to avoid one crossing in $K_5$ inevitably forces another crossing els #### Breadth-First Search (BFS) -Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores the vertices of a graph in layers, starting from a specified source vertex. It progresses by visiting all immediate neighbors of the starting point, then the neighbors of those neighbors, and so on. +Breadth-First Search (BFS) is a fundamental graph traversal algorithm that explores a graph **level by level** from a specified start vertex. It first visits all vertices at distance 1 from the start, then all vertices at distance 2, and so on. This makes BFS the natural choice whenever “closest in number of edges” matters. To efficiently keep track of the traversal, BFS employs two primary data structures: -* A queue, typically named `unexplored` or `queue`, to store nodes that are pending exploration. -* A hash table or a set called `visited` to ensure that we do not revisit nodes. +* A **queue** (often named `queue` or `unexplored`) that stores vertices pending exploration in **first-in, first-out (FIFO)** order. +* A **`visited` set** (or boolean array) that records which vertices have already been discovered to prevent revisiting. -##### Algorithm Steps +*Useful additions in practice:* -1. Begin from a starting vertex, $i$. -2. Mark the vertex $i$ as visited. -3. Explore each of its neighbors. If the neighbor hasn't been visited yet, mark it as visited and enqueue it in `unexplored`. -4. Dequeue the front vertex from `unexplored` and repeat step 3. -5. Continue this process until the `unexplored` queue becomes empty. +* An optional **`parent` map** to reconstruct shortest paths (store `parent[child] = current` when you first discover `child`). +* An optional **`dist` map** to record the edge-distance from the start (`dist[start] = 0`, and when discovering `v` from `u`, set `dist[v] = dist[u] + 1`). -To ensure the algorithm doesn't fall into an infinite loop due to cycles in the graph, it could be useful to mark nodes as visited as soon as they are enqueued. This prevents them from being added to the queue multiple times. +**Algorithm Steps** -##### Example +1. Begin from a starting vertex, \$i\$. +2. Initialize `visited = {i}`, set `parent[i] = None`, optionally `dist[i] = 0`, and **enqueue** \$i\$ into `queue`. +3. While `queue` is not empty: +1. **Dequeue** the front vertex `u`. +2. For **each neighbor** `v` of `u`: +* If `v` is **not** in `visited`, add `v` to `visited`, set `parent[v] = u` (and `dist[v] = dist[u] + 1` if tracking distances), and **enqueue** `v`. +4. Continue until the queue becomes empty. 
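+
+As a runnable companion to the reference pseudocode shown just below, here is a Python version that follows the same steps but uses `collections.deque`, so dequeuing is O(1) rather than `list.pop(0)`; the graph literal is the small example used later in this section and is only illustrative:
+
+```
+from collections import deque
+
+def bfs(graph, start):
+    # Level-by-level traversal; returns visit order, parent links and edge distances.
+    visited = {start}
+    parent = {start: None}
+    dist = {start: 0}
+    order = []
+    queue = deque([start])
+    while queue:
+        u = queue.popleft()                  # dequeue the front vertex
+        order.append(u)
+        for v in graph[u]:
+            if v not in visited:             # mark on enqueue, not on dequeue
+                visited.add(v)
+                parent[v] = u
+                dist[v] = dist[u] + 1
+                queue.append(v)
+    return order, parent, dist
+
+graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
+         "D": ["B"], "E": ["C"]}
+order, parent, dist = bfs(graph, "A")
+print(order)   # ['A', 'B', 'C', 'D', 'E']
+print(dist)    # {'A': 0, 'B': 1, 'C': 1, 'D': 2, 'E': 2}
+```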
-``` -Queue: Empty Visited: A, B, C, D, E +Marking nodes as **visited at the moment they are enqueued** (not when dequeued) is crucial: it prevents the same node from being enqueued multiple times in graphs with cycles or multiple incoming edges. + +*Reference pseudocode (adjacency-list graph):* - A - / \ - B C - | | - D E ``` +BFS(G, i): + visited = {i} + parent = {i: None} + dist = {i: 0} # optional + queue = [i] -In this example, BFS started at the top of the graph and worked its way down, visiting nodes in order of their distance from the starting node. The ASCII representation provides a step-by-step visualization of BFS using a queue and a list of visited nodes. + order = [] # optional: visitation order -##### Applications + while queue: + u = queue.pop(0) # dequeue + order.append(u) -BFS is not only used for simple graph traversal. Its applications span multiple domains: + for v in G[u]: # iterate neighbors + if v not in visited: + visited.add(v) + parent[v] = u + dist[v] = dist[u] + 1 # if tracking + queue.append(v) -1. BFS can determine the **shortest path** in an unweighted graph from a source to all other nodes. -2. To find all **connected components** in an undirected graph, you can run BFS on every unvisited node. -3. BFS mirrors the propagation in broadcasting networks, where a message is forwarded to neighboring nodes, and they subsequently forward it to their neighbors. -4. If during BFS traversal, an already visited node is encountered (and it's not the parent of the current node in traversal), then there exists a cycle in the graph. + return order, parent, dist +``` -##### Implementation +*Sanity notes:* -* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/bfs) -* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/bfs) +* **Time:** $O(V + E)$ for a graph with $V$ vertices and $E$ edges (each vertex enqueued once; each edge considered once). +* **Space:** $O(V)$ for the queue + visited (+ parent/dist if used). +* BFS order can differ depending on **neighbor iteration order**. -#### Depth-First Search (DFS) +**Example** -Depth-First Search (DFS) is another fundamental graph traversal algorithm, but unlike BFS which traverses level by level, DFS dives deep into the graph, exploring as far as possible along each branch before backtracking. +Graph (undirected) with start at **A**: -To implement DFS, we use two main data structures: +``` + ┌─────┐ + │ A │ + └──┬──┘ + ┌───┘ └───┐ + ┌─▼─┐ ┌─▼─┐ + │ B │ │ C │ + └─┬─┘ └─┬─┘ + ┌─▼─┐ ┌─▼─┐ + │ D │ │ E │ + └───┘ └───┘ -* A stack, either implicitly using the call stack through recursion or explicitly using a data structure. This stack is responsible for tracking vertices that are to be explored. -* A hash table or set called `visited` to ensure nodes aren't revisited. +Edges: A–B, A–C, B–D, C–E +``` -##### Algorithm Steps +*Queue/Visited evolution (front → back):* -1. Begin from a starting vertex, $i$. -2. Mark vertex $i$ as visited. -3. Visit an unvisited neighbor of $i$, mark it as visited, and move to that vertex. -4. Repeat the above step until the current vertex has no unvisited neighbors. -5. Backtrack to the previous vertex and explore other unvisited neighbors. -6. Continue this process until you've visited all vertices connected to the initial start vertex. - -Marking nodes as visited as soon as you encounter them is important to avoid infinite loops, particularly in graphs with cycles. 
+``` +Step | Dequeued | Action | Queue | Visited +-----+----------+-------------------------------------------+------------------+---------------- +0 | — | enqueue A | [A] | {A} +1 | A | discover B, C; enqueue both | [B, C] | {A, B, C} +2 | B | discover D; enqueue | [C, D] | {A, B, C, D} +3 | C | discover E; enqueue | [D, E] | {A, B, C, D, E} +4 | D | no new neighbors | [E] | {A, B, C, D, E} +5 | E | no new neighbors | [] | {A, B, C, D, E} +``` -##### Example +*BFS tree and distances from A:* ``` -Stack: Empty Visited: A, B, D, C, E +dist[A]=0 +A → B (1), A → C (1) +B → D (2), C → E (2) - A - / \ - B C - | | - D E +Parents: parent[B]=A, parent[C]=A, parent[D]=B, parent[E]=C +Shortest path A→E: backtrack E→C→A ⇒ A - C - E ``` -In this example, DFS explored as deep as possible along the left side (branch with B and D) of the graph before backtracking and moving to the right side (branch with C and E). The ASCII representation provides a step-by-step visualization of DFS using a stack and a list of visited nodes. +**Applications** -##### Applications +1. **Shortest paths in unweighted graphs.** + BFS computes the minimum number of edges from the source to every reachable node. Use the `parent` map to reconstruct actual paths. -DFS, with its inherent nature of diving deep, has several intriguing applications: +2. **Connected components (undirected graphs).** + Repeatedly run BFS from every unvisited vertex; each run discovers exactly one component. -1. Topological Sorting is used in scheduling tasks, where one task should be completed before another starts. -2. To find all strongly connected components in a directed graph. -3. DFS can be employed to find a path between two nodes, though it might not guarantee the shortest path. -4. If during DFS traversal, an already visited node is encountered (and it's not the direct parent of the current node in traversal), then there's a cycle in the graph. +3. **Broadcast/propagation modeling.** + BFS mirrors “wavefront” spread (e.g., message fan-out, infection spread, multi-hop neighborhood queries). -##### Implementation +4. **Cycle detection (undirected graphs).** + During BFS, if you encounter a neighbor that is already **visited** and is **not** the parent of the current vertex, a cycle exists. + *Note:* For **directed graphs**, detecting cycles typically uses other techniques (e.g., DFS with recursion stack or Kahn’s algorithm on indegrees). -* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/dfs) -* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/dfs) +5. **Bipartite testing.** + While BFS’ing, assign alternating “colors” by level; if you ever see an edge connecting the **same** color, the graph isn’t bipartite. -### Shortest paths +6. **Multi-source searches.** + Initialize the queue with **several** starting nodes at once (all with `dist=0`). This solves “nearest facility” style problems efficiently. -A common task when dealing with weighted graphs is to find the shortest route between two vertices, such as from vertex $A$ to vertex $B$. Note that there might not be a unique shortest path, since several paths could have the same length. +7. **Topological sorting via Kahn’s algorithm (DAGs).** + A BFS-like process over vertices of indegree 0 (using a queue) produces a valid topological order for directed acyclic graphs. 
-#### Dijkstra's Algorithm +**Implementation** -- **Dijkstra's algorithm** is a method to find the shortest paths from a starting vertex to all other vertices in a weighted graph. -- A **weighted graph** is one where each edge has a numerical value (cost, distance, or time). -- The algorithm starts at a **starting vertex**, often labeled **A**, and computes the shortest path to every other vertex. -- It keeps a **tentative distance** for each vertex, representing the current known shortest distance from the start. -- It repeatedly **selects the vertex** with the smallest tentative distance that hasn't been finalized (or "finished") yet. -- Once a vertex is selected, the algorithm **relaxes all its edges**: it checks if going through this vertex offers a shorter path to its neighbors. -- This continues until all vertices are processed, yielding the shortest paths from the starting vertex to every other vertex. -- **Important**: Dijkstra’s algorithm requires **non-negative edge weights**, or else results can be incorrect. +*Implementation tip:* For dense graphs or when memory locality matters, an adjacency **matrix** can be used, but the usual adjacency **list** representation is more space- and time-efficient for sparse graphs. -##### Algorithm Steps +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/bfs) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/bfs) + +#### Depth-First Search (DFS) + +Depth-First Search (DFS) is a fundamental graph traversal algorithm that explores **as far as possible** along each branch before backtracking. Starting from a source vertex, it dives down one neighbor, then that neighbor’s neighbor, and so on—only backing up when it runs out of new vertices to visit. + +To track the traversal efficiently, DFS typically uses: + +* A **call stack** via **recursion** *or* an explicit **stack** data structure (LIFO). +* A **`visited` set** (or boolean array) to avoid revisiting vertices. -**Input** +*Useful additions in practice:* -- A weighted graph where each edge has a cost or distance -- A starting vertex `A` +* A **`parent` map** to reconstruct paths and build the DFS tree (`parent[child] = current` on discovery). +* Optional **timestamps** (`tin[u]` on entry, `tout[u]` on exit) to reason about edge types, topological order, and low-link computations. +* Optional **`order` lists**: pre-order (on entry) and post-order (on exit). -**Output** +**Algorithm Steps** -- An array `distances` where `distances[v]` is the shortest distance from `A` to vertex `v` +1. Begin at starting vertex $i$. +2. Mark $i$ as **visited**, optionally set `parent[i] = None`, record `tin[i]`. +3. For each neighbor $v$ of the current vertex $u$: -**Containers and Data Structures** + * If $v$ is **unvisited**, set `parent[v] = u` and **recurse** (or push onto a stack) into $v$. +4. After all neighbors of $u$ are explored, record `tout[u]` and **backtrack** (return or pop). +5. Repeat for any remaining unvisited vertices (to cover disconnected graphs). -- An array `distances`, initialized to `∞` for all vertices except `A`, which is set to `0` -- A hash table `finished` to mark vertices with confirmed shortest paths -- A priority queue to efficiently select the vertex with the smallest current distance +Mark vertices **when first discovered** (on entry/push) to prevent infinite loops in cyclic graphs. -**Steps** +*Pseudocode (recursive, adjacency list):* -I. 
Initialize `distances[A]` to `0` +``` +time = 0 + +DFS(G, i): + visited = set() + parent = {i: None} + tin = {} + tout = {} + pre = [] # optional: order on entry + post = [] # optional: order on exit -II. Initialize `distances[v]` to `∞` for every other vertex `v` + def explore(u): + nonlocal time + visited.add(u) + time += 1 + tin[u] = time + pre.append(u) # preorder -III. While not all vertices are marked as finished + for v in G[u]: + if v not in visited: + parent[v] = u + explore(v) -- Select vertex `u` with the smallest `distances[u]` among unfinished vertices -- Mark `finished[u]` as `true` -- For each neighbor `w` of `u`, if `distances[u] + weights[u][w]` is less than `distances[w]`, update `distances[w]` to `distances[u] + weights[u][w]` + time += 1 + tout[u] = time + post.append(u) # postorder -##### Step by Step Example + explore(i) + return pre, post, parent, tin, tout +``` -Consider a graph with vertices A, B, C, D, and E, and edges: +*Pseudocode (iterative, traversal order only):* ``` -A-B: 4 -A-C: 2 -C-B: 1 -B-D: 5 -C-D: 8 -C-E: 10 -D-E: 2 +DFS_iter(G, i): + visited = set() + parent = {i: None} + order = [] + stack = [i] + + while stack: + u = stack.pop() # take the top + if u in visited: + continue + visited.add(u) + order.append(u) + + # Push neighbors in reverse of desired visiting order + for v in reversed(G[u]): + if v not in visited: + parent[v] = u + stack.append(v) + + return order, parent ``` -The adjacency matrix looks like this (∞ means no direct edge): +*Sanity notes:* -| | A | B | C | D | E | -|---|----|----|----|----|----| -| **A** | 0 | 4 | 2 | ∞ | ∞ | -| **B** | 4 | 0 | 1 | 5 | ∞ | -| **C** | 2 | 1 | 0 | 8 | 10 | -| **D** | ∞ | 5 | 8 | 0 | 2 | -| **E** | ∞ | ∞ | 10 | 2 | 0 | +* **Time:** $O(V + E)$ — each vertex/edge handled a constant number of times. +* **Space:** $O(V)$ — visited + recursion/stack. Worst-case recursion depth can reach $V$; use the iterative form on very deep graphs. -**Starting from A**, here’s how Dijkstra’s algorithm proceeds: +**Example** -I. Initialize all distances with ∞ except A=0: +Same graph as the BFS section, start at **A**; assume neighbor order: `B` before `C`, and for `B` the neighbor `D`; for `C` the neighbor `E`. ``` -A: 0 -B: ∞ -C: ∞ -D: ∞ -E: ∞ + ┌─────────┐ + │ A │ + └───┬─┬───┘ + │ │ + ┌─────────┘ └─────────┐ + ▼ ▼ + ┌─────────┐ ┌─────────┐ + │ B │ │ C │ + └───┬─────┘ └───┬─────┘ + │ │ + ▼ ▼ + ┌─────────┐ ┌─────────┐ + │ D │ │ E │ + └─────────┘ └─────────┘ + +Edges: A–B, A–C, B–D, C–E (undirected) ``` -II. From A (distance 0), update neighbors: +*Recursive DFS trace (pre-order):* ``` -A: 0 -B: 4 (via A) -C: 2 (via A) -D: ∞ -E: ∞ +call DFS(A) + visit A + -> DFS(B) + visit B + -> DFS(D) + visit D + return D + return B + -> DFS(C) + visit C + -> DFS(E) + visit E + return E + return C + return A ``` -III. Pick the smallest unvisited vertex (C with distance 2). Update its neighbors: - -- B can be updated to 3 if 2 + 1 < 4 -- D can be updated to 10 if 2 + 8 < ∞ -- E can be updated to 12 if 2 + 10 < ∞ +*Discovery/finish times (one valid outcome):* ``` -A: 0 -B: 3 (via C) -C: 2 -D: 10 (via C) -E: 12 (via C) +Vertex | tin | tout | parent +-------+-----+------+--------- +A | 1 | 10 | None +B | 2 | 5 | A +D | 3 | 4 | B +C | 6 | 9 | A +E | 7 | 8 | C ``` -IV. Pick the next smallest unvisited vertex (B with distance 3). 
Update its neighbors: - -- D becomes 8 if 3 + 5 < 10 -- E remains 12 (no direct edge from B to E) +*Stack/Visited evolution (iterative DFS, top = right):* ``` -A: 0 -B: 3 -C: 2 -D: 8 (via B) -E: 12 +Step | Action | Stack | Visited +-----+------------------------------+-----------------------+----------------- +0 | push A | [A] | {} +1 | pop A; visit | [] | {A} + | push C, B | [C, B] | {A} +2 | pop B; visit | [C] | {A, B} + | push D | [C, D] | {A, B} +3 | pop D; visit | [C] | {A, B, D} +4 | pop C; visit | [] | {A, B, D, C} + | push E | [E] | {A, B, D, C} +5 | pop E; visit | [] | {A, B, D, C, E} ``` -V. Pick the next smallest unvisited vertex (D with distance 8). Update its neighbors: - -- E becomes 10 if 8 + 2 < 12 +*DFS tree (tree edges shown), with preorder: A, B, D, C, E* ``` -A: 0 -B: 3 -C: 2 -D: 8 -E: 10 (via D) +A +├── B +│ └── D +└── C + └── E ``` -VI. The only remaining vertex is E (distance 10). No further updates are possible. +**Applications** -**Final shortest paths from A**: +1. **Path existence & reconstruction.** + Use `parent` to backtrack from a target to the start after a DFS that finds it. -``` -A: 0 -B: 3 -C: 2 -D: 8 -E: 10 -``` +2. **Topological sorting (DAGs).** + Run DFS on a directed acyclic graph; the **reverse postorder** (vertices sorted by decreasing `tout`) is a valid topological order. -##### Optimizing Time Complexity +3. **Cycle detection.** + *Undirected:* seeing a visited neighbor that isn’t the parent ⇒ cycle. + *Directed:* maintain states (`unvisited`, `in_stack`, `done`); encountering an edge to a vertex **in\_stack** (a back edge) ⇒ cycle. -- A basic (array-based) implementation of Dijkstra's algorithm runs in **O(n^2)** time. -- Using a priority queue (min-heap) to select the vertex with the smallest distance reduces the complexity to **O((V+E) log V)**, where **V** is the number of vertices and **E** is the number of edges. +4. **Connected components (undirected).** + Run DFS from every unvisited node; each run discovers exactly one component. -##### Applications +5. **Bridges & articulation points (cut vertices).** + Using DFS **low-link** values (`low[u] = min(tin[u], tin[v] over back edges, low of children)`), you can find edges whose removal disconnects the graph (bridges) and vertices whose removal increases components (articulation points). -- **Internet routing** protocols use it to determine efficient paths for data packets. -- **Mapping software** (e.g., Google Maps, Waze) employ variations of Dijkstra to compute driving routes. -- **Telecommunication networks** use it to determine paths with minimal cost. +6. **Strongly Connected Components (SCCs, directed graphs).** + Tarjan’s (single-pass with a stack and low-link) or Kosaraju’s (two DFS passes) algorithms are built on DFS. -##### Implementation +7. **Backtracking & search in state spaces.** + Classic for maze solving, puzzles (N-Queens, Sudoku), and constraint satisfaction: DFS systematically explores choices and backtracks on dead ends. -* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/dijkstra) -* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/dijkstra) +8. **Detecting and classifying edges (directed).** + With timestamps, classify edges as **tree**, **back**, **forward**, or **cross**—useful for reasoning about structure and correctness. 
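+
+As a runnable companion to the recursive and iterative pseudocode above, here is a compact Python version on the same example graph (the function name and graph literal are illustrative, separate from the linked implementations):
+
+```
+def dfs(graph, start):
+    # Recursive DFS; returns preorder of discovery and parent links.
+    visited = set()
+    parent = {start: None}
+    preorder = []
+
+    def explore(u):
+        visited.add(u)                 # mark on first discovery
+        preorder.append(u)
+        for v in graph[u]:
+            if v not in visited:
+                parent[v] = u
+                explore(v)
+
+    explore(start)
+    return preorder, parent
+
+graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "E"],
+         "D": ["B"], "E": ["C"]}
+preorder, parent = dfs(graph, "A")
+print(preorder)   # ['A', 'B', 'D', 'C', 'E']
+```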
-#### Bellman-Ford Algorithm +**Implementation** -- **Bellman-Ford algorithm** is a method for finding the shortest paths from a single starting vertex to all other vertices in a weighted graph. -- Unlike **Dijkstra’s algorithm**, Bellman-Ford can handle **negative edge weights**, making it more flexible for certain types of graphs. -- The algorithm works by **repeatedly relaxing all edges** in the graph. Relaxing an edge means updating the current shortest distance to a vertex if a shorter path is found via another vertex. -- The algorithm performs this **relaxation process** exactly **$V - 1$ times**, where $V$ is the number of vertices. This ensures that every possible shortest path is discovered. -- After completing $V - 1$ relaxations, the algorithm does one more pass to detect **negative weight cycles**. If any edge can still be relaxed, a negative cycle exists and no finite shortest path is defined. -- Bellman-Ford’s time complexity is **$O(V \times E)$**, which is generally slower than Dijkstra’s algorithm for large graphs. +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/dfs) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/dfs) -##### Algorithm Steps +*Implementation tips:* -**Input** +* For **very deep** or skewed graphs, prefer the **iterative** form to avoid recursion limits. +* If neighbor order matters (e.g., lexicographic traversal), control push order (push in reverse for stacks) or sort adjacency lists. +* For sparse graphs, adjacency **lists** are preferred over adjacency matrices for time/space efficiency. -- A weighted graph with possible negative edge weights -- A starting vertex `A` +### Shortest paths + +A common task when dealing with weighted graphs is to find the shortest route between two vertices, such as from vertex $A$ to vertex $B$. Note that there might not be a unique shortest path, since several paths could have the same length. -**Output** +#### Dijkstra’s Algorithm -- An array `distances` where `distances[v]` represents the shortest path from `A` to vertex `v` +Dijkstra’s algorithm computes **shortest paths** from a specified start vertex in a graph with **non-negative edge weights**. It grows a “settled” region outward from the start, always choosing the unsettled vertex with the **smallest known distance** and relaxing its outgoing edges to improve neighbors’ distances. -**Containers and Data Structures** +To efficiently keep track of the traversal, Dijkstra’s algorithm employs two primary data structures: -- An array `distances`, set to `∞` for all vertices except the start vertex (set to `0`) -- A `predecessor` array to help reconstruct the actual shortest path +* A **min-priority queue** (often named `pq`, `open`, or `unexplored`) keyed by each vertex’s current best known distance from the start. +* A **`dist` map** storing the best known distance to each vertex (∞ initially, except the start), a **`visited`/`finalized` set** to mark vertices whose shortest distance is proven, and a **`parent` map** to reconstruct paths. -**Steps** +*Useful additions in practice:* -I. Initialize `distances[A]` to `0` and `distances[v]` to `∞` for all other vertices `v` +* A **target-aware early stop**: if you only need the distance to a specific target, you can stop when that target is popped from the priority queue. +* **Decrease-key or lazy insertion**: if the PQ doesn’t support decrease-key, push an updated entry and ignore popped stale ones by checking against `dist`. 
+* Optional **`pred` lists** for counting shortest paths or reconstructing multiple optimal routes. -II. Repeat `V - 1` times +**Algorithm Steps** -- For every edge `(u, v)` with weight `w`, if `distances[u] + w < distances[v]`, update `distances[v]` to `distances[u] + w` and `predecessor[v]` to `u` +1. Begin from a starting vertex, \$i\$. -III. Check for negative cycles by iterating over all edges `(u, v)` again +2. Initialize `dist[i] = 0`, `parent[i] = None`; for all other vertices `v`, set `dist[v] = ∞`. Push \$i\$ into the min-priority queue keyed by `dist[i]`. -- If `distances[u] + w < distances[v]` for any edge, a negative weight cycle exists +3. While the priority queue is not empty: -##### Step by Step Example + 1. **Extract** the vertex `u` with the **smallest** `dist[u]`. + 2. If `u` is already finalized, continue; otherwise **finalize** `u` (add to `visited`/`finalized`). + 3. For **each neighbor** `v` of `u` with edge weight `w(u,v) ≥ 0`: -We have vertices A, B, C, D, and E. The edges and weights (including a self-loop on E): + * If `dist[u] + w(u,v) < dist[v]`, then **relax** the edge: set + `dist[v] = dist[u] + w(u,v)` and `parent[v] = u`, and **push** `v` into the PQ keyed by the new `dist[v]`. + +4. Continue until the queue becomes empty (all reachable vertices finalized) or until your **target** has been finalized (early stop). + +5. Reconstruct any shortest path by following `parent[·]` **backwards** from the target to the start. + +Vertices are **finalized when they are dequeued** (popped) from the priority queue. With **non-negative** weights, once a vertex is popped the recorded `dist` is **provably optimal**. + +*Reference pseudocode (adjacency-list graph):* ``` -A-B: 6 -A-C: 7 -B-C: 8 -B-D: -4 -B-E: 5 -C-E: -3 -D-A: 2 -D-C: 7 -E-E: 9 +Dijkstra(G, i, target=None): + INF = +infinity + dist = defaultdict(lambda: INF) + parent = {i: None} + dist[i] = 0 + + pq = MinPriorityQueue() + pq.push(i, 0) + + finalized = set() + + while pq: + u, du = pq.pop_min() # smallest current distance + if u in finalized: # ignore stale entries + continue + + finalized.add(u) + + if target is not None and u == target: + break # early exit: target finalized + + for (v, w_uv) in G[u]: # w_uv >= 0 + alt = du + w_uv + if alt < dist[v]: + dist[v] = alt + parent[v] = u + pq.push(v, alt) # decrease-key or lazy insert + + return dist, parent + +# Reconstruct path i -> t (if t reachable): +reconstruct(parent, t): + path = [] + while t is not None: + path.append(t) + t = parent.get(t) + return list(reversed(path)) ``` -Adjacency matrix (∞ means no direct edge): +*Sanity notes:* + +* **Time:** with a binary heap, \$O((V + E)\log V)\$; with a Fibonacci heap, \$O(E + V\log V)\$; with a plain array (no heap), \$O(V^2)\$. +* **Space:** \$O(V)\$ for `dist`, `parent`, PQ bookkeeping. +* **Preconditions:** All edge weights must be **\$\ge 0\$**. Negative edges invalidate correctness. +* **Ordering:** Different neighbor iteration orders don’t affect correctness, only tie behavior/performance. -| | A | B | C | D | E | -|---|----|----|----|----|----| -| **A** | 0 | 6 | 7 | ∞ | ∞ | -| **B** | ∞ | 0 | 8 | -4 | 5 | -| **C** | ∞ | ∞ | 0 | ∞ | -3 | -| **D** | 2 | ∞ | 7 | 0 | ∞ | -| **E** | ∞ | ∞ | ∞ | ∞ | 9 | +**Example** -**Initialization**: +Weighted, undirected graph; start at **A**. Edge weights are on the links. 
``` -dist[A] = 0 -dist[B] = ∞ -dist[C] = ∞ -dist[D] = ∞ -dist[E] = ∞ + ┌────────┐ + │ A │ + └─┬──┬───┘ + 4/ │1 + ┌── │ ──┐ + ┌─────▼──┐ │ ┌▼──────┐ + │ B │──┘2 │ C │ + └───┬────┘ └──┬────┘ + 1 │ 4 │ + │ │ + ┌───▼────┐ 3 ┌──▼───┐ + │ E │────────│ D │ + └────────┘ └──────┘ + +Edges: A–B(4), A–C(1), C–B(2), B–E(1), C–D(4), D–E(3) ``` -**Iteration 1** (relax edges from A): +*Priority queue / Finalized evolution (front = smallest key):* ``` -dist[B] = 6 -dist[C] = 7 +Step | Pop (u,dist) | Relaxations (v: new dist, parent) | PQ after push | Finalized +-----+--------------+--------------------------------------------+----------------------------------+---------------- +0 | — | init A: dist[A]=0 | [(A,0)] | {} +1 | (A,0) | B:4←A , C:1←A | [(C,1), (B,4)] | {A} +2 | (C,1) | B:3←C , D:5←C | [(B,3), (B,4), (D,5)] | {A,C} +3 | (B,3) | E:4←B | [(E,4), (B,4), (D,5)] | {A,C,B} +4 | (E,4) | D:7 via E (no improve; current 5) | [(B,4), (D,5)] | {A,C,B,E} +5 | (B,4) stale | (ignore; B already finalized) | [(D,5)] | {A,C,B,E} +6 | (D,5) | — | [] | {A,C,B,E,D} ``` -**Iteration 2** (relax edges from B, then C): +*Distances and parents (final):* ``` -dist[D] = 2 (6 + (-4)) -dist[E] = 11 (6 + 5) -dist[E] = 4 (7 + (-3)) // C → E is better +dist[A]=0 (—) +dist[C]=1 (A) +dist[B]=3 (C) +dist[E]=4 (B) +dist[D]=5 (C) + +Shortest path A→E: A → C → B → E (total cost 4) ``` -**Iteration 3** (relax edges from D): +*Big-picture view of the expanding frontier:* ``` -dist[A] = 4 (2 + 2) -(No update for C since dist[C]=7 is already < 9) + Settled set grows outward from A by increasing distance. + After Step 1: {A} + After Step 2: {A, C} + After Step 3: {A, C, B} + After Step 4: {A, C, B, E} + After Step 6: {A, C, B, E, D} (all reachable nodes done) ``` -**Iteration 4**: +**Applications** + +1. **Single-source shortest paths** on graphs with **non-negative** weights (roads, networks, transit). +2. **Navigation/routing** with early stop: stop when the goal is popped to avoid extra work. +3. **Network planning & QoS:** minimum latency/cost routing, bandwidth-weighted paths (when additive and non-negative). +4. **As a building block:** A\* with $h \equiv 0$; **Johnson’s algorithm** (all-pairs on sparse graphs); **k-shortest paths** variants. +5. **Multi-source Dijkstra:** seed the PQ with multiple starts at distance 0 (e.g., nearest facility / multi-sink problems). +6. **Label-setting baseline** for comparing heuristics (A\*, ALT landmarks, contraction hierarchies). +7. **Grid pathfinding with terrain costs** (non-negative cell costs) when no admissible heuristic is available. + +**Implementation** + +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/dijkstra) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/dijkstra) + +*Implementation tip:* If your PQ has no decrease-key, **push duplicates** on improvement and, when popping a vertex, **skip it** if it’s already finalized or if the popped key doesn’t match `dist[u]`. This “lazy” approach is simple and fast in practice. + +#### Bellman-Ford Algorithm + +#### Bellman–Ford Algorithm + +Bellman–Ford computes **shortest paths** from a start vertex in graphs that may have **negative edge weights** (but no negative cycles reachable from the start). It works by repeatedly **relaxing** every edge; each full pass can reduce some distances until they stabilize. A final check detects **negative cycles**: if an edge can still be relaxed after $(V-1)$ passes, a reachable negative cycle exists. 
+ +To efficiently keep track of the computation, Bellman–Ford employs two primary data structures: + +* A **`dist` map** (or array) with the best-known distance to each vertex (initialized to ∞ except the start). +* A **`parent` map** to reconstruct shortest paths (store `parent[v] = u` when relaxing edge $u\!\to\!v$). + +*Useful additions in practice:* + +* **Edge list**: iterate edges directly (fast and simple) even if your graph is stored as adjacency lists. +* **Early exit**: stop as soon as a full pass makes **no updates**. +* **Negative-cycle extraction**: if an update occurs on pass $V$, backtrack through `parent` to find a cycle. +* **Reachability guard**: you can skip edges whose source has `dist[u] = ∞` (still unreached). + +**Algorithm Steps** + +1. Begin from a starting vertex, $i$. + +2. Initialize `dist[i] = 0`, `parent[i] = None`; for all other vertices $v$, set `dist[v] = ∞`. + +3. Repeat **$V-1$ passes** (where $V$ is the number of vertices): + + 1. Set `changed = False`. + 2. For **each directed edge** $(u,v,w)$ (weight $w$): + + * If `dist[u] + w < dist[v]`, then **relax** the edge: + `dist[v] = dist[u] + w`, `parent[v] = u`, and set `changed = True`. + 3. If `changed` is **False**, break early (all distances stabilized). + +4. **Negative-cycle detection** (optional but common): + For each edge $(u,v,w)$, if `dist[u] + w < dist[v]`, then a **negative cycle is reachable**. + *To extract a cycle:* follow `parent` from `v` **V times** to land inside the cycle; then keep following until you revisit a vertex, collecting the cycle. + +5. To get a shortest path to a target $t$ (if no negative cycle affects it), follow `parent[t]` backward to $i$. + +*Reference pseudocode (edge list):* ``` -No changes in this round +BellmanFord(V, E, i): # V: set/list of vertices + INF = +infinity # E: list of (u, v, w) edges + dist = {v: INF for v in V} + parent = {v: None for v in V} + dist[i] = 0 + + # (V-1) relaxation passes + for _ in range(len(V) - 1): + changed = False + for (u, v, w) in E: + if dist[u] != INF and dist[u] + w < dist[v]: + dist[v] = dist[u] + w + parent[v] = u + changed = True + if not changed: + break + + # Negative-cycle check + cycle_vertex = None + for (u, v, w) in E: + if dist[u] != INF and dist[u] + w < dist[v]: + cycle_vertex = v + break + + return dist, parent, cycle_vertex # cycle_vertex=None if no neg cycle + +# Reconstruct shortest path i -> t (if safe): +reconstruct(parent, t): + path = [] + while t is not None: + path.append(t) + t = parent[t] + return list(reversed(path)) ``` -**Final distances from A**: +*Sanity notes:* + +* **Time:** $O(VE)$ (each pass scans all edges; up to $V-1$ passes). +* **Space:** $O(V)$ for `dist` and `parent`. +* **Handles negative weights**; **detects** reachable negative cycles. +* If a reachable **negative cycle** exists, true shortest paths to vertices it can reach are **undefined** (effectively $-\infty$). + +**Example** + +Directed, weighted graph; start at **A**. (Negative edges allowed; **no** negative cycles here.) 
``` -dist[A] = 0 -dist[B] = 6 -dist[C] = 7 -dist[D] = 2 -dist[E] = 4 + ┌─────────┐ + │ A │ + └──┬───┬──┘ + 4 │ │ 2 + │ │ + ┌──────────▼───┐ -1 ┌─────────┐ + │ B │ ───────►│ C │ + └──────┬───────┘ └──┬─────┘ + 2 │ 5│ + │ │ + ┌──────▼──────┐ -3 ┌───▼─────┐ + │ D │ ◄────── │ E │ + └──────────────┘ └─────────┘ + +Also: +A → C (2) +C → B (1) +C → E (3) +(Edges shown with weights on arrows) +``` + +*Edges list:* +`A→B(4), A→C(2), B→C(-1), B→D(2), C→B(1), C→D(5), C→E(3), D→E(-3)` + +*Relaxation trace (dist after each full pass; start A):* + ``` +Init (pass 0): + dist[A]=0, dist[B]=∞, dist[C]=∞, dist[D]=∞, dist[E]=∞ -##### Special Characteristics +After pass 1: + A=0, B=3, C=2, D=6, E=3 + (A→B=4, A→C=2; C→B improved B to 3; B→D=5? (via B gives 6); D→E=-3 gives E=3) -- It can manage **negative edge weights** but cannot produce valid results when **negative cycles** are present. -- It is often used when edges can be negative, though it is slower than Dijkstra’s algorithm. +After pass 2: + A=0, B=3, C=2, D=5, E=2 + (B→D improved D to 5; D→E improved E to 2) + +After pass 3: + A=0, B=3, C=2, D=5, E=2 (no changes → early stop) +``` -##### Applications +*Parents / shortest paths (one valid set):* -- **Financial arbitrage** detection in currency exchange markets. -- **Routing** in networks where edges might have negative costs. -- **Game development** scenarios with penalties or negative terrain effects. +``` +parent[A]=None +parent[C]=A +parent[B]=C +parent[D]=B +parent[E]=D + +Example shortest path A→E: +A → C → B → D → E with total cost 2 + 1 + 2 + (-3) = 2 +``` + +*Negative-cycle detection (illustration):* +If we **add** an extra edge `E→C(-4)`, the cycle `C → D → E → C` has total weight `5 + (-3) + (-4) = -2` (negative). +Bellman–Ford would perform a $V$-th pass and still find an improvement (e.g., relaxing `E→C(-4)`), so it reports a **reachable negative cycle**. + +**Applications** + +1. **Shortest paths with negative edges** (when Dijkstra/A\* don’t apply). +2. **Arbitrage detection** in currency/markets by summing $\log$ weights along cycles. +3. **Feasibility checks** in difference constraints (systems like $x_v - x_u \le w$). +4. **Robust baseline** for verifying or initializing faster methods (e.g., Johnson’s algorithm for all-pairs). +5. **Graphs with penalties/credits** where some transitions reduce accumulated cost. ##### Implementation * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/bellman_ford) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/bellman_ford) - -#### A* (A-Star) Algorithm -- **A\*** is an informed search algorithm used for **pathfinding** and **graph traversal**. -- It is a **best-first search** because it prioritizes the most promising paths first, combining known and estimated costs. -- The algorithm relies on: -- **g(n)**: The actual cost from the start node to the current node **n**. -- **h(n)**: A **heuristic** estimating the cost from **n** to the goal. -- The total cost function is **f(n) = g(n) + h(n)**, guiding the search toward a potentially optimal path. -- At each step, A* expands the node with the **lowest f(n)** in the priority queue. -- The heuristic **h(n)** must be **admissible** (never overestimates the real cost) to guarantee an optimal result. -- A* terminates when it either reaches the **goal** or exhausts all possibilities if no solution exists. 
-- It is efficient for many applications because it balances **exploration** with being **goal-directed**, but its performance depends on the heuristic quality. -- A* is broadly used in **games**, **robotics**, and **navigation** due to its effectiveness in real-world pathfinding. +*Implementation tip:* For **all-pairs** on sparse graphs with possible negative edges, use **Johnson’s algorithm**: run Bellman–Ford once from a super-source to reweight edges (no negatives), then run **Dijkstra** from each vertex. -##### Algorithm Steps +#### A* (A-Star) Algorithm -**Input** +A\* is a best-first search that finds a **least-cost path** from a start to a goal by minimizing -- A graph -- A start vertex `A` -- A goal vertex `B` -- A heuristic function `h(v)` that estimates the cost from `v` to `B` +$$ +f(n) = g(n) + h(n), +$$ -**Output** +where: -- The shortest path from `A` to `B` if one exists +* $g(n)$ = cost from start to $n$ (so far), +* $h(n)$ = heuristic estimate of the remaining cost from $n$ to the goal. -**Used Data Structures** +If $h$ is **admissible** (never overestimates) and **consistent** (triangle inequality), A\* is **optimal** and never needs to “reopen” closed nodes. -I. **g(n)**: The best-known cost from the start vertex to vertex `n` +**Core data structures** -II. **h(n)**: The heuristic estimate from vertex `n` to the goal +* **Open set**: min-priority queue keyed by $f$ (often called `open` or `frontier`). +* **Closed set**: a set (or map) of nodes already expanded (finalized). +* **`g` map**: best known cost-so-far to each node. +* **`parent` map**: to reconstruct the path on success. +* (Optional) **`h` cache** and a **tie-breaker** (e.g., prefer larger $g$ or smaller $h$ when $f$ ties). -III. **f(n) = g(n) + h(n)**: The estimated total cost from start to goal via `n` +**Algorithm Steps** -IV. **openSet**: Starting with the initial node, contains nodes to be evaluated +1. Initialize `open = {start}` with `g[start]=0`, `f[start]=h(start)`; `parent[start]=None`. +2. While `open` is not empty: + a. Pop `u` with **smallest** `f(u)` from `open`. + b. If `u` is the **goal**, reconstruct the path via `parent` and return. + c. Add `u` to **closed**. + d. For each neighbor `v` of `u` with edge cost `w(u,v) ≥ 0`: -V. **closedSet**: Contains nodes already fully evaluated + * `tentative = g[u] + w(u,v)` + * If `v` not in `g` or `tentative < g[v]`: update `parent[v]=u`, `g[v]=tentative`, `f[v]=g[v]+h(v)` and push `v` into `open` (even if it was there before with a worse key). +3. If `open` empties without reaching the goal, no path exists. -VI. **cameFrom**: Structure to record the path taken +*Mark neighbors **when you enqueue them** (by storing their best `g`) to avoid duplicate work; with **consistent** $h$, any node popped from `open` is final and will not improve later.* -**Steps** +**Reference pseudocode** -I. Add the starting node to the **openSet** +``` +A_star(G, start, goal, h): + open = MinPQ() # keyed by f = g + h + open.push(start, h(start)) + g = {start: 0} + parent = {start: None} + closed = set() -II. 
While the **openSet** is not empty + while open: + u = open.pop_min() # node with smallest f + if u == goal: + return reconstruct_path(parent, goal), g[goal] -- Get the node `current` in **openSet** with the lowest **f(n)** -- If `current` is the goal node, reconstruct the path and return it -- Remove `current` from **openSet** and add it to **closedSet** -- For each neighbor `n` of `current`, skip it if it is in **closedSet** -- If `n` is not in **openSet**, add it and compute **g(n)**, **h(n)**, and **f(n)** -- If a better path to `n` is found, update **cameFrom** for `n` + closed.add(u) -III. If the algorithm terminates without finding the goal, no path exists + for (v, w_uv) in G.neighbors(u): # w_uv >= 0 + tentative = g[u] + w_uv + if v in closed and tentative >= g.get(v, +inf): + continue -##### Step by Step Example + if tentative < g.get(v, +inf): + parent[v] = u + g[v] = tentative + f_v = tentative + h(v) + open.push(v, f_v) # decrease-key OR push new entry -We have a graph with vertices A, B, C, D, and E: + return None, +inf +reconstruct_path(parent, t): + path = [] + while t is not None: + path.append(t) + t = parent[t] + return list(reversed(path)) ``` -A-B: 1 -A-C: 2 -B-D: 3 -C-D: 2 -D-E: 1 -``` - -Heuristic estimates to reach E: -``` -h(A) = 3 -h(B) = 2 -h(C) = 2 -h(D) = 1 -h(E) = 0 -``` +*Sanity notes:* -Adjacency matrix (∞ = no direct path): +* **Time:** Worst-case exponential; practically much faster with informative $h$. +* **Space:** $O(V)$ for maps + PQ (A\* is memory-hungry). +* **Special cases:** If $h \equiv 0$, A\* ≡ **Dijkstra**. If all edges cost 1 and $h \equiv 0$, it behaves like **BFS**. -| | A | B | C | D | E | -|---|---|---|---|---|----| -| **A** | 0 | 1 | 2 | ∞ | ∞ | -| **B** | ∞ | 0 | ∞ | 3 | ∞ | -| **C** | ∞ | ∞ | 0 | 2 | ∞ | -| **D** | ∞ | ∞ | ∞ | 0 | 1 | -| **E** | ∞ | ∞ | ∞ | ∞ | 0 | +**Visual walkthrough (grid with 4-neighborhood, Manhattan $h$)** -**Initialization**: +Legend: `S` start, `G` goal, `#` wall, `.` free, `◉` expanded (closed), `•` frontier (open), `×` final path ``` -g(A) = 0 -f(A) = g(A) + h(A) = 0 + 3 = 3 -openSet = [A] -closedSet = [] +Row/Col → 1 2 3 4 5 6 7 8 9 + ┌───────────────────┐ +1 S . . . . # . . . │ +2 . # # . . # . # . │ +3 . . . . . . . # . │ +4 # . # # . # . . . │ +5 . . . # . . . # G │ + └───────────────────┘ +Movement cost = 1 per step; 4-dir moves; h = Manhattan distance ``` -Expand **A**: +**Early expansion snapshot (conceptual):** ``` -f(B) = 0 + 1 + 2 = 3 -f(C) = 0 + 2 + 2 = 4 -``` +Step 0: +Open: [(S, g=0, h=|S-G|, f=g+h)] Closed: {} +Grid: S is • (on frontier) -Expand **B** next (lowest f=3): +Step 1: pop S → expand neighbors +Open: [((1,2), g=1, h=?, f=?), ((2,1), g=1, h=?, f=?)] +Closed: {S} +Marks: S→ ◉, its valid neighbors → • -``` -f(D) = g(B) + cost(B,D) + h(D) = 1 + 3 + 1 = 5 +Step 2..k: +A* keeps popping the lowest f, steering toward G. +Nodes near the straight line to G are preferred over detours around '#'. ``` -Next lowest is **C** (f=4): +**When goal is reached, reconstruct the path:** ``` -f(D) = g(C) + cost(C,D) + h(D) = 2 + 2 + 1 = 5 (no improvement) +Final path (example rendering): + ┌───────────────────┐ +1 × × × × . # . . . │ +2 × # # × × # . # . │ +3 × × × × × × × # . │ +4 # . # # × # × × × │ +5 . . . # × × × # G │ + └───────────────────┘ +Path length (g at G) equals number of × steps (optimal with admissible/consistent h). ``` -Expand **D** (f=5): +**Priority queue evolution (toy example)** ``` -f(E) = g(D) + cost(D,E) + h(E) = 5 + 1 + 0 = 6 -E is the goal; algorithm stops. 
+Step | Popped u | Inserted neighbors (v: g,h,f) | Note +-----+----------+-------------------------------------------------+--------------------------- +0 | — | push S: g=0, h=14, f=14 | S at (1,1), G at (5,9) +1 | S | (1,2): g=1,h=13,f=14 ; (2,1): g=1,h=12,f=13 | pick (2,1) next +2 | (2,1) | (3,1): g=2,h=11,f=13 ; (2,2) blocked | ... +3 | (3,1) | (4,1) wall; (3,2): g=3,h=10,f=13 | still f=13 band +… | … | frontier slides along the corridor toward G | A* hugs the beeline ``` -Resulting path: **A -> B -> D -> E** with total cost **5**. +(Exact numbers depend on the specific grid and walls; shown for intuition.) -##### Special Characteristics +--- -- **A\*** finds an optimal path if the heuristic is **admissible**. -- Edges must have **non-negative weights** for A* to work correctly. -- A good heuristic drastically improves its efficiency. +### Heuristic design -##### Applications +For **grids**: -- Used in **video games** for enemy AI or player navigation. -- Employed in **robotics** for motion planning. -- Integral to **mapping** and **GPS** systems for shortest route calculations. +* **4-dir moves:** $h(n)=|x_n-x_g|+|y_n-y_g|$ (Manhattan). +* **8-dir (diag cost √2):** **Octile**: $h=\Delta_{\max} + (\sqrt{2}-1)\Delta_{\min}$. +* **Euclidean** when motion is continuous and diagonal is allowed. -##### Implementation +For **sliding puzzles (e.g., 8/15-puzzle)**: + +* **Misplaced tiles** (admissible, weak). +* **Manhattan sum** (stronger). +* **Linear conflict / pattern databases** (even stronger). + +**Admissible vs. consistent** + +* **Admissible:** $h(n) \leq h^\*(n)$ (true remaining cost). Guarantees optimality. +* **Consistent (monotone):** $h(u) \le w(u,v) + h(v)$ for every edge. + Ensures $f$-values are nondecreasing along paths; once a node is popped, its `g` is final (no reopen). + +**Applications** + +1. **Pathfinding** in maps, games, robotics (shortest or least-risk routes). +2. **Route planning** with road metrics (time, distance, tolls) and constraints. +3. **Planning & scheduling** in AI as a general shortest-path in state spaces. +4. **Puzzle solving** (8-puzzle, Sokoban variants) with domain-specific $h$. +5. **Network optimization** where edge costs are nonnegative and heuristics exist. + +**Variants & practical tweaks** + +* **Dijkstra** = A\* with $h \equiv 0$. +* **Weighted A\***: use $f = g + \varepsilon h$ ($\varepsilon>1$) for faster, **bounded-suboptimal** search. +* **A\*ε / Anytime A\***: start with $\varepsilon>1$, reduce over time to approach optimal. +* **IDA\***: iterative deepening on $f$-bound; **much lower memory**, sometimes slower. +* **RBFS / Fringe Search**: memory-bounded alternatives. +* **Tie-breaking**: on equal $f$, prefer **larger $g$** (deeper) or **smaller $h$** to reduce node re-expansions. +* **Closed-set policy**: if $h$ is **inconsistent**, allow **reopening** when a better `g` is found. + +**Pitfalls & tips** + +* **No negative edges.** A\* assumes $w(u,v) \ge 0$. +* **Overestimating $h$** breaks optimality. +* **Precision issues:** with floats, compare $f$ using small epsilons. +* **State hashing:** ensure equal states hash equal (avoid exploding duplicates). +* **Neighbor order:** doesn’t affect optimality, but affects performance/trace aesthetics. 
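*Runnable sketch (Python):* the following is a minimal, self-contained rendering of the pseudocode above on a small 4-connected grid with unit step costs and a Manhattan heuristic. The grid, the start/goal coordinates, and the function names are illustrative assumptions, not taken from the linked implementations.

```
import heapq

def manhattan(a, b):
    # Admissible and consistent heuristic for 4-directional moves with unit cost
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def a_star_grid(grid, start, goal):
    # grid: list of rows, 0 = free cell, 1 = wall; start/goal are (row, col) tuples
    rows, cols = len(grid), len(grid[0])
    open_heap = [(manhattan(start, goal), 0, start)]   # entries are (f, g, node)
    g = {start: 0}
    parent = {start: None}
    closed = set()

    while open_heap:
        _f, g_u, u = heapq.heappop(open_heap)
        if u in closed:
            continue                        # stale duplicate entry (no decrease-key)
        if u == goal:                       # first pop of the goal is optimal here
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1], g_u
        closed.add(u)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + dr, u[1] + dc)
            if 0 <= v[0] < rows and 0 <= v[1] < cols and grid[v[0]][v[1]] == 0:
                tentative = g_u + 1
                if tentative < g.get(v, float("inf")):
                    g[v] = tentative
                    parent[v] = u
                    heapq.heappush(open_heap, (tentative + manhattan(v, goal), tentative, v))
    return None, float("inf")               # open set exhausted: goal unreachable

grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]
path, cost = a_star_grid(grid, (0, 0), (3, 3))
print(cost)   # 6 (equal to the Manhattan distance, so no detour was needed)
print(path)   # e.g. [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 3), (3, 3)]
```

Because the heuristic is admissible and consistent here, the first time the goal is popped its `g` value is already the optimal cost, so no reopening is needed.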
+ +**Implementation** * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/a_star) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/a_star) +*Implementation tip:* If your PQ lacks decrease-key, **push duplicates** with improved keys and ignore stale entries when popped (check if popped `g` matches current `g[u]`). This is simple and fast in practice. + ### Minimal Spanning Trees Suppose we have a graph that represents a network of houses. Weights represent the distances between vertices, which each represent a single house. All houses must have water, electricity, and internet, but we want the cost of installation to be as low as possible. We need to identify a subgraph of our graph with the following properties: @@ -737,226 +1090,467 @@ Such a subgraph is called a minimal spanning tree. #### Prim's Algorithm -- **Prim's Algorithm** is used to find a **minimum spanning tree (MST)**, which is a subset of a graph that connects all its vertices with the smallest total edge weight. -- It works on a **weighted undirected graph**, meaning the edges have weights, and the direction of edges doesn’t matter. -- It starts with an **arbitrary vertex** and grows the MST by adding one edge at a time. -- At each step, it chooses the **smallest weight edge** that connects a vertex in the MST to a vertex not yet in the MST (a **greedy** approach). -- This process continues until **all vertices** are included. -- The resulting MST is **connected**, ensuring a path between any two vertices, and the total edge weight is minimized. -- Using a **priority queue** (min-heap), it can achieve a time complexity of **O(E log V)** with adjacency lists, where E is the number of edges and V is the number of vertices. -- With an adjacency matrix, the algorithm can be implemented in **O(V^2)** time. +#### Prim’s Algorithm -##### Algorithm Steps +Prim’s algorithm builds a **minimum spanning tree (MST)** of a **weighted, undirected** graph by growing a tree from a start vertex. At each step it adds the **cheapest edge** that connects a vertex **inside** the tree to a vertex **outside** the tree. -**Input** +To efficiently keep track of the construction, Prim’s algorithm employs two primary data structures: -- A connected, undirected graph with weighted edges -- A start vertex `A` +* A **min-priority queue** (often named `pq`, `open`, or `unexplored`) keyed by a vertex’s **best known connection cost** to the current tree. +* A **`in_mst`/`visited` set** to mark vertices already added to the tree, plus a **`parent` map** to record the chosen incoming edge for each vertex. -**Output** +*Useful additions in practice:* -- A minimum spanning tree, which is a subset of the edges that connects all vertices together without any cycles and with the minimum total edge weight +* A **`key` map** where `key[v]` stores the lightest edge weight found so far that connects `v` to the current tree (∞ initially, except the start which is 0). +* **Lazy updates** if your PQ has no decrease-key: push improved `(v, key[v])` again and skip stale pops. +* **Component handling**: if the graph can be **disconnected**, either run Prim once per component (restarting at an unvisited vertex) or seed the PQ with **multiple starts** (`key=0`) to produce a **spanning forest**. -**Containers and Data Structures** +**Algorithm Steps** -- An array `key[]` to store the minimum reachable edge weight for each vertex. 
Initially, `key[v] = ∞` for all `v` except the first chosen vertex (set to `0`) -- A boolean array `mstSet[]` to keep track of whether a vertex is included in the MST. Initially, all values are `false` -- An array `parent[]` to store the MST. Each `parent[v]` indicates the vertex connected to `v` in the MST +1. Begin from a starting vertex, \$i\$. -**Steps** +2. Initialize `key[i] = 0`, `parent[i] = None`; for all other vertices `v`, set `key[v] = ∞`. Push \$i\$ into the min-priority queue keyed by `key`. -I. Start with an arbitrary node as the initial MST node +3. While the priority queue is not empty: -II. While there are vertices not yet included in the MST + 1. **Extract** the vertex `u` with the **smallest** `key[u]`. + 2. If `u` is already in the MST, continue; otherwise **add `u` to the MST** (insert into `in_mst`). + If `parent[u]` is not `None`, record the tree edge `(parent[u], u)`. + 3. For **each neighbor** `v` of `u` with edge weight `w(u,v)`: -- Pick a vertex `v` with the smallest `key[v]` -- Include `v` in `mstSet[]` -- For each neighboring vertex `u` of `v` not in the MST -- If the weight of edge `(u, v)` is less than `key[u]`, update `key[u]` and set `parent[u]` to `v` + * If `v` is **not** in the MST **and** `w(u,v) < key[v]`, then **improve** the connection to `v`: + set `key[v] = w(u,v)`, `parent[v] = u`, and **push** `v` into the PQ keyed by the new `key[v]`. -III. The MST is formed using the `parent[]` array once all vertices are included +4. Continue until the queue is empty (or until all vertices are in the MST for a connected graph). -##### Step by Step Example +5. The set of edges `{ (parent[v], v) : v ≠ i }` forms an MST; the MST **total weight** is `∑ key[v]` when each `v` is added. -Consider a simple graph with vertices **A**, **B**, **C**, **D**, and **E**. The edges with weights are: +Vertices are **finalized when they are dequeued**: at that moment, `key[u]` is the **minimum** cost to connect `u` to the growing tree (by the **cut property**). + +*Reference pseudocode (adjacency-list graph):* ``` -A-B: 2 -A-C: 3 -B-D: 1 -B-E: 3 -C-D: 4 -C-E: 5 -D-E: 2 -``` +Prim(G, i): + INF = +infinity + key = defaultdict(lambda: INF) + parent = {i: None} + key[i] = 0 -The adjacency matrix for the graph (using ∞ where no direct edge exists) is: + pq = MinPriorityQueue() # holds (key[v], v) + pq.push((0, i)) -| | A | B | C | D | E | -|---|---|---|---|---|---| -| **A** | 0 | 2 | 3 | ∞ | ∞ | -| **B** | 2 | 0 | ∞ | 1 | 3 | -| **C** | 3 | ∞ | 0 | 4 | 5 | -| **D** | ∞ | 1 | 4 | 0 | 2 | -| **E** | ∞ | 3 | 5 | 2 | 0 | + in_mst = set() + mst_edges = [] -Run Prim's algorithm starting from vertex **A**: + while pq: + ku, u = pq.pop_min() # smallest key + if u in in_mst: + continue + in_mst.add(u) -I. **Initialization** + if parent[u] is not None: + mst_edges.append((parent[u], u, ku)) -``` -Chosen vertex: A -Not in MST: B, C, D, E + for (v, w_uv) in G[u]: # undirected: each edge seen twice + if v not in in_mst and w_uv < key[v]: + key[v] = w_uv + parent[v] = u + pq.push((key[v], v)) # decrease-key or lazy insert + + return mst_edges, parent, sum(w for (_,_,w) in mst_edges) ``` -II. **Pick the smallest edge from A** +*Sanity notes:* -``` -Closest vertex is B with a weight of 2. -MST now has: A, B -Not in MST: C, D, E -``` +* **Time:** with a binary heap, $O(E \log V)$; with a Fibonacci heap, $O(E + V \log V)$. + Dense graph (adjacency matrix + no PQ) variant runs in $O(V^2)$. +* **Space:** $O(V)$ for `key`, `parent`, and MST bookkeeping. 
+* **Graph type:** **weighted, undirected**; weights may be negative or positive (no restriction like Dijkstra). + If the graph is **disconnected**, Prim yields a **minimum spanning forest** (one tree per component). +* **Uniqueness:** If all edge weights are **distinct**, the MST is **unique**. + +**Example** -III. **From A and B, pick the smallest edge** +Undirected, weighted graph; start at **A**. Edge weights shown on links. ``` -Closest vertex is D (from B) with a weight of 1. -MST now has: A, B, D -Not in MST: C, E + ┌────────┐ + │ A │ + └─┬──┬───┘ + 4/ │1 + ┌── │ ──┐ + ┌─────▼──┐ │ ┌▼──────┐ + │ B │──┘2 │ C │ + └───┬────┘ └──┬────┘ + 1 │ 4 │ + │ │ + ┌───▼────┐ 3 ┌──▼───┐ + │ E │────────│ D │ + └────────┘ └──────┘ + +Edges: A–B(4), A–C(1), C–B(2), B–E(1), C–D(4), D–E(3) ``` -IV. **Next smallest edge from A, B, or D** +*Frontier (keys) / In-tree evolution (min at front):* ``` -Closest vertex is E (from D) with a weight of 2. -MST now has: A, B, D, E -Not in MST: C +Legend: key[v] = cheapest known connection to tree; parent[v] = chosen neighbor + +Step | Action | PQ (key:vertex) after push | In MST | Updated keys / parents +-----+---------------------------------+------------------------------------+--------+------------------------------- +0 | init at A | [0:A] | {} | key[A]=0, others=∞ +1 | pop A → add | [1:C, 4:B] | {A} | key[C]=1 (A), key[B]=4 (A) +2 | pop C → add | [2:B, 4:D, 4:B] | {A,C} | key[B]=min(4,2)=2 (C), key[D]=4 (C) +3 | pop B(2) → add | [1:E, 4:D, 4:B] | {A,C,B}| key[E]=1 (B) +4 | pop E(1) → add | [3:D, 4:D, 4:B] | {A,C,B,E}| key[D]=min(4,3)=3 (E) +5 | pop D(3) → add | [4:D, 4:B] | {A,C,B,E,D}| done ``` -V. **Pick the final vertex** +*MST edges chosen (with weights):* ``` -The closest remaining vertex is C (from A) with a weight of 3. -MST now has: A, B, D, E, C +A—C(1), C—B(2), B—E(1), E—D(3) +Total weight = 1 + 2 + 1 + 3 = 7 ``` -The MST includes the edges: **A-B (2), B-D (1), D-E (2),** and **A-C (3)**, with a total weight of **8**. - -##### Special Characteristics +*Resulting MST (tree edges only):* -- It always selects the smallest edge that can connect a new vertex to the existing MST. -- Different choices of starting vertex can still result in the same total MST weight (though the exact edges might differ if multiple edges have the same weight). -- With adjacency lists and a priority queue, the time complexity is **O(E log V)**; with an adjacency matrix, it is **O(V^2)**. +``` +A +└── C (1) + └── B (2) + └── E (1) + └── D (3) +``` -##### Applications +**Applications** -- **Network design**: Building telecommunication networks with minimal cable length. -- **Road infrastructure**: Constructing roads, tunnels, or bridges at minimal total cost. -- **Utility services**: Designing water, electrical, or internet infrastructure to connect all locations at minimum cost. +1. **Network design** (least-cost wiring/piping/fiber) connecting all sites with minimal total cost. +2. **Approximation for TSP** (metric TSP 2-approx via MST preorder walk). +3. **Clustering (single-linkage)**: remove the **k−1** heaviest edges of the MST to form **k** clusters. +4. **Image processing / segmentation**: MST over pixels/superpixels to find low-contrast boundaries. +5. **Map generalization / simplification**: keep a connectivity backbone with minimal redundancy. +6. **Circuit design / VLSI**: minimal interconnect length under simple models. 
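*Runnable sketch (Python):* a compact version of the lazy-deletion, heap-based variant described above, replayed on the example graph A–B(4), A–C(1), C–B(2), B–E(1), C–D(4), D–E(3). The dictionary layout and function name are illustrative assumptions.

```
import heapq
from collections import defaultdict

def prim(graph, start):
    # graph: dict mapping vertex -> list of (neighbor, weight) pairs (undirected)
    # returns (mst_edges, total_weight) for the component containing `start`
    key = defaultdict(lambda: float("inf"))
    parent = {start: None}
    key[start] = 0

    pq = [(0, start)]          # (key[v], v); duplicates allowed (lazy updates)
    in_mst = set()
    mst_edges = []
    total = 0

    while pq:
        k, u = heapq.heappop(pq)
        if u in in_mst:
            continue            # stale entry left over from an earlier, worse key
        in_mst.add(u)
        if parent[u] is not None:
            mst_edges.append((parent[u], u, k))
            total += k
        for v, w in graph[u]:
            if v not in in_mst and w < key[v]:
                key[v] = w
                parent[v] = u
                heapq.heappush(pq, (w, v))
    return mst_edges, total

graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("A", 4), ("C", 2), ("E", 1)],
    "C": [("A", 1), ("B", 2), ("D", 4)],
    "D": [("C", 4), ("E", 3)],
    "E": [("B", 1), ("D", 3)],
}
edges, weight = prim(graph, "A")
print(weight)   # 7
print(edges)    # [('A', 'C', 1), ('C', 'B', 2), ('B', 'E', 1), ('E', 'D', 3)]
```

It reports the same tree as the trace above: total weight 7 with edges A–C, C–B, B–E, E–D.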
##### Implementation * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/prim) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/prim) -#### Kruskal's Algorithm +*Implementation tip:* +For **dense graphs** ($E \approx V^2$), skip heaps: store `key` in an array and, at each step, scan all non-MST vertices to pick the minimum `key` in $O(V)$. Overall $O(V^2)$ but often **faster in practice** on dense inputs due to low overhead. -- **Kruskal's Algorithm** is used to find a **minimum spanning tree (MST)** in a connected, undirected graph with weighted edges. -- It **sorts all edges** from smallest to largest by weight. -- It **adds edges** one by one to the MST if they do not form a cycle. -- **Cycle detection** is managed by a **disjoint-set** (union-find) data structure, which helps quickly determine if two vertices belong to the same connected component. -- If adding an edge connects two different components, it is safe to include; if both vertices are already in the same component, including that edge would create a cycle and is skipped. -- The process continues until the MST has **V-1** edges, where **V** is the number of vertices. -- Its time complexity is **O(E \log E)**, dominated by sorting the edges, while union-find operations typically take near-constant time (**O(α(V))**, where α is the inverse Ackermann function). -##### Algorithm Steps +#### Kruskal's Algorithm +#### Kruskal’s Algorithm -**Input** +Kruskal’s algorithm builds a **minimum spanning tree (MST)** for a **weighted, undirected** graph by sorting all edges by weight (lightest first) and repeatedly adding the next lightest edge that **does not create a cycle**. It grows the MST as a forest of trees that gradually merges until all vertices are connected. -- A connected, undirected graph with weighted edges +To efficiently keep track of the construction, Kruskal’s algorithm employs two primary data structures: -**Output** +* A **sorted edge list** (ascending by weight) that drives which edge to consider next. +* A **Disjoint Set Union (DSU)**, also called **Union–Find**, to detect whether an edge’s endpoints are already in the same tree (cycle) or in different trees (safe to unite). -- A subset of edges forming a MST, ensuring all vertices are connected with no cycles and minimal total weight +*Useful additions in practice:* -**Containers and Data Structures** +* **Union–Find with path compression** + **union by rank/size** for near-constant-time merges and finds. +* **Early stop**: in a connected graph with $V$ vertices, once you’ve added **$V-1$** edges, the MST is complete. +* **Deterministic tie-breaking**: when equal weights occur, break ties consistently for reproducible MSTs. +* **Disconnected graphs**: Kruskal naturally yields a **minimum spanning forest** (one MST per component). -- A list or priority queue to sort the edges by weight -- A `disjoint-set (union-find)` structure to manage and merge connected components +**Algorithm Steps** -**Steps** +1. Gather all edges $E=\{(u,v,w)\}$ and **sort** them by weight $w$ (ascending). -I. Sort all edges in increasing order of their weights +2. Initialize **DSU** with each vertex in its **own set**; `parent[v]=v`, `rank[v]=0`. -II. Initialize a forest where each vertex is its own tree +3. Traverse the sorted edges one by one: -III. Iterate through the sorted edges + 1. For edge $(u,v,w)$, compute `ru = find(u)`, `rv = find(v)` in DSU. + 2. 
If `ru ≠ rv` (endpoints in **different** sets), **add** $(u,v,w)$ to the MST and **union** the sets. + 3. Otherwise, **skip** the edge (it would create a cycle). -- If the edge `(u, v)` connects two different components, include it in the MST and perform a `union` of the sets -- If it connects vertices in the same component, skip it +4. Stop when either **$V-1$** edges are chosen (connected case) or edges are exhausted (forest case). -IV. Once `V-1` edges have been added, the MST is complete +5. The chosen edges form the **MST**; the **total weight** is the sum of their weights. -##### Step by Step Example +By the **cycle** and **cut** properties of MSTs, selecting the minimum-weight edge that crosses any cut between components is always safe; rejecting edges that close a cycle preserves optimality. -Consider a graph with vertices **A**, **B**, **C**, **D**, and **E**. The weighted edges are: +*Reference pseudocode (edge list + DSU):* ``` -A-B: 2 -A-C: 3 -B-D: 1 -B-E: 3 -C-D: 4 -C-E: 5 -D-E: 2 +Kruskal(V, E): + # V: iterable of vertices + # E: list of edges (u, v, w) for undirected graph + + sort E by weight ascending + + make_set(v) for v in V # DSU init: parent[v]=v, rank[v]=0 + + mst_edges = [] + total = 0 + + for (u, v, w) in E: + if find(u) != find(v): + union(u, v) + mst_edges.append((u, v, w)) + total += w + if len(mst_edges) == len(V) - 1: # early stop if connected + break + + return mst_edges, total + +# Union-Find helpers (path compression + union by rank): +find(x): + if parent[x] != x: + parent[x] = find(parent[x]) + return parent[x] + +union(x, y): + rx, ry = find(x), find(y) + if rx == ry: return + if rank[rx] < rank[ry]: + parent[rx] = ry + elif rank[rx] > rank[ry]: + parent[ry] = rx + else: + parent[ry] = rx + rank[rx] += 1 ``` -The adjacency matrix (∞ indicates no direct edge): +*Sanity notes:* -| | A | B | C | D | E | -|---|---|---|---|---|---| -| **A** | 0 | 2 | 3 | ∞ | ∞ | -| **B** | 2 | 0 | ∞ | 1 | 3 | -| **C** | 3 | ∞ | 0 | 4 | 5 | -| **D** | ∞ | 1 | 4 | 0 | 2 | -| **E** | ∞ | 3 | 5 | 2 | 0 | +* **Time:** Sorting dominates: $O(E \log E)$ = $O(E \log V)$. DSU operations are almost $O(1)$ amortized (inverse Ackermann). +* **Space:** $O(V)$ for DSU; $O(E)$ to store edges. +* **Weights:** May be **negative or positive** (unlike Dijkstra); graph must be **undirected**. +* **Uniqueness:** If all edge weights are **distinct**, the MST is **unique**. -**Sort edges** by weight: +**Example** + +Undirected, weighted graph (we’ll draw the key edges clearly and list the rest). +Start with all vertices as separate sets: `{A} {B} {C} {D} {E} {F}`. ``` -B-D: 1 -A-B: 2 -D-E: 2 -A-C: 3 -B-E: 3 -C-D: 4 -C-E: 5 +Top row: A────────4────────B────────2────────C + │ │ + │ │ + 7 3 + │ │ +Bottom row: F────────1────────E───┴──────────────D + (E–F) +Other edges (not all drawn to keep the picture clean): +A–C(4), B–D(5), C–D(5), C–E(5), D–E(6), D–F(2) ``` -1. **Pick B-D (1)**: Include it. MST has {B-D}, weight = 1. -2. **Pick A-B (2)**: Include it. MST has {B-D, A-B}, weight = 3. -3. **Pick D-E (2)**: Include it. MST has {B-D, A-B, D-E}, weight = 5. -4. **Pick A-C (3)**: Include it. MST has {B-D, A-B, D-E, A-C}, weight = 8. -5. **Pick B-E (3)**: Would form a cycle (B, D, E already connected), skip. -6. **Pick C-D (4)**: Would form a cycle (C, D already connected), skip. -7. **Pick C-E (5)**: Would form a cycle as well, skip. 
+*Sorted edge list (ascending):* +`E–F(1), B–C(2), D–F(2), B–E(3), A–B(4), A–C(4), B–D(5), C–D(5), C–E(5), D–E(6), A–F(7)` + +*Union–Find / MST evolution (take the edge if it connects different sets):* + +``` +Step | Edge (w) | Find(u), Find(v) | Action | Components after union | MST so far | Total +-----+-----------+-------------------+------------+-----------------------------------+----------------------------+------ + 1 | E–F (1) | {E}, {F} | TAKE | {E,F} {A} {B} {C} {D} | [E–F(1)] | 1 + 2 | B–C (2) | {B}, {C} | TAKE | {E,F} {B,C} {A} {D} | [E–F(1), B–C(2)] | 3 + 3 | D–F (2) | {D}, {E,F} | TAKE | {B,C} {D,E,F} {A} | [E–F(1), B–C(2), D–F(2)] | 5 + 4 | B–E (3) | {B,C}, {D,E,F} | TAKE | {A} {B,C,D,E,F} | [..., B–E(3)] | 8 + 5 | A–B (4) | {A}, {B,C,D,E,F} | TAKE | {A,B,C,D,E,F} (all connected) | [..., A–B(4)] | 12 + | (stop: we have V−1 = 5 edges for 6 vertices) +``` -The MST edges are **B-D, A-B, D-E, and A-C**, total weight = **8**. +*Resulting MST edges and weight:* -##### Special Characteristics +``` +E–F(1), B–C(2), D–F(2), B–E(3), A–B(4) ⇒ Total = 1 + 2 + 2 + 3 + 4 = 12 +``` -- It always picks the **smallest available edge** that won't create a cycle. -- In case of a **tie**, any equally weighted edge can be chosen. -- The approach is particularly efficient for **sparse graphs**. -- Sorting edges takes **O(E \log E)** time, and disjoint-set operations can be considered almost **O(1)** on average. +*Clean MST view (tree edges only):* -##### Applications +``` +A +└── B (4) + ├── C (2) + └── E (3) + └── F (1) + └── D (2) +``` -- **Network design**: Connecting servers or cities using minimal cable length. -- **Infrastructure**: Building road systems, water lines, or power grids with the smallest total cost. -- **Any MST requirement**: Ensuring connectivity among all nodes at minimum cost. +**Applications** -##### Implementation +1. **Network design:** least-cost backbone (roads, fiber, pipes) connecting all sites with minimal total length/cost. +2. **Clustering (single-linkage):** build MST, then cut the **k−1** heaviest edges to form **k** clusters. +3. **Image segmentation:** graph-based grouping by intensity/feature differences via MST. +4. **Approximation for metric TSP:** preorder walk of MST gives a 2-approx tour (with shortcutting). +5. **Circuit/VLSI layout:** minimal interconnect under simple models. +6. **Maze generation:** randomized Kruskal picks edges in random order subject to acyclicity. + +**Implementation** * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/kruskal) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/kruskal) + +*Implementation tip:* +On huge graphs that **stream from disk**, you can **external-sort** edges by weight, then perform a single pass with DSU. For reproducibility across platforms, **stabilize** sorting by `(weight, min(u,v), max(u,v))`. + +### Topological Sort + +Topological sort orders the vertices of a **directed acyclic graph (DAG)** so that **every directed edge** $u \rightarrow v$ goes **from left to right** in the order (i.e., $u$ appears before $v$). It’s the canonical tool for scheduling tasks with dependencies. + +To efficiently keep track of the process (Kahn’s algorithm), we use: + +* A **queue** (or min-heap if you want lexicographically smallest order) holding all vertices with **indegree = 0** (no unmet prerequisites). +* An **`indegree` map/array** that counts for each vertex how many prerequisites remain. 
+* An **`order` list** to append vertices as they are “emitted.” + +*Useful additions in practice:* + +* A **`visited_count`** (or length of `order`) to detect cycles: if, after processing, fewer than $V$ vertices were output, the graph has a cycle. +* A **min-heap** instead of a FIFO queue to get the **lexicographically smallest** valid topological order. +* A **DFS-based alternative**: run DFS and take vertices in **reverse postorder** (also $O(V+E)$); with DFS you detect cycles via a 3-color/stack state. + +**Algorithm Steps (Kahn’s algorithm)** + +1. Compute `indegree[v]` for every vertex $v$. + +2. Initialize a queue `Q` with **all** vertices of indegree 0. + +3. While `Q` is not empty: + + 1. **Dequeue** a vertex `u` and append it to `order`. + 2. For each outgoing edge `u → v`: + + * Decrement `indegree[v]` by 1. + * If `indegree[v]` becomes 0, **enqueue** `v`. + +4. If `len(order) < V`, a **cycle exists** (topological order does not exist). Otherwise, `order` is a valid topological ordering. + +*Reference pseudocode (adjacency-list graph):* + +``` +TopoSort_Kahn(G): + # G[u] = iterable of neighbors v with edge u -> v + V = all_vertices(G) + indeg = {v: 0 for v in V} + for u in V: + for v in G[u]: + indeg[v] += 1 + + Q = Queue() + for v in V: + if indeg[v] == 0: + Q.enqueue(v) + + order = [] + + while not Q.empty(): + u = Q.dequeue() + order.append(u) + for v in G[u]: + indeg[v] -= 1 + if indeg[v] == 0: + Q.enqueue(v) + + if len(order) != len(V): + return None # cycle detected + return order +``` + +*Sanity notes:* + +* **Time:** $O(V + E)$ — each vertex enqueued once; each edge decreases an indegree once. +* **Space:** $O(V)$ — for indegrees, queue, and output. +* **Input:** Must be a **DAG**; if a cycle exists, **no** topological order exists. + +**Example** + +DAG; we’ll start with all indegree-0 vertices. (Edges shown as arrows.) + +``` + ┌───────┐ + │ A │ + └───┬───┘ + │ + │ + ┌───────┐ ┌───▼───┐ ┌───────┐ + │ B │──────────│ C │──────────│ D │ + └───┬───┘ └───┬───┘ └───┬───┘ + │ │ │ + │ │ │ + │ ┌───▼───┐ │ + │ │ E │──────────────┘ + │ └───┬───┘ + │ │ + │ │ + ┌───▼───┐ ┌───▼───┐ + │ G │ │ F │ + └───────┘ └───────┘ + +Edges: +A→C, B→C, C→D, C→E, E→D, B→G +``` + +*Initial indegrees:* + +``` +indeg[A]=0, indeg[B]=0, indeg[C]=2, indeg[D]=2, indeg[E]=1, indeg[F]=0, indeg[G]=1 +``` + +*Queue/Indegree evolution (front → back; assume we keep the queue **lexicographically** by using a min-heap):* + +``` +Step | Pop u | Emit order | Decrease indeg[...] | Newly 0 → Enqueue | Q after +-----+-------+--------------------+------------------------------+-------------------+----------------- +0 | — | [] | — | A, B, F | [A, B, F] +1 | A | [A] | C:2→1 | — | [B, F] +2 | B | [A, B] | C:1→0, G:1→0 | C, G | [C, F, G] +3 | C | [A, B, C] | D:2→1, E:1→0 | E | [E, F, G] +4 | E | [A, B, C, E] | D:1→0 | D | [D, F, G] +5 | D | [A, B, C, E, D] | — | — | [F, G] +6 | F | [A, B, C, E, D, F] | — | — | [G] +7 | G | [A, B, C, E, D, F, G] | — | — | [] +``` + +*A valid topological order:* +`A, B, C, E, D, F, G` (others like `B, A, C, E, D, F, G` are also valid.) + +*Clean left-to-right view (one possible ordering):* + +``` +A B F C E D G +│ │ │ │ │ +└──►└──► └──►└──►└──► (all arrows go left→right) +``` + +**Cycle detection (why it fails on cycles)** + +If there’s a cycle, some vertices **never** reach indegree 0. 
Example: + +``` + ┌─────┐ ┌─────┐ + │ X │ ───► │ Y │ + └──┬──┘ └──┬──┘ + └───────────►┘ + (Y ───► X creates a cycle) +``` + +Here `indeg[X]=indeg[Y]=1` initially; `Q` starts empty ⇒ `order=[]` and `len(order) < V` ⇒ **cycle reported**. + +**Applications** + +1. **Build systems / compilation** (compile a file only after its prerequisites). +2. **Course scheduling** (take courses in an order respecting prerequisites). +3. **Data pipelines / DAG workflows** (Airflow, Spark DAGs): execute stages when inputs are ready. +4. **Dependency resolution** (package managers, container layers). +5. **Dynamic programming on DAGs** (longest/shortest path, path counting) by processing vertices in topological order. +6. **Circuit evaluation / spreadsheets** (evaluate cells/nets after their dependencies). + +**Implementation** + +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/topological_sort) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/topological_sort/kruskal) + +*Implementation tips:* + +* Use a **deque** for FIFO behavior; use a **min-heap** to get the **lexicographically smallest** topological order. +* When the graph is large and sparse, store adjacency as **lists** and compute indegrees in one pass for $O(V+E)$. +* **DFS variant** (brief): color states `0=unseen,1=visiting,2=done`; on exploring `u`, mark `1`; DFS to neighbors; if you see `1` again, there’s a cycle; on finish, push `u` to a stack. Reverse the stack for the order. From 069846f6609fa5c4bb95c72991a367356ee2e0ec Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:01:17 +0200 Subject: [PATCH 33/48] Update graphs.md --- notes/graphs.md | 699 +++++++++++++++++++++++++----------------------- 1 file changed, 366 insertions(+), 333 deletions(-) diff --git a/notes/graphs.md b/notes/graphs.md index 912de5d..26f997e 100644 --- a/notes/graphs.md +++ b/notes/graphs.md @@ -56,21 +56,29 @@ Graph theory has its own language, full of terms that make it easier to talk abo ### Representation of Graphs in Computer Memory -Graphs, with their versatile applications in numerous domains, necessitate efficient storage and manipulation mechanisms in computer memory. The choice of representation often depends on the graph's characteristics, such as sparsity, and the specific operations to be performed. Among the various methods available, the adjacency matrix and the adjacency list are the most prevalent. +Graphs, with their versatile applications in numerous domains, necessitate efficient storage and manipulation mechanisms in computer memory. The choice of representation often depends on the graph's characteristics (e.g., dense vs. sparse, directed vs. undirected, weighted vs. unweighted) and the specific operations to be performed. Among the various methods available, the adjacency matrix and the adjacency list are the most prevalent. #### Adjacency Matrix -An adjacency matrix represents a graph $G$ as a two-dimensional matrix. Given $V$ vertices, it utilizes a $V \times V$ matrix $A$. The rows and columns correspond to the graph's vertices, and each cell $A_{ij}$ holds: +An adjacency matrix represents a graph $G$ with $V$ vertices as a two-dimensional matrix $A$ of size $V \times V$. 
The rows and columns correspond to vertices, and each cell $A_{ij}$ holds: -- `1` if there is an edge between vertex $i$ and vertex $j$ -- `0` if no such edge exists +* `1` if there is an edge between vertex $i$ and vertex $j$ (or specifically $i \to j$ in a directed graph) +* `0` if no such edge exists +* For weighted graphs, $A_{ij}$ contains the **weight** of the edge; often `0` or `∞` (or `None`) indicates “no edge” -For graphs with edge weights, $A_{ij}$ contains the weight of the edge between vertices $i$ and $j$. +**Same graph used throughout (undirected 4-cycle A–B–C–D–A):** -Example: +``` + (A)------(B) + | | + | | + (D)------(C) +``` + +**Matrix (table form):** | | A | B | C | D | -|---|---|---|---|---| +| - | - | - | - | - | | A | 0 | 1 | 0 | 1 | | B | 1 | 0 | 1 | 0 | | C | 0 | 1 | 0 | 1 | @@ -78,21 +86,55 @@ Example: Here, the matrix indicates a graph with vertices A to D. For instance, vertex A connects with vertices B and D, hence the respective 1s in the matrix. -**Benefits**: +**Matrix (large ASCII layout):** + +``` + Columns → + A B C D + +---+---+---+---+ +Row A | 0 | 1 | 0 | 1 | +↓ B | 1 | 0 | 1 | 0 | + C | 0 | 1 | 0 | 1 | + D | 1 | 0 | 1 | 0 | + +---+---+---+---+ +``` + +**Notes & Variants** + +* When an *undirected graph* is represented, the adjacency matrix is symmetric because the connection from node $i$ to node $j$ also implies a connection from node $j$ to node $i$; if this property is omitted, the matrix will misrepresent mutual relationships, such as a road existing in both directions between two cities. +* In the case of a *directed graph*, the adjacency matrix does not need to be symmetric since an edge from node $i$ to node $j$ does not guarantee a reverse edge; without this rule, one might incorrectly assume bidirectional links, such as mistakenly treating a one-way street as two-way. +* A *self-loop* appears as a nonzero entry on the diagonal of the adjacency matrix, indicating that a node is connected to itself; if ignored, the representation will overlook scenarios like a website containing a hyperlink to its own homepage. + +**Benefits** + +* An *edge existence check* in an adjacency matrix takes constant time $O(1)$ because the presence of an edge is determined by directly inspecting a single cell; if this property is absent, the lookup could require scanning a list, as in adjacency list representations where finding whether two cities are directly connected may take longer. +* With *simple, compact indexing*, the adjacency matrix aligns well with array-based structures, which makes it helpful for GPU optimizations or bitset operations; without this feature, algorithms relying on linear algebra techniques, such as computing paths with matrix multiplication, become less efficient. + +**Drawbacks** + +* The *space* requirement of an adjacency matrix is always $O(V^2)$, meaning memory usage grows with the square of the number of vertices even if only a few edges exist; if this property is overlooked, sparse networks such as social graphs with millions of users but relatively few connections will be stored inefficiently. +* For *neighbor iteration*, each vertex requires $O(V)$ time because the entire row of the matrix must be scanned to identify adjacent nodes; without recognizing this cost, tasks like finding all friends of a single user in a large social network could become unnecessarily slow. + +**Common Operations (Adjacency Matrix)** -- Fixed-time ( $O(1)$ ) edge existence checks. 
-- Particularly suitable for dense graphs, where the edge-to-vertex ratio is high. +| Operation | Time | +| ---------------------------------- | -------- | +| Check if edge $u\leftrightarrow v$ | $O(1)$ | +| Add/remove edge | $O(1)$ | +| Iterate neighbors of $u$ | $O(V)$ | +| Compute degree of $u$ (undirected) | $O(V)$ | +| Traverse all edges | $O(V^2)$ | -**Drawbacks**: +**Space Tips** -- Consumes more space for sparse graphs. -- Traversing neighbors can be slower due to the need to check all vertices. +* Using a *boolean or bitset matrix* allows each adjacency entry to be stored in just one bit, which reduces memory consumption by a factor of eight compared to storing each entry as a byte; if this method is not applied, representing even moderately sized graphs, such as a network of 10,000 nodes, can require far more storage than necessary. +* The approach is most useful when the graph is *dense*, the number of vertices is relatively small, or constant-time edge queries are the primary operation; without these conditions, such as in a sparse graph with millions of vertices, the $V^2$ bit requirement remains wasteful and alternative representations like adjacency lists become more beneficial. #### Adjacency List -An adjacency list uses a collection (often an array or a linked list) to catalog the neighbors of each vertex. Each vertex points to its own list, enumerating its direct neighbors. +An adjacency list stores, for each vertex, the list of its neighbors. It’s usually implemented as an array/vector of lists (or vectors), hash sets, or linked structures. For weighted graphs, each neighbor entry also stores the weight. -Example: +**Same graph (A–B–C–D–A) as lists:** ``` A -> [B, D] @@ -101,96 +143,168 @@ C -> [B, D] D -> [A, C] ``` -This list reflects the same graph as our matrix example. Vertex A's neighbors, for instance, are B and D. +**“In-memory” view (array of heads + per-vertex chains):** -**Benefits**: +``` +Vertices (index) → 0 1 2 3 +Names [ A ] [ B ] [ C ] [ D ] + | | | | + v v v v +A-list: head -> [B] -> [D] -> NULL +B-list: head -> [A] -> [C] -> NULL +C-list: head -> [B] -> [D] -> NULL +D-list: head -> [A] -> [C] -> NULL +``` + +**Variants & Notes** + +* In an *undirected graph* stored as adjacency lists, each edge is represented twice—once in the list of each endpoint—so that both directions can be traversed easily; if this duplication is omitted, traversing from one node to its neighbor may be possible in one direction but not in the other, as with a friendship relation that should be mutual but is stored only once. +* For a *directed graph*, only out-neighbors are recorded in each vertex’s list, meaning that edges can be followed in their given direction; without a separate structure for in-neighbors, tasks like finding all users who link to a webpage require inefficient scanning of every adjacency list. +* In a *weighted graph*, each adjacency list entry stores both the neighbor and the associated weight, such as $(\text{destination}, \text{distance})$; if weights are not included, algorithms like Dijkstra’s shortest path cannot be applied correctly. +* The *order of neighbors* in adjacency lists may be arbitrary, though keeping them sorted allows faster checks for membership; if left unsorted, testing whether two people are directly connected in a social network could require scanning the entire list rather than performing a quicker search. 
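*Runnable sketch (Python):* a small example that builds both representations for the four-vertex cycle A–B–C–D–A used above, so the trade-offs discussed in this section are easy to try out. Variable names are illustrative assumptions.

```
# Vertices and undirected edges of the 4-cycle used above (A–B–C–D–A)
vertices = ["A", "B", "C", "D"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

index = {v: i for i, v in enumerate(vertices)}

# Adjacency matrix: V x V grid of 0/1, symmetric because the graph is undirected
matrix = [[0] * len(vertices) for _ in vertices]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1

# Adjacency list: each vertex maps to a list of neighbors (each edge stored twice)
adj = {v: [] for v in vertices}
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

print(matrix)      # [[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]]
print(adj["A"])    # ['B', 'D']

# O(1) edge check with the matrix vs. O(deg(u)) scan with the list
print(matrix[index["A"]][index["C"]] == 1)   # False — A and C are not adjacent
print("C" in adj["A"])                       # False
```

The matrix answers the adjacency query with a single index operation, while the list answers it by scanning A's (short) neighbor list.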
+ +**Benefits** + +* The representation is *space-efficient for sparse graphs* because it requires $O(V+E)$ storage, growing only with the number of vertices and edges; without this property, a graph with millions of vertices but relatively few edges, such as a road network, would consume far more memory if stored as a dense matrix. +* For *neighbor iteration*, the time cost is $O(\deg(u))$, since only the actual neighbors of vertex $u$ are examined; if this benefit is absent, each query would need to scan through all possible vertices, as happens in adjacency matrices when identifying a node’s connections. +* In *edge traversals and searches*, adjacency lists support breadth-first search and depth-first search efficiently on sparse graphs because only existing edges are processed; without this design, traversals would involve wasted checks on non-edges, making exploration of large but sparsely connected networks, like airline routes, much slower. -- Space-efficient for sparse graphs, where edges are relatively fewer. -- Facilitates faster traversal of a vertex's neighbors since the direct neighbors are listed without extraneous checks. +**Drawbacks** -**Drawbacks**: +* An *edge existence check* in adjacency lists requires $O(\deg(u))$ time in the worst case because the entire neighbor list may need to be scanned; if a hash set is used for each vertex, the expected time improves to $O(1)$, though at the cost of extra memory and overhead, as seen in fast membership tests within large social networks. +* With respect to *cache locality*, adjacency lists often rely on pointers or scattered memory, which reduces their efficiency on modern hardware; without this drawback, as in dense matrix storage, sequential memory access patterns make repeated operations such as matrix multiplication more beneficial. -- Edge existence checks can take up to $O(V)$ time in the worst case. -- Potentially consumes more space for dense graphs. +**Common Operations (Adjacency List)** + +| Operation | Time (typical) | +| ---------------------------------- | ------------------------------------------- | +| Check if edge $u\leftrightarrow v$ | $O(\deg(u))$ (or expected $O(1)$ with hash) | +| Add edge | Amortized $O(1)$ (append to list(s)) | +| Remove edge | $O(\deg(u))$ (find & delete) | +| Iterate neighbors of $u$ | $O(\deg(u))$ | +| Traverse all edges | $O(V + E)$ | The choice between these (and other) representations often depends on the graph's characteristics and the specific tasks or operations envisioned. +* Choosing an *adjacency matrix* is helpful when the graph is dense, the number of vertices is moderate, and constant-time edge queries or linear-algebra formulations are beneficial; if this choice is ignored, operations such as repeatedly checking flight connections in a fully connected air network may become slower or harder to express mathematically. +* Opting for an *adjacency list* is useful when the graph is sparse or when neighbor traversal dominates, as in breadth-first search or shortest-path algorithms; without this structure, exploring a large but lightly connected road network would waste time scanning nonexistent edges. + +**Hybrids/Alternatives:** + +* With *CSR/CSC (Compressed Sparse Row/Column)* formats, all neighbors of a vertex are stored contiguously in memory, which improves cache locality and enables fast traversals; without this layout, as in basic pointer-based adjacency lists, high-performance analytics on graphs like web link networks would suffer from slower memory access. 
+* An *edge list* stores edges simply as $(u,v)$ pairs, making it convenient for graph input, output, and algorithms like Kruskal’s minimum spanning tree; if used for queries such as checking whether two nodes are adjacent, the lack of structure forces scanning the entire list, which becomes inefficient in large graphs. +* In *hash-based adjacency* structures, each vertex’s neighbor set is managed as a hash table, enabling expected $O(1)$ membership tests; without this tradeoff, checking connections in dense social networks requires linear scans, while the hash-based design accelerates lookups at the cost of extra memory. + ### Planarity -Planarity examines whether a graph can be drawn on a flat surface (a plane) without any of its edges crossing. This idea holds significant importance in areas such as circuit design, urban planning, and geography. +**Planarity** asks: can a graph be drawn on a flat plane so that edges only meet at their endpoints (no crossings)? -#### What is a Planar Graph? +Why it matters: layouts of circuits, road networks, maps, and data visualizations often rely on planar drawings. -A graph is considered **planar** if there exists a representation (also called a drawing) of it on a two-dimensional plane where its edges intersect only at their vertices and nowhere else. Even if a graph is initially drawn with overlaps or crossings, it may still be planar if it is possible to **redraw** (or **rearrange**) it so that no edges intersect in the interior of the drawing. +#### What is a planar graph? -An important theoretical result related to planarity is **Kuratowski’s Theorem**, which states that a graph is planar if and only if it does not contain a subgraph that is a subdivision of either $K_5$ (the complete graph on five vertices) or $K_{3,3}$ (the complete bipartite graph on six vertices, partitioned into sets of three). +A graph is **planar** if it has **some** drawing in the plane with **no edge crossings**. A messy drawing with crossings doesn’t disqualify it—if you can **redraw** it without crossings, it’s planar. -#### Planar Embedding +* A crossing-free drawing of a planar graph is called a **planar embedding** (or **plane graph** once embedded). +* In a planar embedding, the plane is divided into **faces** (regions), including the unbounded **outer face**. -A **planar embedding** refers to a specific way of drawing a graph on a plane so that none of its edges cross each other in the interior. If such a crossing-free drawing exists, the graph is planar. A related fact is **Euler’s Formula** for planar graphs: +**Euler’s Formula (connected planar graphs):** -$$|V| - |E| + |F| = 2$$ +$$ +|V| - |E| + |F| = 2 \quad +\text{(for \(c\) connected components: } |V|-|E|+|F|=1+c) +$$ -where: +#### Kuratowski’s & Wagner’s characterizations + +* According to *Kuratowski’s Theorem*, a graph is planar if and only if it does not contain a subgraph that is a subdivision of $K_5$ or $K_{3,3}$; if this condition is not respected, as in a network with five nodes all mutually connected, the graph cannot be drawn on a plane without edge crossings. +* By *Wagner’s Theorem*, a graph is planar if and only if it has no $K_5$ or $K_{3,3}$ minor, meaning such structures cannot be formed through edge deletions, vertex deletions, or edge contractions; without ruling out these minors, a graph like the complete bipartite structure of three stations each linked to three others cannot be embedded in the plane without overlaps. + +These are equivalent “forbidden pattern” views. 
+ +#### Handy planar edge bounds (quick tests) + +For a **simple** planar graph with $|V|\ge 3$: + +* $|E| \le 3|V| - 6$. +* If the graph is **bipartite**, then $|E| \le 2|V| - 4$. -- $|V|$ is the number of vertices, -- $|E|$ is the number of edges, -- $|F|$ is the number of faces (including the "outer" infinite face). +These give fast non-planarity proofs: + +* $K_5$: $|V|=5, |E|=10 > 3\cdot5-6=9$ ⇒ **non-planar**. +* $K_{3,3}$: $|V|=6, |E|=9 > 2\cdot6-4=8$ ⇒ **non-planar**. #### Examples -I. **Cycle Graphs** +**I. Cycle graphs $C_n$ (always planar)** -Simple cycle graphs (triangles, squares, pentagons, hexagons, etc.) are planar because you can easily draw them without any edges crossing. In the square cycle graph $C_4$ example below, there are no intersecting edges: +A 4-cycle $C_4$: ``` -A-----B -| | -C-----D +A───B +│ │ +D───C ``` -II. **Complete Graph with Four Vertices ($K_4$)** +No crossings; faces: 2 (inside + outside). + +**II. Complete graph on four vertices $K_4$ (planar)** -This graph has every vertex connected to every other vertex. Despite having 6 edges, $K_4$ is planar. Its planar drawing can resemble a tetrahedron (triangular pyramid) flattened onto a plane: +A planar embedding places one vertex inside a triangle: ``` - A - / \ - B---C - \ / - D + A + / \ + B───C + \ / + D ``` -III. **Complete Graph with Five Vertices ($K_5$)** +All edges meet only at vertices; no crossings. -$K_5$ has every one of its five vertices connected to the other four, making a total of 10 edges. This graph is **non-planar**: no matter how you try to arrange the vertices and edges, there will always be at least one pair of edges that must cross. A rough sketch illustrating its inherent crossing is shown below: +**III. Complete graph on five vertices $K_5$ (non-planar)** + +No drawing avoids crossings. Even a “best effort” forces at least one: ``` - A - /|\ - / | \ -B--+--C - \ | / - \|/ - D - | +A───B +│╲ ╱│ +│ ╳ │ (some crossing is unavoidable) +│╱ ╲│ +D───C + \ / E ``` -Attempting to avoid one crossing in $K_5$ inevitably forces another crossing elsewhere, confirming its non-planarity. +The edge bound $10>9$ (above) certifies non-planarity. + +**IV. Complete bipartite $K_{3,3}$ (non-planar)** + +Two sets $\{u_1,u_2,u_3\}$ and $\{v_1,v_2,v_3\}$, all cross-set pairs connected: + +``` +u1 u2 u3 +│ \ │ \ │ \ +│ \ │ \ │ \ +v1───v2───v3 (many edges must cross in the plane) +``` + +The bipartite bound $9>8$ proves non-planarity. + +#### How to check planarity in practice -#### Strategies for Assessing Planarity +**For small graphs** -- The **planarity** of a graph refers to whether it can be drawn on a flat surface without any edges crossing each other. -- **Small graphs** can be tested for planarity by manually rearranging their vertices and edges to check if a crossing-free drawing is possible. -- **Kuratowski's theorem** states that a graph is planar if it does not contain a subgraph that can be transformed into $K_5$ (a graph with five vertices all connected to each other) or $K_{3,3}$ (a graph with two groups of three vertices, where every vertex in one group connects to every vertex in the other). -- **$K_5$** is a complete graph with five vertices where every pair of vertices has a direct edge connecting them. -- **$K_{3,3}$** is a bipartite graph where two sets of three vertices are connected such that each vertex in the first set is connected to all vertices in the second set, with no edges within the same set. 
-- **Wagner’s theorem** provides an alternative way to determine planarity, stating that a graph is planar if it does not have $K_5$ or $K_{3,3}$ as a "minor." A minor is a smaller graph formed by deleting edges, deleting vertices, or merging connected vertices. -- For **larger graphs**, manual testing becomes impractical, and planarity algorithms are often used instead. -- The **Hopcroft-Tarjan algorithm** is a linear-time method for testing planarity. It uses depth-first search to efficiently decide if a graph can be drawn without crossing edges. -- The **Boyer-Myrvold algorithm** is another linear-time approach that tests planarity and can also provide an embedding of the graph (a specific way to draw it without crossings) if it is planar. -- Both **algorithms** are widely used in computer science for applications that involve networks, circuit design, and data visualization, where planarity helps simplify complex structures. +1. Rearrange vertices and try to remove crossings. +2. Look for $K_5$ / $K_{3,3}$ (or their subdivisions/minors). +3. Apply the edge bounds above for quick eliminations. + +**For large graphs (efficient algorithms)** + +* The *Hopcroft–Tarjan* algorithm uses a depth-first search approach to decide planarity in linear time; without such an efficient method, testing whether a circuit layout can be drawn without wire crossings would take longer on large graphs. +* The *Boyer–Myrvold* algorithm also runs in linear time but, in addition to deciding planarity, it produces a planar embedding when one exists; if this feature is absent, as in Hopcroft–Tarjan, a separate procedure would be required to actually construct a drawing of a planar transportation network. + +Both are widely used in graph drawing, EDA (circuit layout), GIS, and network visualization. ### Traversals @@ -228,13 +342,12 @@ To efficiently keep track of the traversal, BFS employs two primary data structu **Algorithm Steps** -1. Begin from a starting vertex, \$i\$. -2. Initialize `visited = {i}`, set `parent[i] = None`, optionally `dist[i] = 0`, and **enqueue** \$i\$ into `queue`. -3. While `queue` is not empty: -1. **Dequeue** the front vertex `u`. -2. For **each neighbor** `v` of `u`: -* If `v` is **not** in `visited`, add `v` to `visited`, set `parent[v] = u` (and `dist[v] = dist[u] + 1` if tracking distances), and **enqueue** `v`. -4. Continue until the queue becomes empty. +1. Pick a start vertex $i$. +2. Set `visited = {i}`, `parent[i] = None`, optionally `dist[i] = 0`, and enqueue $i$ into `queue`. +3. While `queue` is nonempty, repeat steps 4–5. +4. Dequeue the front vertex `u`. +5. For each neighbor `v` of `u`, if `v` is not in `visited`, add it to `visited`, set `parent[v] = u` (and `dist[v] = dist[u] + 1` if tracking), and enqueue `v`. +6. Stop when the queue is empty. Marking nodes as **visited at the moment they are enqueued** (not when dequeued) is crucial: it prevents the same node from being enqueued multiple times in graphs with cycles or multiple incoming edges. @@ -265,9 +378,9 @@ BFS(G, i): *Sanity notes:* -* **Time:** $O(V + E)$ for a graph with $V$ vertices and $E$ edges (each vertex enqueued once; each edge considered once). -* **Space:** $O(V)$ for the queue + visited (+ parent/dist if used). -* BFS order can differ depending on **neighbor iteration order**. 
+* The *time* complexity of breadth-first search is $O(V+E)$ because each vertex is enqueued once and each edge is examined once; if this property is overlooked, one might incorrectly assume that exploring a large social graph requires quadratic time rather than scaling efficiently with its size. +* The *space* requirement is $O(V)$ since the algorithm maintains a queue and a visited array, with optional parent or distance arrays if needed; without accounting for this, applying BFS to a network of millions of nodes could be underestimated in memory cost. +* The order in which BFS visits vertices depends on the *neighbor iteration order*, meaning that traversal results can vary between implementations; if this variation is not recognized, two runs on the same graph—such as exploring a road map—may appear inconsistent even though both are correct BFS traversals. **Example** @@ -314,35 +427,21 @@ Shortest path A→E: backtrack E→C→A ⇒ A - C - E **Applications** -1. **Shortest paths in unweighted graphs.** - BFS computes the minimum number of edges from the source to every reachable node. Use the `parent` map to reconstruct actual paths. - -2. **Connected components (undirected graphs).** - Repeatedly run BFS from every unvisited vertex; each run discovers exactly one component. - -3. **Broadcast/propagation modeling.** - BFS mirrors “wavefront” spread (e.g., message fan-out, infection spread, multi-hop neighborhood queries). - -4. **Cycle detection (undirected graphs).** - During BFS, if you encounter a neighbor that is already **visited** and is **not** the parent of the current vertex, a cycle exists. - *Note:* For **directed graphs**, detecting cycles typically uses other techniques (e.g., DFS with recursion stack or Kahn’s algorithm on indegrees). - -5. **Bipartite testing.** - While BFS’ing, assign alternating “colors” by level; if you ever see an edge connecting the **same** color, the graph isn’t bipartite. - -6. **Multi-source searches.** - Initialize the queue with **several** starting nodes at once (all with `dist=0`). This solves “nearest facility” style problems efficiently. - -7. **Topological sorting via Kahn’s algorithm (DAGs).** - A BFS-like process over vertices of indegree 0 (using a queue) produces a valid topological order for directed acyclic graphs. +* In *shortest path computation on unweighted graphs*, BFS finds the minimum number of edges from a source to all reachable nodes and allows path reconstruction via a parent map; without this approach, one might incorrectly use Dijkstra’s algorithm, which is slower for unweighted networks such as social connections. +* For identifying *connected components in undirected graphs*, BFS is run repeatedly from unvisited vertices, with each traversal discovering one full component; without this method, components in a road map or friendship network may remain undetected. +* When modeling *broadcast or propagation*, BFS naturally mirrors wavefront-like spreading, such as message distribution or infection spread; ignoring this property makes it harder to simulate multi-hop communication in networks. +* During BFS-based *cycle detection in undirected graphs*, encountering a visited neighbor that is not the current vertex’s parent signals a cycle; without this check, cycles in structures like utility grids may be overlooked. 
+* For *bipartite testing*, BFS alternates colors by level, and the appearance of an edge connecting same-colored nodes disproves bipartiteness; without this strategy, verifying whether a task-assignment graph can be split into two groups becomes more complicated. +* In *multi-source searches*, initializing the queue with several start nodes at distance zero allows efficient nearest-facility queries, such as finding the closest hospital from multiple candidate sites; without this, repeated single-source BFS runs would be less efficient. +* In *topological sorting of DAGs*, a BFS-like procedure processes vertices of indegree zero using a queue, producing a valid ordering; without this method, scheduling tasks with dependency constraints may require less efficient recursive DFS approaches. **Implementation** -*Implementation tip:* For dense graphs or when memory locality matters, an adjacency **matrix** can be used, but the usual adjacency **list** representation is more space- and time-efficient for sparse graphs. - * [C++](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/cpp/bfs) * [Python](https://github.com/djeada/Algorithms-And-Data-Structures/tree/master/src/graphs/python/bfs) +*Implementation tip:* For dense graphs or when memory locality matters, an adjacency **matrix** can be used, but the usual adjacency **list** representation is more space- and time-efficient for sparse graphs. + #### Depth-First Search (DFS) Depth-First Search (DFS) is a fundamental graph traversal algorithm that explores **as far as possible** along each branch before backtracking. Starting from a source vertex, it dives down one neighbor, then that neighbor’s neighbor, and so on—only backing up when it runs out of new vertices to visit. @@ -360,13 +459,14 @@ To track the traversal efficiently, DFS typically uses: **Algorithm Steps** -1. Begin at starting vertex $i$. -2. Mark $i$ as **visited**, optionally set `parent[i] = None`, record `tin[i]`. -3. For each neighbor $v$ of the current vertex $u$: - - * If $v$ is **unvisited**, set `parent[v] = u` and **recurse** (or push onto a stack) into $v$. -4. After all neighbors of $u$ are explored, record `tout[u]` and **backtrack** (return or pop). -5. Repeat for any remaining unvisited vertices (to cover disconnected graphs). +1. Pick a start vertex $i$. +2. Initialize `visited[v]=False` for all $v$; optionally set `parent[v]=None`; set a global timer `t=0`. +3. Start a DFS from $i$ (recursive or with an explicit stack). +4. On entry to a vertex $u$: set `visited[u]=True`, record `tin[u]=t++` (and keep `parent[u]=None` if $u=i$). +5. Scan neighbors $v$ of $u$; whenever `visited[v]=False`, set `parent[v]=u` and visit $v$ (recurse/push), then resume scanning $u$’s neighbors. +6. After all neighbors of $u$ are processed, record `tout[u]=t++` and backtrack (return or pop). +7. When the DFS from $i$ finishes, if any vertex remains unvisited, choose one and repeat steps 4–6 to cover disconnected components. +8. Stop when no unvisited vertices remain. Mark vertices **when first discovered** (on entry/push) to prevent infinite loops in cyclic graphs. @@ -430,8 +530,8 @@ DFS_iter(G, i): *Sanity notes:* -* **Time:** $O(V + E)$ — each vertex/edge handled a constant number of times. -* **Space:** $O(V)$ — visited + recursion/stack. Worst-case recursion depth can reach $V$; use the iterative form on very deep graphs. 
+* The *time* complexity of DFS is $O(V+E)$ because every vertex and edge is processed a constant number of times; if this property is ignored, one might incorrectly assume exponential growth when analyzing networks like citation graphs. +* The *space* complexity is $O(V)$, coming from the visited array and the recursion stack (or an explicit stack in iterative form); without recognizing this, applying DFS to very deep structures such as long linked lists could risk stack overflow unless the iterative approach is used. **Example** @@ -516,30 +616,14 @@ A **Applications** -1. **Path existence & reconstruction.** - Use `parent` to backtrack from a target to the start after a DFS that finds it. - -2. **Topological sorting (DAGs).** - Run DFS on a directed acyclic graph; the **reverse postorder** (vertices sorted by decreasing `tout`) is a valid topological order. - -3. **Cycle detection.** - *Undirected:* seeing a visited neighbor that isn’t the parent ⇒ cycle. - *Directed:* maintain states (`unvisited`, `in_stack`, `done`); encountering an edge to a vertex **in\_stack** (a back edge) ⇒ cycle. - -4. **Connected components (undirected).** - Run DFS from every unvisited node; each run discovers exactly one component. - -5. **Bridges & articulation points (cut vertices).** - Using DFS **low-link** values (`low[u] = min(tin[u], tin[v] over back edges, low of children)`), you can find edges whose removal disconnects the graph (bridges) and vertices whose removal increases components (articulation points). - -6. **Strongly Connected Components (SCCs, directed graphs).** - Tarjan’s (single-pass with a stack and low-link) or Kosaraju’s (two DFS passes) algorithms are built on DFS. - -7. **Backtracking & search in state spaces.** - Classic for maze solving, puzzles (N-Queens, Sudoku), and constraint satisfaction: DFS systematically explores choices and backtracks on dead ends. - -8. **Detecting and classifying edges (directed).** - With timestamps, classify edges as **tree**, **back**, **forward**, or **cross**—useful for reasoning about structure and correctness. +* In *path existence and reconstruction*, DFS records parent links so that after reaching a target node, the path can be backtracked to the source; without this, finding an explicit route through a maze-like graph would require re-running the search. +* For *topological sorting of DAGs*, running DFS and outputting vertices in reverse postorder yields a valid order; if this step is omitted, dependencies in workflows such as build systems cannot be properly sequenced. +* During *cycle detection*, DFS in undirected graphs reports a cycle when a visited neighbor is not the parent, while in directed graphs the discovery of a back edge to an in-stack node reveals a cycle; without these checks, feedback loops in control systems or task dependencies may go unnoticed. +* To identify *connected components in undirected graphs*, DFS is launched from every unvisited vertex, with each traversal discovering one component; without this method, clusters in social or biological networks remain hidden. +* Using *low-link values* in DFS enables detection of bridges (edges whose removal disconnects the graph) and articulation points (vertices whose removal increases components); if these are not identified, critical links in communication or power networks may be overlooked. 
+* In *strongly connected components* of directed graphs, algorithms like Tarjan’s and Kosaraju’s use DFS to group vertices where every node is reachable from every other; ignoring this method prevents reliable partitioning of web link graphs or citation networks. +* For *backtracking and state-space search*, DFS systematically explores decision trees and reverses when hitting dead ends, as in solving puzzles like Sudoku or N-Queens; without DFS, these problems would be approached less efficiently with blind trial-and-error. +* With *edge classification in directed graphs*, DFS timestamps allow edges to be labeled as tree, back, forward, or cross, which helps analyze structure and correctness; without this classification, reasoning about graph algorithms such as detecting cycles or proving properties becomes more difficult. **Implementation** @@ -567,28 +651,22 @@ To efficiently keep track of the traversal, Dijkstra’s algorithm employs two p *Useful additions in practice:* -* A **target-aware early stop**: if you only need the distance to a specific target, you can stop when that target is popped from the priority queue. -* **Decrease-key or lazy insertion**: if the PQ doesn’t support decrease-key, push an updated entry and ignore popped stale ones by checking against `dist`. -* Optional **`pred` lists** for counting shortest paths or reconstructing multiple optimal routes. +* A *target-aware early stop* allows Dijkstra’s algorithm to halt once the target vertex is extracted from the priority queue, saving work compared to continuing until all distances are finalized; without this optimization, computing the shortest route between two cities would require processing the entire network unnecessarily. +* With *decrease-key or lazy insertion* strategies, priority queues that lack a decrease-key operation can still work by inserting updated entries and discarding outdated ones when popped; without this adjustment, distance updates in large road networks would be inefficient or require a more complex data structure. +* Adding optional *predecessor lists* enables reconstruction of multiple optimal paths or counting the number of shortest routes; if these lists are not maintained, applications like enumerating all equally fast routes between transit stations cannot be supported. **Algorithm Steps** -1. Begin from a starting vertex, \$i\$. - -2. Initialize `dist[i] = 0`, `parent[i] = None`; for all other vertices `v`, set `dist[v] = ∞`. Push \$i\$ into the min-priority queue keyed by `dist[i]`. - -3. While the priority queue is not empty: - - 1. **Extract** the vertex `u` with the **smallest** `dist[u]`. - 2. If `u` is already finalized, continue; otherwise **finalize** `u` (add to `visited`/`finalized`). - 3. For **each neighbor** `v` of `u` with edge weight `w(u,v) ≥ 0`: - - * If `dist[u] + w(u,v) < dist[v]`, then **relax** the edge: set - `dist[v] = dist[u] + w(u,v)` and `parent[v] = u`, and **push** `v` into the PQ keyed by the new `dist[v]`. - -4. Continue until the queue becomes empty (all reachable vertices finalized) or until your **target** has been finalized (early stop). - -5. Reconstruct any shortest path by following `parent[·]` **backwards** from the target to the start. +1. Pick a start vertex $i$. +2. Set `dist[i] = 0` and `parent[i] = None`; for all other vertices $v \ne i$, set `dist[v] = ∞`. +3. Push $i$ into a min-priority queue keyed by `dist[·]`. +4. While the priority queue is nonempty, repeat steps 5–8. +5. Extract the vertex $u$ with the smallest `dist[u]`. +6. 
If $u$ is already finalized, continue to step 4; otherwise mark $u$ as finalized. +7. For each neighbor $v$ of $u$ with edge weight $w(u,v) \ge 0$, test whether `dist[u] + w(u,v) < dist[v]`. +8. If true, set `dist[v] = dist[u] + w(u,v)`, set `parent[v] = u`, and push $v$ into the priority queue keyed by the new `dist[v]`. +9. Stop when the queue is empty (all reachable vertices finalized) or, if you have a target, when that target is finalized. +10. Reconstruct any shortest path by following `parent[·]` backward from the target to $i$. Vertices are **finalized when they are dequeued** (popped) from the priority queue. With **non-negative** weights, once a vertex is popped the recorded `dist` is **provably optimal**. @@ -636,10 +714,10 @@ reconstruct(parent, t): *Sanity notes:* -* **Time:** with a binary heap, \$O((V + E)\log V)\$; with a Fibonacci heap, \$O(E + V\log V)\$; with a plain array (no heap), \$O(V^2)\$. -* **Space:** \$O(V)\$ for `dist`, `parent`, PQ bookkeeping. -* **Preconditions:** All edge weights must be **\$\ge 0\$**. Negative edges invalidate correctness. -* **Ordering:** Different neighbor iteration orders don’t affect correctness, only tie behavior/performance. +* The *time* complexity of Dijkstra’s algorithm depends on the priority queue: $O((V+E)\log V)$ with a binary heap, $O(E+V\log V)$ with a Fibonacci heap, and $O(V^2)$ with a plain array; without this distinction, one might wrongly assume that all implementations scale equally on dense versus sparse road networks. +* The *space* complexity is $O(V)$, needed to store distance values, parent pointers, and priority queue bookkeeping; if underestimated, running Dijkstra on very large graphs such as nationwide transit systems may exceed available memory. +* The *precondition* is that all edge weights must be nonnegative, since the algorithm assumes distances only improve as edges are relaxed; if negative weights exist, as in certain financial models with losses, the computed paths can be incorrect and Bellman–Ford must be used instead. +* In terms of *ordering*, the sequence in which neighbors are processed does not affect correctness, only the handling of ties and slight performance differences; without recognizing this, variations in output order between implementations might be mistakenly interpreted as errors. **Example** @@ -702,13 +780,13 @@ Shortest path A→E: A → C → B → E (total cost 4) **Applications** -1. **Single-source shortest paths** on graphs with **non-negative** weights (roads, networks, transit). -2. **Navigation/routing** with early stop: stop when the goal is popped to avoid extra work. -3. **Network planning & QoS:** minimum latency/cost routing, bandwidth-weighted paths (when additive and non-negative). -4. **As a building block:** A\* with $h \equiv 0$; **Johnson’s algorithm** (all-pairs on sparse graphs); **k-shortest paths** variants. -5. **Multi-source Dijkstra:** seed the PQ with multiple starts at distance 0 (e.g., nearest facility / multi-sink problems). -6. **Label-setting baseline** for comparing heuristics (A\*, ALT landmarks, contraction hierarchies). -7. **Grid pathfinding with terrain costs** (non-negative cell costs) when no admissible heuristic is available. +* In *single-source shortest paths* with non-negative edge weights, Dijkstra’s algorithm efficiently finds minimum-cost routes in settings like roads, communication networks, or transit systems; without it, travel times or costs could not be computed reliably when distances vary. 
+* For *navigation and routing*, stopping the search as soon as the destination is extracted from the priority queue avoids unnecessary work; without this early stop, route planning in a road map continues exploring irrelevant regions of the network. +* In *network planning and quality of service (QoS)*, Dijkstra selects minimum-latency or minimum-cost routes when weights are additive and non-negative; without this, designing efficient data or logistics paths becomes more error-prone. +* As a *building block*, Dijkstra underlies algorithms like A\* (with zero heuristic), Johnson’s algorithm for all-pairs shortest paths in sparse graphs, and $k$-shortest path variants; without it, these higher-level methods would lack a reliable core procedure. +* In *multi-source Dijkstra*, initializing the priority queue with several starting nodes at distance zero solves nearest-facility queries, such as finding the closest hospital; without this extension, repeated single-source runs would waste time. +* As a *label-setting baseline*, Dijkstra provides the reference solution against which heuristics like A\*, ALT landmarks, or contraction hierarchies are compared; without this baseline, heuristic correctness and performance cannot be properly evaluated. +* For *grid pathfinding with terrain costs*, Dijkstra handles non-negative cell costs when no admissible heuristic is available; without it, finding a least-effort path across weighted terrain would require less efficient exhaustive search. **Implementation** @@ -717,8 +795,6 @@ Shortest path A→E: A → C → B → E (total cost 4) *Implementation tip:* If your PQ has no decrease-key, **push duplicates** on improvement and, when popping a vertex, **skip it** if it’s already finalized or if the popped key doesn’t match `dist[u]`. This “lazy” approach is simple and fast in practice. -#### Bellman-Ford Algorithm - #### Bellman–Ford Algorithm Bellman–Ford computes **shortest paths** from a start vertex in graphs that may have **negative edge weights** (but no negative cycles reachable from the start). It works by repeatedly **relaxing** every edge; each full pass can reduce some distances until they stabilize. A final check detects **negative cycles**: if an edge can still be relaxed after $(V-1)$ passes, a reachable negative cycle exists. @@ -730,31 +806,18 @@ To efficiently keep track of the computation, Bellman–Ford employs two primary *Useful additions in practice:* -* **Edge list**: iterate edges directly (fast and simple) even if your graph is stored as adjacency lists. -* **Early exit**: stop as soon as a full pass makes **no updates**. -* **Negative-cycle extraction**: if an update occurs on pass $V$, backtrack through `parent` to find a cycle. -* **Reachability guard**: you can skip edges whose source has `dist[u] = ∞` (still unreached). +* With an *edge list*, iterating directly over edges simplifies implementation and keeps updates fast, even if the graph is stored in adjacency lists; without this practice, repeatedly scanning adjacency structures adds unnecessary overhead in each relaxation pass. +* Using an *early exit* allows termination once a full iteration over edges yields no updates, improving efficiency; without this check, the algorithm continues all $V-1$ passes even on graphs like road networks where distances stabilize early. 
+* For *negative-cycle extraction*, if an update still occurs on the $V$-th pass, backtracking through parent links reveals a cycle; without this step, applications such as financial arbitrage detection cannot identify opportunities caused by negative cycles. +* Adding a *reachability guard* skips edges from vertices with infinite distance, avoiding wasted work on unreached nodes; without this filter, the algorithm needlessly inspects irrelevant edges in disconnected parts of the graph. **Algorithm Steps** -1. Begin from a starting vertex, $i$. - -2. Initialize `dist[i] = 0`, `parent[i] = None`; for all other vertices $v$, set `dist[v] = ∞`. - -3. Repeat **$V-1$ passes** (where $V$ is the number of vertices): - - 1. Set `changed = False`. - 2. For **each directed edge** $(u,v,w)$ (weight $w$): - - * If `dist[u] + w < dist[v]`, then **relax** the edge: - `dist[v] = dist[u] + w`, `parent[v] = u`, and set `changed = True`. - 3. If `changed` is **False**, break early (all distances stabilized). - -4. **Negative-cycle detection** (optional but common): - For each edge $(u,v,w)$, if `dist[u] + w < dist[v]`, then a **negative cycle is reachable**. - *To extract a cycle:* follow `parent` from `v` **V times** to land inside the cycle; then keep following until you revisit a vertex, collecting the cycle. - -5. To get a shortest path to a target $t$ (if no negative cycle affects it), follow `parent[t]` backward to $i$. +1. Pick a start vertex $i$. +2. Set `dist[i] = 0` and `parent[i] = None`; for all other vertices $v \ne i$, set `dist[v] = ∞`. +3. Do up to $V-1$ passes: in each pass, scan every directed edge $(u,v,w)$; if `dist[u] + w < dist[v]`, set `dist[v] = dist[u] + w` and `parent[v] = u`. If a full pass makes no changes, stop early. +4. (Optional) Detect negative cycles: if any edge $(u,v,w)$ still satisfies `dist[u] + w < dist[v]`, a reachable negative cycle exists. To extract one, follow `parent` from $v$ for $V$ steps to enter the cycle, then continue until a vertex repeats, collecting the cycle. +5. To get a shortest path to a target $t$ (when no relevant negative cycle exists), follow `parent[t]` backward to $i$. *Reference pseudocode (edge list):* @@ -796,10 +859,10 @@ reconstruct(parent, t): *Sanity notes:* -* **Time:** $O(VE)$ (each pass scans all edges; up to $V-1$ passes). -* **Space:** $O(V)$ for `dist` and `parent`. -* **Handles negative weights**; **detects** reachable negative cycles. -* If a reachable **negative cycle** exists, true shortest paths to vertices it can reach are **undefined** (effectively $-\infty$). +* The *time* complexity of Bellman–Ford is $O(VE)$ because each of the $V-1$ relaxation passes scans all edges; without this understanding, one might underestimate the cost of running it on dense graphs with many edges. +* The *space* complexity is $O(V)$, needed for storing distance estimates and parent pointers; if this is not accounted for, memory use may be underestimated in large-scale applications such as road networks. +* The algorithm *handles negative weights* correctly and can also *detect negative cycles* that are reachable from the source; without this feature, Dijkstra’s algorithm would produce incorrect results on graphs with negative edge costs. +* When a reachable *negative cycle* exists, shortest paths to nodes that can be reached from it are undefined, effectively taking value $-\infty$; without recognizing this, results such as infinitely decreasing profit in arbitrage graphs would be misinterpreted as valid finite paths. 
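To make the relaxation passes concrete, here is a minimal runnable Python sketch following the numbered steps above; the function name `bellman_ford`, the `(u, v, w)` edge-tuple format, and the tiny test graph are illustrative assumptions rather than code from the linked implementations.

```
def bellman_ford(num_vertices, edges, source):
    """Edge-list Bellman-Ford with early exit and a negative-cycle flag.

    edges is a list of (u, v, w) tuples for directed, possibly negative edges.
    Returns (dist, parent, has_negative_cycle).
    """
    INF = float("inf")
    dist = [INF] * num_vertices
    parent = [None] * num_vertices
    dist[source] = 0

    # Up to V-1 relaxation passes; stop early when a full pass changes nothing.
    for _ in range(num_vertices - 1):
        changed = False
        for u, v, w in edges:
            if dist[u] != INF and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                parent[v] = u
                changed = True
        if not changed:
            break

    # One extra scan: any further improvement means a reachable negative cycle.
    has_negative_cycle = any(
        dist[u] != INF and dist[u] + w < dist[v] for u, v, w in edges
    )
    return dist, parent, has_negative_cycle


# Small illustrative graph: one negative edge, no negative cycle.
edges = [(0, 1, 4), (0, 2, 2), (2, 1, -1), (1, 3, 2), (2, 3, 5)]
print(bellman_ford(4, edges, 0))  # ([0, 1, 2, 3], [None, 2, 0, 1], False)
```

The `changed` flag plays the role of the early exit described above, the guard on `dist[u] != INF` is the reachability check, and the final `any(...)` scan is the optional negative-cycle test from step 4.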
**Example** @@ -828,6 +891,7 @@ C → E (3) ``` *Edges list:* + `A→B(4), A→C(2), B→C(-1), B→D(2), C→B(1), C→D(5), C→E(3), D→E(-3)` *Relaxation trace (dist after each full pass; start A):* @@ -867,11 +931,11 @@ Bellman–Ford would perform a $V$-th pass and still find an improvement (e.g., **Applications** -1. **Shortest paths with negative edges** (when Dijkstra/A\* don’t apply). -2. **Arbitrage detection** in currency/markets by summing $\log$ weights along cycles. -3. **Feasibility checks** in difference constraints (systems like $x_v - x_u \le w$). -4. **Robust baseline** for verifying or initializing faster methods (e.g., Johnson’s algorithm for all-pairs). -5. **Graphs with penalties/credits** where some transitions reduce accumulated cost. +* In *shortest path problems with negative edges*, Bellman–Ford is applicable where Dijkstra or A\* fail, such as road networks with toll credits; without this method, these graphs cannot be handled correctly. +* For *arbitrage detection* in currency or financial markets, converting exchange rates into $\log$ weights makes profit loops appear as negative cycles; without Bellman–Ford, such opportunities cannot be systematically identified. +* In solving *difference constraints* of the form $x_v - x_u \leq w$, the algorithm checks feasibility by detecting whether any negative cycles exist; without this check, inconsistent scheduling or timing systems may go unnoticed. +* As a *robust baseline*, Bellman–Ford verifies results of faster algorithms or initializes methods like Johnson’s for all-pairs shortest paths; without it, correctness guarantees in sparse-graph all-pairs problems would be weaker. +* For *graphs with penalties or credits*, where some transitions decrease accumulated cost, Bellman–Ford models these adjustments accurately; without it, such systems—like transport discounts or energy recovery paths—cannot be represented properly. ##### Implementation @@ -897,24 +961,22 @@ If $h$ is **admissible** (never overestimates) and **consistent** (triangle ineq **Core data structures** -* **Open set**: min-priority queue keyed by $f$ (often called `open` or `frontier`). -* **Closed set**: a set (or map) of nodes already expanded (finalized). -* **`g` map**: best known cost-so-far to each node. -* **`parent` map**: to reconstruct the path on success. -* (Optional) **`h` cache** and a **tie-breaker** (e.g., prefer larger $g$ or smaller $h$ when $f$ ties). +* The *open set* is a min-priority queue keyed by the evaluation function $f=g+h$, storing nodes pending expansion; without it, selecting the next most promising state in pathfinding would require inefficient linear scans. +* The *closed set* contains nodes already expanded and finalized, preventing reprocessing; if omitted, the algorithm may revisit the same grid cells or graph states repeatedly, wasting time. +* The *$g$ map* tracks the best known cost-so-far to each node, ensuring paths are only updated when improvements are found; without it, the algorithm cannot correctly accumulate and compare path costs. +* The *parent map* stores predecessors so that a complete path can be reconstructed once the target is reached; if absent, the algorithm would output only a final distance without the actual route. +* An optional *heuristic cache* and *tie-breaker* (such as preferring larger $g$ or smaller $h$ when $f$ ties) can improve efficiency and consistency; without these, the search may expand more nodes than necessary or return different paths under equivalent conditions. **Algorithm Steps** -1. 
Initialize `open = {start}` with `g[start]=0`, `f[start]=h(start)`; `parent[start]=None`. -2. While `open` is not empty: - a. Pop `u` with **smallest** `f(u)` from `open`. - b. If `u` is the **goal**, reconstruct the path via `parent` and return. - c. Add `u` to **closed**. - d. For each neighbor `v` of `u` with edge cost `w(u,v) ≥ 0`: - - * `tentative = g[u] + w(u,v)` - * If `v` not in `g` or `tentative < g[v]`: update `parent[v]=u`, `g[v]=tentative`, `f[v]=g[v]+h(v)` and push `v` into `open` (even if it was there before with a worse key). -3. If `open` empties without reaching the goal, no path exists. +1. Put `start` in `open` (a min-priority queue by `f`); set `g[start]=0`, `f[start]=h(start)`, `parent[start]=None`; initialize `closed = ∅`. +2. While `open` is nonempty, repeat steps 3–7. +3. Pop the node `u` with the smallest `f(u)` from `open`. +4. If `u` is the goal, reconstruct the path by following `parent` back to `start` and return it. +5. Add `u` to `closed`. +6. For each neighbor `v` of `u` with edge cost $w(u,v) \ge 0$, set `tentative = g[u] + w(u,v)`. +7. If `v` not in `g` or `tentative < g[v]`, set `parent[v]=u`, `g[v]=tentative`, `f[v]=g[v]+h(v)`, and push `v` into `open` (even if it was already there with a worse key). +8. If the loop ends because `open` is empty, no path exists. *Mark neighbors **when you enqueue them** (by storing their best `g`) to avoid duplicate work; with **consistent** $h$, any node popped from `open` is final and will not improve later.* @@ -958,9 +1020,9 @@ reconstruct_path(parent, t): *Sanity notes:* -* **Time:** Worst-case exponential; practically much faster with informative $h$. -* **Space:** $O(V)$ for maps + PQ (A\* is memory-hungry). -* **Special cases:** If $h \equiv 0$, A\* ≡ **Dijkstra**. If all edges cost 1 and $h \equiv 0$, it behaves like **BFS**. +* The *time* complexity of A\* is worst-case exponential, though in practice it runs much faster when the heuristic $h$ provides useful guidance; without an informative heuristic, the search can expand nearly the entire graph, as in navigating a large grid without directional hints. +* The *space* complexity is $O(V)$, covering the priority queue and bookkeeping maps, which makes A\* memory-intensive; without recognizing this, applications such as robotics pathfinding may exceed available memory on large maps. +* In *special cases*, A\* reduces to Dijkstra’s algorithm when $h \equiv 0$, and further reduces to BFS when all edges have cost 1 and $h \equiv 0$; without this perspective, one might overlook how A\* generalizes these familiar shortest-path algorithms. **Visual walkthrough (grid with 4-neighborhood, Manhattan $h$)** @@ -1023,53 +1085,50 @@ Step | Popped u | Inserted neighbors (v: g,h,f) | Note (Exact numbers depend on the specific grid and walls; shown for intuition.) ---- - -### Heuristic design +**Heuristic design** For **grids**: -* **4-dir moves:** $h(n)=|x_n-x_g|+|y_n-y_g|$ (Manhattan). -* **8-dir (diag cost √2):** **Octile**: $h=\Delta_{\max} + (\sqrt{2}-1)\Delta_{\min}$. -* **Euclidean** when motion is continuous and diagonal is allowed. +* *4-dir moves:* $h(n)=|x_n-x_g|+|y_n-y_g|$ (Manhattan). +* *8-dir (diag cost √2):* **Octile**: $h=\Delta_{\max} + (\sqrt{2}-1)\Delta_{\min}$. +* *Euclidean* when motion is continuous and diagonal is allowed. For **sliding puzzles (e.g., 8/15-puzzle)**: -* **Misplaced tiles** (admissible, weak). -* **Manhattan sum** (stronger). -* **Linear conflict / pattern databases** (even stronger). +**Misplaced tiles* (admissible, weak). 
+* *Manhattan sum* (stronger). +* *Linear conflict / pattern databases* (even stronger). **Admissible vs. consistent** -* **Admissible:** $h(n) \leq h^\*(n)$ (true remaining cost). Guarantees optimality. -* **Consistent (monotone):** $h(u) \le w(u,v) + h(v)$ for every edge. - Ensures $f$-values are nondecreasing along paths; once a node is popped, its `g` is final (no reopen). +* An *admissible* heuristic satisfies $h(n) \leq h^*(n)$, meaning it never overestimates the true remaining cost, which guarantees that A\* finds an optimal path; without admissibility, the algorithm may return a suboptimal route, such as a longer-than-necessary driving path. +* A *consistent (monotone)* heuristic obeys $h(u) \leq w(u,v) + h(v)$ for every edge, ensuring that $f$-values do not decrease along paths and that once a node is removed from the open set, its $g$-value is final; without consistency, nodes may need to be reopened, increasing complexity in searches like grid navigation. **Applications** -1. **Pathfinding** in maps, games, robotics (shortest or least-risk routes). -2. **Route planning** with road metrics (time, distance, tolls) and constraints. -3. **Planning & scheduling** in AI as a general shortest-path in state spaces. -4. **Puzzle solving** (8-puzzle, Sokoban variants) with domain-specific $h$. -5. **Network optimization** where edge costs are nonnegative and heuristics exist. +* In *pathfinding* for maps, games, and robotics, A\* computes shortest or least-risk routes by combining actual travel cost with heuristic guidance; without it, movement planning in virtual or physical environments becomes slower or less efficient. +* For *route planning* with road metrics such as travel time, distance, or tolls, A\* incorporates these costs and constraints into its evaluation; without heuristic search, navigation systems must fall back to slower methods like plain Dijkstra. +* In *planning and scheduling* tasks, A\* serves as a general shortest-path algorithm in abstract state spaces, supporting AI decision-making; without it, solving resource allocation or task sequencing problems may require less efficient exhaustive search. +* In *puzzle solving* domains such as the 8-puzzle or Sokoban, A\* uses problem-specific heuristics to guide the search efficiently; without heuristics, the state space may grow exponentially and become impractical to explore. +* For *network optimization* problems with nonnegative edge costs, A\* applies whenever a useful heuristic is available to speed convergence; without heuristics, computations on communication or logistics networks may take longer than necessary. **Variants & practical tweaks** -* **Dijkstra** = A\* with $h \equiv 0$. -* **Weighted A\***: use $f = g + \varepsilon h$ ($\varepsilon>1$) for faster, **bounded-suboptimal** search. -* **A\*ε / Anytime A\***: start with $\varepsilon>1$, reduce over time to approach optimal. -* **IDA\***: iterative deepening on $f$-bound; **much lower memory**, sometimes slower. -* **RBFS / Fringe Search**: memory-bounded alternatives. -* **Tie-breaking**: on equal $f$, prefer **larger $g$** (deeper) or **smaller $h$** to reduce node re-expansions. -* **Closed-set policy**: if $h$ is **inconsistent**, allow **reopening** when a better `g` is found. +* Viewing *Dijkstra* as A\* with $h \equiv 0$ shows that A\* generalizes the classic shortest-path algorithm; without this equivalence, the connection between uninformed and heuristic search may be overlooked. 
+* In *Weighted A\**, the evaluation function becomes $f = g + \varepsilon h$ with $\varepsilon > 1$, trading exact optimality for faster performance with bounded suboptimality; without this variant, applications needing quick approximate routing, like logistics planning, would run slower. +* The *A\*ε / Anytime A\** approach begins with $\varepsilon > 1$ for speed and gradually reduces it to converge toward optimal paths; without this strategy, incremental refinement in real-time systems like navigation aids is harder to achieve. +* With *IDA\** (Iterative Deepening A\*), the search is conducted by gradually increasing an $f$-cost threshold, greatly reducing memory usage but sometimes increasing runtime; without it, problems like puzzle solving could exceed memory limits. +* *RBFS and Fringe Search* are memory-bounded alternatives that manage recursion depth or fringe sets more carefully; without these, large state spaces in AI planning can overwhelm storage. +* In *tie-breaking*, preferring larger $g$ or smaller $h$ when $f$ ties reduces unnecessary re-expansions; without careful tie-breaking, searches on uniform-cost grids may explore more nodes than needed. +* For the *closed-set policy*, when heuristics are inconsistent, nodes must be reopened if a better $g$ value is found; without allowing this, the algorithm may miss shorter paths, as in road networks with varying travel times. **Pitfalls & tips** -* **No negative edges.** A\* assumes $w(u,v) \ge 0$. -* **Overestimating $h$** breaks optimality. -* **Precision issues:** with floats, compare $f$ using small epsilons. -* **State hashing:** ensure equal states hash equal (avoid exploding duplicates). -* **Neighbor order:** doesn’t affect optimality, but affects performance/trace aesthetics. +* The algorithm requires *non-negative edge weights* because A\* assumes $w(u,v) \ge 0$; without this, negative costs can cause nodes to be expanded too early, breaking correctness in applications like navigation. +* If the heuristic *overestimates* actual costs, A\* loses its guarantee of optimality; without enforcing admissibility, a routing system may return a path that is faster to compute but longer in distance. +* With *floating-point precision issues*, comparisons of $f$-values should include small epsilons to avoid instability; without this safeguard, two nearly equal paths may lead to inconsistent queue ordering in large-scale searches. +* In *state hashing*, equivalent states must hash identically so duplicates are merged properly; without this, search in puzzles or planning domains may blow up due to treating the same state as multiple distinct ones. +* While *neighbor order* does not affect correctness, it influences performance and the aesthetics of the returned path trace; without considering this, two identical problems might yield very different expansion sequences or outputs. **Implementation** @@ -1088,8 +1147,6 @@ Suppose we have a graph that represents a network of houses. Weights represent t Such a subgraph is called a minimal spanning tree. -#### Prim's Algorithm - #### Prim’s Algorithm Prim’s algorithm builds a **minimum spanning tree (MST)** of a **weighted, undirected** graph by growing a tree from a start vertex. At each step it adds the **cheapest edge** that connects a vertex **inside** the tree to a vertex **outside** the tree. 
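The reason this greedy choice is safe is the standard cut property of minimum spanning trees, stated below for reference; here $S$ denotes the set of vertices already absorbed into the growing tree and $V \setminus S$ the rest (the notation is generic, not specific to these notes).

$$
w(u,v) \;=\; \min_{x \in S,\; y \in V \setminus S} w(x,y), \quad u \in S,\ v \in V \setminus S
\;\;\Longrightarrow\;\;
(u,v) \text{ belongs to some minimum spanning tree of } G.
$$

Because Prim's algorithm always takes the cheapest edge crossing the cut between the current tree and everything else, each edge it adds can be committed to immediately and never has to be reconsidered.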
@@ -1101,29 +1158,20 @@ To efficiently keep track of the construction, Prim’s algorithm employs two pr *Useful additions in practice:* -* A **`key` map** where `key[v]` stores the lightest edge weight found so far that connects `v` to the current tree (∞ initially, except the start which is 0). -* **Lazy updates** if your PQ has no decrease-key: push improved `(v, key[v])` again and skip stale pops. -* **Component handling**: if the graph can be **disconnected**, either run Prim once per component (restarting at an unvisited vertex) or seed the PQ with **multiple starts** (`key=0`) to produce a **spanning forest**. +* A *key map* stores, for each vertex, the lightest edge weight connecting it to the current spanning tree, initialized to infinity except for the starting vertex at zero; without this, Prim’s algorithm cannot efficiently track which edges should be added next to grow the tree. +* With *lazy updates*, when the priority queue lacks a decrease-key operation, improved entries are simply pushed again and outdated ones are skipped upon popping; without this adjustment, priority queues become harder to manage, slowing down minimum spanning tree construction. +* For *component handling*, if the graph is disconnected, Prim’s algorithm must either restart from each unvisited vertex or seed multiple starts with key values of zero to produce a spanning forest; without this, the algorithm would stop after one component, leaving parts of the graph unspanned. **Algorithm Steps** -1. Begin from a starting vertex, \$i\$. - -2. Initialize `key[i] = 0`, `parent[i] = None`; for all other vertices `v`, set `key[v] = ∞`. Push \$i\$ into the min-priority queue keyed by `key`. - -3. While the priority queue is not empty: - - 1. **Extract** the vertex `u` with the **smallest** `key[u]`. - 2. If `u` is already in the MST, continue; otherwise **add `u` to the MST** (insert into `in_mst`). - If `parent[u]` is not `None`, record the tree edge `(parent[u], u)`. - 3. For **each neighbor** `v` of `u` with edge weight `w(u,v)`: - - * If `v` is **not** in the MST **and** `w(u,v) < key[v]`, then **improve** the connection to `v`: - set `key[v] = w(u,v)`, `parent[v] = u`, and **push** `v` into the PQ keyed by the new `key[v]`. - -4. Continue until the queue is empty (or until all vertices are in the MST for a connected graph). - -5. The set of edges `{ (parent[v], v) : v ≠ i }` forms an MST; the MST **total weight** is `∑ key[v]` when each `v` is added. +1. Pick a start vertex $i$. +2. Set `key[i] = 0`, `parent[i] = None`; for all other vertices $v \ne i$, set `key[v] = ∞`; push $i$ into a min-priority queue keyed by `key`. +3. While the priority queue is nonempty, repeat steps 4–6. +4. Extract the vertex $u$ with the smallest `key[u]`. +5. If $u$ is already in the MST, continue; otherwise add $u$ to the MST and, if `parent[u] ≠ None`, record the tree edge `(parent[u], u)`. +6. For each neighbor $v$ of $u$ with weight $w(u,v)$, if $v$ is not in the MST and $w(u,v) < key[v]$, set `key[v] = w(u,v)`, set `parent[v] = u`, and push $v$ into the priority queue keyed by the new `key[v]`. +7. Stop when the queue is empty or when all vertices are in the MST (for a connected graph). +8. The edges $\{(parent[v], v) : v \ne i\}$ form an MST; the MST total weight equals $\sum key[v]$ at the moments when each $v$ is added. Vertices are **finalized when they are dequeued**: at that moment, `key[u]` is the **minimum** cost to connect `u` to the growing tree (by the **cut property**). 
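As a runnable companion to the step list above, here is a compact Python version using `heapq` with lazy deletions; the adjacency-list dictionary format and the name `prim_mst` are assumptions made for this sketch, not the interface of the linked implementations.

```
import heapq

def prim_mst(adj, start):
    """Prim's algorithm on an undirected weighted graph.

    adj maps each vertex to a list of (neighbor, weight) pairs.
    Returns (tree_edges, total_weight) for the component containing start.
    """
    in_mst = set()
    tree_edges = []
    total = 0
    heap = [(0, start, None)]  # (key, vertex, parent); stale entries skipped lazily

    while heap:
        key, u, parent = heapq.heappop(heap)
        if u in in_mst:
            continue  # stale entry: u was already connected more cheaply
        in_mst.add(u)
        total += key
        if parent is not None:
            tree_edges.append((parent, u, key))
        for v, w in adj[u]:
            if v not in in_mst:
                heapq.heappush(heap, (w, v, u))

    return tree_edges, total


# Small illustrative graph (undirected, so every edge is listed twice).
adj = {
    "A": [("B", 2), ("C", 3)],
    "B": [("A", 2), ("C", 1), ("D", 4)],
    "C": [("A", 3), ("B", 1), ("D", 5)],
    "D": [("B", 4), ("C", 5)],
}
print(prim_mst(adj, "A"))  # ([('A', 'B', 2), ('B', 'C', 1), ('B', 'D', 4)], 7)
```

Pushing a fresh heap entry on every improvement and discarding stale pops when a vertex is already in the tree is the lazy-update strategy mentioned above, so no decrease-key operation is needed.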
@@ -1162,12 +1210,10 @@ Prim(G, i): *Sanity notes:* -* **Time:** with a binary heap, $O(E \log V)$; with a Fibonacci heap, $O(E + V \log V)$. - Dense graph (adjacency matrix + no PQ) variant runs in $O(V^2)$. -* **Space:** $O(V)$ for `key`, `parent`, and MST bookkeeping. -* **Graph type:** **weighted, undirected**; weights may be negative or positive (no restriction like Dijkstra). - If the graph is **disconnected**, Prim yields a **minimum spanning forest** (one tree per component). -* **Uniqueness:** If all edge weights are **distinct**, the MST is **unique**. +* The *time* complexity of Prim’s algorithm is $O(E \log V)$ with a binary heap, $O(E + V \log V)$ with a Fibonacci heap, and $O(V^2)$ for the dense-graph adjacency-matrix variant; without knowing this, one might apply the wrong implementation and get poor performance on sparse or dense networks. +* The *space* complexity is $O(V)$, required for storing the key values, parent pointers, and bookkeeping to build the minimum spanning tree; without this allocation, the algorithm cannot track which edges belong to the MST. +* The *graph type* handled is a weighted, undirected graph with no restrictions on edge weights being positive; without this flexibility, graphs with negative costs, such as energy-saving transitions, could not be processed. +* In terms of *uniqueness*, if all edge weights are distinct, the minimum spanning tree is unique; without distinct weights, multiple MSTs may exist, such as in networks where two equally light connections are available. **Example** @@ -1225,12 +1271,12 @@ A **Applications** -1. **Network design** (least-cost wiring/piping/fiber) connecting all sites with minimal total cost. -2. **Approximation for TSP** (metric TSP 2-approx via MST preorder walk). -3. **Clustering (single-linkage)**: remove the **k−1** heaviest edges of the MST to form **k** clusters. -4. **Image processing / segmentation**: MST over pixels/superpixels to find low-contrast boundaries. -5. **Map generalization / simplification**: keep a connectivity backbone with minimal redundancy. -6. **Circuit design / VLSI**: minimal interconnect length under simple models. +* In *network design*, Prim’s or Kruskal’s MST construction connects all sites such as offices, cities, or data centers with the least total cost of wiring, piping, or fiber; without using MSTs, infrastructure plans risk including redundant and more expensive links. +* As an *approximation for the traveling salesman problem (TSP)*, building an MST and performing a preorder walk of it yields a tour within twice the optimal length for metric TSP; without this approach, even approximate solutions for large instances may be much harder to obtain. +* In *clustering with single linkage*, removing the $k-1$ heaviest edges of the MST partitions the graph into $k$ clusters; without this technique, hierarchical clustering may require recomputing pairwise distances repeatedly. +* For *image processing and segmentation*, constructing an MST over pixels or superpixels highlights low-contrast boundaries as cut edges; without MST-based grouping, segmentations may fail to respect natural intensity or color edges. +* In *map generalization and simplification*, the MST preserves a connectivity backbone with minimal redundancy, reducing complexity while maintaining essential routes; without this, simplified maps may show excessive or unnecessary detail. 
+* In *circuit design and VLSI*, MSTs minimize interconnect length under simple wiring models, supporting efficient layouts; without this method, chip designs may consume more area and power due to avoidable wiring overhead. ##### Implementation @@ -1240,39 +1286,32 @@ A *Implementation tip:* For **dense graphs** ($E \approx V^2$), skip heaps: store `key` in an array and, at each step, scan all non-MST vertices to pick the minimum `key` in $O(V)$. Overall $O(V^2)$ but often **faster in practice** on dense inputs due to low overhead. - -#### Kruskal's Algorithm #### Kruskal’s Algorithm Kruskal’s algorithm builds a **minimum spanning tree (MST)** for a **weighted, undirected** graph by sorting all edges by weight (lightest first) and repeatedly adding the next lightest edge that **does not create a cycle**. It grows the MST as a forest of trees that gradually merges until all vertices are connected. To efficiently keep track of the construction, Kruskal’s algorithm employs two primary data structures: -* A **sorted edge list** (ascending by weight) that drives which edge to consider next. -* A **Disjoint Set Union (DSU)**, also called **Union–Find**, to detect whether an edge’s endpoints are already in the same tree (cycle) or in different trees (safe to unite). +* A *sorted edge list* arranged in ascending order of weights ensures that Kruskal’s algorithm always considers the lightest available edge next; without this ordering, the method cannot guarantee that the resulting spanning tree has minimum total weight. +* A *Disjoint Set Union (DSU)*, or Union–Find structure, tracks which vertices belong to the same tree and prevents cycles by only uniting edges from different sets; without this mechanism, the algorithm could inadvertently form cycles instead of building a spanning tree. *Useful additions in practice:* -* **Union–Find with path compression** + **union by rank/size** for near-constant-time merges and finds. -* **Early stop**: in a connected graph with $V$ vertices, once you’ve added **$V-1$** edges, the MST is complete. -* **Deterministic tie-breaking**: when equal weights occur, break ties consistently for reproducible MSTs. -* **Disconnected graphs**: Kruskal naturally yields a **minimum spanning forest** (one MST per component). +* Using *Union–Find with path compression and union by rank/size* enables near-constant-time merge and find operations, making Kruskal’s algorithm efficient; without these optimizations, edge processing in large graphs such as communication networks would slow down significantly. +* Applying an *early stop* allows the algorithm to terminate once $V-1$ edges have been added in a connected graph, since the MST is then complete; without this, unnecessary edges are still considered, adding avoidable work. +* Enforcing *deterministic tie-breaking* ensures that when multiple edges share equal weights, the same MST is consistently produced; without this, repeated runs on the same weighted graph might yield different but equally valid spanning trees, complicating reproducibility. +* On *disconnected graphs*, Kruskal’s algorithm naturally outputs a minimum spanning forest with one tree per component; without this property, handling graphs such as multiple separate road systems would require additional adjustments. **Algorithm Steps** -1. Gather all edges $E=\{(u,v,w)\}$ and **sort** them by weight $w$ (ascending). - -2. Initialize **DSU** with each vertex in its **own set**; `parent[v]=v`, `rank[v]=0`. - -3. Traverse the sorted edges one by one: - - 1. 
For edge $(u,v,w)$, compute `ru = find(u)`, `rv = find(v)` in DSU. - 2. If `ru ≠ rv` (endpoints in **different** sets), **add** $(u,v,w)$ to the MST and **union** the sets. - 3. Otherwise, **skip** the edge (it would create a cycle). - -4. Stop when either **$V-1$** edges are chosen (connected case) or edges are exhausted (forest case). - -5. The chosen edges form the **MST**; the **total weight** is the sum of their weights. +1. Gather all edges $E=\{(u,v,w)\}$ and sort them by weight $w$ in ascending order. +2. Initialize a DSU with each vertex in its own set: `parent[v]=v`, `rank[v]=0`. +3. Traverse the edges in the sorted order. +4. For the current edge $(u,v,w)$, compute `ru = find(u)` and `rv = find(v)` in the DSU. +5. If `ru ≠ rv`, add $(u,v,w)$ to the MST and `union(ru, rv)`. +6. If `ru = rv`, skip the edge (it would create a cycle). +7. Continue until $V-1$ edges have been chosen (connected graph) or until all edges are processed (forest). +8. The chosen edges form the MST; the total weight is the sum of their weights. By the **cycle** and **cut** properties of MSTs, selecting the minimum-weight edge that crosses any cut between components is always safe; rejecting edges that close a cycle preserves optimality. @@ -1320,10 +1359,10 @@ union(x, y): *Sanity notes:* -* **Time:** Sorting dominates: $O(E \log E)$ = $O(E \log V)$. DSU operations are almost $O(1)$ amortized (inverse Ackermann). -* **Space:** $O(V)$ for DSU; $O(E)$ to store edges. -* **Weights:** May be **negative or positive** (unlike Dijkstra); graph must be **undirected**. -* **Uniqueness:** If all edge weights are **distinct**, the MST is **unique**. +* The *time* complexity of Kruskal’s algorithm is dominated by sorting edges, which takes $O(E \log E)$, or equivalently $O(E \log V)$, while DSU operations run in near-constant amortized time; without recognizing this, one might wrongly attribute the main cost to the union–find structure rather than sorting. +* The *space* complexity is $O(V)$ for the DSU arrays and $O(E)$ to store the edges; without this allocation, the algorithm cannot track connectivity or efficiently access candidate edges. +* With respect to *weights*, Kruskal’s algorithm works on undirected graphs with either negative or positive weights; without this flexibility, cases like networks where some connections represent cost reductions could not be handled. +* Regarding *uniqueness*, if all edge weights are distinct, the MST is guaranteed to be unique; without distinct weights, multiple equally valid minimum spanning trees may exist, such as in graphs where two different links have identical costs. **Example** @@ -1377,12 +1416,12 @@ A **Applications** -1. **Network design:** least-cost backbone (roads, fiber, pipes) connecting all sites with minimal total length/cost. -2. **Clustering (single-linkage):** build MST, then cut the **k−1** heaviest edges to form **k** clusters. -3. **Image segmentation:** graph-based grouping by intensity/feature differences via MST. -4. **Approximation for metric TSP:** preorder walk of MST gives a 2-approx tour (with shortcutting). -5. **Circuit/VLSI layout:** minimal interconnect under simple models. -6. **Maze generation:** randomized Kruskal picks edges in random order subject to acyclicity. +* In *network design*, Kruskal’s algorithm builds the least-cost backbone, such as roads, fiber, or pipelines, that connects all sites with minimal total expense; without MST construction, the resulting infrastructure may include redundant and costlier links. 
+* For *clustering with single linkage*, constructing the MST and then removing the $k-1$ heaviest edges partitions the graph into $k$ clusters; without this method, grouping data points into clusters may require repeated and slower distance recalculations. +* In *image segmentation*, applying Kruskal’s algorithm to pixel or superpixel graphs groups regions by intensity or feature similarity through MST formation; without MST-based grouping, boundaries between regions may be less well aligned with natural contrasts. +* As an *approximation for the metric traveling salesman problem*, building an MST and performing a preorder walk (with shortcutting) yields a tour at most twice the optimal length; without this approach, near-optimal solutions would be harder to compute efficiently. +* In *circuit and VLSI layout*, Kruskal’s algorithm finds minimal interconnect length under simplified wiring models; without this, designs may require more area and energy due to unnecessarily long connections. +* For *maze generation*, a randomized Kruskal process selects edges in random order while maintaining acyclicity, producing mazes that remain connected without loops; without this structure, generated mazes could contain cycles or disconnected regions. **Implementation** @@ -1404,25 +1443,19 @@ To efficiently keep track of the process (Kahn’s algorithm), we use: *Useful additions in practice:* -* A **`visited_count`** (or length of `order`) to detect cycles: if, after processing, fewer than $V$ vertices were output, the graph has a cycle. -* A **min-heap** instead of a FIFO queue to get the **lexicographically smallest** valid topological order. -* A **DFS-based alternative**: run DFS and take vertices in **reverse postorder** (also $O(V+E)$); with DFS you detect cycles via a 3-color/stack state. +* Maintaining a *visited count* or tracking the length of the output order lets you detect cycles, since producing fewer than $V$ vertices indicates that some could not be placed due to a cycle; without this check, algorithms like Kahn’s may silently return incomplete results on cyclic task graphs. +* Using a *min-heap* instead of a simple FIFO queue ensures that, among available candidates, the smallest-indexed vertex is always chosen, yielding the lexicographically smallest valid topological order; without this modification, the output order depends on arbitrary queueing, which may vary between runs. +* A *DFS-based alternative* computes a valid topological order by recording vertices in reverse postorder, also in $O(V+E)$ time, while detecting cycles via a three-color marking or recursion stack; without DFS, cycle detection must be handled separately in Kahn’s algorithm. **Algorithm Steps (Kahn’s algorithm)** -1. Compute `indegree[v]` for every vertex $v$. - -2. Initialize a queue `Q` with **all** vertices of indegree 0. - -3. While `Q` is not empty: - - 1. **Dequeue** a vertex `u` and append it to `order`. - 2. For each outgoing edge `u → v`: - - * Decrement `indegree[v]` by 1. - * If `indegree[v]` becomes 0, **enqueue** `v`. - -4. If `len(order) < V`, a **cycle exists** (topological order does not exist). Otherwise, `order` is a valid topological ordering. +1. Compute `indegree[v]` for every vertex $v$; set `order = []`. +2. Initialize a queue `Q` with all vertices of indegree 0. +3. While `Q` is nonempty, repeat steps 4–6. +4. Dequeue a vertex `u` from `Q` and append it to `order`. +5. For each outgoing edge `u → v`, decrement `indegree[v]` by 1. +6. If `indegree[v]` becomes 0, enqueue `v` into `Q`. +7. 
If `len(order) < V` at the end, a cycle exists and no topological order; otherwise `order` is a valid topological ordering. *Reference pseudocode (adjacency-list graph):* @@ -1457,9 +1490,9 @@ TopoSort_Kahn(G): *Sanity notes:* -* **Time:** $O(V + E)$ — each vertex enqueued once; each edge decreases an indegree once. -* **Space:** $O(V)$ — for indegrees, queue, and output. -* **Input:** Must be a **DAG**; if a cycle exists, **no** topological order exists. +* The *time* complexity of topological sorting is $O(V+E)$ because each vertex is enqueued exactly once and every edge is processed once when its indegree decreases; without this efficiency, ordering tasks in large dependency graphs would be slower. +* The *space* complexity is $O(V)$, required for storing indegree counts, the processing queue, and the final output order; without allocating this space, the algorithm cannot track which vertices are ready to be placed. +* The required *input* is a directed acyclic graph (DAG), since if a cycle exists, no valid topological order is possible; without this restriction, attempts to schedule cyclic dependencies, such as tasks that mutually depend on each other, will fail. **Example** @@ -1537,12 +1570,12 @@ Here `indeg[X]=indeg[Y]=1` initially; `Q` starts empty ⇒ `order=[]` and `len(o **Applications** -1. **Build systems / compilation** (compile a file only after its prerequisites). -2. **Course scheduling** (take courses in an order respecting prerequisites). -3. **Data pipelines / DAG workflows** (Airflow, Spark DAGs): execute stages when inputs are ready. -4. **Dependency resolution** (package managers, container layers). -5. **Dynamic programming on DAGs** (longest/shortest path, path counting) by processing vertices in topological order. -6. **Circuit evaluation / spreadsheets** (evaluate cells/nets after their dependencies). +* In *build systems and compilation*, topological sorting ensures that each file is compiled only after its prerequisites are compiled; without it, a build may fail by trying to compile a module before its dependencies are available. +* For *course scheduling*, topological order provides a valid sequence in which to take courses respecting prerequisite constraints; without it, students may be assigned courses they are not yet eligible to take. +* In *data pipelines and DAG workflows* such as Airflow or Spark, tasks are executed when their inputs are ready by following a topological order; without this, pipeline stages might run prematurely and fail due to missing inputs. +* For *dependency resolution* in package managers or container systems, topological sorting installs components in an order that respects their dependencies; without it, software may be installed in the wrong sequence and break. +* In *dynamic programming on DAGs*, problems like longest path, shortest path, or path counting are solved efficiently by processing vertices in topological order; without this ordering, subproblems may be computed before their dependencies are solved. +* For *circuit evaluation or spreadsheets*, topological order ensures that each cell or net is evaluated only after its referenced inputs; without it, computations could use undefined or incomplete values. 
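For a runnable counterpart to the steps above, a minimal Python sketch of Kahn's algorithm is given below; the `{vertex: [successors]}` dictionary format and the name `topo_sort_kahn` are illustrative choices, not the interface of the linked implementations.

```
from collections import deque

def topo_sort_kahn(adj):
    """Kahn's algorithm on a DAG given as {vertex: [successors, ...]}.

    Every vertex must appear as a key (possibly with an empty list).
    Returns a valid topological order, or None if the graph has a cycle.
    """
    indegree = {v: 0 for v in adj}
    for u in adj:
        for v in adj[u]:
            indegree[v] += 1

    # Start with every vertex that has no incoming edges.
    queue = deque(v for v in adj if indegree[v] == 0)
    order = []

    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    # Producing fewer than |V| vertices means a cycle blocked the rest.
    return order if len(order) == len(adj) else None


# Illustrative dependency graph: edges point from prerequisite to dependent.
adj = {
    "A": ["C"],
    "B": ["C", "D"],
    "C": ["E"],
    "D": ["E"],
    "E": [],
}
print(topo_sort_kahn(adj))  # e.g. ['A', 'B', 'C', 'D', 'E']
```

Returning `None` on failure mirrors the `len(order) < V` cycle check described above, and swapping the `deque` for a min-heap yields the lexicographically smallest valid order mentioned earlier.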
**Implementation** From 006cf09d56cca22be25ebbe9b618e2cbad46cfa9 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:01:59 +0200 Subject: [PATCH 34/48] Update graph matrix formatting in notes --- notes/graphs.md | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/notes/graphs.md b/notes/graphs.md index 26f997e..ace03b4 100644 --- a/notes/graphs.md +++ b/notes/graphs.md @@ -86,16 +86,17 @@ An adjacency matrix represents a graph $G$ with $V$ vertices as a two-dimensiona Here, the matrix indicates a graph with vertices A to D. For instance, vertex A connects with vertices B and D, hence the respective 1s in the matrix. -**Matrix (large ASCII layout):** +**Matrix:** ``` +4x4 Columns → A B C D +---+---+---+---+ -Row A | 0 | 1 | 0 | 1 | -↓ B | 1 | 0 | 1 | 0 | - C | 0 | 1 | 0 | 1 | - D | 1 | 0 | 1 | 0 | +Row A | 0 | 1 | 0 | 1 | +↓ B | 1 | 0 | 1 | 0 | + C | 0 | 1 | 0 | 1 | + D | 1 | 0 | 1 | 0 | +---+---+---+---+ ``` From 0c13478862154bbe72889cef6bd3e23e3efccd68 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:05:15 +0200 Subject: [PATCH 35/48] Update graphs.md --- notes/graphs.md | 73 +++++++++++++++++++++++++------------------------ 1 file changed, 37 insertions(+), 36 deletions(-) diff --git a/notes/graphs.md b/notes/graphs.md index ace03b4..58c2409 100644 --- a/notes/graphs.md +++ b/notes/graphs.md @@ -314,7 +314,7 @@ What does it mean to traverse a graph? Graph traversal **can** be done in a way that visits *all* vertices and edges (like a full DFS/BFS), but it doesn’t *have to*. * If you start DFS or BFS from a single source vertex, you’ll only reach the **connected component** containing that vertex. Any vertices in other components won’t be visited. -* Some algorithms (like shortest path searches, A\*, or even partial DFS) intentionally stop early, meaning not all vertices or edges are visited. +* Some algorithms (like shortest path searches, A*, or even partial DFS) intentionally stop early, meaning not all vertices or edges are visited. * In weighted or directed graphs, you may also skip certain edges depending on the traversal rules. So the precise way to answer that question is: @@ -784,9 +784,9 @@ Shortest path A→E: A → C → B → E (total cost 4) * In *single-source shortest paths* with non-negative edge weights, Dijkstra’s algorithm efficiently finds minimum-cost routes in settings like roads, communication networks, or transit systems; without it, travel times or costs could not be computed reliably when distances vary. * For *navigation and routing*, stopping the search as soon as the destination is extracted from the priority queue avoids unnecessary work; without this early stop, route planning in a road map continues exploring irrelevant regions of the network. * In *network planning and quality of service (QoS)*, Dijkstra selects minimum-latency or minimum-cost routes when weights are additive and non-negative; without this, designing efficient data or logistics paths becomes more error-prone. -* As a *building block*, Dijkstra underlies algorithms like A\* (with zero heuristic), Johnson’s algorithm for all-pairs shortest paths in sparse graphs, and $k$-shortest path variants; without it, these higher-level methods would lack a reliable core procedure. 
+* As a *building block*, Dijkstra underlies algorithms like A* (with zero heuristic), Johnson’s algorithm for all-pairs shortest paths in sparse graphs, and $k$-shortest path variants; without it, these higher-level methods would lack a reliable core procedure. * In *multi-source Dijkstra*, initializing the priority queue with several starting nodes at distance zero solves nearest-facility queries, such as finding the closest hospital; without this extension, repeated single-source runs would waste time. -* As a *label-setting baseline*, Dijkstra provides the reference solution against which heuristics like A\*, ALT landmarks, or contraction hierarchies are compared; without this baseline, heuristic correctness and performance cannot be properly evaluated. +* As a *label-setting baseline*, Dijkstra provides the reference solution against which heuristics like A*, ALT landmarks, or contraction hierarchies are compared; without this baseline, heuristic correctness and performance cannot be properly evaluated. * For *grid pathfinding with terrain costs*, Dijkstra handles non-negative cell costs when no admissible heuristic is available; without it, finding a least-effort path across weighted terrain would require less efficient exhaustive search. **Implementation** @@ -932,7 +932,7 @@ Bellman–Ford would perform a $V$-th pass and still find an improvement (e.g., **Applications** -* In *shortest path problems with negative edges*, Bellman–Ford is applicable where Dijkstra or A\* fail, such as road networks with toll credits; without this method, these graphs cannot be handled correctly. +* In *shortest path problems with negative edges*, Bellman–Ford is applicable where Dijkstra or A* fail, such as road networks with toll credits; without this method, these graphs cannot be handled correctly. * For *arbitrage detection* in currency or financial markets, converting exchange rates into $\log$ weights makes profit loops appear as negative cycles; without Bellman–Ford, such opportunities cannot be systematically identified. * In solving *difference constraints* of the form $x_v - x_u \leq w$, the algorithm checks feasibility by detecting whether any negative cycles exist; without this check, inconsistent scheduling or timing systems may go unnoticed. * As a *robust baseline*, Bellman–Ford verifies results of faster algorithms or initializes methods like Johnson’s for all-pairs shortest paths; without it, correctness guarantees in sparse-graph all-pairs problems would be weaker. @@ -947,7 +947,7 @@ Bellman–Ford would perform a $V$-th pass and still find an improvement (e.g., #### A* (A-Star) Algorithm -A\* is a best-first search that finds a **least-cost path** from a start to a goal by minimizing +A* is a best-first search that finds a **least-cost path** from a start to a goal by minimizing $$ f(n) = g(n) + h(n), @@ -958,7 +958,7 @@ where: * $g(n)$ = cost from start to $n$ (so far), * $h(n)$ = heuristic estimate of the remaining cost from $n$ to the goal. -If $h$ is **admissible** (never overestimates) and **consistent** (triangle inequality), A\* is **optimal** and never needs to “reopen” closed nodes. +If $h$ is **admissible** (never overestimates) and **consistent** (triangle inequality), A* is **optimal** and never needs to “reopen” closed nodes. 
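As a concrete illustration of an admissible and consistent $h$, the Manhattan distance works on a 4-connected grid with unit step costs; the snippet below is a minimal sketch (the function name and the example coordinates are ours):

```python
def manhattan(cell, goal):
    """Counts the moves needed if there were no walls, so it never
    overestimates on a 4-connected grid with unit step costs (admissible);
    each move changes it by at most 1, matching the edge cost (consistent)."""
    (r1, c1), (r2, c2) = cell, goal
    return abs(r1 - r2) + abs(c1 - c2)

# f(n) = g(n) + h(n): cost paid so far plus an optimistic estimate of the rest
g = 3                           # e.g., three unit-cost steps taken from the start
h = manhattan((2, 3), (5, 7))   # 3 + 4 = 7 steps remaining at minimum
f = g + h                       # 10 — A* expands open nodes in increasing f
```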
**Core data structures** @@ -1021,23 +1021,23 @@ reconstruct_path(parent, t): *Sanity notes:* -* The *time* complexity of A\* is worst-case exponential, though in practice it runs much faster when the heuristic $h$ provides useful guidance; without an informative heuristic, the search can expand nearly the entire graph, as in navigating a large grid without directional hints. -* The *space* complexity is $O(V)$, covering the priority queue and bookkeeping maps, which makes A\* memory-intensive; without recognizing this, applications such as robotics pathfinding may exceed available memory on large maps. -* In *special cases*, A\* reduces to Dijkstra’s algorithm when $h \equiv 0$, and further reduces to BFS when all edges have cost 1 and $h \equiv 0$; without this perspective, one might overlook how A\* generalizes these familiar shortest-path algorithms. +* The *time* complexity of A* is worst-case exponential, though in practice it runs much faster when the heuristic $h$ provides useful guidance; without an informative heuristic, the search can expand nearly the entire graph, as in navigating a large grid without directional hints. +* The *space* complexity is $O(V)$, covering the priority queue and bookkeeping maps, which makes A* memory-intensive; without recognizing this, applications such as robotics pathfinding may exceed available memory on large maps. +* In *special cases*, A* reduces to Dijkstra’s algorithm when $h \equiv 0$, and further reduces to BFS when all edges have cost 1 and $h \equiv 0$; without this perspective, one might overlook how A* generalizes these familiar shortest-path algorithms. **Visual walkthrough (grid with 4-neighborhood, Manhattan $h$)** Legend: `S` start, `G` goal, `#` wall, `.` free, `◉` expanded (closed), `•` frontier (open), `×` final path ``` -Row/Col → 1 2 3 4 5 6 7 8 9 - ┌───────────────────┐ -1 S . . . . # . . . │ -2 . # # . . # . # . │ -3 . . . . . . . # . │ -4 # . # # . # . . . │ -5 . . . # . . . # G │ - └───────────────────┘ +Row/Col → 1 2 3 4 5 6 7 8 9 + ┌────────────────────────────┐ + 1 │ S . . . . # . . . │ + 2 │ . # # . . # . # . │ + 3 │ . . . . . . . # . │ + 4 │ # . # # . # . . . │ + 5 │ . . . # . . . # G │ + └────────────────────────────┘ Movement cost = 1 per step; 4-dir moves; h = Manhattan distance ``` @@ -1062,13 +1062,14 @@ Nodes near the straight line to G are preferred over detours around '#'. ``` Final path (example rendering): - ┌───────────────────┐ -1 × × × × . # . . . │ -2 × # # × × # . # . │ -3 × × × × × × × # . │ -4 # . # # × # × × × │ -5 . . . # × × × # G │ - └───────────────────┘ +Row/Col → 1 2 3 4 5 6 7 8 9 + ┌─────────────────────────────┐ + 1 │ × × × × . # . . . │ + 2 │ × # # × × # . # . │ + 3 │ × × × × × × × # . │ + 4 │ # . # # × # × × × │ + 5 │ . . . # × × × # G │ + └─────────────────────────────┘ Path length (g at G) equals number of × steps (optimal with admissible/consistent h). ``` @@ -1102,31 +1103,31 @@ For **sliding puzzles (e.g., 8/15-puzzle)**: **Admissible vs. consistent** -* An *admissible* heuristic satisfies $h(n) \leq h^*(n)$, meaning it never overestimates the true remaining cost, which guarantees that A\* finds an optimal path; without admissibility, the algorithm may return a suboptimal route, such as a longer-than-necessary driving path. 
+* An *admissible* heuristic satisfies $h(n) \leq h^*(n)$, meaning it never overestimates the true remaining cost, which guarantees that A* finds an optimal path; without admissibility, the algorithm may return a suboptimal route, such as a longer-than-necessary driving path. * A *consistent (monotone)* heuristic obeys $h(u) \leq w(u,v) + h(v)$ for every edge, ensuring that $f$-values do not decrease along paths and that once a node is removed from the open set, its $g$-value is final; without consistency, nodes may need to be reopened, increasing complexity in searches like grid navigation. **Applications** -* In *pathfinding* for maps, games, and robotics, A\* computes shortest or least-risk routes by combining actual travel cost with heuristic guidance; without it, movement planning in virtual or physical environments becomes slower or less efficient. -* For *route planning* with road metrics such as travel time, distance, or tolls, A\* incorporates these costs and constraints into its evaluation; without heuristic search, navigation systems must fall back to slower methods like plain Dijkstra. -* In *planning and scheduling* tasks, A\* serves as a general shortest-path algorithm in abstract state spaces, supporting AI decision-making; without it, solving resource allocation or task sequencing problems may require less efficient exhaustive search. -* In *puzzle solving* domains such as the 8-puzzle or Sokoban, A\* uses problem-specific heuristics to guide the search efficiently; without heuristics, the state space may grow exponentially and become impractical to explore. -* For *network optimization* problems with nonnegative edge costs, A\* applies whenever a useful heuristic is available to speed convergence; without heuristics, computations on communication or logistics networks may take longer than necessary. +* In *pathfinding* for maps, games, and robotics, A* computes shortest or least-risk routes by combining actual travel cost with heuristic guidance; without it, movement planning in virtual or physical environments becomes slower or less efficient. +* For *route planning* with road metrics such as travel time, distance, or tolls, A* incorporates these costs and constraints into its evaluation; without heuristic search, navigation systems must fall back to slower methods like plain Dijkstra. +* In *planning and scheduling* tasks, A* serves as a general shortest-path algorithm in abstract state spaces, supporting AI decision-making; without it, solving resource allocation or task sequencing problems may require less efficient exhaustive search. +* In *puzzle solving* domains such as the 8-puzzle or Sokoban, A* uses problem-specific heuristics to guide the search efficiently; without heuristics, the state space may grow exponentially and become impractical to explore. +* For *network optimization* problems with nonnegative edge costs, A* applies whenever a useful heuristic is available to speed convergence; without heuristics, computations on communication or logistics networks may take longer than necessary. **Variants & practical tweaks** -* Viewing *Dijkstra* as A\* with $h \equiv 0$ shows that A\* generalizes the classic shortest-path algorithm; without this equivalence, the connection between uninformed and heuristic search may be overlooked. 
-* In *Weighted A\**, the evaluation function becomes $f = g + \varepsilon h$ with $\varepsilon > 1$, trading exact optimality for faster performance with bounded suboptimality; without this variant, applications needing quick approximate routing, like logistics planning, would run slower. -* The *A\*ε / Anytime A\** approach begins with $\varepsilon > 1$ for speed and gradually reduces it to converge toward optimal paths; without this strategy, incremental refinement in real-time systems like navigation aids is harder to achieve. -* With *IDA\** (Iterative Deepening A\*), the search is conducted by gradually increasing an $f$-cost threshold, greatly reducing memory usage but sometimes increasing runtime; without it, problems like puzzle solving could exceed memory limits. +* Viewing *Dijkstra* as A* with $h \equiv 0$ shows that A* generalizes the classic shortest-path algorithm; without this equivalence, the connection between uninformed and heuristic search may be overlooked. +* In *Weighted A**, the evaluation function becomes $f = g + \varepsilon h$ with $\varepsilon > 1$, trading exact optimality for faster performance with bounded suboptimality; without this variant, applications needing quick approximate routing, like logistics planning, would run slower. +* The *A*ε / Anytime A** approach begins with $\varepsilon > 1$ for speed and gradually reduces it to converge toward optimal paths; without this strategy, incremental refinement in real-time systems like navigation aids is harder to achieve. +* With *IDA** (Iterative Deepening A*), the search is conducted by gradually increasing an $f$-cost threshold, greatly reducing memory usage but sometimes increasing runtime; without it, problems like puzzle solving could exceed memory limits. * *RBFS and Fringe Search* are memory-bounded alternatives that manage recursion depth or fringe sets more carefully; without these, large state spaces in AI planning can overwhelm storage. * In *tie-breaking*, preferring larger $g$ or smaller $h$ when $f$ ties reduces unnecessary re-expansions; without careful tie-breaking, searches on uniform-cost grids may explore more nodes than needed. * For the *closed-set policy*, when heuristics are inconsistent, nodes must be reopened if a better $g$ value is found; without allowing this, the algorithm may miss shorter paths, as in road networks with varying travel times. **Pitfalls & tips** -* The algorithm requires *non-negative edge weights* because A\* assumes $w(u,v) \ge 0$; without this, negative costs can cause nodes to be expanded too early, breaking correctness in applications like navigation. -* If the heuristic *overestimates* actual costs, A\* loses its guarantee of optimality; without enforcing admissibility, a routing system may return a path that is faster to compute but longer in distance. +* The algorithm requires *non-negative edge weights* because A* assumes $w(u,v) \ge 0$; without this, negative costs can cause nodes to be expanded too early, breaking correctness in applications like navigation. +* If the heuristic *overestimates* actual costs, A* loses its guarantee of optimality; without enforcing admissibility, a routing system may return a path that is faster to compute but longer in distance. * With *floating-point precision issues*, comparisons of $f$-values should include small epsilons to avoid instability; without this safeguard, two nearly equal paths may lead to inconsistent queue ordering in large-scale searches. 
* In *state hashing*, equivalent states must hash identically so duplicates are merged properly; without this, search in puzzles or planning domains may blow up due to treating the same state as multiple distinct ones. * While *neighbor order* does not affect correctness, it influences performance and the aesthetics of the returned path trace; without considering this, two identical problems might yield very different expansion sequences or outputs. From 5cc3c0f261e52e6a9840efa36fa6fe6935758c12 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:09:26 +0200 Subject: [PATCH 36/48] Update graphs.md --- notes/graphs.md | 44 +++++++++++++++++++++++--------------------- 1 file changed, 23 insertions(+), 21 deletions(-) diff --git a/notes/graphs.md b/notes/graphs.md index 58c2409..4ef4fef 100644 --- a/notes/graphs.md +++ b/notes/graphs.md @@ -1222,19 +1222,21 @@ Prim(G, i): Undirected, weighted graph; start at **A**. Edge weights shown on links. ``` - ┌────────┐ - │ A │ - └─┬──┬───┘ - 4/ │1 - ┌── │ ──┐ - ┌─────▼──┐ │ ┌▼──────┐ - │ B │──┘2 │ C │ - └───┬────┘ └──┬────┘ - 1 │ 4 │ - │ │ - ┌───▼────┐ 3 ┌──▼───┐ - │ E │────────│ D │ - └────────┘ └──────┘ + ┌────────┐ + │ A │ + └─┬──┬───┘ + 4│ │1 + │ │ + ┌───────────┘ └───────────┐ + │ │ + ┌────▼────┐ ┌────▼────┐ + │ B │◄──────2────────│ C │ + └───┬─────┘ └─────┬───┘ + 1 │ 4 │ + │ │ + ┌───▼────┐ 3 ┌────▼───┐ + │ E │─────────────────▶│ D │ + └────────┘ └────────┘ Edges: A–B(4), A–C(1), C–B(2), B–E(1), C–D(4), D–E(3) ``` @@ -1372,15 +1374,15 @@ Undirected, weighted graph (we’ll draw the key edges clearly and list the rest Start with all vertices as separate sets: `{A} {B} {C} {D} {E} {F}`. ``` -Top row: A────────4────────B────────2────────C - │ │ - │ │ - 7 3 - │ │ -Bottom row: F────────1────────E───┴──────────────D - (E–F) +Top row: A────────4────────B────────2────────C + │ │ │ + │ │ │ + 7 3 5 + │ │ │ +Bottom row: F────────1────────E────────6────────D + Other edges (not all drawn to keep the picture clean): -A–C(4), B–D(5), C–D(5), C–E(5), D–E(6), D–F(2) +A–C(4), B–D(5), C–E(5), D–E(6), D–F(2) ``` *Sorted edge list (ascending):* From cafa6f9c0789895b5548c6a0811b4a537892e8aa Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:20:24 +0200 Subject: [PATCH 37/48] Update notes/sorting.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/sorting.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notes/sorting.md b/notes/sorting.md index 70345f8..4296edf 100644 --- a/notes/sorting.md +++ b/notes/sorting.md @@ -198,8 +198,8 @@ Bubble sort is **stable**. 
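A compact Python sketch of the passes described above — including the strict `>` comparison that preserves stability and the early-exit flag — is shown here for orientation; the linked C++ and Python files below remain the reference implementations:

```python
def bubble_sort(a):
    """In-place bubble sort: O(n^2) worst case, O(n) best case via early exit."""
    n = len(a)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):        # the last i slots already hold their final values
            if a[j] > a[j + 1]:           # strict '>' never reorders equal keys -> stable
                a[j], a[j + 1] = a[j + 1], a[j]
                swapped = True
        if not swapped:                   # a full pass with no swaps means it's sorted
            break
    return a

print(bubble_sort([5, 1, 4, 2, 8]))       # [1, 2, 4, 5, 8]
```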
**Implementation** -* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/selection_sort/src/bubble_sort.cpp) -* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/selection_sort/src/bubble_sort.py) +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/bubble_sort/src/bubble_sort.cpp) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/bubble_sort/src/bubble_sort.py) ### Selection Sort From 61b316f8e98b8e484ec3bb7adbf7782a447f3183 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:20:33 +0200 Subject: [PATCH 38/48] Update notes/sorting.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/sorting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notes/sorting.md b/notes/sorting.md index 4296edf..35202d1 100644 --- a/notes/sorting.md +++ b/notes/sorting.md @@ -379,7 +379,7 @@ Before: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ] After: [ 11 ][ 12 ][ 13 ][ 5 ][ 6 ] ``` -✔ Sorted portion: \[11, 12, 13] +✔ Sorted portion: [11, 12, 13] **Pass 3: Insert 5** From 1010f38b8157cc38f610999a7a775a4c1de4658c Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:20:46 +0200 Subject: [PATCH 39/48] Update notes/sorting.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/sorting.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notes/sorting.md b/notes/sorting.md index 35202d1..44f7183 100644 --- a/notes/sorting.md +++ b/notes/sorting.md @@ -852,8 +852,8 @@ Step 3 (100s): [2 24 45 66 75 90 170 802] **Implementation** -* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/heap_sort/src/radix_sort.cpp) -* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/heap_sort/src/radix_sort.py) +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/radix_sort/src/radix_sort.cpp) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/radix_sort/src/radix_sort.py) ### Counting Sort From 3ea08f15c02217b0a3fe98b128eb5b4284b52abc Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:20:53 +0200 Subject: [PATCH 40/48] Update notes/sorting.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/sorting.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notes/sorting.md b/notes/sorting.md index 44f7183..7b4a2fe 100644 --- a/notes/sorting.md +++ b/notes/sorting.md @@ -941,8 +941,8 @@ Counting Sort is **stable** if we place elements **from right to left** into the **Implementation** -* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/heap_sort/src/counting_sort.cpp) -* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/heap_sort/src/counting_sort.py) +* [C++](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/cpp/counting_sort/src/counting_sort.cpp) +* [Python](https://github.com/djeada/Algorithms-And-Data-Structures/blob/master/src/sorting/python/counting_sort/src/counting_sort.py) ### Comparison Table From e6ade7bc14cf772bf2d15b9c4c6b216df5fbe5ec Mon Sep 17 
00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:21:00 +0200 Subject: [PATCH 41/48] Update notes/sorting.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/sorting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notes/sorting.md b/notes/sorting.md index 7b4a2fe..254dae8 100644 --- a/notes/sorting.md +++ b/notes/sorting.md @@ -594,7 +594,7 @@ Pass 4: [ 10 | 30 ] [40] [50] [70] [90 80] ### Heap sort Heap Sort is a **comparison-based sorting algorithm** that uses a special data structure called a **binary heap**. -It is efficient, with guaranteed \$O(n \log n)\$ performance, and sorts **in-place** (no extra array needed). +It is efficient, with guaranteed $O(n \log n)$ performance, and sorts **in-place** (no extra array needed). The basic idea: From b75cacfc5c2511313599d7a5ac10576516fc6bf3 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:21:11 +0200 Subject: [PATCH 42/48] Update notes/greedy_algorithms.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/greedy_algorithms.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index cda4c6d..ebf1afe 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -36,7 +36,7 @@ That third rule prevents dead ends and is exactly what exchange arguments rely o ### Reachability on a line -- You stand at square $0$ on squares $0,1,dots,n-1$. +- You stand at square $0$ on squares $0,1,\ldots,n-1$. - Each square $i$ has a jump power $a\[i]$. From $i$ you may land on any of $i+1, i+2, \dots, i+a\[i]$. - Goal: decide if you can reach $n-1$; if not, report the furthest reachable square. From 50f1445f05fca1d58d8d48cb06b21c97b909a9f5 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:21:22 +0200 Subject: [PATCH 43/48] Update notes/greedy_algorithms.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/greedy_algorithms.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index ebf1afe..1a6a928 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -709,13 +709,13 @@ return order, Lmax ### Huffman coding -You have symbols that occur with known frequencies \$f\_i>0\$ and \$\sum\_i f\_i=1\$ (if you start with counts, first normalize by their total). The goal is to assign each symbol a binary codeword so that no codeword is a prefix of another (a **prefix code**, i.e., uniquely decodable without separators), and the average length +You have symbols that occur with known frequencies $f_i>0$ and $\sum_i f_i=1$ (if you start with counts, first normalize by their total). The goal is to assign each symbol a binary codeword so that no codeword is a prefix of another (a **prefix code**, i.e., uniquely decodable without separators), and the average length $$ \mathbb{E}[L]=\sum_i f_i\,L_i $$ -is as small as possible. Prefix codes correspond exactly to **full binary trees** (every internal node has two children) whose leaves are the symbols and whose leaf depths equal the codeword lengths \$L\_i\$. The **Kraft inequality** \$\sum\_i 2^{-L\_i}\le 1\$ characterizes feasibility; equality holds for full trees (so an optimal prefix code “fills” the inequality). +is as small as possible. 
Prefix codes correspond exactly to **full binary trees** (every internal node has two children) whose leaves are the symbols and whose leaf depths equal the codeword lengths $L_i$. The **Kraft inequality** $\sum_i 2^{-L_i}\le 1$ characterizes feasibility; equality holds for full trees (so an optimal prefix code “fills” the inequality). **Example inputs and outputs** From 370d16d7263afa7c2507090fd76b320354212521 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:21:35 +0200 Subject: [PATCH 44/48] Update notes/brain_teasers.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/brain_teasers.md | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-) diff --git a/notes/brain_teasers.md b/notes/brain_teasers.md index 4b65655..4f0596c 100644 --- a/notes/brain_teasers.md +++ b/notes/brain_teasers.md @@ -1,8 +1,4 @@ -todo: - -- heaps -- fast and slow pointer for lists -- tree traversal in order, post oreder etc. +- tree traversal in order, post order etc. ## Solving Programming Brain Teasers From 9916fa6a5779c7120196a0c0098432ce1f6bb711 Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:26:42 +0200 Subject: [PATCH 45/48] Update notes/sorting.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --- notes/sorting.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/notes/sorting.md b/notes/sorting.md index 254dae8..c08067a 100644 --- a/notes/sorting.md +++ b/notes/sorting.md @@ -177,8 +177,7 @@ Sorted! ✅ **Optimizations** -* **Early Exit**: If in a full pass **no swaps occur**, the array is already sorted, and the algorithm can terminate early. -* This makes Bubble Sort’s **best case** much faster (\$O(n)\$). +* This makes Bubble Sort’s **best case** much faster ($O(n)$). **Stability** From dc630e8eb048200599dbe6a45e3299f83984acff Mon Sep 17 00:00:00 2001 From: Adam Djellouli <37275728+djeada@users.noreply.github.com> Date: Sun, 31 Aug 2025 21:27:13 +0200 Subject: [PATCH 46/48] Update greedy_algorithms.md --- notes/greedy_algorithms.md | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-) diff --git a/notes/greedy_algorithms.md b/notes/greedy_algorithms.md index 1a6a928..576bf9b 100644 --- a/notes/greedy_algorithms.md +++ b/notes/greedy_algorithms.md @@ -725,23 +725,23 @@ $$ A:0.40,\quad B:0.20,\quad C:0.20,\quad D:0.10,\quad E:0.10. $$ -A valid optimal answer will be a prefix code with expected length as small as possible. We will compute the exact minimum and one optimal set of lengths \$L\_A,\dots,L\_E\$, plus a concrete codebook. (There can be multiple optimal codebooks when there are ties in frequencies; their **lengths** agree, though the exact bitstrings may differ.) +A valid optimal answer will be a prefix code with expected length as small as possible. We will compute the exact minimum and one optimal set of lengths $L\_A,\dots,L\_E$, plus a concrete codebook. (There can be multiple optimal codebooks when there are ties in frequencies; their **lengths** agree, though the exact bitstrings may differ.) **Baseline** -One conceptual baseline is to enumerate all full binary trees with five labeled leaves and pick the one minimizing \$\sum f\_i,L\_i\$. That is correct but explodes combinatorially as the number of symbols grows. A simpler but usually suboptimal baseline is to give every symbol the same length \$\lceil \log\_2 5\rceil=3\$. 
That fixed-length code has \$\mathbb{E}\[L]=3\$. +One conceptual baseline is to enumerate all full binary trees with five labeled leaves and pick the one minimizing $\sum f\_i,L\_i$. That is correct but explodes combinatorially as the number of symbols grows. A simpler but usually suboptimal baseline is to give every symbol the same length $\lceil \log\_2 5\rceil=3$. That fixed-length code has $\mathbb{E}\[L]=3$. **Greedy Approach** -Huffman’s rule repeats one tiny step: always merge the two least frequent items. When you merge two “symbols” with weights \$p\$ and \$q\$, you create a parent of weight \$p+q\$. **Why does this change the objective by exactly \$p+q\$?** Every leaf in those two subtrees increases its depth (and thus its code length) by \$1\$, so the total increase in \$\sum f\_i L\_i\$ is \$\sum\_{\ell\in\text{subtrees}} f\_\ell\cdot 1=(p+q)\$ by definition of \$p\$ and \$q\$. Summing over all merges yields the final cost: +Huffman’s rule repeats one tiny step: always merge the two least frequent items. When you merge two “symbols” with weights $p$ and $q$, you create a parent of weight $p+q$. **Why does this change the objective by exactly $p+q$?** Every leaf in those two subtrees increases its depth (and thus its code length) by $1$, so the total increase in $\sum f\_i L\_i$ is $\sum\_{\ell\in\text{subtrees}} f\_\ell\cdot 1=(p+q)$ by definition of $p$ and $q$. Summing over all merges yields the final cost: $$ \mathbb{E}[L]=\sum_{\text{merges}} (p+q)=\sum_{\text{internal nodes}} \text{weight}. $$ -**Why is the greedy choice optimal?** In an optimal tree the two deepest leaves must be siblings; if not, pairing them to be siblings never increases any other depth and strictly reduces cost whenever a heavier symbol is deeper than a lighter one (an **exchange argument**: swapping depths changes the cost by \$f\_{\text{heavy}}-f\_{\text{light}}>0\$ in our favor). Collapsing those siblings into a single pseudo-symbol reduces the problem size without changing optimality, so induction finishes the proof. (Ties can be broken arbitrarily; all tie-breaks achieve the same minimum \$\mathbb{E}\[L]\$.) +**Why is the greedy choice optimal?** In an optimal tree the two deepest leaves must be siblings; if not, pairing them to be siblings never increases any other depth and strictly reduces cost whenever a heavier symbol is deeper than a lighter one (an **exchange argument**: swapping depths changes the cost by $f\_{\text{heavy}}-f\_{\text{light}}>0$ in our favor). Collapsing those siblings into a single pseudo-symbol reduces the problem size without changing optimality, so induction finishes the proof. (Ties can be broken arbitrarily; all tie-breaks achieve the same minimum $\mathbb{E}\[L]$.) -Start with the multiset \${0.40, 0.20, 0.20, 0.10, 0.10}\$. At each line, merge the two smallest weights and add their sum to the running cost. +Start with the multiset ${0.40, 0.20, 0.20, 0.10, 0.10}$. At each line, merge the two smallest weights and add their sum to the running cost. ``` 1) merge 0.10 + 0.10 → 0.20 cost += 0.20 (total 0.20) @@ -757,14 +757,14 @@ Start with the multiset \${0.40, 0.20, 0.20, 0.10, 0.10}\$. At each line, merge multiset becomes {1.00} (done) ``` -So the optimal expected length is \$\boxed{\mathbb{E}\[L]=2.20}\$ bits per symbol. This already beats the naive fixed-length baseline \$3\$. 
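For reference, the whole merge trace can be reproduced with a binary heap in a few lines; this is a rough sketch under our own naming (`huffman_cost`), not the notes' reference implementation:

```python
import heapq

def huffman_cost(freqs):
    """Expected code length E[L] = sum of the merged (internal-node) weights.
    Assumes the frequencies are normalized so they sum to 1."""
    heap = list(freqs)
    heapq.heapify(heap)
    cost = 0.0
    while len(heap) > 1:
        p = heapq.heappop(heap)       # two smallest weights, as in the trace above
        q = heapq.heappop(heap)
        cost += p + q                 # every leaf below this node gets one bit deeper
        heapq.heappush(heap, p + q)
    return cost

print(round(huffman_cost([0.40, 0.20, 0.20, 0.10, 0.10]), 2))   # 2.2
```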
This optimum also matches the information-theoretic bound \$H(f)\le \mathbb{E}\[L] < H(f)+1\$.

From: Adam Djellouli <37275728+djeada@users.noreply.github.com>
Date: Sun, 31 Aug 2025 21:27:36 +0200
Subject: [PATCH 47/48] Update sorting.md

---
 notes/sorting.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/notes/sorting.md b/notes/sorting.md
index c08067a..51bb779 100644
--- a/notes/sorting.md
+++ b/notes/sorting.md
@@ -840,12 +840,12 @@ Step 3 (100s): [2 24 45 66 75 90 170 802]
 
 **Complexity**
 
-* **Time Complexity:** \$O(n \cdot k)\$
+* **Time Complexity:** $O(n \cdot k)$
 
-  * \$n\$ = number of elements
-  * \$k\$ = number of digits (or max digit length)
+  * $n$ = number of elements
+  * $k$ = number of digits (or max digit length)
 
-* **Space Complexity:** \$O(n + k)\$ (depends on the stable sorting method used, e.g., Counting Sort).
+* **Space Complexity:** $O(n + k)$ (depends on the stable sorting method used, e.g., Counting Sort).
 
 * For integers with fixed number of digits, Radix Sort can be considered **linear time**.
 
From 25be0113994bca463273168354b6bb59ce454c99 Mon Sep 17 00:00:00 2001
From: Adam Djellouli <37275728+djeada@users.noreply.github.com>
Date: Sun, 31 Aug 2025 22:15:12 +0200
Subject: [PATCH 48/48] Update notes/sorting.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 notes/sorting.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/notes/sorting.md b/notes/sorting.md
index 51bb779..b515b6b 100644
--- a/notes/sorting.md
+++ b/notes/sorting.md
@@ -177,7 +177,7 @@ Sorted! ✅
 
 **Optimizations**
 
-* This makes Bubble Sort’s **best case** much faster ($O(n)$).
+* By keeping track of whether any swaps were made during a pass, Bubble Sort can terminate early if the array is already sorted. This optimization makes Bubble Sort’s **best case** much faster ($O(n)$).
 
 **Stability**