Commit 9c3de6a

Convert ```math blocks to $$ for Obsidian compatibility
Replace all ```math fenced code blocks with $$...$$ display math across 12 files. This makes equations render correctly in Obsidian while remaining compatible with GitHub. Also updates the README contribution guidelines to reflect the new convention.
1 parent a65f440 commit 9c3de6a

12 files changed (+110, -111 lines)


README.md

Lines changed: 2 additions & 3 deletions
@@ -45,7 +45,6 @@ Over the past years working in AI/ML, I filled notebooks with intuition first, r
 - Suggest topics via GitHub issues.
 - PR corrections and better intuition.
 - Create SVG images in `../images/` for all diagrams.
-- For equations, use ` ```math ` fenced code blocks (NOT `$$`)
-- For display math — GitHub escapes `\\` inside `$$`, breaking matrices.
-- Inline math `$...$` is fine for simple expressions but move anything with `\\` into a ` ```math ` block.
+- For display math, use `$$...$$` blocks.
+- Inline math `$...$` is fine for simple expressions.
 - Use `\ast` instead of `*` for conjugate/adjoint in inline math.

chapter 01: vectors/05. basis and duality.md

Lines changed: 2 additions & 2 deletions
@@ -35,9 +35,9 @@
 - For every basis $\{\mathbf{e}_1, \mathbf{e}_2, \ldots, \mathbf{e}_n\}$, there is a corresponding **dual basis** $\{\mathbf{e}_1^\ast, \mathbf{e}_2^\ast, \ldots, \mathbf{e}_n^\ast\}$. Each dual basis vector extracts exactly one coordinate:

-```math
+$$
 \mathbf{e}_i^\ast(\mathbf{e}_j) = \delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}
-```
+$$

 - $\mathbf{e}_1^\ast$ returns 1 when applied to $\mathbf{e}_1$ and 0 for everything else. It perfectly isolates the first coordinate.

chapter 02: matrices/01. matrix properties.md

Lines changed: 24 additions & 24 deletions
@@ -2,27 +2,27 @@
 - At its core, a **matrix** is a rectangular grid of numbers arranged in rows and columns. If a vector is a single list of numbers, a matrix is a table of them.

-```math
+$$
 A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}
-```
+$$

 - You can also think of a matrix as a stack of vectors.

 - If a single person is described by the vector $[\text{age}, \text{height}, \text{weight}]$, then three people form a matrix where each row is one person:

-```math
+$$
 \begin{bmatrix} 25 & 170 & 65 \\ 30 & 180 & 80 \\ 22 & 160 & 55 \end{bmatrix}
-```
+$$

 - This matrix has 3 rows and 3 columns, so we call it a $3 \times 3$ matrix.

 - Each number in the grid is called an **element** or **entry**, identified by its row and column: $A_{ij}$ is the element in row $i$, column $j$.

 - The **transpose** of a matrix flips it along its diagonal, turning rows into columns and columns into rows. If $A$ is $m \times n$, then $A^T$ is $n \times m$.

-```math
+$$
 A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad \Rightarrow \quad A^T = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}
-```
+$$

 - Multiplying a matrix by its transpose always gives a square matrix: $AA^T$ is $m \times m$ and $A^TA$ is $n \times n$.
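The transpose and shape facts in this hunk are easy to sanity-check numerically. A minimal sketch, assuming NumPy is available (it is not part of this commit), using the same matrix as the notes:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])   # a 2x3 matrix, as in the example above

At = A.T                    # transpose: rows become columns, so 3x2
G1 = A @ At                 # m x m  (2x2)
G2 = At @ A                 # n x n  (3x3)
```

Both products are square, as the notes claim, even though `A` itself is not.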

@@ -38,15 +38,15 @@ A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \quad \Rightarrow \quad
 - For example, the following matrix has rank 2 because neither row is a multiple of the other:

-```math
+$$
 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}
-```
+$$

 But this matrix has rank 1 because the second row is just twice the first, so it adds no new information:

-```math
+$$
 \begin{bmatrix} 1 & 2 \\ 2 & 4 \end{bmatrix}
-```
+$$

 - A $5 \times 3$ matrix can have rank at most 3. If some rows are just scaled or combined versions of others, the rank drops. A matrix with maximum possible rank is called **full rank**.
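The two rank examples in this hunk can be verified directly; a quick NumPy check (NumPy assumed, not part of the commit):

```python
import numpy as np

# Rank 2: neither row is a multiple of the other
full = np.array([[1, 2],
                 [3, 4]])

# Rank 1: the second row is exactly twice the first
deficient = np.array([[1, 2],
                      [2, 4]])

r_full = np.linalg.matrix_rank(full)
r_deficient = np.linalg.matrix_rank(deficient)
```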

@@ -66,17 +66,17 @@ But this matrix has rank 1 because the second row is just twice the first, so it
 - The **determinant** of a square matrix is a single number that captures how the matrix scales space. Think of a $2 \times 2$ matrix as transforming a unit square into a parallelogram. The determinant is the area of that parallelogram (with a sign).

-```math
+$$
 \det\begin{bmatrix} a & b \\ c & d \end{bmatrix} = ad - bc
-```
+$$

 ![Determinant: the area scaling factor of a linear transformation](../images/determinant.svg)

 - For example:

-```math
+$$
 \det\begin{bmatrix} 2 & 1 \\ 0 & 3 \end{bmatrix} = 2 \cdot 3 - 1 \cdot 0 = 6
-```
+$$

 The transformation stretches the unit square into a parallelogram with area 6.
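The worked determinant above matches what a numerical routine returns; a minimal check assuming NumPy:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

d = np.linalg.det(A)   # ad - bc = 2*3 - 1*0 = 6: the area scaling factor
```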

@@ -94,9 +94,9 @@ The transformation stretches the unit square into a parallelogram with area 6.
 - For a $2 \times 2$ matrix, the inverse has a direct formula:

-```math
+$$
 \begin{bmatrix} a & b \\ c & d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
-```
+$$

 Notice the determinant in the denominator, which is why singular matrices (determinant zero) have no inverse.
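The closed-form $2 \times 2$ inverse can be coded directly and compared against a general routine. A sketch assuming NumPy; the helper name `inv2x2` is mine, not from the notes:

```python
import numpy as np

def inv2x2(M):
    # Closed-form 2x2 inverse: swap a and d, negate b and c, divide by det
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix has no inverse")
    return np.array([[d, -b], [-c, a]]) / det

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
B = inv2x2(A)              # agrees with np.linalg.inv(A)
```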

@@ -106,31 +106,31 @@ Notice the determinant in the denominator, which is why singular matrices (deter
 - For example, the following matrix has condition number $10^8$. One direction is scaled normally while the other is nearly squashed to zero, so small perturbations along that direction get wildly distorted:

-```math
+$$
 \begin{bmatrix} 1 & 0 \\ 0 & 10^{-8} \end{bmatrix}
-```
+$$

 - Just as vectors have norms (length), matrices have **norms** that measure their "size." The most common is the **Frobenius norm**, which treats the matrix as a long vector and computes its length:

-```math
+$$
 \|A\|_F = \sqrt{\sum_{i}\sum_{j} A_{ij}^2}
-```
+$$

 - For example:

-```math
+$$
 \left\|\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}\right\|_F = \sqrt{1 + 4 + 9 + 16} = \sqrt{30} \approx 5.48
-```
+$$

 - The **spectral norm** $\|A\|_2$ is the largest singular value of $A$. It measures the maximum amount the matrix can stretch any unit vector. In ML, matrix norms are used for weight regularisation (penalising large weights) and monitoring training stability.

 - A symmetric matrix $A$ is **positive definite** if for every non-zero vector $\mathbf{x}$: $\mathbf{x}^T A \mathbf{x} > 0$. This quadratic form always produces a positive number.

 - For example, the following matrix is positive definite:

-```math
+$$
 A = \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix}
-```
+$$

 Pick any vector, say $\mathbf{x} = [1, -1]^T$: $\mathbf{x}^T A \mathbf{x} = 2 - 1 - 1 + 3 = 3 > 0$. No matter which non-zero $\mathbf{x}$ you try, you always get a positive result.
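The condition number, Frobenius norm, and positive-definiteness examples in this hunk all check out numerically. A minimal sketch assuming NumPy (eigenvalues of a symmetric matrix being all positive is an equivalent test for positive definiteness):

```python
import numpy as np

# Condition number of the nearly-singular diagonal example: ~1e8
K = np.array([[1.0, 0.0],
              [0.0, 1e-8]])
cond = np.linalg.cond(K)

# Frobenius norm of the worked example: sqrt(30) ~ 5.48
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
fro = np.linalg.norm(A, 'fro')

# Positive definite: quadratic form positive, all eigenvalues positive
S = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = np.array([1.0, -1.0])
quad = x @ S @ x                   # 2 - 1 - 1 + 3 = 3
eigvals = np.linalg.eigvalsh(S)   # symmetric eigenvalue routine
```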

chapter 02: matrices/02. matrix types.md

Lines changed: 22 additions & 22 deletions
@@ -6,29 +6,29 @@
 - The **identity matrix** $I$ is a square matrix with 1s on the diagonal and 0s everywhere else. It is the "do nothing" transformation: $AI = IA = A$ for any compatible matrix $A$.

-```math
+$$
 I = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
-```
+$$

 - The **zero matrix** $O$ has all elements equal to zero. It maps every vector to the zero vector, destroying all information.

 - A **diagonal matrix** is all zeros except on the main diagonal. Multiplying a vector by a diagonal matrix simply scales each component independently, making it very efficient.

-```math
+$$
 D = \begin{bmatrix} 3 & 0 \\ 0 & 7 \end{bmatrix}
-```
+$$

 - A **symmetric matrix** equals its own transpose: $A = A^T$, meaning $A_{ij} = A_{ji}$. Symmetric matrices have the special property that their eigenvectors are always perpendicular to each other. Covariance matrices are always symmetric.

-```math
+$$
 S = \begin{bmatrix} 3 & -1 \\ -1 & 6 \end{bmatrix}
-```
+$$

 - A **triangular matrix** has all zeros on one side of the diagonal. **Lower triangular** has zeros above, **upper triangular** has zeros below. They are essential for solving systems of equations efficiently through forward or back substitution.

-```math
+$$
 L = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 2 & 4 \end{bmatrix} \qquad U = \begin{bmatrix} 5 & -1 & 2 \\ 0 & 1 & 3 \\ 0 & 0 & -2 \end{bmatrix}
-```
+$$

 - The determinant of a triangular matrix is simply the product of its diagonal elements.
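The triangular-determinant fact at the end of this hunk can be confirmed with the lower-triangular matrix from the notes; a quick check assuming NumPy:

```python
import numpy as np

L = np.array([[ 2.0, 0.0, 0.0],
              [ 1.0, 3.0, 0.0],
              [-1.0, 2.0, 4.0]])

det_L = np.linalg.det(L)             # general determinant routine
diag_product = np.prod(np.diag(L))   # product of diagonal: 2 * 3 * 4 = 24
```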

@@ -50,23 +50,23 @@ L = \begin{bmatrix} 2 & 0 & 0 \\ 1 & 3 & 0 \\ -1 & 2 & 4 \end{bmatrix} \qquad U
 - For example, the matrix below moves element 3 to position 1, element 1 to position 2, and element 2 to position 3:

-```math
+$$
 P = \begin{bmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}
-```
+$$

 - A **Toeplitz matrix** has the same value along every diagonal (upper-left to lower-right). Notice how each diagonal is constant:

-```math
+$$
 T = \begin{bmatrix} a & b & c \\ d & a & b \\ e & d & a \end{bmatrix}
-```
+$$

 - This structure appears in signal processing and convolution, because sliding a fixed filter across a signal is equivalent to multiplying by a Toeplitz matrix.

 - A **circulant matrix** is a special Toeplitz matrix where each row is a cyclic shift of the one above. When a row reaches the end, it wraps around:

-```math
+$$
 C = \begin{bmatrix} 1 & 3 & 2 \\ 2 & 1 & 3 \\ 3 & 2 & 1 \end{bmatrix}
-```
+$$

 - Circulant matrices are closely connected to the discrete Fourier transform (DFT) and are central to how circular convolution works.
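The DFT connection mentioned here is concrete: the eigenvalues of a circulant matrix are the DFT of its first column, and the eigenvectors are the Fourier basis vectors. A sketch assuming NumPy, using the circulant example from the notes:

```python
import numpy as np

C = np.array([[1.0, 3.0, 2.0],
              [2.0, 1.0, 3.0],
              [3.0, 2.0, 1.0]])   # each row is a cyclic shift of the one above
c = C[:, 0]                       # first column: [1, 2, 3]
n = len(c)

lam = np.fft.fft(c)               # eigenvalues = DFT of the first column

# Check the eigen-relation C v = lam[k] v for one Fourier mode
k = 1
v = np.exp(2j * np.pi * k * np.arange(n) / n)
ok = np.allclose(C @ v, lam[k] * v)
```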

@@ -80,31 +80,31 @@ C = \begin{bmatrix} 1 & 3 & 2 \\ 2 & 1 & 3 \\ 3 & 2 & 1 \end{bmatrix}
 - A **nilpotent matrix** satisfies $A^k = O$ (the zero matrix) for some power $k$. Apply the transformation enough times and everything collapses to zero. For example:

-```math
+$$
 \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}^2 = \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}
-```
+$$

 - A **Boolean matrix** (or binary matrix) contains only 0s and 1s. It represents yes/no relationships. For example, in a graph with 3 nodes, the **adjacency matrix** records which nodes are connected:

-```math
+$$
 B = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix}
-```
+$$

 - Here, node 1 connects to nodes 2 and 3, but nodes 2 and 3 are not connected to each other.

 - A **Vandermonde matrix** is built from consecutive powers of a set of values. Given values $x_1, x_2, x_3$:

-```math
+$$
 V = \begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{bmatrix}
-```
+$$

 - This structure appears in polynomial interpolation: finding the unique polynomial that passes through a given set of points.

 - A **Hessenberg matrix** is "almost" triangular, with zeros below the first subdiagonal:

-```math
+$$
 H = \begin{bmatrix} 4 & 2 & 1 \\ 3 & 5 & -1 \\ 0 & 1 & 6 \end{bmatrix}
-```
+$$

 - It is a useful intermediate form for computing eigenvalues efficiently. Reducing a matrix to Hessenberg form first makes iterative algorithms converge faster.
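Two of the claims in this hunk are directly checkable: the nilpotent example squares to zero, and the Vandermonde matrix solves polynomial interpolation. A sketch assuming NumPy; the data points are mine, chosen to lie on $1 + x^2$:

```python
import numpy as np

# Nilpotent: this matrix squares to the zero matrix
N = np.array([[0, 1],
              [0, 0]])
N2 = N @ N

# Vandermonde interpolation: solve V a = y for polynomial coefficients
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 5.0, 10.0])         # values of 1 + x^2 at these points
V = np.vander(x, increasing=True)      # rows are [1, x_i, x_i^2]
coeffs = np.linalg.solve(V, y)         # unique degree-2 polynomial
```

`coeffs` recovers $[1, 0, 1]$, i.e. the polynomial $1 + x^2$.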

chapter 02: matrices/03. operations.md

Lines changed: 18 additions & 18 deletions
@@ -4,21 +4,21 @@
 - For addition, both matrices must have the same dimensions, and you add element by element:

-```math
+$$
 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} + \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 6 & 8 \\ 10 & 12 \end{bmatrix}
-```
+$$

 - For scalar multiplication, you multiply every element by the scalar:

-```math
+$$
 3 \times \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} = \begin{bmatrix} 3 & 6 \\ 9 & 12 \end{bmatrix}
-```
+$$

 - The simplest thing you can do with a matrix is multiply it by a vector. **Matrix-vector multiplication** $A\mathbf{x}$ combines the columns of $A$ using the entries of $\mathbf{x}$ as weights:

-```math
+$$
 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 5 \\ 6 \end{bmatrix} = 5 \begin{bmatrix} 1 \\ 3 \end{bmatrix} + 6 \begin{bmatrix} 2 \\ 4 \end{bmatrix} = \begin{bmatrix} 17 \\ 39 \end{bmatrix}
-```
+$$

 - This is the core operation in ML. Every neural network layer computes $A\mathbf{x} + \mathbf{b}$: a matrix times an input vector, plus a bias.
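The column-combination view of matrix-vector multiplication shown in this hunk can be reproduced both ways; a minimal check assuming NumPy:

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([5.0, 6.0])

direct = A @ x                               # the usual product: [17, 39]
# The same result, viewed as a weighted sum of A's columns:
as_columns = x[0] * A[:, 0] + x[1] * A[:, 1]
```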

@@ -34,9 +34,9 @@ $$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
 - A useful special case: multiplying a matrix by its transpose always gives a square matrix. $AA^T$ is $m \times m$ and $A^TA$ is $n \times n$:

-```math
+$$
 \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix} \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} = \begin{bmatrix} 14 & 32 \\ 32 & 77 \end{bmatrix}
-```
+$$

 - Matrix multiplication has important rules:

@@ -50,17 +50,17 @@ $$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
 - The **Hadamard product** (element-wise product) multiplies two matrices of the same size entry by entry, written $A \odot B$:

-```math
+$$
 \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix} \odot \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix} = \begin{bmatrix} 5 & 12 \\ 21 & 32 \end{bmatrix}
-```
+$$

 - Unlike standard matrix multiplication, the Hadamard product is commutative ($A \odot B = B \odot A$) and requires both matrices to have the same dimensions. It is used heavily in ML for gating: multiplying element-wise by a mask of values between 0 and 1 controls how much of each entry "passes through."

 - The **outer product** of two vectors $\mathbf{u}$ and $\mathbf{v}$ produces a matrix: $\mathbf{u}\mathbf{v}^T$. Each entry is the product of one element from $\mathbf{u}$ and one from $\mathbf{v}$:

-```math
+$$
 \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \begin{bmatrix} 4 & 5 \end{bmatrix} = \begin{bmatrix} 4 & 5 \\ 8 & 10 \\ 12 & 15 \end{bmatrix}
-```
+$$

 - The result always has rank 1, because every row is a scaled version of $\mathbf{v}^T$. Any matrix can be written as a sum of rank-1 outer products, which is exactly what SVD does (covered in decompositions).
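The Hadamard and outer-product examples in this hunk, including the rank-1 claim, can be checked in a few lines; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
had = A * B                    # Hadamard product: element-wise, commutative

u = np.array([1, 2, 3])
v = np.array([4, 5])
outer = np.outer(u, v)         # 3x2 matrix; every row is a multiple of v
rank = np.linalg.matrix_rank(outer)
```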

@@ -74,19 +74,19 @@ $$C_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}$$
 - For example, the matrix:

-```math
+$$
 A = \begin{bmatrix} 5 & 0 & 0 & 2 \\ 0 & 0 & 3 & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix}
-```
+$$

 - Is stored as: values = [5, 2, 3, -1], columns = [0, 3, 2, 3], row offsets = [0, 2, 3, 4]. This skips all the zeros and makes sparse operations much faster.

 - A core use of matrices is solving **systems of linear equations**. The system $A\mathbf{x} = \mathbf{b}$ asks: "what vector $\mathbf{x}$, when transformed by $A$, produces $\mathbf{b}$?"

 - For example, say you are buying fruit. Apples cost $x_1$ dollars each and bananas cost $x_2$ dollars each. You know that 2 apples and 1 banana cost \$5, and 1 apple and 3 bananas cost \$10. In matrix form:

-```math
+$$
 \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \end{bmatrix}
-```
+$$

 - Multiplying the matrix by the vector row by row (each row dotted with $[x_1, x_2]^T$) gives two equations:

@@ -96,9 +96,9 @@ $$2x_1 + 1x_2 = 5 \qquad \text{(row 1)} \qquad \qquad x_1 + 3x_2 = 10 \qquad \te
 - Verify — it checks out:

-```math
+$$
 \begin{bmatrix} 2 & 1 \\ 1 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \end{bmatrix} = \begin{bmatrix} 2 + 3 \\ 1 + 9 \end{bmatrix} = \begin{bmatrix} 5 \\ 10 \end{bmatrix}
-```
+$$

 - If $A$ has an inverse, the solution is simply $\mathbf{x} = A^{-1}\mathbf{b}$. But computing the inverse directly is expensive and numerically unstable. In practice, we use decompositions instead.
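The fruit-price system worked through in this file can be solved the way the notes recommend, without forming the inverse explicitly. A minimal sketch assuming NumPy (`np.linalg.solve` uses a decomposition internally):

```python
import numpy as np

# 2 apples + 1 banana = $5;  1 apple + 3 bananas = $10
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([5.0, 10.0])

x = np.linalg.solve(A, b)   # preferred over np.linalg.inv(A) @ b
```

The solution is $x_1 = 1$, $x_2 = 3$: apples cost \$1 and bananas \$3, matching the verification step above.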
