### A Pluto.jl notebook ###
# v0.20.6
using Markdown
using InteractiveUtils
# This Pluto notebook uses @bind for interactivity. When running this notebook outside of Pluto, the following 'mock version' of @bind gives bound variables a default value (instead of an error).
macro bind(def, element)
#! format: off
return quote
local iv = try Base.loaded_modules[Base.PkgId(Base.UUID("6e696c72-6542-2067-7265-42206c756150"), "AbstractPlutoDingetjes")].Bonds.initial_value catch; b -> missing; end
local el = $(esc(element))
global $(esc(def)) = Core.applicable(Base.get, el) ? Base.get(el) : iv(el)
el
end
#! format: on
end
# ╔═╡ 4949225a-ccf2-11ee-299b-9b834eb6bd42
begin
using LinearAlgebra
using PlutoUI
using PlutoTeachingTools
using Plots
using LaTeXStrings
using HypertextLiteral
using Printf
end
# ╔═╡ 34beda8f-7e5f-42eb-b32c-73cfc724062e
md"""
!!! info ""
[Click here to view the PDF version.](https://teaching.matmat.org/numerical-analysis/11_Eigenvalue_problems.pdf)
"""
# ╔═╡ 13298dc4-9800-476d-9474-182359a7671b
TableOfContents()
# ╔═╡ 1980cffb-4b56-4b66-8100-a730da0c89f5
TODO(md"""If we do this chapter *after* boundary value problems, we can actually motivate eigenvalue problems from solving an equation like $-\Delta u = f$ in a bounded domain (discretised e.g. using sine functions). This is nice because (a) it is related to the resonance phenomena equations that lead to the resonance catastrophe in bridges (if $f$ hits an eigenvector of the Laplacian) and (b) it embeds eigenvalue problems better into the rest of the course and as an "application". """)
# ╔═╡ a138fb39-aae0-41b4-bd7b-d2f7eaad7a53
md"""
# Eigenvalue problems
Recall that the eigenpairs of a matrix $\mathbf A$ are the pairs $(λ_i, \mathbf{v}_i)$
of eigenvalues $λ_i$ and eigenvectors $\mathbf{v}_i$ such that
```math
\mathbf{A} \mathbf{v}_i = λ_i \mathbf{v}_i.
```
Geometrically speaking the eigenvectors provide special directions in space along which forming the matrix-vector-product $\mathbf{A}\mathbf{v}_i$ is particularly simple, namely it just *scales* the vector $\mathbf{v}_i$ by a number.
But more generally if $\mathbf x$ is an arbitrary vector
and if for simplicity we assume $\mathbf{A}$ to be symmetric
and positive definite,
then we find
```math
\| \mathbf A \mathbf x \| \leq \| \mathbf A \| \, \| \mathbf x \| =
\sqrt{λ_\text{max}(\mathbf A^T \mathbf A)} \, \| \mathbf x \|
= λ_\text{max}(\mathbf A) \, \| \mathbf x \|
```
where we used the inequalities introduced at the end of [Direct methods for linear systems](https://teaching.matmat.org/numerical-analysis/05_Direct_methods.html).
We note that the **largest eigenvalue of $\mathbf A$**
provides a **bound to the action of $\mathbf{A}$**.
"""
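As a quick sanity check (a sketch, not part of the original notebook), we can verify this bound numerically on a random symmetric positive definite matrix:

```julia
using LinearAlgebra

# Build a random symmetric positive definite matrix:
# C'C is positive semi-definite, adding 4I makes it definite.
C = randn(4, 4)
Aspd = C' * C + 4I

λmax = maximum(eigvals(Symmetric(Aspd)))
x = randn(4)

# The action of Aspd on any vector is bounded by its largest eigenvalue
@assert norm(Aspd * x) ≤ λmax * norm(x) + 1e-10
```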
# ╔═╡ a702d4b6-e70c-417d-ac98-92c534d52770
md"""
This may sound technical, but as **matrices are common
in physics and engineering**
and since their **eigenpairs characterise the action of these matrices**,
the computation of
eigenpairs often carries a **physical interpretation**.
For example, in the classical mechanics of rotating objects, the eigenvectors of the **[Moment of inertia](https://en.wikipedia.org/wiki/Moment_of_inertia)** tensor are the **principal axes** along which an object spins without coupling to other rotational degrees of freedom.
In engineering the **eigenvalues of the Hessian matrix (the Laplacian)**
of an object's total energy describe the **resonance frequencies**,
that is the frequencies at which the object best absorbs energy from an outside excitation. When these frequencies coincide
with the motion of humans or cars this can lead to a [Resonance disaster](https://en.wikipedia.org/wiki/Mechanical_resonance#Resonance_disaster), e.g. the famous [Tacoma Narrows Bridge Collapse](https://en.wikipedia.org/wiki/Tacoma_Narrows_Bridge_(1940)#Collapse) or the [Millennium Bridge Lateral Excitation Problem](https://en.wikipedia.org/wiki/Millennium_Bridge,_London#Resonance).
Analysing the eigenfrequencies of bridges is nowadays a required part of the procedure to obtain construction permissions.
"""
# ╔═╡ 73889935-2e4c-4a15-a281-b155cf1ca1c9
md"""
In this notebook we will discuss some simple iterative methods for actually computing eigenpairs. However, the topic is vast and we will only scratch the surface.
Readers interested in a more in-depth treatment
of eigenvalue problems are encouraged to attend the master class
[MATH-500: Error control in scientific modelling](https://teaching.matmat.org/error-control/).
Some recommended further reading can also be found in the book [Numerical Methods for Large Eigenvalue Problems](https://epubs.siam.org/doi/book/10.1137/1.9781611970739) by Yousef Saad as well as the [Lecture notes on Large Scale Eigenvalue Problems](https://people.inf.ethz.ch/arbenz/ewp/Lnotes/lsevp.pdf) by Peter Arbenz.
"""
# ╔═╡ 71d74b6a-96b2-4855-b20f-59b00c8f560b
md"""
## Power iteration
We start with a simple question: What happens if we apply a matrix multiple times?
Here, we choose the matrix
"""
# ╔═╡ 6db511ac-a217-49e2-a508-2590b64ea636
A = [ 0.5 1/3;
0.5 2/3]
# ╔═╡ 814e602d-1afa-44ba-bb25-ed32dd66e2b9
md"""
and a random 2-element vector:
"""
# ╔═╡ 58a88c32-cdf6-4cb9-af08-0ef9a186fe1c
md"Applying the matrix once seems to be very innocent:"
# ╔═╡ 7da94235-399e-4c13-8a02-25d8fa795e17
md"""
But if we apply it many times we start to see something ...
"""
# ╔═╡ 5a4038ac-1905-42e5-88ee-ced49c5375ad
@bind dummy Button("Regenerate random vector and rerun experiment")
# ╔═╡ 0dabc016-9cb8-4d1b-bd44-fb9493b8cf28
begin
dummy # For rerun to work
x = randn(2)
end
# ╔═╡ aa47c74d-2202-413a-9cad-17cad20832e1
x
# ╔═╡ 5d990087-901a-4457-a9ec-ba3b2d60f470
A * x
# ╔═╡ 272e7bce-e05e-412f-b2f5-2742d33d0b6f
begin
history = [x]
for j in 1:6
x = A * x
push!(history, x)
@printf "Iteration %i: x = [%12.4f, %12.4f]\n" j x[1] x[2]
end
maxerror = maximum(abs, (x - A * x))
@show maxerror
end;
# ╔═╡ 2dbe6cbf-6b27-49c1-b28c-e03a9b11e09d
let
l = norm(history[1])
p = plot(xlims=(-l, l), ylims=(-l, l), legend=:topleft)
for (i, x) in enumerate(history)
plot!([0, x[1]], [0, x[2]], arrow=true, label="Iteration $(i-1)", lw=2,
colour=cgrad(:speed)[i/length(history)])
end
ev = [0.5547, -0.83205]
plot!([-3ev[1], 3ev[1]], [3ev[2], -3ev[2]], ls=:dash, lw=1.5, c=:grey, label="final direction")
p
end
# ╔═╡ db0c875b-9955-4bc1-bdb1-ec2e1a44942d
md"""Note how the iterations stabilise, i.e. that $\textbf{x}$ and $\textbf{A} \textbf{x}$ start to be alike. In other words we seem to achieve $\mathbf{A} \mathbf{x} = \mathbf{x}$, which is nothing else than saying that $\mathbf{x}$ is an eigenvector of $\mathbf{A}$ with eigenvalue $1$.
"""
# ╔═╡ 73c62d23-ac2a-48be-b3c7-0d51ffce773c
md"""
Let us understand what happened in this example in detail. We consider the case $\mathbf{A} \in \mathbb{R}^{n \times n}$ diagonalisable and let further
```math
\tag{1}
|λ_1| ≤ |λ_2| ≤ |λ_3| ≤ \cdots ≤ |λ_{n-1}| \textcolor{red}{<} |λ_n|
```
be its eigenvalues with corresponding eigenvectors $\mathbf v_1$, $\mathbf v_2$, $\ldots, \mathbf v_n$,
which we collect column-wise in an invertible matrix $\mathbf{V}$, i.e.
```math
\mathbf{V} = \begin{pmatrix} \mathbf v_1 & \mathbf v_2 & \cdots & \mathbf v_n \\
\downarrow & \downarrow & \cdots & \downarrow
\end{pmatrix}
```
Note that $|λ_{n-1}| < |λ_n|$, such that $λ_n$ is non-degenerate and the absolutely largest eigenvalue. We call it the **dominant eigenvalue** of $\mathbf{A}$.
If $k > 0$ is a positive integer, then we have
```math
\mathbf{A}^k = \left(\mathbf{V} \, \mathbf{D}\, \mathbf{V}^{-1}\right)^k = \mathbf{V} \, \mathbf{D}^k \,\mathbf{V}^{-1}
```
and we can expand any random starting vector $\mathbf{x}^{(1)}$ in terms of the eigenbasis
of $\mathbf A$, i.e.
```math
\mathbf{x}^{(1)} = \sum_{i=1}^n z_i \mathbf v_i = \mathbf{V} \mathbf{z},
```
where $\mathbf{z}$ is the vector collecting the coefficients $z_i$.
Notably this implies $\mathbf z = \mathbf{V}^{-1} \mathbf{x}^{(1)}$.
"""
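As a small numerical check (a sketch, not part of the original notebook), the coefficients $z_i$ of the eigenbasis expansion can be obtained by solving a linear system with $\mathbf V$:

```julia
using LinearAlgebra

A = [0.5 1/3;
     0.5 2/3]
λ, V = eigen(A)        # columns of V are the eigenvectors of A

x1 = [1.0, -2.0]
z = V \ x1             # coefficients of x1 in the eigenbasis, z = V⁻¹ x1

# The expansion x1 = Σᵢ zᵢ vᵢ indeed reconstructs the vector
@assert V * z ≈ x1
```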
# ╔═╡ 16d6559c-977b-4fd9-aecb-2b6f61290f04
md"""
Consider applying $\mathbf{A}$ a $k$ number of times to $\mathbf{x}^{(1)}$.
This yields
```math
\tag{2}
\begin{aligned}
\mathbf{A}^k \mathbf{x}^{(1)} &= \mathbf{V} \, \mathbf{D}^k \,\underbrace{\mathbf{V}^{-1} \mathbf{x}^{(1)}}_{= \mathbf{z}} = \mathbf{V} \begin{pmatrix}
λ_1^k z_1 \\
λ_2^k z_2 \\
\vdots\\
λ_n^k z_n
\end{pmatrix} \\
&= λ_n^k \left( \left( \frac{λ_1}{λ_n} \right)^k z_1 \mathbf v_1 + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^k z_{n-1} \, \mathbf v_{n-1} + z_n \mathbf v_{n} \right).
\end{aligned}
```
If $z_n \neq 0$ then
```math
\tag{3}
\left\| \frac{\mathbf{A}^k \mathbf{x}^{(1)}}{λ_n^k} - z_n \textbf v_n \right\| ≤
\left|\frac{λ_1}{λ_n}\right|^k \, |z_1| \, \|\textbf v_1\| +
\cdots
+ \left|\frac{λ_{n-1}}{λ_n}\right|^k \, |z_{n-1}| \, \|\textbf v_{n-1}\|
```
A consequence of the ascending eigenvalue ordering of equation (1) is that
```math
\left|\frac{λ_{j}}{λ_n}\right| < 1 \qquad \text{for all $j = 1, \ldots, n-1$}
```
and therefore that each of the terms $\left|\frac{λ_j}{λ_n}\right|^k$
for $j = 1, \ldots, {n-1}$ goes to zero for $k \to \infty$.
Therefore overall
```math
\left\| \frac{\mathbf{A}^k \mathbf{x}^{(1)}}{λ_n^k} - z_n \textbf v_n \right\|
\to 0 \qquad \text{as $k \to \infty$}.
```
In other words $\frac{1}{λ_n^k} \mathbf{A}^k \mathbf{x}^{(1)}$ **eventually becomes a multiple of the eigenvector** associated with the dominant eigenvalue.
"""
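This limit can be observed numerically. The following sketch (not from the notebook) compares a scaled power of the matrix from above against the dominant eigenvector returned by `eigen`:

```julia
using LinearAlgebra

A = [0.5 1/3;
     0.5 2/3]
λ, V = eigen(A)
i = argmax(abs.(λ))            # index of the dominant eigenvalue
λn, vn = λ[i], V[:, i]         # here λn = 1

x1 = [1.0, 0.0]
y = A^20 * x1 / λn^20          # Aᵏ x⁽¹⁾ / λₙᵏ for k = 20

# y is (nearly) a multiple of the dominant eigenvector vn
cosangle = abs(dot(y, vn)) / (norm(y) * norm(vn))
@assert cosangle > 1 - 1e-10
```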
# ╔═╡ 7fc2ff92-bd21-498e-8e1e-4e8a18355501
md"""
!!! danger "Conventions of eigenpair ordering"
In the literature as well as across linear algebra codes
there are multiple conventions regarding the ordering
of eigenvalues. E.g. whether eigenvalues are ordered from
smallest to largest, from largest to smallest, whether
one considers the absolute value of the eigenvalues or
the signed values etc.
Wherever this is possible and makes sense we will employ the
ordering
```math
|λ_1| ≤ |λ_2| ≤ |λ_3| ≤ \cdots ≤ |λ_n|
```
i.e. that eigenvalues increase in magnitude,
but their signs may differ.
"""
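As an illustration (not part of the notebook), this magnitude ordering can be imposed on the output of `eigvals`, which by itself follows a different convention:

```julia
using LinearAlgebra

A = [2.0  1.0;
     1.0 -3.0]
λ = eigvals(A)                     # not ordered by magnitude in general

# Reorder such that |λ₁| ≤ |λ₂| ≤ … ≤ |λₙ|, keeping the signs
λ_by_magnitude = λ[sortperm(abs.(λ))]

@assert issorted(abs.(λ_by_magnitude))
```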
# ╔═╡ cd72b974-37ab-40ad-b9a6-e930efa3e767
md"""
### Power iteration algorithm
Let's try applying our idea from the earlier section to the matrix
"""
# ╔═╡ a3377f9f-7fba-4e6b-b014-e85de15ca3f0
md"""
- `λₙ = ` $(@bind λₙ Slider([0.2, 1.0, 5.0, 10.0]; default=5.0, show_value=true))
"""
# ╔═╡ 612948f2-abd5-4350-9327-bbfa51cebc57
B = [0.1 5.0;
0.0 λₙ]
# ╔═╡ bfefdb2f-f42a-4867-9a5e-94d22226645b
md"""which has eigenvalues $0.1$ and `λₙ = ` $λₙ. The latter is the dominant one and can be changed using the slider above."""
# ╔═╡ 1e461d1e-bcc7-4c9d-8398-f87a51311fdd
md"We again run 6 subsequent applications of $\mathbf B$ and hope that the iterations stabilise:"
# ╔═╡ d4b22b38-04f7-44a7-834b-1bdbad64182d
let
x = randn(2)
for j in 1:6
x = B * x
@printf "Iteration %i: x = [%12.4f, %12.4f]\n" j x[1] x[2]
end
end;
# ╔═╡ 56352bf0-5837-4c38-89c1-62a940e80b3c
md"""
But this time this does not work ... unless `λₙ` happens to be 1.0.
- This can be understood looking at equation (2): if $λ_n > 1.0$,
then $λ_n^k$ becomes extremely large, such that $\mathbf{A}^k \mathbf{x}^{(1)}
  ≈ λ_n^k z_n \mathbf v_n$, which grows significantly from one iteration to the next,
such that no stabilisation is achieved.
Similarly if $λ_n < 1.0$ then as $k\to \infty$ the $λ_n^k$ becomes smaller
and smaller.
- Apart from not converging, this also poses difficulties from a numerical point of view, since accurately representing the vector entries of $\mathbf A^k \mathbf{x}$ and performing the associated matrix-vector products becomes more and more difficult the larger the range of numbers that have to be represented.
Naively one could take a look at (3) and just **normalise in each iteration** by dividing by $λ_n$. And indeed this works:
"""
# ╔═╡ 60bbeaee-dc3a-4c44-b016-02d8b1b5623b
let
x = randn(2)
for j in 1:6
x = x / λₙ # Normalise
x = B * x
@printf "Iteration %i: x = [%12.4f, %12.4f]\n" j x[1] x[2]
end
end;
# ╔═╡ b728388d-e6e4-475a-adce-149b2b53a1d4
md"""
The problem here is that for general problems **we don't know $λ_n$**,
so we cannot use it for normalisation.
Fortunately it turns out, however, that pretty much any normalisation of $\mathbf x$ works. For example we can use the infinity norm:
```math
\|x\|_\infty = \max_{i=1,\ldots n} |x_i|,
```
which simply selects the largest absolute entry of the vector.
Again this works as expected:
"""
# ╔═╡ 2425ae53-d861-47f7-9f86-750cf4f363c1
let
x = randn(2)
for j in 1:6
x = x / maximum(abs.(x)) # Normalise
x = B * x
@printf "Iteration %i: x = [%12.4f, %12.4f]\n" j x[1] x[2]
end
@printf "Estimate for eigenvalue: %12.6f" maximum(abs.(x))
end;
# ╔═╡ a11825e7-9ee5-416c-b170-edd1c5eb746c
md"We also note in passing that $\|x\|_\infty$ seems to converge to the dominant eigenvalue."
# ╔═╡ b69a8d6c-364a-4951-afd9-24588ac10b64
md"""
Note that $\|x\|_\infty = \max_{i=1,\ldots n} |x_i|$
implies that there exists an index $m \in \{1,\ldots n\}$,
such that $\|x\|_\infty = |x_m|$.
Keeping this in mind we formulate the algorithm
!!! info "Algorithm 1: Power iterations"
Given a diagonalisable matrix $\mathbf A \in \mathbb R^{n\times n}$
and an initial guess $\mathbf x^{(1)} \in \mathbb R^n$ we iterate
for $k = 1, 2, \ldots$:
1. Set $\mathbf y^{(k)} = \mathbf A \, \mathbf x^{(k)}$
2. Find the index $m$ such that $\left|y_m^{(k)}\right| = \left\|y^{(k)}\right\|_\infty$
3. Compute $α^{(k)} = \frac{1}{y^{(k)}_m}$ and set $β^{(k)} = \frac{y^{(k)}_m}{x^{(k)}_m}$ *(see below why)*
4. Set $\mathbf x^{(k+1)} = α^{(k)} \mathbf y^{(k)}$ *(Normalisation)*
We obtain $β^{(k)}$ as the estimate to the dominant eigenvalue
and $\mathbf x^{(k)}$ as the estimate of the corresponding eigenvector.
"""
# ╔═╡ 4ebfc860-e179-4c76-8fc5-8c1089301078
md"""
In this algorithm $α^{(k)} = \frac{1}{y^{(k)}_m} = \frac{1}{\| y^{(k)} \|_\infty}$.
Step 4 is thus performing the normalisation we developed above.
Furthermore $β^{(k)}$ is now computed as the eigenvalue estimate
instead of $\| x \|_\infty$. The idea is
that if $\textbf{x}^{(k)}$ is already close to the eigenvector $\mathbf v_n$
associated to the dominant eigenvalue $λ_n$, then
```math
\mathbf{y}^{(k)} = \mathbf A\, \mathbf{x}^{(k)} \approx \mathbf A\, \mathbf{v}_n = λ_n \mathbf{x}^{(k)}
\quad \Rightarrow \quad
\mathbf{y}^{(k)}_m \approx λ_n \mathbf{x}^{(k)}_m
\quad \Rightarrow \quad
λ_n \approx \frac{\mathbf{y}^{(k)}_m}{\mathbf{x}^{(k)}_m} = β^{(k)}
```
An implementation of this power method algorithm is:
"""
# ╔═╡ 01186225-602f-4637-99f2-0a6dd569a703
function power_method(A, x; maxiter=100)
n = size(A, 1)
x = normalize(x, Inf) # Normalise initial guess
history = Float64[] # Record a history of all βs (estimates of eigenvalue)
for k in 1:maxiter
y = A * x
m = argmax(abs.(y))
α = 1 / y[m]
β = y[m] / x[m]
push!(history, β)
x = α * y
end
(; x, λ=last(history), history)
end
# ╔═╡ 25f2b565-cc6a-4919-ac82-37d6dd62ba16
md"""
Let's try this on an upper-triangular matrix as a test. Note that the eigenvalues of a triangular matrix are exactly its diagonal entries. We set
"""
# ╔═╡ 22b9c8b5-02cd-481e-aca7-bb2fe4e85c3c
begin
λref = [1, -0.75, 0.6, -0.4, 0]
# Make a triangular matrix with eigenvalues on the diagonal.
M = triu(ones(5,5),1) + diagm(λref)
end
# ╔═╡ 81a6721f-f7e5-42c9-b389-518a4c458ae9
md"""
By construction the largest eigenvalue of this matrix is $1$. So let's track convergence:
"""
# ╔═╡ 033b6a9b-2187-4053-bc69-bbc1f03494fa
begin
λlargest = 1.0 # Largest eigenvalue of M (by construction)
x1 = ones(size(M, 2))
results = power_method(M, x1; maxiter=70)
error = abs.(results.history .- λlargest)
plot(error; mark=:o, label="", yaxis=:log,
title="Convergence of power iteration",
ylabel=L"|β^{(k)} - λ_{\textrm{ref}}|", xlabel=L"k")
end
# ╔═╡ 25836ba5-5850-44d5-981b-3cc4dd5c7fdc
md"""
We want to demonstrate why $β^{(k)} \to λ_n$ as $k\to\infty$ more rigorously.
Let us first note that for our algorithm
```math
\tag{4}
\textbf{x}^{(k)} = α^{(1)} \cdot α^{(2)} \cdot \, \cdots \, \cdot \, α^{(k)} \cdot \, \textbf{A}^k \textbf x^{(1)}
= \prod_{i=1}^k α^{(i)} \cdot \mathbf A^k \mathbf x^{(1)}
```
and by construction we always have $\left\| \textbf{x}^{(k)} \right\|_\infty = 1$.
We further note
```math
\begin{aligned}
β^{(k)} &= \frac{y^{(k)}_m}{x^{(k)}_m}
= \frac{\left(\mathbf{A}\, \mathbf{x}^{(k)}\right)_\textcolor{red}{m}}{x^{(k)}_m}\\
&\stackrel{(4)}{=} \frac{
\left( \prod_{i=1}^k α^{(i)} \cdot \mathbf A^{k+1} \mathbf x^{(1)} \right)_\textcolor{red}{m}
}{
\left( \prod_{i=1}^k α^{(i)} \cdot \mathbf A^k \mathbf x^{(1)} \right)_\textcolor{red}{m}
}\\
&\stackrel{(2)}{=}
\frac{
λ_n^{k+1} \left(
\left( \frac{λ_1}{λ_n} \right)^{k+1} z_1 \mathbf v_1 + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^{k+1} z_{n-1} \, \mathbf v_{n-1} + z_n \mathbf v_{n}
\right)_\textcolor{red}{m}
}{
λ_n^{k} \left(
\left(\frac{λ_1}{λ_n} \right)^{k} z_1 \mathbf v_1 + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^{k} z_{n-1} \, \mathbf v_{n-1} + z_n \mathbf v_{n}
\right)_\textcolor{red}{m}
}\\
&=
\frac{
λ_n \left(
\left( \frac{λ_1}{λ_n} \right)^{k+1} b_1 + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^{k+1} b_{n-1} + b_{n}
\right)
}{
\left(
\left(\frac{λ_1}{λ_n} \right)^{k} b_1 + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^{k} b_{n-1} + b_{n}
\right)
}
\end{aligned}
```
where $_\textcolor{red}{m}$ denotes that we take the $m$-th element of the vector in the brackets
and we further defined $b_i = z_i \left(\mathbf{v}_i\right)_m$, that is $z_i$ multiplied by the $m$-th element of the eigenvector $\mathbf{v}_i$.
Again we assume $z_n \neq 0$, which implies $b_n \neq 0$.
Under this assumption
```math
\tag{5}
β^{(k)} = \frac{y^{(k)}_m}{x^{(k)}_m}=
λ_n \, \frac{
\left( \frac{λ_1}{λ_n} \right)^{k+1} \frac{b_1}{b_n} + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^{k+1} \frac{b_{n-1}}{b_n} + 1
}{
\left( \frac{λ_1}{λ_n} \right)^{k} \frac{b_1}{b_n} + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^{k} \frac{b_{n-1}}{b_n} + 1
}
\to λ_n \qquad \text{as $k\to\infty$}.
```
Keeping in mind that $\left|\frac{λ_j}{λ_n}\right| < 1$ for all $1 ≤ j ≤ n-1$ we indeed
observe $β^{(k)}$ to converge to $λ_n$.
"""
# ╔═╡ cc3495af-e53c-4ee8-8ec0-f4b597e17f53
Foldable("A remark on zₙ ≠ 0",
md"""
In the theoretical discussions so far we always assumed $z_n ≠ 0$,
i.e. that the initial guess $\mathbf{x}^{(1)}$ has a non-zero component along
the eigenvector $\mathbf v_n$ of the dominant eigenvalue. This may not
always be the case. However even if $z_n = 0$,
computing $\mathbf{A} \mathbf{x}^{(1)}$
in finite-precision floating-point arithmetic is associated with
a small error, which means that
$\mathbf{A} \mathbf{x}^{(1)}$ and thus
$\mathbf{x}^{(2)}$ does usually
*not* have a zero component along $\mathbf v_n$:
Assuming $z_n ≠ 0$ is not a restriction for practical calculations.
""")
# ╔═╡ c1b01e10-235c-4424-8524-0836a07106ff
md"""
### Convergence of power method
The above plot already suggests a **linear convergence** towards
the exact eigenvalue. Indeed a detailed analysis shows that the convergence rate
can be computed as
```math
\tag{6}
r_{\textrm{power}} = \lim_{k\to\infty} \frac{|β^{(k+1)} - λ_n|}{|β^{(k)} - λ_n|}
= \left|\frac{λ_{n-1}}{λ_n}\right|.
```
"""
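We can observe this rate empirically. The sketch below (not part of the notebook; it inlines a minimal power iteration equivalent to `power_method` above) measures the ratio of successive errors on a small triangular test matrix with known $λ_{n-1}/λ_n$:

```julia
using LinearAlgebra

# Upper-triangular test matrix: its eigenvalues are the diagonal entries
λref = [0.1, -0.5, 1.0]               # hence |λ_{n-1}/λ_n| = 0.5
M = triu(ones(3, 3), 1) + diagm(λref)

function power_estimates(A, x, maxiter)
    βs = Float64[]
    for k in 1:maxiter
        y = A * x
        m = argmax(abs.(y))
        push!(βs, y[m] / x[m])        # eigenvalue estimate β⁽ᵏ⁾
        x = y / y[m]                  # normalise by the ∞-norm entry
    end
    βs
end

βs = power_estimates(M, ones(3), 25)
errors = abs.(βs .- 1.0)
ratio = errors[end] / errors[end-1]   # observed rate, should be ≈ 0.5
@assert abs(ratio - 0.5) < 0.02
```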
# ╔═╡ 38015a4f-17af-47c5-a5b6-5e19c429c841
details("Optional: Detailed derivation",
md"""
To show the rate $r_{\textrm{power}}$ we revisit equation (5)
```math
β^{(k)} =
λ_n \, \frac{
\left( \frac{λ_1}{λ_n} \right)^{k+1} \frac{b_1}{b_n} + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^{k+1} \frac{b_{n-1}}{b_n} + 1
}{
\left( \frac{λ_1}{λ_n} \right)^{k} \frac{b_1}{b_n} + \cdots + \left( \frac{λ_{n-1}}{λ_n} \right)^{k} \frac{b_{n-1}}{b_n} + 1
}
```
and introduce the shorthand notation
```math
B_i = b_i/b_n \qquad\text{and}\qquad
r_i = λ_i/λ_n
```
such that
```math
β^{(k)} =
λ_n \frac{
r_1^{k+1} B_1 + r_2^{k+1} B_2 + \cdots + r_{n-1}^{k+1} B_{n-1} + 1
}{
r_1^{k} B_1 + r_2^{k} B_2 + \cdots + r_{n-1}^{k} B_{n-1} + 1
}
```
We will make the *additional assumption* $|λ_{n-2}| < |λ_{n-1}|$,
such that $|λ_{n-2} / λ_{n-1}| < 1$.
We could also complete this development without this assumption,
but the resulting expressions are involved.
Now
```math
\begin{aligned}
r_1^{k} B_1 &+ r_2^{k} B_2 + \cdots + r_{n-1}^k B_{n-1} + 1\\
&= 1 + r_{n-1}^k \left[
\left(\frac{λ_1}{λ_{n-1}}\right)^k B_1
+ \cdots
+ \left(\frac{λ_{n-2}}{λ_{n-1}}\right)^k B_{n-2}
+ B_{n-1}
\right]
\end{aligned}
```
clearly approaches $1 + r_{n-1}^k B_{n-1}$ as $k \to \infty$.
Noting for $k \to \infty$ that
```math
\frac{1}{1 + r_{n-1}^k B_{n-1}} = 1 - r_{n-1}^k B_{n-1} + O(r_{n-1}^{2k})
```
we can conclude in this limit
```math
\begin{aligned}
β^{(k)} - λ_n &= λ_n \left(1 + r_{n-1}^{k+1} B_{n-1}\right)
\left[ 1 - r_{n-1}^k B_{n-1} + O(r_{n-1}^{2k}) \right] - λ_n\\
&= λ_n (r_{n-1} - 1) \, r_{n-1}^k B_{n-1} + O(r_{n-1}^{2k}).
\end{aligned}
```
This is indeed linear convergence with convergence rate
```math
\frac{|β^{(k\textcolor{red}{+1})} - λ_n|}{|β^{(k)} - λ_n|}
\to \frac{|(r_{n-1} - 1) r_{n-1}^{k\textcolor{red}{+1}} B_{n-1}|}{|(r_{n-1} - 1) r_{n-1}^{k} B_{n-1}|} = |r_{n-1}| = \left|\frac{λ_{n-1}}{λ_n}\right|
\quad \text{as $k \to \infty$}
```
""")
# ╔═╡ d6a56c08-374a-4665-9968-d538fe2c4c59
md"""
So the smaller the ratio between $|λ_{n-1}|$ and $|λ_n|$,
the faster the convergence.
"""
# ╔═╡ a8dfd1bc-5146-4218-9fad-0676c7406f16
md"and we will put these on the diagonal of a triangular matrix, such that the eigenvalues of the resulting matrix $M_δ$ are exactly the $λ_δ$:"
# ╔═╡ 385b07d6-15ef-40ee-8d2e-b3b6505a4171
md"""
We introduce a slider to tune the $λ_{n-1}$ of equation (6):
- `λₙ₋₁ = `$(@bind λₙ₋₁ Slider(-0.6:-0.05:-0.95; show_value=true, default=-0.9) )
"""
# ╔═╡ 79afdc06-a39e-4187-b91d-0af65a62613b
md"Let us take a case where $λ_n = 1.0$. Therefore the size of $λ_{n-1}$ determines the rate of convergence. Let's take $λ_{n-1}$ = $(round(λₙ₋₁; sigdigits=2))."
# ╔═╡ 59b21a54-5cac-4e7f-b800-89c1f5f2d01f
begin
λ_δ = [0.0, -0.4, 0.6, λₙ₋₁, 1.0] # The reference eigenvalues we will use
end
# ╔═╡ e5bf05e5-b064-4e43-b64b-22e5e45ddcd0
M_δ = triu(ones(5, 5), 1) + diagm(λ_δ)
# ╔═╡ 5d4697b4-2f3e-452b-afb8-78443efa1f25
let
	λlargest = maximum(abs.(λ_δ)) # Largest eigenvalue of M_δ (by construction)
x1 = ones(size(M_δ, 2))
results = power_method(M_δ, x1; maxiter=40)
error = abs.(results.history .- λlargest)
p = plot(error; mark=:o, label="", yaxis=:log, ylims=(1e-10, 3),
title="Convergence of power iteration",
ylabel=L"|β^{(k)} - λ_{\textrm{ref}}|",
xlabel=L"k", legend=:bottomleft, lw=2)
λₙ = 1.0
r_power = abs(λₙ₋₁ / λₙ)
plot!(p, k -> r_power^k; ls=:dash, label=L"expected rate $| λ_{n-1} / λ_n|$", lw=2)
end
# ╔═╡ 0e4a443b-e99d-436f-8151-d6cfed9b773b
md"""
## Spectral transformations
The power method provides us with a simple algorithm to compute the *largest* eigenvalue of a matrix and its associated eigenvector.
But what if one actually wanted to compute the smallest eigenvalue
or an eigenvalue somewhere in the middle?
In this section we discuss an extension of power iteration
which makes this feasible.
We only need a small ingredient from linear algebra:
the **spectral transformations**.
We explore these based on a few examples. Consider
"""
# ╔═╡ 4fbed8eb-a929-4afb-a971-d37a75377d8f
Ashift = [ 0.4 -0.6 0.2;
-0.3 0.7 -0.4;
-0.1 -0.4 0.5]
# ╔═╡ 8290d531-bad1-40b8-a1e9-49cb3a9dae4c
md"Its eigenvalues and eigenvectors are:"
# ╔═╡ 506145ea-399a-4f22-bae3-25257fd0b154
TODO("Maybe pick a matrix with simpler eigenvalues")
# ╔═╡ 2f97e0aa-e9b1-4bdf-bc81-968a3217b2e6
eigen(Ashift)
# ╔═╡ ad005fc4-04fd-400a-9113-757e52488beb
md"Now we add a multiple of the identity matrix, e.g.:"
# ╔═╡ 2d5eabf9-ba09-43f8-8ee6-9b35bd826640
σ = 2 # Shift
# ╔═╡ 2dbd51f3-6629-456d-9f45-bf6aeea07be4
eigen(Ashift + σ * I) # Add 2 * identity matrix
# ╔═╡ b0a4cf39-749d-470f-8a25-f59cda1a740c
md"Notice how the eigenvectors are the same and only the eigenvalues have been shifted by $σ$.
Similarly:
"
# ╔═╡ 850f1909-b5ab-402a-b4f4-d11decc4d457
let
A⁻¹ = inv(Ashift + σ * I)
eigen(A⁻¹)
end
# ╔═╡ a2f76acd-0f52-4d39-8c4a-c106c8db25ca
md"Notice how this matrix still has the same eigenvectors (albeit in a different order) and the *inverted* eigenvalues of $\mathbf A + σ \mathbf I$."
# ╔═╡ 0775cca6-48e4-49eb-b63b-8e183ae2e2ed
1 ./ eigvals(Ashift + σ * I)
# ╔═╡ b25fb4e5-127b-4939-a44b-001cc68ea811
md"""
We want to formalise our observation more rigorously.
Recall that $λ_i$ is an eigenvalue of $\mathbf A$ with (non-zero) eigenvector $\mathbf x_i$ if and only if
```math
\tag{7}
\mathbf{A} \mathbf x_i = λ_i \mathbf x_i.
```
We usually refer to the pair $(λ_i, \mathbf x_i)$ as an **eigenpair** of $\mathbf A$.
The statement of the **spectral transformations** is:
"""
# ╔═╡ c19c7118-37e5-4bfd-adc9-434a96358abd
md"""
!!! info "Theorem 1: Spectral transformations"
Assume $\mathbf A \in \mathbb R^{n \times n}$ is diagonalisable.
Let $(λ_i, \mathbf x_i)$ be an eigenpair of $\mathbf A$, then
    1. If $\mathbf A$ is invertible, then $(\frac{1}{λ_i}, \mathbf x_i)$ is an eigenpair of $\mathbf A^{-1}$.
    2. For every $σ \in \mathbb{R}$ we have that
       $(λ_i - σ, \mathbf x_i)$ is an eigenpair of $\mathbf A - σ \mathbf I$.
    3. If $\mathbf A - σ \mathbf I$ is invertible, then $(\frac{1}{λ_i - σ}, \mathbf x_i)$ is an eigenpair of $(\mathbf A - σ \mathbf I)^{-1}$.
"""
# ╔═╡ f5014b54-79ea-4506-8ddf-8262bd09d9f9
md"""
> **Proof:**
> The result follows from a few calculations on top of (7). We proceed
> in order of the statements above.
>
> 1. Let $(λ_i, \mathbf x_i)$ be an arbitrary eigenpair of $\mathbf A$.
> Since $\mathbf A$ is invertible by assumption, $λ_i \neq 0$.
> Therefore for all eigenvalues $λ_i$ of $\mathbf A$ the fraction
> $\frac{1}{λ_i}$ is meaningful and we can show:
> ```math
> \frac{1}{λ_i} \mathbf x_i = \frac{1}{λ_i} \mathbf I \mathbf x_i
> = \frac{1}{λ_i} \left(\mathbf A^{-1} \mathbf{A} \right) \mathbf x_i
> = \frac{1}{λ_i} \mathbf A^{-1} \left(\mathbf{A} \mathbf x_i \right)
>    \stackrel{(7)}{=} \mathbf A^{-1} \mathbf x_i,
> ```
>    which indeed is the statement that $(\frac{1}{λ_i}, \mathbf x_i)$ is an eigenpair of $\mathbf A^{-1}$.
>
> 2. The argument is similar to 1 and we leave it as an exercise.
>
> 3. This statement follows by combining statements 1. and 2.
"""
# ╔═╡ 5ea67b6d-a297-403e-bb3e-e69f647f90a8
md"""
!!! tip "Exercise 1"
Assume $\mathbf A \in \mathbb R^{n \times n}$ is diagonalisable.
Prove statement 2. of Theorem 1, that is for all $σ \in \mathbb{R}$:
If $(λ_i, \mathbf x_i)$ is an eigenpair of $\mathbf A$,
then $\mathbf x_i$ is also an eigenvector
of $\mathbf A - σ \mathbf I$ with eigenvalue $λ_i - σ$.
"""
# ╔═╡ 9b32cc52-b74d-49fe-ba3f-192f5efc4ef2
md"""
Consider point 1. of Theorem 1 and assume that $\mathbf A$
has a *smallest* eigenvalue
```math
0 < |λ_1| < |λ_2| ≤ |λ_3| ≤ \cdots ≤ |λ_n|
```
then $\mathbf A^{-1}$ has the eigenvalues
```math
|λ_1^{-1}| > |λ_2^{-1}| ≥ |λ_3^{-1}| ≥ \cdots ≥ |λ_n^{-1}|,
```
thus a dominating (i.e. largest) eigenvalue $|λ_1^{-1}|$.
**Applying** the `power_method` function (Algorithm 1) **to $\mathbf A^{-1}$**
we thus converge to $λ_1^{-1}$ from which we can deduce $λ_1$,
the **eigenvalue of $\mathbf A$ closest to zero**.
"""
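A self-contained sketch of this idea (not from the notebook; it inlines the same loop as `power_method` so the snippet runs on its own) recovers the smallest eigenvalue of a small symmetric matrix by running power iteration on $\mathbf A^{-1}$:

```julia
using LinearAlgebra

A = [4.0 1.0;
     1.0 3.0]                  # eigenvalues (7 ± √5)/2 ≈ 2.382 and 4.618

function dominant_eigenvalue(B, x, maxiter)
    β = 0.0
    for k in 1:maxiter
        y = B * x
        m = argmax(abs.(y))
        β = y[m] / x[m]        # estimate of the dominant eigenvalue of B
        x = y / y[m]
    end
    β
end

# Power iteration on A⁻¹ converges to 1/λ₁, the inverse of the
# smallest-magnitude eigenvalue of A; invert to recover λ₁.
β = dominant_eigenvalue(inv(A), ones(2), 100)
λ₁ = 1 / β
@assert abs(λ₁ - minimum(eigvals(A))) < 1e-8
```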
# ╔═╡ 36dc90e6-fd8a-4c88-a816-140b663532ef
md"""
Now consider point 3. and assume $σ$ has been chosen such that
```math
\tag{8}
|λ_n - σ| ≥ |λ_{n-1} - σ| ≥ \cdots ≥ |λ_{2} - σ| > |λ_1 - σ| > 0,
```
i.e. such that $λ_1$ is the eigenvalue closest to $σ$.
It follows
```math
|(λ_n-σ)^{-1}| ≤ |(λ_{n-1}-σ)^{-1}| ≤ \cdots ≤ |(λ_2-σ)^{-1}| < |(λ_1-σ)^{-1}|,
```
such that the `power_method` function converges to $(λ_1-σ)^{-1}$.
From this value we can deduce $λ_1$, since we know $σ$.
In other words by **applying Algorithm 1** to the
**shift-and-invert matrix** $(\mathbf A - σ \mathbf I)^{-1}$
enables us to find the **eigenvalue of $\mathbf A$ closest to $σ$**.
"""
# ╔═╡ 29a4d3a6-29e0-4067-b703-43da5171f7bf
md"""
A naive application of Algorithm 1 would first compute $\mathbf{P} = (\mathbf A - σ \mathbf I)^{-1}$ and then apply
```math
\mathbf y^{(k)} = \mathbf{P} \mathbf x^{(k)}
```
in each step of the power iteration.
However, for many problems the explicit computation of the inverse
$\mathbf{P}$ is numerically unstable.
Instead of computing $\mathbf{P}$ explicitly
one instead obtains $\mathbf y^{(k)}$ by **solving a linear system**
```math
(\mathbf A - σ \mathbf I) \, \mathbf y^{(k)} = \mathbf x^{(k)}
```
for $\mathbf y^{(k)}$,
which is done **using LU factorisation**.
We arrive at the following algorithm,
where the changes compared to Algorithm 1 are marked in colour.
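For a single step the difference between the two routes can be sketched as follows (small hypothetical example; both give the same $\mathbf y^{(k)}$, but the linear-system route never forms the inverse):

```julia
using LinearAlgebra

A = [2.0 1.0;
     1.0 3.0]                # small hypothetical matrix
σ = 0.5
x = [1.0, 1.0]               # current iterate x⁽ᵏ⁾

y_naive = inv(A - σ*I) * x   # naive route: explicitly form the inverse
fact    = lu(A - σ*I)        # LU factorisation of A - σI
y_lu    = fact \ x           # solve (A - σI) y = x instead

y_naive ≈ y_lu               # true
```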
"""
# ╔═╡ 49838e6f-63c2-4494-a5d8-9e3dd25554d3
md"""
!!! info "Algorithm 2: Inverse iterations"
Given
- a diagonalisable matrix $\mathbf A \in \mathbb R^{n\times n}$,
- a shift $σ\in \mathbb R$, such that $\mathbf A - σ \mathbf I$ is
invertible
- an initial guess $\mathbf x^{(1)} \in \mathbb R^n$
we iterate for $k = 1, 2, \ldots$:
1. $\textcolor{brown}{\text{Solve }(\mathbf A - σ \mathbf I) \mathbf y^{(k)} = \mathbf x^{(k)}
\text{ for }\mathbf y^{(k)}}$.
2. Find the index $m$ such that $|y^{(k)}_m| = \|\mathbf y^{(k)}\|_\infty$
3. Set $α^{(k)} = \frac{1}{y^{(k)}_m}$ and $\textcolor{brown}{β^{(k)} = σ + \frac{x^{(k)}_m}{y^{(k)}_m}}$.
4. Set $\mathbf x^{(k+1)} = α^{(k)} \mathbf y^{(k)}$
"""
# ╔═╡ 3e0e12ad-4d09-4128-9175-14e71c7de5a2
md"""
Note the additional change in step 3: In Algorithm 1 we obtained the estimate of the dominant eigenvalue as $y^{(k)}_m / x^{(k)}_m$.
Here this estimate approximates $\frac 1 {λ_1 - σ}$, the dominant eigenvalue of $(\mathbf A - σ \mathbf I)^{-1}$.
Therefore an estimate of $λ_1$ itself is obtained by solving
```math
\frac {y^{(k)}_m} {x^{(k)}_m} = \frac 1 {λ_1 - σ} \quad\Longrightarrow\quad
λ_1 = σ + \frac{x^{(k)}_m}{y^{(k)}_m}
```
for $λ_1$, which yields exactly the expression shown in step 3 of Algorithm 2.
An implementation of this algorithm is given in:
"""
# ╔═╡ 4873fe3b-d411-4e06-92b3-8cfbed493a12
function inverse_iterations(A, σ, x; maxiter=100)
# A: Matrix
# σ: shift
# x: initial guess
n = size(A, 1)
x = normalize(x, Inf)
fact = lu(A - σ*I) # Compute LU factorisation of A-σI
history = Float64[]
for k in 1:maxiter
y = fact \ x
m = argmax(abs.(y))
α = 1 / y[m]
β = σ + x[m] / y[m]
push!(history, β)
x = α * y
end
(; x, λ=last(history), history)
end
# ╔═╡ 67d258d1-9959-46ad-9f13-2bc9abb4da37
md"""
Notice that in this implementation we make use of the fact that
once an **LU factorisation is computed** it can be **re-used
for solving multiple linear systems** with changing right-hand sides:
in our case we can expect to solve many linear systems
involving the matrix $\mathbf A - σ \mathbf I$.
Therefore we **compute the LU factorisation** only once,
namely at the beginning of the algorithm and before entering
the iterative loop.
For dense matrices
computing the factorisation scales as $O(n^3)$,
while solving linear systems based on the factorisation only scales as $O(n^2)$
([recall chapter 5](https://teaching.matmat.org/numerical-analysis/05_Direct_methods.html)).
Factorising once thus substantially reduces the cost per iteration.
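The re-use pattern can be illustrated on a small hypothetical matrix: the factorisation is computed once and then applied to several right-hand sides.

```julia
using LinearAlgebra

M = [4.0 1.0 0.0;
     1.0 3.0 1.0;
     0.0 1.0 2.0]        # hypothetical matrix
fact = lu(M)             # O(n³): factorise once ...

b₁ = [1.0, 0.0, 0.0]
b₂ = [0.0, 1.0, 0.0]
y₁ = fact \ b₁           # ... then each solve costs only O(n²)
y₂ = fact \ b₂           # same factorisation, new right-hand side

M * y₁ ≈ b₁ && M * y₂ ≈ b₂   # true
```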
"""
# ╔═╡ 8e01a98d-c49f-43b3-9681-07d8e4b7f12a
md"""
To investigate the possibilities enabled by inverse iterations,
we consider a few examples using the following triangular matrix
"""
# ╔═╡ 19e64038-afb3-4df6-ad47-16bad5a98a7b
C = [-4.0 3.0 4.0;
0.0 4.0 5.0;
0.0 0.0 2.0]
# ╔═╡ 75ab9929-200d-4a69-bb84-187d962662eb
md"""which has eigenvalues $λ_1 = -4$, $λ_2 = 4$ and $λ_3 = 2$.
Since it has no unique dominant eigenvalue ($|λ_1| = |λ_2| = 4$)
**plain power iterations** do not converge, but rather oscillate:
"""
# ╔═╡ fe15a206-e650-4d59-962c-a00bc226bf48
let
xinit = randn(size(C, 2))
res = power_method(C, xinit; maxiter=30)
plot(res.history; mark=:o, c=1, label="", lw=1.5, ylabel=L"β^{(k)}", xlabel=L"k",
title="Power iterations on C")
end
# ╔═╡ a152d4e2-afde-4b57-8a22-33f5f26f6880
md"""
In contrast for **inverse iterations** (with $σ=0$) what matters are the eigenvalues of $\mathbf{C}^{-1}$, which are $1 / λ_i$ or
```math
\frac{1}{λ_1} = - \frac 14
\qquad
\frac{1}{λ_2} = \frac 14
\qquad
\frac{1}{λ_3} = \frac 12
```
Therefore there is a single dominant eigenvalue ($\frac 12$) and inverse iterations will converge to the eigenvalue $2$:
"""
# ╔═╡ d467f287-0e29-48fe-adb5-71c52aa66cfc
let
σ = 0.0 # Just perform plain inverse iterations
xinit = randn(size(C, 2))
res = inverse_iterations(C, σ, xinit; maxiter=30)
plot(res.history; mark=:o, c=1, label="", lw=1.5, ylabel=L"β^{(k)}", xlabel=L"k",
title="Inverse iterations on C with σ = $σ")
end
# ╔═╡ 9345e24b-f19b-46ce-8e35-fb8acbaabb53
md"""
Now suppose we suspected that $\mathbf{C}$ has an eigenvalue near $-6$ and therefore take $σ = -6$. Then what matters for convergence in the inverse iterations are the eigenvalues of $(\mathbf C - (-6) \mathbf I)^{-1}$, which are
$\mu_i = \frac1{λ_i-σ}$ or
```math
\mu_1 = \frac{1}{-4 + 6} = \frac12 \qquad \mu_2 = \frac{1}{4+6} = \frac1{10} \qquad \mu_3 = \frac1{2+6} = \frac{1}{8},
```
such that $\mu_1 = \frac12$ is the dominating eigenvalue. Therefore
with $σ = -6$ inverse iterations converge to $-6 + 1 / \mu_1 = -6 + 1 / (\frac{1}2) = -4$:
"""
# ╔═╡ 10558bcb-43d4-41cd-b39f-63ec58e86b1b
let
σ = -6.0
xinit = randn(size(C, 2))
res = inverse_iterations(C, σ, xinit; maxiter=30)
plot(res.history; mark=:o, c=1, label="", lw=1.5, ylabel=L"β^{(k)}", xlabel=L"k",
title="Inverse iterations on C with σ = $σ")
end
# ╔═╡ 59493fbc-1b97-4ade-bbba-30eea2b5c91a
md"""
### Convergence of inverse iterations
Inserting $\mathbf A - σ \mathbf I$ in the place of $\mathbf A$ in (6) and recalling the eigenvalue ordering (8), i.e.
$|λ_n - σ| ≥ |λ_{n-1} - σ| ≥ \cdots ≥ |λ_{2} - σ| > |λ_1 - σ| > 0$
one can show similarly that **inverse iterations converge linearly** with rate
```math
\tag{9}
r_\text{inviter} =
\lim_{k\to\infty} \frac{|β^{(k+1)} - λ_1|}{|β^{(k)} - λ_1|}
= \left|\frac{λ_1 - σ}{λ_2 - σ}\right|
\qquad \text{as $k\to \infty$}.
```
This implies that convergence is fastest when $σ$ is much closer to the targeted
eigenvalue $λ_1$ than to all other eigenvalues of $\mathbf A$.
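As a concrete check, for the matrix $\mathbf{C}$ from above (eigenvalues $-4$, $4$ and $2$) with shift $σ = -6$ the rate (9) evaluates as:

```julia
λC  = [-4.0, 4.0, 2.0]       # eigenvalues of C from above
σ   = -6.0
d   = sort(abs.(λC .- σ))    # distances |λᵢ - σ| in increasing order: [2, 8, 10]
r_C = d[1] / d[2]            # |λ₁ - σ| / |λ₂ - σ| = 2/8 = 0.25
```

This matches the linear convergence observed in the plot above for $σ = -6$.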
"""
# ╔═╡ d998c91f-0d3c-4846-a6df-9d2c407f5836
md"""
Finally let us consider a case where we use a slider to be able to
change the applied shift:
"""
# ╔═╡ c734b379-ab45-4270-8bea-461332887640
begin
λT = [1, 0.75, 0.6, -0.4, 0] # The reference eigenvalues we will use
T = triu(ones(5, 5), 1) + diagm(λT)
end
# ╔═╡ 6ffd6918-3b4d-4e1f-ba7e-5c825f96681d
md"""
This time the slider allows us to tune the value of $σ$ (which we store in the variable `σT`):
- `σT = `$(@bind σT Slider(-0.5:0.01:1.1; default=0.4, show_value=true))
"""
# ╔═╡ 43d637d5-0b62-4c60-a363-e1616f517eb7
md"This value of $σ$ results in the convergence rate:"
# ╔═╡ 908ae6f5-23fe-4ca8-bceb-e484589441a2
begin
# compute |λ_5 - σ| > |λ_4 - σ| > |λ_3 - σ| > |λ_2 - σ| > |λ_1 - σ|
# We need sort to get the data in that order
diff_sorted = sort(abs.(λT .- σT); rev=true)
# take last element (|λ_1 - σ|) and last-but-one (|λ_2 - σ|)
r_inviter = diff_sorted[end] / diff_sorted[end-1]
end
# ╔═╡ 03e47af0-840c-425e-aa82-d265e862da7a
md"""
## Optional: Dynamic shifting
In the discussion in the previous section we noted that the convergence of inverse iterations is fastest if $σ$ is chosen close to the targeted eigenvalue of the matrix $\mathbf A$. Since **inverse iterations** (Algorithm 2) actually **produce an estimate for the eigenvalue** in each iteration (the $β^{(k)}$),
a natural idea is to **update the shift**:
instead of using the same $σ$ in each iteration,
we select a different $σ = β^{(k)}$ *dynamically*
based on the best eigenvalue estimate currently available to us.
This also implies that for solving the linear system in step 1, i.e.
$(\mathbf A - σ \mathbf I) \, \mathbf y^{(k)} = \mathbf x^{(k)}$
the system matrix is different for each $k$.
Therefore pre-computing the LU factorisation
before entering the iteration loop is no longer possible.
We arrive at the following implementation for an **inverse iteration algorithm with dynamic shifting**:
"""
# ╔═╡ f0ecec01-dde6-4d51-9039-cdfc648d16e9
function dynamic_shifting(A, σ, x; maxiter=100, tol=1e-8)
# A: Matrix
# σ: shift
# x: initial guess