@@ -18,33 +18,45 @@ Image Source: GeeksforGeeks
18
18
It's efficient because it utilizes the information gained from previous character comparisons. When a mismatch occurs,
19
19
the algorithm uses this information to skip over as many characters as possible.
20
20
21
- Considering the string pattern: <br >
22
- <div style =" text-align : center ;" >
23
- "XYXYCXYXYF"
24
- </div >
21
+ Considering the string pattern:
22
+
23
+ Pattern:| X | Y | X | Y | C | X | Y | X | Y | F |
24
+ | -------| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---|
25
+ Position:| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10|
26
+
25
27
and string:
26
- <div style =" text-align : center ;" >
27
- XYXYCXYXYCXYXYFGABC
28
- </div >
28
+
29
+ String: | X | Y | X | Y | C | X | Y | X | Y | C | X | Y | X | Y | F | G | A | B | C |
30
+ | --------| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---|
31
+ Position:| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16| 17| 18| 19|
29
32
30
33
KMP has, during its initial processing of the pattern, identified that "XYXY" is a repeating sub-pattern.
31
34
This means when the mismatch at F (10th character of the pattern) and C (10th character of the string) occurs,
32
35
KMP doesn't need to start matching again from the very beginning of the pattern. <br >
33
36
Instead, it leverages the information that "XYXY" has already been matched.
34
37
35
- Therefore, the algorithm continues matching from the 5th character of the pattern string (C in "XYXYCXYXYF"). <br >
36
- It checks this against the 10th character of the string (C in "XYXYCXYXYCXYXYFGABC"). <br >
37
- Since they match, the algorithm continues from there without re-checking the initial "XYXY".
38
+ Therefore, the algorithm continues matching from the 5th character of the pattern string.
39
+
40
+ This is the key idea behind the KMP algorithm, and it is applied throughout the string.
41
+ KMP knows how to 'reuse' previously identified sub-patterns by keeping
42
+ track of what is known as the Longest Prefix Suffix (LPS; the longest proper prefix that is also a suffix) table.
43
+ See the implementation for a step-by-step explanation of how the LPS table is generated.
44
+
45
+ Pattern: | | X | Y | X | Y | C | X | Y | X | Y | F |
46
+ | --------| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---|
47
+ Position:| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10|
48
+ LPS: | -1| 0 | 0 | 1 | 2 | 0 | 1 | 2 | 3 | 4 | 0 |
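
As a rough illustration of how a table like the one above could be produced, here is a minimal Python sketch. The function name `build_lps` and the use of `-1` as a sentinel at position 0 are assumptions chosen to mirror the table's layout; the repository's actual implementation may differ.

```python
def build_lps(pattern):
    """Return a table of length len(pattern) + 1 (hypothetical helper).

    lps[0] is a -1 sentinel; lps[i] for i >= 1 is the length of the longest
    proper prefix of pattern[:i] that is also a suffix of pattern[:i].
    """
    k = len(pattern)
    lps = [0] * (k + 1)
    lps[0] = -1
    j = -1  # invariant at the top of each iteration: j == lps[i]
    for i in range(k):
        # Fall back to shorter prefixes until one can be extended by
        # pattern[i], or until the sentinel is reached.
        while j >= 0 and pattern[i] != pattern[j]:
            j = lps[j]
        j += 1
        lps[i + 1] = j
    return lps


print(build_lps("XYXYCXYXYF"))
```

Each position is visited once, and the fallback pointer `j` can only decrease as many times as it has increased, so the table is built in O(k) time.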
38
49
39
50
## Complexity Analysis
40
51
Let k be the length of the pattern and n be the length of the string to match against.
41
- ** Time complexity** : O(n+k)
42
52
43
53
Naively, we can look for patterns in a given sequence in O(nk) where n is the length of the sequence and k
44
54
is the length of the pattern. We do this by iterating over every character of the sequence and looking at the
45
55
immediate k-1 characters that come after it. This is not a big issue if k is known to be small, but there's
46
56
no guarantee this is the case.
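
The naive approach described above can be sketched as follows. This is only an illustrative baseline, and the name `naive_search` is an assumption rather than something taken from the source code.

```python
def naive_search(text, pattern):
    """Return every 0-based index in text where pattern starts."""
    n, k = len(text), len(pattern)
    matches = []
    for start in range(n - k + 1):
        # Compare up to k characters for every candidate start: O(nk) overall.
        offset = 0
        while offset < k and text[start + offset] == pattern[offset]:
            offset += 1
        if offset == k:
            matches.append(start)
    return matches


print(naive_search("XYXYCXYXYCXYXYFGABC", "XYXYCXYXYF"))
```

On a mismatch, this version discards everything it learned and restarts at the next position, which is exactly the repeated work KMP avoids.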
47
57
58
+ **Time complexity**: O(n+k)
59
+
48
60
KMP does this in O(n+k) by making use of previously identified sub-patterns. It identifies sub-patterns
49
61
by first processing the pattern input in O(k) time, allowing identification of patterns in
50
62
an O(n) traversal of the sequence. More details can be found in the source code.
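
To make the two phases concrete, here is a self-contained Python sketch of the preprocessing step followed by the search. The names `build_lps` and `kmp_search`, and the `-1` sentinel at position 0, are assumptions for illustration; treat this as one possible formulation rather than the repository's implementation.

```python
def build_lps(pattern):
    """O(k) preprocessing: lps[0] = -1 sentinel; lps[i] is the length of the
    longest proper prefix of pattern[:i] that is also its suffix."""
    lps = [0] * (len(pattern) + 1)
    lps[0] = -1
    j = -1
    for i in range(len(pattern)):
        while j >= 0 and pattern[i] != pattern[j]:
            j = lps[j]
        j += 1
        lps[i + 1] = j
    return lps


def kmp_search(text, pattern):
    """O(n) scan: return every 0-based index in text where pattern starts."""
    if not pattern:
        return []
    lps = build_lps(pattern)
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, reuse the already matched prefix instead of
        # restarting from the beginning of the pattern.
        while j >= 0 and ch != pattern[j]:
            j = lps[j]
        j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j]  # keep going to find overlapping matches
    return matches


print(kmp_search("XYXYCXYXYCXYXYFGABC", "XYXYCXYXYF"))
```

In the worked example, the mismatch at the 10th character makes `j` fall back from 9 to 4, so the scan resumes at the 5th pattern character without moving backwards in the string.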