@@ -18,33 +18,45 @@ Image Source: GeeksforGeeks
1818It's efficient because it utilizes the information gained from previous character comparisons. When a mismatch occurs,
1919the algorithm uses this information to skip over as many characters as possible.
2020
21- Considering the string pattern: <br >
22- <div style =" text-align : center ;" >
23- "XYXYCXYXYF"
24- </div >
21+ Consider the pattern:
22+
23+ Pattern:| X | Y | X | Y | C | X | Y | X | Y | F |
24+ | -------| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---|
25+ Position:| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10|
26+
2527and string:
26- <div style =" text-align : center ;" >
27- XYXYCXYXYCXYXYFGABC
28- </div >
28+
29+ String: | X | Y | X | Y | C | X | Y | X | Y | C | X | Y | X | Y | F | G | A | B | C |
30+ | --------| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---|
31+ Position:| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16| 17| 18| 19|
2932
3033KMP has, during its initial processing of the pattern, identified that "XYXY" is a repeating sub-pattern.
3134This means when the mismatch at F (10th character of the pattern) and C (10th character of the string) occurs,
3235KMP doesn't need to start matching again from the very beginning of the pattern. <br >
3336Instead, it uses the fact that the last four characters matched, "XYXY", are also the first four characters of the pattern.
3437
35- Therefore, the algorithm continues matching from the 5th character of the pattern string (C in "XYXYCXYXYF"). <br >
36- It checks this against the 10th character of the string (C in "XYXYCXYXYCXYXYFGABC"). <br >
37- Since they match, the algorithm continues from there without re-checking the initial "XYXY".
38+ Therefore, the algorithm continues matching from the 5th character of the pattern (C), comparing it against the 10th character of the string (C).
39+
40+ This is the key idea behind the KMP algorithm, and it is applied at every mismatch while scanning the string.
41+ KMP knows which previously matched sub-patterns it can 'reuse' by keeping
42+ track of the Longest Prefix Suffix (LPS) table: for each position, the length of the longest proper prefix that is also a suffix of the pattern up to that position. See the
43+ implementation for a step-by-step explanation of how the LPS table is generated.
44+
45+ Pattern: | | X | Y | X | Y | C | X | Y | X | Y | F |
46+ | --------| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---| ---|
47+ Position:| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10|
48+ LPS: | -1| 0 | 0 | 1 | 2 | 0 | 1 | 2 | 3 | 4 | 0 |
3849
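As a rough illustration of how such a table can be built, here is a minimal Python sketch. It is not the repository's implementation: the function name `build_lps` is hypothetical, and it uses 0-based indexing without the leading `-1` entry, so `lps[i - 1]` here corresponds to Position `i` in the table above.

```python
def build_lps(pattern):
    """Return a list where lps[i] is the length of the longest proper
    prefix of pattern[:i + 1] that is also a suffix of it."""
    lps = [0] * len(pattern)
    length = 0  # length of the prefix currently being extended
    i = 1
    while i < len(pattern):
        if pattern[i] == pattern[length]:
            # The matched prefix grows by one character.
            length += 1
            lps[i] = length
            i += 1
        elif length > 0:
            # Fall back to the next shorter candidate prefix; do not advance i.
            length = lps[length - 1]
        else:
            # No prefix matches at this position.
            lps[i] = 0
            i += 1
    return lps


print(build_lps("XYXYCXYXYF"))  # [0, 0, 1, 2, 0, 1, 2, 3, 4, 0]
```

Each step either advances `i` or shrinks `length`, which is why the table can be built in time linear in the pattern length.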
3950## Complexity Analysis
4051Let k be the length of the pattern and n be the length of the string to match against.
41- ** Time complexity** : O(n+k)
4252
4353A naive search finds a pattern in a given sequence in O(nk) time. It does this by
4454iterating over every character of the sequence and comparing it and the
4555k-1 characters that follow against the pattern. This is not a big issue if k is known to be small, but there's
4656no guarantee this is the case.
4757
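For comparison, the naive approach described above might look like the following Python sketch. The function name `naive_search` is hypothetical and chosen only for illustration.

```python
def naive_search(text, pattern):
    """Return the 0-based start index of every occurrence of pattern in text."""
    n, k = len(text), len(pattern)
    matches = []
    for start in range(n - k + 1):
        # Compare up to k characters for this starting position.
        j = 0
        while j < k and text[start + j] == pattern[j]:
            j += 1
        if j == k:
            matches.append(start)
    return matches


print(naive_search("XYXYCXYXYCXYXYFGABC", "XYXYCXYXYF"))  # [5]
```

The outer loop runs about n times and the inner loop up to k times, giving the O(nk) worst case.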
58+ **Time complexity**: O(n+k)
59+
4860KMP does this in O(n+k) by making use of previously identified sub-patterns. It identifies sub-patterns
4961by first processing the pattern in O(k) time, which then lets it find matches in a single
5062O(n) traversal of the sequence. More details can be found in the source code.
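The two phases can be sketched in Python as follows. This is an illustrative sketch rather than the repository's source code: the name `kmp_search` is hypothetical, and the prefix table is 0-based with no leading `-1` entry.

```python
def kmp_search(text, pattern):
    """Return the 0-based start index of every occurrence of pattern in text."""
    if not pattern:
        return []

    # Phase 1, O(k): lps[i] is the length of the longest proper prefix
    # of pattern[:i + 1] that is also a suffix of it.
    lps = [0] * len(pattern)
    length = 0
    for i in range(1, len(pattern)):
        while length > 0 and pattern[i] != pattern[length]:
            length = lps[length - 1]
        if pattern[i] == pattern[length]:
            length += 1
        lps[i] = length

    # Phase 2, O(n): scan the text once; on a mismatch, fall back in the
    # pattern via lps instead of moving backwards in the text.
    matches = []
    j = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while j > 0 and ch != pattern[j]:
            j = lps[j - 1]
        if ch == pattern[j]:
            j += 1
        if j == len(pattern):
            matches.append(i - j + 1)
            j = lps[j - 1]  # keep going to find overlapping matches
    return matches


print(kmp_search("XYXYCXYXYCXYXYFGABC", "XYXYCXYXYF"))  # [5]
```

The text index `i` never moves backwards, and `j` can only fall back as many times as it has advanced, so the scan stays linear in n.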