Commit d6c506c

docs: Update KMP README again..
1 parent ba79246 commit d6c506c

2 files changed: +25 -12 lines changed

src/main/java/algorithms/patternFinding/KMP.java

Lines changed: 2 additions & 1 deletion
@@ -51,8 +51,9 @@ public class KMP {
     private static int[] getPrefixTable(String pattern) {
         // 1-indexed implementation
         int len = pattern.length();
+        // INTERPRETATION: suffix ending at the ith position (1-indexed) matches with numCharsMatched[i] of the prefix
         int[] numCharsMatched = new int[len + 1];
-        numCharsMatched[0] = -1;
+        numCharsMatched[0] = -1; // since 1-indexed, dummy value. We will exploit this dummy to make code neater later
         numCharsMatched[1] = 0; // 1st character has no prefix to match with

         int currPrefixMatched = 0; // num of chars of prefix pattern currently matched
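
Note on this hunk: the loop that fills `numCharsMatched` falls outside the lines shown in the diff. The following is a minimal Java sketch of how a 1-indexed prefix table of this shape could be built. Only the names `getPrefixTable`, `numCharsMatched`, and `currPrefixMatched` come from the hunk; the class name `PrefixTableSketch` and the whole loop body are assumptions made for illustration, not the repository's code. In particular, this sketch does not rely on the `-1` dummy at index 0, which the actual implementation may use differently.

```java
import java.util.Arrays;

// Hypothetical class name; not part of the commit.
class PrefixTableSketch {

    // numCharsMatched[i] = length of the longest proper prefix of the pattern
    // that is also a suffix of its first i characters (positions are 1-indexed).
    static int[] getPrefixTable(String pattern) {
        int len = pattern.length();
        int[] numCharsMatched = new int[len + 1];
        numCharsMatched[0] = -1; // dummy slot, kept to mirror the hunk
        if (len == 0) {
            return numCharsMatched;
        }
        numCharsMatched[1] = 0; // a single character has no proper prefix

        int currPrefixMatched = 0; // chars of the prefix currently matched
        for (int pos = 2; pos <= len; pos++) {
            char current = pattern.charAt(pos - 1); // charAt is 0-indexed
            // On a mismatch, fall back to the next-longest candidate prefix.
            while (currPrefixMatched > 0
                    && current != pattern.charAt(currPrefixMatched)) {
                currPrefixMatched = numCharsMatched[currPrefixMatched];
            }
            if (current == pattern.charAt(currPrefixMatched)) {
                currPrefixMatched++;
            }
            numCharsMatched[pos] = currPrefixMatched;
        }
        return numCharsMatched;
    }

    public static void main(String[] args) {
        // prints [-1, 0, 0, 1, 2, 0, 1, 2, 3, 4, 0]
        System.out.println(Arrays.toString(getPrefixTable("XYXYCXYXYF")));
    }
}
```

For the pattern "XYXYCXYXYF" this yields the same values as the LPS row added to the README in this commit.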

src/main/java/algorithms/patternFinding/README.md

Lines changed: 23 additions & 11 deletions
@@ -18,33 +18,45 @@ Image Source: GeeksforGeeks
 It's efficient because it utilizes the information gained from previous character comparisons. When a mismatch occurs,
 the algorithm uses this information to skip over as many characters as possible.

-Considering the string pattern: <br>
-<div style="text-align: center;">
-"XYXYCXYXYF"
-</div>
+Considering the string pattern:
+
+Pattern:| X | Y | X | Y | C | X | Y | X | Y | F |
+|-------|---|---|---|---|---|---|---|---|---|---|
+Position:| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10|
+
 and string:
-<div style="text-align: center;">
-XYXYCXYXYCXYXYFGABC
-</div>
+
+String: | X | Y | X | Y | C | X | Y | X | Y | C | X | Y | X | Y | F | G | A | B | C |
+|--------|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+Position:| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15| 16| 17| 18| 19|

 KMP has, during its initial processing of the pattern, identified that "XYXY" is a repeating sub-pattern.
 This means when the mismatch at F (10th character of the pattern) and C (10th character of the string) occurs,
 KMP doesn't need to start matching again from the very beginning of the pattern. <br>
 Instead, it leverages the information that "XYXY" has already been matched.

-Therefore, the algorithm continues matching from the 5th character of the pattern string (C in "XYXYCXYXYF"). <br>
-It checks this against the 10th character of the string (C in "XYXYCXYXYCXYXYFGABC"). <br>
-Since they match, the algorithm continues from there without re-checking the initial "XYXY".
+Therefore, the algorithm continues matching from the 5th character of the pattern string.
+
+This is the key idea behind KMP algorithm and is applied throughout the string.
+How it knows to 'reuse' previously identified sub-patterns is by keeping
+track of what is known as the Longest Prefix Suffix (LPS; Longest prefix that is also a suffix) Table. Look at the
+implementation for a step-by-step explanation for how LPS table is generated.
+
+Pattern: | | X | Y | X | Y | C | X | Y | X | Y | F |
+|--------|---|---|---|---|---|---|---|---|---|---|---|
+Position:| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10|
+LPS: | -1| 0 | 0 | 1 | 2 | 0 | 1 | 2 | 3 | 4 | 0 |

 ## Complexity Analysis
 Let k be the length of the pattern and n be the length of the string to match against.
-**Time complexity**: O(n+k)

 Naively, we can look for patterns in a given sequence in O(nk) where n is the length of the sequence and k
 is the length of the pattern. We do this by iterating every character of the sequence, and look at the
 immediate k-1 characters that come after it. This is not a big issue if k is known to be small, but there's
 no guarantee this is the case.

+**Time complexity**: O(n+k)
+
 KMP does this in O(n+k) by making use of previously identified sub-patterns. It identifies sub-patterns
 by first processing the pattern input in O(k) time, allowing identification of patterns in
 O(n) traversal of the sequence. More details found in the src code.
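
Note on the README change: the search step it describes (resume from the 5th pattern character after the mismatch at position 10) can be illustrated with a short, self-contained Java sketch. This is an illustration under stated assumptions, not the repository's implementation: the class name `KmpSearchSketch` and the methods `prefixTable` and `findAll` are hypothetical, and match positions are reported as 0-indexed offsets into the text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class and method names; not part of the commit.
class KmpSearchSketch {

    // table[m] = length of the longest proper prefix of the pattern that is
    // also a suffix of its first m characters. Built in O(k).
    static int[] prefixTable(String pattern) {
        int[] table = new int[pattern.length() + 1];
        table[0] = -1; // dummy slot for the 1-indexed layout, unused here
        int matched = 0;
        for (int pos = 2; pos <= pattern.length(); pos++) {
            char current = pattern.charAt(pos - 1);
            while (matched > 0 && current != pattern.charAt(matched)) {
                matched = table[matched];
            }
            if (current == pattern.charAt(matched)) {
                matched++;
            }
            table[pos] = matched;
        }
        return table;
    }

    // Returns the 0-indexed start of every occurrence of pattern in text.
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> starts = new ArrayList<>();
        if (pattern.isEmpty()) {
            return starts; // assumption: an empty pattern yields no matches
        }
        int[] table = prefixTable(pattern);
        int matched = 0; // pattern characters matched so far
        for (int i = 0; i < text.length(); i++) {
            // On a mismatch, reuse the longest prefix that is still valid
            // instead of restarting the pattern from its first character.
            while (matched > 0 && text.charAt(i) != pattern.charAt(matched)) {
                matched = table[matched];
            }
            if (text.charAt(i) == pattern.charAt(matched)) {
                matched++;
            }
            if (matched == pattern.length()) {
                starts.add(i - matched + 1);
                matched = table[matched]; // allow overlapping occurrences
            }
        }
        return starts;
    }

    public static void main(String[] args) {
        // prints [5]
        System.out.println(findAll("XYXYCXYXYCXYXYFGABC", "XYXYCXYXYF"));
    }
}
```

With the README's example, the mismatch between F and C at position 10 sets `matched` to `table[9] = 4`, so the comparison resumes at the 5th pattern character. The text index `i` never moves backwards and `matched` can only fall as often as it has risen, which is where the O(n) bound on the scan comes from.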
