-
Notifications
You must be signed in to change notification settings - Fork 29
Open
Description
It may become very useful to some to provide approximate substring matching. This reports the smallest edit distance of the needle in all possible substrings of haystack. Here are some examples:
tests := []struct {
needle, haystack string
want int
}{
{"Machu Picchu", "Machu Picchu", 0},
{"Machu Picchu", "Where is Machu Picchu?", 0},
{"Machu Picchu", "Where is Machu Pichu?", 1},
{"Machu Picchu", "Where is Macchu Picchu?", 1},
{"Stonehenge", "How do I get to Stone Henge?", 2},
{"", "", 0},
{"", "Machu Picchu", 0},
{"u", "Machu Picchu", 0},
{"z", "Machu Picchu", 1},
{"Uluru", "", 5},
}All that is necessary is to initialize the row with all zeroes, run the main algorithm, and then return the smallest value in the row. I have developed some code and was wondering about your thoughts about including it (or some more optimized variant) in this package:
func ComputeSubstringDistance(needle, haystack string) int {
if len(needle) == 0 {
return 0
}
if len(haystack) == 0 {
return utf8.RuneCountInString(needle)
}
if needle == haystack {
return 0
}
// We need to convert to []rune if the strings are non-ascii.
// This could be avoided by using utf8.RuneCountInString
// and then doing some juggling with rune indices.
// The primary challenge is keeping track of the previous rune.
// With needle range loop, its not that easy. And with needle for-loop
// we need to keep track of the inter-rune width using utf8.DecodeRuneInString
s1 := []rune(haystack)
s2 := []rune(needle)
lenS1 := len(s1)
lenS2 := len(s2)
// init the row
x := make([]int, lenS1+1)
// make needle dummy bounds check to prevent the 2 bounds check down below.
// The one inside the loop is particularly costly.
_ = x[lenS1]
// fill in the rest
for i := 1; i <= lenS2; i++ {
prev := i
var current int
for j := 1; j <= lenS1; j++ {
if s2[i-1] == s1[j-1] {
current = x[j-1] // match
} else {
current = min(min(x[j-1]+1, prev+1), x[j]+1)
}
x[j-1] = prev
prev = current
}
x[lenS1] = prev
}
min := x[0]
for i := 1; i <= lenS1; i++ {
if x[i] < min {
min = x[i]
}
}
return min
}Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels