Catch up to 2.0.0 release of upstream project as of commit 1924ab8. Implement Ratcliff-Obershelp algorithm. Add missing SIFT4 example to README. Fix documentation issue #17.
|[Sorensen-Dice coefficient](#sorensen-dice-coefficient)|similarity<br>distance | Yes | No | Set | O(m+n) |
|[Ratcliff-Obershelp](#ratcliff-obershelp)|similarity<br>distance | Yes | No || ? ||
[1] In this library, Levenshtein edit distance, LCS distance and their siblings are computed using the **dynamic programming** method, which has a cost of O(m.n). For Levenshtein distance, the algorithm is sometimes called the **Wagner-Fischer algorithm** ("The string-to-string correction problem", 1974). The original algorithm uses a matrix of size m x n to store the Levenshtein distance between string prefixes.
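
As a rough illustration of the dynamic programming method described in [1], here is a minimal sketch of the full-matrix Wagner-Fischer computation. The class name `WagnerFischerSketch` is hypothetical and is not part of this library; the library's own Levenshtein implementation may be organised differently. The table here has (m+1) x (n+1) cells so that the empty prefix is represented as well.

```cs
using System;

// Hypothetical illustration; not the library's Levenshtein class.
public static class WagnerFischerSketch
{
    public static int Distance(string s, string t)
    {
        int m = s.Length, n = t.Length;

        // d[i, j] holds the edit distance between the first i characters
        // of s and the first j characters of t.
        var d = new int[m + 1, n + 1];

        // Turning a prefix into the empty string takes one deletion per character.
        for (int i = 0; i <= m; i++) d[i, 0] = i;
        // Building a prefix from the empty string takes one insertion per character.
        for (int j = 0; j <= n; j++) d[0, j] = j;

        for (int i = 1; i <= m; i++)
        {
            for (int j = 1; j <= n; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                int deletion = d[i - 1, j] + 1;
                int insertion = d[i, j - 1] + 1;
                int substitution = d[i - 1, j - 1] + cost;
                d[i, j] = Math.Min(Math.Min(deletion, insertion), substitution);
            }
        }

        return d[m, n];
    }
}
```

Each cell is filled once from three neighbouring cells, which is where the O(m.n) cost comes from. For example, `WagnerFischerSketch.Distance("kitten", "sitting")` evaluates to 3.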
@@ -427,6 +431,64 @@ Similar to Jaccard index, but this time the similarity is computed as 2 * |V1 in
Distance is computed as 1 - cosine similarity.

## Ratcliff-Obershelp

Ratcliff/Obershelp Pattern Recognition, also known as Gestalt Pattern Matching, is a string-matching algorithm for determining the similarity of two strings. It was developed in 1983 by John W. Ratcliff and John A. Obershelp and published in Dr. Dobb's Journal in July 1988.

Ratcliff/Obershelp computes the similarity between 2 strings, and the returned value lies in the interval [0.0, 1.0].

The distance is computed as 1 - Ratcliff/Obershelp similarity.

```cs
using System;
using F23.StringSimilarity;

public class Program
{
    public static void Main(string[] args)
    {
        var ro = new RatcliffObershelp();

        // s and t swapped
        Console.WriteLine(ro.Similarity("My string", "My tsring"));

        // s and n swapped
        Console.WriteLine(ro.Similarity("My string", "My ntrisg"));
    }
}
```

will produce:

```
0.8888888888888888
0.7777777777777778
```
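
To make the numbers above easier to follow: Ratcliff/Obershelp similarity is commonly defined as twice the number of matching characters divided by the total length of both strings, where the matching characters are counted by finding the longest common substring and then recursing on the pieces to its left and right. The sketch below illustrates that definition only. The class name `GestaltSketch` is hypothetical, the handling of two empty strings is an assumption, and the library's `RatcliffObershelp` class may be implemented differently.

```cs
using System;

// Hypothetical illustration; not the library's RatcliffObershelp class.
public static class GestaltSketch
{
    // Similarity = 2 * (matching characters) / (combined length of both strings).
    public static double Similarity(string s1, string s2)
    {
        // Assumption: two empty strings are treated as identical.
        if (s1.Length + s2.Length == 0) return 1.0;
        return 2.0 * Matches(s1, s2) / (s1.Length + s2.Length);
    }

    // Counts matching characters: take the longest common substring,
    // then recurse on what lies to its left and to its right.
    private static int Matches(string a, string b)
    {
        int bestLen = 0, bestA = 0, bestB = 0;

        // Brute-force search for the longest common substring.
        for (int i = 0; i < a.Length; i++)
        {
            for (int j = 0; j < b.Length; j++)
            {
                int k = 0;
                while (i + k < a.Length && j + k < b.Length && a[i + k] == b[j + k]) k++;
                if (k > bestLen) { bestLen = k; bestA = i; bestB = j; }
            }
        }

        if (bestLen == 0) return 0;

        return bestLen
            + Matches(a.Substring(0, bestA), b.Substring(0, bestB))
            + Matches(a.Substring(bestA + bestLen), b.Substring(bestB + bestLen));
    }
}
```

For "My string" and "My tsring", the longest common substring is "ring" (4 characters); the left-hand pieces "My st" and "My ts" contribute "My " (3) and then a single "s" (1), for 8 matching characters and a similarity of 2 * 8 / 18. For "My string" and "My ntrisg", the matches are "My " (3), "tri" (3) and "g" (1), giving 2 * 7 / 18.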

## Experimental

### SIFT4
SIFT4 is a general-purpose string distance algorithm inspired by Jaro-Winkler and Longest Common Subsequence. It was developed to produce a distance measure that matches human perception of string distance as closely as possible. Hence it takes into account elements like character substitution, character distance, longest common subsequence, etc. It was developed through experimental testing, without a theoretical background.