README.md
Lines changed: 24 additions & 24 deletions
```diff
@@ -257,17 +257,14 @@ The bag distance is a cheap distance measure which always returns a distance sma
 </details>
 
 <details>
-<summary><u>Substring Set</u></summary>
-
-Splits the strings on spaces, sorts, re-joins, and then determines Jaro-Winkler distance. Best when the strings contain irrelevent substrings.
-
-</details>
-
-<details>
-<summary><u>Sørensen–Dice</u></summary>
+<summary><u>Double Metaphone</u></summary>
 
-Sørensen–Dice coefficient is calculated using bigrams. The equation is `2nt / nx + ny` where nx is the number of bigrams in string x, ny is the number of bigrams in string y, and nt is the number of bigrams in both strings. For example, the bigrams of `night` and `nacht` are `{ni,ig,gh,ht}` and `{na,ac,ch,ht}`. They each have four and the intersection is `ht`.
+Calculates the [Double Metaphone Phonetic Algorithm](https://xlinux.nist.gov/dads/HTML/doubleMetaphone.html) metric of two strings. The return value is based on the match level: strict, strong, normal (default), or weak.
 
-``` (2 · 1) / (4 + 4) = 0.25 ```
+* "strict": both encodings for each string must match
+* "strong": the primary encoding for each string must match
+* "normal": the primary encoding of one string must match either encoding of the other string (default)
+* "weak": either the primary or the secondary encoding of one string must match one encoding of the other string
 </details>
 
 <details>
```
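Once each string has a (primary, secondary) Double Metaphone encoding, the four match levels reduce to a few comparisons. Below is a minimal Python sketch, assuming the encodings are already computed: it does not implement the Double Metaphone encoder itself, and the `Encoding` tuple, the `metaphone_match` function, the boolean return value, and the example codes are all hypothetical, not Akin's API.

```python
from typing import NamedTuple


class Encoding(NamedTuple):
    """Hypothetical holder for one string's Double Metaphone codes."""
    primary: str
    secondary: str


def _codes(e: Encoding) -> set:
    # Keep only non-empty codes, so two missing secondaries never count as a match.
    return {c for c in (e.primary, e.secondary) if c}


def metaphone_match(a: Encoding, b: Encoding, level: str = "normal") -> bool:
    """Apply the README's match levels to two precomputed encodings (a sketch)."""
    if level == "strict":
        # Both encodings of each string must match.
        return a.primary == b.primary and a.secondary == b.secondary
    if level == "strong":
        # The primary encodings must match.
        return a.primary == b.primary
    if level == "normal":
        # One string's primary must match either encoding of the other string.
        return a.primary in _codes(b) or b.primary in _codes(a)
    if level == "weak":
        # Any encoding of one string matches any encoding of the other.
        return bool(_codes(a) & _codes(b))
    raise ValueError(f"unknown match level: {level!r}")


# Illustrative codes only; real values would come from a Double Metaphone encoder.
a = Encoding("SM0", "XMT")
b = Encoding("XMT", "SMT")
for level in ("strict", "strong", "normal", "weak"):
    print(level, metaphone_match(a, b, level))
```

With these illustrative codes, only the "normal" and "weak" levels report a match, since the two strings share a code but not their primary codes.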
```diff
@@ -303,14 +300,23 @@ Compares two strings by converting each to an approximate phonetic representatio
 </details>
 
 <details>
-<summary><u>Double Metaphone</u></summary>
+<summary><u>N-Gram Similarity</u></summary>
 
-Calculates the [Double Metaphone Phonetic Algorithm](https://xlinux.nist.gov/dads/HTML/doubleMetaphone.html) metric of two strings. The return value is based on the match level: strict, strong, normal (default), or weak.
+Calculates the n-gram distance between two strings. Default n-gram size: 2.
+</details>
 
-* "strict": both encodings for each string must match
-* "strong": the primary encoding for each string must match
-* "normal": the primary encoding of one string must match either encoding of other string (default)
-* "weak": either primary or secondary encoding of one string must match one encoding of other string
+<details>
+<summary><u>Overlap Metric</u></summary>
+
+Uses the Overlap Similarity metric to compare two strings by tokenizing the strings and measuring their overlap. Default n-gram size: 1.
+</details>
+
+<details>
+<summary><u>Sørensen–Dice</u></summary>
+
+The Sørensen–Dice coefficient is calculated using bigrams. The equation is `2nt / (nx + ny)`, where nx is the number of bigrams in string x, ny is the number of bigrams in string y, and nt is the number of bigrams in both strings. For example, the bigrams of `night` and `nacht` are `{ni,ig,gh,ht}` and `{na,ac,ch,ht}`. Each string has four bigrams, and the only one they share is `ht`.
+
+``` (2 · 1) / (4 + 4) = 0.25 ```
 </details>
 
 <details>
```
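The Sørensen–Dice worked example, and the overlap idea behind the Overlap Metric, can be checked with a short Python sketch. It assumes character n-grams collected into sets (so a repeated n-gram counts once) and uses the standard overlap coefficient |X ∩ Y| / min(|X|, |Y|); these choices, the zero-score rule for strings shorter than n, and the function names are assumptions for illustration, not necessarily how Akin computes either metric.

```python
def ngrams(text: str, n: int) -> set:
    """Set of character n-grams; repeated n-grams collapse (an assumption)."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def sorensen_dice(a: str, b: str, n: int = 2) -> float:
    """2 * |X & Y| / (|X| + |Y|) over the n-gram sets of a and b."""
    x, y = ngrams(a, n), ngrams(b, n)
    if not x or not y:
        return 0.0  # assumption: a string with no n-grams scores 0
    return 2 * len(x & y) / (len(x) + len(y))


def overlap(a: str, b: str, n: int = 1) -> float:
    """Overlap coefficient |X & Y| / min(|X|, |Y|); n defaults to 1 as in the README."""
    x, y = ngrams(a, n), ngrams(b, n)
    if not x or not y:
        return 0.0
    return len(x & y) / min(len(x), len(y))


print(sorted(ngrams("night", 2)))       # ['gh', 'ht', 'ig', 'ni']
print(sorensen_dice("night", "nacht"))  # 0.25
print(overlap("night", "nacht"))        # 0.6
```

The Dice call reproduces the README's (2 · 1) / (4 + 4) = 0.25. With single characters, `night` and `nacht` share `n`, `h`, and `t` out of five distinct letters each, hence 0.6.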
```diff
@@ -324,15 +330,9 @@ accuracy for search terms containing more than one word.
 </details>
 
 <details>
-<summary><u>N-Gram Similarity</u></summary>
-
-Calculates the ngram distance between two strings. Default ngram: 2.
-</details>
-
-<details>
-<summary><u>Overlap Metric</u></summary>
+<summary><u>Substring Set</u></summary>
 
-Uses the Overlap Similarity metric to compare two strings by tokenizing the strings and measuring their overlap. Default ngram: 1.
+Splits the strings on spaces, sorts the tokens, re-joins them, and then determines the Jaro-Winkler distance. Best when the strings contain irrelevant substrings.
 
 </details>
 
 <details>
```
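The Substring Set steps (split, sort, re-join, compare with Jaro-Winkler) can be sketched in Python as follows. The Jaro-Winkler part is the textbook formulation with a prefix scale of 0.1 and a maximum prefix length of 4. These parameters, the function names, splitting on any whitespace rather than only spaces, and returning a similarity (1.0 means identical) instead of a distance are all assumptions, not necessarily what Akin does.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    # Pair each character of s1 with an unused equal character of s2 inside the window.
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(i + window + 1, len2)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order; t is half that count.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity boosted by the length of the common prefix (up to max_prefix)."""
    sim = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return sim + prefix * p * (1 - sim)


def substring_set(a: str, b: str) -> float:
    """Split into tokens, sort them, re-join, then compare with Jaro-Winkler."""
    def normalize(s: str) -> str:
        # split() with no argument splits on any whitespace, a slight generalization.
        return " ".join(sorted(s.split()))
    return jaro_winkler(normalize(a), normalize(b))


print(round(jaro_winkler("MARTHA", "MARHTA"), 4))   # 0.9611
print(substring_set("world hello", "hello world"))  # 1.0
```

Sorting the tokens first makes word order irrelevant, so `"world hello"` and `"hello world"` normalize to the same string and score 1.0.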
```diff
@@ -361,7 +361,7 @@ A generalization of Sørensen–Dice and Jaccard.
 
 ## In Development
 
-*Author Name Disambiguation (see lib/akin/and.ex for developments)
```