xreeple/winklestein

Winklestein

Winklestein is a hybrid string similarity algorithm that combines Levenshtein Distance and Jaro–Winkler Similarity to provide accurate, robust, and tunable similarity scoring for short and medium-length text inputs.

It is especially effective for:

  • Phone numbers
  • Names
  • Addresses
  • Fuzzy matching
  • Record deduplication
  • Autocorrect and search suggestions

📦 Packages

Xreeple.Winklestein (NuGet; release and preview versions)

🚀 Features

  • Hybrid scoring algorithm (Levenshtein + Jaro–Winkler)
  • Normalized similarity output between 0.0 and 1.0
  • Three configurable tolerance modes:
    • Aggressive: more forgiving; rates more string pairs as similar
    • Normal: balanced; recommended for most cases
    • Lenient: despite its name, the strictest mode; scores stay low unless the strings are very close
  • Zero-dependency, cross‑platform .NET Standard library
  • Simple one‑line usage:
    var score = Winklestein.Compare("hello", "h3llo");
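
To make the hybrid idea concrete, here is a minimal sketch of one way to blend a normalized Levenshtein similarity with a Jaro–Winkler score. It is not the library's implementation: the class name `HybridSketch`, the `levWeight` parameter, and the 50/50 default blend are assumptions chosen for illustration. The Jaro–Winkler part uses the conventional prefix cap of 4 and scaling factor of 0.1.

```csharp
using System;

// Hypothetical sketch of a hybrid score; NOT the Winklestein source code.
public static class HybridSketch
{
    // Levenshtein edit distance using two rolling rows.
    public static int Levenshtein(string a, string b)
    {
        var prev = new int[b.Length + 1];
        var curr = new int[b.Length + 1];
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i;
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                curr[j] = Math.Min(Math.Min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            (prev, curr) = (curr, prev); // the finished row becomes "prev"
        }
        return prev[b.Length];
    }

    // Jaro similarity in [0, 1].
    public static double Jaro(string a, string b)
    {
        if (a.Length == 0 && b.Length == 0) return 1.0;
        if (a.Length == 0 || b.Length == 0) return 0.0;

        int window = Math.Max(0, Math.Max(a.Length, b.Length) / 2 - 1);
        var aMatched = new bool[a.Length];
        var bMatched = new bool[b.Length];
        int matches = 0;

        for (int i = 0; i < a.Length; i++)
        {
            int lo = Math.Max(0, i - window);
            int hi = Math.Min(b.Length - 1, i + window);
            for (int j = lo; j <= hi; j++)
            {
                if (bMatched[j] || a[i] != b[j]) continue;
                aMatched[i] = true;
                bMatched[j] = true;
                matches++;
                break;
            }
        }
        if (matches == 0) return 0.0;

        // Count matched characters that appear in a different order.
        int k = 0, outOfOrder = 0;
        for (int i = 0; i < a.Length; i++)
        {
            if (!aMatched[i]) continue;
            while (!bMatched[k]) k++;
            if (a[i] != b[k]) outOfOrder++;
            k++;
        }

        double m = matches;
        double t = outOfOrder / 2; // transpositions
        return (m / a.Length + m / b.Length + (m - t) / m) / 3.0;
    }

    // Jaro-Winkler: boosts the Jaro score for a shared prefix of up to 4 characters.
    public static double JaroWinkler(string a, string b)
    {
        double jaro = Jaro(a, b);
        int max = Math.Min(4, Math.Min(a.Length, b.Length));
        int prefix = 0;
        while (prefix < max && a[prefix] == b[prefix]) prefix++;
        return jaro + prefix * 0.1 * (1.0 - jaro);
    }

    // Assumed blend: a weighted average of the two normalized scores.
    public static double Compare(string a, string b, double levWeight = 0.5)
    {
        int maxLen = Math.Max(a.Length, b.Length);
        double levSim = maxLen == 0 ? 1.0 : 1.0 - (double)Levenshtein(a, b) / maxLen;
        return levWeight * levSim + (1.0 - levWeight) * JaroWinkler(a, b);
    }
}
```

Because both components are already normalized to the 0.0 to 1.0 range, a weighted average keeps the result in that range, and identical strings score exactly 1.0. The actual library may combine the two measures differently.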

📦 Installation

dotnet add package Xreeple.Winklestein

🧠 Similarity Modes

Winklestein exposes three tunable modes that modify weighting and output sensitivity:

🔴 Aggressive

  • More tolerant of differences
  • Boosts similar prefixes strongly
  • Useful for:
    • OCR corrections
    • Contact search
    • Misspelled user input
  • Produces higher similarity scores

Example:

"mehmet" vs "mehmed" → scores higher than in Normal mode

⚪ Normal (default)

  • Balanced weighting
  • Works for most fuzzy‑match comparisons
  • Recommended for general-purpose applications

🔵 Lenient

  • Stricter and less forgiving
  • Levenshtein penalty is increased
  • Jaro–Winkler boost is reduced
  • Suitable when:
    • High precision is needed
    • Probabilistic matching must be conservative

Example:

"mehmet" vs "mehmed" → scores lower than in Normal mode

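As an illustration of how a mode could shift the balance between the two measures, the sketch below blends two precomputed component scores with a mode-dependent weight. Everything here is hypothetical: the `SketchMode` enum only mirrors the mode names above, and the weights 0.7 / 0.5 / 0.3 are invented for the example, not taken from the library.

```csharp
using System;

// Hypothetical enum mirroring the three mode names; not the library's SimilarityMode type.
public enum SketchMode { Aggressive, Normal, Lenient }

public static class ModeSketch
{
    // levSim: normalized Levenshtein similarity in [0, 1].
    // jw:     Jaro-Winkler similarity in [0, 1].
    public static double Blend(double levSim, double jw, SketchMode mode)
    {
        // Assumed weights: Aggressive leans on the prefix-friendly Jaro-Winkler
        // score, Lenient leans on the stricter edit-distance score.
        double jwWeight;
        switch (mode)
        {
            case SketchMode.Aggressive: jwWeight = 0.7; break;
            case SketchMode.Lenient:    jwWeight = 0.3; break;
            default:                    jwWeight = 0.5; break;
        }

        double score = jwWeight * jw + (1.0 - jwWeight) * levSim;
        return Math.Max(0.0, Math.Min(1.0, score)); // keep the result normalized
    }
}
```

For "mehmet" vs "mehmed", the Levenshtein similarity is 5/6 (about 0.833) and the Jaro–Winkler score is 14/15 (about 0.933). Passing those values to `Blend` gives the ordering Aggressive > Normal > Lenient. That ordering holds in this sketch only because the Jaro–Winkler score is the larger of the two components, which is typical for strings that share a prefix.
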
🧩 Example Usage

Basic comparison

using Winklestein;

var result = Winklestein.Compare("hello", "h3llo");
Console.WriteLine(result);   // 0.89 (example)

With mode selection

using Winklestein;

var result = Winklestein.Compare("mehmet", "mehmed", SimilarityMode.Aggressive);

All modes

// Per the mode descriptions above, Aggressive should score highest and Lenient lowest.
var a = Winklestein.Compare("metro", "metre", SimilarityMode.Aggressive);
var b = Winklestein.Compare("metro", "metre", SimilarityMode.Normal);
var c = Winklestein.Compare("metro", "metre", SimilarityMode.Lenient);

🧪 Unit Tests

The repository includes a full test suite covering:

  • Basic equality checks
  • Mode sensitivity checks
  • Levenshtein distance and Jaro–Winkler accuracy tests

⚡ Benchmarking

Benchmarks use BenchmarkDotNet and include:

  • Small strings
  • Long strings
  • High‑difference strings
  • Mode comparisons

Run with:

dotnet run -c Release

📜 License

MIT License.


💬 Contribution

Contributions, issues, and PRs are welcome!
