Commit 285d85f

compress the readme

Author: Marco Zocca
1 parent d1a8761 commit 285d85f

1 file changed: +3 -5 lines changed

README.md

Lines changed: 3 additions & 5 deletions
@@ -22,7 +22,7 @@ NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
 ```
 where C(s) is the compressed size of s using gzip.
 
-
+This metric has been rediscovered multiple times in the research literature, first for clustering (Cilibrasi and Vitanyi 2005) and more recently in the text classification setting (Jiang et al 2023).
 
 ### Why VP-trees?
 
@@ -63,14 +63,12 @@ let results = knnSearch 2 query tree
 
 ## Characteristics
 
-- **Universal metric**: Works with any data that can be compressed, no feature engineering needed
+- **Universal metric**: Works with any data that can be compressed, no feature engineering or model training needed
 - **Approximate search**: VP-tree pruning makes it an approximate (but highly accurate in practice) nearest neighbor search
-- **Lossless comparison**: Based on information-theoretic principles
-- **Pure Haskell**: No external dependencies beyond compression libraries
 
 ## Testing
 
-The library includes a comprehensive property-based test suite with over 3,100 generated test cases covering:
+The library includes a comprehensive property-based test suite covering:
 - Core distance and tree construction properties
 - Similarity search correctness
 - Edge cases and tree structure invariants
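
For readers of this diff, the NCD formula quoted in the first hunk header can be sketched in Haskell roughly as follows. This is a minimal illustration under stated assumptions, not the library's actual implementation: the names `compressedSize` and `ncd` are hypothetical, and the sketch assumes gzip is provided by `Codec.Compression.GZip.compress` from the `zlib` package.

```haskell
module Main where

import qualified Codec.Compression.GZip as GZip      -- assumed dependency: zlib package
import qualified Data.ByteString.Lazy as BL
import qualified Data.ByteString.Lazy.Char8 as BLC

-- | C(s): size in bytes of s after gzip compression (default settings).
-- Hypothetical helper name, chosen for this sketch only.
compressedSize :: BL.ByteString -> Double
compressedSize = fromIntegral . BL.length . GZip.compress

-- | NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
-- The denominator is never zero because gzip output always
-- includes a header and trailer, even for empty input.
ncd :: BL.ByteString -> BL.ByteString -> Double
ncd x y = (cxy - min cx cy) / max cx cy
  where
    cx  = compressedSize x
    cy  = compressedSize y
    cxy = compressedSize (x <> y)   -- C(xy): compress the concatenation

main :: IO ()
main = do
  let a = BLC.pack (concat (replicate 50 "the quick brown fox "))
      b = BLC.pack (concat (replicate 50 "the quick brown dog "))
      c = BLC.pack (concat (replicate 50 "lorem ipsum dolor sit "))
  -- Similar inputs are expected to score lower than dissimilar ones.
  print (ncd a b)
  print (ncd a c)
```

The values are compressor-dependent, so treat them as relative scores rather than exact distances.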
