Commit 387b8e2

docs: improve README with comparison table and modernized examples
1 parent eebfa5c commit 387b8e2

1 file changed: +30 -47 lines changed

README.md

Lines changed: 30 additions & 47 deletions
@@ -8,45 +8,44 @@ A Ruby library for text classification using Bayesian, Logistic Regression, LSI

 **[Documentation](https://rubyclassifier.com/docs)** · **[Tutorials](https://rubyclassifier.com/docs/tutorials)** · **[Guides](https://rubyclassifier.com/docs/guides)**

-> **Note:** This is the original `classifier` gem, actively maintained since 2005. After a quieter period, active development resumed in 2025 with major new features. If you're choosing between this and a fork, this is the canonical, actively-developed version.
+> **Note:** This is the original `classifier` gem, maintained for 20 years since 2005. After a quieter period, active development resumed in 2025 with major new features. If you're choosing between this and a fork, this is the canonical, most actively-developed version.

 ## Why This Library?

 This gem has features no fork provides:

-| Feature | This Gem | Forks |
-|---------|----------|-------|
-| **5 classifiers** (Bayes, Logistic Regression, LSI, kNN, TF-IDF) | ✅ | ❌ Bayes + LSI only |
-| **Native C extension** for LSI (5-50x faster) | ✅ | ❌ GSL dependency or pure Ruby |
-| **Zero dependencies** for native speed | ✅ | ❌ Requires GSL system library |
-| **Pluggable persistence** (file, Redis, S3, custom) | ✅ | ❌ Marshal only |
-| **Thread-safe** classifiers | ✅ | ❌ |
-| **RBS type annotations** | ✅ | ❌ |
-| **Ruby 3.2-3.4 support** | ✅ | ⚠️ Often outdated |
-| **Proper Laplace smoothing** | ✅ | ❌ Numeric instability |
-| **Calibrated probabilities** (Logistic Regression) | ✅ | ❌ |
-| **Feature weights** for interpretability | ✅ | ❌ |
-
-### Recent Development (2025)
+| | This Gem | Forks |
+|:--|:--|:--|
+| **Algorithms** | 5 classifiers | 2 only |
+| **LSI Performance** | Native C (5-50x faster) | Pure Ruby, or requires GSL/Numo + system libs |
+| **Persistence** | Pluggable (file, Redis, S3, SQL) | Marshal only |
+| **Thread Safety** | ✅ | ❌ |
+| **Type Annotations** | RBS throughout | ❌ |
+| **Laplace Smoothing** | Numerically stable | ❌ Unstable |
+| **Probability Calibration** | ✅ | ❌ |
+| **Feature Weights** | ✅ Interpretable | ❌ |
+
+### Recent Developments (Late 2025)

 - Added Logistic Regression classifier with SGD and L2 regularization
 - Added k-Nearest Neighbors classifier with distance-weighted voting
 - Added TF-IDF vectorizer with n-gram support
-- Built zero-dependency native C extension (replaces GSL requirement)
+- Built zero-dependency native C extension (replaces GSL or Numo requirement)
 - Added pluggable storage backends for persistence
 - Made all classifiers thread-safe
 - Fixed Laplace smoothing for numerical stability
 - Added RBS type signatures throughout
-- Modernized for Ruby 3.2-3.4
+- Modernized for new Ruby coding standards

 ## Table of Contents

 - [Installation](#installation)
-- [Bayesian Classifier](#bayesian-classifier)
-- [Logistic Regression](#logistic-regression)
-- [LSI (Latent Semantic Indexing)](#lsi-latent-semantic-indexing)
-- [k-Nearest Neighbors (kNN)](#k-nearest-neighbors-knn)
-- [TF-IDF Vectorizer](#tf-idf-vectorizer)
+- [Algorithms](#bayesian-classifier)
+- [Bayesian Classifier](#bayesian-classifier)
+- [Logistic Regression](#logistic-regression)
+- [LSI (Latent Semantic Indexing)](#lsi-latent-semantic-indexing)
+- [k-Nearest Neighbors (kNN)](#k-nearest-neighbors-knn)
+- [TF-IDF Vectorizer](#tf-idf-vectorizer)
 - [Persistence](#persistence)
 - [Performance](#performance)
 - [Development](#development)
@@ -75,7 +74,7 @@ gem install classifier

 ### Native C Extension

-The gem includes a native C extension for fast LSI operations. It compiles automatically during gem installation. No external dependencies are required.
+The gem includes a zero-dependency native C extension for fast LSI operations (5-50x faster than pure Ruby). It compiles automatically during installation.

 To verify the native extension is active:

@@ -90,22 +89,6 @@ To force pure Ruby mode (for debugging):
 NATIVE_VECTOR=true ruby your_script.rb
 ```

-To suppress the warning when native extension isn't available:
-
-```bash
-SUPPRESS_LSI_WARNING=true ruby your_script.rb
-```
-
-### Compatibility
-
-| Ruby Version | Status |
-|--------------|--------|
-| 4.0 | Supported |
-| 3.4 | Supported |
-| 3.3 | Supported |
-| 3.2 | Supported |
-| 3.1 | EOL (unsupported) |
-
 ## Bayesian Classifier

 Fast, accurate classification with modest memory requirements. Ideal for spam filtering, sentiment analysis, and content categorization.
@@ -118,7 +101,7 @@ require 'classifier'
 classifier = Classifier::Bayes.new(:spam, :ham)

 # Train with keyword arguments
-classifier.train(spam: "Buy cheap viagra now! Limited offer!")
+classifier.train(spam: "Buy cheap v1agra now! Limited offer!")
 classifier.train(ham: "Meeting scheduled for tomorrow at 10am")

 # Train multiple items at once
@@ -209,18 +192,18 @@ require 'classifier'

 lsi = Classifier::LSI.new

-# Add documents with hash-style syntax (category => item(s))
-lsi.add("Pets" => "Dogs are loyal pets that love to play fetch")
-lsi.add("Pets" => "Cats are independent and love to nap")
-lsi.add("Programming" => "Ruby is a dynamic programming language")
+# Add documents (category: item(s))
+lsi.add(pets: "Dogs are loyal pets that love to play fetch")
+lsi.add(pets: "Cats are independent and love to nap")
+lsi.add(programming: "Ruby is a dynamic programming language")

 # Add multiple items with the same category
-lsi.add("Programming" => ["Python is great for data science", "JavaScript runs in browsers"])
+lsi.add(programming: ["Python is great for data science", "JavaScript runs in browsers"])

 # Batch operations with multiple categories
 lsi.add(
-  "Pets" => ["Hamsters are small furry pets", "Birds can be great companions"],
-  "Programming" => "Go is fast and concurrent"
+  pets: ["Hamsters are small furry pets", "Birds can be great companions"],
+  programming: "Go is fast and concurrent"
 )

 # Classify new text
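
Review note: the last hunk switches the LSI examples from string keys (`"Pets" => ...`) to symbol keys (`pets: ...`), which are different hash keys in Ruby. The diff does not show how `Classifier::LSI#add` treats them, so the sketch below uses a hypothetical stand-in class, `TinyIndex`, to illustrate one way a method taking `**` keyword arguments could accept both call styles. The key normalization (`to_s.downcase.to_sym`) is an assumption made for illustration; the gem itself may keep `"Pets"` and `:pets` as distinct categories.

```ruby
# Hypothetical stand-in for illustration only; this is NOT Classifier::LSI.
class TinyIndex
  attr_reader :items

  def initialize
    # Each category maps to an array of documents.
    @items = Hash.new { |hash, key| hash[key] = [] }
  end

  # Accepts category => document(s) pairs in either call style.
  # Ruby 2.7+ allows non-symbol keys when a method takes **keywords.
  def add(**by_category)
    by_category.each do |category, docs|
      # Assumed normalization: "Pets" and :pets land in the same bucket.
      key = category.to_s.downcase.to_sym
      # Array() wraps a single string and leaves an array unchanged.
      Array(docs).each { |doc| @items[key] << doc }
    end
    self
  end
end

index = TinyIndex.new
index.add("Pets" => "Dogs are loyal pets that love to play fetch") # old README style
index.add(pets: "Cats are independent and love to nap")            # new README style
index.add(programming: ["Python is great for data science", "JavaScript runs in browsers"])

p index.items.keys
p index.items[:pets].size
```

If the real method does not normalize keys, readers who trained an index with `"Pets"` and later query with `:pets` would be working with two separate categories, so that may be worth confirming before merging the example change.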
