Commit 594affd

Merge pull request #6 from devjerry0/refactor/singular-module-names
Refactor/singular module names
2 parents 32e7f02 + 109a384 commit 594affd

20 files changed, +304 -184 lines changed

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -25,4 +25,6 @@ htmlcov/
 .ruff_cache/
 .dmypy.json
 dmypy.json
-.hypothesis/
+.hypothesis/
+.idea/
+uv.lock

.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -10,6 +10,6 @@ repos:
     rev: v1.13.0
     hooks:
       - id: mypy
-        files: ^contractions/
+        files: ^(contractions|tests)/
         additional_dependencies: [types-all]
 
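Reviewer note: the widened `files` pattern can be sanity-checked in Python. pre-commit treats `files` as a Python regular expression searched against each repository-relative path. The sample paths below are hypothetical and only illustrate which files the mypy hook would now pick up.

```python
import re

# Pattern from the updated mypy hook; pre-commit applies it with re.search
# against each repository-relative path.
FILES_PATTERN = re.compile(r"^(contractions|tests)/")

# Hypothetical paths, chosen only for illustration.
sample_paths = [
    "contractions/api.py",
    "tests/test_performance.py",
    "README.md",
    "docs/contractions/notes.md",
]

matched = [path for path in sample_paths if FILES_PATTERN.search(path)]
print(matched)
```

Because the pattern is anchored with `^`, a nested directory that merely contains `contractions/` further down the path is not selected.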

README.md

Lines changed: 47 additions & 36 deletions
@@ -5,13 +5,11 @@
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
 
-A fast and comprehensive Python library for expanding English contractions and slang.
-
-**This is an enhanced fork of the original [contractions](https://github.com/kootenpv/contractions) library by Pascal van Kooten, with significant improvements in performance, testing, type safety, and maintainability.**
+A fast and comprehensive Python library for expanding English contractions.
 
 ## Features
 
-- **Fast**: 50x faster than version 0.0.18 (uses efficient Aho-Corasick algorithm)
+- **Fast**: ~112K ops/sec for typical text expansion (Aho-Corasick algorithm)
 - 📚 **Comprehensive**: Handles standard contractions, slang, and custom additions
 - 🎯 **Smart**: Preserves case and handles ambiguous contractions intelligently
 - 🔧 **Flexible**: Easy to add custom contractions on the fly
@@ -126,7 +124,7 @@ The `preview()` function lets you see all contractions in a text before expandin
 
 ```python
 text = "I'd love to see what you're thinking"
-preview = contractions.preview(text, flank=10)
+preview = contractions.preview(text, context_chars=10)
 
 for item in preview:
     print(f"Found '{item['match']}' at position {item['start']}")
@@ -178,7 +176,7 @@ Loads custom contractions from a JSON file.
 - `FileNotFoundError`: If the file doesn't exist
 - `json.JSONDecodeError`: If the file contains invalid JSON
 
-### `preview(text, flank)`
+### `preview(text, context_chars)`
 
 Preview contractions in text before expanding.
 
@@ -188,6 +186,14 @@ Preview contractions in text before expanding.
 
 **Returns:** `list[dict]` - List of matches with context information
 
+### `e(text, leftovers=True, slang=True)`
+
+Shorthand alias for `expand()`.
+
+### `p(text, context_chars)`
+
+Shorthand alias for `preview()`.
+
 ## Examples
 
 ### Standard Contractions
@@ -230,16 +236,22 @@ he's -> he is (not "he has")
 
 The library uses the Aho-Corasick algorithm for efficient string matching, achieving:
 
-- **~256K ops/sec** for short texts
+- **~112K ops/sec** for typical text expansion (short texts with contractions)
+- **~251K ops/sec** for preview operations (contraction detection)
 - **~17K ops/sec** for medium texts with no contractions
 - **~13K ops/sec** for slang-heavy texts
+- **~278K ops/sec** for adding custom contractions
+
+Benchmarked on Apple M3 Max, Python 3.13.
 
-Run performance benchmarks:
+Run performance benchmarks yourself:
 
 ```bash
-# Make sure package is installed in development mode
-pip install -e .
+# Create virtual environment and install
+uv venv && source .venv/bin/activate
+uv pip install -e .
 
+# Run benchmarks
 python tests/test_performance.py
 ```
 
@@ -255,27 +267,23 @@ Contributions are welcome! Please feel free to submit a Pull Request.
 ### Development Setup
 
 ```bash
-git clone https://github.com/kootenpv/contractions
-cd contractions
-pip install -e .
-pip install pytest pytest-cov ruff mypy
+git clone https://github.com/devjerry0/sane-contractions
+cd sane-contractions
+uv venv && source .venv/bin/activate
+uv pip install -e ".[dev]"
 ```
 
 ### Running Tests
 
 ```bash
-# Run tests
-pytest tests/ -v
-
-# Run tests with coverage
 pytest tests/ --cov=contractions --cov-report=term-missing
 ```
 
 ### Code Quality
 
 ```bash
 ruff check .
-mypy contractions/__init__.py tests/
+mypy contractions/ tests/
 ```
 
 ## What's Different from the Original?
@@ -286,35 +294,38 @@ This fork includes several enhancements over the original `contractions` library
 - **`add_dict()`** - Bulk add custom contractions from a dictionary
 - **`load_file()`** - Load contractions from JSON files
 - **Type hints** - Full type coverage with mypy validation
-- **Better structure** - Modular code organization (core, api modules)
+- **Better structure** - Modular code organization with single-responsibility modules
+- **Facade API** - Clean, simple public API with shorthand aliases (`e()`, `p()`)
 
 ### 🚀 Performance Improvements
-- Optimized dictionary operations using `|=` operator
+- Lazy-loaded TextSearch instances (30x faster imports)
+- Optimized dictionary operations and comprehensions
+- Eliminated redundant code paths
 - Reduced function call overhead
-- Improved list comprehensions
-- Cached computations
 
-### 🧪 Enhanced Testing
-- **100% test coverage** (up from ~60%)
-- 16 comprehensive tests including edge cases
-- Error handling tests
+### 🧪 Testing
+- **100% test coverage** enforced via CI/CD
+- Comprehensive tests including edge cases
+- Input validation and error handling tests
 - Performance benchmarking suite
 
 ### 📦 Modern Tooling
-- **Python 3.10+** support (modern type hints)
-- Ruff for fast linting
-- Pre-commit hooks
-- GitHub Actions CI/CD
-- Automated PyPI publishing
+- **Python 3.10+** support (modern type hints with `list[dict]`, etc.)
+- Ruff for fast linting (replaces black, flake8, isort)
+- Mypy for strict type checking
+- GitHub Actions CI/CD with concurrency control
+- Automated PyPI publishing via Git tags
+- `uv` support for fast dependency management
 
 ### 📚 Better Documentation
-- Comprehensive README with examples
-- API reference documentation
-- Deployment guide
-- Contributing guidelines
+- Comprehensive README with real benchmark results
+- Complete API reference with examples
+- Clear contributing guidelines
 
 ## Why "sane-contractions"?
 
+**This is an enhanced fork of the original [contractions](https://github.com/kootenpv/contractions) library by Pascal van Kooten, with improvements in performance, testing, type safety, and maintainability.**
+
 The original library is excellent but has been unmaintained since 2021. This fork provides:
 - Active maintenance
 - Modern Python practices

contractions/__init__.py

Lines changed: 3 additions & 3 deletions
@@ -4,13 +4,13 @@
 from .api import add, add_dict, e, expand, load_file, p, preview
 
 
-def fix(*args, **kwargs):
+def fix(*args: object, **kwargs: object) -> str:
     warnings.warn(
         "fix() is deprecated and will be removed in v1.0.0. Use expand() instead.",
         DeprecationWarning,
         stacklevel=2
     )
-    return expand(*args, **kwargs)
+    return expand(*args, **kwargs)  # type: ignore[arg-type]
 
 
-__all__ = ["expand", "fix", "add", "add_dict", "load_file", "preview", "e", "p", "__version__"]
+__all__ = ["__version__", "add", "add_dict", "e", "expand", "fix", "load_file", "p", "preview"]
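Reviewer note: the deprecation shim in this file can be exercised in isolation. The sketch below is self-contained; its `expand()` is a hypothetical stand-in that handles a single contraction, not the library's implementation. It shows that the wrapper forwards its arguments and emits one `DeprecationWarning`.

```python
import warnings


def expand(text: str) -> str:
    # Hypothetical stand-in for the real expand(); handles one contraction only.
    return text.replace("you're", "you are")


def fix(*args: object, **kwargs: object) -> str:
    # Warn, then forward everything unchanged to the replacement function.
    warnings.warn(
        "fix() is deprecated and will be removed in v1.0.0. Use expand() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller of fix()
    )
    return expand(*args, **kwargs)  # type: ignore[arg-type]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure the warning is not filtered out
    result = fix("you're welcome")

print(result)
print(caught[0].category.__name__)
```

Typing the variadic parameters as `object` keeps the shim signature-agnostic, which is why the forwarded call needs the `type: ignore[arg-type]` comment.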

contractions/_version.py

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
-__version__ = "0.3.0"
+__version__ = "0.3.1"
 

contractions/api.py

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-from .extensions import add_custom_contraction, add_custom_dict, load_custom_from_file
+from .extension import add_custom_contraction, add_custom_dict, load_custom_from_file
 from .processor import expand as _expand
 from .processor import preview as _preview
 

contractions/bootstrap.py

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 from .file_loader import load_dict_data, load_list_data
-from .transformers import build_apostrophe_variants, normalize_apostrophes
+from .transformer import build_apostrophe_variants, normalize_apostrophes
 
 
 def load_all_contractions() -> tuple[dict[str, str], dict[str, str], dict[str, str]]:
Lines changed: 6 additions & 6 deletions
@@ -1,7 +1,8 @@
 import json
-import os
 
-from .matchers import (
+from pathlib import Path
+
+from .matcher import (
     _get_basic_matcher,
     _get_leftovers_matcher,
     _get_leftovers_slang_matcher,
@@ -33,12 +34,11 @@ def add_custom_dict(contractions_dict: dict[str, str]) -> None:
 
 
 def load_custom_from_file(filepath: str) -> None:
-    if not os.path.exists(filepath):
+    path = Path(filepath)
+    if not path.exists():
         raise FileNotFoundError(f"File not found at: {filepath}")
 
-    with open(filepath, encoding="utf-8") as file:
-        contractions_data = json.load(file)
-
+    contractions_data = json.loads(path.read_text(encoding="utf-8"))
     validate_file_contains_dict(contractions_data, filepath)
     add_custom_dict(contractions_data)
 

contractions/file_loader.py

Lines changed: 4 additions & 2 deletions
@@ -1,6 +1,8 @@
 import json
 import pkgutil
 
+from typing import cast
+
 from .validation import validate_data_type
 
 
@@ -12,7 +14,7 @@ def load_dict_data(filename: str) -> dict[str, str]:
     data = json.loads(json_bytes.decode("utf-8"))
     validate_data_type(data, dict, filename)
 
-    return data
+    return cast("dict[str, str]", data)
 
 
 def load_list_data(filename: str) -> list[str]:
@@ -23,5 +25,5 @@ def load_list_data(filename: str) -> list[str]:
     data = json.loads(json_bytes.decode("utf-8"))
     validate_data_type(data, list, filename)
 
-    return data
+    return cast("list[str]", data)
 
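Reviewer note: `typing.cast` only informs the type checker and performs no runtime conversion or validation, so the preceding `validate_data_type()` call remains the real safeguard. The sketch below illustrates this; `parse_dict` is a hypothetical helper, not part of the package.

```python
import json
from typing import cast


def parse_dict(raw: str) -> dict[str, str]:
    data = json.loads(raw)  # json.loads returns Any as far as mypy is concerned
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    # cast() returns its argument unchanged; it only informs the type checker.
    return cast("dict[str, str]", data)


parsed = parse_dict('{"can\'t": "cannot"}')

# No runtime error here: the value is still a list despite the cast.
unchecked = cast("dict[str, str]", [1, 2, 3])

print(parsed, type(unchecked).__name__)
```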

Lines changed: 17 additions & 6 deletions
@@ -10,7 +10,7 @@
 _CASE_INSENSITIVE = "insensitive"
 
 
-def _load_dicts():
+def _load_dicts() -> None:
     if _State.contractions_dict is not None:
         return
 
@@ -24,16 +24,19 @@ def _create_matcher(mode: str, *dicts: dict[str, str]) -> TextSearch:
     return matcher
 
 
-def _get_basic_matcher():
+def _get_basic_matcher() -> TextSearch:
     if _State.basic_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
         _State.basic_matcher = _create_matcher(_MODE_NORM, _State.contractions_dict)
     return _State.basic_matcher
 
 
-def _get_leftovers_matcher():
+def _get_leftovers_matcher() -> TextSearch:
     if _State.leftovers_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
         _State.leftovers_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -42,9 +45,11 @@ def _get_leftovers_matcher():
     return _State.leftovers_matcher
 
 
-def _get_slang_matcher():
+def _get_slang_matcher() -> TextSearch:
     if _State.slang_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.slang_dict is not None
         _State.slang_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -53,9 +58,12 @@ def _get_slang_matcher():
     return _State.slang_matcher
 
 
-def _get_leftovers_slang_matcher():
+def _get_leftovers_slang_matcher() -> TextSearch:
     if _State.leftovers_slang_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
+        assert _State.slang_dict is not None
         _State.leftovers_slang_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -65,9 +73,12 @@ def _get_leftovers_slang_matcher():
     return _State.leftovers_slang_matcher
 
 
-def _get_preview_matcher():
+def _get_preview_matcher() -> TextSearch:
     if _State.preview_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
+        assert _State.slang_dict is not None
         all_keys = list(chain(
             _State.contractions_dict.keys(),
             _State.leftovers_dict.keys(),
