Merged
18 commits
b771265
Update README with real benchmark numbers and fix outdated info
devjerry0 Dec 15, 2025
6616cfa
Remove obsolete setup.cfg
devjerry0 Dec 15, 2025
1e8838b
refactor: rename matchers.py to matcher.py for consistent singular na…
devjerry0 Dec 15, 2025
6fa093a
refactor: rename transformers.py to transformer.py for consistent sin…
devjerry0 Dec 15, 2025
c7a17c6
refactor: rename extensions.py to extension.py for consistent singula…
devjerry0 Dec 15, 2025
2561b2d
fix: set asyncio_default_fixture_loop_scope to silence pytest-asyncio…
devjerry0 Dec 15, 2025
8100e41
chore: consolidate linter configs into pyproject.toml
devjerry0 Dec 15, 2025
cc6ed78
Merge branch 'main' into refactor/singular-module-names
devjerry0 Dec 15, 2025
4c80529
fix: add complete type annotations to pass mypy and ruff strict checks
devjerry0 Dec 15, 2025
b9e3d3a
refactor: replace os.path with pathlib for modern file operations
devjerry0 Dec 15, 2025
438f110
refactor: reorder _extract_viewing_window to improve logical flow in …
devjerry0 Dec 15, 2025
f59fb6f
style: fix all ruff errors and add mypy overrides for tests
devjerry0 Dec 15, 2025
446baba
feat: add strict type checking to all test files
devjerry0 Dec 15, 2025
63577da
style: break down compound assertions per PT018
devjerry0 Dec 15, 2025
742cbea
chore: extend mypy pre-commit hook to check tests/
devjerry0 Dec 15, 2025
6702006
fix: quote type expressions in cast() and escape regex metacharacters
devjerry0 Dec 15, 2025
cdfcb99
chore: bump version to 0.4.0
devjerry0 Dec 15, 2025
109a384
chore: bump version to 0.3.1
devjerry0 Dec 15, 2025
4 changes: 3 additions & 1 deletion .gitignore
@@ -25,4 +25,6 @@ htmlcov/
 .ruff_cache/
 .dmypy.json
 dmypy.json
-.hypothesis/
+.hypothesis/
+.idea/
+uv.lock
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -10,6 +10,6 @@ repos:
     rev: v1.13.0
     hooks:
       - id: mypy
-        files: ^contractions/
+        files: ^(contractions|tests)/
         additional_dependencies: [types-all]

83 changes: 47 additions & 36 deletions README.md
@@ -5,13 +5,11 @@
 [![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
 [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

-A fast and comprehensive Python library for expanding English contractions and slang.
-
-**This is an enhanced fork of the original [contractions](https://github.com/kootenpv/contractions) library by Pascal van Kooten, with significant improvements in performance, testing, type safety, and maintainability.**
+A fast and comprehensive Python library for expanding English contractions.

 ## Features

-- ⚡ **Fast**: 50x faster than version 0.0.18 (uses efficient Aho-Corasick algorithm)
+- ⚡ **Fast**: ~112K ops/sec for typical text expansion (Aho-Corasick algorithm)
 - 📚 **Comprehensive**: Handles standard contractions, slang, and custom additions
 - 🎯 **Smart**: Preserves case and handles ambiguous contractions intelligently
 - 🔧 **Flexible**: Easy to add custom contractions on the fly
@@ -126,7 +124,7 @@ The `preview()` function lets you see all contractions in a text before expandin

 ```python
 text = "I'd love to see what you're thinking"
-preview = contractions.preview(text, flank=10)
+preview = contractions.preview(text, context_chars=10)

 for item in preview:
     print(f"Found '{item['match']}' at position {item['start']}")
@@ -178,7 +176,7 @@ Loads custom contractions from a JSON file.
 - `FileNotFoundError`: If the file doesn't exist
 - `json.JSONDecodeError`: If the file contains invalid JSON

-### `preview(text, flank)`
+### `preview(text, context_chars)`

 Preview contractions in text before expanding.

@@ -188,6 +186,14 @@ Preview contractions in text before expanding.

 **Returns:** `list[dict]` - List of matches with context information

+### `e(text, leftovers=True, slang=True)`
+
+Shorthand alias for `expand()`.
+
+### `p(text, context_chars)`
+
+Shorthand alias for `preview()`.
+
 ## Examples

 ### Standard Contractions
@@ -230,16 +236,22 @@ he's -> he is (not "he has")

The library uses the Aho-Corasick algorithm for efficient string matching, achieving:

- **~256K ops/sec** for short texts
- **~112K ops/sec** for typical text expansion (short texts with contractions)
- **~251K ops/sec** for preview operations (contraction detection)
- **~17K ops/sec** for medium texts with no contractions
- **~13K ops/sec** for slang-heavy texts
- **~278K ops/sec** for adding custom contractions

Benchmarked on Apple M3 Max, Python 3.13.

Run performance benchmarks:
Run performance benchmarks yourself:

```bash
# Make sure package is installed in development mode
pip install -e .
# Create virtual environment and install
uv venv && source .venv/bin/activate
uv pip install -e .

# Run benchmarks
python tests/test_performance.py
```

@@ -255,27 +267,23 @@ Contributions are welcome! Please feel free to submit a Pull Request.
 ### Development Setup

 ```bash
-git clone https://github.com/kootenpv/contractions
-cd contractions
-pip install -e .
-pip install pytest pytest-cov ruff mypy
+git clone https://github.com/devjerry0/sane-contractions
+cd sane-contractions
+uv venv && source .venv/bin/activate
+uv pip install -e ".[dev]"
 ```

 ### Running Tests

 ```bash
-# Run tests
-pytest tests/ -v
-
-# Run tests with coverage
 pytest tests/ --cov=contractions --cov-report=term-missing
 ```

 ### Code Quality

 ```bash
 ruff check .
-mypy contractions/__init__.py tests/
+mypy contractions/ tests/
 ```

 ## What's Different from the Original?
@@ -286,35 +294,38 @@ This fork includes several enhancements over the original `contractions` library
 - **`add_dict()`** - Bulk add custom contractions from a dictionary
 - **`load_file()`** - Load contractions from JSON files
 - **Type hints** - Full type coverage with mypy validation
-- **Better structure** - Modular code organization (core, api modules)
+- **Better structure** - Modular code organization with single-responsibility modules
+- **Facade API** - Clean, simple public API with shorthand aliases (`e()`, `p()`)

 ### 🚀 Performance Improvements
-- Optimized dictionary operations using `|=` operator
+- Lazy-loaded TextSearch instances (30x faster imports)
+- Optimized dictionary operations and comprehensions
 - Eliminated redundant code paths
-- Reduced function call overhead
-- Improved list comprehensions
+- Cached computations

-### 🧪 Enhanced Testing
-- **100% test coverage** (up from ~60%)
-- 16 comprehensive tests including edge cases
-- Error handling tests
+### 🧪 Testing
+- **100% test coverage** enforced via CI/CD
+- Comprehensive tests including edge cases
+- Input validation and error handling tests
 - Performance benchmarking suite

 ### 📦 Modern Tooling
-- **Python 3.10+** support (modern type hints)
-- Ruff for fast linting
-- Pre-commit hooks
-- GitHub Actions CI/CD
-- Automated PyPI publishing
+- **Python 3.10+** support (modern type hints with `list[dict]`, etc.)
+- Ruff for fast linting (replaces black, flake8, isort)
+- Mypy for strict type checking
+- GitHub Actions CI/CD with concurrency control
+- Automated PyPI publishing via Git tags
+- `uv` support for fast dependency management

 ### 📚 Better Documentation
-- Comprehensive README with examples
-- API reference documentation
-- Deployment guide
-- Contributing guidelines
+- Comprehensive README with real benchmark results
+- Complete API reference with examples
+- Clear contributing guidelines

 ## Why "sane-contractions"?

+**This is an enhanced fork of the original [contractions](https://github.com/kootenpv/contractions) library by Pascal van Kooten, with improvements in performance, testing, type safety, and maintainability.**
+
 The original library is excellent but has been unmaintained since 2021. This fork provides:
 - Active maintenance
 - Modern Python practices
6 changes: 3 additions & 3 deletions contractions/__init__.py
@@ -4,13 +4,13 @@
 from .api import add, add_dict, e, expand, load_file, p, preview


-def fix(*args, **kwargs):
+def fix(*args: object, **kwargs: object) -> str:
     warnings.warn(
         "fix() is deprecated and will be removed in v1.0.0. Use expand() instead.",
         DeprecationWarning,
         stacklevel=2
     )
-    return expand(*args, **kwargs)
+    return expand(*args, **kwargs)  # type: ignore[arg-type]


-__all__ = ["expand", "fix", "add", "add_dict", "load_file", "preview", "e", "p", "__version__"]
+__all__ = ["__version__", "add", "add_dict", "e", "expand", "fix", "load_file", "p", "preview"]
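The `# type: ignore[arg-type]` in `fix()` is needed because `*args: object` tells mypy nothing about what `expand()` accepts. One alternative, sketched here and not part of this PR, is to mirror the `expand()` signature in the deprecated alias so the forwarding call type-checks without an ignore. The `expand()` body below is a one-line stand-in, not the library's implementation.

```python
import warnings


def expand(text: str, leftovers: bool = True, slang: bool = True) -> str:
    """Stand-in for the real expand(); it only knows one contraction."""
    return text.replace("can't", "cannot")


def fix(text: str, leftovers: bool = True, slang: bool = True) -> str:
    """Deprecated alias that forwards to expand() with an explicit signature."""
    warnings.warn(
        "fix() is deprecated and will be removed in v1.0.0. Use expand() instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not to fix() itself
    )
    return expand(text, leftovers=leftovers, slang=slang)


# Capture the warning so the example shows both effects of calling fix().
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = fix("I can't go")

print(result)
print(caught[0].category.__name__)
```

The trade-off is that the alias must be kept in sync with `expand()` by hand, which is presumably why the PR kept the variadic form.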
2 changes: 1 addition & 1 deletion contractions/_version.py
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
__version__ = "0.3.0"
__version__ = "0.3.1"

2 changes: 1 addition & 1 deletion contractions/api.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from .extensions import add_custom_contraction, add_custom_dict, load_custom_from_file
from .extension import add_custom_contraction, add_custom_dict, load_custom_from_file
from .processor import expand as _expand
from .processor import preview as _preview

2 changes: 1 addition & 1 deletion contractions/bootstrap.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
from .file_loader import load_dict_data, load_list_data
from .transformers import build_apostrophe_variants, normalize_apostrophes
from .transformer import build_apostrophe_variants, normalize_apostrophes


def load_all_contractions() -> tuple[dict[str, str], dict[str, str], dict[str, str]]:
Expand Down
12 changes: 6 additions & 6 deletions contractions/extensions.py → contractions/extension.py
@@ -1,7 +1,8 @@
 import json
-import os

-from .matchers import (
+from pathlib import Path
+
+from .matcher import (
     _get_basic_matcher,
     _get_leftovers_matcher,
     _get_leftovers_slang_matcher,
@@ -33,12 +34,11 @@ def add_custom_dict(contractions_dict: dict[str, str]) -> None:


 def load_custom_from_file(filepath: str) -> None:
-    if not os.path.exists(filepath):
+    path = Path(filepath)
+    if not path.exists():
         raise FileNotFoundError(f"File not found at: {filepath}")

-    with open(filepath, encoding="utf-8") as file:
-        contractions_data = json.load(file)
-
+    contractions_data = json.loads(path.read_text(encoding="utf-8"))
     validate_file_contains_dict(contractions_data, filepath)
     add_custom_dict(contractions_data)
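A minimal, self-contained sketch of the `pathlib` pattern that `load_custom_from_file()` now uses. `load_json_dict` and its `ValueError` check are hypothetical stand-ins for the library function and `validate_file_contains_dict`; only the `Path.exists()`, `Path.read_text()`, `json.loads()` sequence is taken from the diff.

```python
import json
import tempfile
from pathlib import Path


def load_json_dict(filepath: str) -> dict[str, str]:
    """Hypothetical helper: read a JSON object from disk via pathlib."""
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"File not found at: {filepath}")

    # read_text() opens, decodes, and closes the file in one call.
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{filepath} must contain a JSON object")
    return data


missing_error = ""
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "custom.json"
    good.write_text(json.dumps({"ain't": "is not"}), encoding="utf-8")
    loaded = load_json_dict(str(good))

    try:
        load_json_dict(str(Path(tmp) / "missing.json"))
    except FileNotFoundError as exc:
        missing_error = str(exc)

print(loaded)
print(missing_error)
```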

6 changes: 4 additions & 2 deletions contractions/file_loader.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,8 @@
import json
import pkgutil

from typing import cast

from .validation import validate_data_type


Expand All @@ -12,7 +14,7 @@ def load_dict_data(filename: str) -> dict[str, str]:
data = json.loads(json_bytes.decode("utf-8"))
validate_data_type(data, dict, filename)

return data
return cast("dict[str, str]", data)


def load_list_data(filename: str) -> list[str]:
Expand All @@ -23,5 +25,5 @@ def load_list_data(filename: str) -> list[str]:
data = json.loads(json_bytes.decode("utf-8"))
validate_data_type(data, list, filename)

return data
return cast("list[str]", data)
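`typing.cast` only informs the type checker. At runtime it returns its argument unchanged, so the `validate_data_type` call before each `return` is what actually guards the value. Passing the type as a string, as the diff does, means the expression is never evaluated at runtime. The sketch below illustrates both points; `parse_str_dict` is a hypothetical helper, not a function from this package.

```python
import json
from typing import cast


def parse_str_dict(raw: str) -> dict[str, str]:
    """Hypothetical helper: validate first, then cast for the type checker."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    # json.loads() returns Any; cast() narrows it for mypy and does nothing else.
    return cast("dict[str, str]", data)


parsed = parse_str_dict('{"won\'t": "will not"}')

# cast() performs no validation: a list passes straight through unchanged.
original = ["not", "a", "dict"]
unchecked = cast("dict[str, str]", original)

print(parsed)
print(unchecked is original)
```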

23 changes: 17 additions & 6 deletions contractions/matchers.py → contractions/matcher.py
@@ -10,7 +10,7 @@
 _CASE_INSENSITIVE = "insensitive"


-def _load_dicts():
+def _load_dicts() -> None:
     if _State.contractions_dict is not None:
         return

@@ -24,16 +24,19 @@ def _create_matcher(mode: str, *dicts: dict[str, str]) -> TextSearch:
     return matcher


-def _get_basic_matcher():
+def _get_basic_matcher() -> TextSearch:
     if _State.basic_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
         _State.basic_matcher = _create_matcher(_MODE_NORM, _State.contractions_dict)
     return _State.basic_matcher


-def _get_leftovers_matcher():
+def _get_leftovers_matcher() -> TextSearch:
     if _State.leftovers_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
         _State.leftovers_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -42,9 +45,11 @@ def _get_leftovers_matcher():
     return _State.leftovers_matcher


-def _get_slang_matcher():
+def _get_slang_matcher() -> TextSearch:
     if _State.slang_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.slang_dict is not None
         _State.slang_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -53,9 +58,12 @@ def _get_slang_matcher():
     return _State.slang_matcher


-def _get_leftovers_slang_matcher():
+def _get_leftovers_slang_matcher() -> TextSearch:
     if _State.leftovers_slang_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
+        assert _State.slang_dict is not None
         _State.leftovers_slang_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -65,9 +73,12 @@ def _get_leftovers_slang_matcher():
     return _State.leftovers_slang_matcher


-def _get_preview_matcher():
+def _get_preview_matcher() -> TextSearch:
     if _State.preview_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
+        assert _State.slang_dict is not None
         all_keys = list(chain(
             _State.contractions_dict.keys(),
             _State.leftovers_dict.keys(),
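The added `assert ... is not None` lines exist for the type checker: `_load_dicts()` returns `None`, so mypy cannot tell that it populated the `Optional` slots on `_State`. A reduced sketch of the same lazy-initialization pattern follows. The `_State` fields, the sample data, and `_get_upper_lookup` are illustrative assumptions, not the module's real contents.

```python
from typing import Optional


class _State:
    """Module-level cache; every slot starts empty and is filled on first use."""

    contractions_dict: Optional[dict[str, str]] = None
    upper_lookup: Optional[dict[str, str]] = None


load_count = 0  # counts how often the (pretend) expensive load actually runs


def _load_dicts() -> None:
    global load_count
    if _State.contractions_dict is not None:
        return
    load_count += 1
    _State.contractions_dict = {"can't": "cannot", "it's": "it is"}


def _get_upper_lookup() -> dict[str, str]:
    if _State.upper_lookup is None:
        _load_dicts()
        # Narrow Optional[dict] to dict for mypy; also fails fast if loading broke.
        assert _State.contractions_dict is not None
        _State.upper_lookup = {
            key.upper(): value for key, value in _State.contractions_dict.items()
        }
    return _State.upper_lookup


first = _get_upper_lookup()
second = _get_upper_lookup()
print(first is second, load_count)
```

Note that `assert` statements are removed when Python runs with `-O`, so they narrow types for mypy but should not be relied on as runtime validation.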
15 changes: 7 additions & 8 deletions contractions/processor.py
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
from .matchers import (
from .matcher import (
_get_basic_matcher,
_get_leftovers_matcher,
_get_leftovers_slang_matcher,
Expand All @@ -8,13 +8,6 @@
from .validation import validate_int_param, validate_string_param


def _extract_viewing_window(text: str, match_start: int, match_end: int, context_chars: int) -> str:
text_length = len(text)
window_start = max(0, match_start - context_chars)
window_end = min(text_length, match_end + context_chars)
return text[window_start:window_end]


def expand(text: str, leftovers: bool = True, slang: bool = True) -> str:
validate_string_param(text, "text")

Expand All @@ -29,6 +22,12 @@ def expand(text: str, leftovers: bool = True, slang: bool = True) -> str:

return _get_basic_matcher().replace(text)

def _extract_viewing_window(text: str, match_start: int, match_end: int, context_chars: int) -> str:
text_length = len(text)
window_start = max(0, match_start - context_chars)
window_end = min(text_length, match_end + context_chars)

return text[window_start:window_end]

def preview(text: str, context_chars: int) -> list[dict[str, str | int]]:
validate_int_param(context_chars, "context_chars")
Expand Down
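For reference, the clamping in `_extract_viewing_window()` can be exercised on its own. The sketch below copies the function body from the diff under a public name and feeds it the README's sample sentence. How `preview()` obtains the match offsets is not shown in this hunk, so here they are computed with `str.index` as an assumption.

```python
def extract_viewing_window(text: str, match_start: int, match_end: int, context_chars: int) -> str:
    """Return the match plus up to context_chars on each side, clamped to the text."""
    window_start = max(0, match_start - context_chars)
    window_end = min(len(text), match_end + context_chars)
    return text[window_start:window_end]


text = "I'd love to see what you're thinking"

# A match in the middle of the text gets context on both sides.
start = text.index("you're")
middle = extract_viewing_window(text, start, start + len("you're"), 5)

# A match at position 0 has no left context; max(0, ...) prevents a negative slice start.
edge = extract_viewing_window(text, 0, len("I'd"), 5)

print(repr(middle))  # 'what you\'re thin'
print(repr(edge))    # "I'd love"
```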
Original file line number Diff line number Diff line change
@@ -35,7 +35,7 @@ def _get_combinations(tokens: list[str], joiners: list[str]) -> list[str]:
     return ["".join(combination) for combination in product(*interspersed_options)]


-def _intersperse(items: list, separator) -> list:
+def _intersperse(items: list, separator: list[str]) -> list:
     num_items = len(items)
     num_separators = num_items - 1
     total_slots = num_items + num_separators
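The body of `_intersperse()` is cut off in this hunk, so the following is only one plausible reading of how it and `_get_combinations()` fit together: each token becomes a single-option slot, the list of joiners is placed between adjacent tokens, and `itertools.product` enumerates every way to fill the slots. The function names without underscores and the slice-assignment implementation are assumptions, not the PR's code.

```python
from itertools import product


def intersperse(items: list[list[str]], separator: list[str]) -> list[list[str]]:
    """Place `separator` between every pair of adjacent items."""
    if not items:
        return []
    total_slots = 2 * len(items) - 1  # num_items + num_separators
    slots = [separator] * total_slots
    slots[0::2] = items  # even positions hold items, odd positions keep the separator
    return slots


def get_combinations(tokens: list[str], joiners: list[str]) -> list[str]:
    """Join tokens using every possible choice of joiner at each gap."""
    options = intersperse([[token] for token in tokens], joiners)
    return ["".join(combination) for combination in product(*options)]


two_tokens = get_combinations(["you", "re"], ["'", "’"])
three_tokens = get_combinations(["a", "b", "c"], ["-", "+"])

print(two_tokens)
print(three_tokens)
```

With `n` tokens and `k` joiners this yields `k ** (n - 1)` strings, which is presumably how apostrophe variants of a contraction are generated.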