4 changes: 3 additions & 1 deletion .gitignore
@@ -25,4 +25,6 @@ htmlcov/
 .ruff_cache/
 .dmypy.json
 dmypy.json
-.hypothesis/
+.hypothesis/
+.idea/
+uv.lock
83 changes: 47 additions & 36 deletions README.md
@@ -5,13 +5,11 @@
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A fast and comprehensive Python library for expanding English contractions and slang.

**This is an enhanced fork of the original [contractions](https://github.com/kootenpv/contractions) library by Pascal van Kooten, with significant improvements in performance, testing, type safety, and maintainability.**
A fast and comprehensive Python library for expanding English contractions.

## Features

- ⚡ **Fast**: 50x faster than version 0.0.18 (uses efficient Aho-Corasick algorithm)
- ⚡ **Fast**: ~112K ops/sec for typical text expansion (Aho-Corasick algorithm)
- 📚 **Comprehensive**: Handles standard contractions, slang, and custom additions
- 🎯 **Smart**: Preserves case and handles ambiguous contractions intelligently
- 🔧 **Flexible**: Easy to add custom contractions on the fly
@@ -126,7 +124,7 @@ The `preview()` function lets you see all contractions in a text before expandin

```python
text = "I'd love to see what you're thinking"
preview = contractions.preview(text, flank=10)
preview = contractions.preview(text, context_chars=10)

for item in preview:
    print(f"Found '{item['match']}' at position {item['start']}")
@@ -178,7 +176,7 @@ Loads custom contractions from a JSON file.
- `FileNotFoundError`: If the file doesn't exist
- `json.JSONDecodeError`: If the file contains invalid JSON

### `preview(text, flank)`
### `preview(text, context_chars)`

Preview contractions in text before expanding.

@@ -188,6 +186,14 @@ Preview contractions in text before expanding.

**Returns:** `list[dict]` - List of matches with context information

### `e(text, leftovers=True, slang=True)`

Shorthand alias for `expand()`.

### `p(text, context_chars)`

Shorthand alias for `preview()`.

## Examples

### Standard Contractions
@@ -230,16 +236,22 @@ he's -> he is (not "he has")

The library uses the Aho-Corasick algorithm for efficient string matching, achieving:

- **~256K ops/sec** for short texts
- **~112K ops/sec** for typical text expansion (short texts with contractions)
- **~251K ops/sec** for preview operations (contraction detection)
- **~17K ops/sec** for medium texts with no contractions
- **~13K ops/sec** for slang-heavy texts
- **~278K ops/sec** for adding custom contractions

Benchmarked on Apple M3 Max, Python 3.13.

Run performance benchmarks:
Run performance benchmarks yourself:

```bash
# Make sure package is installed in development mode
pip install -e .
# Create virtual environment and install
uv venv && source .venv/bin/activate
uv pip install -e .

# Run benchmarks
python tests/test_performance.py
```
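
The contents of `tests/test_performance.py` are not shown in this diff, so the following is only a rough sketch of how an ops/sec figure like the ones above can be measured with the standard library. `ops_per_second` and `toy_expand` are hypothetical names introduced for illustration; they are not part of the library.

```python
import timeit
from typing import Callable


def toy_expand(text: str) -> str:
    # Hypothetical stand-in workload, NOT the library's expand().
    return text.replace("can't", "cannot")


def ops_per_second(func: Callable[[str], str], text: str, number: int = 1_000) -> float:
    # Time `number` calls and convert the elapsed seconds to calls per second.
    elapsed = timeit.timeit(lambda: func(text), number=number)
    return number / elapsed


if __name__ == "__main__":
    rate = ops_per_second(toy_expand, "I can't believe it")
    print(f"~{rate:,.0f} ops/sec")
```

Results from a loop like this vary with hardware and Python version, which is presumably why the README states both.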

@@ -255,27 +267,23 @@ Contributions are welcome! Please feel free to submit a Pull Request.
### Development Setup

```bash
git clone https://github.com/kootenpv/contractions
cd contractions
pip install -e .
pip install pytest pytest-cov ruff mypy
git clone https://github.com/devjerry0/sane-contractions
cd sane-contractions
uv venv && source .venv/bin/activate
uv pip install -e ".[dev]"
```

### Running Tests

```bash
# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ --cov=contractions --cov-report=term-missing
```

### Code Quality

```bash
ruff check .
mypy contractions/__init__.py tests/
mypy contractions/ tests/
```

## What's Different from the Original?
@@ -286,35 +294,38 @@ This fork includes several enhancements over the original `contractions` library
- **`add_dict()`** - Bulk add custom contractions from a dictionary
- **`load_file()`** - Load contractions from JSON files
- **Type hints** - Full type coverage with mypy validation
- **Better structure** - Modular code organization (core, api modules)
- **Better structure** - Modular code organization with single-responsibility modules
- **Facade API** - Clean, simple public API with shorthand aliases (`e()`, `p()`)

### 🚀 Performance Improvements
- Optimized dictionary operations using `|=` operator
- Lazy-loaded TextSearch instances (30x faster imports)
- Optimized dictionary operations and comprehensions
- Eliminated redundant code paths
- Reduced function call overhead
- Improved list comprehensions
- Cached computations

### 🧪 Enhanced Testing
- **100% test coverage** (up from ~60%)
- 16 comprehensive tests including edge cases
- Error handling tests
### 🧪 Testing
- **100% test coverage** enforced via CI/CD
- Comprehensive tests including edge cases
- Input validation and error handling tests
- Performance benchmarking suite

### 📦 Modern Tooling
- **Python 3.10+** support (modern type hints)
- Ruff for fast linting
- Pre-commit hooks
- GitHub Actions CI/CD
- Automated PyPI publishing
- **Python 3.10+** support (modern type hints with `list[dict]`, etc.)
- Ruff for fast linting (replaces black, flake8, isort)
- Mypy for strict type checking
- GitHub Actions CI/CD with concurrency control
- Automated PyPI publishing via Git tags
- `uv` support for fast dependency management

### 📚 Better Documentation
- Comprehensive README with examples
- API reference documentation
- Deployment guide
- Contributing guidelines
- Comprehensive README with real benchmark results
- Complete API reference with examples
- Clear contributing guidelines

## Why "sane-contractions"?

**This is an enhanced fork of the original [contractions](https://github.com/kootenpv/contractions) library by Pascal van Kooten, with improvements in performance, testing, type safety, and maintainability.**

The original library is excellent but has been unmaintained since 2021. This fork provides:
- Active maintenance
- Modern Python practices
6 changes: 3 additions & 3 deletions contractions/__init__.py
@@ -4,13 +4,13 @@
 from .api import add, add_dict, e, expand, load_file, p, preview


-def fix(*args, **kwargs):
+def fix(*args: object, **kwargs: object) -> str:
     warnings.warn(
         "fix() is deprecated and will be removed in v1.0.0. Use expand() instead.",
         DeprecationWarning,
         stacklevel=2
     )
-    return expand(*args, **kwargs)
+    return expand(*args, **kwargs)  # type: ignore[arg-type]


-__all__ = ["expand", "fix", "add", "add_dict", "load_file", "preview", "e", "p", "__version__"]
+__all__ = ["__version__", "add", "add_dict", "e", "expand", "fix", "load_file", "p", "preview"]
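
A minimal, self-contained sketch of the deprecation-shim pattern used by `fix()` above, showing how a caller (or a test) can observe the warning. The `expand()` here is a hypothetical one-rule stand-in, not the library's implementation, and `call_fix_and_capture` is a helper invented for this illustration.

```python
import warnings


def expand(text: str) -> str:
    # Hypothetical stand-in for the real expand(); handles one contraction only.
    return text.replace("can't", "cannot")


def fix(*args: object, **kwargs: object) -> str:
    # stacklevel=2 attributes the warning to the caller of fix(), not to fix() itself.
    warnings.warn(
        "fix() is deprecated. Use expand() instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    return expand(*args, **kwargs)  # type: ignore[arg-type]


def call_fix_and_capture(text: str) -> tuple[str, list[str]]:
    # Record every warning raised while calling the deprecated alias.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = fix(text)
    return result, [w.category.__name__ for w in caught]
```

Typing the pass-through arguments as `object` keeps the shim annotated without repeating `expand()`'s signature, at the cost of the `type: ignore` on the forwarded call.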
2 changes: 1 addition & 1 deletion contractions/api.py
@@ -1,4 +1,4 @@
-from .extensions import add_custom_contraction, add_custom_dict, load_custom_from_file
+from .extension import add_custom_contraction, add_custom_dict, load_custom_from_file
 from .processor import expand as _expand
 from .processor import preview as _preview

2 changes: 1 addition & 1 deletion contractions/bootstrap.py
@@ -1,5 +1,5 @@
 from .file_loader import load_dict_data, load_list_data
-from .transformers import build_apostrophe_variants, normalize_apostrophes
+from .transformer import build_apostrophe_variants, normalize_apostrophes


 def load_all_contractions() -> tuple[dict[str, str], dict[str, str], dict[str, str]]:
12 changes: 6 additions & 6 deletions contractions/extensions.py → contractions/extension.py
@@ -1,7 +1,8 @@
 import json
-import os

-from .matchers import (
+from pathlib import Path
+
+from .matcher import (
     _get_basic_matcher,
     _get_leftovers_matcher,
     _get_leftovers_slang_matcher,
@@ -33,12 +34,11 @@ def add_custom_dict(contractions_dict: dict[str, str]) -> None:


 def load_custom_from_file(filepath: str) -> None:
-    if not os.path.exists(filepath):
+    path = Path(filepath)
+    if not path.exists():
         raise FileNotFoundError(f"File not found at: {filepath}")

-    with open(filepath, encoding="utf-8") as file:
-        contractions_data = json.load(file)
-
+    contractions_data = json.loads(path.read_text(encoding="utf-8"))
     validate_file_contains_dict(contractions_data, filepath)
     add_custom_dict(contractions_data)
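
The `pathlib` version of the loader can be exercised in isolation with a sketch like the one below. `load_json_dict` and `demo` are hypothetical names, and the `isinstance` check is only a stand-in for `validate_file_contains_dict`, whose body and exception type are not shown in this diff.

```python
import json
import tempfile
from pathlib import Path


def load_json_dict(filepath: str) -> dict[str, str]:
    path = Path(filepath)
    if not path.exists():
        raise FileNotFoundError(f"File not found at: {filepath}")

    # read_text() opens, decodes, and closes the file in one call.
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        # Assumed behaviour: the real validator may raise a different exception.
        raise ValueError(f"Expected a JSON object in {filepath}")
    return data


def demo() -> dict[str, str]:
    # Write a small JSON file to a temporary directory and load it back.
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "custom.json"
        target.write_text('{"gonna": "going to"}', encoding="utf-8")
        return load_json_dict(str(target))
```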

6 changes: 4 additions & 2 deletions contractions/file_loader.py
@@ -1,6 +1,8 @@
 import json
 import pkgutil

+from typing import cast
+
 from .validation import validate_data_type


@@ -12,7 +14,7 @@ def load_dict_data(filename: str) -> dict[str, str]:
     data = json.loads(json_bytes.decode("utf-8"))
     validate_data_type(data, dict, filename)

-    return data
+    return cast(dict[str, str], data)


 def load_list_data(filename: str) -> list[str]:
@@ -23,5 +25,5 @@ def load_list_data(filename: str) -> list[str]:
     data = json.loads(json_bytes.decode("utf-8"))
     validate_data_type(data, list, filename)

-    return data
+    return cast(list[str], data)
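
A short sketch of what `typing.cast` does and does not do in a loader like this. `parse_str_dict` is a hypothetical helper; the `isinstance` check stands in for `validate_data_type`, whose implementation is not part of this diff.

```python
import json
from typing import cast


def parse_str_dict(raw: str) -> dict[str, str]:
    data = json.loads(raw)  # json.loads returns Any as far as the type checker knows
    if not isinstance(data, dict):
        raise TypeError("expected a JSON object")
    # cast() only informs the type checker; at runtime it returns `data` unchanged.
    return cast(dict[str, str], data)


# cast() performs no validation: a non-string value passes through untouched.
unchecked = cast(dict[str, str], {"a": 1})
```

Because `cast` is a no-op at runtime, the value types still depend on the bundled JSON data being well formed.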

23 changes: 17 additions & 6 deletions contractions/matchers.py → contractions/matcher.py
@@ -10,7 +10,7 @@
 _CASE_INSENSITIVE = "insensitive"


-def _load_dicts():
+def _load_dicts() -> None:
     if _State.contractions_dict is not None:
         return

@@ -24,16 +24,19 @@ def _create_matcher(mode: str, *dicts: dict[str, str]) -> TextSearch:
     return matcher


-def _get_basic_matcher():
+def _get_basic_matcher() -> TextSearch:
     if _State.basic_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
         _State.basic_matcher = _create_matcher(_MODE_NORM, _State.contractions_dict)
     return _State.basic_matcher


-def _get_leftovers_matcher():
+def _get_leftovers_matcher() -> TextSearch:
     if _State.leftovers_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
         _State.leftovers_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -42,9 +45,11 @@ def _get_leftovers_matcher():
     return _State.leftovers_matcher


-def _get_slang_matcher():
+def _get_slang_matcher() -> TextSearch:
     if _State.slang_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.slang_dict is not None
         _State.slang_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -53,9 +58,12 @@ def _get_slang_matcher():
     return _State.slang_matcher


-def _get_leftovers_slang_matcher():
+def _get_leftovers_slang_matcher() -> TextSearch:
     if _State.leftovers_slang_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
+        assert _State.slang_dict is not None
         _State.leftovers_slang_matcher = _create_matcher(
             _MODE_NORM,
             _State.contractions_dict,
@@ -65,9 +73,12 @@ def _get_leftovers_slang_matcher():
     return _State.leftovers_slang_matcher


-def _get_preview_matcher():
+def _get_preview_matcher() -> TextSearch:
     if _State.preview_matcher is None:
         _load_dicts()
+        assert _State.contractions_dict is not None
+        assert _State.leftovers_dict is not None
+        assert _State.slang_dict is not None
         all_keys = list(chain(
             _State.contractions_dict.keys(),
             _State.leftovers_dict.keys(),
15 changes: 7 additions & 8 deletions contractions/processor.py
@@ -1,4 +1,4 @@
-from .matchers import (
+from .matcher import (
     _get_basic_matcher,
     _get_leftovers_matcher,
     _get_leftovers_slang_matcher,
@@ -8,13 +8,6 @@
 from .validation import validate_int_param, validate_string_param


-def _extract_viewing_window(text: str, match_start: int, match_end: int, context_chars: int) -> str:
-    text_length = len(text)
-    window_start = max(0, match_start - context_chars)
-    window_end = min(text_length, match_end + context_chars)
-    return text[window_start:window_end]
-
-
 def expand(text: str, leftovers: bool = True, slang: bool = True) -> str:
     validate_string_param(text, "text")

@@ -29,6 +22,12 @@ def expand(text: str, leftovers: bool = True, slang: bool = True) -> str:

     return _get_basic_matcher().replace(text)

+def _extract_viewing_window(text: str, match_start: int, match_end: int, context_chars: int) -> str:
+    text_length = len(text)
+    window_start = max(0, match_start - context_chars)
+    window_end = min(text_length, match_end + context_chars)
+
+    return text[window_start:window_end]

 def preview(text: str, context_chars: int) -> list[dict[str, str | int]]:
     validate_int_param(context_chars, "context_chars")
@@ -35,7 +35,7 @@ def _get_combinations(tokens: list[str], joiners: list[str]) -> list[str]:
     return ["".join(combination) for combination in product(*interspersed_options)]


-def _intersperse(items: list, separator) -> list:
+def _intersperse(items: list, separator: list[str]) -> list:
     num_items = len(items)
     num_separators = num_items - 1
     total_slots = num_items + num_separators
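
Only the first three lines of `_intersperse` are visible in this hunk, so the sketch below is an assumption about how the slot arithmetic could be completed, not the file's actual body. It also guesses how `_get_combinations` builds `interspersed_options`; both function bodies here are hypothetical reconstructions.

```python
from itertools import product


def intersperse(items: list, separator: list[str]) -> list:
    num_items = len(items)
    num_separators = num_items - 1
    total_slots = num_items + num_separators
    # Assumed completion: even slots hold the items, odd slots hold the separator options.
    return [items[slot // 2] if slot % 2 == 0 else separator for slot in range(total_slots)]


def get_combinations(tokens: list[str], joiners: list[str]) -> list[str]:
    # Each token becomes a one-element option list; joiners fill the gaps between tokens.
    interspersed_options = intersperse([[token] for token in tokens], joiners)
    return ["".join(combination) for combination in product(*interspersed_options)]
```

Under these assumptions, `get_combinations(["I", "d"], ["'", "`"])` yields one string per joiner choice, which matches the apostrophe-variant purpose suggested by the neighbouring imports in `bootstrap.py`.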