GroundTruth

Hallucination detection for LLM outputs. Evaluate whether generated text remains grounded in source context using three complementary methods — overlap, entailment, and consistency — aggregated into a single calibrated score.

Features

Three detection methods — n-gram recall (overlap), semantic key-term coverage (entailment), and Jaccard similarity across candidates (consistency).
Weighted aggregation — combine detectors with custom weights; weights are auto-normalized when detectors are added or removed.
Confidence scoring — every result includes a confidence value (distance from decision boundary) alongside the hallucination score.
Pluggable NLI backend — swap in any HuggingFace or custom NLI model via the NLIProvider protocol; heuristic fallback requires zero dependencies.
Batch processing — detect_batch handles multiple claim/context pairs efficiently in a single call.
Zero required dependencies — core runs on the standard library; torch and transformers are optional extras.

Quick Start

pip install -e .

from groundtruth import GroundTruthDetector, OverlapDetector, EntailmentDetector, ConsistencyDetector

detector = GroundTruthDetector(
    detectors=[
        (OverlapDetector(), 1.0),
        (EntailmentDetector(), 1.0),
        (ConsistencyDetector(), 0.5),
    ]
)

context = "The Eiffel Tower is located in Paris, France, and was completed in 1889."
claim   = "The Eiffel Tower is in London."

result = detector.detect(claim, context)
print(result.score)           # e.g. 0.83 (closer to 1.0 = likely hallucination)
print(result.is_hallucination) # True
print(result.confidence)      # e.g. 0.66
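
The sample values above (score 0.83, confidence 0.66) are consistent with one simple reading of "distance from decision boundary": the absolute distance between the score and a 0.5 threshold, rescaled to [0, 1]. The sketch below illustrates that reading only. The helper name confidence_from_score and the 0.5 threshold are assumptions, not part of the GroundTruth API, and the library's actual formula may differ.

```python
def confidence_from_score(score: float, threshold: float = 0.5) -> float:
    """Hypothetical helper: distance from the decision boundary, scaled to [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be in [0, 1]")
    # Largest possible distance from the threshold on either side of it.
    max_distance = max(threshold, 1.0 - threshold)
    return abs(score - threshold) / max_distance


print(round(confidence_from_score(0.83), 2))  # 0.66
print(confidence_from_score(0.5))             # 0.0 (right on the boundary)
```

Under this reading, a score right at the threshold carries no confidence either way, while scores near 0.0 or 1.0 carry the most.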

Batch detection

claims   = ["Paris is the capital of France.", "The tower was built in 1950."]
contexts = [context, context]

results = detector.detect_batch(claims, contexts)
for r in results:
    print(r.score, r.is_hallucination)

Optional NLI backend

pip install -e ".[transformers]"

from groundtruth import EntailmentDetector
from transformers import pipeline

nli = pipeline("zero-shot-classification", model="cross-encoder/nli-deberta-v3-small")

class HFProvider:
    def predict(self, premise: str, hypothesis: str) -> float:
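        # hypothesis_template="{}" passes the hypothesis through verbatim instead of
        # wrapping it in the pipeline's default "This example is {}." template.
        out = nli(premise, candidate_labels=[hypothesis], hypothesis_template="{}")
        return 1.0 - out["scores"][0]  # invert: entailed → low score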

detector = EntailmentDetector(nli_provider=HFProvider())
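
For tests or offline use, a provider does not have to wrap a model at all. The sketch below assumes the NLIProvider protocol requires only the predict(premise, hypothesis) -> float method shown above, returning a low value when the hypothesis is supported. NLIProviderSketch and KeywordProvider are hypothetical names for illustration and are not exported by GroundTruth.

```python
import re
from typing import Protocol, runtime_checkable


@runtime_checkable
class NLIProviderSketch(Protocol):
    """Assumed shape of the protocol: a single predict method."""

    def predict(self, premise: str, hypothesis: str) -> float: ...


class KeywordProvider:
    """Toy provider: share of hypothesis words that never appear in the premise."""

    def predict(self, premise: str, hypothesis: str) -> float:
        premise_words = set(re.findall(r"[a-z0-9]+", premise.lower()))
        hypothesis_words = re.findall(r"[a-z0-9]+", hypothesis.lower())
        if not hypothesis_words:
            return 0.0  # nothing asserted, so nothing unsupported
        missing = sum(1 for word in hypothesis_words if word not in premise_words)
        return missing / len(hypothesis_words)


provider = KeywordProvider()
print(isinstance(provider, NLIProviderSketch))  # True (structural check only)
```

Because the check is structural, any object with a matching predict method satisfies the sketch without inheriting from it.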

Architecture

src/groundtruth/
├── __init__.py        # Public API exports
├── base.py            # Abstract BaseDetector (detect → DetectionResult)
├── models.py          # DetectionResult, AggregatedResult dataclasses
├── aggregator.py      # GroundTruthDetector — orchestrates detectors, batching, weight normalization
└── detectors/
    ├── overlap.py     # N-gram recall: computes unigram/bigram coverage of claim against context
    ├── entailment.py  # Key-term heuristic or pluggable NLI backend via NLIProvider protocol
    └── consistency.py # Jaccard similarity across candidate responses; falls back to context
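
To make the n-gram recall and Jaccard measures named in the tree above concrete, here is a minimal standard-library sketch. The function names, the tokenizer, and the empty-input conventions are assumptions for illustration, not the code in overlap.py or consistency.py. Both functions return similarity values, so a detector following the score semantics of this project would presumably report something like 1 minus the value.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; punctuation is dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(toks: list[str], n: int) -> set[tuple[str, ...]]:
    """Set of distinct n-grams in a token list."""
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)}


def ngram_recall(claim: str, context: str, n: int = 1) -> float:
    """Fraction of the claim's distinct n-grams that also occur in the context."""
    claim_ngrams = ngrams(tokens(claim), n)
    if not claim_ngrams:
        return 1.0  # an empty claim has nothing left uncovered
    return len(claim_ngrams & ngrams(tokens(context), n)) / len(claim_ngrams)


def jaccard(a: str, b: str) -> float:
    """Token-set intersection over union of two texts."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


context = "The Eiffel Tower is located in Paris, France, and was completed in 1889."
claim = "The Eiffel Tower is in London."
print(round(ngram_recall(claim, context, n=1), 3))  # 0.833 (5 of 6 unigrams covered)
print(round(ngram_recall(claim, context, n=2), 3))  # 0.6 (3 of 5 bigrams covered)
```

The example also shows why overlap alone is a weak signal: the London claim still covers most unigrams of the context, which is one motivation for combining several detectors.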

Score semantics: 0.0 = fully grounded, 1.0 = likely hallucination. Each detector normalizes its own score to that range independently; the aggregator computes a weighted average of the detector scores and re-normalizes the weights on every add_detector call.
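
The weighted average with weight normalization can be sketched in a few lines. This is an illustration of the arithmetic only: normalize_weights and aggregate are hypothetical helper names, and the per-detector scores are made-up inputs, not output from the library.

```python
def normalize_weights(weights: list[float]) -> list[float]:
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    if total <= 0 or any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative with a positive sum")
    return [w / total for w in weights]


def aggregate(scores: list[float], weights: list[float]) -> float:
    """Weighted average of detector scores using normalized weights."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    return sum(s * w for s, w in zip(scores, normalize_weights(weights)))


# Weights from the Quick Start (1.0, 1.0, 0.5) normalize to 0.4, 0.4, 0.2.
print([round(w, 2) for w in normalize_weights([1.0, 1.0, 0.5])])  # [0.4, 0.4, 0.2]
# Made-up detector scores, combined: 0.9*0.4 + 0.8*0.4 + 0.7*0.2
print(round(aggregate([0.9, 0.8, 0.7], [1.0, 1.0, 0.5]), 2))       # 0.82
```

Normalizing at aggregation time means only the ratios between weights matter, so (1.0, 1.0, 0.5) and (2, 2, 1) produce the same result.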

Extending: subclass BaseDetector, implement detect(claim, context, **kwargs) -> DetectionResult, and pass the instance to GroundTruthDetector.
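
As a rough illustration of that extension point, the sketch below defines a detector that flags numbers in the claim that never appear in the context. To stay self-contained it declares stand-in BaseDetector and DetectionResult classes. Their fields (score, is_hallucination, confidence) are guessed from the Quick Start output and may not match models.py, and NumberMismatchDetector with its 0.5 threshold is hypothetical. In real use you would import the base classes from groundtruth instead.

```python
import re
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class DetectionResult:
    """Stand-in for the library's result type; fields are assumptions."""
    score: float
    is_hallucination: bool
    confidence: float


class BaseDetector(ABC):
    """Stand-in for the library's abstract base class."""

    @abstractmethod
    def detect(self, claim: str, context: str, **kwargs) -> DetectionResult: ...


class NumberMismatchDetector(BaseDetector):
    """Hypothetical detector: share of numbers in the claim missing from the context."""

    def detect(self, claim: str, context: str, **kwargs) -> DetectionResult:
        claim_numbers = set(re.findall(r"\d+", claim))
        if claim_numbers:
            unsupported = claim_numbers - set(re.findall(r"\d+", context))
            score = len(unsupported) / len(claim_numbers)
        else:
            score = 0.0  # no numbers asserted, nothing to contradict
        return DetectionResult(
            score=score,
            is_hallucination=score >= 0.5,      # assumed threshold
            confidence=abs(score - 0.5) * 2.0,  # assumed confidence formula
        )


context = "The Eiffel Tower is located in Paris, France, and was completed in 1889."
result = NumberMismatchDetector().detect("The tower was built in 1950.", context)
print(result.score, result.is_hallucination)  # 1.0 True
```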

Development

pip install -e ".[dev]"
pytest -v          # run all tests
ruff check .       # lint

Tests live in tests/ and cover core behavior, edge cases, weight normalization, batch processing, and the NLIProvider protocol.

Contributing

See CONTRIBUTING.md for the fork → feature branch → test → lint → PR workflow.

License

MIT — see LICENSE.

Built by TechKnowMad Labs
