
Research: Advanced negation detection improvements #83

@berntpopp


Context

Following the implementation of the ConText algorithm for multilingual negation detection (PR #82, fixes #79), this issue tracks potential enhancements for more sophisticated clinical negation handling.

Current State: ConText-based direction-aware negation detection with TERMINATE/PSEUDO handling (F1 ~85-90% typical).

Proposed Improvements

1. Additional ConText Categories (Medium Priority)

What: Implement remaining ConText assertion categories beyond NEGATED_EXISTENCE

  • POSSIBLE_EXISTENCE - "possible", "might have"
  • HISTORICAL - "history of", "previous"
  • HYPOTHETICAL - "if patient develops", "risk of"
  • FAMILY - "mother has", "family history of"

Why: Distinguishes uncertainty/temporality from simple negation
Effort: Low (rules exist in ConText standard)
When: When users report misclassification of uncertain/historical findings
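
A minimal sketch of how these categories could be assigned from trigger phrases. The `Assertion` enum, the `TRIGGERS` table, and `classify` are hypothetical names for illustration, not the project's API, and the phrase lists are only the examples above plus a few assumed negation cues. It looks at triggers that precede the finding and lets the longest trigger win, so "family history of" is not also counted as "history of".

```python
from __future__ import annotations

import re
from enum import Enum


class Assertion(Enum):
    AFFIRMED = "affirmed"
    NEGATED_EXISTENCE = "negated_existence"
    POSSIBLE_EXISTENCE = "possible_existence"
    HISTORICAL = "historical"
    HYPOTHETICAL = "hypothetical"
    FAMILY = "family"


# Illustrative trigger phrases only (assumption); a real ConText lexicon is much larger.
TRIGGERS = {
    Assertion.NEGATED_EXISTENCE: ["no", "denies", "without"],
    Assertion.POSSIBLE_EXISTENCE: ["possible", "might have"],
    Assertion.HISTORICAL: ["history of", "previous"],
    Assertion.HYPOTHETICAL: ["if patient develops", "risk of"],
    Assertion.FAMILY: ["mother has", "family history of"],
}


def classify(sentence: str, finding: str) -> set[Assertion]:
    """Return the categories whose trigger appears before the finding."""
    text = sentence.lower()
    pos = text.find(finding.lower())
    if pos < 0:
        return set()
    prefix = text[:pos]

    # Collect every trigger match as (start, end, category).
    matches = []
    for category, phrases in TRIGGERS.items():
        for phrase in phrases:
            for m in re.finditer(rf"\b{re.escape(phrase)}\b", prefix):
                matches.append((m.start(), m.end(), category))

    # Longest match first; skip any match overlapping an already accepted span.
    matches.sort(key=lambda t: t[1] - t[0], reverse=True)
    taken: list[tuple[int, int]] = []
    found: set[Assertion] = set()
    for start, end, category in matches:
        if all(end <= s or start >= e for s, e in taken):
            taken.append((start, end))
            found.add(category)
    return found or {Assertion.AFFIRMED}
```

For example, `classify("Family history of seizures", "seizures")` yields only `Assertion.FAMILY`, while `classify("History of seizures", "seizures")` yields `Assertion.HISTORICAL`. Backward-scope triggers and TERMINATE handling are deliberately left out of this sketch.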

2. Cross-Sentence Scope Detection (Low Priority)

What: Extend negation scope beyond sentence boundaries
Example: "No neurological findings. Reflexes are normal." (current behavior: each sentence is scoped separately)
Why: Clinical text often uses implicit continuation
Effort: Medium (requires discourse analysis)
When: If users report cross-sentence negation errors
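
One possible shape for this, as a rough sketch only: carry a negation into the following sentence when that sentence opens with a continuation cue. The cue list, the trigger list, and the function names are all assumptions for illustration. Real discourse analysis would need much more than this heuristic.

```python
import re

# Assumed, illustrative lexicons; not the project's rule set.
NEGATION_TRIGGERS = ("no", "denies", "without")
CONTINUATION_CUE = re.compile(r"^(nor|or|neither)\b")


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def negated_sentences(text: str, cross_sentence: bool = False) -> list:
    """Return (sentence, is_negated) pairs.

    With cross_sentence=False each sentence is judged on its own.
    With cross_sentence=True a negation carries over to the next sentence
    if that sentence starts with a continuation cue.
    """
    results = []
    previous_negated = False
    for sentence in split_sentences(text):
        lower = sentence.lower()
        own = any(re.search(rf"\b{t}\b", lower) for t in NEGATION_TRIGGERS)
        carried = (
            cross_sentence
            and previous_negated
            and CONTINUATION_CUE.match(lower) is not None
        )
        negated = own or carried
        results.append((sentence, negated))
        previous_negated = negated
    return results
```

With the text "No fever. Nor any rash. Reflexes are normal.", the sentence-bounded mode flags only the first sentence, while `cross_sentence=True` also flags "Nor any rash." and still leaves "Reflexes are normal." unaffected.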

3. Nested Negation Handling (Research)

What: Support complex nested structures such as "not ruled out", where the double negative cancels the negation, so the finding remains possible rather than negated
Why: Occurs in specialist clinical language
Effort: Medium-High
When: After measuring prevalence in real data
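
A small sketch of one way to handle this with rules alone: check multi-word double negatives before the plain negation triggers they contain. The lexicons and `negation_status` are hypothetical. The status assigned to a double negative ("possible" here) is an assumption of this sketch and can be changed in the table.

```python
import re

# Assumed, illustrative lexicons. Double negatives map to "possible" in this sketch;
# change the value if a different status is wanted.
DOUBLE_NEGATIVES = {
    "not ruled out": "possible",
    "cannot be ruled out": "possible",
    "cannot be excluded": "possible",
}
NEGATIONS = {
    "ruled out": "negated",
    "excluded": "negated",
    "not": "negated",
    "no": "negated",
}


def negation_status(phrase: str) -> str:
    """Classify a phrase as 'possible', 'negated', or 'affirmed'."""
    text = phrase.lower()
    # Double negatives are checked first so "not ruled out" is never read as "ruled out".
    for lexicon in (DOUBLE_NEGATIVES, NEGATIONS):
        # Within a lexicon, prefer longer triggers.
        for trigger, status in sorted(lexicon.items(), key=lambda kv: -len(kv[0])):
            if re.search(rf"\b{re.escape(trigger)}\b", text):
                return status
    return "affirmed"
```

Here `negation_status("Epilepsy not ruled out")` returns `"possible"`, whereas `negation_status("Epilepsy ruled out")` returns `"negated"`. This is close to how PSEUDO triggers already shield a phrase from being treated as a negation, so it may fit the existing rule format.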

4. ML-Enhanced Scope Boundaries (Future Research)

What: Hybrid approach using NegBERT/BioBERT for scope detection
Reference: NegBERT (Khandelwal & Sawant, 2020) - reported scope-resolution F1 of roughly 92% (evaluated on the BioScope, Sherlock, and SFU Review corpora)
Why: State-of-the-art performance for complex syntax
Effort: High (requires ML infrastructure, training corpus, inference pipeline)
When: Only if rule-based approach shows systematic failures
YAGNI: Rule-based ConText sufficient for current use cases

Decision Criteria

Implement when:
✅ Multiple user reports of specific negation error pattern
✅ Error impacts clinical accuracy (false positives/negatives)
✅ Benefit outweighs computational/maintenance cost

Defer if:
❌ No user complaints about current implementation
❌ Edge case with <1% occurrence rate
❌ Would add significant complexity

References

Related Issues
