Merged
34 changes: 34 additions & 0 deletions docs/concepts/how-it-works.md
@@ -46,6 +46,33 @@ Parts: ["MUC1 oncoprotein", "nuclear targeting"]
# Both parts must exist in the reference
```

### 4. Title Validation

In addition to excerpt/quote validation, the validator can verify reference titles using **exact matching** rather than substring matching. A slot is treated as a title, and validated, when either of the following holds:

- A slot implements `dcterms:title` or has `slot_uri: dcterms:title`
- A slot is named `title` (fallback)

**Example:**
```yaml
reference_title: "MUC1 oncoprotein blocks nuclear targeting of c-Abl"
```

Title matching uses the same normalization as excerpts (case, whitespace, punctuation, Greek letters) but requires the **entire title to match**, not just a substring.

```python
# These match after normalization:
expected = "Role of JAK1 in Cell-Signaling"
actual = "Role of JAK1 in Cell Signaling"
# Both normalize to: "role of jak1 in cell signaling"

# These do NOT match (partial title):
expected = "Role of JAK1" # Missing "in Cell Signaling"
actual = "Role of JAK1 in Cell Signaling"
```

See [Validating Reference Titles](../how-to/validate-titles.md) for detailed usage.

## Why Deterministic Matching?

### Not Fuzzy Matching
@@ -171,10 +198,17 @@ classes:
        slot_uri: linkml:excerpt # Marks as quoted text
      reference:
        slot_uri: linkml:authoritative_reference # Marks as reference ID
      reference_title:
        slot_uri: dcterms:title # Marks as reference title (optional)
```

When LinkML validates data, it calls our plugin for fields marked with these URIs.

The plugin discovers fields via:
- `implements` attribute (e.g., `implements: [dcterms:title]`)
- `slot_uri` attribute (e.g., `slot_uri: dcterms:title`)
- Fallback slot names (`reference`, `supporting_text`, `title`)
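
The discovery rules above can be sketched in a few lines of Python. This is an illustrative sketch only, not the plugin's actual code: the function name `find_title_slot`, the plain-dict representation of slots, and the precedence order (`implements`, then `slot_uri`, then the fallback name) are all assumptions made for the example.

```python
from typing import Dict, Optional

TITLE_URI = "dcterms:title"
FALLBACK_TITLE_NAMES = {"title"}  # fallback slot name listed in the docs


def find_title_slot(slots: Dict[str, Optional[dict]]) -> Optional[str]:
    """Return the name of the slot that holds the reference title, if any.

    `slots` maps slot name -> dict of slot attributes (hypothetical shape,
    mirroring the `attributes:` section of a schema loaded from YAML).
    """
    # 1. Explicit `implements: [dcterms:title]`
    for name, attrs in slots.items():
        if TITLE_URI in (attrs or {}).get("implements", []):
            return name
    # 2. `slot_uri: dcterms:title`
    for name, attrs in slots.items():
        if (attrs or {}).get("slot_uri") == TITLE_URI:
            return name
    # 3. Fallback: a slot literally named `title`
    for name in slots:
        if name in FALLBACK_TITLE_NAMES:
            return name
    return None
```

Checking the explicit markers before the fallback name means a schema that declares `reference_title` with `dcterms:title` is not shadowed by an unrelated slot that happens to be called `title`.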

## Editorial Conventions

### Square Brackets `[...]`
146 changes: 146 additions & 0 deletions docs/how-to/validate-titles.md
@@ -0,0 +1,146 @@
# Validating Reference Titles

This guide explains how to validate that reference titles in your data match the actual titles from the source publications.

## Overview

Title validation ensures that when you cite a reference with a title, that title matches the publication's actual title. Unlike excerpt validation (which uses substring matching), title validation uses **exact matching after normalization**.

## When to Use Title Validation

Title validation is useful when:

- Your data includes reference titles that should match the source
- You want to catch typos or outdated titles
- You need to verify metadata accuracy in curated datasets

## Schema Setup

Mark title fields in your LinkML schema using `dcterms:title`:

### Using `implements`

```yaml
id: https://example.org/my-schema
name: my-schema

prefixes:
  linkml: https://w3id.org/linkml/
  dcterms: http://purl.org/dc/terms/

classes:
  Evidence:
    attributes:
      reference:
        implements:
          - linkml:authoritative_reference
      reference_title:
        implements:
          - dcterms:title
      supporting_text:
        implements:
          - linkml:excerpt
```

### Using `slot_uri`

```yaml
classes:
  Evidence:
    attributes:
      reference:
        slot_uri: linkml:authoritative_reference
      title:
        slot_uri: dcterms:title
      supporting_text:
        slot_uri: linkml:excerpt
```

## Example Data

**data.yaml:**
```yaml
- reference: PMID:16888623
  reference_title: "MUC1 oncoprotein blocks nuclear targeting of c-Abl"
  supporting_text: "MUC1 oncoprotein blocks nuclear targeting"
```

**Validate:**
```bash
linkml-reference-validator validate data \
  data.yaml \
  --schema schema.yaml \
  --target-class Evidence
```

## What Gets Normalized

Title matching allows for minor orthographic variations:

| Variation | Example |
|-----------|---------|
| **Case** | `"JAK1 Protein"` matches `"jak1 protein"` |
| **Whitespace** | `"Cell  Signaling"` (two spaces) matches `"Cell Signaling"` |
| **Punctuation** | `"T-Cell Receptor"` matches `"T Cell Receptor"` |
| **Greek letters** | `"α-catenin"` matches `"alpha-catenin"` |
| **Trailing periods** | `"Study Title."` matches `"Study Title"` |
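
The table above can be approximated with a small normalization function. The following is a minimal sketch, assuming a simplified rule set (lowercase, spell out a few Greek letters, turn punctuation into spaces, collapse whitespace); `normalize_title`, `titles_match`, and the `GREEK` table are hypothetical names for illustration, and the validator's real normalization may differ in detail.

```python
import re

# Small illustrative subset of Greek letters; extend as needed.
GREEK = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "κ": "kappa"}


def normalize_title(title: str) -> str:
    """Lowercase, spell out Greek letters, drop punctuation, collapse spaces."""
    text = title.lower()
    for letter, name in GREEK.items():
        text = text.replace(letter, name)
    # Replace anything that is not a word character or whitespace with a space,
    # so "T-Cell" and "T Cell" end up identical and trailing periods vanish.
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())


def titles_match(expected: str, actual: str) -> bool:
    """Exact comparison of the two normalized titles (no substring matching)."""
    return normalize_title(expected) == normalize_title(actual)
```

Under these assumptions, `titles_match("T-Cell Receptor", "T Cell Receptor")` is true, while `titles_match("Role of JAK1", "Role of JAK1 in Cell Signaling")` is false because the whole normalized title must be equal.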

## Title-Only Validation

You can validate titles without excerpts. If your data has reference and title fields but no excerpt field, the validator checks the title alone:

```yaml
classes:
  Reference:
    attributes:
      id:
        implements:
          - linkml:authoritative_reference
      title:
        implements:
          - dcterms:title
```

```yaml
- id: PMID:16888623
  title: "MUC1 oncoprotein blocks nuclear targeting of c-Abl"
```

## Combined Validation

When both title and excerpt fields are present, each is validated independently:

1. The excerpt is checked for substring match in the reference content
2. The title is checked for exact match (after normalization) against the reference title

If either fails, validation fails with a specific error message.
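
The two checks can be sketched together as follows. This is a simplified illustration, not the validator's implementation: the `Reference` tuple, `validate_evidence`, and the excerpt error wording are hypothetical, the normalization is reduced to case, punctuation, and whitespace, and editorial conventions such as `[...]` and `...` are ignored. The two title messages follow the formats shown under "Error Messages".

```python
import re
from typing import List, NamedTuple, Optional


class Reference(NamedTuple):
    """Hypothetical container for a fetched reference."""
    reference_id: str
    title: Optional[str]
    content: str


def _normalize(text: str) -> str:
    # Simplified normalization: lowercase, punctuation to spaces, collapse spaces.
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def validate_evidence(
    ref: Reference,
    supporting_text: Optional[str] = None,
    title: Optional[str] = None,
) -> List[str]:
    """Return a list of error messages; an empty list means validation passed."""
    errors = []
    # 1. Excerpt: substring match against the reference content.
    if supporting_text is not None:
        if _normalize(supporting_text) not in _normalize(ref.content):
            # Hypothetical wording; the source does not show this message.
            errors.append(f"Excerpt not found in {ref.reference_id}")
    # 2. Title: exact match after normalization.
    if title is not None:
        if not ref.title:
            errors.append(
                f"Reference {ref.reference_id} has no title to validate against"
            )
        elif _normalize(title) != _normalize(ref.title):
            errors.append(
                f"Title mismatch for {ref.reference_id}: "
                f"expected '{title}' but got '{ref.title}'"
            )
    return errors
```

Collecting errors in a list rather than stopping at the first failure lets a single run report both an excerpt problem and a title problem for the same record.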

## Error Messages

### Title Mismatch

```
Title mismatch for PMID:16888623: expected 'Wrong Title' but got 'MUC1 oncoprotein blocks nuclear targeting of c-Abl'
```

### Reference Has No Title

```
Reference PMID:99999999 has no title to validate against
```

## Differences from Excerpt Validation

| Aspect | Title Validation | Excerpt Validation |
|--------|------------------|-------------------|
| **Matching** | Exact (after normalization) | Substring |
| **Partial matches** | Not allowed | Allowed with `...` |
| **Editorial notes** | Not supported | `[brackets]` removed |
| **Use case** | Metadata accuracy | Quote verification |

## Best Practices

1. **Use exact titles**: Copy the title exactly from the source
2. **Don't abbreviate**: The entire title must match; a partial title fails validation
3. **Check special characters**: Greek letters, subscripts, etc.
4. **Verify after fetching**: The cached reference shows the actual title
1 change: 1 addition & 0 deletions docs/index.md
@@ -9,6 +9,7 @@ linkml-reference-validator ensures that text excerpts in your data accurately ma
- **Deterministic validation** - No fuzzy matching or AI hallucinations
- **Multiple reference sources** - PubMed, DOIs, local files, and URLs
- **Editorial convention support** - Handles `[clarifications]` and `...` ellipsis
- **Title validation** - Verify reference titles with `dcterms:title`
- **Multiple interfaces** - CLI for quick checks, Python API for integration
- **LinkML integration** - Validates data files with `linkml:excerpt` annotations
- **Smart caching** - Stores references locally to avoid repeated API calls
2 changes: 2 additions & 0 deletions docs/quickstart.md
@@ -111,6 +111,7 @@ linkml-reference-validator validate text \
- **Automatic Caching**: References cached locally after first fetch
- **Editorial Notes**: Use `[...]` for clarifications: `"MUC1 [mucin 1] oncoprotein"`
- **Ellipsis**: Use `...` for omitted text: `"MUC1 ... nuclear targeting"`
- **Title Validation**: Verify reference titles with `dcterms:title`
- **Deterministic Matching**: Substring-based (not AI/fuzzy matching)
- **PubMed & PMC**: Fetches from NCBI automatically
- **DOI Support**: Fetches metadata from Crossref API
@@ -121,5 +122,6 @@ linkml-reference-validator validate text \

- **[Tutorial 1: Getting Started](notebooks/01_getting_started.ipynb)** - CLI basics with real examples
- **[Tutorial 2: Advanced Usage](notebooks/02_advanced_usage.ipynb)** - Data validation with LinkML schemas
- **[Validating Reference Titles](how-to/validate-titles.md)** - Verify titles with `dcterms:title`
- **[Concepts](concepts/how-it-works.md)** - Understanding the validation process
- **[CLI Reference](reference/cli.md)** - Complete command documentation
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -35,6 +35,7 @@ nav:
- Validating Entrez Accessions: how-to/validate-entrez.md
- Validating DOIs: how-to/validate-dois.md
- Validating URLs: how-to/validate-urls.md
- Validating Reference Titles: how-to/validate-titles.md
- Using Local Files and URLs: how-to/use-local-files-and-urls.md
- Adding a New Reference Source: how-to/add-reference-source.md
- Concepts: