Add title validation with dcterms:title support #20
- Add validate_title() method in SupportingTextValidator for exact title matching after normalization (case, whitespace, punctuation, Greek letters)
- Add _find_title_fields() in plugin to discover slots implementing dcterms:title or having slot_uri dcterms:title
- Add _validate_title() in plugin to yield validation results
- Update _validate_instance() to validate titles from title fields, either combined with excerpt or standalone
- Add comprehensive test suite (16 tests) for title validation

Title validation uses exact matching (not substring like excerpt) but allows for minor orthographic variation including:
- Case insensitivity
- Whitespace normalization
- Punctuation differences
- Greek letter equivalence (α ↔ alpha)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Claude finished @cmungall's task

PR Review: Title Validation Implementation

Todo List

Code Quality & Best Practices ✅

Overall, this is a well-structured implementation with good separation of concerns:

Strengths:

Minor Suggestions:

Potential Bugs & Issues
Pull request overview
This pull request adds title validation functionality to the LinkML reference validator, enabling exact matching of reference titles after normalization. The feature supports schema-driven discovery of title fields through dcterms:title interface implementation or slot_uri specification.
Key changes:
- Adds `validate_title()` method for standalone title validation with exact matching after normalization (case, whitespace, punctuation, Greek letters)
- Implements `_find_title_fields()` to discover title slots via `dcterms:title` interface or `slot_uri`
- Integrates title validation into the plugin's validation flow, supporting both combined excerpt+title validation and standalone title-only validation
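
The normalization rules listed above (case, whitespace, punctuation, Greek letters) could be implemented along the following lines. This is a minimal sketch, not the PR's code: `normalize_title`, `titles_match`, and the `GREEK_TO_LATIN` table are hypothetical names, the table covers only a few letters, and the regex assumes ASCII titles once Greek letters are spelled out.

```python
import re

# Hypothetical mapping; the real implementation may cover more letters.
GREEK_TO_LATIN = {"α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta", "κ": "kappa"}


def normalize_title(title: str) -> str:
    """Lowercase, spell out Greek letters, collapse punctuation and whitespace."""
    text = title.lower()
    for greek, name in GREEK_TO_LATIN.items():
        text = text.replace(greek, name)
    # Any run of non-alphanumeric characters (punctuation, whitespace) becomes one space.
    text = re.sub(r"[^a-z0-9]+", " ", text)
    return text.strip()


def titles_match(expected: str, actual: str) -> bool:
    """Exact comparison after normalization; a substring hit is not a match."""
    return normalize_title(expected) == normalize_title(actual)
```

Under these assumptions, `titles_match("α-catenin", "alpha-catenin")` holds, while a partial title such as `"JAK1"` does not match `"JAK1 inhibitors in disease"`, which is the difference from substring-based excerpt matching.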
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/test_title_validation.py | Comprehensive test suite with 16 tests covering title validation scenarios including normalization, exact matching, field discovery, and plugin integration |
| src/linkml_reference_validator/validation/supporting_text_validator.py | Adds validate_title() method implementing exact title matching with normalization support for case, whitespace, punctuation, and Greek letters |
| src/linkml_reference_validator/plugins/reference_validation_plugin.py | Adds _find_title_fields() for schema discovery and _validate_title() for integration; updates validation flow to support title fields alongside excerpt validation |

    for title_field in title_fields:
        title_value = instance.get(title_field)
        if not title_value:
            continue

        for ref_field in reference_fields:
            ref_value = instance.get(ref_field)
            if ref_value:
                reference_id = self._extract_reference_id(ref_value)
                if reference_id:
                    yield from self._validate_title(
                        title_value,
                        reference_id,
                        f"{path}.{title_field}" if path else title_field,
                    )

Copilot
AI
Jan 8, 2026
Similar to the excerpt validation loop above, if an instance has multiple title fields or multiple reference fields with values, this could result in redundant title validations. Consider validating only the first available title against the first available reference to avoid unnecessary duplicate checks.
Suggested change:

    - for title_field in title_fields:
    -     title_value = instance.get(title_field)
    -     if not title_value:
    -         continue
    -     for ref_field in reference_fields:
    -         ref_value = instance.get(ref_field)
    -         if ref_value:
    -             reference_id = self._extract_reference_id(ref_value)
    -             if reference_id:
    -                 yield from self._validate_title(
    -                     title_value,
    -                     reference_id,
    -                     f"{path}.{title_field}" if path else title_field,
    -                 )
    + # Validate only the first available title against the first available reference
    + first_title_field: Optional[str] = None
    + first_title_value: Optional[str] = None
    + for title_field in title_fields:
    +     title_value = instance.get(title_field)
    +     if title_value:
    +         first_title_field = title_field
    +         first_title_value = title_value
    +         break
    + if first_title_field and first_title_value:
    +     for ref_field in reference_fields:
    +         ref_value = instance.get(ref_field)
    +         if ref_value:
    +             reference_id = self._extract_reference_id(ref_value)
    +             if reference_id:
    +                 yield from self._validate_title(
    +                     first_title_value,
    +                     reference_id,
    +                     f"{path}.{first_title_field}"
    +                     if path
    +                     else first_title_field,
    +                 )
    +                 break


    if not reference:
        return ValidationResult(
            is_valid=False,
            reference_id=reference_id,
            supporting_text="",
            severity=ValidationSeverity.ERROR,
            message=f"Could not fetch reference: {reference_id}",
            path=path,
        )

    if not reference.title:
        return ValidationResult(
            is_valid=False,
            reference_id=reference_id,
            supporting_text="",
            severity=ValidationSeverity.ERROR,
            message=f"Reference {reference_id} has no title to validate against",
            path=path,
        )

Copilot
AI
Jan 8, 2026
Missing test coverage for the case where reference fetching fails (returns None) in validate_title(). Consider adding a test similar to test_validate_title_only_no_reference_title but for when the fetcher returns None entirely.
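
A test for the missing case might look roughly like the sketch below. Because the real `SupportingTextValidator` constructor and `validate_title()` signature are not shown in this thread, the example uses a hypothetical `StubValidator` and `StubResult` that mirror only the two guard clauses from the diff; the test name, the `fetch` method name, and the `PMID:0000000` identifier are likewise assumptions.

```python
from dataclasses import dataclass
from unittest.mock import Mock


@dataclass
class StubResult:
    """Hypothetical stand-in for ValidationResult, reduced to two fields."""
    is_valid: bool
    message: str


class StubValidator:
    """Hypothetical stand-in that copies the guard clauses shown in the diff."""

    def __init__(self, fetcher):
        self.fetcher = fetcher

    def validate_title(self, reference_id: str, expected_title: str) -> StubResult:
        reference = self.fetcher.fetch(reference_id)
        if not reference:
            return StubResult(False, f"Could not fetch reference: {reference_id}")
        if not reference.title:
            return StubResult(
                False, f"Reference {reference_id} has no title to validate against"
            )
        return StubResult(reference.title == expected_title, "")


def test_validate_title_reference_not_found():
    # The fetcher returns None entirely, as opposed to a reference without a title.
    fetcher = Mock()
    fetcher.fetch.return_value = None

    result = StubValidator(fetcher).validate_title("PMID:0000000", "Some title")

    assert not result.is_valid
    assert "Could not fetch reference" in result.message
    fetcher.fetch.assert_called_once_with("PMID:0000000")
```

In the real suite the stub would be replaced by the actual validator with its fetcher mocked, following whatever pattern `test_validate_title_only_no_reference_title` already uses.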

    """LinkML validation plugin for supporting text validation.

    This plugin integrates with the LinkML validation framework to validate
    that supporting text quotes actually appear in their referenced publications.

    The plugin discovers reference and excerpt fields using LinkML's interface
    mechanism. It looks for:
    - Slots implementing linkml:authoritative_reference
    - Slots implementing linkml:excerpt

    def pre_process(self, context: ValidationContext) -> None:
        """Pre-process hook called before validation.

        Args:
            context: Validation context from LinkML

Copilot
AI
Jan 8, 2026
The class docstring should be updated to mention title validation functionality. It currently only mentions "supporting text validation" and lists excerpt fields, but doesn't document the new dcterms:title field discovery and validation capability that this PR adds.

    # Check implements for dcterms:title
    if slot.implements:
        for interface in slot.implements:
            if "dcterms:title" in interface or "title" in interface.lower():

Copilot
AI
Jan 8, 2026
The condition "title" in interface.lower() on line 334 is overly broad and could match unintended interfaces. For example, it would match "subtitle", "entitle", "retitled", etc. The check should be more specific, such as checking for exact match with "dcterms:title" or using word boundaries. Consider using a more precise match like interface.lower() == "title" or re.search(r'\btitle\b', interface.lower()).
Suggested change:

    - if "dcterms:title" in interface or "title" in interface.lower():
    + interface_lower = interface.lower()
    + if interface_lower == "dcterms:title" or interface_lower == "title":

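
To make the difference between the three matching strategies concrete, the sketch below runs each one over a few sample interface names. The function names and the `candidates` list are illustrative only; the conditions themselves are copied from the original line, the suggested change, and the regex mentioned in the comment.

```python
import re

# Illustrative interface names, not taken from any real schema.
candidates = ["dcterms:title", "title", "subtitle", "entitle", "DCTERMS:Title"]


def substring_match(interface: str) -> bool:
    # Original condition from the diff: any interface containing "title" passes.
    return "dcterms:title" in interface or "title" in interface.lower()


def exact_match(interface: str) -> bool:
    # Suggested replacement: case-insensitive exact comparison.
    lowered = interface.lower()
    return lowered == "dcterms:title" or lowered == "title"


def word_boundary_match(interface: str) -> bool:
    # Alternative from the review comment: "title" as a whole word.
    return re.search(r"\btitle\b", interface.lower()) is not None


for name in candidates:
    print(name, substring_match(name), exact_match(name), word_boundary_match(name))
```

The substring check accepts every sample, including `subtitle` and `entitle`. The exact check and the word-boundary check both reject those two, but they are not equivalent: the word-boundary regex would also accept a name such as `schema:title`, which the exact comparison rejects.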

    for excerpt_field in excerpt_fields:
        excerpt_value = instance.get(excerpt_field)
        if not excerpt_value:
            continue

        for ref_field in reference_fields:
            ref_value = instance.get(ref_field)
            if ref_value:
                reference_id = self._extract_reference_id(ref_value)
                # Get title from title field or from reference dict
                expected_title = None
                for title_field in title_fields:
                    title_value = instance.get(title_field)
                    if title_value:
                        expected_title = title_value
                        break
                if not expected_title:
                    expected_title = self._extract_title(ref_value)
                if reference_id:
                    validated_with_excerpt = True
                    yield from self._validate_excerpt(
                        excerpt_value,
                        reference_id,
                        expected_title,
                        f"{path}.{excerpt_field}" if path else excerpt_field,
                    )

Copilot
AI
Jan 8, 2026
The nested loop structure could cause duplicate validations if an instance has multiple reference fields with values. For example, if both "reference" and "reference_id" are present and have values, the same excerpt would be validated against both references. Consider breaking after the first successful reference field match, or validating only against the first available reference field to avoid redundant validations.
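
One way to validate against only the first available reference is to resolve that reference once, before looping over excerpts. The sketch below assumes instances are plain dicts; `first_present` and `plan_validations` are hypothetical helpers, and the field names in the usage example are illustrative. Unlike the loop in the diff, it does not skip a reference whose ID cannot be extracted, so that step would still need to be layered on.

```python
from typing import Iterable, List, Mapping, Optional, Tuple


def first_present(
    instance: Mapping[str, object], fields: Iterable[str]
) -> Optional[Tuple[str, object]]:
    """Return the first (field, value) pair whose value is truthy, else None."""
    for field in fields:
        value = instance.get(field)
        if value:
            return field, value
    return None


def plan_validations(
    instance: Mapping[str, object],
    excerpt_fields: Iterable[str],
    reference_fields: Iterable[str],
) -> List[Tuple[str, object, object]]:
    """Pair each non-empty excerpt with the single first available reference."""
    ref = first_present(instance, reference_fields)
    if ref is None:
        return []
    _, ref_value = ref
    return [
        (field, instance[field], ref_value)
        for field in excerpt_fields
        if instance.get(field)
    ]


# Both reference fields are populated, yet only one validation is planned.
instance = {"reference": "PMID:1", "reference_id": "PMID:1", "supporting_text": "quote"}
plan = plan_validations(instance, ["supporting_text"], ["reference", "reference_id"])
print(plan)
```

With the nested loops from the diff, the same instance would trigger two validations of the same excerpt, one per populated reference field.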
@copilot open a new pull request to apply changes based on the comments in this thread
- Add new how-to guide: docs/how-to/validate-titles.md
- Add title validation section to docs/concepts/how-it-works.md
- Update index.md and quickstart.md to mention title validation
- Add validate-titles.md to mkdocs.yml navigation

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Summary

- Add `validate_title()` method in `SupportingTextValidator` for exact title matching after normalization
- Add `_find_title_fields()` in plugin to discover slots implementing `dcterms:title` or having `slot_uri` `dcterms:title`
- Add `_validate_title()` in plugin to yield validation results for title mismatches
- Update `_validate_instance()` to validate titles from dedicated title fields, either combined with excerpt validation or standalone

Title Validation Behavior

Title validation uses exact matching (not substring like excerpt) but allows for minor orthographic variation:

- Case insensitivity (`JAK1` matches `jak1`)
- Punctuation differences (`Cell-Signaling` matches `Cell Signaling`)
- Greek letter equivalence (`α-catenin` matches `alpha-catenin`)

Schema Usage

Slots can be marked for title validation using:

Or via `slot_uri`:

Test plan

- `tests/test_title_validation.py`
- `just test`

🤖 Generated with Claude Code