-
Notifications
You must be signed in to change notification settings - Fork 41
Implement merge-concrete-entities function and add some tests #153
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I'm too tired to review this evening but at least all the tests pass. |
gvanrossum
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ought to be consistent with the TS version at
https://github.com/microsoft/TypeAgent/blob/120e280c7b2acfeb499b5bbc5d78cea23f75b531/ts/packages/knowPro/src/knowledgeMerge.ts#L144C8-L144C38
I'm experimenting with copilot review -- hopefully it can do the check for us.
This comment was marked as duplicate.
This comment was marked as duplicate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements the previously unimplemented merge_concrete_entities function, which merges a list of concrete entities by name, combining their types and facets. The implementation includes comprehensive test coverage with 9 test cases covering various merging scenarios.
- Replaces
NotImplementedErrorstub with working implementation using a dictionary-based merging strategy - Adds helper classes and functions (
_MergedEntity,_facets_to_merged,_merged_to_facets) to support the merging logic - Implements comprehensive test suite covering edge cases including empty lists, single entities, distinct entities, case sensitivity, type deduplication, and facet merging
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/typeagent/knowpro/knowledge.py | Implements merge_concrete_entities function with helper classes and functions for merging entities by name, combining types (sorted and deduplicated) and facets (concatenated with "; " separator) |
| tests/test_knowledge.py | Adds 9 comprehensive test cases covering various merge scenarios including empty inputs, single entities, distinct entities, case sensitivity, type/facet merging, and deduplication |
|
@copilot Please review this function carefully. It should be equivalent in semantics to the TypeScript function mergeConcreteEntities defined here: |
|
@bmerkle Feel free to ignore or implement Copilot's suggestions (or let it implement some itself). I am still waiting for it to compare the Python version to the TypeScript version. |
refactored based on @copilot suggestion Co-authored-by: Copilot <[email protected]>
- adopted implementation and tests to be caseinsensitive (as in the TS reference app)
| @dataclass | ||
| class _MergedEntity: | ||
| """Internal helper for merging entities.""" | ||
|
|
||
| name: str | ||
| types: set[str] | ||
| facets: dict[str, set[str]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd move this before merge_concrete_entities.
My general ideal ordering is: functions top-down, but classes ahead of functions (or at least ahead of the first function that references it).
src/typeagent/knowpro/knowledge.py
Outdated
| def _add_facet_to_merged( | ||
| merged: dict[str, set[str]], facet: kplib.Facet | ||
| ) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My version of black collapses this to one line.
| def merge_concrete_entities( | ||
| entities: list[kplib.ConcreteEntity], | ||
| ) -> list[kplib.ConcreteEntity]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could make the lower-casing optional, controlled by an optional argument. That argument could be a Callable[[str], str] and have a default of str.lower. (That should work -- or otherwise lambda x: x.lower().)
| if existing is None: | ||
| # First occurrence - create new merged entity | ||
| merged[name_key] = _MergedEntity( | ||
| name=entity.name.lower(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| name=entity.name.lower(), | |
| name=name_key, |
| Complex types like Quantity and Quantifier use their __str__ representation. | ||
| """ | ||
| return str(value).lower() if value else "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would return "" if the value is 0 or 0.0.
| return str(value).lower() if value else "" | |
| return str(value).lower() if value is not None else "" |
| def _merged_to_facets(merged_facets: dict[str, set[str]]) -> list[kplib.Facet]: | ||
| """Convert a merged facets dict back to a list of Facets.""" | ||
| facets = [] | ||
| for name, values in merged_facets.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe also sort the names?
| for name, values in merged_facets.items(): | |
| for name, values in sorted(merged_facets.items()): |
Or maybe
| for name, values in merged_facets.items(): | |
| for name in sorted(merged_facets): | |
| values = merged_value[name] |
| if existing is None: | ||
| # First occurrence - create new merged entity | ||
| merged[name_key] = _MergedEntity( | ||
| name=entity.name.lower(), | ||
| types=set(t.lower() for t in entity.type), | ||
| facets=_facets_to_merged(entity.facets) if entity.facets else {}, | ||
| ) | ||
| else: | ||
| # Merge into existing | ||
| existing.types.update(t.lower() for t in entity.type) | ||
| if entity.facets: | ||
| _merge_facets(existing.facets, entity.facets) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider creating only an "empty" _MergedEntity and taking the else flow always. Then you could drop _facets_to_merged().
| if existing is None: | |
| # First occurrence - create new merged entity | |
| merged[name_key] = _MergedEntity( | |
| name=entity.name.lower(), | |
| types=set(t.lower() for t in entity.type), | |
| facets=_facets_to_merged(entity.facets) if entity.facets else {}, | |
| ) | |
| else: | |
| # Merge into existing | |
| existing.types.update(t.lower() for t in entity.type) | |
| if entity.facets: | |
| _merge_facets(existing.facets, entity.facets) | |
| if existing is None: | |
| # First occurrence - create new merged entity | |
| merged[name_key] = _MergedEntity( | |
| name=name_key, | |
| types=set(), | |
| facets={}, | |
| ) | |
| # Merge into existing | |
| existing.types.update(t.lower() for t in entity.type) | |
| if entity.facets: | |
| _merge_facets(existing.facets, entity.facets) |
| facets: dict[str, set[str]] | ||
|
|
||
|
|
||
| def _facet_value_to_string(value: kplib.Value | None) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd get rid of this function and just inline it.
first implementation of missing merge-concrete-entities function
please have a look if the semantic and implementation is ok.
if yes, we could also add some further test cases.