Skip to content

Conversation

@bmerkle
Copy link
Contributor

@bmerkle bmerkle commented Jan 5, 2026

first implementation of missing merge-concrete-entities function
please have a look if the semantic and implementation is ok.
if yes, we could also add some further test cases.

@gvanrossum
Copy link
Collaborator

I'm too tired to review this evening but at least all the tests pass.

Copy link
Collaborator

@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ought to be consistent with the TS version at
https://github.com/microsoft/TypeAgent/blob/120e280c7b2acfeb499b5bbc5d78cea23f75b531/ts/packages/knowPro/src/knowledgeMerge.ts#L144C8-L144C38

I'm experimenting with copilot review -- hopefully it can do the check for us.

@gvanrossum gvanrossum marked this pull request as draft January 6, 2026 22:13
@gvanrossum

This comment was marked as duplicate.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements the previously unimplemented merge_concrete_entities function, which merges a list of concrete entities by name, combining their types and facets. The implementation includes comprehensive test coverage with 9 test cases covering various merging scenarios.

  • Replaces NotImplementedError stub with working implementation using a dictionary-based merging strategy
  • Adds helper classes and functions (_MergedEntity, _facets_to_merged, _merged_to_facets) to support the merging logic
  • Implements comprehensive test suite covering edge cases including empty lists, single entities, distinct entities, case sensitivity, type deduplication, and facet merging

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
src/typeagent/knowpro/knowledge.py Implements merge_concrete_entities function with helper classes and functions for merging entities by name, combining types (sorted and deduplicated) and facets (concatenated with "; " separator)
tests/test_knowledge.py Adds 9 comprehensive test cases covering various merge scenarios including empty inputs, single entities, distinct entities, case sensitivity, type/facet merging, and deduplication

@gvanrossum
Copy link
Collaborator

gvanrossum commented Jan 6, 2026

@copilot Please review this function carefully. It should be equivalent in semantics to the TypeScript function mergeConcreteEntities defined here:
https://github.com/microsoft/TypeAgent/blob/120e280c7b2acfeb499b5bbc5d78cea23f75b531/ts/packages/knowPro/src/knowledgeMerge.ts#L144C8-L144C38

@gvanrossum
Copy link
Collaborator

@bmerkle Feel free to ignore or implement Copilot's suggestions (or let it implement some itself). I am still waiting for it to compare the Python version to the TypeScript version.

bmerkle and others added 2 commits January 7, 2026 00:58
refactored based on @copilot suggestion

Co-authored-by: Copilot <[email protected]>
- adopted implementation and tests to be caseinsensitive (as in the TS reference app)
Comment on lines +131 to +137
@dataclass
class _MergedEntity:
"""Internal helper for merging entities."""

name: str
types: set[str]
facets: dict[str, set[str]]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move this before merge_concrete_entities.

My general ideal ordering is: functions top-down, but classes ahead of functions (or at least ahead of the first function that references it).

Comment on lines 148 to 150
def _add_facet_to_merged(
merged: dict[str, set[str]], facet: kplib.Facet
) -> None:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My version of black collapses this to one line.

Comment on lines 75 to 77
def merge_concrete_entities(
entities: list[kplib.ConcreteEntity],
) -> list[kplib.ConcreteEntity]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could make the lower-casing optional, controlled by an optional argument. That argument could be a Callable[[str], str] and have a default of str.lower. (That should work -- or otherwise lambda x: x.lower().)

if existing is None:
# First occurrence - create new merged entity
merged[name_key] = _MergedEntity(
name=entity.name.lower(),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
name=entity.name.lower(),
name=name_key,

Complex types like Quantity and Quantifier use their __str__ representation.
"""
return str(value).lower() if value else ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would return "" if the value is 0 or 0.0.

Suggested change
return str(value).lower() if value else ""
return str(value).lower() if value is not None else ""

def _merged_to_facets(merged_facets: dict[str, set[str]]) -> list[kplib.Facet]:
"""Convert a merged facets dict back to a list of Facets."""
facets = []
for name, values in merged_facets.items():
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also sort the names?

Suggested change
for name, values in merged_facets.items():
for name, values in sorted(merged_facets.items()):

Or maybe

Suggested change
for name, values in merged_facets.items():
for name in sorted(merged_facets):
values = merged_value[name]

Comment on lines +104 to +115
if existing is None:
# First occurrence - create new merged entity
merged[name_key] = _MergedEntity(
name=entity.name.lower(),
types=set(t.lower() for t in entity.type),
facets=_facets_to_merged(entity.facets) if entity.facets else {},
)
else:
# Merge into existing
existing.types.update(t.lower() for t in entity.type)
if entity.facets:
_merge_facets(existing.facets, entity.facets)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider creating only an "empty" _MergedEntity and taking the else flow always. Then you could drop _facets_to_merged().

Suggested change
if existing is None:
# First occurrence - create new merged entity
merged[name_key] = _MergedEntity(
name=entity.name.lower(),
types=set(t.lower() for t in entity.type),
facets=_facets_to_merged(entity.facets) if entity.facets else {},
)
else:
# Merge into existing
existing.types.update(t.lower() for t in entity.type)
if entity.facets:
_merge_facets(existing.facets, entity.facets)
if existing is None:
# First occurrence - create new merged entity
merged[name_key] = _MergedEntity(
name=name_key,
types=set(),
facets={},
)
# Merge into existing
existing.types.update(t.lower() for t in entity.type)
if entity.facets:
_merge_facets(existing.facets, entity.facets)

facets: dict[str, set[str]]


def _facet_value_to_string(value: kplib.Value | None) -> str:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd get rid of this function and just inline it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants