Description
You are adding first-class community support to py3plex so users can query, aggregate, explain, compare, and track communities as a native DSL target, on par with nodes and edges.
Hard constraints
❌ Do NOT create any new markdown files.
✅ Keep AGENTS.md up to date by editing it in place (add sections + examples; don’t let it drift).
Preserve backward compatibility (legacy DSL and existing APIs must keep working).
Reuse the existing DSL v2 AST → executor → QueryResult architecture (do not create a parallel implementation).
All new public behavior must have tests.
- Define the “Community” Data Model (core)
Todos
Introduce a canonical CommunityID concept:
stable within an execution result (int or str)
support hierarchical IDs later (optional, but don’t block it)
Define a CommunityRecord schema (internal dataclass/TypedDict):
community_id
layer_scope (single layer / multi-layer / aggregated)
members (list of (node_id, layer) tuples)
size
intra_edges, inter_edges (counts)
density_intra, cut_size or boundary_edges
modularity_contribution (if available from algorithm)
metadata dict for algorithm-specific fields (resolution gamma, seed, etc.)
Store community assignments in a normalized place:
Prefer: network.assign_partition(partition_vector, *, name="louvain", meta=...)
Ensure existing assign_partition remains compatible
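The schema above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the py3plex implementation: `build_records` is a hypothetical helper, edges are treated as undirected and unweighted, `inter_edges` is counted once per community it touches, and `density_intra` uses the simple-graph count n(n-1)/2 of possible pairs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable, Iterable, List, Mapping, Optional, Tuple, Union

Node = Tuple[Hashable, str]        # (node_id, layer)
CommunityID = Union[int, str]      # stable within one execution result


@dataclass
class CommunityRecord:
    community_id: CommunityID
    layer_scope: str = "single"    # "single", "multi", or "aggregated"
    members: List[Node] = field(default_factory=list)
    intra_edges: int = 0
    inter_edges: int = 0           # edges with exactly one endpoint inside (cut size)
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def size(self) -> int:
        return len(self.members)

    @property
    def density_intra(self) -> float:
        # Assumption: undirected simple graph, so n*(n-1)/2 possible internal edges.
        possible = self.size * (self.size - 1) / 2
        return self.intra_edges / possible if possible else 0.0


def build_records(
    partition: Mapping[Node, CommunityID],
    edges: Iterable[Tuple[Node, Node]],
    meta: Optional[Mapping[str, Any]] = None,
) -> Dict[CommunityID, CommunityRecord]:
    """Hypothetical helper: turn a node -> community mapping into records."""
    records: Dict[CommunityID, CommunityRecord] = {}
    for node, cid in partition.items():
        if cid not in records:
            records[cid] = CommunityRecord(cid, metadata=dict(meta or {}))
        records[cid].members.append(node)
    for u, v in edges:
        cu, cv = partition[u], partition[v]
        if cu == cv:
            records[cu].intra_edges += 1
        else:
            records[cu].inter_edges += 1
            records[cv].inter_edges += 1
    for rec in records.values():
        layers = {layer for _node_id, layer in rec.members}
        rec.layer_scope = "single" if len(layers) == 1 else "multi"
    return records


# Toy data: community 0 lives in one layer, community 1 spans two layers.
A, B, C = ("A", "social"), ("B", "social"), ("C", "social")
D, E = ("D", "social"), ("E", "work")
PARTITION = {A: 0, B: 0, C: 0, D: 1, E: 1}
EDGES = [(A, B), (B, C), (A, C), (C, D), (D, E)]
records = build_records(PARTITION, EDGES, meta={"algorithm": "louvain"})
```

Keeping `size` and `density_intra` as derived properties avoids storing values that can drift from `members` and `intra_edges`; a real implementation may prefer to cache them.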
- DSL v2: Add Q.communities() as a First-Class Target
API Surface
Implement Q.communities() returning a CommunityQueryBuilder.
Support the same ergonomic chain features as nodes/edges:
.where(...)
.compute(...)
.group_by(...)
.per_layer() (community-per-layer grouping)
.limit(), .order_by()
.execute(net)
Filtering (where)
Support community-level predicates:
size__gt, size__between, density_intra__lt, cut_size__gt
layer__eq or from_layers(L["social"] + L["work"]) semantics for community scope
membership predicates:
.where(has_member="Alice") (node id)
.where(has_member=("Alice","social")) (node+layer tuple)
.where(member_type__eq="gene") (if node attrs exist)
stability predicates (if UQ / multi-run partition exists):
stability__gt
consensus__gt (if you compute co-assignment frequency)
Computation (compute)
Add community-level metrics:
size
intra_edges, inter_edges
conductance / normalized_cut
density_intra
modularity_contribution (if algorithm provides; else compute approx)
hub_nodes_top_k (top nodes by centrality within community)
enrichment hooks (optional stub): integrate with attribute tables later
Grouping
.per_layer() on communities:
groups communities by dominant layer scope or layer-specific partitions
.group_by("layer"), .group_by("algorithm"):
for comparing community properties across algorithms/seeds
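To make the `field__op` predicate convention concrete, here is a minimal standalone sketch of how keyword predicates such as `size__gt` or `has_member` might be compiled into a filter function. `compile_where` and the dict-based records are assumptions made for illustration; the real builder would compile into the existing DSL v2 AST rather than into closures.

```python
import operator
from typing import Any, Callable, Dict, List, Mapping

# Operator suffixes recognised after the double underscore (illustrative subset).
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "eq": operator.eq,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "between": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def _has_member(members, wanted) -> bool:
    # A tuple means an exact (node_id, layer) match; anything else matches the node id alone.
    if isinstance(wanted, tuple):
        return wanted in members
    return any(node_id == wanted for node_id, _layer in members)


def compile_where(**conditions: Any) -> Callable[[Mapping[str, Any]], bool]:
    """Hypothetical helper: build one predicate from field__op=value keywords."""
    checks: List[Callable[[Mapping[str, Any]], bool]] = []
    for key, expected in conditions.items():
        if key == "has_member":
            checks.append(lambda rec, w=expected: _has_member(rec["members"], w))
            continue
        field_name, _, op_name = key.partition("__")
        op = _OPS.get(op_name or "eq")
        if op is None:
            raise ValueError(f"unsupported operator in predicate {key!r}")
        checks.append(lambda rec, f=field_name, o=op, e=expected: o(rec[f], e))
    return lambda rec: all(check(rec) for check in checks)


communities = [
    {"community_id": 0, "size": 3, "density_intra": 1.0, "cut_size": 1,
     "members": [("Alice", "social"), ("Bob", "social"), ("Carol", "social")]},
    {"community_id": 1, "size": 2, "density_intra": 1.0, "cut_size": 1,
     "members": [("Dave", "work"), ("Alice", "work")]},
    {"community_id": 2, "size": 4, "density_intra": 0.5, "cut_size": 3,
     "members": [("Erin", "work"), ("Frank", "work"), ("Gina", "work"), ("Hal", "work")]},
]


def select(**conditions: Any) -> List[int]:
    keep = compile_where(**conditions)
    return [c["community_id"] for c in communities if keep(c)]
```

For example, `select(size__gt=2)` keeps communities 0 and 2, while `select(has_member=("Alice", "social"))` keeps only community 0.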
- DSL v1 (Legacy String DSL) Compatibility
Todos
Extend parser to allow:
SELECT communities WHERE ... COMPUTE ...
Ensure legacy queries compile to the same AST as DSL v2.
Provide meaningful errors:
DSLSyntaxError for unsupported tokens
domain exceptions for missing partitions, missing algorithms, etc.
- AST & Executor: Implement Community Target End-to-End
AST
Add a new target type: Target.COMMUNITIES
Add nodes for community predicates and aggregations (reuse existing predicate/agg nodes where possible)
Ensure community operations can be combined with:
layer selection
grouping
aggregation
ordering / limiting
Executor
Add an execution pathway:
1. resolve available partition(s) on network (default, named, or supplied)
2. build CommunityRecord objects
3. apply filters
4. compute requested community metrics
5. apply grouping/aggregation/order/limit
Make it robust to:
networks with no partition assigned (error with actionable message)
multiplex vs multilayer semantics (community across replicas vs per-layer partition)
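The partition-resolution step and its actionable error could look something like the sketch below. `MissingPartitionError`, `resolve_partition`, and `run_community_query` are hypothetical names, and the partition store is modelled as a plain dict of named partitions rather than real network state.

```python
from collections import Counter
from typing import Any, Dict, Hashable, List, Mapping, Optional

Partition = Mapping[Hashable, Hashable]   # node -> community id


class MissingPartitionError(LookupError):
    """Hypothetical domain exception for queries that need a partition."""


def resolve_partition(
    partitions: Mapping[str, Partition],
    name: Optional[str] = None,
    supplied: Optional[Partition] = None,
) -> Partition:
    """Pick a partition: explicitly supplied, then named, then the default."""
    if supplied is not None:
        return supplied
    if not partitions:
        raise MissingPartitionError(
            "No partition is assigned to this network; run community detection "
            "or call assign_partition(...) before querying communities."
        )
    if name is None:
        if "default" in partitions:
            return partitions["default"]
        if len(partitions) == 1:
            return next(iter(partitions.values()))
        raise MissingPartitionError(
            f"Several partitions are stored ({sorted(partitions)}); pass name=... to pick one."
        )
    if name not in partitions:
        raise MissingPartitionError(
            f"No partition named {name!r}; available: {sorted(partitions)}."
        )
    return partitions[name]


def run_community_query(
    partition: Partition, min_size: int = 1, limit: Optional[int] = None
) -> List[Dict[str, Any]]:
    """Build rows, filter by size, order by size descending, then limit."""
    sizes = Counter(partition.values())
    rows = [{"community_id": cid, "size": n} for cid, n in sizes.items() if n >= min_size]
    rows.sort(key=lambda row: (-row["size"], str(row["community_id"])))
    return rows if limit is None else rows[:limit]


stored = {
    "louvain": {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2},
    "leiden": {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 1},
}
rows = run_community_query(resolve_partition(stored, name="louvain"), min_size=2)
```

Raising one domain-specific exception type for all "cannot resolve" cases keeps the error tests simple; the message carries the detail the user needs to act on.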
- QueryResult: Community-Friendly Results
Todos
Extend QueryResult to support a new payload:
.communities list of CommunityRecord (or IDs + computed fields)
Implement:
to_pandas() for communities (one row per community)
expand_explanations=True support if you later add .explain()
group_summary() for community groupings
Make sure QueryResult remains backward compatible for nodes/edges.
- Bridges Between Targets (Ergonomics = Adoption)
Must-have convenience methods
Q.communities().members() → returns a NodeQueryBuilder scoped to selected communities
supports: .compute(...), .where(...) etc on member nodes
Q.communities().boundary_edges() → returns an EdgeQueryBuilder representing cut edges
Q.nodes().community() or .with_community():
annotate node results with community_id (like a join)
Q.edges().within_community() and Q.edges().between_communities():
common slicing pattern
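The set logic behind these bridges can be sketched without any DSL machinery. The functions below are illustrative stand-ins, not py3plex APIs, and they assume one particular reading of "boundary edges": an edge whose endpoints lie in different communities, at least one of which is selected.

```python
from typing import Hashable, Iterable, List, Mapping, Set, Tuple

Edge = Tuple[Hashable, Hashable]
Partition = Mapping[Hashable, Hashable]   # node -> community id


def members(partition: Partition, selected: Iterable[Hashable]) -> List[Hashable]:
    """Nodes belonging to any of the selected communities (cf. .members())."""
    chosen: Set[Hashable] = set(selected)
    return [node for node, cid in partition.items() if cid in chosen]


def split_edges(edges: Iterable[Edge], partition: Partition) -> Tuple[List[Edge], List[Edge]]:
    """Split edges into (within-community, between-community) lists."""
    within: List[Edge] = []
    between: List[Edge] = []
    for u, v in edges:
        (within if partition[u] == partition[v] else between).append((u, v))
    return within, between


def boundary_edges(
    edges: Iterable[Edge], partition: Partition, selected: Iterable[Hashable]
) -> List[Edge]:
    """Cut edges touching a selected community (assumed definition, see lead-in)."""
    chosen: Set[Hashable] = set(selected)
    return [
        (u, v)
        for u, v in edges
        if partition[u] != partition[v]
        and (partition[u] in chosen or partition[v] in chosen)
    ]


partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
within, between = split_edges(edges, partition)
```

A useful parity check for the test suite follows directly: the union of `boundary_edges` over all communities should equal the `between` list.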
- Pipelines Integration (Communities as Pipeline Artifacts)
Todos
Ensure pipeline community detection steps store:
partition vector
algorithm metadata
optional quality scores (modularity, log-likelihood for SBM, etc.)
Add a pipeline step (or extend existing) to produce a CommunityStats artifact compatible with Q.communities().
Ensure pipeline outputs can be queried directly (or via network metadata).
- Uncertainty & Stability (Optional MVP+, but design now)
MVP approach
Support multiple partitions stored on network:
assign_partition(..., name=f"louvain_seed_{s}")
Implement Q.communities().uq(...) semantics for:
partition stability across seeds/perturbations
consensus communities (basic co-assignment frequency)
Expose simple stability metrics:
community member Jaccard vs consensus
average co-assignment score
(If too big, ship hooks + metadata first and implement full stability later, but keep the API stable.)
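The two stability metrics named above are simple to state in code. The sketch below assumes every run covers the same node set and that community labels are not comparable across runs, which is why it works on co-membership rather than on label values. All function names are illustrative.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, List, Mapping, Tuple

Partition = Mapping[Hashable, Hashable]   # node -> community id


def co_assignment(partitions: List[Partition]) -> Dict[Tuple[Hashable, Hashable], float]:
    """Fraction of runs in which each node pair lands in the same community."""
    nodes = sorted(partitions[0])
    counts = {pair: 0 for pair in combinations(nodes, 2)}
    for part in partitions:
        for u, v in counts:
            if part[u] == part[v]:
                counts[(u, v)] += 1
    return {pair: count / len(partitions) for pair, count in counts.items()}


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard similarity between two member sets."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


def mean_co_assignment(
    community: Iterable[Hashable], scores: Mapping[Tuple[Hashable, Hashable], float]
) -> float:
    """Average co-assignment score over all member pairs of one community."""
    pairs = list(combinations(sorted(community), 2))
    return sum(scores[pair] for pair in pairs) / len(pairs) if pairs else 1.0


# Three runs, e.g. different seeds; labels in run 3 are permuted on purpose.
runs = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 0, "b": 0, "c": 0, "d": 1},
    {"a": 1, "b": 1, "c": 0, "d": 0},
]
scores = co_assignment(runs)
```

Here the pair `("a", "b")` is co-assigned in every run, while `("c", "d")` is co-assigned in two of three. The all-pairs table is quadratic in the number of nodes, so a production version would likely need sampling or a sparse representation.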
- Testing Requirements
Todos
Unit tests:
Q.communities() basic execution with an assigned partition
filters on size/density/cut
.members() returns correct nodes
.boundary_edges() returns correct cut edges
.to_pandas() shape and columns
Parity tests:
node query annotated with communities matches community members
Legacy DSL tests:
SELECT communities ... matches DSL v2 results
Error tests:
no partition assigned → raises domain-specific exception with clear message
- Documentation: Update AGENTS.md In Place (No New Markdown)
Required edits
Add a new section: “Community Queries (First-Class Communities)”
Include at least 3 examples:
1. basic community selection + compute
2. members() bridge to nodes
3. boundary_edges() bridge to edges
Update any “current limitations” language if it references community gaps.
- Examples (No new .md; code examples ok if repo has examples/)
Todos
Add or update a Python example under existing examples/ structure (if allowed):
demonstrate end-to-end: detect communities → Q.communities() → members → boundary edges → aggregation
Ensure examples don’t require optional deps unless guarded.
Definition of Done
Q.communities() works end-to-end with filters, compute, ordering, and to_pandas().
Bridges to nodes/edges work (members(), boundary_edges(), node annotation).
Legacy DSL supports SELECT communities ....
Tests cover core behaviors and errors.
AGENTS.md updated in place; no new markdown files created.