
dsl com det #983

@SkBlaz

You are adding first-class community support to py3plex so that users can query, aggregate, explain, compare, and track communities as a native DSL target, on par with nodes and edges.

Hard constraints

❌ Do NOT create any new markdown files.

✅ Keep AGENTS.md up to date by editing it in place (add sections + examples; don’t let it drift).

Preserve backward compatibility (legacy DSL and existing APIs must keep working).

Reuse the existing DSL v2 AST → executor → QueryResult architecture (do not create a parallel implementation).

All new public behavior must have tests.


  1. Define the “Community” Data Model (core)

Todos

Introduce a canonical CommunityID concept:

stable within an execution result (int or str)

support hierarchical IDs later (optional, but don’t block it)

Define a CommunityRecord schema (internal dataclass/TypedDict):

community_id

layer_scope (single layer / multi-layer / aggregated)

members (list of (node_id, layer) tuples)

size

intra_edges, inter_edges (counts)

density_intra, cut_size or boundary_edges

modularity_contribution (if available from algorithm)

metadata dict for algorithm-specific fields (resolution gamma, seed, etc.)

Store community assignments in a normalized place:

Prefer: network.assign_partition(partition_vector, *, name="louvain", meta=...)

Ensure existing assign_partition remains compatible
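As a minimal sketch of the schema listed above, the record could be a plain dataclass. Everything here is an assumption for illustration rather than the existing py3plex code: the `NodeLayer` alias, the string values used for `layer_scope`, and the choice to derive `size`, `cut_size`, and `density_intra` from stored fields (the density formula assumes an undirected simple graph).

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple, Union

# Hypothetical aliases; the issue only requires that IDs are int or str.
CommunityID = Union[int, str]
NodeLayer = Tuple[Any, str]  # (node_id, layer)


@dataclass
class CommunityRecord:
    community_id: CommunityID
    layer_scope: str  # assumed values: "single", "multi", "aggregated"
    members: List[NodeLayer]
    intra_edges: int = 0
    inter_edges: int = 0
    modularity_contribution: Optional[float] = None
    # Algorithm-specific fields such as resolution gamma or seed.
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def size(self) -> int:
        return len(self.members)

    @property
    def cut_size(self) -> int:
        # Here the cut size is simply the number of inter-community edges.
        return self.inter_edges

    @property
    def density_intra(self) -> float:
        # Fraction of possible undirected member pairs that are connected.
        possible = self.size * (self.size - 1) / 2
        return self.intra_edges / possible if possible else 0.0


rec = CommunityRecord(
    community_id=0,
    layer_scope="single",
    members=[("Alice", "social"), ("Bob", "social"), ("Carol", "social")],
    intra_edges=2,
    inter_edges=1,
    metadata={"algorithm": "louvain", "seed": 42},
)
```

Deriving `size` from `members` keeps the two from drifting apart; a string `community_id` such as `"0/3"` would leave room for hierarchical IDs later.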


  2. DSL v2: Add Q.communities() as a First-Class Target

API Surface

Implement Q.communities() returning a CommunityQueryBuilder.

Support the same ergonomic chain features as nodes/edges:

.where(...)

.compute(...)

.group_by(...)

.per_layer() (community-per-layer grouping)

.limit(), .order_by()

.execute(net)

Filtering (where)

Support community-level predicates:

size__gt, size__between, density_intra__lt, cut_size__gt

layer__eq or from_layers(L["social"] + L["work"]) semantics for community scope

membership predicates:

.where(has_member="Alice") (node id)

.where(has_member=("Alice","social")) (node+layer tuple)

.where(member_type__eq="gene") (if node attrs exist)

stability predicates (if UQ / multi-run partition exists):

stability__gt

consensus__gt (if you compute co-assignment frequency)
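The suffix-style predicates above (`size__gt`, `density_intra__lt`, and so on) can be compiled by splitting each keyword on the double underscore. The sketch below is standalone and assumes communities are exposed as plain mappings; `compile_predicate`, `matches`, and the operator table are hypothetical names, not existing py3plex functions.

```python
import operator
from typing import Any, Callable, Dict, Mapping

# Assumed operator suffixes; the existing DSL may support a different set.
_OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "between": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def compile_predicate(key: str, expected: Any) -> Callable[[Mapping[str, Any]], bool]:
    """Turn e.g. ("size__gt", 3) into a function over a community mapping."""
    field_name, sep, op_name = key.partition("__")
    if not sep:
        op_name = "eq"  # a bare field name means equality
    if op_name not in _OPS:
        raise ValueError(f"unsupported operator suffix: {op_name!r}")
    op = _OPS[op_name]

    def predicate(record: Mapping[str, Any]) -> bool:
        # A record that lacks the field never matches.
        return field_name in record and op(record[field_name], expected)

    return predicate


def matches(record: Mapping[str, Any], **conditions: Any) -> bool:
    """All conditions must hold (logical AND)."""
    return all(compile_predicate(k, v)(record) for k, v in conditions.items())


community = {"size": 5, "density_intra": 0.4, "cut_size": 2}
```

With this record, `matches(community, size__gt=3, density_intra__lt=0.5)` holds, while `matches(community, size__between=(1, 4))` does not.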

Computation (compute)

Add community-level metrics:

size

intra_edges, inter_edges

conductance / normalized_cut

density_intra

modularity_contribution (if algorithm provides; else compute approx)

hub_nodes_top_k (top nodes by centrality within community)

enrichment hooks (optional stub): integrate with attribute tables later
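Several of the metrics above can be computed in one pass over the edge list. This is a sketch under stated assumptions: an undirected, unweighted simple graph given as a list of `(u, v)` pairs, a partition given as a `node -> community_id` dict, and conductance defined as cut edges divided by the smaller of the community's volume and the volume of the rest of the graph. `community_metrics` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Mapping, Tuple


def community_metrics(
    edges: Iterable[Tuple[Hashable, Hashable]],
    partition: Mapping[Hashable, Hashable],
) -> Dict[Hashable, Dict[str, float]]:
    stats = defaultdict(lambda: {"size": 0, "intra_edges": 0, "inter_edges": 0})
    for _node, cid in partition.items():
        stats[cid]["size"] += 1

    n_edges = 0
    for u, v in edges:
        n_edges += 1
        cu, cv = partition[u], partition[v]
        if cu == cv:
            stats[cu]["intra_edges"] += 1
        else:
            # A cut edge counts toward both communities it touches.
            stats[cu]["inter_edges"] += 1
            stats[cv]["inter_edges"] += 1

    total_volume = 2 * n_edges  # sum of all node degrees
    for s in stats.values():
        possible = s["size"] * (s["size"] - 1) / 2
        s["density_intra"] = s["intra_edges"] / possible if possible else 0.0
        volume = 2 * s["intra_edges"] + s["inter_edges"]
        denom = min(volume, total_volume - volume)
        s["conductance"] = s["inter_edges"] / denom if denom else 0.0
    return dict(stats)


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
metrics = community_metrics(edges, partition)
```

In this toy graph the triangle `a, b, c` forms community 0 with three intra-community edges and a single cut edge `(c, d)`. Nodes only need to be hashable, so `(node_id, layer)` tuples would work as well.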

Grouping

.per_layer() on communities:

groups communities by dominant layer scope or layer-specific partitions

.group_by("layer"), .group_by("algorithm"):

for comparing community properties across algorithms/seeds


  3. DSL v1 (Legacy String DSL) Compatibility

Todos

Extend parser to allow:

SELECT communities WHERE ... COMPUTE ...

Ensure legacy queries compile to the same AST as DSL v2.

Provide meaningful errors:

DSLSyntaxError for unsupported tokens

domain exceptions for missing partitions, missing algorithms, etc.
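To illustrate the parser extension, here is a toy regex-based parser that accepts `communities` as a target and rejects unknown targets with a syntax error. It is not the existing py3plex grammar: the `DSLSyntaxError` class below is a local stand-in for the real exception, the returned dict stands in for the shared AST, and the comma-separated `COMPUTE` list is an assumption.

```python
import re
from typing import Dict


class DSLSyntaxError(ValueError):
    """Local stand-in for the library's syntax error type."""


# SELECT <target> [WHERE <condition>] [COMPUTE <m1>, <m2>, ...]
_QUERY = re.compile(
    r"^\s*SELECT\s+(?P<target>\w+)"
    r"(?:\s+WHERE\s+(?P<where>.+?))?"
    r"(?:\s+COMPUTE\s+(?P<compute>.+?))?\s*$",
    re.IGNORECASE,
)
_TARGETS = {"nodes", "edges", "communities"}


def parse_legacy(query: str) -> Dict[str, object]:
    match = _QUERY.match(query)
    if match is None:
        raise DSLSyntaxError(f"cannot parse query: {query!r}")
    target = match.group("target").lower()
    if target not in _TARGETS:
        raise DSLSyntaxError(
            f"unsupported target {target!r}; expected one of {sorted(_TARGETS)}"
        )
    compute = match.group("compute")
    return {
        "target": target,
        "where": match.group("where"),  # raw condition text, or None
        "compute": [c.strip() for c in compute.split(",")] if compute else [],
    }


parsed = parse_legacy(
    "SELECT communities WHERE size > 3 COMPUTE conductance, density_intra"
)
```

Because the legacy string and a builder chain would both reduce to the same intermediate structure, the executor needs only one code path for communities.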


  4. AST & Executor: Implement Community Target End-to-End

AST

Add a new target type: Target.COMMUNITIES

Add nodes for community predicates and aggregations (reuse existing predicate/agg nodes where possible)

Ensure community operations can be combined with:

layer selection

grouping

aggregation

ordering / limiting

Executor

Add an execution pathway:

  1. resolve available partition(s) on network (default, named, or supplied)

  2. build CommunityRecord objects

  3. apply filters

  4. compute requested community metrics

  5. apply grouping/aggregation/order/limit

Make it robust to:

networks with no partition assigned (error with actionable message)

multiplex vs multilayer semantics (community across replicas vs per-layer partition)
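The execution pathway above can be sketched as a single function over stored partitions. This is a simplified illustration, not the real executor: partitions are modeled as a `name -> {node: community_id}` dict, records are plain dicts, the only computed metric is `size`, ordering is always descending, and `MissingPartitionError` is a hypothetical name for the domain exception.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Mapping, Optional


class MissingPartitionError(LookupError):
    """Hypothetical domain exception for networks without a usable partition."""


def execute_communities(
    partitions: Mapping[str, Mapping[Hashable, Hashable]],
    name: Optional[str] = None,
    where: Optional[Callable[[Dict[str, Any]], bool]] = None,
    order_by: Optional[str] = None,
    limit: Optional[int] = None,
) -> List[Dict[str, Any]]:
    # 1. Resolve the partition (first stored one by default, or a named one).
    if not partitions:
        raise MissingPartitionError(
            "No partition is assigned to this network; run community "
            "detection or call assign_partition(...) first."
        )
    if name is None:
        name = next(iter(partitions))
    if name not in partitions:
        raise MissingPartitionError(
            f"Unknown partition {name!r}; available: {sorted(partitions)}"
        )

    # 2. Build one record per community.
    groups = defaultdict(list)
    for node, cid in partitions[name].items():
        groups[cid].append(node)
    records = [
        # 4. Metric computation is reduced to `size` in this sketch.
        {"community_id": cid, "members": nodes, "size": len(nodes), "partition": name}
        for cid, nodes in groups.items()
    ]

    # 3. Apply filters.
    if where is not None:
        records = [r for r in records if where(r)]

    # 5. Order (descending) and limit; a limit of None keeps everything.
    if order_by is not None:
        records.sort(key=lambda r: r[order_by], reverse=True)
    return records[:limit]


stored = {"louvain": {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}}
top = execute_communities(
    stored, where=lambda r: r["size"] >= 2, order_by="size", limit=1
)
```

The error messages name the missing piece and the way to fix it, which is what makes them actionable.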


  5. QueryResult: Community-Friendly Results

Todos

Extend QueryResult to support a new payload:

.communities: a list of CommunityRecord objects (or IDs + computed fields)

Implement:

to_pandas() for communities (one row per community)

expand_explanations=True support if you later add .explain()

group_summary() for community groupings

Make sure QueryResult remains backward compatible for nodes/edges.
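A sketch of the "one row per community" shape follows. It assumes community records are plain dicts and that computed metrics arrive as a separate `community_id -> {metric: value}` mapping; `community_rows` is a hypothetical helper, not an existing QueryResult method.

```python
from typing import Any, Dict, Hashable, List, Mapping, Sequence


def community_rows(
    records: Sequence[Mapping[str, Any]],
    computed: Mapping[Hashable, Mapping[str, Any]],
) -> List[Dict[str, Any]]:
    """Flatten community records into one row (dict) per community."""
    rows = []
    for rec in records:
        row = {"community_id": rec["community_id"], "size": len(rec["members"])}
        # Computed metrics become additional columns on the same row.
        row.update(computed.get(rec["community_id"], {}))
        rows.append(row)
    return rows


records = [
    {"community_id": 0, "members": ["a", "b", "c"]},
    {"community_id": 1, "members": ["d", "e"]},
]
rows = community_rows(records, {0: {"conductance": 0.25}})
```

A list of dicts like `rows` can be passed straight to `pandas.DataFrame(rows)`; communities without a given metric end up with a missing value in that column.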


  6. Bridges Between Targets (Ergonomics = Adoption)

Must-have convenience methods

Q.communities().members() → returns a NodeQueryBuilder scoped to selected communities

supports: .compute(...), .where(...), etc. on member nodes

Q.communities().boundary_edges() → returns an EdgeQueryBuilder representing cut edges

Q.nodes().community() or .with_community():

annotate node results with community_id (like a join)

Q.edges().within_community() and Q.edges().between_communities():

common slicing pattern
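The set logic behind these bridges is small. The sketch below works on a plain edge list and a `node -> community_id` dict; the function names mirror the proposed builder methods, but they are standalone illustrations and not the builder API itself.

```python
from typing import Hashable, Iterable, List, Mapping, Tuple

Edge = Tuple[Hashable, Hashable]


def members(partition: Mapping[Hashable, Hashable], community_ids: Iterable[Hashable]) -> List[Hashable]:
    """Nodes belonging to any of the selected communities."""
    wanted = set(community_ids)
    return [node for node, cid in partition.items() if cid in wanted]


def boundary_edges(
    edges: Iterable[Edge],
    partition: Mapping[Hashable, Hashable],
    community_ids: Iterable[Hashable],
) -> List[Edge]:
    """Cut edges with at least one endpoint in a selected community."""
    wanted = set(community_ids)
    return [
        (u, v)
        for u, v in edges
        if partition[u] != partition[v]
        and (partition[u] in wanted or partition[v] in wanted)
    ]


def within_community(edges: Iterable[Edge], partition: Mapping[Hashable, Hashable]) -> List[Edge]:
    return [(u, v) for u, v in edges if partition[u] == partition[v]]


def between_communities(edges: Iterable[Edge], partition: Mapping[Hashable, Hashable]) -> List[Edge]:
    return [(u, v) for u, v in edges if partition[u] != partition[v]]


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
partition = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 2}
```

Here `boundary_edges(edges, partition, [0])` yields only `("c", "d")`, while selecting community 1 yields both `("c", "d")` and `("e", "f")`. Annotating node results with `community_id` is the inverse lookup, `partition[node]`.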


  7. Pipelines Integration (Communities as Pipeline Artifacts)

Todos

Ensure pipeline community detection steps store:

partition vector

algorithm metadata

optional quality scores (modularity, log-likelihood for SBM, etc.)

Add a pipeline step (or extend existing) to produce a CommunityStats artifact compatible with Q.communities().

Ensure pipeline outputs can be queried directly (or via network metadata).


  8. Uncertainty & Stability (Optional MVP+, but design now)

MVP approach

Support multiple partitions stored on network:

assign_partition(..., name=f"louvain_seed_{s}")

Implement Q.communities().uq(...) semantics for:

partition stability across seeds/perturbations

consensus communities (basic co-assignment frequency)

Expose simple stability metrics:

community member Jaccard vs consensus

average co-assignment score

(If this is too large, ship hooks + metadata first and implement full stability later, but keep the API stable.)
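The two basic stability ingredients named above, co-assignment frequency and member Jaccard, can be sketched in a few lines. The assumptions: each run is a `node -> community_id` dict over the same node set, and the helper names `co_assignment` and `jaccard` are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Iterable, Mapping, Sequence, Tuple


def co_assignment(
    runs: Sequence[Mapping[Hashable, Hashable]],
) -> Dict[Tuple[Hashable, Hashable], float]:
    """Fraction of runs in which each node pair shares a community."""
    nodes = list(runs[0])  # assumes every run covers the same nodes
    freq = {}
    for u, v in combinations(nodes, 2):
        together = sum(1 for p in runs if p[u] == p[v])
        freq[(u, v)] = together / len(runs)
    return freq


def jaccard(a: Iterable[Hashable], b: Iterable[Hashable]) -> float:
    """Jaccard similarity of two member sets (1.0 if both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0


# Three runs, e.g. the same algorithm with different seeds.
runs = [
    {"a": 0, "b": 0, "c": 1},
    {"a": 1, "b": 1, "c": 1},
    {"a": 0, "b": 1, "c": 1},
]
freq = co_assignment(runs)
```

Comparing co-assignment rather than raw labels sidesteps the fact that community IDs are arbitrary across runs. The pairwise loop is quadratic in the number of nodes, so this only suits small networks as written.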


  9. Testing Requirements

Todos

Unit tests:

Q.communities() basic execution with an assigned partition

filters on size/density/cut

.members() returns correct nodes

.boundary_edges() returns correct cut edges

.to_pandas() shape and columns

Parity tests:

node query annotated with communities matches community members

Legacy DSL tests:

SELECT communities ... matches DSL v2 results

Error tests:

no partition assigned → raises a domain-specific exception with a clear message


  10. Documentation: Update AGENTS.md In Place (No New Markdown)

Required edits

Add a new section: “Community Queries (First-Class Communities)”

Include at least 3 examples:

  1. basic community selection + compute

  2. members() bridge to nodes

  3. boundary_edges() bridge to edges

Update any “current limitations” language if it references community gaps.


  11. Examples (No new .md; code examples ok if repo has examples/)

Todos

Add or update a Python example under existing examples/ structure (if allowed):

demonstrate end-to-end: detect communities → Q.communities() → members → boundary edges → aggregation

Ensure examples don’t require optional deps unless guarded.


Definition of Done

Q.communities() works end-to-end with filters, compute, ordering, and to_pandas().

Bridges to nodes/edges work (members(), boundary_edges(), node annotation).

Legacy DSL supports SELECT communities ....

Tests cover core behaviors and errors.

AGENTS.md updated in place; no new markdown files created.
