
Copilot AI commented Jan 7, 2026

Adds automated discovery of interpretable network claims (e.g., degree >= k → pagerank_rank <= r) from multilayer network data with statistical support/coverage measures and lazy counterexample integration.

Implementation

Core Package (py3plex/claims/)

  • types.py: Immutable Claim/Antecedent/Consequent dataclasses with evaluation and DSL round-trip
  • generator.py: Quantile-based candidate enumeration (threshold/rank/top_p predicates)
  • scorer.py: Support (P(B|A)) and coverage (P(A)) calculation with deterministic ranking
  • learner.py: Orchestrates DSL metric computation → candidate generation → scoring → provenance
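
For illustration, the immutable dataclasses in types.py might look roughly like this. This is a sketch, not the shipped code: field and method names other than claim_string are hypothetical, and only the claim_string format mirrors the documented output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # immutable, as the summary above specifies
class Antecedent:
    metric: str
    threshold: float

    def holds(self, row: dict) -> bool:
        # threshold predicate: metric >= t
        return row[self.metric] >= self.threshold

@dataclass(frozen=True)
class Consequent:
    metric: str
    max_rank: int

    def holds(self, row: dict) -> bool:
        # rank predicate: rank(metric) <= r
        return row[f"{self.metric}_rank"] <= self.max_rank

@dataclass(frozen=True)
class Claim:
    antecedent: Antecedent
    consequent: Consequent

    @property
    def claim_string(self) -> str:
        # DSL-style, round-trippable rendering (format from the example below)
        return (f"{self.antecedent.metric}__gte({self.antecedent.threshold}) "
                f"-> {self.consequent.metric}__rank_lte({self.consequent.max_rank})")

claim = Claim(Antecedent("degree", 10.0), Consequent("pagerank", 50))
print(claim.claim_string)  # degree__gte(10.0) -> pagerank__rank_lte(50)
```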

DSL Integration

  • ClaimLearnerBuilder with chainable API: .from_metrics(), .layers(), .min_support(), .seed()
  • Entry point: Q.learn_claims() following existing builder patterns
  • Lazy Claim.counterexample(net) invokes the Counterexample Engine (#34) on demand

Determinism & Provenance

  • Same seed → identical ordered claims (deterministic quantiles, sorting, tie-breaking)
  • Full provenance: network fingerprint, metrics, parameters, timestamp, seed (JSON-serializable)
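
A provenance record of the shape listed above could be assembled like this. The field names beyond those stated (e.g. the "parameters" grouping) are illustrative assumptions, not the actual schema:

```python
import json
import time

def network_fingerprint(n_nodes: int, n_edges: int, n_layers: int) -> dict:
    # Coarse structural fingerprint: enough to detect that the network
    # changed between learning and replay.
    return {"nodes": n_nodes, "edges": n_edges, "layers": n_layers}

provenance = {
    "engine": "claim_learner",
    "timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "network_fingerprint": network_fingerprint(120, 540, 3),
    "metrics_used": ["degree", "pagerank"],
    "parameters": {"min_support": 0.9, "min_coverage": 0.05, "max_claims": 20},
    "seed": 42,
}

# The record must survive a JSON round-trip, per the requirement above.
assert provenance == json.loads(json.dumps(provenance))
```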

Example Usage

from py3plex.dsl import Q

claims = (
    Q.learn_claims()
     .from_metrics(["degree", "pagerank", "betweenness_centrality"])
     .layers(L["social"])
     .min_support(0.9)      # P(consequent | antecedent) >= 0.9
     .min_coverage(0.05)    # P(antecedent) >= 0.05
     .max_claims(20)
     .seed(42)
     .execute(net)
)

for claim in claims:
    print(claim.claim_string)  # "degree__gte(10.0) -> pagerank__rank_lte(50)"
    cex = claim.counterexample(net, seed=42)  # Lazy falsification

Testing

  • 25 unit tests covering generation, scoring, DSL integration, error handling
  • 2 Hypothesis property tests: determinism across runs, support/coverage soundness
  • AGENTS.md updated with comprehensive usage patterns and interpretation warnings

Design Constraints Met

  • Minimal public API (single Q.learn_claims() entry)
  • py3plex exceptions only (ClaimLearningError extends AlgorithmError)
  • No heavy dependencies (numpy already present)
  • Explicit, interpretable algorithms (quantile thresholds, lexicographic tie-breaking)
  • Claims are hypotheses, not truths—designed for falsification
Original prompt

This section details the original issue that was resolved:

<issue_title>ilp</issue_title>
<issue_description>You are GitHub Copilot working inside the py3plex repository. Implement “Learning Claims from Data” (feature #39) as a first-class, inductive reasoning capability that integrates tightly with the existing DSL v2 and the Counterexample Engine (#34).

NON-NEGOTIABLE REPO CONSTRAINTS

  • Do NOT add any new top-level .md files.
  • You MUST update the existing AGENTS.md (follow its tone/structure) and relevant .rst docs under docfiles/.
  • Add unit tests + at least 2 Hypothesis property tests.
  • Deterministic behavior given seed; record full provenance.
  • Use py3plex domain exceptions only (py3plex.exceptions).
  • Minimal public API; hide machinery in internal modules.
  • No heavy new dependencies; keep algorithms explicit and interpretable.

FEATURE GOAL
Automatically discover plausible, interpretable implication-style claims from multilayer network data, such as:

degree ≥ k → pagerank_rank ≤ r

Claims are hypotheses, not established truths. This is inductive rule discovery for networks, not correlation mining.

MVP SCOPE (must ship)

  • Learn single-antecedent → single-consequent claims over node metrics.
  • Antecedents: simple threshold predicates on cheap metrics:
    degree, strength, layer_count, top_p(metric)
  • Consequents: threshold or rank predicates on one metric:
    pagerank, betweenness_centrality
  • Layer-restricted learning supported (via L[...] expressions).
  • Claims returned as executable objects with .claim_string.
  • Lazy integration with counterexamples (Counterexample Engine, #34): claims can be falsified on demand.

PUBLIC API (DSL v2–consistent)

Primary entry point:

from py3plex.dsl import Q

claims = (
    Q.learn_claims()
     .from_metrics(["degree", "pagerank", "betweenness"])
     .layers(L["ppi"] + L["coexpression"])   # optional
     .min_support(0.9)
     .min_coverage(0.05)
     .max_antecedents(1)
     .max_claims(20)
     .seed(42)
     .execute(net)
)

Returned object: an iterable of Claim objects, sorted by rank. Each Claim exposes:

claim.antecedent

claim.consequent

claim.support        # P(consequent | antecedent)

claim.coverage       # P(antecedent)

claim.claim_string   # DSL-compatible string

claim.counterexample()  # invokes SkBlaz/py3plex#34 lazily

claim.to_dict()


Round-trip guarantee:

Q.counterexample().claim(claim.claim_string).execute(net)

ARCHITECTURE / FILE LAYOUT

Create package: py3plex/claims/

  • types.py: dataclasses Claim, Antecedent, Consequent, ClaimScore
  • predicates.py: canonical predicate representations; threshold and rank predicates; stable string rendering to DSL form
  • generator.py: enumerate candidate antecedents & consequents; choose thresholds from quantiles (e.g. 70/80/90%)
  • scorer.py: compute support and coverage; deterministic ranking logic
  • learner.py: orchestrates metric computation, candidate enumeration, and scoring; applies pruning rules
  • integration.py (optional): hooks into the counterexample engine (#34)

DSL integration:

Add ClaimLearnerBuilder behind Q.learn_claims(), with builder methods:

from_metrics(list[str])
layers(L expression)
min_support(float)
min_coverage(float)
max_antecedents(int)   # MVP: only 1 supported; validate input
max_claims(int)
seed(int)
execute(net)

NO new DSL syntax: reuse claim strings compatible with SkBlaz/py3plex#34.

ALGORITHMIC DETAILS (REQUIRED)

1. Metric computation

Compute required metrics ONCE using DSL v2: Q.nodes().from_layers(...).compute(...)

Cache results with provenance.

Deterministic ordering of nodes (id, layer).
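
The deterministic (id, layer) node ordering can be sketched as below; the tuple shape of a node is an assumption for illustration:

```python
def deterministic_node_order(nodes):
    # Sort by (node_id, layer) so metric tables line up identically
    # across runs, regardless of dict/set iteration order.
    return sorted(nodes, key=lambda n: (str(n[0]), str(n[1])))

nodes = {("b", "social"), ("a", "social"), ("a", "ppi")}
print(deterministic_node_order(nodes))
# [('a', 'ppi'), ('a', 'social'), ('b', 'social')]
```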


2. Candidate antecedent generation

For each metric m in cheap_metrics: thresholds = quantiles(m, [0.7, 0.8, 0.9]); predicates: m >= t

Additional simple predicates: layer_count >= k; top_p(m, p) where p ∈ {0.05, 0.1, 0.2}

Keep the candidate set small and interpretable.
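
The quantile thresholding in step 2 can be sketched as follows. The nearest-rank convention is chosen here purely for determinism; the actual implementation may use a different quantile method:

```python
def quantile_thresholds(values, qs=(0.7, 0.8, 0.9)):
    """Deterministic quantile thresholds for `m >= t` antecedent candidates.

    Sorted-order (nearest-rank) indexing means the same input always
    yields the same thresholds; duplicates are dropped to keep the
    candidate set small.
    """
    ordered = sorted(values)
    out = []
    for q in qs:
        idx = min(int(q * len(ordered)), len(ordered) - 1)
        t = ordered[idx]
        if t not in out:
            out.append(t)
    return out

degrees = [1, 1, 2, 2, 3, 4, 5, 8, 13, 21]
print(quantile_thresholds(degrees))  # [8, 13, 21]
```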


3. Candidate consequent generation

For each metric m in target_metrics: value-threshold consequents m >= t (quantiles); rank-based consequents rank(m) <= r (e.g. 20, 50, 100)


4. Scoring implications

For each candidate (A → B):

support = |{v : A(v) and B(v)}| / |{v : A(v)}|

coverage = |{v : A(v)}| / N

Discard if support < min_support or coverage < min_coverage.

Store ClaimScore with deterministic float rounding.
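
The support/coverage formulas above translate directly into a small scorer; this is a minimal sketch assuming node metrics arrive as one dict per node, with the rounding precision an illustrative choice:

```python
def score_claim(rows, antecedent, consequent):
    """Return (support, coverage) for an A -> B candidate over node rows.

    support  = P(B | A) = |A and B| / |A|
    coverage = P(A)     = |A| / N
    """
    n = len(rows)
    a = [r for r in rows if antecedent(r)]
    if not a:
        return 0.0, 0.0  # support is undefined; such candidates get pruned
    ab = [r for r in a if consequent(r)]
    # Deterministic rounding so ranking ties break identically across runs.
    return round(len(ab) / len(a), 6), round(len(a) / n, 6)

rows = [{"degree": d, "pagerank_rank": r}
        for d, r in [(12, 3), (15, 40), (11, 80), (2, 500), (3, 200)]]
support, coverage = score_claim(
    rows,
    antecedent=lambda r: r["degree"] >= 10,
    consequent=lambda r: r["pagerank_rank"] <= 50,
)
print(support, coverage)  # 0.666667 0.6
```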


5. Ranking claims

Primary sort:

1. support (desc)
2. coverage (desc)
3. antecedent simplicity (prefer fewer terms)
4. stable tie-break: claim_string lexicographic
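
That four-level sort reduces to a single composite key; a minimal sketch, assuming each scored claim is a dict with the fields shown:

```python
def rank_claims(scored):
    """Sort claims per the rules above: support desc, coverage desc,
    fewer antecedent terms first, then lexicographic claim_string as a
    stable tie-break (negation flips numeric fields to descending)."""
    return sorted(
        scored,
        key=lambda c: (-c["support"], -c["coverage"],
                       c["n_terms"], c["claim_string"]),
    )

claims = [
    {"support": 0.95, "coverage": 0.10, "n_terms": 1, "claim_string": "b -> y"},
    {"support": 0.95, "coverage": 0.10, "n_terms": 1, "claim_string": "a -> x"},
    {"support": 0.99, "coverage": 0.05, "n_terms": 1, "claim_string": "c -> z"},
]
print([c["claim_string"] for c in rank_claims(claims)])
# ['c -> z', 'a -> x', 'b -> y']
```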



6. Counterexample integration (lazy)

Do NOT eagerly compute counterexamples.

Each Claim exposes claim.counterexample(net, **kwargs), which internally calls SkBlaz/py3plex#34.

PROVENANCE REQUIREMENTS

Each Claim must include meta["provenance"]:

engine: "claim_learner"
py3plex_version
timestamp_utc
network_fingerprint (node/edge/layer counts)
metrics_used
thr...





- Fixes SkBlaz/py3plex#1011


Copilot AI changed the title [WIP] Implement learning claims from data feature in DSL v2 Implement inductive claim learning with DSL integration and counterexample falsification Jan 7, 2026
Copilot AI requested a review from SkBlaz January 7, 2026 05:44
@SkBlaz SkBlaz marked this pull request as ready for review January 7, 2026 10:11
@SkBlaz SkBlaz merged commit 3d6e614 into master Jan 7, 2026
31 checks passed
