
Copilot AI commented Jan 7, 2026

Adds automated discovery of interpretable network claims (e.g., degree >= k → pagerank_rank <= r) from multilayer network data with statistical support/coverage measures and lazy counterexample integration.

Implementation

Core Package (py3plex/claims/)

  • types.py: Immutable Claim/Antecedent/Consequent dataclasses with evaluation and DSL round-trip
  • generator.py: Quantile-based candidate enumeration (threshold/rank/top_p predicates)
  • scorer.py: Support (P(B|A)) and coverage (P(A)) calculation with deterministic ranking
  • learner.py: Orchestrates DSL metric computation → candidate generation → scoring → provenance
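
For illustration, the immutable dataclasses in types.py might look roughly like this. This is a sketch, not the shipped code: field and method names other than claim_string are hypothetical, and only the claim_string format mirrors the documented output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # immutable, as the summary above specifies
class Antecedent:
    metric: str
    threshold: float

    def holds(self, row: dict) -> bool:
        # threshold predicate: metric >= t
        return row[self.metric] >= self.threshold

@dataclass(frozen=True)
class Consequent:
    metric: str
    max_rank: int

    def holds(self, row: dict) -> bool:
        # rank predicate: rank(metric) <= r
        return row[f"{self.metric}_rank"] <= self.max_rank

@dataclass(frozen=True)
class Claim:
    antecedent: Antecedent
    consequent: Consequent

    @property
    def claim_string(self) -> str:
        # DSL-style, round-trippable rendering (format from the example below)
        return (f"{self.antecedent.metric}__gte({self.antecedent.threshold}) "
                f"-> {self.consequent.metric}__rank_lte({self.consequent.max_rank})")

claim = Claim(Antecedent("degree", 10.0), Consequent("pagerank", 50))
print(claim.claim_string)  # degree__gte(10.0) -> pagerank__rank_lte(50)
```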

DSL Integration

  • ClaimLearnerBuilder with chainable API: .from_metrics(), .layers(), .min_support(), .seed()
  • Entry point: Q.learn_claims() following existing builder patterns
  • Lazy Claim.counterexample(net) invokes the Counterexample Engine (#34) on demand

Determinism & Provenance

  • Same seed → identical ordered claims (deterministic quantiles, sorting, tie-breaking)
  • Full provenance: network fingerprint, metrics, parameters, timestamp, seed (JSON-serializable)
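
A provenance record of the shape listed above could be assembled like this. The field names beyond those stated (e.g. the "parameters" grouping) are illustrative assumptions, not the actual schema:

```python
import json
import time

def network_fingerprint(n_nodes: int, n_edges: int, n_layers: int) -> dict:
    # Coarse structural fingerprint: enough to detect that the network
    # changed between learning and replay.
    return {"nodes": n_nodes, "edges": n_edges, "layers": n_layers}

provenance = {
    "engine": "claim_learner",
    "timestamp_utc": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
    "network_fingerprint": network_fingerprint(120, 540, 3),
    "metrics_used": ["degree", "pagerank"],
    "parameters": {"min_support": 0.9, "min_coverage": 0.05, "max_claims": 20},
    "seed": 42,
}

# The record must survive a JSON round-trip, per the requirement above.
assert provenance == json.loads(json.dumps(provenance))
```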

Example Usage

from py3plex.dsl import Q

claims = (
    Q.learn_claims()
     .from_metrics(["degree", "pagerank", "betweenness_centrality"])
     .layers(L["social"])
     .min_support(0.9)      # P(consequent | antecedent) >= 0.9
     .min_coverage(0.05)    # P(antecedent) >= 0.05
     .max_claims(20)
     .seed(42)
     .execute(net)
)

for claim in claims:
    print(claim.claim_string)  # "degree__gte(10.0) -> pagerank__rank_lte(50)"
    cex = claim.counterexample(net, seed=42)  # Lazy falsification

Testing

  • 25 unit tests covering generation, scoring, DSL integration, error handling
  • 2 Hypothesis property tests: determinism across runs, support/coverage soundness
  • AGENTS.md updated with comprehensive usage patterns and interpretation warnings

Design Constraints Met

  • Minimal public API (single Q.learn_claims() entry)
  • py3plex exceptions only (ClaimLearningError extends AlgorithmError)
  • No heavy dependencies (numpy already present)
  • Explicit, interpretable algorithms (quantile thresholds, lexicographic tie-breaking)
  • Claims are hypotheses, not truths—designed for falsification
Original prompt

This section details the original issue that was resolved:

<issue_title>ilp</issue_title>
<issue_description>You are GitHub Copilot working inside the py3plex repository. Implement “Learning Claims from Data” (feature #39) as a first-class, inductive reasoning capability that integrates tightly with the existing DSL v2 and the Counterexample Engine (#34).

NON-NEGOTIABLE REPO CONSTRAINTS

  • Do NOT add any new top-level .md files.
  • You MUST update the existing AGENTS.md (follow its tone/structure) and relevant .rst docs under docfiles/.
  • Add unit tests + at least 2 Hypothesis property tests.
  • Deterministic behavior given seed; record full provenance.
  • Use py3plex domain exceptions only (py3plex.exceptions).
  • Minimal public API; hide machinery in internal modules.
  • No heavy new dependencies; keep algorithms explicit and interpretable.

FEATURE GOAL
Automatically discover plausible, interpretable implication-style claims from multilayer network data, such as:

degree ≥ k → pagerank_rank ≤ r

Claims are hypotheses, not established truths. This is inductive rule discovery for networks, not correlation mining.

MVP SCOPE (must ship)

  • Learn single-antecedent → single-consequent claims over node metrics.
  • Antecedents: simple threshold predicates on cheap metrics:
    degree, strength, layer_count, top_p(metric)
  • Consequents: threshold or rank predicates on one metric:
    pagerank, betweenness_centrality
  • Layer-restricted learning supported (via L[...] expressions).
  • Claims returned as executable objects with .claim_string.
  • Lazy integration with counterexamples (Counterexample Engine, #34): claims can be falsified on demand.

PUBLIC API (DSL v2–consistent)

Primary entry point:

from py3plex.dsl import Q

claims = (
    Q.learn_claims()
     .from_metrics(["degree", "pagerank", "betweenness"])
     .layers(L["ppi"] + L["coexpression"])   # optional
     .min_support(0.9)
     .min_coverage(0.05)
     .max_antecedents(1)
     .max_claims(20)
     .seed(42)
     .execute(net)
)

Returned object: an iterable of Claim objects, sorted by rank. Each Claim exposes:

claim.antecedent

claim.consequent

claim.support        # P(consequent | antecedent)

claim.coverage       # P(antecedent)

claim.claim_string   # DSL-compatible string

claim.counterexample()  # invokes SkBlaz/py3plex#34 lazily

claim.to_dict()


Round-trip guarantee:

Q.counterexample().claim(claim.claim_string).execute(net)

ARCHITECTURE / FILE LAYOUT

Create package: py3plex/claims/

  • types.py: dataclasses Claim, Antecedent, Consequent, ClaimScore
  • predicates.py: canonical predicate representations; threshold and rank predicates; stable string rendering to DSL form
  • generator.py: enumerate candidate antecedents & consequents; choose thresholds from quantiles (e.g. 70/80/90%)
  • scorer.py: compute support and coverage; deterministic ranking logic
  • learner.py: orchestrates metric computation, candidate enumeration, and scoring; applies pruning rules
  • integration.py (optional): hooks into the counterexample engine (#34)

DSL integration:

Add ClaimLearnerBuilder behind Q.learn_claims(), with builder methods:

from_metrics(list[str])
layers(L expression)
min_support(float)
min_coverage(float)
max_antecedents(int)   # MVP: only 1 supported; validate input
max_claims(int)
seed(int)
execute(net)

NO new DSL syntax: reuse claim strings compatible with SkBlaz/py3plex#34.

ALGORITHMIC DETAILS (REQUIRED)

1. Metric computation

Compute required metrics ONCE using DSL v2: Q.nodes().from_layers(...).compute(...)

Cache results with provenance.

Deterministic ordering of nodes (id, layer).
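
The deterministic (id, layer) node ordering can be sketched as below; the tuple shape of a node is an assumption for illustration:

```python
def deterministic_node_order(nodes):
    # Sort by (node_id, layer) so metric tables line up identically
    # across runs, regardless of dict/set iteration order.
    return sorted(nodes, key=lambda n: (str(n[0]), str(n[1])))

nodes = {("b", "social"), ("a", "social"), ("a", "ppi")}
print(deterministic_node_order(nodes))
# [('a', 'ppi'), ('a', 'social'), ('b', 'social')]
```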


2. Candidate antecedent generation

For each metric m in cheap_metrics: thresholds = quantiles(m, [0.7, 0.8, 0.9]); predicates: m >= t

Additional simple predicates: layer_count >= k; top_p(m, p) where p ∈ {0.05, 0.1, 0.2}

Keep the candidate set small and interpretable.
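
The quantile thresholding in step 2 can be sketched as follows. The nearest-rank convention is chosen here purely for determinism; the actual implementation may use a different quantile method:

```python
def quantile_thresholds(values, qs=(0.7, 0.8, 0.9)):
    """Deterministic quantile thresholds for `m >= t` antecedent candidates.

    Sorted-order (nearest-rank) indexing means the same input always
    yields the same thresholds; duplicates are dropped to keep the
    candidate set small.
    """
    ordered = sorted(values)
    out = []
    for q in qs:
        idx = min(int(q * len(ordered)), len(ordered) - 1)
        t = ordered[idx]
        if t not in out:
            out.append(t)
    return out

degrees = [1, 1, 2, 2, 3, 4, 5, 8, 13, 21]
print(quantile_thresholds(degrees))  # [8, 13, 21]
```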


3. Candidate consequent generation

For each metric m in target_metrics: value-threshold consequents m >= t (quantiles); rank-based consequents rank(m) <= r (e.g. 20, 50, 100)


4. Scoring implications

For each candidate (A → B):

support = |{v : A(v) and B(v)}| / |{v : A(v)}|

coverage = |{v : A(v)}| / N

Discard if support < min_support or coverage < min_coverage.

Store ClaimScore with deterministic float rounding.
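
The support/coverage formulas above translate directly into a small scorer; this is a minimal sketch assuming node metrics arrive as one dict per node, with the rounding precision an illustrative choice:

```python
def score_claim(rows, antecedent, consequent):
    """Return (support, coverage) for an A -> B candidate over node rows.

    support  = P(B | A) = |A and B| / |A|
    coverage = P(A)     = |A| / N
    """
    n = len(rows)
    a = [r for r in rows if antecedent(r)]
    if not a:
        return 0.0, 0.0  # support is undefined; such candidates get pruned
    ab = [r for r in a if consequent(r)]
    # Deterministic rounding so ranking ties break identically across runs.
    return round(len(ab) / len(a), 6), round(len(a) / n, 6)

rows = [{"degree": d, "pagerank_rank": r}
        for d, r in [(12, 3), (15, 40), (11, 80), (2, 500), (3, 200)]]
support, coverage = score_claim(
    rows,
    antecedent=lambda r: r["degree"] >= 10,
    consequent=lambda r: r["pagerank_rank"] <= 50,
)
print(support, coverage)  # 0.666667 0.6
```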


5. Ranking claims

Primary sort:

1. support (desc)
2. coverage (desc)
3. antecedent simplicity (prefer fewer terms)
4. stable tie-break: claim_string lexicographic
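
That four-level sort reduces to a single composite key; a minimal sketch, assuming each scored claim is a dict with the fields shown:

```python
def rank_claims(scored):
    """Sort claims per the rules above: support desc, coverage desc,
    fewer antecedent terms first, then lexicographic claim_string as a
    stable tie-break (negation flips numeric fields to descending)."""
    return sorted(
        scored,
        key=lambda c: (-c["support"], -c["coverage"],
                       c["n_terms"], c["claim_string"]),
    )

claims = [
    {"support": 0.95, "coverage": 0.10, "n_terms": 1, "claim_string": "b -> y"},
    {"support": 0.95, "coverage": 0.10, "n_terms": 1, "claim_string": "a -> x"},
    {"support": 0.99, "coverage": 0.05, "n_terms": 1, "claim_string": "c -> z"},
]
print([c["claim_string"] for c in rank_claims(claims)])
# ['c -> z', 'a -> x', 'b -> y']
```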



6. Counterexample integration (lazy)

Do NOT eagerly compute counterexamples.

Each Claim exposes claim.counterexample(net, **kwargs), which internally calls SkBlaz/py3plex#34.

PROVENANCE REQUIREMENTS

Each Claim must include meta["provenance"]:

engine: "claim_learner"
py3plex_version
timestamp_utc
network_fingerprint (node/edge/layer counts)
metrics_used
thr...





- Fixes SkBlaz/py3plex#1011


Copilot AI changed the title [WIP] Implement learning claims from data feature in DSL v2 Implement inductive claim learning with DSL integration and counterexample falsification Jan 7, 2026
Copilot AI requested a review from SkBlaz January 7, 2026 05:44
@SkBlaz SkBlaz marked this pull request as ready for review January 7, 2026 10:11
@SkBlaz SkBlaz merged commit 3d6e614 into master Jan 7, 2026
31 checks passed
