Skip to content

Implement Domain PAC validation#611

Open
p-senichenkov wants to merge 45 commits intoDesbordante:mainfrom
p-senichenkov:PAC-validation
Open

Implement Domain PAC validation#611
p-senichenkov wants to merge 45 commits intoDesbordante:mainfrom
p-senichenkov:PAC-validation

Conversation

@p-senichenkov
Copy link
Collaborator

Implement base classes for Probabilistic Approximate Constratins validation.
Probabilistic Approximate Constraint is a primitve intoduced in "Checks and Balances: Monitoring Data Quality Problems in Network Traffic Databases" by Flip Korn et al. PAC with parameters epsilon and delta means that Probability that approximate constraint with error=epsilon holds is greater than delta.

Implement Domain PAC verifier.
Domain PAC on domain D means that Pr(x in D+-epsilon) > delta.

Add Domain PAC verifier bindings and examples.

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@p-senichenkov p-senichenkov force-pushed the PAC-validation branch 3 times, most recently from 6a357b4 to a2844b0 Compare November 28, 2025 15:27
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

@p-senichenkov p-senichenkov force-pushed the PAC-validation branch 2 times, most recently from 9821bc5 to b418f88 Compare November 28, 2025 19:09
Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

Copy link
Collaborator

@MichaelS239 MichaelS239 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review. Good work overall!

@p-senichenkov
Copy link
Collaborator Author

p-senichenkov commented Dec 22, 2025

Sorry for pushing on such late stage of review, but I've found a mistake in math behind Parallelepiped domain (Product comparer isn't suitable there, you've mentioned this issue in one of the comments). I've intentionally carried all changes into a single commit and didn't rebased onto new main, so that you can easily look at these changes.

Also, I've noticed that comparer in domains is superfluous, and I'm currently working on another version of this algo (there are some other changes in calculation of epsilon-delta pairs). But I want to hold a rich comparison of these versions, both benchmarks (I'm afraid that calculating distances for each tuple in the table can take a long time for some complex domains) and usability tests (that version gives other epsilon-delta pairs, though epsilon still agrees delta, because heuristic being used differs a bit).

Anyway, thank you for reviewing so thoroughly, and I'm sorry I didn't notified you as soon as problems were detected.

@p-senichenkov p-senichenkov force-pushed the PAC-validation branch 2 times, most recently from 6405250 to ec548d5 Compare January 28, 2026 11:59
Change `using Destructor = std::function<void(std::byte*)>` to `using
Destructor = std::function<void(std::byte const*)>` in model::Type,
because
a) all currently implemented destructors work with const pointer;
b) GetDestructor() is not used anywhere, except for implementations of
pac::model::IDomain;
c) implementations of pac::model::IDomain work with const pointers.
Implement ComparableTupleType class to compare fixed-sized tuples of
values.
Implement IDomain class to represent an ordered domain in a metric space
of values.
Implement default domain classes: Ball and Parallelepiped.

Implement PAC and DomainPAC classes.
Add option names and descriptions required by PAC verifier
Implement PACVerifier -- a base class for all PAC verifiers.
Implement verifier for Domain Probabilistic Approximate Constraints
Implement unit tests for Domain PAC veririfier.
Implement Python bindigns for Domain PAC verifier
Add 4 basic examples for Domain PAC verifier:
- 1D segment, highlights
- 2D parallelepiped, levelling coefficients
- 2D ball
- string values
and 1 advanced example:
- Custom domain
Re-arrange work in PAC verifier so that it suites better for FD and UCC
PAC validation
Add separate MakeTuples function to make value tuples from typed column
data, becuase it used several times in all PAC verifiers
Add `using Tuples = std::vector<Tuple>` for convenience
Move MakePAC call to the right place in Domain PAC verifier
Replace `pac->SetEpsilon(0)` with `pac = nullptr` in base PACVerifier
class
Comparer used in Parallelepiped default domain was not consistent with
chebyshev metric (that was used in Parallelepiped).
Use metric based on distance from domain
- Normalize includes in bindings
- Remove comparer notions from descritption of Python classes
- Apply some review suggestions
- Normalize inceludes in Domain PAC verifier-related files
- Use "textbook" version of PAC-Man algorithm
- Remove comparer from Domain interface
- Apply some review suggestions
- Normalize includes in PAC verifier-related classes
- Use "textbook" version of PAC-Man

Include utility wherever needed (PAC verifier)
- Normalize includes in base model classes
- Remove all comparers from base model classes
- Fix math behind the metric used in Parallelepiped domain
- Apply some review suggestions
- Actualized examples
- Fix some typos in examples basic-1 and basic-2
- Remove epsilon and delta from Domain PAC's tuple representation (which
  is used in __hash__), because doubles cannot be hashed properly
- Comapre epsilons and deltas approximately in Domain PAC's __eq__
Migrate Domain PAC verifier to target-based CMake
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants