Implement Domain PAC validation#611
Conversation
5ebbac8 to
b48b7f3
Compare
b48b7f3 to
d2021b0
Compare
src/core/algorithms/pac/pac_verifier/domain_pac_verifier_base.cpp
Outdated
Show resolved
Hide resolved
6a357b4 to
a2844b0
Compare
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier.h
Outdated
Show resolved
Hide resolved
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_base.cpp
Outdated
Show resolved
Hide resolved
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_cli_adapter.cpp
Outdated
Show resolved
Hide resolved
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_cli_adapter.h
Outdated
Show resolved
Hide resolved
9821bc5 to
b418f88
Compare
MichaelS239
left a comment
There was a problem hiding this comment.
Sorry for the late review. Good work overall!
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_base.cpp
Outdated
Show resolved
Hide resolved
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_base.cpp
Outdated
Show resolved
Hide resolved
|
Sorry for pushing on such late stage of review, but I've found a mistake in math behind Parallelepiped domain (Product comparer isn't suitable there, you've mentioned this issue in one of the comments). I've intentionally carried all changes into a single commit and didn't rebased onto new main, so that you can easily look at these changes. Also, I've noticed that comparer in domains is superfluous, and I'm currently working on another version of this algo (there are some other changes in calculation of epsilon-delta pairs). But I want to hold a rich comparison of these versions, both benchmarks (I'm afraid that calculating distances for each tuple in the table can take a long time for some complex domains) and usability tests (that version gives other epsilon-delta pairs, though epsilon still agrees delta, because heuristic being used differs a bit). Anyway, thank you for reviewing so thoroughly, and I'm sorry I didn't notified you as soon as problems were detected. |
6405250 to
ec548d5
Compare
88d6b32 to
254af6b
Compare
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_base.h
Outdated
Show resolved
Hide resolved
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_base.cpp
Outdated
Show resolved
Hide resolved
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_base.cpp
Outdated
Show resolved
Hide resolved
src/core/algorithms/pac/pac_verifier/domain_pac_verifier/domain_pac_verifier_base.cpp
Outdated
Show resolved
Hide resolved
Change `using Destructor = std::function<void(std::byte*)>` to `using Destructor = std::function<void(std::byte const*)>` in model::Type, because a) all currently implemented destructors work with const pointer; b) GetDestructor() is not used anywhere, except for implementations of pac::model::IDomain; c) implementations of pac::model::IDomain work with const pointers.
Implement ComparableTupleType class to compare fixed-sized tuples of values. Implement IDomain class to represent an ordered domain in a metric space of values. Implement default domain classes: Ball and Parallelepiped. Implement PAC and DomainPAC classes.
Add option names and descriptions required by PAC verifier
Implement PACVerifier -- a base class for all PAC verifiers.
Implement verifier for Domain Probabilistic Approximate Constraints
Implement unit tests for Domain PAC veririfier.
Implement Python bindigns for Domain PAC verifier
Add 4 basic examples for Domain PAC verifier: - 1D segment, highlights - 2D parallelepiped, levelling coefficients - 2D ball - string values and 1 advanced example: - Custom domain
Re-arrange work in PAC verifier so that it suites better for FD and UCC PAC validation
Add separate MakeTuples function to make value tuples from typed column data, becuase it used several times in all PAC verifiers
Add `using Tuples = std::vector<Tuple>` for convenience
Move MakePAC call to the right place in Domain PAC verifier
Replace `pac->SetEpsilon(0)` with `pac = nullptr` in base PACVerifier class
Comparer used in Parallelepiped default domain was not consistent with chebyshev metric (that was used in Parallelepiped). Use metric based on distance from domain
- Normalize includes in bindings - Remove comparer notions from descritption of Python classes - Apply some review suggestions
- Normalize inceludes in Domain PAC verifier-related files - Use "textbook" version of PAC-Man algorithm - Remove comparer from Domain interface - Apply some review suggestions
- Normalize includes in PAC verifier-related classes - Use "textbook" version of PAC-Man Include utility wherever needed (PAC verifier)
- Normalize includes in base model classes - Remove all comparers from base model classes - Fix math behind the metric used in Parallelepiped domain - Apply some review suggestions
- Actualized examples - Fix some typos in examples basic-1 and basic-2
- Remove epsilon and delta from Domain PAC's tuple representation (which is used in __hash__), because doubles cannot be hashed properly - Comapre epsilons and deltas approximately in Domain PAC's __eq__
Migrate Domain PAC verifier to target-based CMake
e7d32e1 to
ab7f699
Compare
Implement base classes for Probabilistic Approximate Constratins validation.
Probabilistic Approximate Constraint is a primitve intoduced in "Checks and Balances: Monitoring Data Quality Problems in Network Traffic Databases" by Flip Korn et al. PAC with parameters epsilon and delta means that Probability that approximate constraint with error=epsilon holds is greater than delta.
Implement Domain PAC verifier.
Domain PAC on domain D means that Pr(x in D+-epsilon) > delta.
Add Domain PAC verifier bindings and examples.