Rules Document content #637
Hello! I don’t remember if we are expected to comment on the provisional table of contents above. If not, I’ll just delete this comment 🙂 Anyway, I have three comments, which I also discussed at the latest meeting.
============================
As I mentioned at the meeting, I think this should be specified in 2. Packaging SHACL. In my view, as I proposed in the other discussion group, there should be a "cluster" (or "bundle," or any other suitable name) that groups together some data, a set of shapes, and a set of rules. We should also decide whether shapes are applied before the rules or vice versa, with respect to the two operations infer() and query().
I understood that you and the others were considering this a viable idea, but it is definitely something to discuss with the SHACL Profiling task force. How should we proceed? I can raise the point at the next WG meeting on Monday, or I could open an issue directly in the Packaging SHACL section (once I figure out how 😅), but I’m not sure if that’s the proper procedure.
By the way, the SHACL 1.2 Editor Draft now displays entirely black in my browser, and the githack URLs give a 404, so I’m not sure what’s going on.
============================
could also mean creating new blank nodes or literals, which could lead to infinite loops. We discussed at the meeting that we should explain at some point how to prevent these infinite loops (but not in Section 2!).
It's not fully clear to me how to formally prevent them in the grammar, because infinite loops are triggered by rules whose antecedents are always satisfied (or satisfied, then not, then satisfied again, and so on, infinitely). Is there a way to constrain the grammar to avoid this? I can't think of one, but I'm happy to learn 😊 Alternatively, we could simply add a disclaimer noting the risk of infinite loops and stating that it's the user's responsibility to ensure their rule set does not generate them.
============================
For example, with a PhD student of mine, I am developing a system to check compliance with Ghana Petroleum Commission regulations (see this paper). One regulation requires companies operating in Ghana to employ at least 80% Ghanaians among the technical staff after 5 years. To infer whether a company complies, we must:
(1) count the total employees;
(2) count the Ghanaian employees;
(3) check that (2) ≥ 0.8 × (1).
This obviously requires aggregate functions (COUNT). I've read more carefully how aggregates are used in SHACL 1.2 Node Expressions, but I don’t think they can be directly used here. This isn't a validation problem: the data are valid, and what we must infer is whether the data comply with the regulations or not. Also, these regulations can include exceptions, e.g., companies may be exempt under certain conditions, but evaluating these conditions may require several additional operations, potentially including more aggregates. So this looks like an inference issue rather than a validation one.
The problem is indeed more general: some inferences are required only when certain quotas or thresholds are met, or when the sum of some values exceeds a given limit, or in similar situations. Think, for example, of applications in finance.
Nevertheless, I’m not enough of an expert in stratification to know whether the basic stratification method used for negation-as-failure can be easily extended to aggregates as well, e.g., by evaluating aggregate rules only after all non-aggregate rules have been evaluated.
Cheers,
Ok, let me try... tomorrow :-)
Indeed. I was already told that shapes are like rules, with the main difference being that they produce an error message rather than new triples. So, in principle, everything could be modelled as rules. As you say, it’s something worth exploring. However, even if it works technically, I’m not sure it’s a good idea conceptually. Perhaps we should instead maintain a clear conceptual distinction between validation and inference, using two separate constructs to better emphasize this difference.
Okay about infinite loops. I also thought about the parallel with NAF, but the difference is that an infinite loop can be triggered even by a single rule. However, I can now see that the rule would depend on itself, so what you propose would still work.
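To make the single-rule case concrete, here is a minimal Python sketch (not taken from the draft). It assumes triples are plain tuples and that a rule is a function returning the triples it wants to add; the rule names, the `ex:` terms, and the `max_rounds` cap are all hypothetical. The first rule mints a fresh blank node each time it fires and depends on its own output, so it never reaches a fixpoint; the second rule creates no new terms and stops after one round. The round cap is only one crude guard, not the mechanism proposed in this thread.

```python
import itertools

def every_person_has_a_parent(triples, fresh):
    """Hypothetical rule: { ?x a ex:Person } => { ?x ex:hasParent _:p . _:p a ex:Person }."""
    has_parent = {s for s, p, _ in triples if p == "ex:hasParent"}
    new = set()
    for s, p, o in sorted(triples):
        if p == "rdf:type" and o == "ex:Person" and s not in has_parent:
            parent = fresh()  # a new blank node on every firing
            new.add((s, "ex:hasParent", parent))
            new.add((parent, "rdf:type", "ex:Person"))
    return new

def every_person_is_an_agent(triples, fresh):
    """Hypothetical rule that creates no new terms: { ?x a ex:Person } => { ?x a ex:Agent }."""
    return {(s, "rdf:type", "ex:Agent") for s, p, o in triples
            if p == "rdf:type" and o == "ex:Person"}

def run(data, rule, max_rounds):
    """Apply one rule repeatedly; return (triples, reached_fixpoint)."""
    triples = set(data)
    counter = itertools.count()
    fresh = lambda: f"_:b{next(counter)}"
    for _ in range(max_rounds):
        new = rule(triples, fresh) - triples
        if not new:
            return triples, True   # nothing changed: fixpoint
        triples |= new
    return triples, False          # cap hit: the rule kept producing triples

DATA = {("ex:alice", "rdf:type", "ex:Person")}
```

With `max_rounds=5`, `run(DATA, every_person_is_an_agent, 5)` reaches a fixpoint, while `run(DATA, every_person_has_a_parent, 5)` hits the cap because each new parent is itself a person without a parent.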
As I mentioned in my previous reply, we might need rules to infer new values when certain quotas or thresholds are met, or when the sum of some values exceeds a given limit. I haven’t conducted empirical analyses to determine how often this need arises, but intuitively, it seems like it could be fairly common. I actually worked on a small use case with my PhD student and already ran into this need. Maybe I was "unlucky" and found a rare case, but even if it is seldom needed, I don’t see why we should prevent aggregates in the bodies. Are they really that much harder to handle than negation-as-failure?
Perhaps the problem (my problem) is that I haven’t fully understood node expressions. Can they also be used for rules? From the current draft, I understood that they can only be used for shapes, i.e., for validating the data. How would that work, for example, in the use case I described earlier?
Suppose we have two classes:
Can we use node expressions to state that if at least 80% of a company’s employees are from Ghana, then the company belongs to the class
With SHACL-SPARQL rules, we still had to specify
As I understand it, with the current grammar we cannot. That’s why I was proposing allowing aggregates in the bodies and then extending stratification to them (evaluating aggregate rules only after all non-aggregate rules have been evaluated). However, there may be some reason (which I don’t know) why stratification works for negation-as-failure but not for aggregates.
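The ordering proposed here (aggregate rules only after all non-aggregate rules) can be sketched in Python as two strata. This is an illustration under assumptions, not the draft's semantics: triples are tuples, every `ex:` name is hypothetical, the 5-year condition is ignored, and "at least 80%" is read as ≥ using integer arithmetic.

```python
def close_non_aggregate(triples):
    """Stratum 0: run the non-aggregate rule to a fixpoint.
    Hypothetical rule: { ?p ex:worksFor ?c } => { ?c ex:employs ?p }."""
    triples = set(triples)
    while True:
        new = {(o, "ex:employs", s) for s, p, o in triples if p == "ex:worksFor"} - triples
        if not new:
            return triples
        triples |= new

def aggregate_rule(triples, percent=80):
    """Stratum 1: companies whose staff is at least `percent`% Ghanaian
    are inferred to be of type ex:CompliantCompany (hypothetical class)."""
    employees = {}
    for s, p, o in triples:
        if p == "ex:employs":
            employees.setdefault(s, set()).add(o)
    ghanaian = {s for s, p, o in triples if p == "ex:nationality" and o == "ex:Ghana"}
    return {
        (company, "rdf:type", "ex:CompliantCompany")
        for company, staff in employees.items()
        if len(staff & ghanaian) * 100 >= percent * len(staff)
    }

# Hypothetical data: one employee asserted directly, four via ex:worksFor.
DATA = {
    ("ex:acme", "ex:employs", "ex:john"),
    ("ex:john", "ex:nationality", "ex:UK"),
    ("ex:kofi", "ex:worksFor", "ex:acme"), ("ex:kofi", "ex:nationality", "ex:Ghana"),
    ("ex:ama", "ex:worksFor", "ex:acme"), ("ex:ama", "ex:nationality", "ex:Ghana"),
    ("ex:esi", "ex:worksFor", "ex:acme"), ("ex:esi", "ex:nationality", "ex:Ghana"),
    ("ex:yaw", "ex:worksFor", "ex:acme"), ("ex:yaw", "ex:nationality", "ex:Ghana"),
}
```

Running `aggregate_rule(DATA)` directly sees only one employee and infers nothing, whereas `aggregate_rule(close_non_aggregate(DATA))` counts all five and infers compliance. The count is only meaningful once the lower stratum is complete, which is the same reason negation-as-failure is evaluated after the rules it depends on.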
1. Introduction
Terminology from RDF Concepts, SPARQL Query
Namespaces - including shnex:, shrl:, sparql:, xsd:, rdf:
Test suite
2. Shape Rules
Informative section (that is, non-normative) explaining the main features
Basic outline, key features, not all features.
Audience: rules engineer
What are "rules"? A set of conditions on the shape of the data that infer new information from existing data.
Apply rules: rules+data -> new information
A rule set is a collection of rules. Unit of execution.
An execution is rule set + data (base data)
2.1 Structure of a Rule
Head-Body: match body (the "if") and generate new information ("triples")
Or as H is true when B
Body is a pattern, with value restriction.
2.2 Evaluation
Match body, get variables, use head as a template.
Makes new information available to other rules.
Execute until no change.
Operations - infer(), query()
A rules system can provide one or both of these operations.
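A minimal Python sketch of the evaluation loop outlined in 2.2: match the body, bind variables, instantiate the head, repeat until nothing changes. It is not the draft's definition. It assumes rules are (head, body) pairs of triple patterns with `?`-prefixed variables, that infer() returns only the newly derived triples, and that query() matches a pattern against base data plus inferences. The `:parent` predicate and the function signatures are hypothetical.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; None if impossible."""
    binding = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if binding.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return binding

def solutions(body, triples):
    """All variable bindings that satisfy every pattern in `body`."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for t in triples
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

def instantiate(head, binding):
    """Use the head as a template, substituting bound variables."""
    return {tuple(binding.get(term, term) for term in pattern) for pattern in head}

def infer(rules, data):
    """Run all rules until no change; return only the new triples."""
    known = set(data)
    while True:
        new = set()
        for head, body in rules:
            for b in solutions(body, known):
                new |= instantiate(head, b)
        new -= known
        if not new:
            return known - set(data)
        known |= new

def query(rules, data, body):
    """Answer a body pattern over base data plus inferred triples."""
    return solutions(body, set(data) | infer(rules, data))

RULES = [
    ([("?x", ":ancestor", "?y")], [("?x", ":parent", "?y")]),
    ([("?x", ":ancestor", "?z")], [("?x", ":parent", "?y"), ("?y", ":ancestor", "?z")]),
]
DATA = {(":a", ":parent", ":b"), (":b", ":parent", ":c")}
```

Here `infer(RULES, DATA)` yields the three `:ancestor` triples, and `query(RULES, DATA, [(":a", ":ancestor", "?who")])` binds `?who` to `:b` and `:c`. The loop is naive (it re-evaluates every rule each round); a real engine would likely use semi-naive evaluation.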
2.3 Rule Dependencies
Rules can depend on rules.
Including recursion.
:ancestor: :ancestor of :ancestor
2.4 Rule Set Stratification
"We say that rule1 depends on rule2 if ..."
2.5 Negation-as-failure
Shape Rules also supports
2.6 Assignment
and why it is beyond datalog
"safe assignment" - previous stratification - is it enough?
3. Shape Rules Abstract Syntax
Normative.
This section is the formal definition of rules and rule set execution.
3.1 Well-formedness Condition for a rule body
FILTERs, Assignments conditions so variables are defined before use.
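One possible reading of "defined before use", sketched in Python. The encoding of body elements as tagged tuples is hypothetical, and the extra condition that an assignment target must not already be bound is an assumption borrowed from how SPARQL treats BIND, not something this outline states.

```python
def check_body(body):
    """Return a list of problems; an empty list means the body passes this check.

    Body elements (hypothetical encoding):
      ("pattern", (s, p, o))          binds every ?variable it mentions
      ("filter", [vars used])         all used variables must already be bound
      ("assign", target, [vars used]) used variables bound, target not yet bound
    """
    bound, problems = set(), []
    for index, element in enumerate(body):
        kind = element[0]
        if kind == "pattern":
            bound |= {term for term in element[1] if term.startswith("?")}
        elif kind == "filter":
            missing = set(element[1]) - bound
            if missing:
                problems.append(f"element {index}: FILTER uses unbound {sorted(missing)}")
        elif kind == "assign":
            target, used = element[1], set(element[2])
            missing = used - bound
            if missing:
                problems.append(f"element {index}: assignment uses unbound {sorted(missing)}")
            if target in bound:
                problems.append(f"element {index}: {target} is already bound")
            bound.add(target)
    return problems

GOOD = [
    ("pattern", ("?c", ":staffCount", "?n")),
    ("pattern", ("?c", ":localCount", "?g")),
    ("assign", "?ratio", ["?g", "?n"]),
    ("filter", ["?ratio"]),
]
# Same elements, but the FILTER comes before the assignment it relies on.
BAD = [GOOD[0], GOOD[1], GOOD[3], GOOD[2]]
```

`check_body(GOOD)` returns no problems, while `check_body(BAD)` reports that the FILTER uses `?ratio` before it is bound.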
3.2 Dependency Relationship
Definition: rule A depends on rule B if ...
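The definition is still open in this outline. As a placeholder for discussion, the Python sketch below assumes the usual Datalog-style reading: rule A depends on rule B if a predicate in A's body can be produced by B's head. Rules are reduced to (name, head predicates, body predicates); all names are hypothetical.

```python
def depends_on(rules):
    """Map each rule name to the set of rule names it directly depends on."""
    return {
        name: {other for other, other_heads, _ in rules if other_heads & body}
        for name, _, body in rules
    }

def recursive_rules(rules):
    """Rules that can reach themselves by following dependencies."""
    direct = depends_on(rules)
    result = set()
    for start in direct:
        seen, stack = set(), list(direct[start])
        while stack:
            rule = stack.pop()
            if rule == start:
                result.add(start)
                break
            if rule not in seen:
                seen.add(rule)
                stack.extend(direct[rule])
    return result

RULES = [
    ("base", {":ancestor"}, {":parent"}),
    ("step", {":ancestor"}, {":parent", ":ancestor"}),
    ("elder", {":elder"}, {":ancestor"}),
]
```

Under this assumed definition, `step` depends on `base` and on itself (so it is recursive), and `elder` depends on both `:ancestor` rules without being recursive.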
3.3 Stratification
Definition
... and well-formedness conditions (NAF and recursion)
NB EXISTS - with body pattern (a limited graph pattern)
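For reference, the textbook way to assign strata in the presence of negation-as-failure can be sketched as follows. This is the standard Datalog construction, not text from the draft. Rules are reduced to (head predicate, positive body predicates, negated body predicates), and the predicate names are hypothetical.

```python
def stratify(rules):
    """Return {predicate: stratum}.

    A head must sit at least as high as its positive body predicates and
    strictly higher than its negated ones. Raises ValueError when negation
    occurs inside a recursive cycle, i.e. the rule set is not stratifiable.
    """
    preds = set()
    for head, pos, neg in rules:
        preds |= {head, *pos, *neg}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            needed = max([stratum[head]]
                         + [stratum[p] for p in pos]
                         + [stratum[n] + 1 for n in neg])
            if needed > stratum[head]:
                if needed >= len(preds):  # strata keep climbing: negative cycle
                    raise ValueError(f"negation through recursion at {head}")
                stratum[head] = needed
                changed = True
    return stratum

RULES = [
    (":ancestor", {":parent"}, set()),
    (":ancestor", {":parent", ":ancestor"}, set()),
    (":unrelated", {":person"}, {":ancestor"}),   # uses NOT EXISTS on :ancestor
]
```

Here `:ancestor` stays in stratum 0 and `:unrelated` lands in stratum 1, so the recursive rules finish before the negated test is evaluated. A pair such as `p :- not q` and `q :- not p` raises the error. Treating an aggregate over a predicate like a negated edge would give the aggregate ordering asked about in the comments, though whether that is sufficient for this document is still an open question.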
4. Concrete Syntax forms for Shape Rules
4.1 RDF Rules Syntax
4.2 Compact Rules Syntax
4.2.1 Compact Syntax Abbreviations
4.3 SPARQL function restrictions
No BOUND, FILTER (NOT) EXISTS (available as a pattern)
Explain pattern of EXISTS/NOT EXISTS is a body pattern
5. Shape Rules Evaluation
Define evaluation
Stratification.
Necessary if NAF.
6. Workspace named tuples
Possible addition. Tuples as space during execution. Avoids repeating pattern fragments.
Syntax:
TUPLE(name/string, varTerm1, varTerm2, ...) -- include this and maybe a short form.
&name(varTerm1, varTerm2, ...)
7. Attaching Rules to Shapes
Do (some) targets as patterns work for a definition?
Appendix A: Shape Rules Grammar
Appendix B: Relationship to SHACL-AF
Appendix C: Relationship to node expressions