Commit ae554d5

Merge pull request #98 from w3c/Issue95
Turn the input to Group(..) and to Aggregation(..) into solution sequences
2 parents 35a4a45 + 8e30456 commit ae554d5

1 file changed: +92 -72 lines changed

spec/index.html

Lines changed: 92 additions & 72 deletions
@@ -8745,7 +8745,11 @@ <h5>Grouping and Aggregation</h5>
 <p>Step: GROUP BY</p>
 <p>If the <code>GROUP BY</code> keyword is used, or there is implicit grouping due to the
 use of aggregates in the projection, then grouping is performed by the
-<a href="#defn_algGroup">Group</a> function. It divides the solution set into groups of one or
+<a href="#defn_algGroup">Group</a> function.
+In this case, before grouping, the solution set is converted into a solution
+sequence by applying the <a href="#defn_algToList">ToList</a> function.
+Next, the <a href="#defn_algGroup">Group</a> function
+divides this solution sequence into groups of one or
 more solutions, with the same overall cardinality. In case of implicit grouping, a fixed
 constant (1) is used to group all solutions into a single group.</p>
 <p>Step: Aggregates</p>
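The added wording converts the solution multiset into a sequence with ToList before grouping. Below is a minimal Python sketch of that conversion. It uses a hypothetical representation in which a multiset of solutions is a `collections.Counter` of hashable solutions. The definition leaves the order of the resulting sequence open, and `Counter.elements()` yields just one admissible order.

```python
from collections import Counter
from typing import Hashable, List


def to_list(omega: Counter) -> List[Hashable]:
    """Sketch of ToList: a sequence holding every element of the multiset
    as many times as its multiplicity, so the cardinality is unchanged.

    Counter.elements() repeats each element by its count, in first-insertion
    order; that order is only one of many the conversion allows.
    """
    return list(omega.elements())


# Hypothetical encoding: a solution is a hashable tuple of (variable, value) pairs.
omega = Counter([(("x", 1),), (("x", 2),), (("x", 1),)])
psi = to_list(omega)
```

Multiplicities survive the conversion, so `len(psi)` equals the multiset's cardinality (3 here).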
@@ -8765,9 +8769,9 @@ <h5>Grouping and Aggregation</h5>
 Let E := [], a list of pairs of the form (variable, expression)
 
 If Q contains GROUP BY exprlist
-Let G := Group(exprlist, P)
+Let G := Group(exprlist, ToList(P))
 Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
-Let G := Group((1), P)
+Let G := Group((1), ToList(P))
 Else
 skip the rest of the aggregate step
 End
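The updated step passes `ToList(P)` to Group in both the explicit and the implicit case. The sketch below shows only that dispatch and rests on several hypothetical assumptions:

- solutions are Python dicts;
- expressions are callables that take a solution;
- `None` means the query has no `GROUP BY`;
- `list(P)` plays the role of ToList(P).

```python
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Solution = Dict[str, Any]
Expr = Callable[[Solution], Any]


def group_arguments(
    P: Iterable[Solution],
    group_by: Optional[List[Expr]],
    has_aggregate: bool,
) -> Optional[Tuple[List[Expr], List[Solution]]]:
    """Return the (exprlist, sequence) pair handed to Group, or None when the
    rest of the aggregate step is skipped."""
    if group_by is not None:
        # Q contains GROUP BY exprlist: Let G := Group(exprlist, ToList(P))
        return group_by, list(P)
    if has_aggregate:
        # Aggregate in SELECT, HAVING or ORDER BY: Let G := Group((1), ToList(P));
        # the constant key puts every solution into a single group.
        return [lambda mu: 1], list(P)
    # Neither case applies: skip the rest of the aggregate step.
    return None
```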
@@ -9415,10 +9419,10 @@ <h4>Aggregate Algebra</h4>
 <div id="defn_algGroup">
 <b>Definition: Group</b>
 </div>
-<p>Group evaluates a list of expressions against a solution sequence, producing a set
+<p>Group evaluates a list of expressions against a solution sequence Ψ, producing a set
 of partial functions from keys to solution sequences.</p>
-<p>Group(exprlist, Ω) = { ListEval(exprlist, μ) → { μ' | μ' in Ω, ListEval(exprlist, μ)
-= ListEval(exprlist, μ') } | μ in Ω }</p>
+<p>Group(exprlist, Ψ) = { ListEval(exprlist, μ) → [ μ' | μ' in Ψ, ListEval(exprlist, μ)
+= ListEval(exprlist, μ') ] | μ in Ψ }</p>
 </div>
 <div class="defn">
 <p><b>Definition: ListEval</b></p>
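Because Ψ is now a sequence, each group in the new definition is written `[ μ' | ... ]`, a subsequence that keeps the order of Ψ. The sketch below follows that reading. It assumes solutions are dicts and expressions are callables, and it does not model expression errors: an exception simply propagates. Python dicts keep insertion order, so keys appear in order of first occurrence. The definition itself places no order on the keys.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

Solution = Dict[str, Any]
Expr = Callable[[Solution], Any]
Key = Tuple[Any, ...]


def list_eval(exprlist: Sequence[Expr], mu: Solution) -> Key:
    """Evaluate each expression against one solution; the tuple is the key."""
    return tuple(expr(mu) for expr in exprlist)


def group(exprlist: Sequence[Expr], psi: Sequence[Solution]) -> Dict[Key, List[Solution]]:
    """Map each key to the solutions of psi sharing it, in their original order.

    One pass over psi yields, for every key, exactly the subsequence
    [ mu' | mu' in psi, ListEval(exprlist, mu) = ListEval(exprlist, mu') ].
    """
    groups: Dict[Key, List[Solution]] = {}
    for mu in psi:
        groups.setdefault(list_eval(exprlist, mu), []).append(mu)
    return groups
```

Every solution lands in exactly one group, so the group sizes add up to the length of Ψ. This matches the "same overall cardinality" remark in the grouping step.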
@@ -9441,22 +9445,37 @@ <h4>Aggregate Algebra</h4>
 </div>
 <p>Let <i>exprlist</i> be a list of expressions or *, <i>func</i> a set function,
 <i>scalarvals</i> a set of partial functions (possibly empty) passed from the aggregate
-in the query, and let { key<sub>1</sub>→Ω<sub>1</sub>, ...,
-key<sub>m</sub>→Ω<sub>m</sub> } be a multiset of partial functions from keys to
+in the query, and let { key<sub>1</sub>→Ψ<sub>1</sub>, ...,
+key<sub>m</sub>→Ψ<sub>m</sub> } be a set of partial functions from keys to
 solution sequences as produced by the grouping step.</p>
-<p>Aggregation applies the set function func to the given multiset and produces a
-single value for each key and partition of solutions for that key.</p>
-<p>Aggregation(exprlist, func, scalarvals, { key<sub>1</sub>→Ω<sub>1</sub>, ...,
-key<sub>m</sub>→Ω<sub>m</sub> } )<br>
-&nbsp;&nbsp;&nbsp;= { (key, F(Ω)) | key → Ω in { key<sub>1</sub>→Ω<sub>1</sub>, ...,
-key<sub>m</sub>→Ω<sub>m</sub> } }</p>
+<p>Aggregation applies the set function func to the given set and produces a
+single value for each key and group of solutions for that key.</p>
+<p>Aggregation(exprlist, func, scalarvals, { key<sub>1</sub>→Ψ<sub>1</sub>, ...,
+key<sub>m</sub>→Ψ<sub>m</sub> } )<br>
+&nbsp;&nbsp;&nbsp;= { (key, F(Ψ)) | key → Ψ in { key<sub>1</sub>→Ψ<sub>1</sub>, ...,
+key<sub>m</sub>→Ψ<sub>m</sub> } }</p>
 <p>where<br>
-&nbsp;&nbsp;M(Ω) = { ListEval(exprlist, μ) | μ in Ω }<br>
-&nbsp;&nbsp;F(Ω) = func(M(Ω), scalarvals), for non-DISTINCT<br>
-&nbsp;&nbsp;F(Ω) = func(Distinct(M(Ω)), scalarvals), for DISTINCT</p>
+&nbsp;&nbsp;M(Ψ) = [ ListEval(exprlist, μ) | μ in Ψ ]<br>
+&nbsp;&nbsp;F(Ψ) = func(M(Ψ), scalarvals), for non-<code>DISTINCT</code><br>
+&nbsp;&nbsp;F(Ψ) = func(Dedup(M(Ψ)), scalarvals), for <code>DISTINCT</code></p>
+<p>with Dedup(M(Ψ)) being an order-preserving, duplicate-free version of the sequence M(Ψ); that is, Dedup(M(Ψ)) is a sequence of RDF terms that has the following four properties.</p>
+<ol>
+<li>Every unique element in M(Ψ) is contained in Dedup(M(Ψ)).</li>
+<li>Every element in Dedup(M(Ψ)) is contained in M(Ψ).</li>
+<li>Dedup(M(Ψ)) is free of duplicates. That is, the element at the |i|-th position in Dedup(M(Ψ)) is not the same term as the element at the |j|-th position in Dedup(M(Ψ)) for every two natural numbers |i| and |j| such that |i| &ne; |j|.</li>
+<li>For any two elements <var>e<sub>1</sub></var> and <var>e<sub>2</sub></var> in Dedup(M(Ψ)), the relative order of their first occurrences in M(Ψ) is preserved in Dedup(M(Ψ)). That is, if <var>i<sub>1</sub></var>&nbsp;&lt;&nbsp;<var>i<sub>2</sub></var>, then <var>j<sub>1</sub></var>&nbsp;&lt;&nbsp;<var>j<sub>2</sub></var>, where
+<ul>
+<li><var>i<sub>1</sub></var> is the smallest natural number such that <var>e<sub>1</sub></var> is at the <var>i<sub>1</sub></var>-th position in M(Ψ),</li>
+<li><var>i<sub>2</sub></var> is the smallest natural number such that <var>e<sub>2</sub></var> is at the <var>i<sub>2</sub></var>-th position in M(Ψ),</li>
+<li><var>j<sub>1</sub></var> is the position of <var>e<sub>1</sub></var> in Dedup(M(Ψ)), and</li>
+<li><var>j<sub>2</sub></var> is the position of <var>e<sub>2</sub></var> in Dedup(M(Ψ)).</li>
+</ul>
+</li>
+</ol>
+
 <p><b>Special Case:</b> when <code>COUNT</code> is used with the expression
 <code>*</code> the value of F will be the cardinality of the group solution sequence,
-<code>card[Ω]</code>, or <code>card[Distinct(Ω)]</code> if the <code>DISTINCT</code>
+<code>card[Ψ]</code>, or <code>card[Dedup(Ψ)]</code> if the <code>DISTINCT</code>
 keyword is present.</p>
 </div>
 <p><i>scalarvals</i> are used to pass values to the underlying set function, bypassing
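Taken together, the four properties of Dedup amount to keeping each element at its first occurrence and dropping later repeats. The sketch below implements Dedup that way and uses it inside Aggregation. It makes three assumptions:

- groups arrive as a dict from key to a list of solutions;
- `func` is any Python callable taking `(sequence, scalarvals)`;
- Python equality on hashable values stands in for "the same term".

The last point is a simplification. Python treats `1` and `1.0` as equal, whereas `"1"^^xsd:integer` and `"1.0"^^xsd:decimal` are different RDF terms.

```python
from typing import Any, Callable, Dict, Hashable, List, Sequence, Tuple

Solution = Dict[str, Any]
Expr = Callable[[Solution], Any]
Key = Tuple[Any, ...]


def dedup(seq: Sequence[Hashable]) -> List[Hashable]:
    """Order-preserving, duplicate-free copy of seq: each element is kept at
    its first occurrence, so first occurrences keep their relative order."""
    seen = set()
    out = []
    for item in seq:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


def aggregation(
    exprlist: Sequence[Expr],
    func: Callable[[List[Any], Dict[str, Any]], Any],
    scalarvals: Dict[str, Any],
    groups: Dict[Key, List[Solution]],
    distinct: bool = False,
) -> Dict[Key, Any]:
    """Apply func to M(Psi), or to Dedup(M(Psi)) for DISTINCT, once per key."""
    result = {}
    for key, psi in groups.items():
        m = [tuple(expr(mu) for expr in exprlist) for mu in psi]      # M(Psi)
        result[key] = func(dedup(m) if distinct else m, scalarvals)   # F(Psi)
    return result
```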
@@ -9466,7 +9485,7 @@ <h4>Aggregate Algebra</h4>
 <p>All aggregates may have the <code>DISTINCT</code> keyword as the first token in their
 argument list. If this keyword is present then first argument to func is Distinct(M).</p>
 <p>Example</p>
-<p>Given a solution multiset (Ω) with the following values:</p>
+<p>Given a solution sequence Ψ with the following values:</p>
 <table>
 <tbody>
 <tr>
@@ -9497,10 +9516,10 @@ <h4>Aggregate Algebra</h4>
 </table>
 <p>And the query expression SELECT (ex:agg(?y, ?z) AS ?agg) WHERE { ?x ?y ?z } GROUP BY
 ?x.</p>
-<p>We produce G = Group((?x), Ω) = { ( (1), { μ<sub>1</sub>, μ<sub>2</sub> } ), ( (2), {
-μ<sub>3</sub> } ) }</p>
+<p>We produce G = Group((?x), Ψ) = { (1) → [μ<sub>1</sub>, μ<sub>2</sub>], (2) →
+[μ<sub>3</sub>] }</p>
 <p>And so Aggregation((?y, ?z), ex:agg, {}, G) =<br>
-{ ((1), eg:agg({(2, 3), (3, 4)}, {})), ((2), eg:agg({(5, 6)}, {})) }.</p>
+{ ((1), eg:agg([(2, 3), (3, 4)], {})), ((2), eg:agg([(5, 6)], {})) }.</p>
 <div class="defn">
 <p><b>Definition: AggregateJoin</b></p>
 <p>Let S<sub>1</sub>, ..., S<sub>n</sub> be a list of sets, where each set
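The example can be replayed in Python. The hunk does not show the rows of the input table, so the three solutions below are read back from the G and Aggregation results given in the text. For example, μ1 binds ?x=1, ?y=2 and ?z=3. `ex_agg` is a hypothetical stand-in for ex:agg that returns its argument unchanged, which exposes the sequence each group passes to the set function.

```python
from typing import Any, Dict, List, Tuple

mu1 = {"x": 1, "y": 2, "z": 3}
mu2 = {"x": 1, "y": 3, "z": 4}
mu3 = {"x": 2, "y": 5, "z": 6}
psi = [mu1, mu2, mu3]

# G = Group((?x), Psi): each key maps to a subsequence of psi, in psi's order.
G: Dict[Tuple[Any, ...], List[Dict[str, int]]] = {}
for mu in psi:
    G.setdefault((mu["x"],), []).append(mu)


def ex_agg(m: List[Tuple[int, int]], scalarvals: Dict[str, Any]) -> List[Tuple[int, int]]:
    """Hypothetical ex:agg that just returns the sequence it receives."""
    return m


# Aggregation((?y, ?z), ex:agg, {}, G): M(Psi) lists (?y, ?z) per solution, in order.
result = {key: ex_agg([(mu["y"], mu["z"]) for mu in grp], {}) for key, grp in G.items()}
```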
@@ -9511,24 +9530,24 @@ <h4>Aggregate Algebra</h4>
 ..., agg<sub>n</sub>→val<sub>n</sub> | key in K and key→val<sub>i</sub> in
 S<sub>i</sub> for each 1 &lt;= i &lt;= n }</p>
 </div>
-<p>Flatten is a function which is used to collapse multisets of lists into a multiset, so
-for example { (1, 2), (3, 4) } becomes { 1, 2, 3, 4 }.</p>
+<p>Flatten is a function which is used to collapse a sequence of lists into a single list.
+For example, [(1,&nbsp;2), (3,&nbsp;4)] becomes (1, 2, 3, 4).</p>
 <div class="defn">
 <p><b>Definition: Flatten</b></p>
-<p>The Flatten(M) function takes a multiset of lists, M {(L<sub>1</sub>, L<sub>2</sub>,
-...), ...}, and returns the multiset { x | L in M and x in L }.</p>
+<p>The Flatten(S) function takes a sequence of lists, S = [(L<sub>1</sub>, L<sub>2</sub>,
+...), ...], and returns the list ( x | L in S and x in L ).</p>
 </div>
 <section id="setFunctions">
 <h5>Set Functions</h5>
 <p>The set functions which underlie SPARQL aggregates all have a common signature:
-SetFunc(M), or SetFunc(M, scalarvals) where M is a multiset of lists, and scalarvals is
+SetFunc(S), or SetFunc(S, scalarvals) where S is a sequence of lists, and scalarvals is
 one or more scalar values that are passed to the set function indirectly via the ( ...
 ; key=value ) syntax for aggregates in the SPARQL grammar. The only use of this that is
 supported by the built-in aggregates in SPARQL Query 1.1 is <code>GROUP_CONCAT</code>,
 as in <code>GROUP_CONCAT(?x ; separator=", ")</code>.</p>
 <p>Note that the name "Set Function" is somewhat historical — the arguments to set
-functions are in fact multisets. The name is retained due to the commonality with SQL
-Set Functions, which also operate over multisets.</p>
+functions are in fact sequences. The name is retained due to the commonality with SQL
+Set Functions, which operate over multisets.</p>
 <p>The set functions defined in this document are Count, Sum, Min, Max, Avg,
 GroupConcat, and Sample — corresponding to the aggregates <code>COUNT</code>,
 <code>SUM</code>, <code>MIN</code>, <code>MAX</code>, <code>AVG</code>,
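Flatten now concatenates a sequence of lists in order. The following one-function Python sketch uses `itertools.chain.from_iterable`, with Python lists and tuples standing in for the spec's `[...]` sequences and `(...)` lists:

```python
from itertools import chain
from typing import Any, Iterable, List


def flatten(s: Iterable[Iterable[Any]]) -> List[Any]:
    """Concatenate the lists of s, keeping both the order of the lists and
    the order of the items inside each list."""
    return list(chain.from_iterable(s))
```

For instance, `flatten([(1, 2), (3, 4)])` gives `[1, 2, 3, 4]`, matching the example in the text.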
@@ -9546,10 +9565,10 @@ <h5>Count</h5>
 has a bound, non-error value within the aggregate group.</p>
 <div class="defn">
 <p><b>Definition: <span id="defn_aggCount">Count</span></b></p>
-<pre class="code nohighlight">xsd:integer Count(multiset M)</pre>
-<p>N = Flatten(M)</p>
-<p>remove error elements from N</p>
-<p>Count(M) = card[N]</p>
+<pre class="code nohighlight">xsd:integer Count(sequence S)</pre>
+<p>L = Flatten(S)</p>
+<p>remove error elements from L</p>
+<p>Count(S) = card[L]</p>
 </div>
 </section>
 <section id="aggSum">
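Count flattens its argument, drops error values and returns the length of what remains. In the sketch below, `ERROR` is a hypothetical sentinel for an expression that failed to evaluate. In SPARQL an unbound variable also evaluates to an error, so it would be dropped in the same way, which matches the "bound, non-error value" wording above.

```python
from itertools import chain
from typing import Any, Iterable


class EvalError:
    """Hypothetical marker for an expression evaluation error."""


ERROR = EvalError()


def count(s: Iterable[Iterable[Any]]) -> int:
    flat = chain.from_iterable(s)                              # L = Flatten(S)
    kept = [x for x in flat if not isinstance(x, EvalError)]   # remove error elements
    return len(kept)                                           # card[L]
```

Duplicates are counted individually, since no Dedup is applied without `DISTINCT`.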
@@ -9561,13 +9580,14 @@ <h5>Sum</h5>
 be 6.0 (float).</p>
 <div class="defn">
 <p><b>Definition: <span id="defn_aggSum">Sum</span></b></p>
-<pre class="code nohighlight">numeric Sum(multiset M)</pre>
-<p>Sum(M) = Sum(ToList(Flatten(M))).</p>
-<p>Sum(S) = op:numeric-add(S<sub>1</sub>, Sum(S<sub>2..n</sub>)) when card[S] &gt;
+<pre class="code nohighlight">numeric Sum(sequence S)</pre>
+<p>L = Flatten(S)</p>
+<p>Sum(S) = Sum(L)</p>
+<p>Sum(L) = op:numeric-add(L<sub>1</sub>, Sum(L<sub>2..n</sub>)) when card[L] &gt;
 1<br>
-Sum(S) = op:numeric-add(S<sub>1</sub>, 0) when card[S] = 1<br>
-Sum(S) = "0"^^xsd:integer when card[S] = 0</p>
-<p>In this way, Sum({1, 2, 3}) = op:numeric-add(1, op:numeric-add(2,
+Sum(L) = op:numeric-add(L<sub>1</sub>, 0) when card[L] = 1<br>
+Sum(L) = "0"^^xsd:integer when card[L] = 0</p>
+<p>In this way, Sum( (1, 2, 3) ) = op:numeric-add(1, op:numeric-add(2,
 op:numeric-add(3, 0))).</p>
 </div>
 </section>
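The recursive Sum definition can be transcribed directly. In this sketch Python's `+` is only a stand-in for op:numeric-add. It does not reproduce XSD numeric type promotion, and plain `0` replaces the typed literal `"0"^^xsd:integer`.

```python
from itertools import chain
from typing import Any, Iterable, List


def numeric_add(a: Any, b: Any) -> Any:
    """Stand-in for op:numeric-add using Python arithmetic."""
    return a + b


def sum_list(l: List[Any]) -> Any:
    if len(l) == 0:
        return 0                                 # "0"^^xsd:integer
    if len(l) == 1:
        return numeric_add(l[0], 0)
    return numeric_add(l[0], sum_list(l[1:]))    # right-nested, as in the definition


def sum_seq(s: Iterable[Iterable[Any]]) -> Any:
    return sum_list(list(chain.from_iterable(s)))  # Sum(S) = Sum(Flatten(S))
```

`sum_seq([(1,), (2,), (3,)])` evaluates `numeric_add(1, numeric_add(2, numeric_add(3, 0)))`, the same nesting as the example in the text.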
@@ -9577,11 +9597,11 @@ <h5>Avg</h5>
 average value for an expression over a group. It is defined in terms of Sum and Count.
 <div class="defn">
 <p><b>Definition: <span id="defn_aggAvg">Avg</span></b></p>
-<pre class="code nohighlight">numeric Avg(multiset M)</pre>
-<p>Avg(M) = "0"^^xsd:integer, where Count(M) = 0</p>
-<p>Avg(M) = Sum(M) / Count(M), where Count(M) &gt; 0</p>
+<pre class="code nohighlight">numeric Avg(sequence S)</pre>
+<p>Avg(S) = "0"^^xsd:integer, where Count(S) = 0</p>
+<p>Avg(S) = Sum(S) / Count(S), where Count(S) &gt; 0</p>
 </div>
-<p>For example, Avg({1, 2, 3}) = Sum({1, 2, 3})/Count({1, 2, 3}) = 6/3 = 2.</p>
+<p>For example, Avg([(1), (2), (3)]) = Sum([(1), (2), (3)])/Count([(1), (2), (3)]) = 6/3 = 2.</p>
 </section>
 <section id="aggMin">
 <h5>Min</h5>
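Avg is defined in terms of Sum and Count. The sketch below assumes integer or rational inputs and uses `fractions.Fraction` so the division stays exact. That is a hypothetical substitute for SPARQL numeric division, not a model of it. Error values are not modelled, so Count reduces to the length of the flattened sequence.

```python
from fractions import Fraction
from itertools import chain
from typing import Any, Iterable, Union


def avg(s: Iterable[Iterable[Any]]) -> Union[int, Fraction]:
    l = list(chain.from_iterable(s))
    if len(l) == 0:
        return 0                        # Avg(S) = "0"^^xsd:integer when Count(S) = 0
    return Fraction(sum(l)) / len(l)    # Sum(S) / Count(S)
```

With this sketch, `avg([(1,), (2,), (3,)])` reproduces the 6/3 = 2 example, and an empty input yields 0 instead of a division error.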
@@ -9591,12 +9611,12 @@ <h5>Min</h5>
 arbitrarily typed expressions.</p>
 <div class="defn">
 <p><b>Definition: <span id="defn_aggMin">Min</span></b></p>
-<pre class="code nohighlight">term Min(multiset M)</pre>
-<p>Min(M) = Min(ToList(Flatten(M)))</p>
-<p>Min({}) = error.</p>
-<p>The flattened multiset of values passed as an argument is converted to a sequence
-S, this sequence is ordered as per the <code>ORDER BY ASC</code> clause.</p>
-<p>Min(S) = S<sub>0</sub></p>
+<pre class="code nohighlight">term Min(sequence S)</pre>
+<p>L = Flatten(S)</p>
+<p>Min(S) = Min(L)</p>
+<p>The flattened list L of values is ordered as per the <code>ORDER BY ASC</code> clause.</p>
+<p>Min(L) = L<sub>0</sub> if card[L] > 0<br>
+Min(L) = error if card[L] = 0</p>
 </div>
 </section>
 <section id="aggMax">
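Min sorts the flattened list in ascending order, takes the first element, and is an error on an empty list. In this sketch Python's `sorted` stands in for `ORDER BY ASC`, which assumes all values are mutually comparable Python objects. The SPARQL ordering of mixed term kinds (blank nodes, IRIs, literals) is not modelled, and a `ValueError` stands for the error case.

```python
from itertools import chain
from typing import Any, Iterable


def sparql_min(s: Iterable[Iterable[Any]]) -> Any:
    l = sorted(chain.from_iterable(s))   # L = Flatten(S), ordered as by ORDER BY ASC
    if not l:
        raise ValueError("Min of an empty sequence is an error")
    return l[0]                          # Min(L) = L_0
```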
@@ -9607,12 +9627,12 @@ <h5>Max</h5>
 arbitrarily typed expressions.</p>
 <div class="defn">
 <p><b>Definition: <span id="defn_aggMax">Max</span></b></p>
-<pre class="code nohighlight">term Max(multiset M)</pre>
-<p>Max(M) = Max(ToList(Flatten(M)))</p>
-<p>Max({}) = error.</p>
-<p>The multiset of values passed as an argument is converted to a sequence S, this
-sequence is ordered as per the <code>ORDER BY DESC</code> clause.</p>
-<p>Max(S) = S<sub>0</sub></p>
+<pre class="code nohighlight">term Max(sequence S)</pre>
+<p>L = Flatten(S)</p>
+<p>Max(S) = Max(L)</p>
+<p>The flattened list L of values is ordered as per the <code>ORDER BY DESC</code> clause.</p>
+<p>Max(L) = L<sub>0</sub> if card[L] > 0<br>
+Max(L) = error if card[L] = 0</p>
 </div>
 </section>
 <section id="aggGroupConcat">
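Max mirrors Min with a descending sort. The sketch makes the same hypothetical assumptions: Python ordering stands in for `ORDER BY DESC`, and a `ValueError` stands for the error case. Python compares strings by code point, so for example `"pear"` is the maximum of `"apple"`, `"fig"` and `"pear"`.

```python
from itertools import chain
from typing import Any, Iterable


def sparql_max(s: Iterable[Iterable[Any]]) -> Any:
    l = sorted(chain.from_iterable(s), reverse=True)  # ordered as by ORDER BY DESC
    if not l:
        raise ValueError("Max of an empty sequence is an error")
    return l[0]                                       # Max(L) = L_0
```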
@@ -9623,33 +9643,33 @@ <h5>GroupConcat</h5>
 SEPARATOR.</p>
 <div class="defn">
 <p><b>Definition: <span id="defn_aggGroupConcat">GroupConcat</span></b></p>
-<pre class="code nohighlight">literal GroupConcat(multiset M)</pre>
+<pre class="code nohighlight">literal GroupConcat(sequence S)</pre>
 <p>If the "separator" scalar argument is absent from GROUP_CONCAT then it is taken to
 be the "space" character, unicode codepoint U+0020.</p>
-<p>The multiset of values, M passed as an argument is converted to a sequence S.</p>
-<p>GroupConcat(M, scalarvals) = GroupConcat(Flatten(M), scalarvals("separator"))</p>
-<p>GroupConcat(S, sep) = "", where <span style=
-"font-size: 140%">|</span>S<span style="font-size: 140%">|</span> = 0</p>
-<p>GroupConcat(S, sep) = CONCAT("", S<sub>0</sub>), where
-<span style="font-size: 140%">|</span>S<span style="font-size: 140%">|</span> = 1</p>
-<p>GroupConcat(S, sep) = CONCAT(S<sub>0</sub>, sep, GroupConcat(S<sub>1..n-1</sub>,
-sep)), where <span style="font-size: 140%">|</span>S<span style="font-size: 140%">|</span> &gt; 1</p>
-</div>
-<p>For example, GroupConcat({"a", "b", "c"}, {"separator" → "."}) = "a.b.c".</p>
+<p>L = Flatten(S)</p>
+<p>GroupConcat(S, scalarvals) = GroupConcat(L, scalarvals("separator"))</p>
+<p>GroupConcat(L, sep) = "", where <span style=
+"font-size: 140%">|</span>L<span style="font-size: 140%">|</span> = 0</p>
+<p>GroupConcat(L, sep) = CONCAT("", L<sub>0</sub>), where
+<span style="font-size: 140%">|</span>L<span style="font-size: 140%">|</span> = 1</p>
+<p>GroupConcat(L, sep) = CONCAT(L<sub>0</sub>, sep, GroupConcat(L<sub>1..n-1</sub>,
+sep)), where <span style="font-size: 140%">|</span>L<span style="font-size: 140%">|</span> &gt; 1</p>
+</div>
+<p>For example, GroupConcat([("a"), ("b"), ("c")], {"separator" → "."}) = "a.b.c".</p>
 </section>
 <section id="aggSample">
 <h5>Sample</h5>
-<p>Sample is a set function which returns an arbitrary value from the multiset passed
+<p>Sample is a set function which returns an arbitrary value from the sequence passed
 to it.</p>
 <div class="defn">
 <p><b>Definition: <span id="defn_aggSample">Sample</span></b></p>
-<pre class="code nohighlight">RDFTerm Sample(multiset M)</pre>
-<p>Sample(M) = v, where v in Flatten(M)</p>
-<p>Sample({}) = error</p>
+<pre class="code nohighlight">RDFTerm Sample(sequence S)</pre>
+<p>Sample(S) = v, where v in Flatten(S)</p>
+<p>Sample([]) = error</p>
 </div>
-<p>For example, given Sample({"a", "b", "c"}), "a", "b", and "c" are all valid return
+<p>For example, given Sample([("a"), ("b"), ("c")]), "a", "b", and "c" are all valid return
 values. Note that Sample() is not required to be deterministic for a given input, the
-only restriction is that the output value must be present in the input multiset.</p>
+only restriction is that the output value must be present in the input sequence.</p>
 </section>
 </section>
 <section id="sparqlAlgebraEval">
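GroupConcat and Sample can both be sketched on top of the flattened list. The sketch makes these hypothetical assumptions:

- `scalarvals` is a Python dict;
- `str()` stands in for taking the string value of each term, as CONCAT would;
- the recursion mirrors the three cases of the definition;
- Sample draws with `random.choice`, since any element is an allowed answer;
- a `ValueError` stands for the error on an empty input.

```python
import random
from itertools import chain
from typing import Any, Dict, Iterable, List, Optional


def _group_concat(l: List[Any], sep: str) -> str:
    if len(l) == 0:
        return ""
    if len(l) == 1:
        return "" + str(l[0])                          # CONCAT("", L_0)
    return str(l[0]) + sep + _group_concat(l[1:], sep)  # CONCAT(L_0, sep, ...)


def group_concat(s: Iterable[Iterable[Any]],
                 scalarvals: Optional[Dict[str, str]] = None) -> str:
    # The separator defaults to a single space (U+0020) when absent.
    sep = (scalarvals or {}).get("separator", " ")
    return _group_concat(list(chain.from_iterable(s)), sep)


def sample(s: Iterable[Iterable[Any]]) -> Any:
    l = list(chain.from_iterable(s))
    if not l:
        raise ValueError("Sample of an empty sequence is an error")
    return random.choice(l)  # any element of Flatten(S) is a valid result
```

Because the input is now a sequence, `group_concat` keeps the input order, so `group_concat([("a",), ("b",), ("c",)], {"separator": "."})` yields `"a.b.c"`.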
