@@ -8745,7 +8745,11 @@ <h5>Grouping and Aggregation</h5>
8745
8745
<p>Step: GROUP BY</p>
8746
8746
<p>If the <code>GROUP BY</code> keyword is used, or there is implicit grouping due to the
8747
8747
use of aggregates in the projection, then grouping is performed by the
8748
- <a href="#defn_algGroup">Group</a> function. It divides the solution set into groups of one or
8748
+ <a href="#defn_algGroup">Group</a> function.
8749
+ In this case, before grouping, the solution set is converted into a solution
8750
+ sequence by applying the <a href="#defn_algToList">ToList</a> function.
8751
+ Next, the <a href="#defn_algGroup">Group</a> function
8752
+ divides this solution sequence into groups of one or
8749
8753
more solutions, with the same overall cardinality. In case of implicit grouping, a fixed
8750
8754
constant (1) is used to group all solutions into a single group.</p>
8751
8755
<p>Step: Aggregates</p>
@@ -8765,9 +8769,9 @@ <h5>Grouping and Aggregation</h5>
8765
8769
Let E := [], a list of pairs of the form (variable, expression)
8766
8770
8767
8771
If Q contains GROUP BY exprlist
8768
- Let G := Group(exprlist, P )
8772
+ Let G := Group(exprlist, ToList(P) )
8769
8773
Else If Q contains an aggregate in SELECT, HAVING, ORDER BY
8770
- Let G := Group((1), P )
8774
+ Let G := Group((1), ToList(P) )
8771
8775
Else
8772
8776
skip the rest of the aggregate step
8773
8777
End
@@ -9415,10 +9419,10 @@ <h4>Aggregate Algebra</h4>
9415
9419
<div id="defn_algGroup">
9416
9420
<b>Definition: Group</b>
9417
9421
</div>
9418
- <p>Group evaluates a list of expressions against a solution sequence, producing a set
9422
+ <p>Group evaluates a list of expressions against a solution sequence Ψ , producing a set
9419
9423
of partial functions from keys to solution sequences.</p>
9420
- <p>Group(exprlist, Ω ) = { ListEval(exprlist, μ) → { μ' | μ' in Ω , ListEval(exprlist, μ)
9421
- = ListEval(exprlist, μ') } | μ in Ω }</p>
9424
+ <p>Group(exprlist, Ψ ) = { ListEval(exprlist, μ) → [ μ' | μ' in Ψ , ListEval(exprlist, μ)
9425
+ = ListEval(exprlist, μ') ] | μ in Ψ }</p>
9422
9426
</div>
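The revised Group definition can be read as "bucket the solutions of Ψ by their ListEval key, keeping each bucket as a list in input order". A minimal Python sketch of that reading follows. The names `Solution`, `Expr`, `list_eval` and `group` are illustrative assumptions, expressions are modelled as plain Python callables, and error handling in ListEval is not modelled.

```python
from typing import Callable, Dict, List, Tuple

Solution = Dict[str, object]          # a solution mapping: variable name -> value
Expr = Callable[[Solution], object]   # hypothetical stand-in for a SPARQL expression


def list_eval(exprlist: List[Expr], mu: Solution) -> Tuple[object, ...]:
    """Evaluate every expression against one solution; the tuple acts as the key."""
    return tuple(expr(mu) for expr in exprlist)


def group(exprlist: List[Expr], psi: List[Solution]) -> Dict[Tuple[object, ...], List[Solution]]:
    """Map each key to the list of solutions in psi that share it, in input order."""
    groups: Dict[Tuple[object, ...], List[Solution]] = {}
    for mu in psi:
        groups.setdefault(list_eval(exprlist, mu), []).append(mu)
    return groups


# Illustrative input: three solutions grouped by ?x.
psi = [
    {"x": 1, "y": 2, "z": 3},
    {"x": 1, "y": 3, "z": 4},
    {"x": 2, "y": 5, "z": 6},
]
g = group([lambda mu: mu["x"]], psi)
```

Using a dictionary of lists mirrors the change from { ... } to [ ... ] in the definition: the groups themselves are unordered, but each group keeps the order of Ψ.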
9423
9427
<div class="defn">
9424
9428
<p><b>Definition: ListEval</b></p>
@@ -9441,22 +9445,37 @@ <h4>Aggregate Algebra</h4>
9441
9445
</div>
9442
9446
<p>Let <i>exprlist</i> be a list of expressions or *, <i>func</i> a set function,
9443
9447
<i>scalarvals</i> a set of partial functions (possibly empty) passed from the aggregate
9444
- in the query, and let { key<sub>1</sub>→Ω <sub>1</sub>, ...,
9445
- key<sub>m</sub>→Ω <sub>m</sub> } be a multiset of partial functions from keys to
9448
+ in the query, and let { key<sub>1</sub>→Ψ <sub>1</sub>, ...,
9449
+ key<sub>m</sub>→Ψ <sub>m</sub> } be a set of partial functions from keys to
9446
9450
solution sequences as produced by the grouping step.</p>
9447
- <p>Aggregation applies the set function func to the given multiset and produces a
9448
- single value for each key and partition of solutions for that key.</p>
9449
- <p>Aggregation(exprlist, func, scalarvals, { key<sub>1</sub>→Ω <sub>1</sub>, ...,
9450
- key<sub>m</sub>→Ω <sub>m</sub> } )<br>
9451
- = { (key, F(Ω )) | key → Ω in { key<sub>1</sub>→Ω <sub>1</sub>, ...,
9452
- key<sub>m</sub>→Ω <sub>m</sub> } }</p>
9451
+ <p>Aggregation applies the set function func to the given set and produces a
9452
+ single value for each key and group of solutions for that key.</p>
9453
+ <p>Aggregation(exprlist, func, scalarvals, { key<sub>1</sub>→Ψ <sub>1</sub>, ...,
9454
+ key<sub>m</sub>→Ψ <sub>m</sub> } )<br>
9455
+ = { (key, F(Ψ )) | key → Ψ in { key<sub>1</sub>→Ψ <sub>1</sub>, ...,
9456
+ key<sub>m</sub>→Ψ <sub>m</sub> } }</p>
9453
9457
<p>where<br>
9454
- M(Ω) = { ListEval(exprlist, μ) | μ in Ω }<br>
9455
- F(Ω) = func(M(Ω), scalarvals), for non-DISTINCT<br>
9456
- F(Ω) = func(Distinct(M(Ω)), scalarvals), for DISTINCT</p>
9458
+ M(Ψ) = [ ListEval(exprlist, μ) | μ in Ψ ]<br>
9459
+ F(Ψ) = func(M(Ψ), scalarvals), for non-<code>DISTINCT</code><br>
9460
+ F(Ψ) = func(Dedup(M(Ψ)), scalarvals), for <code>DISTINCT</code></p>
9461
+ <p>with Dedup(M(Ψ)) being an order-preserving, duplicate-free version of the sequence M(Ψ); that is, Dedup(M(Ψ)) is a sequence of RDF terms that has the following four properties.</p>
9462
+ <ol>
9463
+ <li>Every unique element in M(Ψ) is contained in Dedup(M(Ψ)).</li>
9464
+ <li>Every element in Dedup(M(Ψ)) is contained in M(Ψ).</li>
9465
+ <li>Dedup(M(Ψ)) is free of duplicates. That is, the element at the <var>i</var>-th position in Dedup(M(Ψ)) is not the same term as the element at the <var>j</var>-th position in Dedup(M(Ψ)) for every two natural numbers <var>i</var> and <var>j</var> such that <var>i</var> ≠ <var>j</var>.</li>
9466
+ <li>For any two elements <var>e<sub>1</sub></var> and <var>e<sub>2</sub></var> in Dedup(M(Ψ)), the relative order of their first occurrences in M(Ψ) is preserved in Dedup(M(Ψ)). That is, if <var>i<sub>1</sub></var> &lt; <var>i<sub>2</sub></var>, then <var>j<sub>1</sub></var> &lt; <var>j<sub>2</sub></var>, where
9467
+ <ul>
9468
+ <li><var>i<sub>1</sub></var> is the smallest natural number such that <var>e<sub>1</sub></var> is at the <var>i<sub>1</sub></var>-th position in M(Ψ),</li>
9469
+ <li><var>i<sub>2</sub></var> is the smallest natural number such that <var>e<sub>2</sub></var> is at the <var>i<sub>2</sub></var>-th position in M(Ψ),</li>
9470
+ <li><var>j<sub>1</sub></var> is the position of <var>e<sub>1</sub></var> in Dedup(M(Ψ)), and</li>
9471
+ <li><var>j<sub>2</sub></var> is the position of <var>e<sub>2</sub></var> in Dedup(M(Ψ)).</li>
9472
+ </ul>
9473
+ </li>
9474
+ </ol>
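The four properties listed for Dedup describe what is often called "first-occurrence deduplication". A minimal Python sketch is given here, assuming that Python equality and hashing can stand in for "the same RDF term"; the function name `dedup` and the sample data are illustrative only.

```python
from typing import Hashable, List, Sequence


def dedup(m: Sequence[Hashable]) -> List[Hashable]:
    """Return the elements of m without duplicates, ordered by first occurrence."""
    seen = set()
    result = []
    for term in m:
        if term not in seen:   # assumption: Python equality models RDF term sameness
            seen.add(term)
            result.append(term)
    return result


m = [("b",), ("a",), ("b",), ("c",), ("a",)]
d = dedup(m)
```

Each of the four properties can then be checked directly against `m` and `d`: same elements in both directions, no repeats in `d`, and first-occurrence positions in `m` that increase along `d`.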
9475
+
9457
9476
<p><b>Special Case:</b> when <code>COUNT</code> is used with the expression
9458
9477
<code>*</code> the value of F will be the cardinality of the group solution sequence,
9459
- <code>card[Ω ]</code>, or <code>card[Distinct(Ω )]</code> if the <code>DISTINCT</code>
9478
+ <code>card[Ψ ]</code>, or <code>card[Dedup(Ψ )]</code> if the <code>DISTINCT</code>
9460
9479
keyword is present.</p>
9461
9480
</div>
9462
9481
<p><i>scalarvals</i> are used to pass values to the underlying set function, bypassing
@@ -9466,7 +9485,7 @@ <h4>Aggregate Algebra</h4>
9466
9485
<p>All aggregates may have the <code>DISTINCT</code> keyword as the first token in their
9467
9486
argument list. If this keyword is present then first argument to func is Distinct(M).</p>
9468
9487
<p>Example</p>
9469
- <p>Given a solution multiset (Ω) with the following values:</p>
9488
+ <p>Given a solution sequence Ψ with the following values:</p>
9470
9489
<table>
9471
9490
<tbody>
9472
9491
<tr>
@@ -9497,10 +9516,10 @@ <h4>Aggregate Algebra</h4>
9497
9516
</table>
9498
9517
<p>And the query expression SELECT (ex:agg(?y, ?z) AS ?agg) WHERE { ?x ?y ?z } GROUP BY
9499
9518
?x.</p>
9500
- <p>We produce G = Group((?x), Ω ) = { ( (1), { μ<sub>1</sub>, μ<sub>2</sub> } ) , ( (2), {
9501
- μ<sub>3</sub> } ) }</p>
9519
+ <p>We produce G = Group((?x), Ψ ) = { (1) → [ μ<sub>1</sub>, μ<sub>2</sub>] , (2) →
9520
+ [ μ<sub>3</sub>] }</p>
9502
9521
<p>And so Aggregation((?y, ?z), ex:agg, {}, G) =<br>
9503
- { ((1), eg:agg({ (2, 3), (3, 4)} , {})), ((2), eg:agg({ (5, 6)} , {})) }.</p>
9522
+ { ((1), ex:agg([ (2, 3), (3, 4)] , {})), ((2), ex:agg([ (5, 6)] , {})) }.</p>
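The worked example can be replayed in a few lines of Python. This is only a sketch: the bindings of μ1, μ2 and μ3 are inferred from the keys and lists shown in the example (the table rows are not part of this hunk), `ex_agg` is a hypothetical stand-in for the extension aggregate that simply returns what it receives, and the DISTINCT case is left out.

```python
from typing import Dict, List, Tuple

Solution = Dict[str, int]

# Bindings inferred from the example's results (assumption).
mu1 = {"x": 1, "y": 2, "z": 3}
mu2 = {"x": 1, "y": 3, "z": 4}
mu3 = {"x": 2, "y": 5, "z": 6}

# G as produced by Group((?x), Ψ): each key maps to a list of solutions.
G: Dict[Tuple[int, ...], List[Solution]] = {(1,): [mu1, mu2], (2,): [mu3]}


def ex_agg(m, scalarvals):
    """Hypothetical stand-in for the extension aggregate: echoes its arguments."""
    return (m, scalarvals)


def aggregation(exprlist, func, scalarvals, groups):
    """Apply func to M(Ψ) for every key → Ψ in groups (non-DISTINCT case only)."""
    result = {}
    for key, psi in groups.items():
        m = [tuple(mu[var] for var in exprlist) for mu in psi]   # M(Ψ), a list
        result[key] = func(m, scalarvals)                        # F(Ψ)
    return result


result = aggregation(("y", "z"), ex_agg, {}, G)
```

The aggregate for key (1) receives the list [(2, 3), (3, 4)] and the one for key (2) receives [(5, 6)], matching the expression above.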
9504
9523
<div class="defn">
9505
9524
<p><b>Definition: AggregateJoin</b></p>
9506
9525
<p>Let S<sub>1</sub>, ..., S<sub>n</sub> be a list of sets, where each set
@@ -9511,24 +9530,24 @@ <h4>Aggregate Algebra</h4>
9511
9530
..., agg<sub>n</sub>→val<sub>n</sub> | key in K and key→val<sub>i</sub> in
9512
9531
S<sub>i</sub> for each 1 <= i <= n }</p>
9513
9532
</div>
9514
- <p>Flatten is a function which is used to collapse multisets of lists into a multiset, so
9515
- for example { (1, 2), (3, 4) } becomes { 1, 2, 3, 4 } .</p>
9533
+ <p>Flatten is a function which is used to collapse a sequence of lists into a single list.
9534
+ For example, [ (1, 2), (3, 4)] becomes ( 1, 2, 3, 4) .</p>
9516
9535
<div class="defn">
9517
9536
<p><b>Definition: Flatten</b></p>
9518
- <p>The Flatten(M ) function takes a multiset of lists, M { (L<sub>1</sub>, L<sub>2</sub>,
9519
- ...), ...} , and returns the multiset { x | L in M and x in L } .</p>
9537
+ <p>The Flatten(S ) function takes a sequence of lists, S = [ (L<sub>1</sub>, L<sub>2</sub>,
9538
+ ...), ...] , and returns the list ( x | L in S and x in L ) .</p>
9520
9539
</div>
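As a rough illustration of the Flatten definition, a sequence of lists can be collapsed with a nested comprehension that keeps both the outer and the inner order. The function name `flatten` is an illustrative choice.

```python
from typing import Iterable, List, Sequence, TypeVar

T = TypeVar("T")


def flatten(s: Iterable[Sequence[T]]) -> List[T]:
    """Collapse a sequence of lists into one list: ( x | L in S and x in L )."""
    return [x for lst in s for x in lst]


flat = flatten([(1, 2), (3, 4)])
```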
9521
9540
<section id="setFunctions">
9522
9541
<h5>Set Functions</h5>
9523
9542
<p>The set functions which underlie SPARQL aggregates all have a common signature:
9524
- SetFunc(M ), or SetFunc(M , scalarvals) where M is a multiset of lists, and scalarvals is
9543
+ SetFunc(S ), or SetFunc(S , scalarvals) where S is a sequence of lists, and scalarvals is
9525
9544
one or more scalar values that are passed to the set function indirectly via the ( ...
9526
9545
; key=value ) syntax for aggregates in the SPARQL grammar. The only use of this that is
9527
9546
supported by the built-in aggregates in SPARQL Query 1.1 is <code>GROUP_CONCAT</code>,
9528
9547
as in <code>GROUP_CONCAT(?x ; separator=", ")</code>.</p>
9529
9548
<p>Note that the name "Set Function" is somewhat historical — the arguments to set
9530
- functions are in fact multisets . The name is retained due to the commonality with SQL
9531
- Set Functions, which also operate over multisets.</p>
9549
+ functions are in fact sequences . The name is retained due to the commonality with SQL
9550
+ Set Functions, which operate over multisets.</p>
9532
9551
<p>The set functions defined in this document are Count, Sum, Min, Max, Avg,
9533
9552
GroupConcat, and Sample — corresponding to the aggregates <code>COUNT</code>,
9534
9553
<code>SUM</code>, <code>MIN</code>, <code>MAX</code>, <code>AVG</code>,
@@ -9546,10 +9565,10 @@ <h5>Count</h5>
9546
9565
has a bound, non-error value within the aggregate group.</p>
9547
9566
<div class="defn">
9548
9567
<p><b>Definition: <span id="defn_aggCount">Count</span></b></p>
9549
- <pre class="code nohighlight">xsd:integer Count(multiset M )</pre>
9550
- <p>N = Flatten(M )</p>
9551
- <p>remove error elements from N </p>
9552
- <p>Count(M ) = card[N ]</p>
9568
+ <pre class="code nohighlight">xsd:integer Count(sequence S )</pre>
9569
+ <p>L = Flatten(S )</p>
9570
+ <p>remove error elements from L </p>
9571
+ <p>Count(S ) = card[L ]</p>
9553
9572
</div>
9554
9573
</section>
9555
9574
<section id="aggSum">
@@ -9561,13 +9580,14 @@ <h5>Sum</h5>
9561
9580
be 6.0 (float).</p>
9562
9581
<div class="defn">
9563
9582
<p><b>Definition: <span id="defn_aggSum">Sum</span></b></p>
9564
- <pre class="code nohighlight">numeric Sum(multiset M)</pre>
9565
- <p>Sum(M) = Sum(ToList(Flatten(M))).</p>
9566
- <p>Sum(S) = op:numeric-add(S<sub>1</sub>, Sum(S<sub>2..n</sub>)) when card[S] >
9583
+ <pre class="code nohighlight">numeric Sum(sequence S)</pre>
9584
+ <p>L = Flatten(S)</p>
9585
+ <p>Sum(S) = Sum(L)</p>
9586
+ <p>Sum(L) = op:numeric-add(L<sub>1</sub>, Sum(L<sub>2..n</sub>)) when card[L] >
9567
9587
1<br>
9568
- Sum(S ) = op:numeric-add(S <sub>1</sub>, 0) when card[S ] = 1<br>
9569
- Sum(S ) = "0"^^xsd:integer when card[S ] = 0</p>
9570
- <p>In this way, Sum({ 1, 2, 3} ) = op:numeric-add(1, op:numeric-add(2,
9588
+ Sum(L ) = op:numeric-add(L <sub>1</sub>, 0) when card[L ] = 1<br>
9589
+ Sum(L ) = "0"^^xsd:integer when card[L ] = 0</p>
9590
+ <p>In this way, Sum( ( 1, 2, 3) ) = op:numeric-add(1, op:numeric-add(2,
9571
9591
op:numeric-add(3, 0))).</p>
9572
9592
</div>
9573
9593
</section>
@@ -9577,11 +9597,11 @@ <h5>Avg</h5>
9577
9597
average value for an expression over a group. It is defined in terms of Sum and Count.
9578
9598
<div class="defn">
9579
9599
<p><b>Definition: <span id="defn_aggAvg">Avg</span></b></p>
9580
- <pre class="code nohighlight">numeric Avg(multiset M )</pre>
9581
- <p>Avg(M ) = "0"^^xsd:integer, where Count(M ) = 0</p>
9582
- <p>Avg(M ) = Sum(M ) / Count(M ), where Count(M ) > 0</p>
9600
+ <pre class="code nohighlight">numeric Avg(sequence S )</pre>
9601
+ <p>Avg(S ) = "0"^^xsd:integer, where Count(S ) = 0</p>
9602
+ <p>Avg(S ) = Sum(S ) / Count(S ), where Count(S ) > 0</p>
9583
9603
</div>
9584
- <p>For example, Avg({1, 2, 3}) = Sum({1, 2, 3}) /Count({1, 2, 3} ) = 6/3 = 2.</p>
9604
+ <p>For example, Avg([(1), (2), (3)]) = Sum([(1), (2), (3)]) / Count([(1), (2), (3)]) = 6/3 = 2.</p>
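Because Avg is defined in terms of Sum and Count, the three definitions can be sketched together. The code below is an approximation under stated assumptions: `ERROR` is a hypothetical marker for an expression that evaluated to an error, Python's `+` and `/` stand in for op:numeric-add and numeric division, and XSD numeric type promotion is not modelled.

```python
ERROR = object()  # hypothetical marker for an error value


def flatten(s):
    """Collapse a sequence of lists into a single list."""
    return [x for lst in s for x in lst]


def count(s):
    """Count(S): size of the flattened list after removing error elements."""
    return len([x for x in flatten(s) if x is not ERROR])


def sum_list(l):
    """Sum(L), following the three cases of the definition."""
    if len(l) == 0:
        return 0
    if len(l) == 1:
        return l[0] + 0
    return l[0] + sum_list(l[1:])


def agg_sum(s):
    """Sum(S) = Sum(L) with L = Flatten(S)."""
    return sum_list(flatten(s))


def avg(s):
    """Avg(S): 0 when Count(S) = 0, otherwise Sum(S) / Count(S)."""
    if count(s) == 0:
        return 0
    return agg_sum(s) / count(s)


s = [(1,), (2,), (3,)]
```

With this input, `agg_sum(s)` follows the nesting 1 + (2 + (3 + 0)) described for Sum, and `avg(s)` divides that result by `count(s)`.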
9585
9605
</section>
9586
9606
<section id="aggMin">
9587
9607
<h5>Min</h5>
@@ -9591,12 +9611,12 @@ <h5>Min</h5>
9591
9611
arbitrarily typed expressions.</p>
9592
9612
<div class="defn">
9593
9613
<p><b>Definition: <span id="defn_aggMin">Min</span></b></p>
9594
- <pre class="code nohighlight">term Min(multiset M )</pre>
9595
- <p>Min(M) = Min(ToList( Flatten(M)) )</p>
9596
- <p>Min({} ) = error. </p>
9597
- <p>The flattened multiset of values passed as an argument is converted to a sequence
9598
- S, this sequence is ordered as per the <code>ORDER BY ASC</code> clause.</p >
9599
- <p> Min(S ) = S<sub>0</sub> </p>
9614
+ <pre class="code nohighlight">term Min(sequence S )</pre>
9615
+ <p>L = Flatten(S )</p>
9616
+ <p>Min(S ) = Min(L) </p>
9617
+ <p>The flattened list L of values is ordered as per the <code>ORDER BY ASC</code> clause.</p>
9618
+ <p>Min(L) = L<sub>0</sub> if card[L] > 0<br >
9619
+ Min(L ) = error if card[L] = 0 </p>
9600
9620
</div>
9601
9621
</section>
9602
9622
<section id="aggMax">
@@ -9607,12 +9627,12 @@ <h5>Max</h5>
9607
9627
arbitrarily typed expressions.</p>
9608
9628
<div class="defn">
9609
9629
<p><b>Definition: <span id="defn_aggMax">Max</span></b></p>
9610
- <pre class="code nohighlight">term Max(multiset M )</pre>
9611
- <p>Max(M) = Max(ToList( Flatten(M)) )</p>
9612
- <p>Max({} ) = error. </p>
9613
- <p>The multiset of values passed as an argument is converted to a sequence S, this
9614
- sequence is ordered as per the <code>ORDER BY DESC</code> clause.</p >
9615
- <p> Max(S ) = S<sub>0</sub> </p>
9630
+ <pre class="code nohighlight">term Max(sequence S )</pre>
9631
+ <p>L = Flatten(S )</p>
9632
+ <p>Max(S ) = Max(L) </p>
9633
+ <p>The flattened list L of values is ordered as per the <code>ORDER BY DESC</code> clause.</p>
9634
+ <p>Max(L) = L<sub>0</sub> if card[L] > 0<br >
9635
+ Max(L ) = error if card[L] = 0 </p>
9616
9636
</div>
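Min and Max share one shape: flatten, order, take the first element, and signal an error for an empty list. A minimal sketch follows, assuming the values are mutually comparable Python values so that `sorted` can stand in for the <code>ORDER BY</code> ordering; the full SPARQL ordering across term types is not modelled, and raising `ValueError` is an illustrative choice for "error".

```python
def flatten(s):
    """Collapse a sequence of lists into a single list."""
    return [x for lst in s for x in lst]


def agg_min(s):
    """Min(S): first element of the flattened list in ascending order."""
    l = sorted(flatten(s))                 # stand-in for ORDER BY ASC
    if len(l) == 0:
        raise ValueError("Min of an empty list is an error")
    return l[0]


def agg_max(s):
    """Max(S): first element of the flattened list in descending order."""
    l = sorted(flatten(s), reverse=True)   # stand-in for ORDER BY DESC
    if len(l) == 0:
        raise ValueError("Max of an empty list is an error")
    return l[0]


lowest = agg_min([(3,), (1,), (2,)])
highest = agg_max([(3,), (1,), (2,)])
```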
9617
9637
</section>
9618
9638
<section id="aggGroupConcat">
@@ -9623,33 +9643,33 @@ <h5>GroupConcat</h5>
9623
9643
SEPARATOR.</p>
9624
9644
<div class="defn">
9625
9645
<p><b>Definition: <span id="defn_aggGroupConcat">GroupConcat</span></b></p>
9626
- <pre class="code nohighlight">literal GroupConcat(multiset M )</pre>
9646
+ <pre class="code nohighlight">literal GroupConcat(sequence S )</pre>
9627
9647
<p>If the "separator" scalar argument is absent from GROUP_CONCAT then it is taken to
9628
9648
be the "space" character, unicode codepoint U+0020.</p>
9629
- <p>The multiset of values, M passed as an argument is converted to a sequence S. </p>
9630
- <p>GroupConcat(M , scalarvals) = GroupConcat(Flatten(M) , scalarvals("separator"))</p>
9631
- <p>GroupConcat(S , sep) = "", where <span style=
9632
- "font-size: 140%">|</span>S <span style="font-size: 140%">|</span> = 0</p>
9633
- <p>GroupConcat(S , sep) = CONCAT("", S <sub>0</sub>), where
9634
- <span style="font-size: 140%">|</span>S <span style="font-size: 140%">|</span> = 1</p>
9635
- <p>GroupConcat(S , sep) = CONCAT(S <sub>0</sub>, sep, GroupConcat(S <sub>1..n-1</sub>,
9636
- sep)), where <span style="font-size: 140%">|</span>S <span style="font-size: 140%">|</span> > 1</p>
9637
- </div>
9638
- <p>For example, GroupConcat({ "a", "b", "c"} , {"separator" → "."}) = "a.b.c".</p>
9649
+ <p>L = Flatten(S) </p>
9650
+ <p>GroupConcat(S , scalarvals) = GroupConcat(L , scalarvals("separator"))</p>
9651
+ <p>GroupConcat(L , sep) = "", where <span style=
9652
+ "font-size: 140%">|</span>L <span style="font-size: 140%">|</span> = 0</p>
9653
+ <p>GroupConcat(L , sep) = CONCAT("", L <sub>0</sub>), where
9654
+ <span style="font-size: 140%">|</span>L <span style="font-size: 140%">|</span> = 1</p>
9655
+ <p>GroupConcat(L , sep) = CONCAT(L <sub>0</sub>, sep, GroupConcat(L <sub>1..n-1</sub>,
9656
+ sep)), where <span style="font-size: 140%">|</span>L <span style="font-size: 140%">|</span> > 1</p>
9657
+ </div>
9658
+ <p>For example, GroupConcat([( "a"), ( "b"), ( "c")] , {"separator" → "."}) = "a.b.c".</p>
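The recursive cases of GroupConcat translate almost line for line into code. The sketch below assumes plain Python strings in place of literals and a dictionary for scalarvals; the function names are illustrative.

```python
def flatten(s):
    """Collapse a sequence of lists into a single list."""
    return [x for lst in s for x in lst]


def group_concat_list(l, sep):
    """GroupConcat(L, sep), following the three cases of the definition."""
    if len(l) == 0:
        return ""
    if len(l) == 1:
        return "" + l[0]
    return l[0] + sep + group_concat_list(l[1:], sep)


def group_concat(s, scalarvals=None):
    """GroupConcat(S, scalarvals); the separator defaults to a space (U+0020)."""
    sep = (scalarvals or {}).get("separator", " ")
    return group_concat_list(flatten(s), sep)


joined = group_concat([("a",), ("b",), ("c",)], {"separator": "."})
```

Since the argument is now a sequence rather than a multiset, the order of the concatenated values follows the order of the group.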
9639
9659
</section>
9640
9660
<section id="aggSample">
9641
9661
<h5>Sample</h5>
9642
- <p>Sample is a set function which returns an arbitrary value from the multiset passed
9662
+ <p>Sample is a set function which returns an arbitrary value from the sequence passed
9643
9663
to it.</p>
9644
9664
<div class="defn">
9645
9665
<p><b>Definition: <span id="defn_aggSample">Sample</span></b></p>
9646
- <pre class="code nohighlight">RDFTerm Sample(multiset M )</pre>
9647
- <p>Sample(M ) = v, where v in Flatten(M )</p>
9648
- <p>Sample({} ) = error</p>
9666
+ <pre class="code nohighlight">RDFTerm Sample(sequence S )</pre>
9667
+ <p>Sample(S ) = v, where v in Flatten(S )</p>
9668
+ <p>Sample([] ) = error</p>
9649
9669
</div>
9650
- <p>For example, given Sample({ "a", "b", "c"} ), "a", "b", and "c" are all valid return
9670
+ <p>For example, given Sample([( "a"), ( "b"), ( "c")] ), "a", "b", and "c" are all valid return
9651
9671
values. Note that Sample() is not required to be deterministic for a given input, the
9652
- only restriction is that the output value must be present in the input multiset .</p>
9672
+ only restriction is that the output value must be present in the input sequence .</p>
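One possible reading of Sample is sketched below. Picking at random is only one valid strategy, since any element of the flattened input is acceptable; `random.choice` and the use of `ValueError` for the error case are illustrative assumptions.

```python
import random


def flatten(s):
    """Collapse a sequence of lists into a single list."""
    return [x for lst in s for x in lst]


def sample(s):
    """Sample(S): any value v in Flatten(S); an empty input is an error."""
    values = flatten(s)
    if len(values) == 0:
        raise ValueError("Sample of an empty sequence is an error")
    return random.choice(values)


picked = sample([("a",), ("b",), ("c",)])
```

An implementation that always returns the first element would satisfy the same contract.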
9653
9673
</section>
9654
9674
</section>
9655
9675
<section id="sparqlAlgebraEval">