@@ -977,11 +977,11 @@ to put your clean data into all the right places.
 
 Let's start with something simple. How about we output a "PR Report" listing
 the title of each PR along with its PR number and creation date:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -f table -c '{DATE:created_at,NUMBER:f"PR #{number}",TITLE:title}' prs.bsup
 ```
 and you'll see this output...
-```mdtest-output head
+```mdtest-output-skip head
 DATE NUMBER TITLE
 2019-11-11T19:50:46Z PR #1 Make "make" work in zq
 2019-11-11T20:57:12Z PR #2 fix install target
@@ -996,14 +996,14 @@ to convert the field `number` into a string and format it with surrounding text.
 Instead of old PRs, we can get the latest list of PRs using the
 [`tail` operator](../super-sql/operators/tail.md) since we know the data is sorted
 chronologically. This command retrieves the last five PRs in the dataset:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -f table -c '
  tail 5
  | {DATE:created_at,"NUMBER":f"PR #{number}",TITLE:title}
 ' prs.bsup
 ```
 and the output is:
-```mdtest-output
+```mdtest-output-skip
 DATE NUMBER TITLE
 2019-11-18T22:14:08Z PR #26 ndjson writer
 2019-11-18T22:43:07Z PR #27 Add reader for ndjson input
@@ -1014,11 +1014,11 @@ DATE NUMBER TITLE
 
 How about some aggregations? We can count the number of PRs and sort by the
 count highest first:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -s -c "count() by user:=user.login | sort count desc" prs.bsup
 ```
 produces
-```mdtest-output
+```mdtest-output-skip
 {user:"mattnibs",count:10}
 {user:"aswan",count:7}
 {user:"mccanne",count:6}
@@ -1028,13 +1028,13 @@ produces
 How about getting a list of all of the reviewers? To do this, we need to
 traverse the records in the `requested_reviewers` array and collect up
 the login field from each record:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -s -c 'unnest requested_reviewers | collect(login)' prs.bsup
 ```
 Oops, this gives us an array of the reviewer logins
 with repetitions since [`collect`](../super-sql/aggregates/collect.md)
 collects each item that it encounters into an array:
-```mdtest-output
+```mdtest-output-skip
 ["mccanne","nwt","henridf","mccanne","nwt","mccanne","mattnibs","henridf","mccanne","mattnibs","henridf","mccanne","mattnibs","henridf","mccanne","nwt","aswan","henridf","mccanne","nwt","aswan","philrz","mccanne","mccanne","aswan","henridf","aswan","mccanne","nwt","aswan","mikesbrown","henridf","aswan","mattnibs","henridf","mccanne","aswan","nwt","henridf","mattnibs","aswan","aswan","mattnibs","aswan","henridf","aswan","henridf","mccanne","aswan","aswan","mccanne","nwt","aswan","henridf","aswan"]
 ```
 What we'd prefer is a set of reviewers where each reviewer appears only once. This
@@ -1043,11 +1043,11 @@ is easily done with the [`union`](../super-sql/aggregates/union.md) aggregate fu
 computes the set-wise union of its input and produces a `set` type as its
 output. In this case, the output is a set of strings, written `|[string]|`
 in the query language. For example:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -s -c 'unnest requested_reviewers | reviewers:=union(login)' prs.bsup
 ```
 produces
-```mdtest-output
+```mdtest-output-skip
 {reviewers:|["nwt","aswan","philrz","henridf","mccanne","mattnibs","mikesbrown"]|}
 ```
 Ok, that's pretty neat.
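The difference between the two aggregates can be sketched outside of `super`. The Python below is only an illustration of the semantics described above, not how `super` implements them: the three sample records and the helper `unnest_logins` are hypothetical, and a Python `list` and `set` stand in for the array and set types.

```python
# Hypothetical sample data: each PR record carries an array of requested reviewers.
prs = [
    {"number": 1, "requested_reviewers": [{"login": "mccanne"}, {"login": "nwt"}]},
    {"number": 2, "requested_reviewers": [{"login": "henridf"}, {"login": "mccanne"}]},
    {"number": 3, "requested_reviewers": []},
]


def unnest_logins(records):
    """Yield one login per element of each record's requested_reviewers array."""
    for record in records:
        for reviewer in record["requested_reviewers"]:
            yield reviewer["login"]


# collect-style aggregation: every value is kept, so repeats show up in the array.
collected = list(unnest_logins(prs))

# union-style aggregation: each distinct value is kept exactly once.
unioned = set(unnest_logins(prs))

print(collected)
print(sorted(unioned))
```

In this sketch `"mccanne"` appears twice in `collected` and once in `unioned`, which mirrors the repetition seen in the `collect` output and its absence from the `union` output.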
@@ -1063,11 +1063,11 @@ create this with a ["lateral subquery"] **TODO: FIX**.
 Instead of computing a set-union over all the reviewers across all PRs,
 we instead want to compute the set-union over the reviewers in each PR.
 We can do this as follows:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -s -c 'unnest requested_reviewers into ( reviewers:=union(login) )' prs.bsup
 ```
 which produces an output like this:
-```mdtest-output head
+```mdtest-output-skip head
 {reviewers:|["nwt","mccanne"]|}
 {reviewers:|["nwt","henridf","mccanne"]|}
 {reviewers:|["mccanne","mattnibs"]|}
@@ -1088,7 +1088,7 @@ bringing that value into the scope using a `with` clause appended to the
 `over` expression and returning a
 [record literal](../super-sql/types/record.md#record-expressions)
 with the desired value:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -s -c '
  unnest {user:user.login,reviewer:requested_reviewers} into (
  reviewers:=union(reviewer.login) by user
@@ -1097,7 +1097,7 @@ super -s -c '
 ' prs.bsup
 ```
 which gives us
-```mdtest-output head
+```mdtest-output-skip head
 {user:"aswan",reviewers:|["mccanne"]|}
 {user:"aswan",reviewers:|["nwt","mccanne"]|}
 {user:"aswan",reviewers:|["nwt","henridf","mccanne"]|}
@@ -1110,7 +1110,7 @@ which gives us
 ```
 The final step is to simply aggregate the "reviewer sets" with the `user` field
 as the grouping key:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -S -c '
  unnest {user:user.login,reviewer:requested_reviewers} into (
  reviewers:=union(reviewer.login) by user
@@ -1120,7 +1120,7 @@ super -S -c '
 ' prs.bsup
 ```
 and we get
-```mdtest-output
+```mdtest-output-skip
 {
  user: "aswan",
  groups: |[
@@ -1233,7 +1233,7 @@ To quantify this concept, we can easily modify this query to compute
 the average number of reviewers requested instead of the set of groups
 of reviewers. To do this, we just average the reviewer set size
 with an aggregation:
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -s -c '
  unnest {user:user.login,reviewer:requested_reviewers} into (
  reviewers:=union(reviewer.login) by user
@@ -1243,7 +1243,7 @@ super -s -c '
 ' prs.bsup
 ```
 which produces
-```mdtest-output
+```mdtest-output-skip
 {user:"mccanne",avg_reviewers:1.}
 {user:"nwt",avg_reviewers:1.75}
 {user:"aswan",avg_reviewers:2.4}
@@ -1253,7 +1253,7 @@ which produces
 
 Of course, if you'd like the query output in JSON, you can just say `-j` and
 `super` will happily format the sets as JSON arrays, e.g.,
-```mdtest-command dir=book/src/tutorials
+```mdtest-command-skip dir=book/src/tutorials
 super -j -c '
  unnest {user:user.login,reviewer:requested_reviewers} into (
  reviewers:=union(reviewer.login) by user
@@ -1263,7 +1263,7 @@ super -j -c '
 ' prs.bsup
 ```
 produces
-```mdtest-output
+```mdtest-output-skip
 {"user":"aswan","groups":[["mccanne"],["nwt","mccanne"],["nwt","henridf","mccanne"],["henridf","mccanne","mattnibs"]]}
 {"user":"henridf","groups":[["nwt","aswan","mccanne"]]}
 {"user":"mattnibs","groups":[["aswan","henridf"],["aswan","mccanne"],["aswan","henridf","mccanne"],["nwt","aswan","henridf","mccanne"],["nwt","aswan","mccanne","mikesbrown"],["nwt","aswan","philrz","henridf","mccanne"]]}