// and write out keys in base-4 (`0`, `1`, `2`, `3`)

An important difference from the conventional COGROUP comes in how we designed the sorting keys and data structures. In a conventional COGROUP, we order the data by the partition key, then the table slot (all records from the first-mentioned input precede those from the last-mentioned input), then any secondary sort keys. That means we don't need a data structure for the last-mentioned input and don't even hold its records in memory -- all possible matches for a record from the last input are already sitting hot in RAM ready to make beautiful output tuples. In the spatial COGROUP, we partition on the coarse zoom-level prefix, then sort on the full `quadord` key before the table slot index. Since the keys must be sorted to support the depth-first-like traversal, matching rows from the two slots will generally intermingle. So while the regular COGROUP doesn't have to allocate a data structure for the records in its last-mentioned input, a spatial join of two tables needs to maintain two stacks.
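
To make the difference in ordering concrete, here is a minimal single-process Python sketch of how reducer input might be grouped and sorted under each scheme. It is an illustration only, not the Hadoop job. The `Rec` tuple, `COARSE_ZOOM`, and the helper functions are hypothetical. We assume quadkeys are plain base-4 strings, with slot 0 for rooms and slot 1 for people.

```python
from typing import NamedTuple


class Rec(NamedTuple):
    quadkey: str  # base-4 tile key; shorter keys are coarser tiles
    slot: int     # table slot: 0 = rooms (first input), 1 = people (last input)
    name: str


COARSE_ZOOM = 1  # hypothetical partitioning depth (prefix length)


def partition_key(rec):
    # Every record sharing a coarse prefix goes to the same reducer.
    return rec.quadkey[:COARSE_ZOOM]


def conventional_sort_key(rec):
    # Conventional COGROUP: partition key, then table slot, then secondary key,
    # so every room in a partition precedes every person.
    return (partition_key(rec), rec.slot, rec.quadkey)


def spatial_sort_key(rec):
    # Spatial COGROUP: full quadkey before the slot, so the slots interleave.
    return (rec.quadkey, rec.slot)


def reducer_input(records, sort_key):
    """Group records by partition key, then order each group by sort_key."""
    groups = {}
    for rec in records:
        groups.setdefault(partition_key(rec), []).append(rec)
    return {part: sorted(recs, key=sort_key) for part, recs in groups.items()}


records = [
    Rec("0120", 1, "Mme La Rose"), Rec("031", 0, "Stairs"),
    Rec("01", 0, "Kitchen"), Rec("0300", 1, "Red Kelly"),
    Rec("0", 0, "WestWing"), Rec("00", 0, "Kitchen"),
    Rec("1123", 1, "Ms Peach"), Rec("112", 0, "Conservatory"),
]

conventional = reducer_input(records, conventional_sort_key)
spatial = reducer_input(records, spatial_sort_key)
for rec in spatial["0"]:
    print(rec.quadkey, rec.slot, rec.name)
```

Under the spatial ordering, the people at `0120` and `0300` land between rooms, and `031 Stairs` arrives after both of them. Neither slot's records can therefore be streamed through unbuffered.
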

------
quad  slot  place         | stack 0: tiles for first slot (rooms)    ! stack 1: tiles for second slot (people)
0     0     West Wing     | 0 WestWing                               !
00    0     Kitchen       | 0 WestWing 00 Kitchen                    !
01    0     Kitchen       | 0 WestWing 01 Kitchen                    !
0120  1     Mme La Rose   | 0 WestWing 01 Kitchen                    ! 0120 Mme La Rose
...                       |                                          !
0300  1     Red Kelly     | 0 WestWing 030 Closet                    ! 0300 Red Kelly
031   0     Stairs        | 0 WestWing 031 Stairs                    !
032   0     Closet        | 0 WestWing 032 Closet                    !
033   0     Stairs        | 0 WestWing 033 Stairs                    !
...                       |                                          !
10    0     Conservatory  | 10 Conservatory                          !
10    0     East Wing     | 10 Conservatory 10 EastWing              !
110   0     Conservatory  | 110 Conservatory                         !
110   0     East Wing     | 110 Conservatory 110 EastWing            !
112   0     Conservatory  | 112 Conservatory                         !
112   0     East Wing     | 112 Conservatory 112 EastWing            !
1123  1     Ms Peach      | 112 Conservatory 112 EastWing            ! 1123 Ms Peach
------

The `0 WestWing` tile covers all of the `0***` blocks, so it comes first in line. There's nothing to sweep off the stacks and nothing to pair with, so we just push it onto stack 0. Next is the `00 Kitchen` block, covering `00**`. We keep the `0 WestWing` block (as `0` is a prefix of `00`), push `00 Kitchen` onto stack 0, and, since there's nothing to pair with, continue.
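
The rule we keep applying says that a tile covers another exactly when its quadkey is a prefix of the other's. That rule falls out of how quadkeys are built. The sketch below assumes Bing Maps-style quadkeys: one base-4 digit per zoom level, with the x bit in the 1s place and the y bit in the 2s place. Under that convention `0` is the upper-left quarter and `2` the lower-left. The function names are hypothetical.

```python
def tile_to_quadkey(tile_x, tile_y, zoom):
    """Interleave the bits of tile_x and tile_y into one base-4 digit per level."""
    digits = []
    for level in range(zoom, 0, -1):
        mask = 1 << (level - 1)
        digit = 0
        if tile_x & mask:
            digit += 1  # eastern half at this level
        if tile_y & mask:
            digit += 2  # southern half at this level
        digits.append(str(digit))
    return "".join(digits)


def covers(outer_key, inner_key):
    """A tile covers another tile (or itself) exactly when its key is a prefix."""
    return inner_key.startswith(outer_key)


print(tile_to_quadkey(0, 1, 2))  # 02: the lower-left quarter of tile 0
print(covers("0", "00"), covers("0", "0120"), covers("00", "01"))  # True True False
```

Each child of a tile is its parent's key plus one more digit. Sorting the keys as strings therefore puts every tile ahead of everything it covers.
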

The `01 Kitchen` block is next. It evicts the neighboring `00 Kitchen` block, but not its ancestor `0 WestWing` block. Since there's still nothing in the `people` stack, we push `01 Kitchen` and move on. Mme La Rose's record, the first we've seen from the people slot, finally gets the party started. Both keys on the rooms stack (`0` and `01`) are ancestors of `0120`, so nothing is swept, and we push Mme La Rose onto the people stack. The matching phase generates pairs indicating that Mme La Rose was in the `01 Kitchen` and the `0 WestWing` at the time of the incident.
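
The eviction rule fits in a few lines. In this hedged sketch (the `sweep` name and the `(quadkey, name)` entry layout are our own), every entry that survives a sweep is an ancestor of, or equal to, the incoming tile. The stack is therefore always a chain of nested tiles, and we can pop from the top until we reach an entry that still covers the incoming key.

```python
def sweep(stack, incoming_key):
    """Pop (quadkey, name) entries whose quadkey is not a prefix of incoming_key.

    The stack is a chain of nested tiles (each key is a prefix of the one
    above it), so once the top entry covers incoming_key, all entries do.
    """
    while stack and not incoming_key.startswith(stack[-1][0]):
        stack.pop()
    return stack


rooms = [("0", "WestWing"), ("00", "Kitchen")]
sweep(rooms, "01")              # evicts 00 Kitchen, keeps its ancestor 0 WestWing
rooms.append(("01", "Kitchen"))
sweep(rooms, "0120")            # Mme La Rose: 0 and 01 both cover 0120
pairs = [(room, "Mme La Rose") for _, room in rooms]
print(rooms)  # [('0', 'WestWing'), ('01', 'Kitchen')]
print(pairs)  # [('WestWing', 'Mme La Rose'), ('Kitchen', 'Mme La Rose')]
```
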

The `0120 Pantry` tile sweeps its predecessor but produces no matches, as do the next two Pantry tiles.

`0133 Mr Saffron` finds himself paired with three containing shapes: the WestWing, Kitchen and Pantry. The next step continues with the `02 Dining Room` in the lower left of the `0` block. This sweeps out both the Pantry and the Kitchen from the stack, but retains the parent `0 WestWing`.

We've supplied a few more of the blocks -- trace through them until you get the hang of it. Then let's skip ahead.

The diagonal south wall separating the Dining Room from the Lounge means that the 2002, 2003, 2011, and 2012 blocks each contain parts of both rooms, so it's useful to see how the ambiguity is resolved.

------
...                       |                                          !
2003  1     Dr Jade       | 20 WestWing 2003 Dining 2003 Lounge      ! 2003 Dr Jade
2003  1     Sr Azul       | 20 WestWing 2003 Dining 2003 Lounge      ! 2003 Dr Jade 2003 Sr Azul
2010  0     Dining        | 20 WestWing 2010 Dining                  !
------

The `2003 Lounge` record comes up with the `20 WestWing` and `2003 Dining` records already on the stack. The `2003 Dining` record has the same tile id, so according to our rules it is _not_ evicted. The `2003 Dr Jade` record sweeps nothing from the stack and generates pairs with the WestWing, Dining Room and Lounge. The `2003 Sr Azul` record sweeps nothing from either stack (not even the `Dr Jade` record, which shares its tile id). Its only pairings are with the WestWing, Dining Room and Lounge, though -- the continued presence of `2003 Dr Jade` in the second stack has nothing to do with the matchmaking.
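
Putting the sweep and the matchmaking together gives the whole reducer-side merge. The sketch below is a hedged, single-process stand-in for that logic, not the Hadoop job. It assumes each record is a `(quadkey, slot, name)` tuple already sorted by quadkey and then slot, and `spatial_merge` is a hypothetical name. Every incoming key sweeps both stacks. The record then pairs only with the _other_ slot's stack, which is why Sr Azul never pairs with Dr Jade.

```python
def spatial_merge(records):
    """Yield ((room_key, room), (person_key, person)) candidate pairs.

    Assumes records are (quadkey, slot, name) tuples sorted by (quadkey, slot),
    with slot 0 for rooms and slot 1 for people.
    """
    stacks = ([], [])  # stacks[0] holds rooms, stacks[1] holds people
    for key, slot, name in records:
        # Sweep both stacks: keep only tiles that cover (or equal) this key.
        for stack in stacks:
            while stack and not key.startswith(stack[-1][0]):
                stack.pop()
        # Match only against the other slot; same-slot records never pair.
        mine = (key, name)
        for other in stacks[1 - slot]:
            yield (other, mine) if slot == 1 else (mine, other)
        stacks[slot].append(mine)


records = [
    ("20", 0, "WestWing"),
    ("2003", 0, "Dining"),
    ("2003", 0, "Lounge"),
    ("2003", 1, "Dr Jade"),
    ("2003", 1, "Sr Azul"),
    ("2010", 0, "Dining"),
]

for (room_key, room), (person_key, person) in spatial_merge(records):
    print(room_key, room, person_key, person)
```

Run on these rows, the sketch emits the WestWing, Dining and Lounge pairings for Dr Jade and then the same three for Sr Azul. When `2010 Dining` arrives, it sweeps both `2003` rooms and both people, leaving only `20 WestWing` beneath it.
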

Here is a sample of the pairs that come out of all of this:

------
0     WestWing  0120   55   30  Mme La Rose
01    Kitchen   0120   55   30  Mme La Rose
0     WestWing  0133   90   40  Mr Saffron
01    Kitchen   0133   90   40  Mr Saffron
0133  Pantry    0133   90   40  Mr Saffron
...
20    WestWing  2003   20  115  Dr Jade
2003  Dining    2003   20  115  Dr Jade
2003  Lounge    2003   20  115  Dr Jade
20    WestWing  2003   23  122  Sr Azul
2003  Dining    2003   23  122  Sr Azul
2003  Lounge    2003   23  122  Sr Azul
------

This step considered only tile membership, so Dr Jade and Sr Azul are each listed in candidate pairings with both the Dining Room and the Lounge. That's by design; our next step is to use each room's geometry object to filter out the non-matches.

------
peep_rooms_f = FILTER peep_rooms_g BY GeoIntersects(room.geom, peep.pt);
------
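
The `FILTER` leans on a geometry predicate, `GeoIntersects`, whose implementation isn't shown in this section. As a hedged stand-in for the point-versus-polygon case, here is the classic even-odd ray-casting test in Python. The square room block, the diagonal wall along y = x, and the two guests' coordinates are all invented for illustration. In practice you'd likely reach for a proper geometry library instead.

```python
def point_in_polygon(x, y, polygon):
    """Even-odd ray casting: count edge crossings of a ray running east from (x, y).

    Points lying exactly on an edge may land on either side; that is fine
    for a sketch but worth handling deliberately in production code.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rooms: a 10x10 block split by a diagonal wall along y = x.
rooms = {
    "Dining": [(0, 0), (10, 0), (10, 10)],
    "Lounge": [(0, 0), (10, 10), (0, 10)],
}
# Hypothetical guests; the tile join made each a candidate for both rooms.
guests = {"Guest A": (7, 2), "Guest B": (2, 7)}
candidates = [(room, guest) for guest in guests for room in rooms]

matches = [
    (room, guest)
    for room, guest in candidates
    if point_in_polygon(*guests[guest], rooms[room])
]
print(matches)  # [('Dining', 'Guest A'), ('Lounge', 'Guest B')]
```
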

Using only Hadoop's built-in sort, with little memory overhead, we were able to assemble records into groups even when they weren't contiguous in the sort order.

// TODO: more here