Commit 097cefc

author
Philip (flip) Kromer
committed
build, book, build!
1 parent a5dbf8b commit 097cefc

3 files changed

+442
-46
lines changed

11e-weather_near_you.asciidoc

Lines changed: 76 additions & 28 deletions
@@ -33,32 +33,80 @@ Let's walk through it. As with the other walkthroughs in this chapter, we're goi
3333
// and write out keys in base-4 (`0`, `1`, `2`, `3`)
3434

3535

36-
An important difference from the conventional COGROUP comes in how we designed the sorting keys and data structures. In a conventional COGROUP, we order the data by the partition key, then the table slot (all records from the left-mentioned input precede those from the last-mentioned input), then any secondary sort keys. That means we don't need a data structure for the last-mentioned input and don't even hold its records in memory -- all possible matches for a record from the last input are already sitting hot in RAM ready to make beautiful output tuples. In the spatial COGROUP, we partition on the coarse zoom-level prefix, then sort on the full `quadord` key before the table slot index. Since the keys must be sorted to support the depth-first-like traversal, it's likely that matching rows from each slot will intermingle. So while the regular COGROUP doesn't have to allocate a data structure for the records in its last-mentioned input, a spatial join of two tables needs to maintain two stacks.
37-
38-
39-
quad slot place | - - - - stack 0: tiles for first slot (rooms) - - - - | - - stack 1: 2nd slot (people) - -
40-
0 0 Manor | Manor 0 |
41-
0 0 West Wing | Manor 0 WestWing 0 |
42-
00 0 Kitchen | Manor 0 WestWing 0 Kitchen 00 |
43-
01 0 Kitchen | Manor 0 WestWing 0 Kitchen 01 |
44-
0120 1 Mme La Blanc | Manor 0 WestWing 0 Kitchen 01 | Mme La Blanc 0120
45-
0123 0 Pantry | Manor 0 WestWing 0 Kitchen 01 Pantry 0123 |
46-
0132 0 Pantry | Manor 0 WestWing 0 Kitchen 01 Pantry 0132 |
47-
0133 0 Pantry | Manor 0 WestWing 0 Kitchen 01 Pantry 0133 |
48-
0133 1 Mr Saffron | Manor 0 WestWing 0 Kitchen 01 Pantry 0133 | Mr Saffron 0133
49-
02 0 Dining | Manor 0 WestWing 0 Dining 02 |
50-
... | |
51-
... | |
52-
20 0 Manor | Manor 20 WestWing 20 |
53-
20 0 West Wing | Manor 20 WestWing 20 |
54-
2000 0 Dining | Manor 20 WestWing 20 Dining 2000 |
55-
2001 0 Dining | Manor 20 WestWing 20 Dining 2001 |
56-
2002 0 Dining | Manor 20 WestWing 20 Dining 2002 |
57-
2002 0 Lounge | Manor 20 WestWing 20 Dining 2002 Lounge 2002 |
58-
2003 0 Dining | Manor 20 WestWing 20 Dining 2003 |
59-
2003 0 Lounge | Manor 20 WestWing 20 Dining 2003 Lounge 2003 |
60-
2003 1 Dr Jade | Manor 20 WestWing 20 Dining 2003 Lounge 2003 | Dr Jade 2003
61-
2003 1 Sr Rojo | Manor 20 WestWing 20 Dining 2003 Lounge 2003 | Sr Rojo 2003
62-
2010 0 Dining | Manor 20 WestWing 20 Dining |
63-
36+
quad slot place | -- stack 0: tiles for first slot (rooms) -- ! -- stack 1: people slot --
37+
0 0 West Wing | 0 WestWing !
38+
00 0 Kitchen | 0 WestWing 00 Kitchen !
39+
01 0 Kitchen | 0 WestWing 01 Kitchen !
40+
0120 1 Mme La Rose | 0 WestWing 01 Kitchen ! 0120 Mme La Rose
41+
0123 0 Pantry | 0 WestWing 01 Kitchen 0123 Pantry !
42+
0132 0 Pantry | 0 WestWing 01 Kitchen 0132 Pantry !
43+
0133 0 Pantry | 0 WestWing 01 Kitchen 0133 Pantry !
44+
0133 1 Mr Saffron | 0 WestWing 01 Kitchen 0133 Pantry ! 0133 Mr Saffron
45+
02 0 Dining | 0 WestWing 02 Dining !
46+
030 0 Closet | 0 WestWing 030 Closet !
47+
0300 1 Red Kelly | 0 WestWing 030 Closet ! 0300 Red Kelly
48+
031 0 Stairs | 0 WestWing 031 Stairs !
49+
032 0 Closet | 0 WestWing 032 Closet !
50+
033 0 Stairs | 0 WestWing 033 Stairs !
51+
... | !
52+
10 0 Conservatory | 10 Conservatory !
53+
10 0 East Wing | 10 Conservatory 10 EastWing !
54+
110 0 Conservatory | 110 Conservatory !
55+
110 0 East Wing | 110 Conservatory 110 EastWing !
56+
112 0 Conservatory | 112 Conservatory !
57+
112 0 East Wing | 112 Conservatory 112 EastWing !
58+
1123 1 Ms Peach | 112 Conservatory 112 EastWing ! 1123 Ms Peach
59+
60+
61+
The `0 West Wing` covers all of the `0***` blocks, so it comes first in line. There's nothing to sweep off the stacks, and nothing to pair with, so we just push it onto stack 0. Next is the `00 Kitchen` block, covering `00**`. We keep the `0 WestWing` block (as `0` is a prefix of `00`), push `00 Kitchen` onto stack 0, and since there's nothing to pair with, continue.
62+
63+
The `01 Kitchen` block is next. It evicts the neighboring `00 Kitchen` block, but not its ancestor `0 WestWing` block. Since there's still nothing in the `people` stack, we push `01 Kitchen` and move on. Mme La Rose's record, the first we've seen from the people slot, now finally gets the party started. Both keys on the stack (`0` and `01`) are ancestors of `0120`, and so nothing is swept and we push Mme La Rose onto the people stack. The matching phase generates pairs indicating that Mme La Rose is in the `01 Kitchen` and `0 WestWing` tiles at the time of the incident.
64+
65+
The `0123 Pantry` tile sweeps its predecessor but produces no matches, as do the next two Pantry tiles.
66+
`0133 Mr Saffron` finds himself paired with three containing shapes: the WestWing, Kitchen and Pantry. The next step continues with the `02 Dining Room` in the lower left of the `0` block. This sweeps out both the Pantry and the Kitchen from the stack, but retains the parent `0 WestWing`.
67+
68+
We've supplied a few more of the blocks -- trace through them until you get the hang of it. Let's skip ahead though.
69+
The diagonal south wall separating the Dining room from the Lounge means that the 2002, 2003, 2011, and 2012 blocks contain parts of each room, and it's useful to see where the ambiguity is resolved.
70+
71+
... | !
72+
20 0 West Wing | 20 WestWing !
73+
2000 0 Dining | 20 WestWing 2000 Dining !
74+
2001 0 Dining | 20 WestWing 2001 Dining !
75+
2002 0 Dining | 20 WestWing 2002 Dining !
76+
2002 0 Lounge | 20 WestWing 2002 Dining 2002 Lounge !
77+
2003 0 Dining | 20 WestWing 2003 Dining !
78+
2003 0 Lounge | 20 WestWing 2003 Dining 2003 Lounge !
79+
2003 1 Dr Jade | 20 WestWing 2003 Dining 2003 Lounge ! 2003 Dr Jade
80+
2003 1 Sr Azul | 20 WestWing 2003 Dining 2003 Lounge ! 2003 Dr Jade 2003 Sr Azul
81+
2010 0 Dining | 20 WestWing 2010 Dining !
82+
83+
The `2003 Lounge` record comes up with the `20 WestWing` and `2003 Dining` records already on the stack. The `2003 Dining` record has the same tile id, and so according to our rules it is _not_ evicted. The `2003 Dr Jade` record sweeps nothing from the stack and generates pairs for the WestWing, Dining Room and Lounge. The `2003 Sr Azul` record sweeps nothing from either stack (not even the `Dr Jade` record). Its only pairings are with WestWing, Dining Room and Lounge, though -- the continued presence of `2003 Dr Jade` in the second stack has nothing to do with the matchmaking.
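
The sweep-and-push rules traced above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the book's implementation: the function name `spatial_pairs` and the tuple layouts are hypothetical, records are assumed to arrive as `(quadkey, slot, name)`, and a stack entry is kept whenever its key is a prefix of (or equal to) the incoming key.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, int, str]     # (quadkey, slot, name); slot 0 = rooms, 1 = people
Pair = Tuple[str, str, str, str]  # (room_key, room_name, person_key, person_name)


def spatial_pairs(records: Iterable[Record]) -> Iterator[Pair]:
    """Hypothetical helper: emit candidate (room, person) pairs by tile containment."""
    stacks: List[List[Tuple[str, str]]] = [[], []]
    # Sort on the full quadkey first, then the slot, as in the walkthrough.
    for key, slot, name in sorted(records, key=lambda r: (r[0], r[1])):
        # Sweep: evict entries whose key is not a prefix of the current key.
        # An entry with the same key counts as a prefix, so it stays.
        for stack in stacks:
            while stack and not key.startswith(stack[-1][0]):
                stack.pop()
        stacks[slot].append((key, name))
        # Match: pair the new record with everything left on the other stack.
        for other_key, other_name in stacks[1 - slot]:
            if slot == 1:
                yield (other_key, other_name, key, name)
            else:
                yield (key, name, other_key, other_name)


# Rows copied from the walkthrough tables (names only, no coordinates).
sample: List[Record] = [
    ("0", 0, "WestWing"), ("00", 0, "Kitchen"), ("01", 0, "Kitchen"),
    ("0120", 1, "Mme La Rose"), ("0123", 0, "Pantry"), ("0132", 0, "Pantry"),
    ("0133", 0, "Pantry"), ("0133", 1, "Mr Saffron"), ("02", 0, "Dining"),
    ("20", 0, "WestWing"), ("2003", 0, "Dining"), ("2003", 0, "Lounge"),
    ("2003", 1, "Dr Jade"), ("2003", 1, "Sr Azul"),
]
pairs = list(spatial_pairs(sample))
```

Popping only from the top of each stack is enough because the sorted input keeps every stack nested: each entry's key is a prefix of the one above it. Note that `Sr Azul` is pushed next to `Dr Jade` but is paired only with entries on the rooms stack.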
84+
85+
Here is a sample of the pairs that come out of all of this:
86+
87+
0 WestWing 0120 55 30 Mme La Rose
88+
01 Kitchen 0120 55 30 Mme La Rose
89+
0 WestWing 0133 90 40 Mr Saffron
90+
01 Kitchen 0133 90 40 Mr Saffron
91+
0133 Pantry 0133 90 40 Mr Saffron
92+
...
93+
20 WestWing 2003 20 115 Dr Jade
94+
2003 Dining 2003 20 115 Dr Jade
95+
2003 Lounge 2003 20 115 Dr Jade
96+
20 WestWing 2003 23 122 Sr Azul
97+
2003 Dining 2003 23 122 Sr Azul
98+
2003 Lounge 2003 23 122 Sr Azul
99+
100+
The output of this step only considered tile membership, and so Dr Jade and Sr Azul are each listed in candidate pairings with the Dining Room and the Lounge. That's by design; our next step is to use each room's geometry object to filter out the non-matches.
101+
102+
------
103+
peep_rooms_f = FILTER peep_rooms_g BY GeoIntersects(room.geom, peep.pt);
104+
------
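
To make the filtering step concrete, here is a rough Python sketch of what a point-in-polygon test behind something like `GeoIntersects` might do. It is a plain ray-casting check, not the actual UDF, and every name and coordinate in it is made up for illustration: the two triangles stand in for a pair of rooms split by a diagonal wall and are not the manor's real geometry.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Ray casting: count how many edges a ray going right from pt crosses."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical geometry: one square tile split along its diagonal.
room_a = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
room_b = [(10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
visitor = (2.0, 3.0)

# Both rooms share the tile, so both show up as candidate pairs ...
candidates = [("room_a", room_a, visitor), ("room_b", room_b, visitor)]
# ... and the geometry test keeps only the room that really contains the point.
kept = [name for name, geom, pt in candidates if point_in_polygon(pt, geom)]
```

Points that fall exactly on a shared wall are a boundary case this sketch does not try to settle.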
105+
106+
Using only Hadoop's built-in sort and little memory overhead,
107+
we were able to
108+
assemble records into groups even when they weren't contiguous in the sort order.
109+
110+
// TODO: more here
64111

112+
An important difference from the conventional COGROUP comes in how we designed the sorting keys and data structures. In a conventional COGROUP, we order the data by the partition key, then the table slot (all records from the left-mentioned input precede those from the last-mentioned input), then any secondary sort keys. That means we don't need a data structure for the last-mentioned input and don't even hold its records in memory -- all possible matches for a record from the last input are already sitting hot in RAM ready to make beautiful output tuples. In the spatial COGROUP, we partition on the coarse zoom-level prefix, then sort on the full `quadord` key before the table slot index. Since the keys must be sorted to support the depth-first-like traversal, it's likely that matching rows from each slot will intermingle. So while the regular COGROUP doesn't have to allocate a data structure for the records in its last-mentioned input, a spatial join of two tables needs to maintain two stacks.
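
The difference between the two orderings is easy to see on a handful of rows. The sketch below is an illustration only: `PREFIX_LEN` is an assumed value for the coarse zoom-level prefix (the text does not give one), and the two key functions are hypothetical stand-ins for the sort keys described above.

```python
from typing import List, Tuple

Record = Tuple[str, int, str]  # (quadkey, slot, name)

PREFIX_LEN = 1  # assumed coarse zoom level used for partitioning


def conventional_key(rec: Record) -> Tuple[str, int, str]:
    """Partition key, then table slot, then the secondary sort key."""
    quadkey, slot, _ = rec
    return (quadkey[:PREFIX_LEN], slot, quadkey)


def spatial_key(rec: Record) -> Tuple[str, int]:
    """Full quadkey first, then the table slot."""
    quadkey, slot, _ = rec
    return (quadkey, slot)


rows: List[Record] = [
    ("01", 0, "Kitchen"), ("0120", 1, "Mme La Rose"), ("0123", 0, "Pantry"),
    ("0133", 0, "Pantry"), ("0133", 1, "Mr Saffron"),
]

conventional_slots = [slot for _, slot, _ in sorted(rows, key=conventional_key)]
spatial_slots = [slot for _, slot, _ in sorted(rows, key=spatial_key)]
```

Under the conventional key every slot-0 row precedes every slot-1 row within the partition, so only one side needs buffering. Under the spatial key the slots interleave, which is why both stacks are needed.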

11z-spatial_manor-data.asciidoc

Lines changed: 4 additions & 4 deletions
@@ -1,11 +1,11 @@
11

2-
Mme La Blanc 55 30 0120
2+
Mme La Rose 55 30 0120
33
Mr Saffron 90 40 0133
44
Ms Peach 165 30 1123
55
Rabbi Green 120 60 1211
66
The Vicar 155 170 1302
77
Dr Jade 20 115 2003
8-
Sr Rojo 23 122 2003
8+
Sr Azul 23 122 2003
99
Sgt Teal 145 130 3031
1010
Dean Grey 85 180 2330
1111

@@ -50,7 +50,7 @@
5050
0 0 West Wing Manor WestWing
5151
00 0 Kitchen Manor WestWing Kitchen
5252
01 0 Kitchen Manor WestWing Kitchen
53-
0120 1 Mme La Blanc Manor WestWing Kitchen Mme La Blanc
53+
0120 1 Mme La Rose Manor WestWing Kitchen Mme La Rose
5454
0123 0 Pantry Manor WestWing Kitchen Pantry
5555
0132 0 Pantry Manor WestWing Kitchen Pantry
5656
0133 0 Pantry Manor WestWing Kitchen Pantry
@@ -107,7 +107,7 @@
107107
2003 0 Dining Manor WestWing Dining
108108
2003 0 Lounge Manor WestWing Dining Lounge
109109
2003 1 Dr Jade Manor WestWing Dining Lounge Dr Jade
110-
2003 1 Sr Rojo Manor WestWing Dining Lounge Sr Rojo
110+
2003 1 Sr Azul Manor WestWing Dining Lounge Sr Azul
111111
2010 0 Dining Manor WestWing Dining
112112
2011 0 Dining Manor WestWing Dining
113113
2011 0 Lounge Manor WestWing Dining Lounge
