|
1 | 1 |
|
2 | | - |
3 | | -5. **Pipelineable** Additions/Improvements |
4 | | - - Controlling Case Sensitivity in String Comparisons (`ff = FILTER fr BY EqualsIgnoreCase(franch_id, 'bOs'); DUMP ff;`) |
5 | | - - Select Records Using a List of Values |
6 | | - - very small inline list with the `CASE` statement -- `CASE X WHEN val1 ... WHEN val2 ... ELSE .. END` and `CASE WHEN cond .. WHEN cond .. ELSE .. END` |
7 | | - - Transforming Strings with Regular Expressions |
8 | | - - Transforming Nulls into Real Values |
9 | | - - Converting a Number to its String Representation (and Back) (cast with (int)) |
10 | | - - Embedding Quotes and Special Characters Within String Literals. |
11 | | - - JSON encoding/decoding on a value (vs on a record) |
12 | | - - Assigning a Unique Identifier to Each Record (use `-tagPath` when loading; may require most recent Pig) |
13 | | - - `$0` references; `*` and `..` references |
14 | | -
|
15 | | - - Flattening a tuple gives you columns; Flattening a bag gives you rows |
16 | | - - Flattening bags == cross product |
17 | | - - Transposing Columns into Records (make the inline bag from several fields, then flatten it) |
18 | | - - Converting an Internally-Delimited Field into Multiple Columns Using STRSPLIT |
19 | | - - Converting an Internally-Delimited Field into Multiple Rows Using STRSPLITBAG |
20 | | - - Exploding a String into its Characters |
21 | | - - demonstrate case and ternary statements (combine/move demonstration in filter section?) |
22 | | - |
23 | | -6. **Grouping** Additions/Improvements |
24 | | - - JSON-encoded string |
25 | | - - completely summarizing |
26 | | - |
27 | | -
|
28 | | -7. **Joining** Additions/Improvements |
29 | | - - Replicated join |
30 | | - - stuff in "important notes about joins" |
31 | | -
|
32 | | -8. **Ordering and Uniquing** |
33 | | - - Demonstrate Sort in map/reduce |
34 | | - - max with/without ties, with/without record |
35 | | - - top-k with/without ties, with/without record |
36 | | - - running min/max |
37 | | - - mode (make an exercise) |
38 | | - - cardinality, i.e. count of distinct values |
39 | | -
|
40 | | -9. **Advanced Patterns** |
41 | | - - Better COGROUP |
42 | | - - Stitch and Over |
43 | | - - multi-join |
44 | | - - master-detail |
45 | | - - z-score |
46 | | - - group/decorate/flatten |
47 | | - - group/flatten/re-flatten |
48 | | - - cube & rollup |
49 | | - - run expectancy (prediction) |
50 | | -
|
51 | 2 | 10. **Event log** |
52 | | - - Parsing logs and using regular expressions |
53 | | - - lead and lag |
54 | 3 | - geo IP via range query |
55 | 4 | - sessionizing, user paths |
56 | 5 | - abusing a webserver for testing |
57 | | - - Histograms and time series of pageviews |
58 | | - - Anomaly detection on Wikipedia Pageviews |
59 | | - - windowing and rolling statistics |
60 | | - - correlation of joint timeseries |
61 | | - - Holt-Winters |
62 | | - - Correlations |
| 6 | + - One more topic from the chopping block, below |
63 | 7 |
|
64 | 8 | 11. **Geo Analysis** |
65 | 9 | - quad keys for point density heat map |
|
72 | 16 | - joining stadiums onto quads |
73 | 17 | - breaking voronoi regions into multi-scale quads |
74 | 18 | - map weather observations to cells, average |
75 | | - - spatial join of points and multi-scale quads |
76 | | - - spatial join of quads on quads ("range" query) |
77 | 19 |
|
78 | 20 | 12. **Text Analysis** |
79 | 21 | - grep'ing etc for simple matches |
|
83 | 25 | - group decorate flatten to get rates |
84 | 26 | - good turing to knock back |
85 | 27 | - pointwise mutual information to see words |
| 28 | +
|
| 29 | +
|
| 30 | +10. **Event log** (Chopping Block topics) |
| 31 | + - Parsing logs and using regular expressions |
| 32 | + - lead and lag; windowing and rolling statistics |
| 33 | + - Histograms and time series of pageviews |
| 34 | + - Anomaly detection on Wikipedia Pageviews |
| 35 | + - correlation of joint timeseries |
| 36 | + - Holt-Winters |
| 37 | + - Correlations |
| 38 | + |
| 39 | +11. **Geo Analysis** (Chopping Block) |
| 40 | + - spatial join of points and multi-scale quads |
| 41 | + - spatial join of quads on quads ("range" query) |
| 42 | +
|
| 43 | +12. **Text Analysis** (Chopping Block topics) |
86 | 44 | - Minhashing to combat a massive feature space |
87 | 45 | - How to cheat with Bloom filters |
88 | | - - |
89 | 46 |
|
90 | | -13. **Data Munging (Semi-Structured Data)** |
91 | | - - Wikipedia for character encoding |
92 | | - - airports for reconciliation |
93 | | - - weather: parsing flat pack file |
94 | 47 |
|
95 | 48 | 14. **Statistics** |
96 | 49 | - subsetting / sampling your data: consistent sampling, distributions, replacement |
|
112 | 65 | - tuning |
113 | 66 | - why algebraic UDFs are awesome and how to be algebraic |
114 | 67 |
|
| 68 | +13. **Data Munging (Semi-Structured Data)** |
| 69 | + - Wikipedia for character encoding |
| 70 | + - airports for reconciliation |
| 71 | + - weather: parsing flat pack file |
| 72 | +
|
115 | 73 | 21. *Hadoop Internals* |
116 | 74 | - What happens when a job is launched |
117 | 75 | - A shallow dive into the HDFS |
|
121 | 79 | - Tuning for the Brave and Foolish |
122 | 80 | - The USE Method |
123 | 81 | 23. **Data Modeling for HBase-style Database** |
| 82 | +
|
| 83 | +
|
| 84 | +=== Chopping Block |
| 85 | + |
| 86 | + |
| 87 | +5. **Pipelineable** Additions/Improvements |
| 88 | + - Controlling Case Sensitivity in String Comparisons (`ff = FILTER fr BY EqualsIgnoreCase(franch_id, 'bOs'); DUMP ff;`) |
| 89 | + - Select Records Using a List of Values |
| 90 | + - very small inline list with the `CASE` statement -- `CASE X WHEN val1 ... WHEN val2 ... ELSE .. END` and `CASE WHEN cond .. WHEN cond .. ELSE .. END` |
| 91 | + - Transforming Strings with Regular Expressions |
| 92 | + - Transforming Nulls into Real Values |
| 93 | + - Converting a Number to its String Representation (and Back) (cast with (int)) |
| 94 | + - Embedding Quotes and Special Characters Within String Literals. |
| 95 | + - JSON encoding/decoding on a value (vs on a record) |
| 96 | + - Assigning a Unique Identifier to Each Record (use `-tagPath` when loading; may require most recent Pig) |
| 97 | + - `$0` references; `*` and `..` references |
| 98 | + |
| 99 | + - Flattening a tuple gives you columns; Flattening a bag gives you rows |
| 100 | + - Flattening bags == cross product |
| 101 | + - Transposing Columns into Records (make the inline bag from several fields, then flatten it) |
| 102 | + - Converting an Internally-Delimited Field into Multiple Columns Using STRSPLIT |
| 103 | + - Converting an Internally-Delimited Field into Multiple Rows Using STRSPLITBAG |
| 104 | + - Exploding a String into its Characters |
| 105 | + - demonstrate case and ternary statements (combine/move demonstration in filter section?) |
| 106 | + |
| 107 | +6. **Grouping** Additions/Improvements |
| 108 | + - JSON-encoded string |
| 109 | + - completely summarizing |
| 110 | + |
| 111 | + |
| 112 | +7. **Joining** Additions/Improvements |
| 113 | + - Replicated join |
| 114 | + - stuff in "important notes about joins" |
| 115 | + |
| 116 | +8. **Ordering and Uniquing** |
| 117 | + - Demonstrate Sort in map/reduce |
| 118 | + - max with/without ties, with/without record |
| 119 | + - top-k with/without ties, with/without record |
| 120 | + - running min/max |
| 121 | + - mode (make an exercise) |
| 122 | + - cardinality, i.e. count of distinct values |
| 123 | + |
| 124 | +9. **Advanced Patterns** |
| 125 | + - Better COGROUP |
| 126 | + - Stitch and Over |
| 126 | + - multi-join |
| 128 | + - master-detail |
| 129 | + - z-score |
| 130 | + - group/decorate/flatten |
| 131 | + - group/flatten/re-flatten |
| 132 | + - cube & rollup |
| 133 | + - run expectancy (prediction) |
124 | 134 |
|
125 | 135 | 27. **Intro to Storm+Trident** |
126 | 136 | 28. **Machine Learning without Grad School**: |
|