|
12 | 12 |
|
13 | 13 | ## Index
|
14 | 14 |
|
15 |
| -* [spark](#spark) |
16 |
| -* [mapreduce-python](#mapreduce-python) |
17 |
| -* [kaggle-and-business-analyses](#kaggle-and-business-analyses) |
18 | 15 | * [deep-learning](#deep-learning)
|
19 | 16 | * [scikit-learn](#scikit-learn)
|
20 | 17 | * [statistical-inference-scipy](#statistical-inference-scipy)
|
21 | 18 | * [pandas](#pandas)
|
22 | 19 | * [matplotlib](#matplotlib)
|
23 | 20 | * [numpy](#numpy)
|
24 | 21 | * [python-data](#python-data)
|
| 22 | +* [kaggle-and-business-analyses](#kaggle-and-business-analyses) |
| 23 | +* [spark](#spark) |
| 24 | +* [mapreduce-python](#mapreduce-python) |
25 | 25 | * [amazon web services](#aws)
|
26 | 26 | * [command lines](#commands)
|
27 | 27 | * [misc](#misc)
|
|
31 | 31 | * [contact-info](#contact-info)
|
32 | 32 | * [license](#license)
|
33 | 33 |
|
34 |
| -<br/> |
35 |
| -<p align="center"> |
36 |
| - <img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/spark.png"> |
37 |
| -</p> |
38 |
| - |
39 |
| -## spark |
40 |
| - |
41 |
| -IPython Notebook(s) demonstrating spark and HDFS functionality. |
42 |
| - |
43 |
| -| Notebook | Description | |
44 |
| -|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| |
45 |
| -| [spark](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms. | |
46 |
| -| [hdfs](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. | |
47 |
| - |
48 |
| -<br/> |
49 |
| -<p align="center"> |
50 |
| - <img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/mrjob.png"> |
51 |
| -</p> |
52 |
| - |
53 |
| -## mapreduce-python |
54 |
| - |
55 |
| -IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality. |
56 |
| - |
57 |
| -| Notebook | Description | |
58 |
| -|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| |
59 |
| -| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with unit test and [mrjob](https://github.com/Yelp/mrjob) config file to analyze Amazon S3 bucket logs on Elastic MapReduce. [Disco](https://github.com/discoproject/disco/) is another python-based alternative.| |
60 |
| - |
61 |
| -<br/> |
62 |
| -<p align="center"> |
63 |
| - <img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/kaggle.png"> |
64 |
| -</p> |
65 |
| - |
66 |
| -## kaggle-and-business-analyses |
67 |
| - |
68 |
| -IPython Notebook(s) used in [kaggle](https://www.kaggle.com/) competitions and business analyses. |
69 |
| - |
70 |
| -| Notebook | Description | |
71 |
| -|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------| |
72 |
| -| [titanic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb) | Predicts survival on the Titanic. Demonstrates data cleaning, exploratory data analysis, and machine learning. | |
73 |
| -| [churn-analysis](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/analyses/churn.ipynb) | Predicts customer churn. Exercises logistic regression, gradient boosting classifiers, support vector machines, random forests, and k-nearest-neighbors. Discussion of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/discrimination.| |
74 |
| - |
75 | 34 | <br/>
|
76 | 35 | <p align="center">
|
77 | 36 | <img src="http://i.imgur.com/ZhKXrKZ.png">
|
@@ -230,6 +189,47 @@ IPython Notebook(s) demonstrating Python functionality geared towards data analy
|
230 | 189 | | [pdb](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/python-data/pdb.ipynb) | Learn how to debug in Python with the interactive source code debugger. |
|
231 | 190 | | [unit tests](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/python-data/unit_tests.ipynb) | Learn how to test in Python with Nose unit tests. |
|
232 | 191 |
|
| 192 | +<br/> |
| 193 | +<p align="center"> |
| 194 | + <img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/kaggle.png"> |
| 195 | +</p> |
| 196 | + |
| 197 | +## kaggle-and-business-analyses |
| 198 | + |
| 199 | +IPython Notebook(s) used in [kaggle](https://www.kaggle.com/) competitions and business analyses. |
| 200 | + |
| 201 | +| Notebook | Description | |
| 202 | +|-------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------| |
| 203 | +| [titanic](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/kaggle/titanic.ipynb) | Predicts survival on the Titanic. Demonstrates data cleaning, exploratory data analysis, and machine learning. | |
| 204 | +| [churn-analysis](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/analyses/churn.ipynb) | Predicts customer churn. Exercises logistic regression, gradient boosting classifiers, support vector machines, random forests, and k-nearest-neighbors. Discussion of confusion matrices, ROC plots, feature importances, prediction probabilities, and calibration/discrimination.| |
| 205 | + |
| 206 | +<br/> |
| 207 | +<p align="center"> |
| 208 | + <img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/spark.png"> |
| 209 | +</p> |
| 210 | + |
| 211 | +## spark |
| 212 | + |
| 213 | +IPython Notebook(s) demonstrating spark and HDFS functionality. |
| 214 | + |
| 215 | +| Notebook | Description | |
| 216 | +|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| |
| 217 | +| [spark](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/spark.ipynb) | In-memory cluster computing framework, up to 100 times faster for certain applications and is well suited for machine learning algorithms. | |
| 218 | +| [hdfs](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/spark/hdfs.ipynb) | Reliably stores very large files across machines in a large cluster. | |
| 219 | + |
| 220 | +<br/> |
| 221 | +<p align="center"> |
| 222 | + <img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/mrjob.png"> |
| 223 | +</p> |
| 224 | + |
| 225 | +## mapreduce-python |
| 226 | + |
| 227 | +IPython Notebook(s) demonstrating Hadoop MapReduce with mrjob functionality. |
| 228 | + |
| 229 | +| Notebook | Description | |
| 230 | +|--------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------| |
| 231 | +| [mapreduce-python](http://nbviewer.ipython.org/github/donnemartin/data-science-ipython-notebooks/blob/master/mapreduce/mapreduce-python.ipynb) | Runs MapReduce jobs in Python, executing jobs locally or on Hadoop clusters. Demonstrates Hadoop Streaming in Python code with unit test and [mrjob](https://github.com/Yelp/mrjob) config file to analyze Amazon S3 bucket logs on Elastic MapReduce. [Disco](https://github.com/discoproject/disco/) is another python-based alternative.| |
| 232 | + |
233 | 233 | <br/>
|
234 | 234 | <p align="center">
|
235 | 235 | <img src="https://raw.githubusercontent.com/donnemartin/data-science-ipython-notebooks/master/images/aws.png">
|
|