
Commit 7a39c0a

Add Kaggle kernels to documentation (#993)
* Add Kaggle kernels
* Change year in docs
* Update build script
1 parent 1f0fdaa commit 7a39c0a


7 files changed: 105 additions and 7 deletions


doc/build-doc.sh

Lines changed: 9 additions & 1 deletion
@@ -15,9 +15,17 @@
 # limitations under the License.
 #===============================================================================
 
+SAMPLES_DIR=sources/samples
+
+# remove the samples folder if it exists
+if [ -d "$SAMPLES_DIR" ]; then rm -Rf $SAMPLES_DIR; fi
+
+# create a samples folder
+mkdir $SAMPLES_DIR
+
 # copy jupyter notebooks
 cd ..
-cp examples/notebooks/*.ipynb doc/sources/samples
+cp examples/notebooks/*.ipynb doc/$SAMPLES_DIR
 
 # build the documentation
 cd doc
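The updated script now deletes any existing `sources/samples` folder, recreates it, and only then copies the notebooks. As a result, the folder always exists and contains only the notebooks currently in `examples/notebooks`. The sketch below expresses the same sequence in Python with `pathlib` and `shutil`. It is an illustration only and is not part of the commit. The helper name `refresh_samples` and its parameters are hypothetical.

```python
import shutil
from pathlib import Path
from typing import List


def refresh_samples(repo_root: Path, samples_dir: str = "sources/samples") -> List[str]:
    """Hypothetical Python mirror of the build-doc.sh samples steps (illustration only)."""
    target = repo_root / "doc" / samples_dir
    # remove the samples folder if it exists (rm -Rf $SAMPLES_DIR)
    if target.is_dir():
        shutil.rmtree(target)
    # create a samples folder (mkdir $SAMPLES_DIR); parents=True is more lenient
    # than plain mkdir, which fails if doc/sources is missing
    target.mkdir(parents=True)
    # copy jupyter notebooks (cp examples/notebooks/*.ipynb doc/$SAMPLES_DIR)
    copied = []
    for notebook in sorted((repo_root / "examples" / "notebooks").glob("*.ipynb")):
        shutil.copy2(notebook, target / notebook.name)
        copied.append(notebook.name)
    return copied
```

Clearing the folder before copying means that renaming or deleting a notebook in `examples/notebooks` cannot leave a stale copy behind in the built documentation.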

doc/sources/acceleration.rst

Lines changed: 9 additions & 0 deletions
@@ -36,3 +36,12 @@ Configurations:
 
 - HW: c5.24xlarge AWS EC2 Instance using an Intel Xeon Platinum 8275CL with 2 sockets and 24 cores per socket
 - SW: scikit-learn version 0.24.2, scikit-learn-intelex version 2021.2.3, Python 3.8
+
+Kaggle Kernels
+**************
+
+Check out `Introduction to scikit-learn-intelex
+<https://www.kaggle.com/code/lordozvlad/introduction-to-scikit-learn-intelex/notebook>`_,
+a Kaggle notebook that summarizes the speedup you can achieve with |intelex|.
+
+The acceleration is measured for a variety of machine learning workflows and Kaggle datasets.
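Speedup is reported as a ratio of run times. Several kernel URLs in this commit quote figures such as "20x" and "13x". The sketch below shows how such a figure is commonly computed. It assumes that speedup means the stock scikit-learn time divided by the scikit-learn-intelex time for the same workload. `measure_seconds` and `speedup` are hypothetical helpers and are not part of either library.

```python
import time
from typing import Callable


def measure_seconds(fn: Callable[[], object], repeats: int = 3) -> float:
    """Best wall-clock time over `repeats` calls to fn (hypothetical helper)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Baseline time divided by accelerated time; 20.0 reads as "20x"."""
    if accelerated_seconds <= 0:
        raise ValueError("accelerated_seconds must be positive")
    return baseline_seconds / accelerated_seconds
```

In a real comparison, `fn` would wrap the same `fit` or `predict` call twice: once with stock scikit-learn, and once after calling `patch_sklearn()` from `sklearnex`. Taking the best of several runs reduces noise from warm-up and other processes.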

doc/sources/conf.py

Lines changed: 2 additions & 2 deletions
@@ -37,13 +37,13 @@
 # -- Project information -----------------------------------------------------
 
 project = 'Intel(R) Extension for Scikit-learn*'
-copyright = '2021, Intel'
+copyright = '2022, Intel'
 author = 'Intel'
 
 # The short X.Y version
 version = '2021'
 # The full version, including alpha/beta/rc tags
-release = '2021.5'
+release = '2021.6'
 
 
 # -- General configuration ---------------------------------------------------
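This commit updates the copyright year and the release string by hand. The sketch below shows a common Sphinx `conf.py` alternative that derives both values at build time. It is offered as a hedged option, not as what the project does. `project`, `copyright`, `author`, `version`, and `release` are the standard Sphinx settings. Deriving `version` by splitting `release` is an assumption that happens to reproduce the values in this diff.

```python
import datetime

# -- Project information (hypothetical variant of doc/sources/conf.py) ------
project = 'Intel(R) Extension for Scikit-learn*'
author = 'Intel'
# Take the copyright year from the build date instead of editing it each year.
copyright = f'{datetime.date.today().year}, Intel'

# The full version, including alpha/beta/rc tags
release = '2021.6'
# Derive the short version from the release so the two cannot drift apart;
# here this yields '2021', matching the value in this diff.
version = release.split('.')[0]
```

The trade-off is that rebuilding documentation for an older release stamps it with the current year.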

doc/sources/kaggle.rst

Lines changed: 11 additions & 0 deletions
@@ -19,6 +19,17 @@
 Kaggle Kernels
 --------------------
 
+Check out Kaggle notebooks created by |intelex| developers.
+
+.. rubric:: Acceleration
+
+`Introduction to scikit-learn-intelex <https://www.kaggle.com/code/lordozvlad/introduction-to-scikit-learn-intelex/notebook>`_
+provides a summary of the speedup you can achieve with |intelex|.
+
+.. rubric:: Machine Learning Workflows
+
+Browse this chapter to find Kaggle kernels that use scikit-learn-intelex for a specific type of a machine learning task.
+
 Kaggle kernels that use scikit-learn and |intelex|:
 
 .. toctree::

doc/sources/kaggle/automl.rst

Lines changed: 9 additions & 1 deletion
@@ -26,6 +26,10 @@
 .. |automl_with_intelex_titanic| replace:: AutoML Binary Classification (Gradient Boosting, Random Forest, kNN) using AutoGluon with |intelex|
 .. _automl_with_intelex_titanic: https://www.kaggle.com/lordozvlad/titanic-automl-with-intel-extension-for-sklearn/notebook
 
+.. |automl_with_intelex_tps_jan| replace:: AutoML Binary Classification (Random Forest, SVR, Blending) using PyCaret with |intelex|
+.. _automl_with_intelex_tps_jan: https://www.kaggle.com/code/lordozvlad/tps-jan-fast-pycaret-with-scikit-learn-intelex/notebook
+
+
 Kaggle Kernels that use AutoML and |intelex|
 --------------------------------------------
 
@@ -55,4 +59,8 @@ The following Kaggle kernels show how to patch autoML framewokrs with |intelex|.
 * - |automl_with_intelex_tps_nov|_
 
 **Data:** [TPS Nov 2021] Synthetic spam emails data
-- Identify spam emails via features extraced from the email
+- Identify spam emails via features extraced from the email
+* - |automl_with_intelex_tps_jan|_
+
+**Data:** [TPS Jan 2022] Fictional Sales data
+- Predict the corresponding item sales for each date-country-store-item combination

doc/sources/kaggle/classification.rst

Lines changed: 54 additions & 3 deletions
@@ -45,7 +45,18 @@ Binary Classification
 - search for optimal parameters using Optuna
 - training and prediction using scikit-learn-intelex
 - performance comparison to scikit-learn
+* - `Feature Importance in Random Forest for Binary Classification
+<https://www.kaggle.com/code/lordozvlad/fast-feature-importance-using-scikit-learn-intelex/notebook>`_
+
+**Data:** [TPS Nov 2021] Synthetic spam emails data
 
+- Identify spam emails via features extraced from the email
+-
+
+- reducing DataFrame memory usage
+- computing feature importance with ELI5 and the default scikit-learn permutation importance
+- training using scikit-learn-intelex
+- performance comparison to scikit-learn
 * - `Random Forest for Binary Classification
 <https://www.kaggle.com/andreyrus/tps-apr-rf-with-intel-extension-for-scikit-learn>`_
 
@@ -80,7 +91,6 @@ Binary Classification
 - training and prediction using scikit-learn-intelex
 - performance comparison to scikit-learn
 
-
 MultiClass Classification
 +++++++++++++++++++++++++
 
@@ -115,7 +125,7 @@ MultiClass Classification
 * - `Stacking Classifer with Logistic Regression, kNN, Random Forest, and Quantile Transformer
 <https://www.kaggle.com/owerbat/tps-jun-fast-stacking-with-scikit-learn-intelex>`_
 
-**Data:** [TPS Jun 2021] synthetic eCommerce data
+**Data:** [TPS Jun 2021] Synthetic eCommerce data
 - Predict the category of an eCommerce product
 -
 
@@ -125,6 +135,36 @@ MultiClass Classification
 - searching for optimal parameters for the stacking classifier
 - training and prediction using scikit-learn-intelex
 - performance comparison to scikit-learn
+* - `Support Vector Classification (SVC) for MultiClass Classification
+<https://www.kaggle.com/code/alexeykolobyanin/tps-dec-svc-with-sklearnex-20x-speedup>`_
+
+**Data:** [TPS Dec 2021] Synthetic Forest Cover Type data
+- Predict the forest cover type
+-
+- data preprocessing
+- training and prediction using scikit-learn-intelex
+- performance comparison to scikit-learn
+* - `Feature Importance in Random Forest for MultiClass Classification
+<https://www.kaggle.com/code/lordozvlad/tps-dec-fast-feature-importance-with-sklearnex>`_
+
+**Data:** [TPS Dec 2021] Synthetic Forest Cover Type data
+
+- Predict the forest cover type
+-
+
+- reducing DataFrame memory usage
+- computing feature importance with ELI5
+- training and prediction using scikit-learn-intelex
+- performance comparison to scikit-learn
+* - `k-Nearest Neighbors (kNN) for MultiClass Classification
+<https://www.kaggle.com/code/alexeykolobyanin/tps-feb-knn-with-sklearnex-13x-speedup>`_
+
+**Data:** [TPS Feb 2022] Bacteria DNA
+- Predict bacteria species based on repeated lossy measurements of DNA snippets
+-
+- data preprocessing
+- training and prediction using scikit-learn-intelex
+- performance comparison to scikit-learn
 
 Classification Tasks in Computer Vision
 +++++++++++++++++++++++++++++++++++++++
@@ -190,4 +230,15 @@ Classification Tasks in Natural Language Processing
 - feature extraction using TfidfVectorizer
 - training and prediction using scikit-learn-intelex
 - performance comparison to scikit-learn
-
+* - `Support Vector Classification (SVC) for Binary Classification with Sparse Data (NLP task)
+<https://www.kaggle.com/code/alex97andreev/fast-svm-for-sparse-data-from-nlp-problem>`_
+
+**Data:** Stack Overflow questions
+- Predict the binary quality rating for Stack Overflow questions
+-
+
+- data preprocessing
+- TF-IDF calculation
+- search for optimal paramters using Optuna
+- training and prediction using scikit-learn-intelex
+- performance comparison to scikit-learn
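Several of the new kernels list "computing feature importance with ELI5", and one also uses "the default scikit-learn permutation importance". Permutation importance works as follows: shuffle one feature column, re-score the already fitted model, and treat the drop in score as that feature's importance. The dependency-free sketch below illustrates only that technique. It is not the ELI5 implementation or the scikit-learn implementation (scikit-learn provides `sklearn.inspection.permutation_importance`). The names `permutation_importance_sketch` and `accuracy`, and the choice of accuracy as the metric, are assumptions made for illustration.

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of matching labels (metric chosen for this sketch)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(
    predict: Callable[[List[float]], int],
    X: List[List[float]],
    y: List[int],
    n_repeats: int = 5,
    seed: int = 0,
) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled (illustrative only)."""
    rng = random.Random(seed)
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by the shuffled values;
            # all other features keep their original values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in X_perm]))
        importances.append(mean(drops))
    return importances
```

A feature that the model ignores scores exactly zero, because shuffling it leaves every prediction unchanged. Shuffling a feature the model relies on breaks the link between that feature and the labels, so the score drops.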

doc/sources/kaggle/regression.rst

Lines changed: 11 additions & 0 deletions
@@ -93,6 +93,17 @@ Using a Single Regressor
 - search for optimal parameters using Optuna
 - training and prediction using scikit-learn-intelex
 - performance comparison to scikit-learn
+* - `Random Forest Regression with Feature Importance Computation
+<https://www.kaggle.com/code/lordozvlad/tps-mar-fast-workflow-using-scikit-learn-intelex>`_
+
+**Data:** [TPS Mar 2022] Spatio-temporal traffic data
+- Forecast twelve-hours of traffic flow in a major U.S. metropolitan area
+-
+
+- feature engineering
+- computing feature importance with ELI5
+- training and prediction using scikit-learn-intelex
+- performance comparison to scikit-learn
 * - `Ridge Regression
 <https://www.kaggle.com/alexeykolobyanin/tps-sep-ridge-with-sklearn-intelex-2x-speedup>`_
 