23 changes: 8 additions & 15 deletions source/widgets/model/adaboost.md
@@ -18,18 +18,15 @@ The [AdaBoost](https://en.wikipedia.org/wiki/AdaBoost) (short for "Adaptive boos

**AdaBoost** works for both classification and regression.

![](images/AdaBoost-stamped.png)
![](images/AdaBoost-stamped.png){width=300px}

1. The learner can be given a name under which it will appear in other widgets. The default name is "AdaBoost".
2. Set the parameters. The base estimator is a tree and you can set:
- *Number of estimators*
- *Learning rate*: determines to what extent each newly added estimator's contribution overrides the previously acquired information; there is a trade-off between the learning rate and the number of estimators.
- *Fixed seed for random generator*: set a fixed seed to enable reproducing the results.
3. Boosting method.
- *Classification algorithm* (if classification on input): SAMME (updates base estimator's weights with classification results) or SAMME.R (updates base estimator's weight with probability estimates).
- *Regression loss function* (if regression on input): Linear, Square, or Exponential.
4. Produce a report.
5. Click *Apply* after changing the settings. That will put the new learner in the output and, if the training examples are given, construct a new model and output it as well. To communicate changes automatically tick *Apply Automatically*.
- *Loss (regression)*: Regression loss function (if regression on input). It can be linear, square, or exponential.
3. *Fixed seed for random generator*: set a fixed seed to enable reproducing the results.
4. Click *Apply* after changing the settings. That will put the new learner in the output and, if training examples are given, construct a new model and output it as well. To communicate changes automatically, tick *Apply Automatically*. A sketch of the corresponding scikit-learn setup is shown below.
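
These settings correspond to scikit-learn's AdaBoost estimators. A minimal sketch of the assumed mapping (the values shown are illustrative, and the available options depend on the installed scikit-learn version):

```python
from sklearn.ensemble import AdaBoostClassifier, AdaBoostRegressor

classifier = AdaBoostClassifier(
    n_estimators=50,    # Number of estimators
    learning_rate=1.0,  # Learning rate
    algorithm="SAMME",  # Classification algorithm (SAMME.R exists in older versions)
    random_state=0,     # Fixed seed for random generator
)

regressor = AdaBoostRegressor(
    n_estimators=50,
    learning_rate=1.0,
    loss="linear",      # Loss (regression): linear, square or exponential
    random_state=0,
)
```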

Preprocessing
-------------
@@ -43,13 +40,9 @@ AdaBoost uses default preprocessing when no other preprocessors are given. It ex

To remove default preprocessing, connect an empty [Preprocess](../data/preprocess.md) widget to the learner.

Examples
--------
Example
-------

For classification, we loaded the *iris* dataset. We used *AdaBoost*, [Tree](../model/tree.md) and [Logistic Regression](../model/logisticregression.md) and evaluated the models' performance in [Test & Score](../evaluate/testandscore.md).
We loaded the *iris* dataset with the [File](../data/file.md) widget. We used *AdaBoost* to boost the [Random Forest](../model/randomforest.md) model. We compared and evaluated the models' performance in [Test & Score](../evaluate/testandscore.md).

![](images/AdaBoost-classification.png)

For regression, we loaded the *housing* dataset, sent the data instances to two different models (**AdaBoost** and [Tree](../model/tree.md)) and output them to the [Predictions](../evaluate/predictions.md) widget.

![](images/AdaBoost-regression.png)
![](images/AdaBoost-Example1.png)
Binary file removed source/widgets/model/icons/adaboost.png
Binary file removed source/widgets/model/icons/cn2ruleinduction.png
Binary file removed source/widgets/model/icons/constant.png
Binary file removed source/widgets/model/icons/knn.png
Binary file removed source/widgets/model/icons/linear-regression.png
Binary file removed source/widgets/model/icons/load-model.png
Binary file removed source/widgets/model/icons/logistic-regression.png
Binary file removed source/widgets/model/icons/naive-bayes.png
Binary file removed source/widgets/model/icons/neural-network.png
Binary file removed source/widgets/model/icons/random-forest.png
Binary file removed source/widgets/model/icons/save-model.png
Binary file removed source/widgets/model/icons/stacking.png
Binary file removed source/widgets/model/icons/stochastic-gradient.png
Binary file removed source/widgets/model/icons/svm.png
Binary file removed source/widgets/model/icons/tree.png
Binary file removed source/widgets/model/images/AdaBoost-regression.png
Binary file modified source/widgets/model/images/AdaBoost-stamped.png
Binary file modified source/widgets/model/images/LogisticRegression-stamped.png
Binary file removed source/widgets/model/images/NN-Example-Predict.png
Binary file removed source/widgets/model/images/NN-Example-Test.png
Binary file modified source/widgets/model/images/NeuralNetwork-stamped.png
Binary file modified source/widgets/model/images/SaveModel-example.png
Binary file removed source/widgets/model/images/SaveModel-save.png
Binary file removed source/widgets/model/images/SaveModel-stamped.png
Binary file added source/widgets/model/images/SaveModel.png
20 changes: 12 additions & 8 deletions source/widgets/model/logisticregression.md
@@ -14,12 +14,12 @@ The logistic regression classification algorithm with LASSO (L1) or ridge (L2) r
- Model: trained model
- Coefficients: logistic regression coefficients

**Logistic Regression** learns a [Logistic Regression](https://en.wikipedia.org/wiki/Logistic_regression) model from the data. It only works for classification tasks.
**Logistic Regression** learns a [logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) model from the data. It only works for classification tasks.

![](images/LogisticRegression-stamped.png)
![](images/LogisticRegression-stamped.png){width=300px}

1. A name under which the learner appears in other widgets. The default name is "Logistic Regression".
2. [Regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)) type (either [L1](https://en.wikipedia.org/wiki/Least_squares#Lasso_method) or [L2](https://en.wikipedia.org/wiki/Tikhonov_regularization)). Set the cost strength (default is C=1).
2. [Regularization](https://en.wikipedia.org/wiki/Regularization_(mathematics)) type (either [L1](https://en.wikipedia.org/wiki/Least_squares#Lasso_method) or [L2](https://en.wikipedia.org/wiki/Ridge_regression)). Set the cost strength (default is C=1).
3. Press *Apply* to commit changes. If *Apply Automatically* is ticked, changes will be communicated automatically. A sketch of the corresponding scikit-learn call is shown below.
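
These options correspond to scikit-learn's `LogisticRegression`; a minimal sketch of the assumed mapping (values are illustrative):

```python
from sklearn.linear_model import LogisticRegression

# Ridge (L2) regularization with cost strength C=1, the widget's default.
ridge = LogisticRegression(penalty="l2", C=1.0)

# LASSO (L1) requires a solver that supports the L1 penalty.
lasso = LogisticRegression(penalty="l1", C=1.0, solver="liblinear")
```

Note that `C` is an inverse regularization strength: smaller values mean stronger regularization.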

Preprocessing
@@ -39,11 +39,15 @@ Feature Scoring

Logistic Regression can be used with Rank for feature scoring. See [Learners as Scorers](../../learners-as-scorers/index.md) for an example.

Example
-------
Examples
--------

The widget is used just as any other widget for inducing a classifier. This is an example demonstrating prediction results with logistic regression on the *hayes-roth* dataset. We first load *hayes-roth_learn* in the [File](../data/file.md) widget and pass the data to **Logistic Regression**. Then we pass the trained model to [Predictions](../evaluate/predictions.md).
The widget is used just as any other widget for training a classifier. This is an example demonstrating prediction results with logistic regression on the *heart_disease* dataset. We first load *heart_disease* in the [File](../data/file.md) widget and pass it to [Data Sampler](../data/datasampler.md), which splits the data in a 70:30 ratio. Then we pass the *Data Sample* output to **Logistic Regression** and the trained model to [Predictions](../evaluate/predictions.md).

Now we want to predict class value on a new dataset. We load *hayes-roth_test* in the second **File** widget and connect it to **Predictions**. We can now observe class values predicted with **Logistic Regression** directly in **Predictions**.
Now we want to predict class values on the left-out subset. We connect the *Remaining Data* output of the **Data Sampler** widget to **Predictions**. We can now observe class values predicted with **Logistic Regression** directly in **Predictions**.

![](images/LogisticRegression-classification.png)
![](images/LogisticRegression-Example1.png)

The logistic regression model can also be explained with the [Nomogram](../visualize/nomogram.md) widget. Train the model by connecting *heart_disease* data from the File widget to Logistic Regression. Then, pass the trained model to Nomogram, which shows feature importance and enables interactive exploration.

![](images/LogisticRegression-Example2.png)
26 changes: 11 additions & 15 deletions source/widgets/model/neuralnetwork.md
@@ -19,22 +19,22 @@ The **Neural Network** widget uses sklearn's [Multi-layer Perceptron algorithm](

1. A name under which it will appear in other widgets. The default name is "Neural Network".
2. Set model parameters:
- Neurons per hidden layer: defined so that the i-th element represents the number of neurons in the i-th hidden layer. E.g. a neural network with 3 layers can be defined as 2, 3, 2.
- Activation function for the hidden layer:
- *Neurons in hidden layers*: the i-th value sets the number of neurons in the i-th hidden layer. E.g. a neural network with 3 hidden layers can be defined as `2,3,2`.
- *Activation* function for the hidden layer:
- Identity: no-op activation, useful to implement linear bottleneck
- Logistic: the logistic sigmoid function
- tanh: the hyperbolic tan function
- ReLu: the rectified linear unit function
- Solver for weight optimization:
- *Solver* for weight optimization:
- L-BFGS-B: an optimizer in the family of quasi-Newton methods
- SGD: stochastic gradient descent
- Adam: stochastic gradient-based optimizer
- Alpha: L2 penalty (regularization term) parameter
- Max iterations: maximum number of iterations
- *Regularization*: L2 penalty (regularization term) parameter
- *Maximal number of iterations*: the upper limit on the number of training iterations
- *Replicable training*: if ticked, training results can be reproduced.

Other parameters are set to [sklearn's defaults](http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html).
3. Produce a report.
4. When the box is ticked (*Apply Automatically*), the widget will communicate changes automatically. Alternatively, click *Apply*.
3. When *Apply Automatically* is ticked, the widget will communicate changes automatically. Alternatively, click *Apply*. A sketch of the underlying scikit-learn model is shown below.
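
A minimal sketch of the assumed mapping onto sklearn's `MLPClassifier` (the layer sizes repeat the `2,3,2` example above; the other values are illustrative):

```python
from sklearn.neural_network import MLPClassifier

net = MLPClassifier(
    hidden_layer_sizes=(2, 3, 2),  # Neurons in hidden layers: 2,3,2
    activation="relu",             # identity, logistic, tanh or relu
    solver="lbfgs",                # lbfgs (L-BFGS-B), sgd or adam
    alpha=0.0001,                  # Regularization: L2 penalty parameter
    max_iter=200,                  # Maximal number of iterations
    random_state=0,                # Replicable training: fixed seed
)
```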

Preprocessing
-------------
@@ -49,13 +49,9 @@ Neural Network uses default preprocessing when no other preprocessors are given.

To remove default preprocessing, connect an empty [Preprocess](../data/preprocess.md) widget to the learner.

Examples
--------
Example
-------

The first example is a classification task on *iris* dataset. We compare the results of **Neural Network** with the [Logistic Regression](../model/logisticregression.md).
The example is a classification task on the *iris* dataset from the [File](../data/file.md) widget. We compare the results of **Neural Network** with [Logistic Regression](../model/logisticregression.md) in the [Test & Score](../evaluate/testandscore.md) widget.

![](images/NN-Example-Test.png)

The second example is a prediction task, still using the *iris* data. This workflow shows how to use the *Learner* output. We input the **Neural Network** prediction model into [Predictions](../evaluate/predictions.md) and observe the predicted values.

![](images/NN-Example-Predict.png)
![](images/NeuralNetwork-Example.png)
15 changes: 7 additions & 8 deletions source/widgets/model/savemodel.md
@@ -3,22 +3,21 @@ Save Model

Save a trained model to an output file.

If the file is saved to the same directory as the workflow or in the subtree of that directory, the widget remembers the relative path. Otherwise it will store an absolute path, but disable auto save for security reasons.
If the file is saved to the same directory as the workflow or in the subtree of that directory, the widget remembers the relative path. Otherwise it will store an absolute path, but disable auto save for security reasons. All models are saved in Python's pickle format, with the .pkcls extension.
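
Because the file is an ordinary pickle, a saved model can also be loaded outside of Orange. A minimal sketch (the file name is hypothetical):

```python
import pickle

# Load a model previously saved by the Save Model widget.
with open("logistic-regression.pkcls", "rb") as f:
    model = pickle.load(f)
```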

**Inputs**

- Model: trained model

![](images/SaveModel-stamped.png)
![](images/SaveModel.png){width=50%}

1. Choose from previously saved models.
2. Save the created model with the *Browse* icon. Click on the icon and enter the name of the file. The model will be saved to a pickled file.
![](images/SaveModel-save.png)
3. Save the model.
1. If *Autosave when receiving new data* is ticked, the previously saved model will be overwritten whenever the input model is updated.
2. *Save* the created model. The model will be saved to a pickled file.
3. *Save as...* enables specifying the name of the file.

Example
-------

When you want to save a custom-set model, feed the data to the model (e.g. [Logistic Regression](../model/logisticregression.md)) and connect it to **Save Model**. Name the model; load it later into workflows with [Load Model](../model/loadmodel.md). Datasets used with **Load Model** have to contain compatible attributes.
When you want to save a custom-set model, feed the data to the model (e.g. [Logistic Regression](../model/logisticregression.md)) and connect it to **Save Model**. Load it later into workflows with [Load Model](../model/loadmodel.md). Datasets used with **Load Model** have to contain compatible attributes.

![](images/SaveModel-example.png)
![](images/SaveModel-Example.png)
32 changes: 16 additions & 16 deletions source/widgets/model/stochasticgradient.md
@@ -17,10 +17,10 @@ The **Stochastic Gradient Descent** widget uses [stochastic gradient descent](ht

![](images/StochasticGradientDescent-stamped.png)

1. Specify the name of the model. The default name is "SGD".
2. Algorithm parameters:
- Classification loss function:
- [Hinge](https://en.wikipedia.org/wiki/Hinge_loss) (linear SVM)
1. Specify the name of the model. The default name is "Stochastic Gradient Descent".
2. Loss function:
- *Classification*:
- [Hinge](https://en.wikipedia.org/wiki/Hinge_loss) (linear SVM), default.
- [Logistic Regression](http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression) (logistic regression SGD)
- [Modified Huber](https://en.wikipedia.org/wiki/Huber_loss) (smooth loss that brings tolerance to outliers as well as probability estimates)
- *Squared Hinge* (quadratically penalized hinge)
@@ -29,30 +29,30 @@ The **Stochastic Gradient Descent** widget uses [stochastic gradient descent](ht
- [Huber](https://en.wikipedia.org/wiki/Huber_loss) (switches to linear loss beyond ε)
- [Epsilon insensitive](http://kernelsvm.tripod.com/) (ignores errors within ε, linear beyond it)
- *Squared epsilon insensitive* (loss is squared beyond ε-region).
- Regression loss function:
- [Squared Loss](https://en.wikipedia.org/wiki/Mean_squared_error#Regression) (fitted to ordinary least-squares)
- *Regression*:
- [Squared Loss](https://en.wikipedia.org/wiki/Mean_squared_error#Regression) (fitted to ordinary least-squares), default.
- [Huber](https://en.wikipedia.org/wiki/Huber_loss) (switches to linear loss beyond ε)
- [Epsilon insensitive](http://kernelsvm.tripod.com/) (ignores errors within ε, linear beyond it)
- *Squared epsilon insensitive* (loss is squared beyond ε-region).
3. Regularization norms to prevent overfitting:
- None.
- [Lasso (L1)](https://en.wikipedia.org/wiki/Taxicab_geometry) (L1 leading to sparse solutions)
- [Ridge (L2)](https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm) (L2, standard regularizer)
- [Ridge (L2)](https://en.wikipedia.org/wiki/Norm_(mathematics)#p-norm) (L2, standard regularizer), default.
- [Elastic net](https://en.wikipedia.org/wiki/Elastic_net_regularization) (mixing both penalty norms).

Regularization strength defines how much regularization will be applied (the less we regularize, the more we allow the model to fit the data), and the mixing parameter defines the ratio between the L1 and L2 penalties (if set to 0, the penalty is L2; if set to 1, it is L1).
4. Learning parameters.
4. Optimization:
- Learning rate:
- *Constant*: learning rate stays the same through all epochs (passes)
- *Constant*: learning rate stays the same through all epochs (passes), default.
- [Optimal](http://leon.bottou.org/projects/sgd): a heuristic proposed by Leon Bottou
- [Inverse scaling](http://users.ics.aalto.fi/jhollmen/dippa/node22.html): learning rate is inversely related to the number of iterations
- Initial learning rate.
- Inverse scaling exponent: learning rate decay.
- Number of iterations: the number of passes through the training data.
- *Initial learning rate*.
- *Inverse scaling exponent*: learning rate decay.
- *Number of iterations*: the number of passes through the training data.
- *Tolerance*: training will stop when current loss is within tolerance of best loss.
- If *Shuffle data after each iteration* is on, the order of data instances is mixed after each pass.
- If *Fixed seed for random shuffling* is on, the algorithm will use a fixed random seed and enable replicating the results.
5. Produce a report.
6. Press *Apply* to commit changes. Alternatively, tick the box on the left side of the *Apply* button and changes will be communicated automatically.
5. Press *Apply* to commit changes. Alternatively, tick the box on the left side of the *Apply* button and changes will be communicated automatically. A sketch of the corresponding scikit-learn estimator is shown below.
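
These settings correspond to scikit-learn's `SGDClassifier` (classification) and `SGDRegressor` (regression); a minimal sketch of the assumed mapping, with illustrative values mirroring the defaults described above:

```python
from sklearn.linear_model import SGDClassifier, SGDRegressor

classifier = SGDClassifier(
    loss="hinge",              # classification loss; hinge = linear SVM (default)
    penalty="l2",              # regularization: Ridge (L2), the default
    alpha=0.0001,              # regularization strength
    l1_ratio=0.15,             # mixing parameter, used only with penalty="elasticnet"
    learning_rate="constant",  # constant, optimal or invscaling
    eta0=0.01,                 # initial learning rate, required for "constant"
    power_t=0.25,              # inverse scaling exponent
    max_iter=1000,             # number of iterations (passes over the data)
    tol=1e-3,                  # stop when the loss improves by less than this
    shuffle=True,              # shuffle data after each iteration
    random_state=0,            # fixed seed for random shuffling
)

regressor = SGDRegressor(loss="squared_error")  # regression default: squared loss
```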

Preprocessing
-------------
@@ -77,8 +77,8 @@ Examples

For the classification task, we use the *iris* dataset and test two models on it. We connect [Stochastic Gradient Descent](../model/stochasticgradient.md) and [Tree](../model/tree.md) to [Test & Score](../evaluate/testandscore.md). We also connect [File](../data/file.md) to **Test & Score** and observe model performance in the widget.

![](images/StochasticGradientDescent-classification.png)
![](images/StochasticGradientDescent-Example1.png)

For the regression task, we compare three different models to see what kind of predictions each of them produces. For the purpose of this example, the *housing* dataset is used. We connect the [File](../data/file.md) widget to **Stochastic Gradient Descent**, [Linear Regression](../model/linearregression.md) and [kNN](../model/knn.md), and connect all four widgets to the [Predictions](../evaluate/predictions.md) widget.

![](images/StochasticGradientDescent-regression.png)
![](images/StochasticGradientDescent-Example2.png)