Commit 7399665

Fix links to docs
1 parent b535f0d commit 7399665

1 file changed: +9 −9 lines changed

README.md

Lines changed: 9 additions & 9 deletions
@@ -44,7 +44,7 @@ foreach (['positive', 'negative'] as $label) {
}
```

-Now, we can instantiate a new [Labeled](https://docs.rubixml.com/datasets/labeled.html) dataset object with the imported samples and labels.
+Now, we can instantiate a new [Labeled](https://docs.rubixml.com/latest/datasets/labeled.html) dataset object with the imported samples and labels.

```php
use Rubix\ML\Datasets\Labeled;
@@ -53,16 +53,16 @@ $dataset = new Labeled($samples, $labels);
```

### Dataset Preparation
-Neural networks compute a non-linear continuous function and therefore require continuous features as inputs. However, the samples given to us in the IMDB dataset are in raw text format. Therefore, we'll need to convert those text blobs to continuous features before training. We'll do so using the [bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model) technique which produces long sparse vectors of word counts using a fixed vocabulary. The entire series of transformations necessary to prepare the incoming dataset for the network can be implemented in a transformer [Pipeline](https://docs.rubixml.com/pipeline.html).
+Neural networks compute a non-linear continuous function and therefore require continuous features as inputs. However, the samples given to us in the IMDB dataset are in raw text format. Therefore, we'll need to convert those text blobs to continuous features before training. We'll do so using the [bag-of-words](https://en.wikipedia.org/wiki/Bag-of-words_model) technique which produces long sparse vectors of word counts using a fixed vocabulary. The entire series of transformations necessary to prepare the incoming dataset for the network can be implemented in a transformer [Pipeline](https://docs.rubixml.com/latest/pipeline.html).

-First, we'll convert all characters to lowercase using [Text Normalizer](https://docs.rubixml.com/transformers/text-normalizer.html) so that every word is represented by only a single token. Then, [Word Count Vectorizer](https://docs.rubixml.com/transformers/word-count-vectorizer.html) creates a fixed-length continuous feature vector of word counts from the raw text and [TF-IDF Transformer](https://docs.rubixml.com/transformers/tf-idf-transformer.html) applies a weighting scheme to those counts. Finally, [Z Scale Standardizer](https://docs.rubixml.com/transformers/z-scale-standardizer.html) takes the TF-IDF weighted counts and centers and scales the sample matrix to have 0 mean and unit variance. This last step will help the neural network converge quicker.
+First, we'll convert all characters to lowercase using [Text Normalizer](https://docs.rubixml.com/latest/transformers/text-normalizer.html) so that every word is represented by only a single token. Then, [Word Count Vectorizer](https://docs.rubixml.com/latest/transformers/word-count-vectorizer.html) creates a fixed-length continuous feature vector of word counts from the raw text and [TF-IDF Transformer](https://docs.rubixml.com/latest/transformers/tf-idf-transformer.html) applies a weighting scheme to those counts. Finally, [Z Scale Standardizer](https://docs.rubixml.com/latest/transformers/z-scale-standardizer.html) takes the TF-IDF weighted counts and centers and scales the sample matrix to have 0 mean and unit variance. This last step will help the neural network converge quicker.

The Word Count Vectorizer is a bag-of-words feature extractor that uses a fixed vocabulary and term counts to quantify the words that appear in a document. We elect to limit the size of the vocabulary to 10,000 of the most frequent words that satisfy the criteria of appearing in at least 2 different documents but no more than 10,000 documents. In this way, we limit the amount of *noise* words that enter the training set.

Another common text feature representation are [TF-IDF](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) values which take the term frequencies (TF) from Word Count Vectorizer and weigh them by their inverse document frequencies (IDF). IDFs can be interpreted as the word's *importance* within the training corpus. Specifically, higher weight is given to words that are more rare.

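The Pipeline wiring itself is not among the lines changed by this commit, so here is a rough sketch only: it assumes the transformer constructors linked above, takes the vocabulary limits from the description (exact constructor arguments can vary between Rubix ML versions), and uses `$network` as a placeholder for the Multilayer Perceptron defined in the next section.

```php
use Rubix\ML\Pipeline;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\WordCountVectorizer;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Transformers\ZScaleStandardizer;

$pipeline = new Pipeline([
    new TextNormalizer(),                     // lowercase the raw text so each word maps to a single token
    new WordCountVectorizer(10000, 2, 10000), // bag-of-words counts over a 10,000 word vocabulary (assumed limits)
    new TfIdfTransformer(),                   // weight the raw counts by inverse document frequency
    new ZScaleStandardizer(),                 // center and scale to 0 mean and unit variance
], $network); // $network is a placeholder for the classifier instantiated below
```

Because the transformers are fitted inside the Pipeline, the same preparation is applied automatically to any samples passed in at prediction time.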
### Instantiating the Learner
-The next thing we'll do is define the architecture of the neural network and instantiate the [Multilayer Perceptron](https://docs.rubixml.com/classifiers/multilayer-perceptron.html) classifier. The network uses 5 hidden layers consisting of a [Dense](https://docs.rubixml.com/neural-network/hidden-layers/dense.html) layer of neurons followed by a non-linear [Activation](https://docs.rubixml.com/neural-network/hidden-layers/activation.html) layer and an optional [Batch Norm](https://docs.rubixml.com/neural-network/hidden-layers/batch-norm.html) layer for normalizing the activations. The first 3 hidden layers use a [Leaky ReLU](https://docs.rubixml.com/neural-network/activation-functions/leaky-relu.html) activation function while the last 2 utilize a trainable form of the Leaky ReLU called [PReLU](https://docs.rubixml.com/neural-network/hidden-layers/prelu.html) for *Parametric* Rectified Linear Unit. The benefit that *leakage* provides over standard rectification is that it allows neurons to learn even if they did not activate by allowing a small gradient to pass through during backpropagation. We've found that this architecture works fairly well for this problem but feel free to experiment on your own.
+The next thing we'll do is define the architecture of the neural network and instantiate the [Multilayer Perceptron](https://docs.rubixml.com/latest/classifiers/multilayer-perceptron.html) classifier. The network uses 5 hidden layers consisting of a [Dense](https://docs.rubixml.com/latest/neural-network/hidden-layers/dense.html) layer of neurons followed by a non-linear [Activation](https://docs.rubixml.com/latest/neural-network/hidden-layers/activation.html) layer and an optional [Batch Norm](https://docs.rubixml.com/latest/neural-network/hidden-layers/batch-norm.html) layer for normalizing the activations. The first 3 hidden layers use a [Leaky ReLU](https://docs.rubixml.com/latest/neural-network/activation-functions/leaky-relu.html) activation function while the last 2 utilize a trainable form of the Leaky ReLU called [PReLU](https://docs.rubixml.com/latest/neural-network/hidden-layers/prelu.html) for *Parametric* Rectified Linear Unit. The benefit that *leakage* provides over standard rectification is that it allows neurons to learn even if they did not activate by allowing a small gradient to pass through during backpropagation. We've found that this architecture works fairly well for this problem but feel free to experiment on your own.

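The hidden-layer stack is not visible in this hunk. A plausible sketch that follows the description above (three Dense layers with Leaky ReLU activations plus a Batch Norm, then two Dense layers with PReLU) is shown below; the layer widths are assumptions, and the batch size and AdaMax learning rate are taken from the paragraphs further down.

```php
use Rubix\ML\Classifiers\MultilayerPerceptron;
use Rubix\ML\NeuralNet\Layers\Dense;
use Rubix\ML\NeuralNet\Layers\Activation;
use Rubix\ML\NeuralNet\Layers\BatchNorm;
use Rubix\ML\NeuralNet\Layers\PReLU;
use Rubix\ML\NeuralNet\ActivationFunctions\LeakyReLU;
use Rubix\ML\NeuralNet\Optimizers\AdaMax;

$network = new MultilayerPerceptron([
    new Dense(100),                  // hidden layer 1 (width assumed)
    new Activation(new LeakyReLU()),
    new Dense(100),                  // hidden layer 2
    new Activation(new LeakyReLU()),
    new Dense(100),                  // hidden layer 3, with batch normalization of the activations
    new BatchNorm(),
    new Activation(new LeakyReLU()),
    new Dense(50),                   // hidden layer 4, using a trainable (parametric) leaky ReLU
    new PReLU(),
    new Dense(50),                   // hidden layer 5
    new PReLU(),
], 256, new AdaMax(0.0001));         // batch size of 256 and AdaMax with a 0.0001 learning rate
```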
```php
use Rubix\ML\PersistentModel;
@@ -104,9 +104,9 @@ $estimator = new PersistentModel(
);
```

-We'll choose a batch size of 256 samples and perform network parameter updates using the [AdaMax](https://docs.rubixml.com/neural-network/optimizers/adamax.html) optimizer. AdaMax is based on the [Adam](https://docs.rubixml.com/neural-network/optimizers/adam.html) algorithm but tends to handle sparse updates better. When setting the learning rate of an optimizer, the important thing to note is that a learning rate that is too low will cause the network to learn slowly while a rate that is too high will prevent the network from learning at all. A global learning rate of 0.0001 seems to work pretty well for this problem.
+We'll choose a batch size of 256 samples and perform network parameter updates using the [AdaMax](https://docs.rubixml.com/latest/neural-network/optimizers/adamax.html) optimizer. AdaMax is based on the [Adam](https://docs.rubixml.com/latest/neural-network/optimizers/adam.html) algorithm but tends to handle sparse updates better. When setting the learning rate of an optimizer, the important thing to note is that a learning rate that is too low will cause the network to learn slowly while a rate that is too high will prevent the network from learning at all. A global learning rate of 0.0001 seems to work pretty well for this problem.

-Lastly, we'll wrap the entire estimator in a [Persistent Model](https://docs.rubixml.com/persistent-model.html) wrapper so we can save and load it later in our other scripts. The [Filesystem](https://docs.rubixml.com/persisters/filesystem.html) persister tells the wrapper to save and load the serialized model data from a path on disk. Setting the history parameter to true tells the persister to keep a history of past saves.
+Lastly, we'll wrap the entire estimator in a [Persistent Model](https://docs.rubixml.com/latest/persistent-model.html) wrapper so we can save and load it later in our other scripts. The [Filesystem](https://docs.rubixml.com/latest/persisters/filesystem.html) persister tells the wrapper to save and load the serialized model data from a path on disk. Setting the history parameter to true tells the persister to keep a history of past saves.

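A hedged sketch of the wrapper described here, with `sentiment.model` as an assumed filename and `$pipeline` referring to the transformer Pipeline sketched earlier:

```php
use Rubix\ML\PersistentModel;
use Rubix\ML\Persisters\Filesystem;

// Wrap the composed estimator so it can be saved to and loaded from disk in other scripts.
$estimator = new PersistentModel(
    $pipeline,
    new Filesystem('sentiment.model', true) // history: true keeps past saves around
);
```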
### Training
Now, you can call the `train()` method on the learner with the training dataset we instantiated earlier as an argument to kick off the training process.
@@ -116,14 +116,14 @@ $estimator->train($dataset);
```

### Validation Score and Loss
-During training, the learner will record the validation score and the training loss at each iteration or *epoch*. The validation score is calculated using the default [F Beta](https://docs.rubixml.com/cross-validation/metrics/f-beta.html) metric on a hold out portion of the training set called a *validation* set. Contrariwise, the training loss is the value of the cost function (in this case the [Cross Entropy](https://docs.rubixml.com/neural-network/cost-functions/cross-entropy.html) loss) calculated over the samples left in the training set. We can visualize the training progress by plotting these metrics. To output the scores and losses you can call the additional `scores()` and `steps()` methods respectively.
+During training, the learner will record the validation score and the training loss at each iteration or *epoch*. The validation score is calculated using the default [F Beta](https://docs.rubixml.com/latest/cross-validation/metrics/f-beta.html) metric on a hold out portion of the training set called a *validation* set. Contrariwise, the training loss is the value of the cost function (in this case the [Cross Entropy](https://docs.rubixml.com/latest/neural-network/cost-functions/cross-entropy.html) loss) calculated over the samples left in the training set. We can visualize the training progress by plotting these metrics. To output the scores and losses you can call the additional `scores()` and `steps()` methods respectively.

```php
$scores = $estimator->scores();

$losses = $estimator->steps();
```
-Next, we'll use an [Unlabeled](https://docs.rubixml.com/datasets/unlabeled.html) dataset object to temporarily store and convert the scores and losses into CSV format so that we can import the data into our favorite plotting application such as [Plotly](https://plotly.com) or [Excel](https://www.microsoft.com/en-us/microsoft-365/excel). The global `array_transpose()` function takes a 2-dimensional array and changes the rows to columns and vice versa. It is necessary to call this function in order to get the samples into the correct *shape* for the dataset object.
+Next, we'll use an [Unlabeled](https://docs.rubixml.com/latest/datasets/unlabeled.html) dataset object to temporarily store and convert the scores and losses into CSV format so that we can import the data into our favorite plotting application such as [Plotly](https://plotly.com) or [Excel](https://www.microsoft.com/en-us/microsoft-365/excel). The global `array_transpose()` function takes a 2-dimensional array and changes the rows to columns and vice versa. It is necessary to call this function in order to get the samples into the correct *shape* for the dataset object.

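The conversion code itself lies outside the changed lines. A minimal sketch follows, assuming `array_transpose()` is callable as described (it may be global or live under the `Rubix\ML` namespace depending on the library version) and writing the CSV with plain `fputcsv()` so the sketch does not depend on any particular export API:

```php
use Rubix\ML\Datasets\Unlabeled;

// One row per epoch: pair the i-th validation score with the i-th training loss.
$table = array_transpose([$scores, $losses]);

$dataset = new Unlabeled($table);

// Dump the rows to progress.csv for plotting in Plotly, Excel, etc.
$handle = fopen('progress.csv', 'w');

fputcsv($handle, ['score', 'loss']);

foreach ($dataset->samples() as $row) {
    fputcsv($handle, $row);
}

fclose($handle);
```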
```php
use Rubix\ML\Datasets\Unlabeled;
@@ -170,7 +170,7 @@ foreach (['positive', 'negative'] as $label) {
}
```

-Then, load the samples and labels into a [Labeled](https://docs.rubixml.com/datasets/labeled.html) dataset object using the `build()` method, randomize the order, and take the first 10,000 rows and put them in a new dataset object.
+Then, load the samples and labels into a [Labeled](https://docs.rubixml.com/latest/datasets/labeled.html) dataset object using the `build()` method, randomize the order, and take the first 10,000 rows and put them in a new dataset object.

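The chained calls are not shown in this hunk; assuming the fluent dataset API, the step described above could be written as:

```php
use Rubix\ML\Datasets\Labeled;

// Build the dataset, shuffle it, then keep the first 10,000 rows.
$dataset = Labeled::build($samples, $labels)
    ->randomize()
    ->take(10000);
```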
```php
use Rubix\ML\Datasets\Labeled;
