
Commit 7a45981

Update to ML 1.0
1 parent 7399665 commit 7a45981


6 files changed (+28 additions, -41 deletions)


.gitignore

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 composer.lock
 progress.csv
 report.json
+*.rbx
+*.old
 .vscode
-.vs
-*.model
-*.old
+.vs

README.md

Lines changed: 11 additions & 19 deletions
@@ -84,7 +84,7 @@ use Rubix\ML\Persisters\Filesystem;
 $estimator = new PersistentModel(
 new Pipeline([
 new TextNormalizer(),
-new WordCountVectorizer(10000, 2, 10000, new NGram(1, 2)),
+new WordCountVectorizer(10000, 0.00008, 0.4, new NGram(1, 2)),
 new TfIdfTransformer(),
 new ZScaleStandardizer(),
 ], new MultilayerPerceptron([
@@ -100,7 +100,7 @@ $estimator = new PersistentModel(
 new Dense(50),
 new PReLU(),
 ], 256, new AdaMax(0.0001))),
-new Filesystem('sentiment.model', true)
+new Filesystem('sentiment.rbx', true)
 );
 ```
 
@@ -116,22 +116,14 @@ $estimator->train($dataset);
 ```
 
 ### Validation Score and Loss
-During training, the learner will record the validation score and the training loss at each iteration or *epoch*. The validation score is calculated using the default [F Beta](https://docs.rubixml.com/latest/cross-validation/metrics/f-beta.html) metric on a hold out portion of the training set called a *validation* set. Contrariwise, the training loss is the value of the cost function (in this case the [Cross Entropy](https://docs.rubixml.com/latest/neural-network/cost-functions/cross-entropy.html) loss) calculated over the samples left in the training set. We can visualize the training progress by plotting these metrics. To output the scores and losses you can call the additional `scores()` and `steps()` methods respectively.
+During training, the learner will record the validation score and the training loss at each iteration or *epoch*. The validation score is calculated using the default [F Beta](https://docs.rubixml.com/latest/cross-validation/metrics/f-beta.html) metric on a hold out portion of the training set called a *validation* set. Contrariwise, the training loss is the value of the cost function (in this case the [Cross Entropy](https://docs.rubixml.com/latest/neural-network/cost-functions/cross-entropy.html) loss) calculated over the samples left in the training set. We can visualize the training progress by plotting these metrics. To output the scores and losses you can call the additional `steps()` method and pass the resulting iterator to a Writable extractor such as [CSV](https://docs.rubixml.com/latest/extractors/csv.html).
 
 ```php
-$scores = $estimator->scores();
+use Rubix\ML\Extractors\CSV;
 
-$losses = $estimator->steps();
-```
-Next, we'll use an [Unlabeled](https://docs.rubixml.com/latest/datasets/unlabeled.html) dataset object to temporarily store and convert the scores and losses into CSV format so that we can import the data into our favorite plotting application such as [Plotly](https://plotly.com) or [Excel](https://www.microsoft.com/en-us/microsoft-365/excel). The global `array_transpose()` function takes a 2-dimensional array and changes the rows to columns and vice versa. It is necessary to call this function in order to get the samples into the correct *shape* for the dataset object.
-
-```php
-use Rubix\ML\Datasets\Unlabeled;
-use function Rubix\ML\array_transpose;
-
-$table = array_transpose([$scores, $losses]);
+$extractor = new CSV('progress.csv', true);
 
-Unlabeled::build($table)->toCSV()->write('progress.csv');
+$extractor->export($estimator->steps());
 ```
 
 Here is an example of what the validation score and training loss looks like when they are plotted. The validation score should be getting better with each epoch as the loss decreases. You can generate your own plots by importing the `progress.csv` file into your plotting application.
@@ -184,7 +176,7 @@ Next, we'll use the Persistent Model wrapper to load the network we trained earl
 use Rubix\ML\PersistentModel;
 use Rubix\ML\Persisters\Filesystem;
 
-$estimator = PersistentModel::load(new Filesystem('sentiment.model'));
+$estimator = PersistentModel::load(new Filesystem('sentiment.rbx'));
 ```
 
 Now we can use the estimator to make predictions on the testing set. The `predict()` method on the estimator takes a dataset as input and returns an array of predictions.
@@ -214,10 +206,10 @@ $results = $report->generate($predictions, $dataset->labels());
 echo $results;
 ```
 
-We'll also save a copy of the report to a JSON file.
+We'll also save a copy of the report to a JSON file using the Filesystem persister.
 
 ```php
-$results->toJSON()->write('report.json');
+$results->toJSON()->saveTo(new Filesystem('report.json'));
 ```
 
 Now we can execute the validation script from the command line.
@@ -327,7 +319,7 @@ First, load the model from storage using the static `load()` method on the Persi
 use Rubix\ML\PersistentModel;
 use Rubix\ML\Persisters\Filesystem;
 
-$estimator = PersistentModel::load(new Filesystem('sentiment.model'));
+$estimator = PersistentModel::load(new Filesystem('sentiment.rbx'));
 ```
 
 Next, we'll use the built-in PHP function `readline()` to prompt the user to enter some text that we'll store in a variable.
@@ -366,4 +358,4 @@ See DATASET_README. For comments or questions regarding the dataset please conta
 >- Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. (2011). Learning Word Vectors for Sentiment Analysis. The 49th Annual Meeting of the Association for Computational Linguistics (ACL 2011).
 
 ## License
-The code is licensed [MIT](LICENSE) and the tutorial is licensed [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
+The code is licensed [MIT](LICENSE) and the tutorial is licensed [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
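
With this change, the iterator returned by `steps()` is handed straight to the CSV extractor, replacing the `array_transpose()` plus `Unlabeled` round trip. The plain-PHP sketch below (no Rubix ML dependency) contrasts the two data shapes. The numbers are placeholders, not training results, and the `trainingSteps()` generator with its `epoch`/`score`/`loss` keys is a hypothetical stand-in for `steps()`; the exact record format should be checked against the Rubix ML 1.0 documentation.

```php
<?php

// Placeholder per-epoch history (made-up values for illustration only).
$scores = [0.5, 0.6, 0.7];
$losses = [0.9, 0.7, 0.6];

// Old shape: array_transpose([$scores, $losses]) turned two parallel columns
// into one row per epoch. array_map() with a null callback zips them the same way.
$table = array_map(null, $scores, $losses);

// New shape (assumed): one associative record per epoch, yielded lazily.
// Hypothetical stand-in for the iterator returned by steps().
function trainingSteps(array $scores, array $losses): Generator
{
    foreach ($losses as $epoch => $loss) {
        yield ['epoch' => $epoch, 'score' => $scores[$epoch] ?? null, 'loss' => $loss];
    }
}

// Stream the records to CSV, writing a header row from the first record's keys
// (an assumption about how a header-enabled CSV export behaves).
$handle = fopen('php://memory', 'w+');
$headerWritten = false;

foreach (trainingSteps($scores, $losses) as $record) {
    if (!$headerWritten) {
        fputcsv($handle, array_keys($record), ',', '"', '');
        $headerWritten = true;
    }

    fputcsv($handle, $record, ',', '"', '');
}

rewind($handle);
$csv = stream_get_contents($handle);
fclose($handle);

echo $csv;
```

Because the records are produced one epoch at a time, nothing needs to be transposed or held in an intermediate dataset object before writing.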

composer.json

Lines changed: 2 additions & 2 deletions
@@ -20,8 +20,8 @@
 }
 ],
 "require": {
-"php": ">=7.2",
-"rubix/ml": "^0.3.0"
+"php": ">=7.4",
+"rubix/ml": "^1.0"
 },
 "scripts": {
 "predict": "@php predict.php",

predict.php

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@
 
 ini_set('memory_limit', '-1');
 
-$estimator = PersistentModel::load(new Filesystem('sentiment.model'));
+$estimator = PersistentModel::load(new Filesystem('sentiment.rbx'));
 
 while (empty($text)) $text = readline("Enter some text to analyze:\n");
 
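
Since predict.php now loads `sentiment.rbx`, a model file saved under the old `sentiment.model` name is no longer picked up; train.php has to be run again to produce the new file. A minimal pre-flight check, written in plain PHP, could tell the user which case applies. The `modelFileStatus()` helper and its return values are hypothetical, not part of the project.

```php
<?php

// Hypothetical helper: report which model file, if any, exists in $directory.
// 'ready'   -> sentiment.rbx is present and can be loaded by the scripts.
// 'retrain' -> only the old sentiment.model is present; run train.php again.
// 'missing' -> no model file at all.
function modelFileStatus(string $directory): string
{
    if (is_file($directory . '/sentiment.rbx')) {
        return 'ready';
    }

    if (is_file($directory . '/sentiment.model')) {
        return 'retrain';
    }

    return 'missing';
}

$status = modelFileStatus(__DIR__);

if ($status !== 'ready') {
    echo $status === 'retrain'
        ? "Found sentiment.model from an earlier run; run train.php to produce sentiment.rbx.\n"
        : "No trained model found; run train.php first.\n";
}
```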

train.php

Lines changed: 7 additions & 12 deletions
@@ -2,13 +2,13 @@
 
 include __DIR__ . '/vendor/autoload.php';
 
-use Rubix\ML\Other\Loggers\Screen;
+use Rubix\ML\Loggers\Screen;
 use Rubix\ML\Datasets\Labeled;
 use Rubix\ML\PersistentModel;
 use Rubix\ML\Pipeline;
 use Rubix\ML\Transformers\TextNormalizer;
 use Rubix\ML\Transformers\WordCountVectorizer;
-use Rubix\ML\Other\Tokenizers\NGram;
+use Rubix\ML\Tokenizers\NGram;
 use Rubix\ML\Transformers\TfIdfTransformer;
 use Rubix\ML\Transformers\ZScaleStandardizer;
 use Rubix\ML\Classifiers\MultilayerPerceptron;
@@ -19,9 +19,7 @@
 use Rubix\ML\NeuralNet\ActivationFunctions\LeakyReLU;
 use Rubix\ML\NeuralNet\Optimizers\AdaMax;
 use Rubix\ML\Persisters\Filesystem;
-use Rubix\ML\Datasets\Unlabeled;
-
-use function Rubix\ML\array_transpose;
+use Rubix\ML\Extractors\CSV;
 
 ini_set('memory_limit', '-1');
 
@@ -43,7 +41,7 @@
 $estimator = new PersistentModel(
 new Pipeline([
 new TextNormalizer(),
-new WordCountVectorizer(10000, 2, 10000, new NGram(1, 2)),
+new WordCountVectorizer(10000, 0.00008, 0.4, new NGram(1, 2)),
 new TfIdfTransformer(),
 new ZScaleStandardizer(),
 ], new MultilayerPerceptron([
@@ -59,19 +57,16 @@
 new Dense(50),
 new PReLU(),
 ], 256, new AdaMax(0.0001))),
-new Filesystem('sentiment.model', true)
+new Filesystem('sentiment.rbx', true)
 );
 
 $estimator->setLogger($logger);
 
 $estimator->train($dataset);
 
-$scores = $estimator->scores();
-$losses = $estimator->steps();
+$extractor = new CSV('progress.csv', true);
 
-Unlabeled::build(array_transpose([$scores, $losses]))
-    ->toCSV(['scores', 'losses'])
-    ->write('progress.csv');
+$extractor->export($estimator->steps());
 
 $logger->info('Progress saved to progress.csv');
 
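
The `WordCountVectorizer` arguments changed from `(10000, 2, 10000, ...)` to `(10000, 0.00008, 0.4, ...)`, which suggests the minimum and maximum document frequencies are now given as proportions of the training documents rather than as absolute counts. As a quick arithmetic check, assuming a corpus of 25,000 training documents (a hypothetical figure; the diff does not state the dataset size), the new proportions map back to the old counts: 0.00008 × 25,000 = 2 and 0.4 × 25,000 = 10,000. The `documentFrequencyCounts()` helper below is illustrative only.

```php
<?php

// Hypothetical helper: convert document-frequency proportions into the
// equivalent absolute document counts for a corpus of $numDocuments documents.
function documentFrequencyCounts(int $numDocuments, float $minProportion, float $maxProportion): array
{
    return [
        // round() absorbs floating-point noise such as 2.0000000000000004.
        'min' => (int) round($minProportion * $numDocuments),
        'max' => (int) round($maxProportion * $numDocuments),
    ];
}

// Assumed corpus size, for illustration only.
$counts = documentFrequencyCounts(25000, 0.00008, 0.4);

printf("min: %d documents, max: %d documents\n", $counts['min'], $counts['max']);
```

This prints `min: 2 documents, max: 10000 documents`. With a different corpus size the same proportions translate to different counts, which is the practical difference from the old absolute thresholds.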

validate.php

Lines changed: 4 additions & 4 deletions
@@ -2,7 +2,7 @@
 
 include __DIR__ . '/vendor/autoload.php';
 
-use Rubix\ML\Other\Loggers\Screen;
+use Rubix\ML\Loggers\Screen;
 use Rubix\ML\Datasets\Labeled;
 use Rubix\ML\PersistentModel;
 use Rubix\ML\Persisters\Filesystem;
@@ -27,7 +27,7 @@
 
 $dataset = Labeled::build($samples, $labels)->randomize()->take(10000);
 
-$estimator = PersistentModel::load(new Filesystem('sentiment.model'));
+$estimator = PersistentModel::load(new Filesystem('sentiment.rbx'));
 
 $logger->info('Making predictions');
 
@@ -42,6 +42,6 @@
 
 echo $results;
 
-$results->toJSON()->write('report.json');
+$results->toJSON()->saveTo(new Filesystem('report.json'));
 
-$logger->info('Report saved to report.json');
+$logger->info('Report saved to report.json');
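
Both train.php and validate.php move `Screen` from `Rubix\ML\Other\Loggers` to `Rubix\ML\Loggers`, and train.php moves `NGram` from `Rubix\ML\Other\Tokenizers` to `Rubix\ML\Tokenizers`. If other scripts in a project still import the old paths, a small find-and-replace over their source is one option. The sketch below is hypothetical and covers only the two renames visible in this commit; other classes may have moved in the 1.0 release as well.

```php
<?php

// Old => new import paths, limited to the renames visible in this commit.
const NAMESPACE_RENAMES = [
    'Rubix\ML\Other\Loggers\Screen' => 'Rubix\ML\Loggers\Screen',
    'Rubix\ML\Other\Tokenizers\NGram' => 'Rubix\ML\Tokenizers\NGram',
];

// Hypothetical helper: rewrite old import paths in a PHP source string.
// strtr() with an array replaces each match once, longest keys first.
function migrateImports(string $source): string
{
    return strtr($source, NAMESPACE_RENAMES);
}

$before = implode("\n", [
    'use Rubix\ML\Other\Loggers\Screen;',
    'use Rubix\ML\Other\Tokenizers\NGram;',
    'use Rubix\ML\Pipeline;',
]);

echo migrateImports($before), "\n";
```

To apply it to a script in place, one could pass the file through it, e.g. `file_put_contents($path, migrateImports(file_get_contents($path)))`, ideally after making a backup.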
