Commit 2d2b38d

'LinearRegressor.BGD' constructor added (#232)
Parent: c2e771e

File tree: 6 files changed (+275 −20 lines)

CHANGELOG.md

Lines changed: 3 additions & 0 deletions

@@ -1,5 +1,8 @@
 # Changelog

+## 16.15.0
+- `LinearRegressor.BGD` constructor added
+
 ## 16.14.0
 - `LinearRegressor.SGD` constructor added
README.md

Lines changed: 7 additions & 3 deletions

@@ -45,8 +45,12 @@ it in web applications.
   A class that performs linear binary classification of data. To use this kind of classifier your data has to be
   [linearly separable](https://en.wikipedia.org/wiki/Linear_separability).

-- [LogisticRegressor.SGD](https://pub.dev/documentation/ml_algo/latest/ml_algo/LogisticRegressor/LogisticRegressor.SGD.html).
+- [LogisticRegressor.SGD](https://pub.dev/documentation/ml_algo/latest/ml_algo/LogisticRegressor/LogisticRegressor.SGD.html).
   Implementation of the logistic regression algorithm based on stochastic gradient descent with L2 regularisation.
+  To use this kind of classifier your data has to be [linearly separable](https://en.wikipedia.org/wiki/Linear_separability).
+
+- [LogisticRegressor.BGD](https://pub.dev/documentation/ml_algo/latest/ml_algo/LogisticRegressor/LogisticRegressor.BGD.html).
+  Implementation of the logistic regression algorithm based on batch gradient descent with L2 regularisation.
   To use this kind of classifier your data has to be [linearly separable](https://en.wikipedia.org/wiki/Linear_separability).

 - [SoftmaxRegressor](https://pub.dev/documentation/ml_algo/latest/ml_algo/SoftmaxRegressor-class.html).

@@ -64,10 +68,10 @@ it in web applications.
 - [LinearRegressor](https://pub.dev/documentation/ml_algo/latest/ml_algo/LinearRegressor-class.html).
   A general class for finding a linear pattern in training data and predicting outcomes as real numbers.

-- [LinearRegressor.lasso](https://pub.dev/documentation/ml_algo/latest/ml_algo/LinearRegressor/LinearRegressor.lasso.html)
+- [LinearRegressor.lasso](https://pub.dev/documentation/ml_algo/latest/ml_algo/LinearRegressor/LinearRegressor.lasso.html)
   Implementation of the linear regression algorithm based on coordinate descent with lasso regularisation

-- [LinearRegressor.SGD](https://pub.dev/documentation/ml_algo/latest/ml_algo/LinearRegressor/LinearRegressor.SGD.html)
+- [LinearRegressor.SGD](https://pub.dev/documentation/ml_algo/latest/ml_algo/LinearRegressor/LinearRegressor.SGD.html)
   Implementation of the linear regression algorithm based on stochastic gradient descent with L2 regularisation

 - [KnnRegressor](https://pub.dev/documentation/ml_algo/latest/ml_algo/KnnRegressor-class.html)
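The SGD and BGD constructors listed above differ only in how much of the training set each optimization step consumes: BGD computes the gradient over the whole dataset per iteration, SGD over a single sampled observation. A minimal NumPy sketch of full-batch gradient descent for L2-regularised logistic regression (illustrative only, not ml_algo's implementation):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_bgd(X, y, lr=0.1, lam=0.01, iterations=2000):
    """Full-batch gradient descent for L2-regularised logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = sigmoid(X @ w)                        # predictions for ALL rows
        grad = X.T @ (p - y) / len(y) + lam * w   # full-dataset gradient + L2 term
        w -= lr * grad
    return w

# Tiny linearly separable toy set; the second column is a constant intercept term
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w = fit_logistic_bgd(X, y)
preds = (sigmoid(X @ w) >= 0.5).astype(int)
```

Because the gradient is averaged over every sample, each BGD step is deterministic; SGD would replace the full `X` with one randomly drawn row per step.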
Lines changed: 84 additions & 0 deletions (new file)

import 'package:ml_algo/ml_algo.dart';
import 'package:ml_dataframe/ml_dataframe.dart';
import 'package:ml_linalg/vector.dart';
import 'package:test/test.dart';

Future<Vector> evaluateLogisticRegressor(MetricType metric, DType dtype) {
  final samples = getPimaIndiansDiabetesDataFrame().shuffle(seed: 12);
  final numberOfFolds = 5;
  final validator = CrossValidator.kFold(
    samples,
    numberOfFolds: numberOfFolds,
  );
  final createClassifier = (DataFrame trainSamples) => LogisticRegressor.BGD(
        trainSamples,
        'Outcome',
        iterationsLimit: 50,
        initialLearningRate: 1e-4,
        learningRateType: LearningRateType.constant,
        dtype: dtype,
      );

  return validator.evaluate(
    createClassifier,
    metric,
  );
}

Future main() async {
  group('LogisticRegressor.BGD', () {
    test(
        'should return adequate score on pima indians diabetes dataset using '
        'accuracy metric, dtype=DType.float32', () async {
      final scores =
          await evaluateLogisticRegressor(MetricType.accuracy, DType.float32);

      expect(scores.mean(), greaterThan(0.5));
    });

    test(
        'should return adequate score on pima indians diabetes dataset using '
        'accuracy metric, dtype=DType.float64', () async {
      final scores =
          await evaluateLogisticRegressor(MetricType.accuracy, DType.float64);

      expect(scores.mean(), greaterThan(0.5));
    });

    test(
        'should return adequate score on pima indians diabetes dataset using '
        'precision metric, dtype=DType.float32', () async {
      final scores =
          await evaluateLogisticRegressor(MetricType.precision, DType.float32);

      expect(scores.mean(), greaterThan(0.5));
    });

    test(
        'should return adequate score on pima indians diabetes dataset using '
        'precision metric, dtype=DType.float64', () async {
      final scores =
          await evaluateLogisticRegressor(MetricType.precision, DType.float64);

      expect(scores.mean(), greaterThan(0.5));
    });

    test(
        'should return adequate score on pima indians diabetes dataset using '
        'recall metric, dtype=DType.float32', () async {
      final scores =
          await evaluateLogisticRegressor(MetricType.recall, DType.float32);

      expect(scores.mean(), greaterThan(0.5));
    });

    test(
        'should return adequate score on pima indians diabetes dataset using '
        'recall metric, dtype=DType.float64', () async {
      final scores =
          await evaluateLogisticRegressor(MetricType.recall, DType.float64);

      expect(scores.mean(), greaterThan(0.5));
    });
  });
}
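The tests above score the classifier via `CrossValidator.kFold`: split the shuffled samples into 5 folds, train on 4, score on the held-out fold, and average the per-fold metric. A hedged Python sketch of that evaluation loop (the helper names here are illustrative, not ml_algo's API):

```python
import numpy as np

def k_fold_scores(X, y, train_fn, score_fn, n_folds=5):
    """Train on (n_folds - 1) folds, score on the held-out fold, repeat."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    scores = []
    for i in range(n_folds):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != i])
        model = train_fn(X[train_idx], y[train_idx])
        scores.append(score_fn(model, X[test_idx], y[test_idx]))
    return np.array(scores)

# Toy example: the "model" is just a decision threshold learned from the data
X = np.arange(20.0)
y = (X >= 10).astype(int)
train = lambda X_tr, y_tr: X_tr[y_tr == 1].min()  # learn the threshold
score = lambda thr, X_te, y_te: float(((X_te >= thr).astype(int) == y_te).mean())

scores = k_fold_scores(X, y, train, score)
mean_score = scores.mean()
```

Averaging over folds is what makes `scores.mean()` in the tests a stabler estimate than a single train/test split.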

lib/src/classifier/logistic_regressor/logistic_regressor.dart

Lines changed: 157 additions & 0 deletions

@@ -188,6 +188,9 @@ abstract class LogisticRegressor
         dtype: dtype,
       );

+  /// Creates a [LogisticRegressor] instance based on the Stochastic Gradient
+  /// Descent algorithm
+  ///
   /// Parameters:
   ///
   /// [trainingData] Observations that will be used by the classifier to learn
@@ -346,6 +349,160 @@ abstract class LogisticRegressor
         dtype: dtype,
       );

+  /// Creates a [LogisticRegressor] instance based on the Batch Gradient
+  /// Descent algorithm
+  ///
+  /// Parameters:
+  ///
+  /// [trainingData] Observations that will be used by the classifier to learn
+  /// the coefficients. Must contain [targetName] column.
+  ///
+  /// [targetName] A string that serves as a name of the target column (a
+  /// column that contains class labels or outcomes for the associated
+  /// features).
+  ///
+  /// [learningRateType] A value defining the strategy for the learning rate
+  /// behaviour throughout the whole fitting process.
+  ///
+  /// [iterationsLimit] The maximum number of fitting iterations, used as a
+  /// convergence condition in the optimization algorithm. Default value is
+  /// `100`.
+  ///
+  /// [initialLearningRate] The initial value defining the velocity of
+  /// convergence of the gradient descent optimizer. Default value is `1e-3`.
+  ///
+  /// [decay] The value controlling the speed of the learning rate's decrease.
+  /// Applicable only to the [LearningRateType.timeBased],
+  /// [LearningRateType.stepBased], and [LearningRateType.exponential]
+  /// strategies.
+  ///
+  /// [dropRate] The number of learning iterations after which the learning
+  /// rate will be decreased. The value is applicable only to the
+  /// [LearningRateType.stepBased] learning rate; it is ignored by the other
+  /// learning rate strategies.
+  ///
+  /// [minCoefficientsUpdate] A minimum distance between the coefficient
+  /// vectors of two contiguous iterations, used as a convergence condition in
+  /// the optimization algorithm. If the difference between the two vectors is
+  /// small enough, there is no reason to continue fitting. Default value is
+  /// `1e-12`.
+  ///
+  /// [probabilityThreshold] The probability on the basis of which it is
+  /// decided whether an observation belongs to the positive class label (see
+  /// the [positiveLabel] parameter) or to the negative class label (see the
+  /// [negativeLabel] parameter). The greater the probability, the stricter
+  /// the classifier is. Default value is `0.5`.
+  ///
+  /// [lambda] The regularization coefficient, used to prevent the regressor
+  /// from overfitting. The greater the value of [lambda], the more regular
+  /// the coefficients of the predicting hyperplane's equation are. An
+  /// extremely large [lambda] may shrink the coefficients to nothing, while a
+  /// too small [lambda] may lead to excessively large absolute coefficient
+  /// values, which is also undesirable.
+  ///
+  /// [fitIntercept] Whether or not to fit an intercept term. Default value is
+  /// `false`. In 2-dimensional space the intercept is the bias of the line
+  /// (relative to the X-axis).
+  ///
+  /// [interceptScale] A value defining the size of the intercept.
+  ///
+  /// [initialCoefficientsType] Defines the coefficients that will be
+  /// autogenerated at the first optimization iteration. By default all the
+  /// autogenerated coefficients are equal to zero. If [initialCoefficients]
+  /// are provided, this parameter will be ignored.
+  ///
+  /// [initialCoefficients] Coefficients to be used in the first iteration of
+  /// the optimization algorithm. [initialCoefficients] is a vector whose
+  /// length must be equal to the number of features in [trainingData]: in the
+  /// case of logistic regression only one column of [trainingData] is used as
+  /// the prediction target, thus the number of features is equal to the
+  /// number of columns in [trainingData] minus 1 (the target column). Keep in
+  /// mind that if your model includes an intercept term,
+  /// [initialCoefficients] should contain an extra element at the beginning
+  /// of the vector denoting the intercept term coefficient.
+  ///
+  /// [positiveLabel] A value that will be used for the positive class.
+  /// By default, `1`.
+  ///
+  /// [negativeLabel] A value that will be used for the negative class.
+  /// By default, `0`.
+  ///
+  /// [collectLearningData] Whether or not to collect learning data, for
+  /// instance the cost function value per iteration. This considerably
+  /// affects performance. If [collectLearningData] is true, one may access
+  /// the [costPerIteration] getter in order to evaluate the learning process
+  /// more thoroughly. Default value is `false`.
+  ///
+  /// [dtype] A data type for all the numeric values used by the algorithm.
+  /// Can affect performance or accuracy of the computations. Default value
+  /// is [DType.float32].
+  ///
+  /// Example:
+  ///
+  /// ```dart
+  /// import 'package:ml_algo/ml_algo.dart';
+  /// import 'package:ml_dataframe/ml_dataframe.dart';
+  ///
+  /// void main() {
+  ///   final samples = getPimaIndiansDiabetesDataFrame().shuffle(seed: 12);
+  ///   final model = LogisticRegressor.BGD(
+  ///     samples,
+  ///     'Outcome',
+  ///     iterationsLimit: 50,
+  ///     initialLearningRate: 1e-4,
+  ///     learningRateType: LearningRateType.constant,
+  ///     dtype: DType.float32,
+  ///   );
+  /// }
+  /// ```
+  ///
+  /// Keep in mind that you need to select a proper learning rate strategy
+  /// for every particular model. For more details, refer to
+  /// [LearningRateType]; also consider the [decay] and [dropRate] parameters.
+  factory LogisticRegressor.BGD(
+    DataFrame trainingData,
+    String targetName, {
+    required LearningRateType learningRateType,
+    int iterationsLimit = iterationLimitDefaultValue,
+    double initialLearningRate = initialLearningRateDefaultValue,
+    double decay = decayDefaultValue,
+    int dropRate = dropRateDefaultValue,
+    double minCoefficientsUpdate = minCoefficientsUpdateDefaultValue,
+    double probabilityThreshold = probabilityThresholdDefaultValue,
+    double lambda = lambdaDefaultValue,
+    bool fitIntercept = fitInterceptDefaultValue,
+    double interceptScale = interceptScaleDefaultValue,
+    InitialCoefficientsType initialCoefficientsType =
+        initialCoefficientsTypeDefaultValue,
+    num positiveLabel = positiveLabelDefaultValue,
+    num negativeLabel = negativeLabelDefaultValue,
+    bool collectLearningData = collectLearningDataDefaultValue,
+    DType dtype = dTypeDefaultValue,
+    Vector? initialCoefficients,
+  }) =>
+      initLogisticRegressorModule().get<LogisticRegressorFactory>().create(
+            trainData: trainingData,
+            targetName: targetName,
+            optimizerType: LinearOptimizerType.gradient,
+            iterationsLimit: iterationsLimit,
+            initialLearningRate: initialLearningRate,
+            decay: decay,
+            dropRate: dropRate,
+            minCoefficientsUpdate: minCoefficientsUpdate,
+            probabilityThreshold: probabilityThreshold,
+            lambda: lambda,
+            regularizationType: RegularizationType.L2,
+            batchSize: trainingData.shape.first,
+            fitIntercept: fitIntercept,
+            interceptScale: interceptScale,
+            isFittingDataNormalized: false,
+            learningRateType: learningRateType,
+            initialCoefficientsType: initialCoefficientsType,
+            initialCoefficients:
+                initialCoefficients ?? Vector.empty(dtype: dtype),
+            positiveLabel: positiveLabel,
+            negativeLabel: negativeLabel,
+            collectLearningData: collectLearningData,
+            dtype: dtype,
+          );
+
   /// Restores previously fitted classifier instance from the [json]
   ///
   /// ````dart
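One line in the factory above is what makes this constructor *batch* gradient descent: `batchSize: trainingData.shape.first`, i.e. the generic gradient optimizer is told to consume every row of the training data on each iteration. A hedged NumPy sketch of that relationship (illustrative names, not ml_algo's internals), using plain linear regression for brevity:

```python
import numpy as np

def gradient_descent_step(X, y, w, lr, batch_size, rng):
    """One mini-batch gradient step on mean squared error."""
    idx = rng.choice(len(y), size=batch_size, replace=False)
    Xb, yb = X[idx], y[idx]
    grad = 2.0 * Xb.T @ (Xb @ w - yb) / batch_size
    return w - lr * grad

rng = np.random.default_rng(seed=7)
X = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])  # feature + intercept column
y = np.array([3.0, 5.0, 7.0])                       # exactly y = 2x + 1
w = np.zeros(2)

# batch_size == number of rows  ->  batch gradient descent (as in the factory);
# batch_size == 1 would give the stochastic (SGD) variant instead.
for _ in range(2000):
    w = gradient_descent_step(X, y, w, lr=0.05, batch_size=len(y), rng=rng)
```

With the full batch, sampling returns every index, so each step is deterministic and `w` converges to the exact solution `[2, 1]`; with `batch_size=1` the same loop would take noisy steps that only converge in expectation.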

pubspec.yaml

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 name: ml_algo
 description: Machine learning algorithms, Machine learning models performance evaluation functionality
-version: 16.14.0
+version: 16.15.0
 homepage: https://github.com/gyrdym/ml_algo

 environment:
