
Commit 1ced4c0

Remove async toy datasets (#227)

1 parent f65088d

8 files changed

+35 -35 lines changed

CHANGELOG.md

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 # Changelog
 
+## 16.11.4
+- `getPimaIndiansDiabetesDataFrame`, `getIrisDataFrame` used
+
 ## 16.11.3
 - Toy datasets from `ml_dataframe` package used
 

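The change this commit applies throughout the README and the e2e tests can be condensed into one small program. It is a minimal sketch that assumes ml_dataframe ^1.5.0, the version this commit pins in pubspec.yaml. In that version `getIrisDataFrame` and `getPimaIndiansDiabetesDataFrame` return a `DataFrame` synchronously, which is how the updated code in this diff calls them. The `header` getter used for printing is an assumption about the DataFrame API; this commit does not show it.

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

// `main` no longer needs to be `async`: the toy dataset getters used in this
// commit hand back a DataFrame directly instead of a Future.
void main() {
  // Previously: final iris = await loadIrisDataset();
  final iris = getIrisDataFrame();

  // Previously: final diabetes = await loadPimaIndiansDiabetesDataset();
  final diabetes = getPimaIndiansDiabetesDataFrame();

  // Same preprocessing as the kd-tree examples: keep only the feature columns.
  final irisFeatures = iris.dropSeries(names: ['Id', 'Species']);

  // Assumption: `header` lists the column names of a DataFrame.
  print(irisFeatures.header);
  print(diabetes.header);
}
```

The only call-site change is dropping `await`, plus `async` on the enclosing function. Functions that still await something else, such as `fromCsv` in the README, keep `async`.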
README.md

Lines changed: 18 additions & 18 deletions
@@ -121,7 +121,7 @@ We have 2 options here:
 
 - Download the dataset from [Pima Indians Diabetes Database](https://www.kaggle.com/uciml/pima-indians-diabetes-database).
 
-- Or we may simply use [loadPimaIndiansDiabetesDataset](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/loadPimaIndiansDiabetesDataset.html) function
+- Or we may simply use [getPimaIndiansDiabetesDataFrame](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/getPimaIndiansDiabetesDataFrame.html) function
 from [ml_dataframe](https://pub.dev/packages/ml_dataframe) package. The function returns a ready to use [DataFrame](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/DataFrame-class.html) instance
 filled with `Pima Indians Diabetes Database` data.
 
@@ -342,7 +342,7 @@ import 'package:ml_preprocessing/ml_preprocessing.dart';
 
 void main() async {
   // Another option - to use a toy dataset:
-  // final samples = await loadPimaIndiansDiabetesDataset();
+  // final samples = getPimaIndiansDiabetesDataFrame();
   final samples = await fromCsv('datasets/pima_indians_diabetes_database.csv', headerExists: true);
   final targetColumnName = 'Outcome';
   final splits = splitData(samples, [0.7]);
@@ -387,7 +387,7 @@ import 'package:ml_preprocessing/ml_preprocessing.dart';
 void main() async {
   final rawCsvContent = await rootBundle.loadString('assets/datasets/pima_indians_diabetes_database.csv');
   // Another option - to use a toy dataset:
-  // final samples = await loadPimaIndiansDiabetesDataset();
+  // final samples = getPimaIndiansDiabetesDataFrame();
   final samples = DataFrame.fromRawCsv(rawCsvContent);
   final targetColumnName = 'Outcome';
   final splits = splitData(samples, [0.7]);
@@ -599,7 +599,7 @@ void main() async {
 Let's try to classify data from a well-known [Iris](https://www.kaggle.com/datasets/uciml/iris) dataset using a non-linear algorithm - [decision trees](https://en.wikipedia.org/wiki/Decision_tree)
 
 First, you need to download the data and place it in a proper place in your file system. To do so you should follow the
-instructions which are given in the [Logistic regression](#logistic-regression) section. Or you may use [loadIrisDataset](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/loadIrisDataset.html)
+instructions which are given in the [Logistic regression](#logistic-regression) section. Or you may use [getIrisDataFrame](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/getIrisDataFrame.html)
 function that returns ready to use [DataFrame](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/DataFrame-class.html) instance filled with `Iris`dataset.
 
 After loading the data, it's needed to preprocess it. We should drop the `Id` column since the column doesn't make sense.
@@ -612,7 +612,7 @@ import 'package:ml_dataframe/ml_dataframe.dart';
 import 'package:ml_preprocessing/ml_preprocessing.dart';
 
 void main() async {
-  final samples = (await loadIrisDataset())
+  final samples = getIrisDataset()
       .shuffle()
       .dropSeries(seriesNames: ['Id']);
 
@@ -675,14 +675,14 @@ efficient to retrieve data.
 Let's retrieve some data points through a kd-tree built on the [Iris](https://www.kaggle.com/datasets/uciml/iris) dataset.
 
 First, we need to prepare the data. To do so, it's needed to load the dataset. For this purpose, we may use
-[loadIrisDataset](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/loadIrisDataset.html) function from [ml_dataframe](https://pub.dev/packages/ml_dataframe). The function returns prefilled with the Iris data DataFrame instance:
+[getIrisDataFrame](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/getIrisDataFrame.html) function from [ml_dataframe](https://pub.dev/packages/ml_dataframe). The function returns prefilled with the Iris data DataFrame instance:
 
 ```dart
 import 'package:ml_algo/ml_algo.dart';
 import 'package:ml_dataframe/ml_dataframe.dart';
 
-void main() async {
-  final originalData = await loadIrisDataset();
+void main() {
+  final originalData = getIrisDataFrame();
 }
 ```
 
@@ -693,8 +693,8 @@ drop these columns:
 import 'package:ml_algo/ml_algo.dart';
 import 'package:ml_dataframe/ml_dataframe.dart';
 
-void main() async {
-  final originalData = await loadIrisDataset();
+void main() {
+  final originalData = getIrisDataFrame();
   final data = originalData.dropSeries(names: ['Id', 'Species']);
 }
 ```
@@ -705,8 +705,8 @@ Next, we can build the tree:
 import 'package:ml_algo/ml_algo.dart';
 import 'package:ml_dataframe/ml_dataframe.dart';
 
-void main() async {
-  final originalData = await loadIrisDataset();
+void main() {
+  final originalData = getIrisDataFrame();
   final data = originalData.dropSeries(names: ['Id', 'Species']);
   final tree = KDTree(data);
 }
@@ -719,8 +719,8 @@ import 'package:ml_algo/ml_algo.dart';
 import 'package:ml_dataframe/ml_dataframe.dart';
 import 'package:ml_linalg/vector.dart';
 
-void main() async {
-  final originalData = await loadIrisDataset();
+void main() {
+  final originalData = getIrisDataFrame();
   final data = originalData.dropSeries(names: ['Id', 'Species']);
   final tree = KDTree(data);
   final neighbourCount = 5;
@@ -742,8 +742,8 @@ The nearest point has an index 75 in the original data. Let's check a record at
 ```dart
 import 'package:ml_dataframe/ml_dataframe.dart';
 
-void main() async {
-  final originalData = await loadIrisDataset();
+void main() {
+  final originalData = getIrisDataFrame();
 
   print(originalData.rows.elementAt(75));
 }
@@ -784,8 +784,8 @@ import 'dart:io';
 import 'package:ml_algo/ml_algo.dart';
 import 'package:ml_dataframe/ml_dataframe.dart';
 
-void main() async {
-  final originalData = await loadIrisDataset();
+void main() {
+  final originalData = getIrisDataFrame();
   final data = originalData.dropSeries(names: ['Id', 'Species']);
   final tree = KDTree(data);
 

e2e/decision_tree_classifier/decision_tree_classifier_test.dart

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ import 'package:ml_linalg/vector.dart';
 import 'package:ml_preprocessing/ml_preprocessing.dart';
 import 'package:test/test.dart';
 
-Future<Vector> evaluateClassifier(MetricType metric, DType dtype) async {
-  final samples = (await loadIrisDataset()).shuffle().dropSeries(names: ['Id']);
+Future<Vector> evaluateClassifier(MetricType metric, DType dtype) {
+  final samples = getIrisDataFrame().shuffle().dropSeries(names: ['Id']);
   final pipeline = Pipeline(samples, [
     toIntegerLabels(
       columnNames: ['Species'],

e2e/kd_tree/kd_tree_test.dart

Lines changed: 4 additions & 6 deletions
@@ -6,9 +6,8 @@ import 'package:test/test.dart';
 
 void main() async {
   group('KDTree', () {
-    test('should return correct list of neighbours, dtype=DType.float32',
-        () async {
-      final originalData = await loadIrisDataset();
+    test('should return correct list of neighbours, dtype=DType.float32', () {
+      final originalData = getIrisDataFrame();
       final data = originalData.dropSeries(names: ['Id', 'Species']);
       final tree = KDTree(data);
       final neighbours = tree.query(Vector.fromList([6.5, 3.01, 4.5, 1.5]), 5);
@@ -18,9 +17,8 @@ void main() async {
               '((Index: 75, Distance: 0.17349341930302867), (Index: 51, Distance: 0.21470911402365767), (Index: 65, Distance: 0.26095956499211426), (Index: 86, Distance: 0.29681616124778537), (Index: 56, Distance: 0.4172527193942372))');
     });
 
-    test('should return correct list of neighbours, dtype=DType.float64',
-        () async {
-      final originalData = await loadIrisDataset();
+    test('should return correct list of neighbours, dtype=DType.float64', () {
+      final originalData = getIrisDataFrame();
       final data = originalData.dropSeries(names: ['Id', 'Species']);
       final tree = KDTree(data, dtype: DType.float64);
       final neighbours = tree.query(

e2e/knn_classifier/knn_classifier_test.dart

Lines changed: 2 additions & 2 deletions
@@ -5,8 +5,8 @@ import 'package:ml_linalg/vector.dart';
 import 'package:ml_preprocessing/ml_preprocessing.dart';
 import 'package:test/test.dart';
 
-Future<Vector> evaluateKnnClassifier(MetricType metric, DType dtype) async {
-  final samples = (await loadIrisDataset()).shuffle().dropSeries(names: ['Id']);
+Future<Vector> evaluateKnnClassifier(MetricType metric, DType dtype) {
+  final samples = getIrisDataFrame().shuffle().dropSeries(names: ['Id']);
   final targetName = 'Species';
   final pipeline = Pipeline(samples, [
     toIntegerLabels(

e2e/logistic_regressor/logistic_regressor_test.dart

Lines changed: 2 additions & 2 deletions
@@ -4,8 +4,8 @@ import 'package:ml_linalg/dtype.dart';
 import 'package:ml_linalg/vector.dart';
 import 'package:test/test.dart';
 
-Future<Vector> evaluateLogisticRegressor(MetricType metric, DType dtype) async {
-  final samples = (await loadPimaIndiansDiabetesDataset()).shuffle();
+Future<Vector> evaluateLogisticRegressor(MetricType metric, DType dtype) {
+  final samples = getPimaIndiansDiabetesDataFrame().shuffle();
   final numberOfFolds = 5;
   final targetNames = ['Outcome'];
   final validator = CrossValidator.kFold(

e2e/softmax_regressor/softmax_regressor_test.dart

Lines changed: 2 additions & 3 deletions
@@ -5,9 +5,8 @@ import 'package:ml_linalg/vector.dart';
 import 'package:ml_preprocessing/ml_preprocessing.dart';
 import 'package:test/test.dart';
 
-Future<Vector> evaluateSoftmaxRegressor(
-    MetricType metricType, DType dtype) async {
-  final samples = (await loadIrisDataset()).shuffle().dropSeries(names: ['Id']);
+Future<Vector> evaluateSoftmaxRegressor(MetricType metricType, DType dtype) {
+  final samples = getIrisDataFrame().shuffle().dropSeries(names: ['Id']);
   final pipeline = Pipeline(samples, [
     toOneHotLabels(
       columnNames: ['Species'],

pubspec.yaml

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 name: ml_algo
 description: Machine learning algorithms, Machine learning models performance evaluation functionality
-version: 16.11.3
+version: 16.11.4
 homepage: https://github.com/gyrdym/ml_algo
 
 environment:
@@ -10,7 +10,7 @@ dependencies:
   collection: ^1.16.0
   injector: ^2.0.0
   json_annotation: ^4.0.0
-  ml_dataframe: ^1.4.2
+  ml_dataframe: ^1.5.0
   ml_linalg: ^13.7.0
   ml_preprocessing: ^7.0.2
   quiver: ^3.0.0

0 commit comments
