We have two options here:

- Download the dataset from [Pima Indians Diabetes Database](https://www.kaggle.com/uciml/pima-indians-diabetes-database).

- Or simply use the [loadPimaIndiansDiabetesDataset](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/loadPimaIndiansDiabetesDataset.html) function
from the [ml_dataframe](https://pub.dev/packages/ml_dataframe) package. The function returns a ready-to-use [DataFrame](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/DataFrame-class.html) instance
filled with the `Pima Indians Diabetes Database` data.

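
The second option can be sketched as follows. This is a minimal example, assuming that `ml_dataframe` is listed in your pubspec.yaml; the `await` keeps the snippet valid whether `loadPimaIndiansDiabetesDataset` returns a `Future<DataFrame>` or a plain `DataFrame`.

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() async {
  // Awaiting also works if the function turns out to return a plain DataFrame.
  final samples = await loadPimaIndiansDiabetesDataset();

  // Column names of the loaded data: 8 feature columns plus the label column.
  print(samples.header);

  // Number of records (rows) in the dataset.
  print(samples.rows.length);
}
```

With this option there is no CSV file to download or bundle, so the file-related steps can be skipped.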

If we choose the first option, we should do the following:

#### For a desktop application:

... in your pubspec.yaml:

````
dependencies:
  ...
  ml_algo: ^16.11.2
  ml_dataframe: ^1.4.2
  ...
````

The data in this file consists of 768 records with 8 features each. The 9th column is a label column: it contains either 0 or 1
on each row. This column is our target: we should predict a class label for each observation. The column's name is
`Outcome`. Let's store it:

````dart
final targetColumnName = 'Outcome';
````

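
Since the label column holds only two values, it may be worth checking how balanced the classes are before splitting the data. A minimal sketch, assuming that indexing a `DataFrame` by column name returns a `Series` whose values are available via `data`, and that the loaded data uses the `Outcome` column name. `loadPimaIndiansDiabetesDataset` is used only to keep the snippet self-contained; a `DataFrame` read from the CSV file should work the same way.

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() async {
  final targetColumnName = 'Outcome';
  // Awaiting also works if the function turns out to return a plain DataFrame.
  final samples = await loadPimaIndiansDiabetesDataset();

  // Count the rows per label value. Keys are strings, so integer,
  // double and string cell values are all handled the same way.
  final counts = <String, int>{};
  for (final label in samples[targetColumnName].data) {
    counts.update(label.toString(), (count) => count + 1, ifAbsent: () => 1);
  }

  // One entry per class; the counts add up to the number of records.
  print(counts);
}
```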

Now it's time to prepare data splits. Since we have a smallish dataset (only 768 records), we can't afford to

final rawCsvContent = await rootBundle.loadString('assets/datasets/housing.csv');
final samples = DataFrame.fromRawCsv(rawCsvContent, fieldDelimiter: ' ')
    .shuffle();
final targetName = 'col_13';

Let's try to classify data from the well-known [Iris](https://www.kaggle.com/datasets/uciml/iris) dataset using a non-linear algorithm: [decision trees](https://en.wikipedia.org/wiki/Decision_tree).

First, you need to download the data and place it somewhere in your file system. To do so, follow the
instructions given in the [Logistic regression](#logistic-regression) section. Alternatively, you may use the [loadIrisDataset](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/loadIrisDataset.html)
function, which returns a ready-to-use [DataFrame](https://pub.dev/documentation/ml_dataframe/latest/ml_dataframe/DataFrame-class.html) instance filled with the `Iris` dataset.
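
Loading the data with `loadIrisDataset` can be sketched like this. It is a minimal example, assuming `ml_dataframe` is in your pubspec.yaml and that the returned `DataFrame` keeps the column names of the Kaggle CSV, including `Id` and `Species`.

```dart
import 'package:ml_dataframe/ml_dataframe.dart';

void main() async {
  // Awaiting also works if the function turns out to return a plain DataFrame.
  final samples = await loadIrisDataset();

  // Column names, to see which columns need preprocessing.
  print(samples.header);

  // Distinct values of the label column (assumed to be named 'Species').
  print(samples['Species'].data.toSet());
}
```

Printing the header and the distinct labels first makes it easy to confirm which column to drop and which one to encode.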

After loading the data, we need to preprocess it. We should drop the `Id` column, since it only enumerates the rows and carries no information useful for classification.
Also, we need to encode the 'Species' column - originally, it contains 3 repeated string labels, to feed it to the classifier