Issue when training on a FileLessDataset #171
Open
Labels
failure: Issue found in production while not necessarily being a mistake
Description
Training a model on a converted (fileless) dataset ends with a "No selectable feature" warning:
┌──[user@packing-box]──[/mnt/share]──[main|+2…12663]──────── ────[172.17.0.2]──[16:14:19]────
$ model -v train test-converted -A dt
00:00:01.672 [DEBUG ] model creating Model(None)...
00:00:01.673 [INFO ] model Selected algorithm: Decision Tree
00:00:01.673 [DEBUG ] model Preparing dataset...
00:00:01.674 [INFO ] model Reference dataset: test-converted
00:00:01.675 [INFO ] model Loading features...
00:00:01.764 [DEBUG ] matplotlib matplotlib data path: /home/user/.local/lib/python3.12/site-packages/matplotlib/mpl-data
00:00:01.781 [DEBUG ] matplotlib CONFIGDIR=/home/user/.config/matplotlib
00:00:01.784 [DEBUG ] matplotlib interactive is False
00:00:01.784 [DEBUG ] matplotlib platform is linux
00:00:01.857 [DEBUG ] matplotlib CACHEDIR=/home/user/.cache/matplotlib
00:00:02.116 [WARNING ] model No selectable feature ; this may be due to a model unrelated to the input
A model was successfully trained on the non-converted dataset, and the conversion itself reported no error:
┌──[user@packing-box]──[/mnt/share]──[main|+2…12663]─── ────[172.17.0.2]──[16:09:13]────
$ model train exp3-0 -A dt
00:00:01.149 [INFO] Selected algorithm: Decision Tree
00:00:01.150 [INFO] Reference dataset: exp3-0
00:00:01.150 [INFO] Computing features...
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 674/674 samples • 0:00:44 • 0:00:00
00:00:47.245 [INFO] Making pipeline...
00:00:47.296 [INFO] Training model...
00:00:52.729 [INFO] [1/2] MinMaxScaler
00:00:52.747 [INFO] [2/2] Decision Tree
Name: exp3-0_msdos-pe32-pe64_674_dt_f137
Classification metrics
. Accuracy Precision Recall F-Measure MCC AUC
────────────────────────────────────────────────────────────────────────
Train 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
Test 100.00% 100.00% 100.00% 100.00% 100.00% 100.00%
00:00:52.915 [INFO] Parameters:
- class_weight = None
- criterion = entropy
- max_features = None
- max_leaf_nodes = None
- splitter = best
- random_state = 42
- max_depth = 2
$ dataset convert exp3-0 -n test-converted
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 674/674 samples • 0:00:07 • 0:00:00 • test-converted
00:00:12.439 [INFO] Converting to fileless dataset...
00:00:12.612 [INFO] Size of dataset: 343MB
00:00:12.621 [INFO] Computing features...
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 674/674 samples • 0:02:04 • 0:00:00 • test-converted
00:02:17.507 [INFO] Size of new dataset: 697KB (compression factor: 503)
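As a quick sanity check on the last log line, the reported compression factor can be reproduced from the two rounded sizes. This is only a sketch: it assumes binary units (1 MB = 1024 KB) and truncation to an integer, neither of which is confirmed by the tool's output, and the function name is hypothetical.

```python
def compression_factor(old_size_mb: float, new_size_kb: float) -> int:
    """Return the ratio old/new, truncated to an integer.

    Assumption: sizes use binary units (1 MB = 1024 KB). The values in the
    log are rounded, so the result is only approximate.
    """
    old_size_kb = old_size_mb * 1024  # convert MB to KB (binary units assumed)
    return int(old_size_kb / new_size_kb)


if __name__ == "__main__":
    # Sizes taken from the conversion log: 343MB before, 697KB after.
    print(compression_factor(343, 697))
```

With these assumptions the result matches the logged factor of 503, which suggests the conversion itself produced a plausible (non-empty) feature set and that the problem lies in how features are loaded at training time.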