
Issue when training on a FileLessDataset #171

@clement-alloin-afk

Description


When I try to train a model on a converted (fileless) dataset, the `model` command reports that no feature is selectable:

┌──[user@packing-box]──[/mnt/share]──[main|+2…12663]────────                 ────[172.17.0.2]──[16:14:19]────

$ model -v train test-converted -A dt
00:00:01.672 [DEBUG   ] model              creating Model(None)...
00:00:01.673 [INFO    ] model              Selected algorithm: Decision Tree
00:00:01.673 [DEBUG   ] model              Preparing dataset...
00:00:01.674 [INFO    ] model              Reference dataset:  test-converted
00:00:01.675 [INFO    ] model              Loading features...
00:00:01.764 [DEBUG   ] matplotlib         matplotlib data path: /home/user/.local/lib/python3.12/site-packages/matplotlib/mpl-data
00:00:01.781 [DEBUG   ] matplotlib         CONFIGDIR=/home/user/.config/matplotlib
00:00:01.784 [DEBUG   ] matplotlib         interactive is False
00:00:01.784 [DEBUG   ] matplotlib         platform is linux
00:00:01.857 [DEBUG   ] matplotlib         CACHEDIR=/home/user/.cache/matplotlib

00:00:02.116 [WARNING ] model              No selectable feature ; this may be due to a model unrelated to the input

A model was successfully trained on the original (non-converted) dataset, and the conversion itself completed without errors:

┌──[user@packing-box]──[/mnt/share]──[main|+2…12663]───                          ────[172.17.0.2]──[16:09:13]────

$ model train exp3-0 -A dt
00:00:01.149 [INFO] Selected algorithm: Decision Tree
00:00:01.150 [INFO] Reference dataset:  exp3-0
00:00:01.150 [INFO] Computing features...
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 674/674 samples • 0:00:44 • 0:00:00
00:00:47.245 [INFO] Making pipeline...
00:00:47.296 [INFO] Training model...
00:00:52.729 [INFO] [1/2] MinMaxScaler
00:00:52.747 [INFO] [2/2] Decision Tree

Name: exp3-0_msdos-pe32-pe64_674_dt_f137
Classification metrics                                                    
                                                                          
    .     Accuracy   Precision   Recall    F-Measure     MCC       AUC    
 ──────────────────────────────────────────────────────────────────────── 
  Train   100.00%    100.00%     100.00%   100.00%     100.00%   100.00%  
  Test    100.00%    100.00%     100.00%   100.00%     100.00%   100.00%  
                                                                          
00:00:52.915 [INFO] Parameters:
- class_weight = None
- criterion = entropy
- max_features = None
- max_leaf_nodes = None
- splitter = best
- random_state = 42
- max_depth = 2

$ dataset convert exp3-0 -n test-converted
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 674/674 samples • 0:00:07 • 0:00:00 • test-converted
00:00:12.439 [INFO] Converting to fileless dataset...
00:00:12.612 [INFO] Size of dataset:     343MB
00:00:12.621 [INFO] Computing features...
100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 674/674 samples • 0:02:04 • 0:00:00 • test-converted
00:02:17.507 [INFO] Size of new dataset: 697KB (compression factor: 503)
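The size figures printed by the conversion can be cross-checked against each other. The sketch below assumes binary units (1 MB = 1024 KB) and a truncated ratio. Both are assumptions, because the raw byte counts are not shown. Under them, the displayed sizes reproduce the reported factor of 503. With decimal units (1 MB = 1000 KB), the same figures would give about 492. The helper `compression_factor` is hypothetical and is not part of packing-box.

```python
def compression_factor(original_mb: float, new_kb: float) -> int:
    """Hypothetical helper: ratio of original to converted size.

    Assumes binary units (1 MB = 1024 KB) and truncates the result,
    which is a guess at how the tool rounds the displayed factor.
    """
    return int(original_mb * 1024 / new_kb)


# Sizes as displayed in the conversion log above (rounded by the tool).
print(compression_factor(343, 697))
```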

Labels: failure (Issue found in production while not necessarily being a mistake)