Description
JASP Version
0.19.1
Commit ID
84d54b934fa27731bb9eec44a4aa5f7ab0744dfd
JASP Module
Machine Learning
What analysis are you seeing the problem on?
Machine Learning > Prediction
What OS are you seeing the problem on?
Windows 11
Bug Description
This bug report is probably related to bug report #2978, which I submitted 3 weeks ago and which is now closed.
I am using JASP 0.19.2.0, which is a nightly build.
I open a training dataset (CSV file) and train a Random forest model. Then I save the trained model. Then I open a test dataset that has exactly the same format as the training dataset (I know this for sure because both were part of the same worksheet, which I split into a training and a test dataset). Then I load the trained model and try to get predictions for the test dataset, but the Prediction table does not load. The error says that a predictor in the test dataset has a different type from the one it had in the training dataset. But this is simply not true.
Expected Behaviour
The Prediction table should load.
Steps to Reproduce
- Open JASP and load the IndivLoansTrainingSample.csv file (I cannot attach it because it is confidential).
- Review the loaded data file. JASP automatically assigns a type to each variable. It assigns the Ordinal type to some variables, but it seems that Ordinal is not accepted by the Random forest model, is it? I think this is a bug.
- Optional: Change the type of some variables, for example from Ordinal to Nominal or from Nominal to Scale. If JASP erroneously treats a Scale variable as Nominal, it will have a huge effect on the Random forest model, will it not?
- Open the Machine Learning module and train a Random forest model. (I am attaching the trained model.)
- Open another instance of JASP and load the IndivLoansTestSample.csv file (I cannot attach it because it is confidential).
- Review the loaded data. Make sure that all variables in the test dataset have exactly the same type as in the training dataset. Change data types if needed.
- Open the Machine Learning>Prediction>Prediction module.
- Load the trained model.
- Build the prediction table by picking the right predictors from the trained model. Once you have added all of the required predictors (not all variables in the dataset were used to train the model), you will get a message that a variable has a different type: stop('Type of predictors in new data do not match that of the training data.')
- The bug is still there even if I do not alter the type of any variable in the training and test sets. Even if I pick only a few of the variables that JASP recognized correctly, I still get the same error. This is absurd, because the predictors' types were automatically recognized by JASP and were the same in both files. I checked it many times!
- I have a suggestion: in the Machine Learning>Prediction>Prediction module, when I load the trained model, the prediction table tells me which predictors it expects me to load. Why don't you also show the variable type that JASP expects for each predictor? For example, Length (Scale), TypeOfBondage (Nominal), etc. This would save a lot of frustration!
- Also, if Ordinal variables are not accepted by the Random forest algorithm, why don't you prevent their use?
...
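The type comparison described in the steps above can also be done outside JASP in plain R. The following is only a minimal sketch: the helper names `describe_columns` and `compare_columns` and the toy data frames are hypothetical and are not part of JASP or of the confidential files. It lists, for each shared column, the R class and the number of factor levels in both data frames and returns the columns where they differ.

```r
# Hypothetical helper: one row per column with its R class and, for
# factors, the number of levels (NA for non-factor columns).
describe_columns <- function(df) {
  data.frame(
    column = names(df),
    class = vapply(df, function(col) class(col)[1], character(1)),
    nlevels = vapply(df, function(col) {
      if (is.factor(col)) nlevels(col) else NA_integer_
    }, integer(1)),
    stringsAsFactors = FALSE,
    row.names = NULL
  )
}

# Hypothetical helper: return the shared columns whose class or number
# of factor levels differs between the two data frames.
compare_columns <- function(train, test) {
  merged <- merge(describe_columns(train), describe_columns(test),
                  by = "column", suffixes = c("_train", "_test"))
  same_class <- merged$class_train == merged$class_test
  both_na <- is.na(merged$nlevels_train) & is.na(merged$nlevels_test)
  both_set <- !is.na(merged$nlevels_train) & !is.na(merged$nlevels_test)
  same_levels <- both_na | (both_set & merged$nlevels_train == merged$nlevels_test)
  merged[!(same_class & same_levels), ]
}

# Toy data (made up for illustration): same column types, but the test
# set happens to contain only two of the three categories.
train <- data.frame(Length = c(1.5, 2.0, 3.2), Type = factor(c("a", "b", "c")))
test <- data.frame(Length = c(1.1, 2.7), Type = factor(c("a", "b")))

mismatches <- compare_columns(train, test)
print(mismatches)
```

In this toy case both columns have the same class in the two data frames, yet `Type` is still reported because its number of levels differs. Whether something similar happens with the real files is not known from this report.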
Log (if any)
-------- Application Info --------
JASP Version: JASP 0.19.2
Build Branch: HEAD
Build Date: Nov 26 2024 18:09:03 (Netherlands)
Last Commit: 84d54b934fa27731bb9eec44a4aa5f7ab0744dfd
-------- Basic Info --------
Operating System: Windows 11 Version 23H2
Product Version: 11
Kernel Type: winnt
Kernel Version: 10.0.22631
Architecture: x86_64
Install Path: D:/Program Files/JASP
Platfotm Name: windows
System Local: bg_BG
-------- Extra Info --------
Current code page
Active code page: 437
Active code page: 65001
Host Name: SHOSHOCI
OS Name: Microsoft Windows 11 Pro
OS Version: 10.0.22631 N/A Build 22631
OS Manufacturer: Microsoft Corporation
OS Configuration: Standalone Workstation
OS Build Type: Multiprocessor Free
Registered Owner: 359898893538
Registered Organization:
Product ID: 00330-52813-47920-AAOEM
Original Install Date: 31.1.2023 г., 12:22:04
System Boot Time: 27.11.2024 г., 9:44:36
System Manufacturer: LENOVO
System Model: 82LM
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: AMD64 Family 23 Model 104 Stepping 1 AuthenticAMD ~2100 Mhz
BIOS Version: LENOVO G5CN64WW(V2.10), 6.10.2022 г.
Windows Directory: C:\Windows
System Directory: C:\Windows\system32
Boot Device: \Device\HarddiskVolume1
System Locale: en-us;English (United States)
Input Locale: en-us;English (United States)
Time Zone: (UTC+02:00) Helsinki, Kyiv, Riga, Sofia, Tallinn, Vilnius
Total Physical Memory: 15 706 MB
Available Physical Memory: 8 855 MB
Virtual Memory: Max Size: 16 730 MB
Virtual Memory: Available: 7 726 MB
Virtual Memory: In Use: 9 004 MB
Page File Location(s): C:\pagefile.sys
Domain: WORKGROUP
Logon Server: \SHOSHOCI
Hotfix(s): 5 Hotfix(s) Installed.
[01]: KB5045935
[02]: KB5012170
[03]: KB5027397
[04]: KB5046633
[05]: KB5044620
Network Card(s): 2 NIC(s) Installed.
[01]: Realtek 8822CE Wireless LAN 802.11ac PCI-E NIC
Connection Name: Wi-Fi
Status: Media disconnected
[02]: Realtek USB GbE Family Controller
Connection Name: Ethernet
DHCP Enabled: Yes
DHCP Server: 192.168.1.1
IP address(es)
[01]: 192.168.1.14
[02]: fe80::f73:9fc3:5374:b2df
[03]: fda9:de81:d862:0:bdaa:acda:e64e:528a
[04]: fda9:de81:d862:0:d3ed:28cb:8c0e:2133
Hyper-V Requirements: A hypervisor has been detected. Features required for Hyper-V will not be displayed.
JASP 2024-11-27 14_21_05 Desktop.log
JASP 2024-11-27 14_21_05 Engine 1.log
More Debug Information
This is the error message which I get when the Prediction table fails to load:
This analysis terminated unexpectedly.
Error in randomForest:::predict.randomForest(model, newdata = dataset): Type of predictors in new data do not match that of the training data.
Stack trace
analysis(jaspResults = jaspResults, dataset = dataset, options = options)
.mlPredictionsTable(model, dataset, options, jaspResults, ready, position = 2)
.mlPredictionsState(model, dataset, options, jaspResults, ready)
createJaspState(.mlPredictionGetPredictions(model, dataset))
jaspStateR$new(object = object, dependencies = dependencies)
initialize(...)
.mlPredictionGetPredictions(model, dataset)
.mlPredictionGetPredictions.randomForest(model, dataset)
randomForest:::predict.randomForest(model, newdata = dataset)
stop('Type of predictors in new data do not match that of the training data.')
To receive assistance with this problem, please report the message above at: https://jasp-stats.org/bug-reports
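One possible cause, which is an assumption and is not confirmed anywhere in this report, is that a Nominal predictor has fewer (or different) categories in the test file than in the training file, so the resulting R factors differ even though the variable type shown in JASP is the same. The sketch below shows, in plain R, how test factors could be rebuilt on the training levels before prediction. The function name `align_factor_levels` and the toy data are hypothetical, and this is not how JASP currently handles the data.

```r
# Hypothetical helper: rebuild every factor column in `test` using the
# levels of the matching column in `train`. Values that were not seen in
# training become NA.
align_factor_levels <- function(train, test) {
  for (name in intersect(names(train), names(test))) {
    if (is.factor(train[[name]])) {
      test[[name]] <- factor(as.character(test[[name]]),
                             levels = levels(train[[name]]),
                             ordered = is.ordered(train[[name]]))
    }
  }
  test
}

# Toy data (made up for illustration).
train <- data.frame(Length = c(1.5, 2.0, 3.2), Type = factor(c("a", "b", "c")))
test <- data.frame(Length = c(1.1, 2.7), Type = factor(c("a", "b")))

aligned <- align_factor_levels(train, test)
nlevels(test$Type)     # 2
nlevels(aligned$Type)  # 3
```

After alignment the test factor carries the same level set as the training factor, which is the property a level-count check would compare.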
Final Checklist
- I have included a screenshot showcasing the issue, if possible.
- I have included a JASP file (zipped) or data file that causes the crash/bug, if applicable.
- I have accurately described the bug, and steps to reproduce it.