Remove Orange implementation of randomized PCA#6815
Remove Orange implementation of randomized PCA#6815markotoplak merged 5 commits intobiolab:masterfrom
Conversation
scikit-learn now supports PCA on sparse matrices without densifying them first. This PR basically reverts biolab#3532. At the time this was implemented, scikit-learn did not support this yet.
|
@markotoplak Was this the only thing blocking scikit-learn=1.5.0? If so, should I also change the dependency? |
|
Better sparse support was added to scikit-learn in 1.4.0 (https://scikit-learn.org/dev/auto_examples/release_highlights/plot_release_highlights_1_4_0.html#improved-memory-and-runtime-efficiency-for-pca-on-sparse-data), hence the new minimum version bump. |
|
I've sifted through some of the failing tests. On macos, we now have the following curious error. scikit-learn now supports sparse data, but uses the arpack solver (here), which requires that the number of components is less than the number of rows/columns in the data. This is now a problem, because in Orange, we allow the user to request as many principal components as there are columns, hence the failure. Some windows and ubuntu failures seem unrelated to the changes here. |
|
It seems to me that most of the tests are now failing for different reasons. |
This PR basically reverts #3532.
At the time this was implemented, scikit-learn did not support PCA on sparse matrices without first densifying them. It has since added this support.
Issue
PCA failing on scikit-learn 1.5 due to PCA internal API changes.
Description of changes
The reason for our own custom PCA implementation (proper sparse support) is now invalid, since scikit-learn properly handles sprase data now. Therefore, I just removed our implementation of PCA and we now rely purely on scikit-learn. Our implentation was always meant to be temporary.
Includes