Skip to content

Remove Orange implementation of randomized PCA#6815

Merged
markotoplak merged 5 commits intobiolab:masterfrom
pavlin-policar:remove-randomized-pca
May 31, 2024
Merged

Remove Orange implementation of randomized PCA#6815
markotoplak merged 5 commits intobiolab:masterfrom
pavlin-policar:remove-randomized-pca

Conversation

@pavlin-policar
Copy link
Collaborator

This PR basically reverts #3532.

At the time this was implemented, scikit-learn did not support PCA on sparse matrices without first densifying them. It has since added this support.

Issue

PCA failing on scikit-learn 1.5 due to PCA internal API changes.

Description of changes

The reason for our own custom PCA implementation (proper sparse support) is now invalid, since scikit-learn properly handles sprase data now. Therefore, I just removed our implementation of PCA and we now rely purely on scikit-learn. Our implentation was always meant to be temporary.

Includes
  • Code changes
  • Tests
  • Documentation

scikit-learn now supports PCA on sparse matrices without densifying them first. This PR basically reverts biolab#3532. At the time this was implemented, scikit-learn did not support this yet.
@pavlin-policar
Copy link
Collaborator Author

@markotoplak Was this the only thing blocking scikit-learn=1.5.0? If so, should I also change the dependency?

@pavlin-policar
Copy link
Collaborator Author

Better sparse support was added to scikit-learn in 1.4.0 (https://scikit-learn.org/dev/auto_examples/release_highlights/plot_release_highlights_1_4_0.html#improved-memory-and-runtime-efficiency-for-pca-on-sparse-data), hence the new minimum version bump.

@pavlin-policar
Copy link
Collaborator Author

I've sifted through some of the failing tests.

On macos, we now have the following curious error. scikit-learn now supports sparse data, but uses the arpack solver (here), which requires that the number of components is less than the number of rows/columns in the data. This is now a problem, because in Orange, we allow the user to request as many principal components as there are columns, hence the failure.

Some windows and ubuntu failures seem unrelated to the changes here.

@pavlin-policar
Copy link
Collaborator Author

It seems to me that most of the tests are now failing for different reasons.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants