[ENH] Add parameters and similarity measures to tSNE#2510
[ENH] Add parameters and similarity measures to tSNE#2510lanzagar merged 2 commits intobiolab:masterfrom
Conversation
fbecd39 to
9b2cedb
Compare
9b2cedb to
12f8c72
Compare
|
@lanzagar I've added the missing parameters to tSNE. This can probably be merged easily. I decided to leave out the Orange metrics because that would require many more changes e.g. setting precomputed parameters, computing PCA manually on the widget, all of which is doable, but should go into a separate PR. |
Codecov Report
@@ Coverage Diff @@
## master #2510 +/- ##
=========================================
+ Coverage 82.1% 82.1% +<.01%
=========================================
Files 328 328
Lines 56210 56220 +10
=========================================
+ Hits 46151 46161 +10
Misses 10059 10059 |
| n_iter = Setting(1000) | ||
|
|
||
| init_index = Setting(0) | ||
| init_values = [("random", "Random"), ("pca", "PCA")] |
There was a problem hiding this comment.
Can we leave the default init to be pca.
I would also switch their positions to have pca as the first (default) option and then random.
This would then also make it consistent with mds (pca = first & default)
There was a problem hiding this comment.
I know this goes against the sklearn defaults, but we already had pca as the only option before and I think it makes more sense for users (random can show a completely different figure each time). The consistency with mds is an added bonus.
There was a problem hiding this comment.
As far as I can tell, both the MDS widget and manifold MDS control pane have the option to initialize randomly. I have, however, made PCA the default value, since that is what MDS does in both cases. Is there a third MDS widget that I missed?
There was a problem hiding this comment.
This is fine now. That is what I meant - just change the order of pca and random, so that pca is default (but we still have both options, like in mds)
| self.lr_spin = self._create_spin_parameter( | ||
| "learning_rate", 1, 1000, "Learning rate:") | ||
| self.n_iter_spin = self._create_spin_parameter( | ||
| "n_iter", 0, 1e5, "Max iterations:") |
There was a problem hiding this comment.
Max iterations spin allows values as low as 0. But running it with <250 results in an error. Change min value of spin control to 250.
| "metric", "Metric:") | ||
| self.parameters["init"] = "pca" | ||
| self.perplexity_spin = self._create_spin_parameter( | ||
| "perplexity", 0, 1000, "Perplexity:") |
There was a problem hiding this comment.
I am not sure about this one, but sklearn doc says to consider values 5-50. I think spin limits of 1-100 should be fine?
| self.perplexity_spin = self._create_spin_parameter( | ||
| "perplexity", 0, 1000, "Perplexity:") | ||
| self.early_exaggeration_spin = self._create_spin_parameter( | ||
| "early_exaggeration", 1, 1000, "Early exaggeration:") |
There was a problem hiding this comment.
Not sure about this one either, but 1000 sounds really high. Probably 1-100 should be fine?
| perplexity = Setting(30) | ||
| early_exaggeration = Setting(4) | ||
| learning_rate = Setting(1000) | ||
| n_iter = Setting(1000) |
There was a problem hiding this comment.
I suggest we have the same defaults as the latest sklearn version (they have been changing them recently)
perplexity=30, e.e.=12, lr=200, iter=1000
There was a problem hiding this comment.
I didn't realize they had changed them. Fixed now.
12f8c72 to
554d55d
Compare
Issue
The Manifold Learning widget did provide tSNE, yet lacked all the important parameters with which to tune the embedding. Most notably, I find
learning_rateto be the most important parameter and has a huge effect on the result.Description of changes
Add relevant parameters (perplexity, early exaggeration, learning rate, max iterations and initialization).
Includes