
Conversation

@mfalkiewicz

Refactoring the kernel normalization and the transition matrix computation as independent functions will allow for more code re-use.

- metric=self.metric)
- return self.affinity_matrix_
+ self.affinity_matrix_ = self.affinity(X)
+ return self.affinity_matrix_
Collaborator

The function description should be modified if this function is not going to return the affinity matrix.

Author

I re-added the return of affinity_matrix_, but not as a field on self. The reason is that in the code I am currently writing I want to be able to bind the results of several affinity calculations to different fields in the object.
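
A minimal sketch of the pattern being described (illustrative names and a stand-in kernel, not the package's actual API): because affinity() returns the matrix rather than always assigning self.affinity_matrix_, the caller decides where each result lives.

    import numpy as np

    class Embedding:
        def affinity(self, X):
            # Stand-in kernel: a simple Gaussian affinity on squared distances.
            D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
            return np.exp(-D2 / np.median(D2))

    emb = Embedding()
    X_a, X_b = np.random.rand(5, 2), np.random.rand(6, 2)
    emb.affinity_a_ = emb.affinity(X_a)  # bind each result to its own field
    emb.affinity_b_ = emb.affinity(X_b)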

raise ImportError('Checks require scikit-learn, but not found')

ndim = L.shape[0]
if overwrite:
Collaborator

overwrite is no longer supported in the rewrite. I think it should still be available as a memory-saving option.

Collaborator

@mfalkiewicz - this is not supported in the refactor

- k = int(max(2, np.round(D.shape[0] * 0.01)))
- eps = 2 * np.median(np.sort(D, axis=0)[k+1, :])**2
+ k = int(max(2, np.round(D.shape[0] * f_neighbors)))
+ eps = np.median(np.sort(D, axis=0)[0:k+1, :]**2)
Collaborator

If I remember this correctly, eps is based on the (k+1)-th entry rather than everything up to that point. It basically provides a scaling factor given the sorted distances.

Author

The approach currently implemented is more liberal than the one I propose, and both of them are correct. All manifold learning approaches are based on local similarities, and these should be emphasized; I think the median of all the values up to the k-th neighbor captures this property more accurately. Besides, I think that parametrizing the kernel in terms of average distances to a certain fraction of nearest neighbors is more natural, because it doesn't depend on the scale of the affinities one uses.
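
For concreteness, a self-contained sketch (toy data; variable names other than D and f_neighbors are illustrative, not the package's code) contrasting the two bandwidth choices under discussion:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 3))
    # Pairwise Euclidean distance matrix D.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    sorted_D = np.sort(D, axis=0)

    # Current approach: scale from the single (k+1)-th sorted distance.
    k = int(max(2, np.round(D.shape[0] * 0.01)))
    eps_single = 2 * np.median(sorted_D[k + 1, :]) ** 2

    # Proposed approach: median of squared distances up to the k-th
    # neighbor, with k set by a fraction f_neighbors of the sample size.
    f_neighbors = 0.05
    k = int(max(2, np.round(D.shape[0] * f_neighbors)))
    eps_median = np.median(sorted_D[0 : k + 1, :] ** 2)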

- L_alpha = d_alpha[:, np.newaxis] * L_alpha
-
- M = L_alpha
+ M = _compute_markov_matrix(L, use_sparse=use_sparse, alpha=alpha)
Collaborator

M = _compute_markov_matrix(L if overwrite else L.copy(), use_sparse=use_sparse, alpha=alpha)

Author

Regarding this comment and overwrite: I agree that it would be nice to have a memory-saving option, but I am not sure this refactor allows for it. I don't know the details of Python's scoping and argument-passing rules, so maybe you could enlighten me. _compute_markov_matrix creates a local binding A. If we pass an alias (L) as the argument rather than a fresh object (L.copy()), is A also going to be an alias? In other words, if we pass an alias to a variable in the lexical scope of compute_diffusion_map as an argument to _compute_markov_matrix, is _compute_markov_matrix going to mutate the object the alias refers to, or create its own local copy and mutate that instead?

Sorry for the slightly confusing description, but I don't know how to phrase this more simply.
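
For reference, a minimal sketch (illustrative, not the package's code) of the semantics the question turns on: Python passes object references, so the parameter A is a new name bound to the same array, and in-place mutation is visible to the caller unless a copy is passed.

    import numpy as np

    def mutate(A):
        A *= 2        # in-place operation: modifies the array A is bound to
        return A

    L = np.ones((2, 2))
    mutate(L)         # pass the alias: the caller's L is doubled
    print(L[0, 0])    # 2.0

    L = np.ones((2, 2))
    mutate(L.copy())  # pass a copy: the caller's L is untouched
    print(L[0, 0])    # 1.0

Note the distinction: an in-place operation like A *= 2 mutates the shared object, whereas rebinding (A = A * 2) creates a new array and leaves the caller's L unchanged.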

@satra (Collaborator) commented Jan 2, 2018

@mfalkiewicz - should #13 still be a PR?

@mfalkiewicz (Author)

@satra: no, I merged them both together and deleted #13.

@satra (Collaborator) commented Feb 1, 2018

Also, the tests are not passing, since the Markov computation has changed.

I would recommend creating a second Markov option with your changes, unless you think the current implementation is an error. This will help keep the API consistent, and you can create a new set of embedding tests with the alternate calculation.
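
One possible shape for that suggestion (a hypothetical signature and stand-in normalizations, not the package's actual API): keep the existing computation as the default and expose the alternate one behind an explicit option.

    import numpy as np

    def _compute_markov_matrix(L, alpha=0.5, variant='original'):
        # Hypothetical dispatch illustrating the "second option" idea;
        # both branches are simplified stand-ins, not the real computations.
        d = L.sum(axis=1)
        if variant == 'original':
            d_alpha = d ** -alpha              # current normalization
        elif variant == 'alternate':
            d_alpha = (d / d.size) ** -alpha   # alternate normalization (stand-in)
        else:
            raise ValueError('unknown variant: %r' % variant)
        A = d_alpha[:, np.newaxis] * L * d_alpha[np.newaxis, :]
        return A / A.sum(axis=1)[:, np.newaxis]  # row-normalize to a Markov matrix

    W = np.ones((4, 4))
    M = _compute_markov_matrix(W)                        # default behavior
    M2 = _compute_markov_matrix(W, variant='alternate')  # opt-in alternative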
