
Add option for numba acceleration. Improve tests #73

Merged
bjoaofelipe merged 13 commits into KrishnaswamyLab:master from MattScicluna:use_numba
Aug 31, 2025
Conversation

@MattScicluna
Contributor

No description provided.

@MattScicluna MattScicluna mentioned this pull request Aug 19, 2025
@bjoaofelipe bjoaofelipe marked this pull request as ready for review August 27, 2025 00:05
@bjoaofelipe
Contributor

Ran into two errors:

======================================================================
ERROR: test_random_landmarking.test_random_landmarking_with_non_euclidean_distances
Test random landmarking with non-euclidean distance metrics

Traceback (most recent call last):
File "/home/runner/work/graphtools/graphtools/test/test_random_landmarking.py", line 383, in test_random_landmarking_with_non_euclidean_distances
warnings.warn(
UserWarning: Random landmarking may not be respecting the distance parameter. All distance metrics produced identical cluster assignments.

======================================================================
FAIL: test_landmark.test_set_params

Traceback (most recent call last):
File "/home/runner/work/graphtools/graphtools/test/test_landmark.py", line 205, in test_set_params
assert G.get_params() == {
^^^^^^^^^^^^^^^^^^^
AssertionError


@bjoaofelipe
Contributor

The distance parameter is not being used; we need to start passing it downstream. Every algorithm currently falls back to the default Euclidean distance.
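
For illustration, a minimal sketch of the symptom, assuming the public graphtools.Graph API with its n_landmark, distance, and random_state parameters (the exact call and the clusters attribute are used here as an assumption-based sketch, not a confirmed reproduction):

    import numpy as np
    import graphtools

    X = np.random.default_rng(42).normal(size=(200, 20))

    # Two landmark graphs that differ only in the distance metric.
    G_euc = graphtools.Graph(X, n_landmark=10, distance="euclidean", random_state=42)
    G_cos = graphtools.Graph(X, n_landmark=10, distance="cosine", random_state=42)

    # If distance were passed downstream, the landmark cluster
    # assignments would generally differ between metrics.
    print(np.array_equal(G_euc.clusters, G_cos.clusters))  # True per the CI warning above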

@bjoaofelipe
Contributor

Found the error in:

    def build_landmark_op(self):
        """Build the landmark operator

        Calculates spectral clusters on the kernel, and calculates transition
        probabilities between cluster centers by using transition probabilities
        between samples assigned to each cluster.

        random_landmarking:
        This method randomly selects n_landmark points and assigns each sample
        to its nearest landmark using Euclidean distance.
        """
        if self.random_landmarking:
            with _logger.log_task("landmark operator"):
                is_sparse = sparse.issparse(self.kernel)
                n_samples = self.data.shape[0]
                rng = np.random.default_rng(self.random_state)
                landmark_indices = rng.choice(n_samples, self.n_landmark, replace=False)
                # uses the scaled data (data_nu) when available; to review
                data = self.data if not hasattr(self, "data_nu") else self.data_nu
                # sklearn's euclidean_distances is faster than cdist for large datasets
                if n_samples > 5000:
                    distances = euclidean_distances(data, data[landmark_indices])
                else:
                    distances = cdist(data, data[landmark_indices], metric="euclidean")
                self._clusters = np.argmin(distances, axis=1)

We are forcing Euclidean distances here, which is not the expected behavior.
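
One way the assignment step could respect the configured metric, sketched as a standalone helper (the assign_to_landmarks name is hypothetical; in the class this would read the metric from the graph's distance parameter rather than hard-coding "euclidean"):

    import numpy as np
    from scipy.spatial.distance import cdist

    def assign_to_landmarks(data, landmark_indices, metric="euclidean"):
        """Assign each sample to its nearest landmark under the given metric."""
        # cdist accepts any scipy metric name ("cosine", "cityblock", ...),
        # so threading the configured metric through removes the hard-coding.
        distances = cdist(data, data[landmark_indices], metric=metric)
        return np.argmin(distances, axis=1)

    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 5))
    landmarks = rng.choice(100, 10, replace=False)
    clusters = assign_to_landmarks(X, landmarks, metric="cosine")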

@bjoaofelipe bjoaofelipe reopened this Aug 27, 2025
@MattScicluna MattScicluna changed the title Add option for numba acceleration Add option for numba acceleration. Improve tests Aug 31, 2025
@MattScicluna
Contributor Author

MattScicluna commented Aug 31, 2025

Found the error on:

    def build_landmark_op(self):
        """Build the landmark operator

            Calculates spectral clusters on the kernel, and calculates transition
            probabilities between cluster centers by using transition probabilities
            between samples assigned to each cluster.
            
            random_landmarking:
            This method randomly selects n_landmark points and assigns each sample to its nearest landmark
            using Euclidean distance .

        """
        if self.random_landmarking :
            with _logger.log_task("landmark operator"):
                is_sparse = sparse.issparse(self.kernel)
                n_samples = self.data.shape[0]
                rng = np.random.default_rng(self.random_state)
                landmark_indices = rng.choice(n_samples, self.n_landmark, replace=False)
                data = self.data if not hasattr(self, 'data_nu') else self.data_nu # because of the scaling to review
                if n_samples > 5000:   # sklearn.euclidean_distances is faster than cdist for big dataset 
                    distances = euclidean_distances(data, data[landmark_indices])
                else:
                    distances = cdist(data, data[landmark_indices], metric="euclidean")
                self._clusters = np.argmin(distances, axis=1)

We are forcing euclidean distances here, which is not the expected behavior

I added a test to test_landmark in 04256c3 to confirm that this unexpected behaviour happens for both random landmarking and the default.
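
For reference, a sketch of the kind of check such a test can perform, assuming the same graphtools.Graph API as above (the warning text mirrors the CI log; the actual test in 04256c3 may differ in detail):

    import warnings
    import numpy as np
    import graphtools

    X = np.random.default_rng(42).normal(size=(200, 20))

    assignments = {}
    for metric in ("euclidean", "cosine", "cityblock"):
        G = graphtools.Graph(X, n_landmark=10, distance=metric, random_state=42)
        assignments[metric] = G.clusters

    # The actual test exercises both random landmarking and the default path.
    values = list(assignments.values())
    if all(np.array_equal(values[0], v) for v in values[1:]):
        warnings.warn(
            "Random landmarking may not be respecting the distance parameter. "
            "All distance metrics produced identical cluster assignments."
        )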

@bjoaofelipe
Contributor

I will create a new issue mentioning that distances are being ignored by the build_graph class.

@bjoaofelipe bjoaofelipe merged commit c59357b into KrishnaswamyLab:master Aug 31, 2025
1 of 5 checks passed