This repository contains a comprehensive benchmark analysis comparing the performance of CPU-based scikit-learn against GPU-accelerated RAPIDS cuML for a K-Nearest Neighbors (KNN) task.
The analysis is broken into two parts, each in its own notebook:
- Analysis 1: Scaling by Data Size (N): Compares performance on a small dataset (~9.7k items) vs. a large dataset (~87k items) while holding K constant.
- Analysis 2: Scaling by Workload (K): Compares performance on both datasets while varying the number of neighbors (K = 10, 50, 100).
Notebook 1: Data Size (N) Scaling Benchmark
Notebook 2: Data Size (N) and Workload (K) Scaling Benchmark
This benchmark answers: "How much faster is the GPU as the dataset gets bigger?"
The GPU's advantage increases dramatically with data size. On the large dataset, the GPU was ~81x faster, completing a 7.6 billion comparison task in 1.92 seconds versus the CPU's 155 seconds (2.6 minutes).
| Dataset | Library | Time (s) | Speedup |
|---|---|---|---|
| Small (9,742 movies) | scikit-learn (CPU) | 1.8290 | - |
| Small (9,742 movies) | cuML (GPU) | 0.0371 | 49.3x |
| Latest (87,585 movies) | scikit-learn (CPU) | 155.5607 | - |
| Latest (87,585 movies) | cuML (GPU) | 1.9223 | 80.9x |

Note: This chart (from the first notebook) shows results from an earlier run. The table reflects the most recent data.
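The timing methodology can be sketched as follows. This is a minimal illustration on synthetic data, not the notebooks' exact code; the GPU run swaps in cuML's `NearestNeighbors` in place of scikit-learn's, assuming the installed cuML build supports the `jaccard` metric:

```python
import time

import numpy as np
from sklearn.neighbors import NearestNeighbors
# GPU variant (not run here): from cuml.neighbors import NearestNeighbors

# Synthetic one-hot genre matrix standing in for the MovieLens data
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1_000, 20)).astype(bool)

nn = NearestNeighbors(n_neighbors=10, algorithm="brute", metric="jaccard")
nn.fit(X)

start = time.perf_counter()
distances, indices = nn.kneighbors(X)  # each item's 10 nearest neighbors
print(f"Query time: {time.perf_counter() - start:.3f}s")
```

Because `kneighbors` is called on the fitted data itself, each item appears in its own neighbor list at distance 0.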
This benchmark answers: "How does performance change as we ask for more neighbors (K)?"
Changing K from 10 to 100 has a negligible impact on performance for both the CPU and GPU. This shows that for a brute-force search, the runtime is dominated by computing the full pairwise distance matrix, which does not depend on K; selecting the top-K neighbors afterwards is comparatively cheap.
| Dataset | K | cuML (GPU) Time (s) | scikit-learn (CPU) Time (s) | Speedup (x) |
|---|---|---|---|---|
| Latest (87,585 movies) | 10 | 1.92 | 155.56 | 80.92 |
| Latest (87,585 movies) | 50 | 1.98 | 154.66 | 78.07 |
| Latest (87,585 movies) | 100 | 2.04 | 154.93 | 76.13 |
| Small (9,742 movies) | 10 | 0.04 | 1.83 | 49.35 |
| Small (9,742 movies) | 50 | 0.04 | 1.83 | 48.45 |
| Small (9,742 movies) | 100 | 0.04 | 1.89 | 49.13 |
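A minimal sketch of the K-sweep on a synthetic genre matrix (the data size and genre count here are illustrative, not the notebooks' actual code):

```python
import time

import numpy as np
from sklearn.neighbors import NearestNeighbors

# Synthetic one-hot genre matrix; the notebooks use the real MovieLens genre data
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(2_000, 20)).astype(bool)

# Sweep K while holding the dataset fixed: the distance computation is the
# same for every K, so the measured times stay nearly constant
for k in (10, 50, 100):
    nn = NearestNeighbors(n_neighbors=k, algorithm="brute", metric="jaccard").fit(X)
    start = time.perf_counter()
    distances, indices = nn.kneighbors(X)
    print(f"K={k}: {time.perf_counter() - start:.2f}s")
```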
Both analyses use the same core methodology to compare the time to find the K-Nearest Neighbors for every single item in the dataset.
| Parameter | Detail |
|---|---|
| Task | Benchmark K-Nearest Neighbors (KNN) |
| Datasets | MovieLens Small (9,742 movies) and MovieLens Latest (87,585 movies) |
| Similarity Metric | Jaccard Similarity |
| Algorithm | brute-force (exhaustive search) |
| CPU Library | scikit-learn |
| GPU Library | RAPIDS cuML |
| GPU Hardware | NVIDIA T4 (via Google Colab) |
Both libraries use a "brute-force" algorithm, which means that for a dataset with $N$ items, the full $N \times N$ distance matrix is computed:

- MovieLens Small ($N \approx 9{,}700$): Total Comparisons $\approx N^2 \approx 9{,}700^2 \approx$ 95 million
- MovieLens Latest ($N \approx 87{,}000$): Total Comparisons $\approx N^2 \approx 87{,}000^2 \approx$ 7.6 billion
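As a quick arithmetic check, using the exact dataset sizes from the tables above:

```python
# Full N x N comparison counts for each dataset
counts = {name: n * n for name, n in [("Small", 9_742), ("Latest", 87_585)]}
for name, c in counts.items():
    print(f"{name}: {c:,} comparisons")  # ~95 million and ~7.7 billion
```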
The similarity metric used is Jaccard Similarity, which calculates the ratio of shared genres to the total unique genres between two movies:

$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$

Where $A$ and $B$ are the sets of genres of the two movies being compared.
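For two movies, this can be computed directly from their genre sets (the movies and genres below are illustrative, not taken from the dataset):

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: shared elements over total unique elements."""
    return len(a & b) / len(a | b)

# Illustrative genre sets for two movies
movie_a = {"Animation", "Children", "Comedy"}
movie_b = {"Animation", "Comedy", "Fantasy"}
print(jaccard(movie_a, movie_b))  # 2 shared / 4 unique = 0.5
```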
The notebooks are designed to run in Google Colab.
- Click the "Open In Colab" badges at the top of this README to open your desired notebook.
- In the Colab notebook, go to Runtime > Change runtime type and select T4 GPU (or any other available GPU).
- Run all cells in order from top to bottom.
- This project was prepared for the CMP5104: Recommender Systems master's class under the guidance of Doç. Dr. Tevfik Aytekin.
- This benchmark uses datasets provided by the GroupLens research lab at the University of Minnesota.
- This project is powered by the NVIDIA RAPIDS open-source software libraries.
Read RecSys Best Paper Awards: https://recsys.acm.org/best-papers/
