@@ -4,7 +4,8 @@ High-performance dimensionality reduction for Ruby, powered by the [annembed](ht

 ## Features

-- **UMAP algorithm**: State-of-the-art dimensionality reduction
+- **Multiple algorithms**: UMAP, t-SNE, LargeVis, and Diffusion Maps for dimensionality reduction
+- **SVD**: Randomized Singular Value Decomposition for linear dimensionality reduction
 - **High performance**: Leverages Rust's speed and parallelization
 - **Easy to use**: Simple, scikit-learn-like API
 - **Model persistence**: Save and load trained models
@@ -31,37 +32,169 @@ Or install it yourself as:
 - Ruby 2.7 or higher
 - Rust toolchain (for building from source)

-## Quick Start
+## Quick Start - Interactive Example
+
+Copy and paste this into IRB to try out the main features:

 ```ruby
 require 'annembed'
+require 'annembed/svd'  # For SVD functionality

-# Generate some sample data (2D array)
+# Generate sample high-dimensional data
+# Imagine this is text embeddings, image features, or any high-dim data
+puts "Creating sample data: 100 points in 50 dimensions"
 data = Array.new(100) { Array.new(50) { rand } }

-# Create a UMAP instance
-umap = AnnEmbed::UMAP.new(n_components: 2, n_neighbors: 15)
+# ============================================================
+# 1. UMAP - State-of-the-art non-linear dimensionality reduction
+# ============================================================
+puts "\n1. UMAP - Reducing to 2D for visualization"
+embedder = AnnEmbed::Embedder.new(
+  method: :umap,
+  n_components: 2,  # Reduce to 2D
+  n_neighbors: 15   # Balance local/global structure
+)
+
+# Fit and transform the data
+umap_result = embedder.fit_transform(data)
+puts "Shape: #{umap_result.size} points × #{umap_result.first.size} dimensions"
+puts "First point: [#{umap_result.first.map { |v| v.round(3) }.join(', ')}]"

-# Fit and transform in one step
-embedding = umap.fit_transform(data)
+# Save the trained model
+embedder.save("umap_model.bin")
+puts "Model saved to umap_model.bin"
+
+# ============================================================
+# 2. t-SNE - Popular for visualization, especially clusters
+# ============================================================
+puts "\n2. t-SNE - Alternative visualization method"
+tsne = AnnEmbed::Embedder.new(
+  method: :tsne,
+  n_components: 2,
+  perplexity: 30.0  # Balances local/global structure
+)

-# Or fit and transform separately
-umap.fit(data)
-embedding = umap.transform(data)
+tsne_result = tsne.fit_transform(data)
+puts "Shape: #{tsne_result.size} points × #{tsne_result.first.size} dimensions"
+puts "First point: [#{tsne_result.first.map { |v| v.round(3) }.join(', ')}]"
+
+# ============================================================
+# 3. SVD - Fast linear dimensionality reduction
+# ============================================================
+puts "\n3. SVD - Linear dimensionality reduction (like PCA)"
+# Reduce to top 10 components
+u, s, vt = AnnEmbed.svd(data, 10, n_iter: 2)
+puts "U shape: #{u.size}×#{u.first.size} (transformed data)"
+puts "S values: [#{s[0..2].map { |v| v.round(2) }.join(', ')}, ...]"
+puts "V^T shape: #{vt.size}×#{vt.first.size} (components)"
+
+# The reduced data is in U
+svd_result = u
+puts "First point: [#{svd_result.first[0..2].map { |v| v.round(3) }.join(', ')}, ...]"
+
+# ============================================================
+# 4. Transform new data with a trained model
+# ============================================================
+puts "\n4. Transforming new data with saved UMAP model"
+# Load the saved model
+loaded = AnnEmbed::Embedder.load("umap_model.bin")
+
+# New data (5 new points)
+new_data = Array.new(5) { Array.new(50) { rand } }
+new_embedding = loaded.transform(new_data)
+puts "New data shape: #{new_embedding.size}×#{new_embedding.first.size}"
+puts "First new point: [#{new_embedding.first.map { |v| v.round(3) }.join(', ')}]"
+
+# ============================================================
+# 5. Comparison - Which method to use?
+# ============================================================
+puts "\n5. Quick comparison:"
+puts "UMAP: Best for preserving both local and global structure"
+puts "t-SNE: Great for visualizing clusters, but slower"
+puts "SVD: Fastest, linear, good for denoising or pre-processing"
+
+# ============================================================
+# 6. Practical tip: Reduce dimensions for faster similarity search
+# ============================================================
+puts "\n6. Example: Speeding up similarity search"
+# Original: 100 points × 50 dimensions = 5000 numbers to store
+# After UMAP: 100 points × 2 dimensions = 200 numbers to store
+# That's 25× less storage and faster distance calculations!
+
+puts "\nStorage comparison:"
+puts "Original: #{data.size * data.first.size} floats"
+puts "After UMAP: #{umap_result.size * umap_result.first.size} floats"
+puts "Reduction: #{((1 - (umap_result.first.size.to_f / data.first.size)) * 100).round(1)}%"
+
+puts "\n✅ Done! You've just reduced 50D data with three different methods (UMAP and t-SNE to 2D, SVD to 10D)!"
+```

-# Check if model is fitted
-puts "Model fitted: #{umap.fitted?}"
+## Quick Start - Simplified API

-# Save the model for later use
-umap.save("model.bin")
+For convenience, you can also use the simplified API:

-# Load and use a saved model
-loaded_umap = AnnEmbed::UMAP.load("model.bin")
-new_embedding = loaded_umap.transform(new_data)
+```ruby
+require 'annembed'
+require 'annembed/svd'
+
+# Generate sample data
+data = Array.new(100) { Array.new(50) { rand } }
+
+# One-line dimensionality reduction
+umap_2d = AnnEmbed.umap(data, n_components: 2)
+tsne_2d = AnnEmbed.tsne(data, n_components: 2)
+u, s, vt = AnnEmbed.svd(data, 10)  # Top 10 components
+
+# Results are ready to use!
+puts "UMAP result: #{umap_2d.first}"
+puts "t-SNE result: #{tsne_2d.first}"
+puts "SVD result: #{u.first}"
 ```

 ## API Reference

+### AnnEmbed::Embedder
+
+A single class that provides a common interface to the non-linear embedding methods: UMAP, t-SNE, LargeVis, and Diffusion Maps.
+
+```ruby
+# Create an embedder with any supported method
+embedder = AnnEmbed::Embedder.new(
+  method: :umap,    # :umap, :tsne, :largevis, or :diffusion
+  n_components: 2,  # Target dimensions
+  **options         # Method-specific options
+)
+
+# Methods work the same for all algorithms
+result = embedder.fit_transform(data)
+embedder.save("model.bin")
+loaded = AnnEmbed::Embedder.load("model.bin")
+```
+
+### AnnEmbed::SVD
+
+Randomized Singular Value Decomposition for fast linear dimensionality reduction.
+
+```ruby
+# Perform SVD
+u, s, vt = AnnEmbed.svd(matrix, k, n_iter: 2)
+
+# Parameters:
+#   matrix: 2D array of data
+#   k: Number of components to keep
+#   n_iter: Number of iterations for randomized algorithm (default: 2)
+
+# Returns:
+#   u: Left singular vectors (transformed data)
+#   s: Singular values (importance of each component)
+#   vt: Right singular vectors transposed (components)
+
+# Example: Reduce 100×50 matrix to 100×10
+data = Array.new(100) { Array.new(50) { rand } }
+u, s, vt = AnnEmbed.svd(data, 10)
+reduced_data = u  # This is your reduced 100×10 data
+```
+
 ### AnnEmbed::UMAP

 The main class for UMAP dimensionality reduction.
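
The SVD sections in this diff use `u` directly as the reduced data. A common PCA-style convention instead scales each column of `u` by its singular value, and multiplying that result by `vt` gives the rank-k approximation of the original matrix. The sketch below illustrates both steps in plain Ruby on small hand-built factors, without calling the gem. The helper names `scale_by_singular_values` and `reconstruct` are hypothetical and not part of annembed, and the sketch only assumes that `u`, `s`, and `vt` are plain nested arrays as the README's `puts` calls suggest.

```ruby
# Scale column j of U by singular value s[j] (PCA-style coordinates).
def scale_by_singular_values(u, s)
  u.map { |row| row.each_with_index.map { |value, j| value * s[j] } }
end

# Rank-k approximation of the original matrix: (U * S) * V^T.
def reconstruct(u, s, vt)
  columns = vt.transpose # column c of V^T, one array per original feature
  scale_by_singular_values(u, s).map do |row|
    columns.map { |column| row.zip(column).sum { |a, b| a * b } }
  end
end

# Hand-built factors for illustration only (not output of AnnEmbed.svd).
u  = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
s  = [3.0, 2.0]
vt = [[0.0, 1.0], [1.0, 0.0]]

coords = scale_by_singular_values(u, s)
approx = reconstruct(u, s, vt)

puts "Scaled coordinates: #{coords.inspect}"
puts "Reconstruction:     #{approx.inspect}"
```

Whether to use `u` or the scaled coordinates depends on the downstream task: unscaled `u` weights every component equally, while the scaled form preserves the relative importance carried by `s`.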
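
Step 6 of the interactive example mentions faster similarity search on reduced data. A minimal brute-force sketch of that idea in plain Ruby follows, assuming the embedding is an array of equal-length numeric arrays, as the example's output suggests. `squared_distance` and `nearest_neighbors` are hypothetical helpers written for this illustration, not annembed methods, and the sample points stand in for a real 2D embedding.

```ruby
# Squared Euclidean distance between two points of equal length.
def squared_distance(a, b)
  a.zip(b).sum { |x, y| (x - y)**2 }
end

# Indices of the k points closest to `query`, nearest first.
def nearest_neighbors(points, query, k)
  points.each_index
        .sort_by { |i| squared_distance(points[i], query) }
        .first(k)
end

# Stand-in for a 2D embedding (illustrative values only).
embedding = [[0.0, 0.0], [1.0, 1.0], [0.1, 0.2], [5.0, 5.0]]
neighbors = nearest_neighbors(embedding, [0.0, 0.1], 2)

puts "Nearest neighbor indices: #{neighbors.inspect}"
```

Each distance computation touches one number per dimension, so the per-comparison cost in the 2D embedding is a fraction of what it would be on the original 50-dimensional rows. Neighbors found in the reduced space are only approximations of neighbors in the original space.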