@@ -103,39 +103,65 @@ def benchmark(
     lp_train_idx: Optional[Float[torch.Tensor, "n_samples,"]] = None,
     lp_test_idx: Optional[Float[torch.Tensor, "n_samples,"]] = None,
 ) -> Dict[str, float]:
-    """
-    Benchmarks various machine learning models on a dataset using a product manifold structure.
+    """Benchmarks various machine learning models on Riemannian manifold datasets.
+
+    Evaluates and compares different machine learning models on datasets with a
+    product manifold structure and returns evaluation scores for each model.
 
     Args:
-        X (batch, dim): Input tensor of features
-        y (batch,): Input tensor of labels.
-        pm: The defined product manifold for benchmarks.
-        split: Data splitting strategy ('train_test' or 'cross_val').
-        device: Device for computation ('cpu', 'cuda', 'mps').
-        score: Scoring metric for model evaluation ('accuracy', 'f1-micro', etc.).
+        X: Tensor of input features with shape (batch, dim).
+        y: Tensor of target labels with shape (batch,).
+        pm: ProductManifold object defining the geometric structure for benchmarks.
+        device: Device for computation. Options: 'cpu', 'cuda', 'mps'. Defaults to 'cpu'.
+        score: List of scoring metrics for model evaluation (e.g., 'accuracy', 'f1-micro').
+            Defaults to None, which evaluates 'accuracy', 'f1-micro', and 'f1-macro'.
         models: List of model names to evaluate. Options include:
-            * "sklearn_dt": Decision tree from scikit-learn.
-            * "sklearn_rf": Random forest from scikit-learn.
-            * "product_dt": Product space decision tree.
-            * "product_rf": Product space random forest.
-            * "tangent_dt": Decision tree on tangent space.
-            * "tangent_rf": Random forest on tangent space.
-            * "knn": k-nearest neighbors.
-            * "ps_perceptron": Product space perceptron.
-        max_depth: Maximum depth of tree-based models in integer.
-        n_estimators: Integer number of estimators for random forest models.
-        min_samples_split: Minimum number of samples required to split an internal node.
-        min_samples_leaf: Minimum number of samples in a leaf node.
-        task: Task type ('classification' or 'regression').
-        seed: Random seed for reproducibility.
-        use_special_dims: Boolean for whether to use special manifold dimensions.
-        n_features: Feature dimensionality type ('d' or 'd_choose_2').
-        X_train, X_test, y_train, y_test: Training and testing datasets, X: feature, y: label.
-        batch_size: Batch size for certain models.
+            * "sklearn_dt": Decision tree from scikit-learn
+            * "sklearn_rf": Random forest from scikit-learn
+            * "product_dt": Product space decision tree
+            * "product_rf": Product space random forest
+            * "tangent_dt": Decision tree on tangent space
+            * "tangent_rf": Random forest on tangent space
+            * "knn": k-nearest neighbors
+            * "ps_perceptron": Product space perceptron
+            Defaults to None.
+        max_depth: Maximum depth of tree-based models. Defaults to 5.
+        n_estimators: Number of estimators for ensemble models. Defaults to 12.
+        min_samples_split: Minimum samples required to split an internal node. Defaults to 2.
+        min_samples_leaf: Minimum samples required in a leaf node. Defaults to 1.
+        task: Type of machine learning task. Options: 'classification' or 'regression'.
+            Defaults to 'classification'.
+        seed: Random seed for reproducibility. Defaults to None.
+        use_special_dims: Whether to use special manifold dimensions. Defaults to False.
+        n_features: Feature dimensionality type. Options: 'd' or 'd_choose_2'.
+            Defaults to 'd_choose_2'.
+        X_train: Training feature tensor with shape (n_samples, n_manifolds).
+            If provided, used instead of splitting X. Defaults to None.
+        X_test: Testing feature tensor with shape (n_samples, n_manifolds).
+            If provided, used with X_train. Defaults to None.
+        y_train: Training label tensor with shape (n_samples,).
+            Must be provided if X_train is given. Defaults to None.
+        y_test: Testing label tensor with shape (n_samples,).
+            Must be provided if X_test is given. Defaults to None.
+        batch_size: Batch size for neural network models. Defaults to None.
+        adj: Adjacency matrix for graph-based models with shape (n_nodes, n_nodes).
+            Defaults to None.
+        A_train: Training adjacency matrix with shape (n_samples, n_samples).
+            Defaults to None.
+        A_test: Testing adjacency matrix with shape (n_samples, n_samples).
+            Defaults to None.
+        hidden_dims: List of hidden layer dimensions for neural networks.
+            Defaults to None.
+        epochs: Number of training epochs for iterative models. Defaults to 4000.
+        lr: Learning rate for gradient-based optimization. Defaults to 1e-4.
+        kappa_gcn_layers: Number of layers in GCN models. Defaults to 1.
+        lp_train_idx: Training indices for link prediction with shape (n_samples,).
+            Defaults to None.
+        lp_test_idx: Testing indices for link prediction with shape (n_samples,).
+            Defaults to None.
 
     Returns:
-        Dict[str, float]: Dictionary of model names and their corresponding evaluation scores.
-
+        Dictionary mapping model names to their corresponding evaluation scores.
     """
     if score is None:
         score = ["accuracy", "f1-micro", "f1-macro"]
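The default set of metrics at the end of the hunk is `accuracy`, `f1-micro`, and `f1-macro`. The diff does not show how those strings are mapped to scorers, so the sketch below assumes they follow the usual definitions (as in scikit-learn's `accuracy_score` and `f1_score` with `average="micro"` / `average="macro"`) and computes them in plain Python. `METRICS` and `evaluate` are hypothetical names used only for illustration.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Labels = Sequence[int]


def accuracy(y_true: Labels, y_pred: Labels) -> float:
    """Fraction of predictions that equal the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def _confusion_counts(y_true: Labels, y_pred: Labels) -> Tuple[Counter, Counter, Counter]:
    """Per-class true-positive, false-positive and false-negative counts."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the prediction was wrong
            fn[t] += 1  # true class t was missed
    return tp, fp, fn


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def f1_micro(y_true: Labels, y_pred: Labels) -> float:
    """Pool counts over all classes, then compute a single F1."""
    tp, fp, fn = _confusion_counts(y_true, y_pred)
    return _f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))


def f1_macro(y_true: Labels, y_pred: Labels) -> float:
    """Compute F1 per class, then take the unweighted mean over classes."""
    tp, fp, fn = _confusion_counts(y_true, y_pred)
    classes = set(y_true) | set(y_pred)
    return sum(_f1(tp[c], fp[c], fn[c]) for c in classes) / len(classes)


# Hypothetical name -> scorer table keyed by the strings used in `score`.
METRICS: Dict[str, Callable[[Labels, Labels], float]] = {
    "accuracy": accuracy,
    "f1-micro": f1_micro,
    "f1-macro": f1_macro,
}


def evaluate(y_true: Labels, y_pred: Labels, score: Optional[List[str]] = None) -> Dict[str, float]:
    """Score one model's predictions with each requested metric."""
    if score is None:  # same default as in `benchmark`
        score = ["accuracy", "f1-micro", "f1-macro"]
    return {name: METRICS[name](y_true, y_pred) for name in score}
```

For single-label multiclass predictions, micro-averaged F1 reduces to accuracy, because every error is counted once as a false positive and once as a false negative. Macro-F1 weights every class equally, which makes it the metric of the three that exposes poor performance on minority classes.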
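The new `X_train` / `X_test` / `y_train` / `y_test` entries describe a precedence rule: explicit splits are used instead of splitting `X`, and each feature tensor needs its labels. A minimal sketch of that contract follows, assuming the train and test features must be supplied together and using a plain shuffled hold-out when they are not. `resolve_train_test` and `test_size` are hypothetical and do not appear in the diff.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def resolve_train_test(
    X: Sequence[T],
    y: Sequence[int],
    X_train: Optional[Sequence[T]] = None,
    X_test: Optional[Sequence[T]] = None,
    y_train: Optional[Sequence[int]] = None,
    y_test: Optional[Sequence[int]] = None,
    test_size: float = 0.2,  # hypothetical hold-out fraction
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T], List[int], List[int]]:
    """Return (X_train, X_test, y_train, y_test), preferring explicit splits."""
    if X_train is not None or X_test is not None:
        # Assumption: explicit features come as a pair, each with its labels.
        if X_train is None or X_test is None:
            raise ValueError("X_train and X_test must be provided together")
        if y_train is None or y_test is None:
            raise ValueError("y_train and y_test are required with X_train/X_test")
        return list(X_train), list(X_test), list(y_train), list(y_test)

    # No explicit split: shuffle indices reproducibly and hold out a test fraction.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(idx) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return (
        [X[i] for i in train_idx],
        [X[i] for i in test_idx],
        [y[i] for i in train_idx],
        [y[i] for i in test_idx],
    )
```

Raising early on a half-specified split keeps a missing label tensor from surfacing later as a shape error deep inside one of the models.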