
# Algorithm Support Lists

The table below summarizes the capabilities currently available in SML. In general, the following features are not supported, or only partly supported, in SML:

- **Early stopping for training or iterative algorithms**: We do not want to reveal any intermediate information (for a few algorithms, we do reveal some bits to accelerate computation).
- **Manually setting a random seed**: SPU cannot properly handle floating-point randomness, so if a random value (or matrix) is needed, the user should pass it in as a parameter (as in rsvd and NMF) or compute it in a plaintext environment.
- **Data inspection**: Operations such as counting the number of labels or re-transforming the data or labels are not performed. (We may therefore assume a "fixed" input format, or require the number of classes as a parameter.)
- **Single-sample SGD**: Not implemented, for latency reasons; MiniBatch-SGD (simply called sgd in SML) replaces it.
- **JAX ops such as eigh and svd**: These cannot run in SPU directly. The svd currently implemented is expensive and cannot handle matrices that are not of full column rank.
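The "pass randomness as a parameter" pattern above can be illustrated with a plaintext numpy sketch of randomized SVD: the Gaussian sketching matrix is generated outside the secure runtime and handed to the algorithm, instead of being sampled inside it. Function and parameter names here are illustrative, not SML's actual API.

```python
import numpy as np

def rsvd_with_external_randomness(A, G, rank):
    """Randomized SVD where the Gaussian sketch G is supplied by the caller,
    mirroring how SML takes pre-generated randomness as a parameter."""
    Y = A @ G                       # sketch the range of A
    Q, _ = np.linalg.qr(Y)          # orthonormal basis for the sketch
    B = Q.T @ A                     # small projected matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 5)) @ rng.standard_normal((5, 20))  # rank-5 matrix
G = rng.standard_normal((20, 8))  # generated in plaintext, passed in explicitly
U, s, Vt = rsvd_with_external_randomness(A, G, rank=5)
```

In the secure setting, only `G` would be produced in plaintext; the matrix products would run under MPC.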
| Algorithm | Category | Supported Features | Notes |
|---|---|---|---|
| KMEANS | cluster | 1. init=random, k-means++ <br> 2. algorithm=lloyd only <br> 3. Supports n_init | 1. Runs the algorithm only once for efficiency when n_init=1 <br> 2. Supports multiple initializations with best-result selection |
| GaussianProcessClassifier | gaussian_process | 1. RBF kernel only <br> 2. OVR for multi-class tasks only <br> 3. Laplace approximation | 1. The current implementation will NOT optimize the kernel parameters during training <br> 2. Supports the sigmoid likelihood function |
| PCA | decomposition | 1. power_iteration method <br> 2. serial_jacobi_iteration method <br> 3. rsvd method | 1. If method=power_iteration, the covariance matrix is computed first <br> 2. rsvd is very unstable under the fixed-point setting, even in FM128, so only small data is supported <br> 3. Supports various parameter configurations for each method |
| NMF | decomposition | 1. init=random <br> 2. solver=mu <br> 3. beta_loss=frobenius only <br> 4. Supports L1/L2 regularization | 1. Supports alpha_W and alpha_H regularization parameters <br> 2. Supports transform and inverse_transform methods |
| T-SNE | decomposition | 1. init=pca or random <br> 2. Supports various PCA method configurations <br> 3. Supports early exaggeration | 1. Comprehensive parameter control, including learning rate and momentum <br> 2. Supports custom Y_init for random initialization |
| Adaboost | ensemble | 1. Decision tree base model only <br> 2. SAMME algorithm only <br> 3. Supports sample weights | 1. No early stopping is implemented <br> 2. Supports multiple estimators with weight calculation |
| Random Forest | ensemble | 1. Supports gini criterion <br> 2. Supports best splitter <br> 3. Supports bootstrap sampling <br> 4. Supports max_features control | 1. No early stopping is implemented <br> 2. Supports feature subsampling and sample weights |
| Feature Selection | feature_selection | 1. chi2 univariate selection <br> 2. f_classif (ANOVA F-test) | 1. Supports p-value computation with configurable parameters <br> 2. Supports different numerical stability controls |
| Logistic Regression | linear_model | 1. sgd solver only <br> 2. All regularization methods (l1, l2, elasticnet, None) <br> 3. Supports binary and OVR multi-class <br> 4. Supports early stopping | 1. sigmoid is evaluated approximately <br> 2. Supports various sigmoid approximation methods <br> 3. Supports equal class weights only |
| Perceptron | linear_model | 1. All regularization methods (l1, l2, elasticnet, None) <br> 2. Patience-based early stopping <br> 3. Supports sample batching | 1. Early stopping does not reduce training time; it only stops further parameter updates <br> 2. Supports various batch sizes |
| Ridge | linear_model | 1. svd and cholesky solvers <br> 2. Supports bias-fitting control | 1. Supports preprocessing and bias handling <br> 2. Efficient matrix decomposition methods |
| SGDClassifier | linear_model | 1. Linear regression and logistic regression <br> 2. L2 regularization only | 1. sigmoid is evaluated approximately <br> 2. Supports different batch sizes |
| GLM Regressors | linear_model | 1. PoissonRegressor, GammaRegressor, TweedieRegressor <br> 2. newton-cholesky and lbfgs solvers <br> 3. Supports L2 regularization | 1. Supports different link functions (log, identity) <br> 2. Supports sample weights <br> 3. Tweedie supports a configurable power parameter |
| Quantile Regression | linear_model | 1. Supports different quantiles (0-1) <br> 2. L1 regularization <br> 3. Linear-programming-based solver | 1. Supports sample weights <br> 2. Efficient simplex algorithm implementation |
| SVC | svm | 1. RBF kernel only <br> 2. SMO solver only <br> 3. Supports C regularization | |
| GaussianNB | naive_bayes | 1. Does not support manually set priors <br> 2. Supports online learning (partial_fit) | 1. Supports incremental learning with proper variance updating <br> 2. Efficient vectorized computation |
| KNN | neighbors | 1. brute algorithm only <br> 2. uniform and distance weights supported <br> 3. Supports custom metrics | 1. KD-tree and Ball-tree cannot improve efficiency in the MPC setting <br> 2. Supports configurable n_neighbors and metric parameters |
| DecisionTreeClassifier | tree | 1. Implemented based on GTree <br> 2. Supports binary features (i.e. {0, 1}) and multi-class labels <br> 3. Supports gini criterion and best splitter <br> 4. Supports sample weights | 1. Memory and time complexity is about O(n_samples * n_labels * n_features * 2^max_depth) <br> 2. Efficient oblivious array access implementation |
| Preprocessing | preprocessing | 1. LabelBinarizer, Binarizer, Normalizer <br> 2. RobustScaler, MinMaxScaler, MaxAbsScaler <br> 3. KBinsDiscretizer with multiple strategies <br> 4. OneHotEncoder, QuantileTransformer | 1. Supports various binning strategies (uniform, quantile, kmeans) <br> 2. Supports robust scaling with outlier handling <br> 3. Comprehensive categorical encoding support |
| Manifold | manifold | 1. ISOMAP with k-NN graph construction <br> 2. Spectral Embedding (SE) <br> 3. Supports custom distance metrics | 1. Efficient Floyd-Warshall and Dijkstra implementations <br> 2. Jacobi eigenvalue decomposition <br> 3. MDS-based dimensionality reduction |
| Classification | metrics | 1. roc_auc_score (binary only) <br> 2. accuracy_score, precision_score, recall_score, f1_score <br> 3. Supports various averaging methods <br> 4. precision_recall_curve, average_precision_score | 1. Supports binary and multi-class scenarios <br> 2. Comprehensive evaluation metrics with configurable parameters |
| Regression | metrics | 1. mean_squared_error, r2_score <br> 2. explained_variance_score <br> 3. GLM-specific metrics (Poisson, Gamma, Tweedie deviance) | 1. Supports sample weights <br> 2. D2 score for GLM models <br> 3. Multiple-output support |
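As noted above, single-sample SGD is replaced by MiniBatch-SGD (what SML simply calls sgd). The plaintext numpy sketch below shows one epoch of that variant for logistic regression; the exact logistic function stands in for the sigmoid approximation that SML would use, and all names are illustrative rather than SML's actual API.

```python
import numpy as np

def sigmoid(z):
    # SML evaluates the sigmoid approximately; the exact logistic
    # function is used here only for plaintext illustration.
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_sgd_epoch(w, X, y, lr=0.1, batch_size=16):
    # One epoch of MiniBatch-SGD for logistic regression: each update
    # averages the gradient over a batch instead of a single sample,
    # trading per-update precision for far fewer (costly) MPC rounds.
    n = X.shape[0]
    for start in range(0, n, batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = xb.T @ (sigmoid(xb @ w) - yb) / len(yb)
        w = w - lr * grad
    return w

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 3))
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = np.zeros(3)
for _ in range(50):
    w = minibatch_sgd_epoch(w, X, y)
```

In the MPC setting, each single-sample step would pay full communication latency, so batching the updates is the practical choice.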