The table below shows the capabilities currently available in SML. In general, the following features are not (or only partly) supported in SML:

- Early stopping for training or iterative algorithms: we do not want to reveal any intermediate information (for some algorithms, we do reveal a few bits to accelerate computation).
- Manually setting the random seed: SPU cannot handle floating-point randomness properly, so if a random value (or matrix) is needed, the user should pass it in as a parameter (as in `rsvd` and `NMF`), or compute it in a plaintext environment.
- Data inspection, such as counting the number of labels or re-transforming the data or labels, is not performed. (So we may assume a "fixed" input format, or require the number of classes as a parameter.)
- Single-sample SGD is not implemented, for latency reasons; mini-batch SGD (which we simply call `sgd` in SML) replaces it.
- JAX ops like `eigh` and `svd` cannot run on SPU directly: the `svd` implemented now is expensive and cannot handle matrices that are not of full column rank.
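The "pass randomness in as a parameter" pattern can be sketched in plaintext NumPy. This is an illustrative randomized-SVD-style helper, not the actual SML API: the random test matrix `omega` is sampled by the caller in plaintext and handed to the function, instead of being generated inside the secure computation.

```python
import numpy as np

def rsvd_sketch(A, omega, n_components):
    # Illustrative randomized SVD. The random matrix `omega` is
    # generated in plaintext by the caller and passed as a parameter,
    # because the secure runtime cannot sample float randomness itself.
    Y = A @ omega                      # project onto a low-dim subspace
    Q, _ = np.linalg.qr(Y)             # orthonormal basis of the range
    B = Q.T @ A                        # small projected matrix
    u_tilde, s, vt = np.linalg.svd(B, full_matrices=False)
    u = Q @ u_tilde
    return u[:, :n_components], s[:n_components], vt[:n_components]

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 20))
omega = rng.normal(size=(20, 8))       # plaintext randomness, passed in
u, s, vt = rsvd_sketch(A, omega, n_components=5)
```

Inside SPU, only the deterministic linear algebra would run on secret data; the caller's plaintext `omega` plays the role of the seed the user cannot set inside the device.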
| Algorithm | Category | Supported Features | Notes |
|---|---|---|---|
| KMEANS | cluster | 1. init=random or k-means++<br>2. algorithm=lloyd only<br>3. Support n_init | 1. Only run the algorithm once for efficiency when n_init=1<br>2. Support multiple initializations with best-result selection |
| GaussianProcessClassifier | gaussian_process | 1. RBF kernel only<br>2. OVR for multi-class tasks only<br>3. Laplace approximation | 1. The current implementation does NOT optimize kernel parameters during training<br>2. Support sigmoid likelihood function |
| PCA | decomposition | 1. power_iteration method<br>2. serial_jacobi_iteration method<br>3. rsvd method | 1. If method=power_iteration, the covariance matrix is computed first<br>2. rsvd is very unstable under the fixed-point setting even in FM128, so only small data is supported<br>3. Support various parameter configurations for each method |
| NMF | decomposition | 1. init=random<br>2. solver=mu<br>3. beta_loss=frobenius only<br>4. Support L1/L2 regularization | 1. Support alpha_W and alpha_H regularization parameters<br>2. Support transform and inverse_transform methods |
| T-SNE | decomposition | 1. init=pca or random<br>2. Support various PCA method configurations<br>3. Support early exaggeration | 1. Comprehensive parameter control, including learning rate and momentum<br>2. Support custom Y_init for random initialization |
| Adaboost | ensemble | 1. Decision tree base estimator only<br>2. SAMME algorithm only<br>3. Support sample weights | 1. No early stopping is implemented<br>2. Support multiple estimators with weight calculation |
| Random Forest | ensemble | 1. Support gini criterion<br>2. Support best splitter<br>3. Support bootstrap sampling<br>4. Support max_features control | 1. No early stopping is implemented<br>2. Support feature subsampling and sample weights |
| Feature Selection | feature_selection | 1. chi2 univariate selection<br>2. f_classif (ANOVA F-test) supported | 1. Support p-value computation with configurable parameters<br>2. Support different numerical stability controls |
| Logistic Regression | linear_model | 1. sgd solver only<br>2. All regularization methods (l1, l2, elasticnet, None)<br>3. Support binary and OVR multi-class<br>4. Support early stopping | 1. The sigmoid is evaluated approximately<br>2. Support various sigmoid approximation methods<br>3. Support equal class weights only |
| Perceptron | linear_model | 1. All regularization methods (l1, l2, elasticnet, None)<br>2. Patience-based early stopping<br>3. Support sample batching | 1. Early stopping does not cut down the training time; it only stops the parameter updates<br>2. Support various batch sizes |
| Ridge | linear_model | 1. svd and cholesky solvers<br>2. Support bias fitting control | 1. Support preprocessing and bias handling<br>2. Efficient matrix decomposition methods |
| SGDClassifier | linear_model | 1. Linear regression and logistic regression<br>2. L2 regularization only | 1. The sigmoid is evaluated approximately<br>2. Support different batch sizes |
| GLM Regressors | linear_model | 1. PoissonRegressor, GammaRegressor, TweedieRegressor<br>2. newton-cholesky and lbfgs solvers<br>3. Support L2 regularization | 1. Support different link functions (log, identity)<br>2. Support sample weights<br>3. Tweedie supports a configurable power parameter |
| Quantile Regression | linear_model | 1. Support different quantiles (0-1)<br>2. L1 regularization<br>3. Linear-programming-based solver | 1. Support sample weights<br>2. Efficient simplex algorithm implementation |
| SVC | svm | 1. RBF kernel only<br>2. SMO solver only<br>3. Support C regularization | |
| GaussianNB | naive_bayes | 1. Manual setting of priors is not supported<br>2. Support online learning (partial_fit) | 1. Support incremental learning with proper variance updating<br>2. Efficient vectorized computation |
| KNN | neighbors | 1. brute algorithm only<br>2. uniform and distance weights supported<br>3. Support custom metrics | 1. KD-tree and Ball-tree cannot improve efficiency in the MPC setting<br>2. Support configurable n_neighbors and metric parameters |
| DecisionTreeClassifier | tree | 1. Implemented based on GTree<br>2. Support binary features (i.e. {0, 1}) and multi-class labels<br>3. Support gini criterion and best splitter<br>4. Support sample weights | 1. Memory and time complexity is around O(n_samples * n_labels * n_features * 2^max_depth)<br>2. Efficient oblivious array access implementation |
| Preprocessing | preprocessing | 1. LabelBinarizer, Binarizer, Normalizer<br>2. RobustScaler, MinMaxScaler, MaxAbsScaler<br>3. KBinsDiscretizer with multiple strategies<br>4. OneHotEncoder, QuantileTransformer | 1. Support various binning strategies (uniform, quantile, kmeans)<br>2. Support robust scaling with outlier handling<br>3. Comprehensive categorical encoding support |
| Manifold | manifold | 1. ISOMAP with k-NN graph construction<br>2. Spectral Embedding (SE)<br>3. Support custom distance metrics | 1. Efficient Floyd-Warshall and Dijkstra implementations<br>2. Jacobi eigenvalue decomposition<br>3. MDS-based dimensionality reduction |
| Classification | metrics | 1. roc_auc_score (binary only)<br>2. accuracy_score, precision_score, recall_score, f1_score<br>3. Support various averaging methods<br>4. precision_recall_curve, average_precision_score | 1. Support binary and multi-class scenarios<br>2. Comprehensive evaluation metrics with configurable parameters |
| Regression | metrics | 1. mean_squared_error, r2_score<br>2. explained_variance_score<br>3. GLM-specific metrics (Poisson, Gamma, Tweedie deviance) | 1. Support sample weights<br>2. D2 score for GLM models<br>3. Multiple output support |
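The "sigmoid is evaluated approximately" notes for Logistic Regression and SGDClassifier refer to replacing the exp-based sigmoid with something MPC-friendly, since low-degree polynomials are cheap to evaluate on secret shares while `exp` is not. A plaintext sketch of one such approach, using a least-squares cubic fit (illustrative only; the coefficients and fitting range here are assumptions, not necessarily the polynomial SML uses):

```python
import numpy as np

def sigmoid(x):
    # Exact sigmoid, for reference; exp() is expensive under MPC.
    return 1.0 / (1.0 + np.exp(-x))

# Fit a degree-3 polynomial to sigmoid on a bounded range. Secure
# backends evaluate such polynomials with a few multiplications.
xs = np.linspace(-5.0, 5.0, 1001)
coeffs = np.polyfit(xs, sigmoid(xs), deg=3)

def sigmoid_approx(x):
    # Polynomial surrogate; accurate only inside the fitted range.
    return np.polyval(coeffs, x)

max_err = np.max(np.abs(sigmoid_approx(xs) - sigmoid(xs)))
```

The trade-off is typical of fixed-point MPC: a small, bounded approximation error inside the fitting range in exchange for a circuit made only of additions and multiplications.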