## Method: predict_proba()

***This method returns the predicted class probabilities of the best tuned model as a numpy matrix. It is similar to the predict_class_probabilities method, but the name predict_proba is compatible with scikit-learn.***
### Parameters
#### X
A numpy matrix with predictor values.
#### kwargs
Optional parameters sent to the predict_class_probabilities method in the best tuned model.
## Method: get_best_estimator()
***Returns the best tuned model. This is an APLRRegressor or APLRClassifier object.***
## Method: get_cv_results()
***Returns the cv results from the tuning as a list of dictionaries, List[Dict[str, float]].***
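To illustrate how these methods fit together, here is a minimal sketch. It assumes that ***tuner*** is an already-fitted tuner object from this package for a classification task; the construction and fitting of the tuner are not shown in this excerpt, and the variable names are illustrative.

```python
from typing import Dict, List
import numpy as np

def summarize_tuning(tuner, X_new: np.ndarray) -> None:
    # Predicted class probabilities of the best tuned model (scikit-learn compatible name).
    probabilities: np.ndarray = tuner.predict_proba(X_new)

    # The best tuned model: an APLRRegressor or APLRClassifier object.
    best_model = tuner.get_best_estimator()

    # Cross-validation results from the tuning: List[Dict[str, float]].
    cv_results: List[Dict[str, float]] = tuner.get_cv_results()

    print(probabilities.shape)
    print(type(best_model).__name__)
    print(cv_results[:3])
```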
#### m

The maximum number of boosting steps. If the validation error does not flatten out at the end of the ***m***th boosting step, then try increasing it (or alternatively increase the learning rate).
#### v (default = 0.5)

The learning rate. Must be greater than zero and not more than one. The higher it is, the faster the algorithm learns and the lower ***m*** is required, which reduces computational costs, potentially at the expense of predictiveness. Empirical evidence suggests that ***v <= 0.5*** gives good results for APLR.
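As a rough sketch of the trade-off between ***v*** and ***m*** described above (the constructor arguments m, v and random_state are taken from this documentation; the data is synthetic and the exact values are illustrative only):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + 0.5 * np.maximum(X[:, 1], 0.0) + rng.normal(scale=0.1, size=500)

# A small learning rate typically needs more boosting steps to converge ...
slow = APLRRegressor(m=3000, v=0.1, random_state=0)

# ... while a larger learning rate can use fewer boosting steps,
# reducing computational cost, potentially at the expense of predictiveness.
fast = APLRRegressor(m=500, v=0.5, random_state=0)

slow.fit(X, y)
fast.fit(X, y)
```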
#### random_state (default = 0)
Used to randomly split training observations into cv_folds if ***cv_observations*** is not specified when fitting.
#### max_interaction_level

Specifies the maximum allowed depth of interaction terms. ***0*** means that interaction terms are not allowed.
#### max_interactions (default = 100000)
The maximum number of interactions allowed in each underlying model. A lower value may be used to reduce computational time or to increase interpretability.
#### min_observations_in_split (default = 4)
The minimum effective number of observations that a term in the model must rely on. This hyperparameter should be tuned. Larger values are more appropriate for larger datasets and result in more robust models (lower variance), potentially at the expense of increased bias.
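Because this hyperparameter should be tuned, here is a simple sketch of comparing a few candidate values on held-out data (synthetic data; the candidate values are arbitrary, and a dedicated tuner from this package could be used instead):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
y = X[:, 0] - X[:, 1] ** 2 + rng.normal(scale=0.2, size=1000)
X_train, X_test = X[:800], X[800:]
y_train, y_test = y[:800], y[800:]

# Fit one model per candidate value and record the held-out MSE for each.
held_out_mse = {}
for candidate in (4, 20, 100):
    model = APLRRegressor(min_observations_in_split=candidate, random_state=0)
    model.fit(X_train, y_train)
    held_out_mse[candidate] = float(np.mean((model.predict(X_test) - y_test) ** 2))

best_value = min(held_out_mse, key=held_out_mse.get)
```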
#### ineligible_boosting_steps_added

Controls how many boosting steps a term that becomes ineligible has to remain ineligible. The default value works well according to empirical results. This hyperparameter is intended for reducing computational costs.
#### max_eligible_terms (default = 7)
Limits 1) the number of terms already in the model that can be considered as interaction partners in a boosting step and 2) how many terms remain eligible in the next boosting step. The default value works well according to empirical results. This hyperparameter is intended for reducing computational costs.
#### monotonic_constraints

An optional list of integers specifying monotonic constraints on model terms.
#### interaction_constraints

An optional list containing lists of integers. Specifies interaction constraints on model terms. For example, interaction_constraints = [[0,1], [1,2,3]] means that 1) the first and second predictors may interact with each other, and 2) the second, third and fourth predictors may interact with each other. There are no interaction constraints on predictors not mentioned in interaction_constraints.
#### predictor_learning_rates
An optional list of floats specifying learning rates for each predictor. If provided, this supersedes ***v***. For example, if there are two predictors in ***X***, then predictor_learning_rates = [0.1,0.2] means that all terms using the first predictor in ***X*** as a main effect will have a learning rate of 0.1, and all terms using the second predictor in ***X*** as a main effect will have a learning rate of 0.2.
#### predictor_penalties_for_non_linearity
An optional list of floats specifying penalties for non-linearity for each predictor. If provided, this supersedes ***penalty_for_non_linearity***. For example, if there are two predictors in ***X***, then predictor_penalties_for_non_linearity = [0.1,0.2] means that all terms using the first predictor in ***X*** as a main effect will have a penalty for non-linearity of 0.1, and all terms using the second predictor in ***X*** as a main effect will have a penalty for non-linearity of 0.2.
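A sketch of passing these options when fitting, assuming (as the surrounding text suggests) that monotonic_constraints, interaction_constraints, predictor_learning_rates and predictor_penalties_for_non_linearity are optional keyword arguments of the fit method; the data, the values and the constraint encoding (1 = increasing, 0 = unconstrained, -1 = decreasing) are assumptions for illustration only:

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))
y = X[:, 0] + np.maximum(X[:, 1], 0.0) - 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

model = APLRRegressor(random_state=0)
model.fit(
    X,
    y,
    monotonic_constraints=[1, 0, -1],          # assumed encoding: increasing, unconstrained, decreasing
    interaction_constraints=[[0, 1]],          # only the first and second predictors may interact
    predictor_learning_rates=[0.1, 0.2, 0.1],  # per-predictor learning rates (supersede v)
    predictor_penalties_for_non_linearity=[0.0, 0.1, 0.0],
)
```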
#### other_data

An optional numpy matrix with other data. This is used in custom loss, negative gradient and validation error functions.
Automatic Piecewise Linear Regression.
# About
Build predictive and interpretable parametric regression or classification machine learning models in Python based on the Automatic Piecewise Linear Regression (APLR) methodology developed by Mathias von Ottenbreit. APLR is often able to compete with tree-based methods on predictiveness, but unlike tree-based methods it is interpretable. Furthermore, APLR produces smoother predictions than tree-based methods. Please see the [documentation](https://github.com/ottenbreit-data-science/aplr/tree/main/documentation) for more information. Links to the published article: [https://link.springer.com/article/10.1007/s00180-024-01475-4](https://link.springer.com/article/10.1007/s00180-024-01475-4) and [https://rdcu.be/dz7bF](https://rdcu.be/dz7bF). More functionality has been added to APLR since the article was published.
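A minimal usage sketch (assuming the package is installed from PyPI as `aplr`; the data here is synthetic and only illustrates the basic fit/predict workflow):

```python
import numpy as np
from aplr import APLRRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = 2.0 * X[:, 0] + np.maximum(X[:, 1], 0.0) + rng.normal(scale=0.1, size=300)

model = APLRRegressor()  # default hyperparameters
model.fit(X, y)
predictions = model.predict(X)
```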