10.7.2

mathias-von-ottenbreit · mathias-von-ottenbreit · commit 0f4270992d27 · 2024-11-02T09:20:57.000+01:00
diff --git a/API_REFERENCE_FOR_CLASSIFICATION.md b/API_REFERENCE_FOR_CLASSIFICATION.md
@@ -41,7 +41,7 @@ Controls how many boosting steps a term that becomes ineligible has to remain in
 Limits 1) the number of terms already in the model that can be considered as interaction partners in a boosting step and 2) how many terms remain eligible in the next boosting step. The default value works well according to empirical results. This hyperparameter is intended for reducing computational costs.
 
 #### boosting_steps_before_interactions_are_allowed (default = 0)
-Specifies how many boosting steps to wait before searching for interactions. If for example 800, then the algorithm will be forced to only fit main effects in the first 800 boosting steps, after which it is allowed to search for interactions (given that other hyperparameters that control interactions also allow this). The motivation for fitting main effects first may be 1) to get a cleaner looking model that puts more emphasis on main effects and 2) to speed up the algorithm since looking for interactions is computationally more demanding.
+Specifies how many boosting steps to wait before searching for interactions. If for example 800, then the algorithm will be forced to only fit main effects in the first 800 boosting steps, after which it is allowed to search for interactions (given that other hyperparameters that control interactions also allow this). The motivation for fitting main effects first may be 1) to get a cleaner looking model that puts more emphasis on main effects and 2) to speed up the algorithm since looking for interactions is computationally more demanding. Note that when this value is greater than zero, the algorithm selects the model from the boosting step with the lowest validation error before it proceeds to interaction terms. This selection helps prevent overfitting.
 
 #### monotonic_constraints_ignore_interactions (default = False)
 See ***monotonic_constraints*** in the ***fit*** method.
@@ -50,7 +50,7 @@ See ***monotonic_constraints*** in the ***fit*** method.
 If validation loss does not improve during the last ***early_stopping_rounds*** boosting steps then boosting is aborted. The point with this constructor parameter is to speed up the training and make it easier to select a high ***m***.
 
 #### num_first_steps_with_linear_effects_only (default = 0)
-Specifies the number of initial boosting steps that are reserved only for linear effects. 0 means that non-linear effects are allowed from the first boosting step. Reasons for setting this parameter to a higher value than 0 could be to 1) build a more interpretable model with more emphasis on linear effects or 2) build a linear only model by setting ***num_first_steps_with_linear_effects_only*** to no less than ***m***.
+Specifies the number of initial boosting steps that are reserved only for linear effects. 0 means that non-linear effects are allowed from the first boosting step. Reasons for setting this parameter to a higher value than 0 could be to 1) build a more interpretable model with more emphasis on linear effects or 2) build a linear only model by setting ***num_first_steps_with_linear_effects_only*** to no less than ***m***. Note that when this value is greater than zero, the algorithm selects the model from the boosting step with the lowest validation error before it proceeds to non-linear effects or interactions. This selection helps prevent overfitting.
 
 #### penalty_for_non_linearity (default = 0.0)
 Specifies a penalty in the range [0.0, 1.0] on terms that are not linear effects. A higher value increases model interpretability but can hurt predictiveness. Values outside of the [0.0, 1.0] range are rounded to the nearest boundary within the range.
diff --git a/API_REFERENCE_FOR_REGRESSION.md b/API_REFERENCE_FOR_REGRESSION.md
@@ -103,7 +103,7 @@ def calculate_custom_differentiate_predictions_wrt_linear_predictor(linear_predi
 ```
 
 #### boosting_steps_before_interactions_are_allowed (default = 0)
-Specifies how many boosting steps to wait before searching for interactions. If for example 800, then the algorithm will be forced to only fit main effects in the first 800 boosting steps, after which it is allowed to search for interactions (given that other hyperparameters that control interactions also allow this). The motivation for fitting main effects first may be 1) to get a cleaner looking model that puts more emphasis on main effects and 2) to speed up the algorithm since looking for interactions is computationally more demanding.
+Specifies how many boosting steps to wait before searching for interactions. If for example 800, then the algorithm will be forced to only fit main effects in the first 800 boosting steps, after which it is allowed to search for interactions (given that other hyperparameters that control interactions also allow this). The motivation for fitting main effects first may be 1) to get a cleaner looking model that puts more emphasis on main effects and 2) to speed up the algorithm since looking for interactions is computationally more demanding. Note that when this value is greater than zero, the algorithm selects the model from the boosting step with the lowest validation error before it proceeds to interaction terms. This selection helps prevent overfitting.
 
 #### monotonic_constraints_ignore_interactions (default = False)
 See ***monotonic_constraints*** in the ***fit*** method.
@@ -118,7 +118,7 @@ When ***loss_function*** equals ***group_mse_cycle*** then ***group_mse_cycle_mi
 If validation loss does not improve during the last ***early_stopping_rounds*** boosting steps then boosting is aborted. The point with this constructor parameter is to speed up the training and make it easier to select a high ***m***.
 
 #### num_first_steps_with_linear_effects_only (default = 0)
-Specifies the number of initial boosting steps that are reserved only for linear effects. 0 means that non-linear effects are allowed from the first boosting step. Reasons for setting this parameter to a higher value than 0 could be to 1) build a more interpretable model with more emphasis on linear effects or 2) build a linear only model by setting ***num_first_steps_with_linear_effects_only*** to no less than ***m***. 
+Specifies the number of initial boosting steps that are reserved only for linear effects. 0 means that non-linear effects are allowed from the first boosting step. Reasons for setting this parameter to a higher value than 0 could be to 1) build a more interpretable model with more emphasis on linear effects or 2) build a linear only model by setting ***num_first_steps_with_linear_effects_only*** to no less than ***m***. Note that when this value is greater than zero, the algorithm selects the model from the boosting step with the lowest validation error before it proceeds to non-linear effects or interactions. This selection helps prevent overfitting.
 
 #### penalty_for_non_linearity (default = 0.0)
 Specifies a penalty in the range [0.0, 1.0] on terms that are not linear effects. A higher value increases model interpretability but can hurt predictiveness. Values outside of the [0.0, 1.0] range are rounded to the nearest boundary within the range.
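
The documentation changes above say that, at the end of a restricted phase, the model from the boosting step with the lowest validation error is kept. The following is a toy Python illustration of that selection rule only. It is not APLR's implementation: the function name `truncate_to_best_step` and the error values are hypothetical, and the real logic lives in the C++ method `find_optimal_m_and_update_model_accordingly`, whose body is not part of this diff.

```python
def truncate_to_best_step(validation_errors):
    """Return (best_step, steps_kept) for a list of per-step validation errors.

    Hypothetical helper: the first step with the lowest validation error wins,
    and every boosting step after it is discarded.
    """
    if not validation_errors:
        raise ValueError("need at least one boosting step")
    best_step = min(range(len(validation_errors)), key=validation_errors.__getitem__)
    return best_step, best_step + 1


# Made-up validation errors for a restricted phase of six boosting steps.
phase_errors = [1.00, 0.80, 0.70, 0.72, 0.75, 0.90]
best_step, steps_kept = truncate_to_best_step(phase_errors)
print(best_step, steps_kept)
```

In this example the error starts rising after the third step, so later steps of the restricted phase would be dropped before less restricted terms are allowed.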
diff --git a/cpp/APLRRegressor.h b/cpp/APLRRegressor.h
@@ -76,10 +76,12 @@ class APLRRegressor
     double best_validation_error_so_far;
     size_t best_m_so_far;
     bool linear_effects_only_in_this_boosting_step;
+    bool non_linear_effects_allowed_in_this_boosting_step;
     bool max_terms_reached;
     bool round_robin_update_of_existing_terms;
     size_t term_to_update_in_this_boosting_step;
     size_t cores_to_use;
+    bool stopped_early;
 
     void validate_input_to_fit(const MatrixXd &X, const VectorXd &y, const VectorXd &sample_weight, const std::vector<std::string> &X_names,
                                const MatrixXi &cv_observations, const std::vector<size_t> &prioritized_predictors_indexes,
@@ -1171,11 +1173,26 @@ VectorXd APLRRegressor::differentiate_predictions_wrt_linear_predictor()
 
 void APLRRegressor::execute_boosting_steps(Eigen::Index fold_index)
 {
+    stopped_early = false;
     abort_boosting = false;
     for (size_t boosting_step = 0; boosting_step < m; ++boosting_step)
     {
         linear_effects_only_in_this_boosting_step = num_first_steps_with_linear_effects_only > boosting_step;
+        non_linear_effects_allowed_in_this_boosting_step = boosting_steps_before_interactions_are_allowed > boosting_step && !linear_effects_only_in_this_boosting_step;
+        bool last_linear_effects_only_step{linear_effects_only_in_this_boosting_step && boosting_step == num_first_steps_with_linear_effects_only - 1};
+        bool last_step_before_interactions{non_linear_effects_allowed_in_this_boosting_step && boosting_step == boosting_steps_before_interactions_are_allowed - 1};
         execute_boosting_step(boosting_step, fold_index);
+        if (stopped_early)
+        {
+            if (linear_effects_only_in_this_boosting_step)
+                boosting_step = std::min(num_first_steps_with_linear_effects_only - 1, m - 1);
+            else if (non_linear_effects_allowed_in_this_boosting_step)
+                boosting_step = std::min(boosting_steps_before_interactions_are_allowed - 1, m - 1);
+            best_m_so_far = boosting_step;
+            stopped_early = false;
+        }
+        else if ((last_linear_effects_only_step || last_step_before_interactions) && boosting_step + 1 < m)
+            find_optimal_m_and_update_model_accordingly();
         if (abort_boosting)
             break;
         if (loss_function == "group_mse_cycle")
@@ -1823,9 +1840,17 @@ void APLRRegressor::abort_boosting_when_no_validation_error_improvement_in_the_l
         bool no_improvement_for_too_long{boosting_step > best_m_so_far + early_stopping_rounds};
         if (no_improvement_for_too_long)
         {
-            abort_boosting = true;
-            if (verbosity >= 1)
-                std::cout << "Aborting boosting because of no validation error improvement in the last " << std::to_string(early_stopping_rounds) << " steps.\n";
+            if (linear_effects_only_in_this_boosting_step || non_linear_effects_allowed_in_this_boosting_step)
+            {
+                find_optimal_m_and_update_model_accordingly();
+                stopped_early = true;
+            }
+            else
+            {
+                abort_boosting = true;
+                if (verbosity >= 1)
+                    std::cout << "Aborting boosting because of no validation error improvement in the last " << std::to_string(early_stopping_rounds) << " steps.\n";
+            }
         }
     }
 }
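
The control flow added to `execute_boosting_steps` and the early-stopping check can be sketched in Python as below. This is a simplified reading of the hunks above, not the library's code: the function names and phase labels are hypothetical, the model update itself is omitted, and the sketch assumes that a step counts as restricted when either `linear_effects_only_in_this_boosting_step` or `non_linear_effects_allowed_in_this_boosting_step` is true in the C++ code.

```python
def restricted_phase(boosting_step, linear_only_steps, steps_before_interactions):
    """Classify a boosting step (hypothetical labels, mirroring the two C++ flags)."""
    if boosting_step < linear_only_steps:
        return "linear_only"
    if boosting_step < steps_before_interactions:
        return "no_interactions"
    return None  # unrestricted: interactions may be searched for


def is_last_step_of_restricted_phase(boosting_step, linear_only_steps, steps_before_interactions):
    """True on the step after which the model is cut back to its best validation step."""
    phase = restricted_phase(boosting_step, linear_only_steps, steps_before_interactions)
    if phase == "linear_only":
        return boosting_step == linear_only_steps - 1
    if phase == "no_interactions":
        return boosting_step == steps_before_interactions - 1
    return False


def step_after_stall(boosting_step, linear_only_steps, steps_before_interactions, m):
    """Where the loop counter is moved when validation error has stalled.

    In a restricted phase the counter jumps to the last step of that phase
    (capped at m - 1); outside a restricted phase boosting is aborted (None).
    """
    phase = restricted_phase(boosting_step, linear_only_steps, steps_before_interactions)
    if phase == "linear_only":
        return min(linear_only_steps - 1, m - 1)
    if phase == "no_interactions":
        return min(steps_before_interactions - 1, m - 1)
    return None


# Settings used by the tests in this commit: 80 linear-only steps, interactions after 90, m = 100.
print(restricted_phase(79, 80, 90), restricted_phase(80, 80, 90), restricted_phase(90, 80, 90))
print(step_after_stall(10, 80, 90, 100), step_after_stall(85, 80, 90, 100))
```

With these settings, a stall at step 10 skips the rest of the linear-only phase instead of ending training, which is the behaviour change this commit introduces.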
diff --git a/cpp/tests.cpp b/cpp/tests.cpp
@@ -236,8 +236,62 @@ class Tests
         model.ineligible_boosting_steps_added = 10;
         model.max_eligible_terms = 5;
         model.dispersion_parameter = 1.0;
-        model.boosting_steps_before_interactions_are_allowed = 60;
+        model.boosting_steps_before_interactions_are_allowed = 90;
+        model.num_first_steps_with_linear_effects_only = 80;
+
+        // Data
+        MatrixXd X_train{load_csv_into_eigen_matrix<MatrixXd>("data/X_train.csv")};
+        MatrixXd X_test{load_csv_into_eigen_matrix<MatrixXd>("data/X_test.csv")};
+        VectorXd y_train{load_csv_into_eigen_matrix<MatrixXd>("data/y_train.csv")};
+        VectorXd y_test{load_csv_into_eigen_matrix<MatrixXd>("data/y_test.csv")};
+
+        VectorXd sample_weight{VectorXd::Constant(y_train.size(), 1.0)};
+
+        MatrixXi cv_observations = MatrixXi::Constant(y_train.rows(), 2, 1);
+        cv_observations.col(0)[273] = -1;
+        cv_observations.col(0)[272] = -1;
+        cv_observations.col(0)[271] = -1;
+        cv_observations.col(0)[270] = -1;
+        cv_observations.col(0)[269] = -1;
+        cv_observations.col(0)[268] = -1;
+        cv_observations.col(0)[267] = -1;
+        cv_observations.col(0)[266] = -1;
+        cv_observations.col(1) = -cv_observations.col(0);
+
+        // Fitting
+        // model.fit(X_train,y_train);
+        model.fit(X_train, y_train, sample_weight);
+        // model.fit(X_train, y_train, sample_weight, {}, cv_observations);
+        std::cout << "feature importance\n"
+                  << model.feature_importance << "\n\n";
+
+        VectorXd predictions{model.predict(X_test)};
+
+        // Saving results
+        save_as_csv_file("data/output.csv", predictions);
+
+        std::cout << predictions.mean() << "\n\n";
+        tests.push_back(is_approximately_equal(predictions.mean(), 17.380763842227257));
+    }
+
+    void test_aplrregressor_cauchy_linear_effects_only_first_2()
+    {
+        // Model
+        APLRRegressor model{APLRRegressor()};
+        model.m = 100;
+        model.v = 1.0;
+        model.bins = 200;
+        model.n_jobs = 1;
+        model.loss_function = "cauchy";
+        model.verbosity = 3;
+        model.max_interaction_level = 100;
+        model.min_observations_in_split = 10;
+        model.ineligible_boosting_steps_added = 10;
+        model.max_eligible_terms = 5;
+        model.dispersion_parameter = 1.0;
+        model.boosting_steps_before_interactions_are_allowed = 90;
         model.num_first_steps_with_linear_effects_only = 80;
+        model.early_stopping_rounds = 1;
 
         // Data
         MatrixXd X_train{load_csv_into_eigen_matrix<MatrixXd>("data/X_train.csv")};
@@ -271,7 +325,7 @@ class Tests
         save_as_csv_file("data/output.csv", predictions);
 
         std::cout << predictions.mean() << "\n\n";
-        tests.push_back(is_approximately_equal(predictions.mean(), 17.965154984786622));
+        tests.push_back(is_approximately_equal(predictions.mean(), 17.886569073729863));
     }
 
     void test_aplrregressor_cauchy_group_mse_validation()
@@ -2354,6 +2408,7 @@ int main()
     tests.test_aplrregressor_cauchy_predictor_specific_penalties_and_learning_rates();
     tests.test_aplrregressor_cauchy_penalties();
     tests.test_aplrregressor_cauchy_linear_effects_only_first();
+    tests.test_aplrregressor_cauchy_linear_effects_only_first_2();
     tests.test_aplrregressor_cauchy_group_mse_validation();
     tests.test_aplrregressor_cauchy_group_mse_by_prediction_validation();
     tests.test_aplrregressor_cauchy();
diff --git a/documentation/APLR 10.7.2.pdf b/documentation/APLR 10.7.2.pdf
diff --git a/setup.py b/setup.py
@@ -27,7 +27,7 @@
 
 setuptools.setup(
     name="aplr",
-    version="10.7.1",
+    version="10.7.2",
     description="Automatic Piecewise Linear Regression",
     ext_modules=[sfc_module],
     author="Mathias von Ottenbreit",