
Commit 1e6a81f

StefanieSenger, virchan, and thomasjpfan authored
DOC fix link in HuberRegressor docstring (scikit-learn#30417)
Co-authored-by: Virgil Chan <[email protected]>
Co-authored-by: Thomas J. Fan <[email protected]>
1 parent f77ff4e commit 1e6a81f

6 files changed: 21 additions, 21 deletions

doc/modules/linear_model.rst

Lines changed: 10 additions & 10 deletions
```diff
@@ -1585,10 +1585,10 @@ better than an ordinary least squares in high dimension.
 Huber Regression
 ----------------
 
-The :class:`HuberRegressor` is different to :class:`Ridge` because it applies a
-linear loss to samples that are classified as outliers.
+The :class:`HuberRegressor` is different from :class:`Ridge` because it applies a
+linear loss to samples that are defined as outliers by the `epsilon` parameter.
 A sample is classified as an inlier if the absolute error of that sample is
-lesser than a certain threshold. It differs from :class:`TheilSenRegressor`
+lesser than the threshold `epsilon`. It differs from :class:`TheilSenRegressor`
 and :class:`RANSACRegressor` because it does not ignore the effect of the outliers
 but gives a lesser weight to them.
 
@@ -1603,13 +1603,13 @@ but gives a lesser weight to them.
 
 .. dropdown:: Mathematical details
 
-  The loss function that :class:`HuberRegressor` minimizes is given by
+  :class:`HuberRegressor` minimizes
 
   .. math::
 
    \min_{w, \sigma} {\sum_{i=1}^n\left(\sigma + H_{\epsilon}\left(\frac{X_{i}w - y_{i}}{\sigma}\right)\sigma\right) + \alpha {||w||_2}^2}
 
-  where
+  where the loss function is given by
 
   .. math::
 
@@ -1624,7 +1624,7 @@ but gives a lesser weight to them.
 .. rubric:: References
 
 * Peter J. Huber, Elvezio M. Ronchetti: Robust Statistics, Concomitant scale
-  estimates, pg 172
+  estimates, p. 172.
 
 The :class:`HuberRegressor` differs from using :class:`SGDRegressor` with loss set to `huber`
 in the following ways.
@@ -1638,10 +1638,10 @@ in the following ways.
 samples while :class:`SGDRegressor` needs a number of passes on the training data to
 produce the same robustness.
 
-Note that this estimator is different from the R implementation of Robust Regression
-(https://stats.oarc.ucla.edu/r/dae/robust-regression/) because the R implementation does a weighted least
-squares implementation with weights given to each sample on the basis of how much the residual is
-greater than a certain threshold.
+Note that this estimator is different from the `R implementation of Robust
+Regression <https://stats.oarc.ucla.edu/r/dae/robust-regression/>`_ because the R
+implementation does a weighted least squares implementation with weights given to each
+sample on the basis of how much the residual is greater than a certain threshold.
 
 .. _quantile_regression:
 
```
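To ground the rewording, here is a minimal sketch of the behaviour the passage now describes; it is not part of the commit, and the data and parameter values are invented for illustration:

```python
# Samples whose scaled absolute error exceeds `epsilon` get a linear rather
# than quadratic loss, so HuberRegressor is pulled less by outliers than Ridge.
import numpy as np
from sklearn.linear_model import HuberRegressor, Ridge

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(50, 1))
y = 2.0 * X.ravel() + rng.normal(scale=0.3, size=50)
y[:5] += 20.0  # a few gross outliers

huber = HuberRegressor(epsilon=1.35).fit(X, y)  # 1.35 is the default threshold
ridge = Ridge(alpha=1.0).fit(X, y)

# Huber's slope stays near 2, while Ridge is dragged upward by the outliers.
print(huber.coef_, ridge.coef_)
# The boolean mask `outliers_` marks the samples handled with the linear loss.
print(huber.outliers_.sum(), "samples treated as outliers")
```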

doc/modules/model_evaluation.rst

Lines changed: 1 addition & 1 deletion
```diff
@@ -2543,7 +2543,7 @@ Here is a small example of usage of the :func:`mean_absolute_error` function::
 Mean squared error
 -------------------
 
-The :func:`mean_squared_error` function computes `mean square
+The :func:`mean_squared_error` function computes `mean squared
 error <https://en.wikipedia.org/wiki/Mean_squared_error>`_, a risk
 metric corresponding to the expected value of the squared (quadratic) error or
 loss.
```
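For reference, the corrected term in use; the values are illustrative:

```python
# Mean squared error is the average of the squared residuals.
from sklearn.metrics import mean_squared_error

y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
# (0.5**2 + 0.5**2 + 0.0**2 + 1.0**2) / 4 = 0.375
print(mean_squared_error(y_true, y_pred))  # 0.375
```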

examples/linear_model/plot_robust_fit.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -5,7 +5,7 @@
 Here a sine function is fit with a polynomial of order 3, for values
 close to zero.
 
-Robust fitting is demoed in different situations:
+Robust fitting is demonstrated in different situations:
 
 - No measurement errors, only modelling errors (fitting a sine with a
   polynomial)
```
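A condensed sketch of the setup the docstring describes; the data, noise level, and the choice of HuberRegressor are stand-ins, while the real example compares several robust estimators and plots the fits:

```python
# Fit a degree-3 polynomial to noisy sine data with a robust estimator.
import numpy as np
from sklearn.linear_model import HuberRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(42)
X = rng.normal(size=(40, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=40)
y[-4:] = -3.0  # corrupt a few targets to simulate measurement outliers

model = make_pipeline(PolynomialFeatures(degree=3), HuberRegressor())
model.fit(X, y)
print(model.score(X, y))  # R^2 on the (contaminated) training data
```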

examples/model_selection/plot_roc.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -159,7 +159,7 @@
 # %%
 # In a multi-class classification setup with highly imbalanced classes,
 # micro-averaging is preferable over macro-averaging. In such cases, one can
-# alternatively use a weighted macro-averaging, not demoed here.
+# alternatively use a weighted macro-averaging, not demonstrated here.
 
 display = RocCurveDisplay.from_predictions(
     y_onehot_test.ravel(),
```
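For context, a sketch of the three averaging modes the comment contrasts. The dataset and classifier are invented stand-ins; the micro-average is computed by pooling binarized (sample, class) pairs, the same trick the example itself uses:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Imbalanced three-class problem.
X, y = make_classification(
    n_samples=1000, n_classes=3, n_informative=6,
    weights=[0.8, 0.15, 0.05], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
y_score = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)
y_onehot = label_binarize(y_test, classes=[0, 1, 2])

# Micro-average: pool all (sample, class) pairs; frequent classes dominate.
print("micro   ", roc_auc_score(y_onehot.ravel(), y_score.ravel()))
# Macro-average: unweighted mean of per-class AUCs.
print("macro   ", roc_auc_score(y_test, y_score, multi_class="ovr", average="macro"))
# Weighted macro-average: per-class AUCs weighted by class prevalence.
print("weighted", roc_auc_score(y_test, y_score, multi_class="ovr", average="weighted"))
```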

examples/preprocessing/plot_scaling_importance.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -12,13 +12,13 @@
 algorithms require features to be normalized, often for different reasons: to
 ease the convergence (such as a non-penalized logistic regression), to create a
 completely different model fit compared to the fit with unscaled data (such as
-KNeighbors models). The latter is demoed on the first part of the present
+KNeighbors models). The latter is demonstrated on the first part of the present
 example.
 
 On the second part of the example we show how Principal Component Analysis (PCA)
 is impacted by normalization of features. To illustrate this, we compare the
 principal components found using :class:`~sklearn.decomposition.PCA` on unscaled
-data with those obatined when using a
+data with those obtained when using a
 :class:`~sklearn.preprocessing.StandardScaler` to scale data first.
 
 In the last part of the example we show the effect of the normalization on the
```
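A minimal sketch of the comparison the text sets up, using the wine data as a stand-in; the assumption is that any dataset with wildly different feature scales shows the same effect:

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, _ = load_wine(return_X_y=True)

# Without scaling, the leading component is dominated by whichever feature
# has the largest raw variance; with scaling, variance is spread out.
pca_raw = PCA(n_components=2).fit(X)
pca_scaled = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X)

print(pca_raw.explained_variance_ratio_)
print(pca_scaled[-1].explained_variance_ratio_)  # the PCA step of the pipeline
```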

sklearn/linear_model/_huber.py

Lines changed: 6 additions & 6 deletions
```diff
@@ -132,10 +132,10 @@ class HuberRegressor(LinearModel, RegressorMixin, BaseEstimator):
     ``|(y - Xw - c) / sigma| < epsilon`` and the absolute loss for the samples
     where ``|(y - Xw - c) / sigma| > epsilon``, where the model coefficients
     ``w``, the intercept ``c`` and the scale ``sigma`` are parameters
-    to be optimized. The parameter sigma makes sure that if y is scaled up
-    or down by a certain factor, one does not need to rescale epsilon to
+    to be optimized. The parameter `sigma` makes sure that if `y` is scaled up
+    or down by a certain factor, one does not need to rescale `epsilon` to
     achieve the same robustness. Note that this does not take into account
-    the fact that the different features of X may be of different scales.
+    the fact that the different features of `X` may be of different scales.
 
     The Huber loss function has the advantage of not being heavily influenced
     by the outliers while not completely ignoring their effect.
@@ -219,9 +219,9 @@ class HuberRegressor(LinearModel, RegressorMixin, BaseEstimator):
     References
     ----------
     .. [1] Peter J. Huber, Elvezio M. Ronchetti, Robust Statistics
-           Concomitant scale estimates, pg 172
-    .. [2] Art B. Owen (2006), A robust hybrid of lasso and ridge regression.
-           https://statweb.stanford.edu/~owen/reports/hhu.pdf
+           Concomitant scale estimates, p. 172
+    .. [2] Art B. Owen (2006), `A robust hybrid of lasso and ridge regression.
+           <https://artowen.su.domains/reports/hhu.pdf>`_
 
     Examples
     --------
```