Getting Started
===============

This page provides a starter example that introduces the ``rehline`` package and showcases its primary features.

To proceed, ensure that you have already installed ``rehline``:

.. code:: bash

    pip install rehline

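If the installation succeeded, the package should import without errors. As a quick, optional check (the ``__version__`` attribute is assumed to exist here):

.. code:: python

    # Optional sanity check: an error-free import means the install worked
    import rehline
    print(rehline.__version__)   # assumes the package exposes a __version__ attribute
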
--------------------------------

``rehline`` is a versatile solver for machine learning problems, particularly effective for Empirical Risk Minimization (ERM) with *non-smooth* objectives. We will use ERM as our running example to demonstrate the following:

.. admonition:: Note
   :class: tip

   With ``rehline``, you can easily swap in different *loss functions* and add *constraints* to your ERM with no tears!

Let's begin by generating a toy dataset and splitting it into training and test sets using scikit-learn's ``make_regression``.

.. code:: python

    # Import necessary libraries
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    np.random.seed(1024)
    # Generate toy data
    n, d = 1000, 5
    scaler = StandardScaler()
    X, y = make_regression(n_samples=n, n_features=d, noise=1.0)
    # Standardize X and append an intercept column
    X = scaler.fit_transform(X)
    X = np.hstack((X, np.ones((n, 1))))

    # Split data into training and test sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=50)

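The resulting design matrix has :math:`d + 1 = 6` columns: the five standardized features plus the appended intercept column.

.. code:: python

    # 950 training rows, 50 test rows, and 6 columns after adding the intercept
    print(X_train.shape, X_test.shape)   # (950, 6) (50, 6)
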
Quantile Regression
-------------------

Next, let's use ``rehline`` to fit a quantile regression (QR) at quantile level 0.95 (:math:`\kappa = 0.95`).

The ridge-regularized QR solves the following optimization problem:

.. math::

    \min_{\beta \in \mathbb{R}^{d}} \ C \sum_{i=1}^n \rho_\kappa ( y_i - x_i^\intercal \beta ) + \frac{1}{2} \| \beta \|^2,

where :math:`\rho_\kappa(u) = u \cdot (\kappa - \mathbf{1}(u < 0))` is the *check loss*, :math:`x_i \in \mathbb{R}^d` is a feature vector, and :math:`y_i \in \mathbb{R}` is the response variable.

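For intuition, the check loss can be written out directly from this definition. The small helper below is purely illustrative and not part of the ``rehline`` API:

.. code:: python

    # Check loss from the definition above: rho_kappa(u) = u * (kappa - 1(u < 0))
    def check_loss(u, kappa=0.95):
        return u * (kappa - (u < 0))

    print(check_loss(1.0))    # positive residual: 1.0 * 0.95 = 0.95
    print(check_loss(-1.0))   # negative residual: -1.0 * (0.95 - 1.0) = 0.05
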
Since the check loss is a piecewise linear-quadratic (PLQ) function, the problem can be solved with ``rehline.plqERM_Ridge``:

.. code:: python

    from rehline import plqERM_Ridge

    # Define a QR estimator at quantile level 0.95
    clf = plqERM_Ridge(loss={'name': 'QR', 'qt': 0.95}, C=1.0)
    clf.fit(X=X_train, y=y_train)
    # Make predictions
    q_predict = clf.decision_function(X_test)

    # Plot results
    import matplotlib.pyplot as plt
    plt.scatter(x=X_test[:, 0], y=y_test, label='y_true')
    plt.scatter(x=X_test[:, 0], y=q_predict, alpha=0.5, label='q_95')
    plt.legend(loc="upper left")
    plt.show()

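As a quick sanity check, the fraction of test responses falling below the predicted 0.95-quantile should be roughly 0.95:

.. code:: python

    # Empirical coverage of the fitted 0.95-quantile on the test set;
    # a value near 0.95 indicates a reasonable fit
    coverage = np.mean(y_test <= q_predict)
    print(f"Empirical coverage: {coverage:.2f}")
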
Huber Regression
----------------

If you prefer Huber regression, the Huber loss is also a PLQ function, so the same solver applies.

The ridge-regularized Huber minimization solves the following optimization problem:

.. math::

    \min_{\beta} \ C \sum_{i=1}^n H_\kappa( y_i - x_i^\intercal \beta ) + \frac{1}{2} \| \beta \|_2^2,

where :math:`H_\kappa(\cdot)` is the Huber loss, defined as:

.. math::

    H_\kappa(z) =
    \begin{cases}
    z^2/2, & |z| \leq \kappa, \\
    \kappa ( |z| - \kappa/2 ), & |z| > \kappa.
    \end{cases}

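For intuition, here is the Huber loss written directly from the piecewise definition above (an illustrative helper, not part of the ``rehline`` API):

.. code:: python

    # Huber loss from the piecewise definition above; illustrative only
    def huber_loss(z, kappa=0.5):
        z = np.abs(z)
        return np.where(z <= kappa, z**2 / 2, kappa * (z - kappa / 2))

    print(huber_loss(0.2))   # quadratic branch: 0.2**2 / 2 = 0.02
    print(huber_loss(2.0))   # linear branch: 0.5 * (2.0 - 0.25) = 0.875

Fitting the ridge-regularized Huber regression with ``rehline`` mirrors the QR example above; here the ``tau`` entry of the loss dictionary is taken to play the role of :math:`\kappa`:
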
.. code:: python

    from rehline import plqERM_Ridge

    # Define a Huber estimator
    clf = plqERM_Ridge(loss={'name': 'huber', 'tau': 0.5}, C=1.0)
    clf.fit(X=X_train, y=y_train)
    # Make predictions
    y_huber = clf.decision_function(X_test)

    # Plot results
    import matplotlib.pyplot as plt
    plt.scatter(x=X_test[:, 0], y=y_test, label='y_true')
    plt.scatter(x=X_test[:, 0], y=y_huber, alpha=0.5, label='y_huber')
    plt.legend(loc="upper left")
    plt.show()

Fairness Constraints
--------------------

Now suppose the fitted Huber regression must also satisfy a fairness constraint with respect to the first feature :math:`\mathbf{X}_{1}`: the correlation between the prediction :math:`\hat{Y}` and :math:`\mathbf{X}_{1}` must stay below ``tol_sen=0.1``. That is, we solve

.. math::

    \min_{\beta} \ C \sum_{i=1}^n H_\kappa( y_i - x_i^\intercal \beta ) + \frac{1}{2} \| \beta \|_2^2, \quad \text{s.t.} \quad \Big| \frac{1}{n} \sum_{i=1}^n z_i x_i^\intercal \beta \Big| \leq \rho,

where :math:`z_i` is the sensitive feature (here the first feature of :math:`x_i`) and :math:`\rho` is the tolerance.

With ``rehline``, you can easily add such a fairness constraint to your ERM:

.. code:: python

    from rehline import plqERM_Ridge
    from scipy.stats import pearsonr

    # Define a Huber estimator with a fairness constraint on the first feature
    clf = plqERM_Ridge(loss={'name': 'huber', 'tau': 0.5},
                       constraint=[{'name': 'fair', 'X_sen': X_train[:, 0], 'tol_sen': 0.1}],
                       C=1.0,
                       max_iter=10000)
    clf.fit(X=X_train, y=y_train)
    # Make predictions
    y_huber_fair = clf.decision_function(X_test)

    # Plot results
    import matplotlib.pyplot as plt
    plt.scatter(x=X_test[:, 0], y=y_test, label='y_true')
    plt.scatter(x=X_test[:, 0], y=y_huber, alpha=0.5, label='y_huber')
    plt.scatter(x=X_test[:, 0], y=y_huber_fair, alpha=0.5, label='y_huber_fair')
    plt.legend(loc="upper left")
    plt.show()

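As a rough check (only indicative, since the constraint is imposed on the training data), you can compare how strongly each fit correlates with the sensitive feature on the test set:

.. code:: python

    # Correlation between the sensitive feature and each set of predictions;
    # the constrained fit is expected to show a weaker correlation
    print(pearsonr(X_test[:, 0], y_huber))
    print(pearsonr(X_test[:, 0], y_huber_fair))
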
.. nblinkgallery::
    :caption: Related Examples
    :name: rst-link-gallery

    examples/QR.ipynb