@@ -15,8 +15,8 @@ associated with at least one data point from the original training set.
    :align: center
 
    A binary decision tree trained on the dataset :math:`X = \{ \mathbf{x}_1,
-   \ldots, \mathbf{x}_{10} \}`. Each example in the dataset is a 5-dimensional
-   vector of real-valued features labeled :math:`x_1, \ldots, x_5`. Unshaded
+   \ldots, \mathbf{x}_{10} \}`. Each example in the dataset is a 4-dimensional
+   vector of real-valued features labeled :math:`x_1, \ldots, x_4`. Unshaded
    circles correspond to internal decision nodes, while shaded circles
    correspond to leaf nodes. Each leaf node is associated with a subset of the
    examples in `X`, selected based on the decision rules along the path from
@@ -52,13 +52,13 @@ impurity after a particular split is
 .. math::
 
     \Delta \mathcal{L} = \mathcal{L}(\text{Parent}) -
-        P_{left} \mathcal{L}(\text{Left child}) -
-        (1 - P_{left})\mathcal{L}(\text{Right child})
+        P_{\text{left}} \mathcal{L}(\text{Left child}) -
+        (1 - P_{\text{left}})\mathcal{L}(\text{Right child})
 
 where :math:`\mathcal{L}(x)` is the impurity of the dataset at node `x`,
-and :math:`P_{left}`/:math:`P_{right}` are the proportion of examples at the
-current node that are partitioned into the left / right children, respectively,
-by the proposed split.
+and :math:`P_{\text{left}}`/:math:`P_{\text{right}}` are the proportion of
+examples at the current node that are partitioned into the left / right
+children, respectively, by the proposed split.
 
 .. _`Decision trees`: https://en.wikipedia.org/wiki/Decision_tree_learning
 
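The impurity-reduction formula in the hunk above can be sketched directly in NumPy. This is a minimal illustration, not the repository's implementation: the helper names `gini` and `impurity_reduction` are assumptions, and Gini impurity stands in for :math:`\mathcal{L}` (entropy or MSE would slot in the same way).

```python
import numpy as np

def gini(labels):
    """Gini impurity of a collection of class labels: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def impurity_reduction(parent, left, right):
    """Delta L = L(parent) - P_left * L(left) - (1 - P_left) * L(right)."""
    p_left = len(left) / len(parent)
    return gini(parent) - p_left * gini(left) - (1 - p_left) * gini(right)

# A split that perfectly separates a 50/50 mixed parent removes
# all of its impurity: Delta L = 0.5 - 0 - 0 = 0.5.
parent = np.array([0, 0, 0, 1, 1, 1])
left, right = parent[:3], parent[3:]
print(impurity_reduction(parent, left, right))  # 0.5
```

A tree learner evaluates this quantity for every candidate (feature, threshold) pair and greedily takes the split with the largest reduction.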
@@ -123,7 +123,7 @@ that proceeds by iteratively fitting a sequence of `m` weak learners such that:
 
 where `b` is a fixed initial estimate for the targets, :math:`\eta` is
 a learning rate parameter, and :math:`w_{i}` and :math:`g_{i}`
-denote the weights and predictions for ` i ` th learner.
+denote the weights and predictions of the :math:`i^{th}` learner.
 
 At each training iteration a new weak learner is fit to predict the negative
 gradient of the loss with respect to the previous prediction,
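The loop described above — fit a weak learner to the negative gradient, then take a small step :math:`\eta` in its direction — can be sketched with regression stumps under squared loss, where the negative gradient is simply the residual. This is a minimal NumPy sketch under those assumptions, not the repository's implementation; `fit_stump` and `gradient_boost` are illustrative names.

```python
import numpy as np

def fit_stump(X, r):
    """Fit a depth-1 regression tree (stump) to residuals r by exhaustive
    search over (feature, threshold) pairs, minimizing squared error."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            mask = X[:, j] <= t
            if mask.all() or not mask.any():
                continue  # degenerate split: one side empty
            left, right = r[mask].mean(), r[~mask].mean()
            err = np.sum((np.where(mask, left, right) - r) ** 2)
            if best is None or err < best[0]:
                best = (err, j, t, left, right)
    _, j, t, left, right = best
    return lambda Z, j=j, t=t, l=left, rr=right: np.where(Z[:, j] <= t, l, rr)

def gradient_boost(X, y, n_learners=50, eta=0.1):
    """Boosting under squared loss: each stump is fit to the negative
    gradient of the loss w.r.t. the current prediction (the residual),
    and added to the ensemble scaled by the learning rate eta."""
    b = y.mean()                     # fixed initial estimate for the targets
    pred = np.full(len(y), b)
    learners = []
    for _ in range(n_learners):
        g = fit_stump(X, y - pred)   # negative gradient of 0.5*(y - pred)^2
        pred += eta * g(X)
        learners.append(g)
    return b, learners, pred

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 1))
y = X[:, 0] ** 2
b, learners, pred = gradient_boost(X, y)
print(np.mean((pred - y) ** 2))  # training MSE, well below the variance of y
```

Other losses change only what "residual" means: the stump is always fit to the negative gradient of the loss at the current prediction, which is what makes the procedure gradient descent in function space.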