stan-dev
diff --git a/src/functions-reference/binary_distributions.Rmd b/src/functions-reference/binary_distributions.Rmd
Lines changed: 58 additions & 13 deletions
diff --git a/src/functions-reference/bounded_discrete_distributions.Rmd b/src/functions-reference/bounded_discrete_distributions.Rmd
Lines changed: 151 additions & 5 deletions
@@ -7,7 +7,7 @@ represents the value true and 0 the value false.
 if (knitr::is_html_output()) {
 cat(' * <a href="bernoulli-distribution.html">Bernoulli Distribution</a>\n')
 cat(' * <a href="bernoulli-logit-distribution.html">Bernoulli Distribution, Logit Parameterization</a>\n')
-cat(' * <a href="bernoulli-logit-glm.html">Bernoulli-Logit Generalised Linear Model (Logistic Regression)</a>\n')
+cat(' * <a href="bernoulli-logit-glm.html">Bernoulli-Logit Generalized Linear Model (Logistic Regression)</a>\n')
 }
 ```
 
@@ -110,11 +110,11 @@ $\text{logit}^{-1}(\alpha)$; may only be used in transformed data and generated
 quantities blocks. For a description of argument and return types, see section
 [vectorized PRNG functions](#prng-vectorization).
 
-## Bernoulli-Logit Generalised Linear Model (Logistic Regression) {#bernoulli-logit-glm}
+## Bernoulli-Logit Generalized Linear Model (Logistic Regression) {#bernoulli-logit-glm}
 
-Stan also supplies a single primitive for a Generalised Linear Model
-with Bernoulli likelihood and logit link function, i.e. a primitive
-for a logistic regression. This should provide a more efficient
+Stan also supplies a single function for a generalized linear model
+with Bernoulli likelihood and logit link function, i.e. a function
+for a logistic regression. This provides a more efficient
 implementation of logistic regression than a manually written
 regression in terms of a Bernoulli likelihood and matrix
 multiplication.
@@ -142,24 +142,69 @@ dropping constant additive terms.
 
 ### Stan Functions
 
+<!-- real; bernoulli_logit_glm_lpmf; (int y | matrix x, real alpha, vector beta); -->
+\index{{\tt \bfseries bernoulli\_logit\_glm\_lpmf  }!{\tt (int y \textbar\ matrix x, real alpha, vector beta): real}|hyperpage}
+
+`real` **`bernoulli_logit_glm_lpmf`**`(int y | matrix x, real alpha, vector beta)`<br>\newline
+The log Bernoulli probability mass of y given chance of success
+`inv_logit(alpha + x * beta)`, where the same intercept `alpha` and dependent variable value `y` are used
+for all observations. The number of columns of `x` needs to match the size of the
+coefficient vector `beta`.
+If `x` and `y` are data (not parameters) this function can be executed on a GPU.
+
+<!-- real; bernoulli_logit_glm_lpmf; (int y | matrix x, vector alpha, vector beta); -->
+\index{{\tt \bfseries bernoulli\_logit\_glm\_lpmf  }!{\tt (int y \textbar\ matrix x, vector alpha, vector beta): real}|hyperpage}
+
+`real` **`bernoulli_logit_glm_lpmf`**`(int y | matrix x, vector alpha, vector beta)`<br>\newline
+The log Bernoulli probability mass of y given chance of success
+`inv_logit(alpha + x * beta)`, where an intercept `alpha` is used that is
+allowed to vary by observation. The dependent variable
+value `y` is used for all observations. 
+The number of rows of `x` must match the size of `alpha` and 
+the number of columns of `x` needs to match the size of the coefficient vector `beta`.
+If `x` and `y` are data (not parameters) this function can be executed on a GPU.
+
+<!-- real; bernoulli_logit_glm_lpmf; (int[] y | row_vector x, real alpha, vector beta); -->
+\index{{\tt \bfseries bernoulli\_logit\_glm\_lpmf  }!{\tt (int[] y \textbar\ row\_vector x, real alpha, vector beta): real}|hyperpage}
+
+`real` **`bernoulli_logit_glm_lpmf`**`(int[] y | row_vector x, real alpha, vector beta)`<br>\newline
+The log Bernoulli probability mass of y given chance of success
+`inv_logit(alpha + x * beta)`, where the same intercept `alpha` and the
+same independent variable values `x` are used for all observations.
+The number of columns of `x` needs to match the size of the coefficient vector `beta`.
+
+<!-- real; bernoulli_logit_glm_lpmf; (int[] y | row_vector x, vector alpha, vector beta); -->
+\index{{\tt \bfseries bernoulli\_logit\_glm\_lpmf  }!{\tt (int[] y \textbar\ row\_vector x, vector alpha, vector beta): real}|hyperpage}
+
+`real` **`bernoulli_logit_glm_lpmf`**`(int[] y | row_vector x, vector alpha, vector beta)`<br>\newline
+The log Bernoulli probability mass of y given chance of success
+`inv_logit(alpha + x * beta)`, where an intercept `alpha` is used that is
+allowed to vary by observation.
+The same independent variable values `x` are used for all observations.
+The size of `y` must match the size of `alpha` and 
+the number of columns of `x` needs to match the size of the coefficient vector `beta`.
+
+
 <!-- real; bernoulli_logit_glm_lpmf; (int[] y | matrix x, real alpha, vector beta); -->
 \index{{\tt \bfseries bernoulli\_logit\_glm\_lpmf  }!{\tt (int[] y \textbar\ matrix x, real alpha, vector beta): real}|hyperpage}
 
 `real` **`bernoulli_logit_glm_lpmf`**`(int[] y | matrix x, real alpha, vector beta)`<br>\newline
 The log Bernoulli probability mass of y given chance of success
-`inv_logit(alpha+x*beta)`, where a constant intercept `alpha` is used
+`inv_logit(alpha + x * beta)`, where the same intercept `alpha` is used
 for all observations. The number of rows of the independent variable
-matrix `x` needs to match the length of the dependent variable vector
-`y` and the number of columns of `x` needs to match the length of the
-weight vector `beta`.
+matrix `x` needs to match the size of the dependent variable vector
+`y` and the number of columns of `x` needs to match the size of the
+coefficient vector `beta`.
+If `x` and `y` are data (not parameters) this function can be executed on a GPU.
 
 <!-- real; bernoulli_logit_glm_lpmf; (int[] y | matrix x, vector alpha, vector beta); -->
 \index{{\tt \bfseries bernoulli\_logit\_glm\_lpmf  }!{\tt (int[] y \textbar\ matrix x, vector alpha, vector beta): real}|hyperpage}
 
 `real` **`bernoulli_logit_glm_lpmf`**`(int[] y | matrix x, vector alpha, vector beta)`<br>\newline
 The log Bernoulli probability mass of y given chance of success
-`inv_logit(alpha+x*beta)`, where an intercept `alpha` is used that is
-allowed to vary with the different observations. The number of rows of
-the independent variable matrix `x` needs to match the length of the
+`inv_logit(alpha + x * beta)`, where an intercept `alpha` is used that is
+allowed to vary by observation. The number of rows of
+the independent variable matrix `x` needs to match the size of the
 dependent variable vector `y` and `alpha` and the number of columns of
-`x` needs to match the length of the weight vector `beta`.
+`x` needs to match the size of the coefficient vector `beta`.
+If `x` and `y` are data (not parameters) this function can be executed on a GPU.
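As a minimal sketch of how the `int[] y | matrix x, real alpha, vector beta` signature might be used, the program below fits a logistic regression with the GLM function and shows the manually written form as a comment. The data names `N` and `K` and the priors are assumptions made for this example, not part of the documented interface. Array declarations use the `y[N]` syntax that matches the `int[]` notation in this reference; newer Stan releases write `array[N] int y` instead.

```stan
data {
  int<lower=0> N;                // number of observations (assumed name)
  int<lower=0> K;                // number of predictors (assumed name)
  matrix[N, K] x;                // independent variables, one row per observation
  int<lower=0, upper=1> y[N];    // binary dependent variable
}
parameters {
  real alpha;                    // intercept shared by all observations
  vector[K] beta;                // coefficient vector, size matches columns of x
}
model {
  // Illustrative priors only; choose priors suited to the problem.
  alpha ~ normal(0, 5);
  beta ~ normal(0, 2.5);

  // GLM form: a single call covering all N observations.
  y ~ bernoulli_logit_glm(x, alpha, beta);

  // Manually written form with the same likelihood:
  // y ~ bernoulli_logit(alpha + x * beta);
}
```

Declaring `alpha` as `vector[N]` instead would select the signature in which the intercept varies by observation.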
@@ -10,7 +10,9 @@ cat(' * <a href="binomial-distribution-logit-parameterization.html">Binomial Dis
 cat(' * <a href="beta-binomial-distribution.html">Beta-Binomial Distribution</a>\n')
 cat(' * <a href="hypergeometric-distribution.html">Hypergeometric Distribution</a>\n')
 cat(' * <a href="categorical-distribution.html">Categorical Distribution</a>\n')
+cat(' * <a href="categorical-logit-glm.html">Categorical Logit Generalized Linear Model (Softmax Regression)</a>\n')
 cat(' * <a href="ordered-logistic-distribution.html">Ordered Logistic Distribution</a>\n')
+cat(' * <a href="ordered-logistic-glm.html">Ordered Logistic Generalized Linear Model (Ordinal Regression)</a>\n')
 cat(' * <a href="ordered-probit-distribution.html">Ordered Probit Distribution</a>\n')
 }
 ```
@@ -238,8 +240,8 @@ an $N$-simplex (i.e., has nonnegative entries summing to one), then
 for $y \in \{1,\ldots,N\}$, \[ \text{Categorical}(y~|~\theta) =
 \theta_y. \] In addition, Stan provides a log-odds scaled categorical
 distribution, \[ \text{CategoricalLogit}(y~|~\beta) =
-\text{Categorical}(y~|~\text{softmax}(\beta)). \] See section
-[softmax](#softmax) for the definition of the softmax function.
+\text{Categorical}(y~|~\text{softmax}(\beta)). \] 
+See [softmax](#softmax) for the definition of the softmax function.
 
 ### Sampling Statement
 
@@ -296,6 +298,83 @@ Generate a categorical variate with outcome in range $1:N$ from
 log-odds vector beta; may only be used in transformed data and generated
 quantities blocks
 
+## Categorical Logit Generalized Linear Model (Softmax Regression) {#categorical-logit-glm}
+
+Stan also supplies a single function for a generalized linear model
+with categorical likelihood and logit link function, i.e. a function
+for a softmax regression. This provides a more efficient
+implementation of softmax regression than a manually written
+regression in terms of a Categorical likelihood and matrix
+multiplication.
+
+Note that the implementation does not put any restrictions on the coefficient matrix $\beta$. It is up to the user to use a reference category, a suitable prior or some other means of identifiability. See Multi-logit in the [Stan User's Guide](https://mc-stan.org/users/documentation/).
+
+### Probability Mass Functions
+
+If $N,M,K \in \mathbb{N}$, $N,M,K > 0$, and if $x\in \mathbb{R}^{M\times K}, \alpha \in \mathbb{R}^N, \beta\in \mathbb{R}^{K\times N}$, then for $y \in \{1,\ldots,N\}^M$,
+\[ \text{CategoricalLogitGLM}(y~|~x,\alpha,\beta) = \\[5pt]
+\prod_{1\leq i \leq M}\text{CategoricalLogit}(y_i~|~\alpha+x_i\cdot\beta) = \\[15pt]
+\prod_{1\leq i \leq M}\text{Categorical}(y_i~|~\text{softmax}(\alpha+x_i\cdot\beta)). \]
+Here $x_i$ denotes row $i$ of $x$. See [softmax](#softmax) for the definition of the softmax function.
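As a small worked example (the values are chosen for illustration and do not come from the source), take $N = 3$ categories and suppose that for some observation $i$ the log-odds vector is $\alpha + x_i\cdot\beta = (0, \log 2, \log 3)$. Then
\[ \text{softmax}(\alpha + x_i\cdot\beta) = \frac{(1, 2, 3)}{1 + 2 + 3} = \left(\tfrac{1}{6}, \tfrac{1}{3}, \tfrac{1}{2}\right), \]
so that observation contributes a factor of $\tfrac{1}{2}$ to the product if $y_i = 3$ and a factor of $\tfrac{1}{6}$ if $y_i = 1$.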
+
+### Sampling Statement
+
+`y ~ ` **`categorical_logit_glm`**`(x, alpha, beta)`
+
+Increment target log probability density with `categorical_logit_glm_lpmf(y | x, alpha, beta)`
+dropping constant additive terms.
+<!-- real; categorical_logit_glm ~; -->
+\index{{\tt \bfseries categorical\_logit\_glm }!sampling statement|hyperpage}
+
+
+### Stan Functions
+
+<!-- real; categorical_logit_glm_lpmf; (int y | row_vector x, vector alpha, matrix beta); -->
+\index{{\tt \bfseries categorical\_logit\_glm\_lpmf }!{\tt (int y \textbar\ row\_vector x, vector alpha, matrix beta): real}|hyperpage}
+
+`real` **`categorical_logit_glm_lpmf`**`(int y | row_vector x, vector alpha, matrix beta)`<br>\newline
+The log categorical probability mass function with outcome `y` in
+$1:N$ given $N$-vector of log-odds of outcomes `alpha + x * beta`.
+The size of the independent variable row vector `x` needs to match the number of rows of the
+coefficient matrix `beta`. The size of the intercept vector `alpha` must match the number 
+of columns of the coefficient matrix `beta`.
+
+<!-- real; categorical_logit_glm_lpmf; (int y | matrix x, vector alpha, matrix beta); -->
+\index{{\tt \bfseries categorical\_logit\_glm\_lpmf }!{\tt (int y \textbar\ matrix x, vector alpha, matrix beta): real}|hyperpage}
+
+`real` **`categorical_logit_glm_lpmf`**`(int y | matrix x, vector alpha, matrix beta)`<br>\newline
+The log categorical probability mass function with outcome `y` in
+$1:N$ given $N$-vector of log-odds of outcomes `alpha + x * beta`.
+The same vector of intercepts `alpha` and the same dependent variable value `y` are used for all instances.
+The number of columns of the independent variable `x` needs to match the number of rows of the
+coefficient matrix `beta`. The size of the intercept vector `alpha` must match the number 
+of columns of the coefficient matrix `beta`. If `x` and `y` are data (not parameters) this function can be executed on a GPU.
+
+<!-- real; categorical_logit_glm_lpmf; (int[] y | row_vector x, vector alpha, matrix beta); -->
+\index{{\tt \bfseries categorical\_logit\_glm\_lpmf }!{\tt (int[] y \textbar\ row\_vector x, vector alpha, matrix beta): real}|hyperpage}
+
+`real` **`categorical_logit_glm_lpmf`**`(int[] y | row_vector x, vector alpha, matrix beta)`<br>\newline
+The log categorical probability mass function with outcomes `y` in
+$1:N$ given $N$-vector of log-odds of outcomes `alpha + x * beta`.
+The same vector of intercepts `alpha` and the same row vector of independent variables `x` are used for all instances.
+The size of the independent variable row vector `x` needs to match the number of rows of the
+coefficient matrix `beta`. The size of the intercept vector `alpha` must match the number
+of columns of the coefficient matrix `beta`.
+
+<!-- real; categorical_logit_glm_lpmf; (int[] y | matrix x, vector alpha, matrix beta); -->
+\index{{\tt \bfseries categorical\_logit\_glm\_lpmf }!{\tt (int[] y \textbar\ matrix x, vector alpha, matrix beta): real}|hyperpage}
+
+`real` **`categorical_logit_glm_lpmf`**`(int[] y | matrix x, vector alpha, matrix beta)`<br>\newline
+The log categorical probability mass function with outcomes `y` in
+$1:N$ given $N$-vector of log-odds of outcomes `alpha + x * beta`. 
+The same vector of intercepts `alpha` is used for all instances.
+The number of rows of the independent variable
+matrix `x` needs to match the size of the dependent variable vector
+`y`. The number of columns of the independent variable matrix `x` needs to match the number of rows of the
+coefficient matrix `beta`. The size of the intercept vector `alpha` must match the number 
+of columns of the coefficient matrix `beta`. If `x` and `y` are data (not parameters) this function can be executed on a GPU.
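The following sketch shows one way the `int[] y | matrix x, vector alpha, matrix beta` signature could be used for a softmax regression. The data names follow the symbols in the probability mass function above ($N$ categories, $M$ observations, $K$ predictors); the priors are an assumption for this example and stand in for whatever identifying restriction the user prefers, since the function itself places no restriction on `beta`. Array declarations use the `y[M]` syntax that matches the `int[]` notation in this reference.

```stan
data {
  int<lower=1> N;                // number of categories
  int<lower=1> M;                // number of observations
  int<lower=1> K;                // number of predictors
  matrix[M, K] x;                // independent variables, one row per observation
  int<lower=1, upper=N> y[M];    // observed category for each observation
}
parameters {
  vector[N] alpha;               // one intercept per category
  matrix[K, N] beta;             // rows match columns of x, columns match size of alpha
}
model {
  // Illustrative priors only; they are one possible way to obtain
  // identifiability, not a requirement of the function.
  alpha ~ normal(0, 5);
  to_vector(beta) ~ normal(0, 5);

  // GLM form: a single call covering all M observations.
  y ~ categorical_logit_glm(x, alpha, beta);

  // Manually written form with the same likelihood:
  // for (i in 1:M)
  //   y[i] ~ categorical_logit(alpha + (x[i] * beta)');
}
```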
+
+
 ## Ordered Logistic Distribution
 
 ### Probability Mass Function
@@ -330,14 +409,81 @@ dropping constant additive terms.
 
 `real` **`ordered_logistic_lpmf`**`(ints k | vector eta, vectors c)`<br>\newline
 The log ordered logistic probability mass of k given linear predictors
-eta, and cutpoints c.
+`eta`, and cutpoints `c`.
 
 <!-- int; ordered_logistic_rng; (real eta, vector c); -->
 \index{{\tt \bfseries ordered\_logistic\_rng }!{\tt (real eta, vector c): int}|hyperpage}
 
 `int` **`ordered_logistic_rng`**`(real eta, vector c)`<br>\newline
-Generate an ordered logistic variate with linear predictor eta and
-cutpoints c; may only be used in transformed data and generated quantities blocks
+Generate an ordered logistic variate with linear predictor `eta` and
+cutpoints `c`; may only be used in transformed data and generated quantities blocks
+
+## Ordered Logistic Generalized Linear Model (Ordinal Regression) {#ordered-logistic-glm}
+
+### Probability Mass Function
+
+If $N,M,K \in \mathbb{N}$ with $N, M > 0$, $K > 2$, $c \in \mathbb{R}^{K-1}$ such that
+$c_k < c_{k+1}$ for $k \in \{1,\ldots,K-2\}$, and $x\in \mathbb{R}^{N\times M}, \beta\in \mathbb{R}^M$, then for $y \in \{1,\ldots,K\}^N$,
+\[\text{OrderedLogisticGLM}(y~|~x,\beta,c) = \\[4pt]
+\prod_{1\leq i \leq N}\text{OrderedLogistic}(y_i~|~x_i\cdot \beta,c) = \\[17pt]
+\prod_{1\leq i \leq N}\left\{ \begin{array}{ll}
+1 - \text{logit}^{-1}(x_i\cdot \beta - c_1)  &  \text{if } y_i = 1, \\[4pt]
+\text{logit}^{-1}(x_i\cdot \beta - c_{y_i-1}) - \text{logit}^{-1}(x_i\cdot \beta - c_{y_i}) & \text{if } 1 < y_i < K, \text{and} \\[4pt]
+\text{logit}^{-1}(x_i\cdot \beta - c_{K-1}) - 0  &  \text{if } y_i = K.
+\end{array} \right. \] The $y_i=K$
+case is written with the redundant subtraction of zero to illustrate
+the parallelism of the cases; the $y_i=1$ and $y_i=K$ edge cases can be
+subsumed into the general definition by setting $c_0 = -\infty$ and
+$c_K = +\infty$ with $\text{logit}^{-1}(-\infty) = 0$ and
+$\text{logit}^{-1}(\infty) = 1$.
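As an illustrative check of the three cases (the numbers are made up for this example, not taken from the source), take $K = 3$, cutpoints $c = (-1, 1)$, and a single observation whose linear predictor is $x_i\cdot\beta = 0$. With $\text{logit}^{-1}(1) \approx 0.731$ and $\text{logit}^{-1}(-1) \approx 0.269$, the factor for each possible outcome is
\[ \begin{array}{ll}
1 - \text{logit}^{-1}(0 - (-1)) \approx 0.269 & \text{if the outcome is } 1, \\[4pt]
\text{logit}^{-1}(0 - (-1)) - \text{logit}^{-1}(0 - 1) \approx 0.462 & \text{if the outcome is } 2, \\[4pt]
\text{logit}^{-1}(0 - 1) - 0 \approx 0.269 & \text{if the outcome is } 3.
\end{array} \]
The three values sum to one, as expected for a probability mass function over $\{1, 2, 3\}$.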
+
+### Sampling Statement
+
+`y ~ ` **`ordered_logistic_glm`**`(x, beta, c)`
+
+Increment target log probability density with `ordered_logistic_glm_lpmf(y | x, beta, c)`
+dropping constant additive terms.
+<!-- real; ordered_logistic_glm ~; -->
+\index{{\tt \bfseries ordered\_logistic\_glm }!sampling statement|hyperpage}
+
+### Stan Functions
+
+<!-- real; ordered_logistic_glm_lpmf; (int y | row_vector x, vector beta, vector c); -->
+\index{{\tt \bfseries ordered\_logistic\_glm\_lpmf }!{\tt (int y \textbar\ row\_vector x, vector beta, vector c): real}|hyperpage}
+
+`real` **`ordered_logistic_glm_lpmf`**`(int y | row_vector x, vector beta, vector c)`<br>\newline
+The log ordered logistic probability mass of y, given linear predictors `x * beta`, and cutpoints `c`.
+The size of the independent variable row vector `x` needs to match the size of the coefficient vector `beta`. 
+The cutpoints `c` must be ordered.
+
+<!-- real; ordered_logistic_glm_lpmf; (int y | matrix x, vector beta, vector c); -->
+\index{{\tt \bfseries ordered\_logistic\_glm\_lpmf }!{\tt (int y \textbar\ matrix x, vector beta, vector c): real}|hyperpage}
+
+`real` **`ordered_logistic_glm_lpmf`**`(int y | matrix x, vector beta, vector c)`<br>\newline
+The log ordered logistic probability mass of y, given linear predictors `x * beta`, and cutpoints `c`.
+The same value of the dependent variable `y` is used for all instances.
+The number of columns of the independent variable matrix `x` needs to match the size of the coefficient vector `beta`.
+The cutpoints `c` must be ordered. If `x` and `y` are data (not parameters) this function can be executed on a GPU.
+
+<!-- real; ordered_logistic_glm_lpmf; (int[] y | row_vector x, vector beta, vector c); -->
+\index{{\tt \bfseries ordered\_logistic\_glm\_lpmf }!{\tt (int[] y \textbar\ row\_vector x, vector beta, vector c): real}|hyperpage}
+
+`real` **`ordered_logistic_glm_lpmf`**`(int[] y | row_vector x, vector beta, vector c)`<br>\newline
+The log ordered logistic probability mass of y, given linear predictors `x * beta`, and cutpoints `c`.
+The same row vector of the independent variables `x` is used for all instances.
+The size of the independent variable row vector `x` needs to match the size of the coefficient vector `beta`. 
+The cutpoints `c` must be ordered.
+
+<!-- real; ordered_logistic_glm_lpmf; (int[] y | matrix x, vector beta, vector c); -->
+\index{{\tt \bfseries ordered\_logistic\_glm\_lpmf }!{\tt (int[] y \textbar\ matrix x, vector beta, vector c): real}|hyperpage}
+
+`real` **`ordered_logistic_glm_lpmf`**`(int[] y | matrix x, vector beta, vector c)`<br>\newline
+The log ordered logistic probability mass of y, given linear predictors
+`x * beta`, and cutpoints `c`.
+The number of rows of the independent variable matrix `x` needs to match the size of the dependent variable vector `y`.
+The number of columns of the independent variable matrix `x` needs to match the size of the coefficient vector `beta`.
+The cutpoints `c` must be ordered. If `x` and `y` are data (not parameters) this function can be executed on a GPU.
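A minimal sketch of an ordinal regression using the `int[] y | matrix x, vector beta, vector c` signature is given below. The data names follow the symbols in the probability mass function above ($N$ observations, $M$ predictors, $K$ categories); the prior on `beta` is an assumption for this example. Declaring the cutpoints with the `ordered` type is one way to satisfy the requirement that `c` be ordered. Array declarations use the `y[N]` syntax that matches the `int[]` notation in this reference.

```stan
data {
  int<lower=1> N;                // number of observations
  int<lower=1> M;                // number of predictors
  int<lower=3> K;                // number of categories (K > 2 as in the definition)
  matrix[N, M] x;                // independent variables, one row per observation
  int<lower=1, upper=K> y[N];    // observed ordinal outcome
}
parameters {
  vector[M] beta;                // coefficient vector, size matches columns of x
  ordered[K - 1] c;              // cutpoints, constrained to be increasing
}
model {
  // Illustrative prior only.
  beta ~ normal(0, 2.5);

  // GLM form: a single call covering all N observations.
  y ~ ordered_logistic_glm(x, beta, c);

  // Manually written form with the same likelihood:
  // y ~ ordered_logistic(x * beta, c);
}
```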
+
 
 ## Ordered Probit Distribution