@@ -10,7 +10,9 @@ cat(' * <a href="binomial-distribution-logit-parameterization.html">Binomial Dis
1010cat(' * <a href="beta-binomial-distribution.html">Beta-Binomial Distribution</a>\n')
1111cat(' * <a href="hypergeometric-distribution.html">Hypergeometric Distribution</a>\n')
1212cat(' * <a href="categorical-distribution.html">Categorical Distribution</a>\n')
13+ cat(' * <a href="categorical-logit-glm.html">Categorical Logit Generalized Linear Model (Softmax Regression)</a>\n')
1314cat(' * <a href="ordered-logistic-distribution.html">Ordered Logistic Distribution</a>\n')
15+ cat(' * <a href="ordered-logistic-glm.html">Ordered Logistic Generalized Linear Model (Ordinal Regression)</a>\n')
1416cat(' * <a href="ordered-probit-distribution.html">Ordered Probit Distribution</a>\n')
1517}
1618```
@@ -238,8 +240,8 @@ an $N$-simplex (i.e., has nonnegative entries summing to one), then
238240for $y \in \{ 1,\ldots,N\} $, \[ \text{Categorical}(y~ |~ \theta) =
239241\theta_y. \] In addition, Stan provides a log-odds scaled categorical
240242distribution, \[ \text{CategoricalLogit}(y~ |~ \beta) =
241- \text{Categorical}(y~ |~ \text{softmax}(\beta)). \] See section
242- [ softmax] ( #softmax ) for the definition of the softmax function.
243+ \text{Categorical}(y~ |~ \text{softmax}(\beta)). \]
243+ See the [softmax](#softmax) section for the definition of the softmax function.
243245
244246### Sampling Statement
245247
@@ -296,6 +298,83 @@ Generate a categorical variate with outcome in range $1:N$ from
296298log-odds vector beta; may only be used in transformed data and generated
297299quantities blocks
298300
301+ ## Categorical Logit Generalized Linear Model (Softmax Regression) {#categorical-logit-glm}
302+
303+ Stan also supplies a single function for a generalized linear model
304+ with categorical likelihood and logit link function, i.e. a function
305+ for a softmax regression. This provides a more efficient
306+ implementation of softmax regression than a manually written
307+ regression in terms of a Categorical likelihood and matrix
308+ multiplication.
309+
310+ Note that the implementation does not put any restrictions on the coefficient matrix $\beta$. It is up to the user to ensure identifiability, for example by fixing a reference category or by using a suitable prior. See Multi-logit regression in the [Stan User's Guide](https://mc-stan.org/users/documentation/).
311+
312+ ### Probability Mass Function
313+
314+ If $N,M,K \in \mathbb{N}$, $N,M,K > 0$, and if $x\in \mathbb{R}^{M\cdot K}, \alpha \in \mathbb{R}^N, \beta\in \mathbb{R}^{K\cdot N}$, then for $y \in \{ 1,\ldots,N\} ^M$,
315+ \[ \text{CategoricalLogitGLM}(y~ |~ x,\alpha,\beta) = \\ [ 5pt]
316+ \prod_ {1\leq i \leq M}\text{CategoricalLogit}(y_i~ |~ \alpha+x_i\cdot\beta) = \\ [ 15pt]
317+ \prod_ {1\leq i \leq M}\text{Categorical}(y_i~ |~ \text{softmax}(\alpha+x_i\cdot\beta)). \]
318+ See the [softmax](#softmax) section for the definition of the softmax function.
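
The product above can be checked numerically. The following is a minimal plain-Python sketch of the arithmetic only, not Stan's implementation; the helper names `log_softmax` and `categorical_logit_glm_lpmf_sketch` are hypothetical and chosen for illustration. It assumes `x` is given as $M$ rows of $K$ predictors and `beta` as $K$ rows of $N$ coefficients, matching the dimensions stated above.

```python
import math


def log_softmax(eta):
    """Numerically stable log(softmax(eta)) for a list of floats."""
    m = max(eta)
    log_norm = m + math.log(sum(math.exp(e - m) for e in eta))
    return [e - log_norm for e in eta]


def categorical_logit_glm_lpmf_sketch(y, x, alpha, beta):
    """Sum over i of log Categorical(y[i] | softmax(alpha + x[i] * beta)).

    Hypothetical helper for illustration only.
    y:     M outcomes, each in 1..N (1-based, as in the text)
    x:     M rows of K predictors
    alpha: N intercepts
    beta:  K rows of N coefficients
    """
    n_categories = len(alpha)
    total = 0.0
    for y_i, x_i in zip(y, x):
        # Linear predictor for each category: alpha[n] + sum_k x_i[k] * beta[k][n]
        eta = [
            alpha[n] + sum(x_ik * beta_k[n] for x_ik, beta_k in zip(x_i, beta))
            for n in range(n_categories)
        ]
        total += log_softmax(eta)[y_i - 1]
    return total


# With all coefficients zero, every category has probability 1/3,
# so two observations give 2 * log(1/3).
x = [[0.5, -1.0], [2.0, 0.0]]
zero_beta = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(categorical_logit_glm_lpmf_sketch([1, 3], x, [0.0, 0.0, 0.0], zero_beta))
```

Adding the same constant to every entry of `alpha` leaves the result unchanged, because softmax is invariant to such shifts. This is the identifiability issue noted above.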
319+
320+ ### Sampling Statement
321+
322+ ` y ~ ` ** ` categorical_logit_glm ` ** ` (x, alpha, beta) `
323+
324+ Increment target log probability density with ` categorical_logit_glm_lpmf(y | x, alpha, beta) `
325+ dropping constant additive terms.
326+ <!-- real; categorical_logit_glm ~; -->
327+ \index{{\tt \bfseries categorical\_ logit\_ glm }!sampling statement|hyperpage}
328+
329+
330+ ### Stan Functions
331+
332+ <!-- real; categorical_logit_glm_lpmf; (int y | row_vector x, vector alpha, matrix beta); -->
333+ \index{{\tt \bfseries categorical\_ logit\_ glm\_ lpmf }!{\tt (int y \textbar\ row\_ vector x, vector alpha, matrix beta): real}|hyperpage}
334+
335+ ` real ` ** ` categorical_logit_glm_lpmf ` ** ` (int y | row_vector x, vector alpha, matrix beta) ` <br >\newline
336+ The log categorical probability mass function with outcome ` y ` in
337+ $1: N $ given $N$-vector of log-odds of outcomes ` alpha + x * beta ` .
338+ The size of the independent variable row vector ` x ` needs to match the number of rows of the
339+ coefficient matrix ` beta ` . The size of the intercept vector ` alpha ` must match the number
340+ of columns of the coefficient matrix ` beta ` .
341+
342+ <!-- real; categorical_logit_glm_lpmf; (int y | matrix x, vector alpha, matrix beta); -->
343+ \index{{\tt \bfseries categorical\_ logit\_ glm\_ lpmf }!{\tt (int y \textbar\ matrix x, vector alpha, matrix beta): real}|hyperpage}
344+
345+ ` real ` ** ` categorical_logit_glm_lpmf ` ** ` (int y | matrix x, vector alpha, matrix beta) ` <br >\newline
346+ The log categorical probability mass function with outcomes ` y ` in
347+ $1: N $ given $N$-vector of log-odds of outcomes ` alpha + x * beta ` .
348+ The same vector of intercepts ` alpha ` and the same dependent variable value ` y ` are used for all instances.
349+ The number of columns of the independent variable ` x ` needs to match the number of rows of the
350+ coefficient matrix ` beta ` . The size of the intercept vector ` alpha ` must match the number
351+ of columns of the coefficient matrix ` beta ` . If ` x ` and ` y ` are data (not parameters) this function can be executed on a GPU.
352+
353+ <!-- real; categorical_logit_glm_lpmf; (int[] y | row_vector x, vector alpha, matrix beta); -->
354+ \index{{\tt \bfseries categorical\_ logit\_ glm\_ lpmf }!{\tt (int[ ] y \textbar\ row\_ vector x, vector alpha, matrix beta): real}|hyperpage}
355+
356+ ` real ` ** ` categorical_logit_glm_lpmf ` ** ` (int[] y | row_vector x, vector alpha, matrix beta) ` <br >\newline
357+ The log categorical probability mass function with outcomes ` y ` in
358+ $1: N $ given $N$-vector of log-odds of outcomes ` alpha + x * beta ` .
359+ The same vector of intercepts ` alpha ` and same row vector of the independent variables ` x ` are used for all instances.
360+ The size of the independent variable row vector ` x ` needs to match the number of rows of the
361+ coefficient matrix ` beta ` . The size of the intercept vector ` alpha ` must match the number
362+ of columns of the coefficient matrix ` beta ` .
363+
364+ <!-- real; categorical_logit_glm_lpmf; (int[] y | matrix x, vector alpha, matrix beta); -->
365+ \index{{\tt \bfseries categorical\_ logit\_ glm\_ lpmf }!{\tt (int[ ] y \textbar\ matrix x, vector alpha, matrix beta): real}|hyperpage}
366+
367+ ` real ` ** ` categorical_logit_glm_lpmf ` ** ` (int[] y | matrix x, vector alpha, matrix beta) ` <br >\newline
368+ The log categorical probability mass function with outcomes ` y ` in
369+ $1: N $ given $N$-vector of log-odds of outcomes ` alpha + x * beta ` .
370+ The same vector of intercepts ` alpha ` is used for all instances.
371+ The number of rows of the independent variable
372+ matrix ` x ` needs to match the size of the dependent variable vector
373+ ` y ` . The number of columns of the independent variable matrix ` x ` needs to match the number of rows of the
374+ coefficient matrix ` beta ` . The size of the intercept vector ` alpha ` must match the number
375+ of columns of the coefficient matrix ` beta ` . If ` x ` and ` y ` are data (not parameters) this function can be executed on a GPU.
376+
377+
299378## Ordered Logistic Distribution
300379
301380### Probability Mass Function
@@ -330,14 +409,81 @@ dropping constant additive terms.
330409
331410` real ` ** ` ordered_logistic_lpmf ` ** ` (ints k | vector eta, vectors c) ` <br >\newline
332411The log ordered logistic probability mass of k given linear predictors
333- eta, and cutpoints c .
412+ ` eta ` , and cutpoints ` c ` .
334413
335414<!-- int; ordered_logistic_rng; (real eta, vector c); -->
336415\index{{\tt \bfseries ordered\_ logistic\_ rng }!{\tt (real eta, vector c): int}|hyperpage}
337416
338417` int ` ** ` ordered_logistic_rng ` ** ` (real eta, vector c) ` <br >\newline
339- Generate an ordered logistic variate with linear predictor eta and
340- cutpoints c; may only be used in transformed data and generated quantities blocks
418+ Generate an ordered logistic variate with linear predictor ` eta ` and
419+ cutpoints ` c ` ; may only be used in transformed data and generated quantities blocks
420+
421+ ## Ordered Logistic Generalized Linear Model (Ordinal Regression) {#ordered-logistic-glm}
422+
423+ ### Probability Mass Function
424+
425+ If $N,M,K \in \mathbb{N}$ with $N, M > 0$, $K > 2$, $c \in \mathbb{R}^{K-1}$ such that
426+ $c_k < c_ {k+1}$ for $k \in \{ 1,\ldots,K-2\} $, and $x\in \mathbb{R}^{N\cdot M}, \beta\in \mathbb{R}^M$, then for $y \in \{ 1,\ldots,K\} ^N$,
427+ \[ \text{OrderedLogisticGLM}(y~ |~ x,\beta,c) = \\ [ 4pt]
428+ \prod_ {1\leq i \leq N}\text{OrderedLogistic}(y_i~ |~ x_i\cdot \beta,c) = \\ [ 17pt]
429+ \prod_ {1\leq i \leq N}\left\{ \begin{array}{ll}
430+ 1 - \text{logit}^{-1}(x_i\cdot \beta - c_1) & \text{if } y_i = 1, \\ [ 4pt]
431+ \text{logit}^{-1}(x_i\cdot \beta - c_ {y_i-1}) - \text{logit}^{-1}(x_i\cdot \beta - c_ {y_i}) & \text{if } 1 < y_i < K, \text{and} \\ [ 4pt]
432+ \text{logit}^{-1}(x_i\cdot \beta - c_ {K-1}) - 0 & \text{if } y_i = K.
433+ \end{array} \right. \] The $y_i=K$
434+ case is written with the redundant subtraction of zero to illustrate
435+ the parallelism of the cases; the $y_i=1$ and $y_i=K$ edge cases can be
436+ subsumed into the general definition by setting $c_0 = -\infty$ and
437+ $c_K = +\infty$ with $\text{logit}^{-1}(-\infty) = 0$ and
438+ $\text{logit}^{-1}(\infty) = 1$.
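
The three cases can be traced with a small plain-Python sketch. It illustrates the formula only and is not Stan's implementation; `inv_logit` and `ordered_logistic_glm_lpmf_sketch` are hypothetical helper names. It assumes `x` is given as $N$ rows of $M$ predictors and `c` holds the $K-1$ increasing cutpoints. The direct subtraction used here can lose precision when the two logistic terms are nearly equal, which a production implementation would need to handle.

```python
import math


def inv_logit(u):
    """Logistic function logit^{-1}(u), written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def ordered_logistic_glm_lpmf_sketch(y, x, beta, c):
    """Sum over i of log OrderedLogistic(y[i] | x[i] * beta, c).

    Hypothetical helper for illustration only.
    y:    N outcomes, each in 1..K (1-based, as in the text)
    x:    N rows of M predictors
    beta: M coefficients
    c:    K-1 strictly increasing cutpoints
    """
    n_outcomes = len(c) + 1  # K
    total = 0.0
    for y_i, x_i in zip(y, x):
        eta = sum(x_ij * beta_j for x_ij, beta_j in zip(x_i, beta))
        # c_0 = -inf gives an upper term of 1; c_K = +inf gives a lower term of 0.
        upper = 1.0 if y_i == 1 else inv_logit(eta - c[y_i - 2])
        lower = 0.0 if y_i == n_outcomes else inv_logit(eta - c[y_i - 1])
        total += math.log(upper - lower)
    return total


# The probabilities of the K outcomes for one row of predictors sum to one.
row = [[0.3, -1.2]]
coef = [0.5, 0.25]
cuts = [-1.0, 0.0, 2.0]
print(sum(math.exp(ordered_logistic_glm_lpmf_sketch([k], row, coef, cuts))
          for k in range(1, 5)))
```

Because consecutive terms telescope, the outcome probabilities sum to one for any ordered cutpoints, which is a quick sanity check on the case analysis.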
439+
440+ ### Sampling Statement
441+
442+ ` y ~ ` ** ` ordered_logistic_glm ` ** ` (x, beta, c) `
443+
444+ Increment target log probability density with ` ordered_logistic_glm_lpmf(y | x, beta, c) `
445+ dropping constant additive terms.
446+ <!-- real; ordered_logistic_glm ~; -->
447+ \index{{\tt \bfseries ordered\_ logistic\_ glm }!sampling statement|hyperpage}
448+
449+ ### Stan Functions
450+
451+ <!-- real; ordered_logistic_glm_lpmf; (int y | row_vector x, vector beta, vector c); -->
452+ \index{{\tt \bfseries ordered\_ logistic\_ glm\_ lpmf }!{\tt (int y \textbar\ row\_ vector x, vector beta, vector c): real}|hyperpage}
453+
454+ ` real ` ** ` ordered_logistic_glm_lpmf ` ** ` (int y | row_vector x, vector beta, vector c) ` <br >\newline
455+ The log ordered logistic probability mass of ` y ` , given linear predictor ` x * beta ` and cutpoints ` c ` .
456+ The size of the independent variable row vector ` x ` needs to match the size of the coefficient vector ` beta ` .
457+ The cutpoints ` c ` must be ordered.
458+
459+ <!-- real; ordered_logistic_glm_lpmf; (int y | matrix x, vector beta, vector c); -->
460+ \index{{\tt \bfseries ordered\_ logistic\_ glm\_ lpmf }!{\tt (int y \textbar\ matrix x, vector beta, vector c): real}|hyperpage}
461+
462+ ` real ` ** ` ordered_logistic_glm_lpmf ` ** ` (int y | matrix x, vector beta, vector c) ` <br >\newline
463+ The log ordered logistic probability mass of ` y ` , given linear predictors ` x * beta ` and cutpoints ` c ` .
464+ The same value of the dependent variable ` y ` is used for all instances.
465+ The number of columns of the independent variable matrix ` x ` needs to match the size of the coefficient vector ` beta ` .
466+ The cutpoints ` c ` must be ordered. If ` x ` and ` y ` are data (not parameters) this function can be executed on a GPU.
467+
468+ <!-- real; ordered_logistic_glm_lpmf; (int[] y | row_vector x, vector beta, vector c); -->
469+ \index{{\tt \bfseries ordered\_ logistic\_ glm\_ lpmf }!{\tt (int[ ] y \textbar\ row\_ vector x, vector beta, vector c): real}|hyperpage}
470+
471+ ` real ` ** ` ordered_logistic_glm_lpmf ` ** ` (int[] y | row_vector x, vector beta, vector c) ` <br >\newline
472+ The log ordered logistic probability mass of ` y ` , given linear predictor ` x * beta ` and cutpoints ` c ` .
473+ The same row vector of the independent variables ` x ` is used for all instances.
474+ The size of the independent variable row vector ` x ` needs to match the size of the coefficient vector ` beta ` .
475+ The cutpoints ` c ` must be ordered.
476+
477+ <!-- real; ordered_logistic_glm_lpmf; (int[] y | matrix x, vector beta, vector c); -->
478+ \index{{\tt \bfseries ordered\_ logistic\_ glm\_ lpmf }!{\tt (int[ ] y \textbar\ matrix x, vector beta, vector c): real}|hyperpage}
479+
480+ ` real ` ** ` ordered_logistic_glm_lpmf ` ** ` (int[] y | matrix x, vector beta, vector c) ` <br >\newline
481+ The log ordered logistic probability mass of ` y ` , given linear predictors
482+ ` x * beta ` and cutpoints ` c ` .
483+ The number of rows of the independent variable matrix ` x ` needs to match the size of the dependent variable vector ` y ` .
484+ The number of columns of the independent variable matrix ` x ` needs to match the size of the coefficient vector ` beta ` .
485+ The cutpoints ` c ` must be ordered. If ` x ` and ` y ` are data (not parameters) this function can be executed on a GPU.
486+
341487
342488## Ordered Probit Distribution
343489