
Commit d98990d

Author: Alexander März (committed)

Added docs

1 parent 14d4729, commit d98990d

21 files changed: +2523 / -7675 lines changed

.github/workflows/mkdocs.yaml

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
name: mkdocs
on:
  push:
    branches:
      - master
  pull_request:
  workflow_dispatch:
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.x
      - run: pip install mkdocs-material mkdocstrings[python] mkdocs-jupyter
      - run: mkdocs gh-deploy --force

docs/LightGBMLSS.png

272 KB

docs/api.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
# API references

::: lightgbmlss
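The `::: lightgbmlss` line is the mkdocstrings auto-documentation directive: when the site is built with the `mkdocstrings[python]` plugin installed by the workflow above, this placeholder is expanded into API reference pages generated from the package's docstrings.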

docs/dgbm.md

Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
# Introduction

The development of modelling approaches that approximate and describe the data generating processes underlying the observed data in as much detail as possible is a guiding principle in both statistics and machine learning. We therefore strongly agree with the statement of Hothorn et al. (2014) that **''the ultimate goal of any regression analysis is to obtain information about the entire conditional distribution $F_{Y}(y|\mathbf{x})$ of a response given a set of explanatory variables''**.

<blockquote style="font-size: 12px">
<em>''Practitioners expect forecasting to reduce future uncertainty by providing accurate predictions like those in hard sciences. However, this is a great misconception. A major purpose of forecasting is not to reduce uncertainty but reveal its full extent and implications by estimating it as precisely as possible. [...] The challenge for the forecasting field is how to persuade practitioners of the reality that all forecasts are uncertain and that this uncertainty cannot be ignored, as doing so could lead to catastrophic consequences.''</em> Makridakis et al. (2022b)
</blockquote>

Until recently, however, most regression models focused on estimating the conditional mean $\mathbb{E}(Y|\mathbf{X} = \mathbf{x})$ only, implicitly treating higher moments of the conditional distribution $F_{Y}(y|\mathbf{x})$ as fixed nuisance parameters. Models that minimize an $\ell_{2}$-type loss for the conditional mean cannot fully exploit the information contained in the data, since doing so is equivalent to assuming a Normal distribution with constant variance. In real-world situations, however, the data generating process is usually less well behaved, exhibiting characteristics such as heteroskedasticity, varying degrees of skewness and kurtosis, or intermittent and sporadic behaviour. In recent years, there has been a clear shift in both academic and corporate research toward modelling the entire conditional distribution. This change in attention is most evident in the recent M5 forecasting competition (Makridakis et al., 2022a,b), which differed from previous ones in that it consisted of two parallel competitions: in addition to providing accurate point forecasts, participants were also asked to forecast nine different quantiles to approximate the distribution of future sales.
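To illustrate why a constant-variance view can be misleading, the following sketch (illustrative only; it assumes nothing beyond `numpy` and `scipy`) simulates data whose spread grows with the covariate and compares a constant-variance 95% prediction interval with the interval implied by the true, covariate-dependent scale:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(123)

# Heteroskedastic data: the spread of y grows with x.
n = 10_000
x = rng.uniform(0.0, 10.0, size=n)
y = 10.0 + 1.0 * x + rng.normal(scale=0.5 + 0.4 * x, size=n)

# Mean-only view: a single, constant residual scale for all x.
const_sigma = np.std(y - (10.0 + 1.0 * x))

# Distributional view: the scale of the response depends on x.
for x0 in (1.0, 9.0):
    lo_c, hi_c = norm.interval(0.95, loc=10.0 + x0, scale=const_sigma)
    lo_d, hi_d = norm.interval(0.95, loc=10.0 + x0, scale=0.5 + 0.4 * x0)
    print(f"x={x0:.0f}: constant-variance 95% PI width={hi_c - lo_c:5.2f}, "
          f"true width={hi_d - lo_d:5.2f}")
```

For small $x$ the constant-variance interval is far too wide and for large $x$ far too narrow; recovering exactly this covariate-dependent uncertainty is what distributional modelling is about.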
# Distributional Gradient Boosting Machines

This section introduces the general idea of distributional modelling. For a more thorough introduction, we refer the interested reader to Rigby and Stasinopoulos (2005); Klein et al. (2015); Stasinopoulos et al. (2017).

## GAMLSS

Probabilistic forecasts are predictions in the form of a probability distribution, rather than a single point estimate only. In this context, the introduction of Generalized Additive Models for Location Scale and Shape (GAMLSS) by Rigby and Stasinopoulos (2005) has stimulated a lot of research and culminated in a new research branch that focuses on modelling the entire conditional distribution as a function of covariates.

### Univariate Targets

In its original formulation, GAMLSS assume a univariate response to follow a distribution $\mathcal{D}$ that depends on up to four parameters, i.e., $y_{i} \stackrel{ind}{\sim} \mathcal{D}(\mu_{i}, \sigma^{2}_{i}, \nu_{i}, \tau_{i}), i=1,\ldots,n$, where $\mu_{i}$ and $\sigma^{2}_{i}$ are often location and scale parameters, respectively, while $\nu_{i}$ and $\tau_{i}$ correspond to shape parameters such as skewness and kurtosis. Hence, the framework allows modelling not only the mean (or location) but all distributional parameters as functions of explanatory variables. It is important to note that distributional modelling implies that observations are independent, but not necessarily identical realizations $y \stackrel{ind}{\sim} \mathcal{D}\big(\mathbf{\theta}(\mathbf{x})\big)$, since all distributional parameters $\mathbf{\theta}(\mathbf{x})$ are related to and allowed to change with covariates. In contrast to Generalized Linear Models (GLM) and Generalized Additive Models (GAM), the assumption that the response distribution belongs to an exponential family is relaxed in GAMLSS and replaced by a more general class of distributions, including highly skewed and/or kurtotic continuous, discrete and mixed discrete, as well as zero-inflated distributions. While the original formulation of GAMLSS in Rigby and Stasinopoulos (2005) suggests that any distribution can be described by location, scale and shape parameters, it is not necessarily true that the observed data distribution can actually be characterized by all of these parameters. Hence, we follow Klein et al. (2015) and use the terms distributional modelling and GAMLSS interchangeably.

From a frequentist point of view, distributional modelling can be formulated as follows

\begin{equation}
y_{i} \stackrel{ind}{\sim} \mathcal{D}
\begin{pmatrix}
h_{1}\bigl(\theta_{i1}(x_{i})\bigr) = \eta_{i1} \\
h_{2}\bigl(\theta_{i2}(x_{i})\bigr) = \eta_{i2} \\
\vdots \\
h_{K}\bigl(\theta_{iK}(x_{i})\bigr) = \eta_{iK}
\end{pmatrix}
\end{equation}

for $i = 1, \ldots, N$, where $\mathcal{D}$ denotes a parametric distribution for the response $\textbf{y} = (y_{1}, \ldots, y_{N})^{\prime}$ that depends on $K$ distributional parameters $\theta_{k}$, $k = 1, \ldots, K$, and with $h_{k}(\cdot)$ denoting a known link function relating each distributional parameter to its predictor $\eta_{k}$. In its most generic form, the predictor $\eta_{k}$ is given by

\begin{equation}
\eta_{k} = f_{k}(\mathbf{x}), \qquad k = 1, \ldots, K
\end{equation}

Within the original distributional regression framework, the functions $f_{k}(\cdot)$ usually represent a combination of linear and GAM-type predictors, which allows estimating linear effects and effects of categorical variables, as well as highly non-linear and spatial effects using a spline-based basis function approach. The predictor specification $\eta_{k}$ is generic enough to accommodate tree-based models as well, which allows us to extend LightGBM to a probabilistic framework.
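To make the link-function formulation concrete, the following minimal sketch (illustrative only, assuming `numpy` and `scipy`; the variable names are not part of LightGBMLSS) builds two predictors for a Gaussian response, maps them to the distributional parameters via the inverse links $h_{1}^{-1}(\eta) = \eta$ and $h_{2}^{-1}(\eta) = \exp(\eta)$, and evaluates the joint negative log-likelihood that a distributional gradient boosting model would minimize:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Simulated covariate and two predictors eta_1 (location) and eta_2 (scale),
# mimicking the GAMLSS formulation with K = 2 distributional parameters.
n = 5_000
x = rng.uniform(0.0, 10.0, size=n)
eta_mu = 10.0 + 1.0 * x             # predictor for the location parameter
eta_sigma = np.log(0.5 + 0.4 * x)   # predictor for the scale parameter

# Inverse link functions: identity for mu, exp to keep sigma > 0.
mu = eta_mu
sigma = np.exp(eta_sigma)

# Draw responses and evaluate the negative log-likelihood, i.e. the loss that
# is evaluated jointly over all distributional parameters.
y = rng.normal(loc=mu, scale=sigma)
nll = -norm.logpdf(y, loc=mu, scale=sigma).sum()
print(f"NLL of the true parameters: {nll:.1f}")
```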
## Normalizing Flows

Although the GAMLSS framework offers considerable flexibility, parametric distributions may prove not flexible enough to provide a reasonable approximation for certain datasets, e.g., for multi-modal distributions. For such cases, it is preferable to relax the assumption of a parametric distribution and approximate the data non-parametrically. While there are several alternatives for learning conditional distributions, we propose to use Normalizing Flows for their ability to fit complex distributions with only a few parameters.

The principle that underlies Normalizing Flows is to turn a simple base distribution, e.g., $F_{Z}(\mathbf{z}) = N(0,1)$, into a more complex and realistic distribution of the target variable $F_{Y}(\mathbf{y})$ by applying several bijective transformations $h_{j}$, $j = 1, \ldots, J$ to the variable of the base distribution

\begin{equation}
\mathbf{y} = h_{J} \circ h_{J-1} \circ \cdots \circ h_{1}(\mathbf{z})
\end{equation}

Based on the inverse of the complete transformation function, $h^{-1} = h^{-1}_{1} \circ \ldots \circ h^{-1}_{J}$, the density of $\mathbf{y}$ is then given by the change of variables theorem

\begin{equation}
f_{Y}(\mathbf{y}) = f_{Z}\big(h^{-1}(\mathbf{y})\big) \cdot \Bigg|\frac{\partial h^{-1}(\mathbf{y})}{\partial \mathbf{y}}\Bigg|
\end{equation}

where scaling with the Jacobian determinant $|\partial h^{-1}(\mathbf{y}) / \partial \mathbf{y}|$ ensures that $f_{Y}(\mathbf{y})$ is a proper density integrating to one. The composition of these transformations is invertible, allowing one to sample from the complex distribution by transforming samples from the base distribution.
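As a one-dimensional sanity check of the change of variables formula (illustrative only; it uses `numpy` and `scipy`, not the flow machinery itself), take a single transformation $y = h(z) = \exp(z)$ with a standard Normal base variable, so that $h^{-1}(y) = \log(y)$ and the resulting density is the log-normal one:

```python
import numpy as np
from scipy.stats import norm, lognorm

# Single transformation y = h(z) = exp(z) with z ~ N(0, 1),
# so the inverse mapping back to the base variable is h^{-1}(y) = log(y).
y = np.linspace(0.1, 5.0, 50)

# Change of variables: f_Y(y) = f_Z(log y) * |d log(y) / dy| = phi(log y) / y
f_y = norm.pdf(np.log(y)) / y

# This coincides with the log-normal density (sigma = 1), as expected.
assert np.allclose(f_y, lognorm.pdf(y, s=1.0))
```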
<center>
<img src="https://tikz.net/janosh/normalizing-flow.png" width=400 height=120/>
</center>

<span style="text-align: right;">
<h6 style="font-size: 6px;">Image source: https://tikz.net/janosh/normalizing-flow.png</h6>
</span>

Our Normalizing Flow approach is based on element-wise rational splines of linear or quadratic order, as introduced by Durkan et al. (2019) and Dolatabadi et al. (2020) and implemented in Pyro, since they offer a combination of functional flexibility and numerical stability. Despite this specific choice, our framework is generic enough to accommodate the use of other parametrizable Normalizing Flows.
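The sketch below shows the basic mechanism with Pyro's `Spline` transform on a bimodal toy target. It is a minimal, self-contained illustration of fitting a univariate spline flow by maximum likelihood, not a reproduction of LightGBMLSS's internal implementation; the hyperparameter values are arbitrary:

```python
import torch
import pyro.distributions as dist
from pyro.distributions.transforms import Spline

# Bimodal toy target that a single parametric family would struggle with.
torch.manual_seed(0)
y = torch.cat([torch.randn(500) * 0.5 - 2.0,
               torch.randn(500) * 0.5 + 2.0]).unsqueeze(-1)

# Standard Normal base distribution and an element-wise rational spline.
base = dist.Normal(torch.zeros(1), torch.ones(1))
spline = Spline(input_dim=1, count_bins=8, bound=5.0, order="quadratic")
flow = dist.TransformedDistribution(base, [spline])

# Fit the spline parameters by minimising the negative log-likelihood.
optimizer = torch.optim.Adam(spline.parameters(), lr=1e-2)
for step in range(500):
    optimizer.zero_grad()
    loss = -flow.log_prob(y).mean()
    loss.backward()
    optimizer.step()
    flow.clear_cache()  # reset cached transform values after each update

samples = flow.sample(torch.Size([1000]))  # draws from the fitted distribution
```

Once fitted, the flow can be evaluated via `log_prob` and sampled from, just like a parametric distribution.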
## Gradient Boosting Machines for Location, Scale and Shape

We draw inspiration from GAMLSS and label our model LightGBM for Location, Scale and Shape (LightGBMLSS). Despite its nominal reference to GAMLSS, our framework is designed to accommodate the modelling of a wide range of parametrizable distributions that go beyond location, scale and shape. LightGBMLSS requires the specification of a suitable distribution from which Gradients and Hessians are derived. These represent the partial first and second order derivatives of the log-likelihood with respect to the parameter of interest. GBMLSS are based on multi-parameter optimization, where a separate tree is grown for each parameter. Estimation of Gradients and Hessians, as well as the evaluation of the loss function, is done simultaneously for all parameters. Gradients and Hessians are derived using PyTorch's automatic differentiation capabilities. The flexibility offered by automatic differentiation allows users to easily implement novel or customized parametric distributions for which Gradients and Hessians are difficult to derive analytically. It also facilitates the use of Normalizing Flows and the addition of further constraints to the loss function. To improve the convergence and stability of GBMLSS estimation, unconditional Maximum Likelihood estimates of the parameters are used as offset values. To enable a deeper understanding of the data generating process, GBMLSS also provide feature importance and partial dependence plots based on the Shapley-Value approach.
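The following minimal sketch illustrates how automatic differentiation yields per-observation Gradients and Hessians of a Gaussian negative log-likelihood with respect to both distributional parameters. It assumes only `torch`, and the names are illustrative rather than LightGBMLSS internals:

```python
import torch

# Toy response and current predictions of the two Gaussian parameters
# (location and log-scale), one value per observation.
y = torch.tensor([0.3, -1.2, 2.5])
loc = torch.zeros(3, requires_grad=True)
log_scale = torch.zeros(3, requires_grad=True)

# Negative log-likelihood evaluated jointly for all distributional parameters.
nll = -torch.distributions.Normal(loc, log_scale.exp()).log_prob(y).sum()

# First derivatives (Gradients) with respect to each parameter.
grad_loc, grad_scale = torch.autograd.grad(nll, (loc, log_scale), create_graph=True)

# Second derivatives (Hessians); for element-wise parameters the diagonal suffices.
hess_loc = torch.autograd.grad(grad_loc.sum(), loc, retain_graph=True)[0]
hess_scale = torch.autograd.grad(grad_scale.sum(), log_scale)[0]

print(grad_loc, hess_loc)        # per-observation gradient and hessian for the location
print(grad_scale, hess_scale)    # ... and for the (log-)scale parameter
```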
# References

- Hadi Mohaghegh Dolatabadi, Sarah Erfani, and Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. In The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), 4236–4246, 2020.
- Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios. Neural Spline Flows. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, 2019.
- Torsten Hothorn, Thomas Kneib, and Peter Bühlmann. Conditional Transformation Models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 76(1):3–27, 2014.
- Nadja Klein, Thomas Kneib, and Stefan Lang. Bayesian Generalized Additive Models for Location, Scale, and Shape for Zero-Inflated and Overdispersed Count Data. Journal of the American Statistical Association, 110(509):405–419, 2015.
- Alexander März and Thomas Kneib. Distributional Gradient Boosting Machines, 2022.
- Alexander März. XGBoostLSS - An extension of XGBoost to probabilistic forecasting, 2019.
- Spyros Makridakis, Evangelos Spiliotis, and Vassilios Assimakopoulos. The M5 competition: Background, organization, and implementation. International Journal of Forecasting, 38(4):1325–1336, 2022a.
- Spyros Makridakis, Evangelos Spiliotis, Vassilios Assimakopoulos, Zhi Chen, Anil Gaba, Ilia Tsetlin, and Robert L. Winkler. The M5 uncertainty competition: Results, findings and conclusions. International Journal of Forecasting, 38(4):1365–1385, 2022b.
- R. A. Rigby and D. M. Stasinopoulos. Generalized additive models for location, scale and shape. Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3):507–554, 2005.
- Mikis D. Stasinopoulos, Robert A. Rigby, Gillian Z. Heller, Vlasios Voudouris, and Fernanda de Bastiani. Flexible Regression and Smoothing: Using GAMLSS in R. Chapman & Hall / CRC The R Series. CRC Press, London, 2017.

docs/distributions.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
# Available Distributions

LightGBMLSS is built upon PyTorch and Pyro, enabling users to harness a diverse set of distributional families and to leverage automatic differentiation capabilities. This greatly expands the options for probabilistic modeling and uncertainty estimation and allows users to tackle complex regression tasks.

LightGBMLSS currently supports the following univariate distributions.

| Distribution | Usage | Type | Support | Number of Parameters |
| :---: | :---: | :---: | :---: | :---: |
| [Beta](https://pytorch.org/docs/stable/distributions.html#beta) | `Beta()` | Continuous <br /> (Univariate) | $y \in (0, 1)$ | 2 |
| [Cauchy](https://pytorch.org/docs/stable/distributions.html#cauchy) | `Cauchy()` | Continuous <br /> (Univariate) | $y \in (-\infty,\infty)$ | 2 |
| [Expectile](https://epub.ub.uni-muenchen.de/31542/1/1471082x14561155.pdf) | `Expectile()` | Continuous <br /> (Univariate) | $y \in (-\infty,\infty)$ | Number of expectiles |
| [Gamma](https://pytorch.org/docs/stable/distributions.html#gamma) | `Gamma()` | Continuous <br /> (Univariate) | $y \in (0, \infty)$ | 2 |
| [Gaussian](https://pytorch.org/docs/stable/distributions.html#normal) | `Gaussian()` | Continuous <br /> (Univariate) | $y \in (-\infty,\infty)$ | 2 |
| [Gumbel](https://pytorch.org/docs/stable/distributions.html#gumbel) | `Gumbel()` | Continuous <br /> (Univariate) | $y \in (-\infty,\infty)$ | 2 |
| [Laplace](https://pytorch.org/docs/stable/distributions.html#laplace) | `Laplace()` | Continuous <br /> (Univariate) | $y \in (-\infty,\infty)$ | 2 |
| [LogNormal](https://pytorch.org/docs/stable/distributions.html#lognormal) | `LogNormal()` | Continuous <br /> (Univariate) | $y \in (0,\infty)$ | 2 |
| [Negative Binomial](https://pytorch.org/docs/stable/distributions.html#negativebinomial) | `NegativeBinomial()` | Discrete Count <br /> (Univariate) | $y \in \{0, 1, 2, 3, \ldots\}$ | 2 |
| [Poisson](https://pytorch.org/docs/stable/distributions.html#poisson) | `Poisson()` | Discrete Count <br /> (Univariate) | $y \in \{0, 1, 2, 3, \ldots\}$ | 1 |
| [Spline Flow](https://docs.pyro.ai/en/stable/distributions.html#pyro.distributions.transforms.Spline) | `SplineFlow()` | Continuous & Discrete Count <br /> (Univariate) | $y \in (-\infty,\infty)$ <br /> $y \in [0, \infty)$ <br /> $y \in [0, 1]$ <br /> $y \in \{0, 1, 2, 3, \ldots\}$ | 2xcount_bins + (count_bins-1) (order=quadratic) <br /> 3xcount_bins + (count_bins-1) (order=linear) |
| [Student-T](https://pytorch.org/docs/stable/distributions.html#studentt) | `StudentT()` | Continuous <br /> (Univariate) | $y \in (-\infty,\infty)$ | 3 |
| [Weibull](https://pytorch.org/docs/stable/distributions.html#weibull) | `Weibull()` | Continuous <br /> (Univariate) | $y \in [0, \infty)$ | 2 |
| [Zero-Adjusted Beta](https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py) | `ZABeta()` | Discrete-Continuous <br /> (Univariate) | $y \in [0, 1)$ | 3 |
| [Zero-Adjusted Gamma](https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py) | `ZAGamma()` | Discrete-Continuous <br /> (Univariate) | $y \in [0, \infty)$ | 3 |
| [Zero-Adjusted LogNormal](https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py) | `ZALN()` | Discrete-Continuous <br /> (Univariate) | $y \in [0, \infty)$ | 3 |
| [Zero-Inflated Negative Binomial](https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L150) | `ZINB()` | Discrete Count <br /> (Univariate) | $y \in \{0, 1, 2, 3, \ldots\}$ | 3 |
| [Zero-Inflated Poisson](https://github.com/pyro-ppl/pyro/blob/dev/pyro/distributions/zero_inflated.py#L121) | `ZIPoisson()` | Discrete Count <br /> (Univariate) | $y \in \{0, 1, 2, 3, \ldots\}$ | 2 |
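As a worked instance of the parameter counts in the last column: for a `SplineFlow()` with `count_bins = 8`, the quadratic order uses 2x8 + (8-1) = 23 parameters, while the linear order uses 3x8 + (8-1) = 31.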
