# CONTRIBUTING.md (7 additions, 0 deletions)

@@ -129,6 +129,13 @@ Finally to view the documentation

```
make preview_docs
```

- The docs are automatically generated from the docstrings in the `python/statsforecast` folder.
- To contribute, ensure your docstrings follow the Google style format (see the example below).
- Once your docstring is correctly written, the documentation framework will scrape it and regenerate the corresponding `.mdx` files; your changes will then appear in the updated docs.
- Make an appropriate entry in the `docs/mintlify/mint.json` file.
- Run `make all_docs` to regenerate the documentation.
- Run `make preview_docs` to view and test the documentation locally.

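
For reference, a Google-style docstring looks like the following minimal sketch; the helper function and its arguments are hypothetical illustrations, not code from the library:

```python
import pandas as pd


def tail_series(df: pd.DataFrame, max_length: int) -> pd.DataFrame:
    """Trim each time series to its most recent observations.

    Args:
        df (pd.DataFrame): Long-format data with `unique_id`, `ds` and `y` columns.
        max_length (int): Maximum number of observations to keep per series.

    Returns:
        pd.DataFrame: The trimmed data, with the original columns preserved.
    """
    # keep only the last `max_length` rows of each series
    return df.groupby("unique_id", observed=True).tail(max_length)
```
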
## Start Coding

Open a Jupyter notebook using `jupyter lab` (or VS Code).

# experiments/amazon_forecast/README.md (29 additions, 28 deletions)

@@ -1,62 +1,62 @@

## Amazon's AutoML vs open-source statistical methods

> TL;DR: We paid 800 USD and spent 4 hours in the AWS Forecast console so you don't have to.

In this reproducible experiment, we compare [Amazon Forecast](https://aws.amazon.com/forecast/) and [StatsForecast](https://github.com/Nixtla/statsforecast), an open-source Python library. Given the prominent use of AWS Forecast in demand forecasting, we used the 30,490 series of daily sales at Walmart from the [M5 competition](https://mofc.unic.ac.cy/m5-competition/). We conclude that, for this setting, Amazon Forecast is 60% less accurate and 669 times more expensive than running an open-source alternative on a simple cloud server.

We also provide a step-by-step guide to [reproduce the results](https://nixtlaverse.nixtla.io/statsforecast/docs/experiments/AmazonStatsForecast).

### Amazon Forecast

Amazon Forecast is an AutoML time-series forecasting service.

> It uses machine learning (ML) to generate more accurate demand forecasts with just a few clicks, without requiring any prior ML experience. Amazon Forecast includes algorithms that are based on over twenty years of forecasting experience and developed expertise used by Amazon.com bringing the same technology used at Amazon to developers as a fully managed service, removing the need to manage resources. Amazon Forecast uses ML to learn not only the best algorithm for each item, but the best ensemble of algorithms for each item, automatically creating the best model for your data.

Amazon Forecast is one of the leading forecasting services out there. You can read more about its features and pricing tiers [here](https://aws.amazon.com/forecast/).

Amazon Forecast creates predictors with AutoPredictor, which applies the optimal combination of algorithms to each time series in your datasets. The predictor is an Amazon Forecast model trained using your target time series, related time series, item metadata, and any additional datasets you include.

Included algorithms range from commonly used statistical methods like Autoregressive Integrated Moving Average (ARIMA) to complex neural network algorithms: CNN-QR, DeepAR+, Prophet, NPTS, ARIMA, and ETS.

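For readers who prefer the API over the console, an AutoPredictor can also be created programmatically. Below is a hedged `boto3` sketch; we used the console for this experiment, and the predictor name and dataset group ARN are placeholders:

```python
import boto3

forecast = boto3.client("forecast")

# Train an AutoPredictor on a previously imported dataset group
response = forecast.create_auto_predictor(
    PredictorName="m5_auto_predictor",  # placeholder name
    ForecastHorizon=28,                 # the M5 evaluation horizon
    ForecastFrequency="D",              # daily series
    DataConfig={
        # placeholder ARN; point this at your own dataset group
        "DatasetGroupArn": "arn:aws:forecast:us-east-1:123456789012:dataset-group/m5"
    },
)
print(response["PredictorArn"])
```
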
### StatsForecast

StatsForecast is an open-source Python library from Nixtla. It offers a collection of widely used univariate time series forecasting models, including automatic ARIMA, ETS, CES, and Theta modeling, optimized for high performance using numba. It also includes a large battery of benchmarking models.

For this experiment, we used a `c5d.24xlarge` EC2 instance and trained two simple statistical models: `AutoETS` and `DynamicOptimizedTheta`. Finally, the two models were ensembled using the median, as sketched below.

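
This is a minimal sketch of that ensemble, assuming the current `statsforecast` API and a long-format pandas DataFrame `df` with `unique_id`, `ds`, and `y` columns; the weekly season length is an illustrative choice for daily data, and the exact configuration lives in the reproducibility notebook linked below:

```python
from statsforecast import StatsForecast
from statsforecast.models import AutoETS, DynamicOptimizedTheta

# df: long-format sales history with columns unique_id, ds, y
sf = StatsForecast(
    models=[
        AutoETS(season_length=7, alias="AutoETS"),
        DynamicOptimizedTheta(season_length=7, alias="DOT"),
    ],
    freq="D",
    n_jobs=-1,  # use every core; the experiment ran on a c5d.24xlarge
)

# Forecast the 28-day M5 horizon; the result has one column per model alias
forecasts = sf.forecast(df=df, h=28)

# Median ensemble of the two models, as described above
forecasts["Ensemble"] = forecasts[["AutoETS", "DOT"]].median(axis=1)
```
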
### Main Results

Amazon Forecast:

* achieved 1.617 in error (measured in wRMSSE, the official evaluation metric used in the competition),
* took 4.1 hours to run,
* and cost 803.53 USD.

StatsForecast with a simple ensemble of statistical methods trained on a `c5d.24xlarge` EC2 instance:

* achieved 0.669 in error (wRMSSE),
* took 14.5 minutes to run,
* and cost only 1.2 USD.

For this dataset, we therefore show that:

* Amazon Forecast is 60% less accurate (1.617 vs. 0.669 wRMSSE, a roughly 59% error reduction) and 669 times more expensive (803.53 USD vs. 1.2 USD) than running an open-source alternative on a simple cloud server.
* Machine Learning methods are outperformed by classical methods in terms of speed, accuracy, and cost.

Although using StatsForecast requires some basic knowledge of Python and cloud computing, the results are simply better for this dataset.
## Data

We provide open access to the data in the following URLs:

[...]

The train set contains `30,490` time series. The M5 competition is hierarchical. That is, forecasts are required for different levels of aggregation: national, state, store, etc.

## Experiment
@@ -72,7 +72,6 @@ Where the `RMSSE` is defined by,

The `wRMSSE` is the official metric used in the M5 competition.
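
For reference, the standard M5 definitions of these metrics (the formula lines themselves are collapsed out of this diff) are:

$$
\mathrm{RMSSE} = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h}\left(y_t-\hat{y}_t\right)^2}{\frac{1}{n-1}\sum_{t=2}^{n}\left(y_t-y_{t-1}\right)^2}},
\qquad
\mathrm{wRMSSE} = \sum_{i} w_i \,\mathrm{RMSSE}_i,
$$

where $n$ is the length of the train set, $h = 28$ is the forecast horizon, and the weight $w_i$ of each series is proportional to its dollar sales over the last 28 days of the train set.
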
Detailed results per dataset are shown below.
## Results
@@ -89,15 +88,15 @@ The following table shows the performance across all levels of the hierarchy:
### Time

Excluding time spent in the console and accounting only for processing and compute time, Amazon Forecast took 4.1 hours to run. In comparison, StatsForecast took just 15 minutes. Running time covers the end-to-end pipeline, including loading data, training, and forecasting.

### Cost

Amazon includes a cost calculator that is quite accurate. The estimated cost was 803.53 USD.

In comparison, we paid 1.2 USD in EC2-associated costs. (This could have been further reduced by using spot instances.)

Below, you can find the detailed results.

**Speed and Cost on the M5 Dataset**

@@ -115,6 +114,7 @@ This conclusion might or might not hold in other datasets, however, given the a priori

Although this experiment does not focus on comparing machine learning and deep learning vs statistical methods, it supports our [previous conclusions](https://github.com/Nixtla/statsforecast/tree/main/experiments/m3) on the current validity of simpler methods for many forecasting tasks.
## Unsolicited Advice
Choose your models wisely.
It would be extremely expensive and borderline irresponsible to favor AutoML in an organization before establishing solid baselines.
@@ -124,4 +124,5 @@ Simpler is sometimes better. Not everything that glows is gold.

Go and try other great open-source libraries like GluonTS, Darts and Sktime.
## Reproducibility

You can fully reproduce the experiment by following [this step-by-step notebook](https://nixtlaverse.nixtla.io/statsforecast/docs/experiments/AmazonStatsForecast).

# experiments/ces/README.md (6 additions, 7 deletions)

@@ -5,10 +5,11 @@ We are excited to release the only implementation for Python of the Complex Expo

The CES model has two main advantages over conventional exponential smoothing models:
8
+
8
9
* it can model and forecast both stationary and non-stationary processes and
* it can capture both level and trend cases.

Our implementation, optimized using numba, was tested on the M4 dataset (100k time series), achieving accuracy and computational times similar to those of the original implementation in R.

Additionally, with StatsForecast you can easily build ensembles of all statistical models. In this experiment, we show how the ensemble of ETS and CES gives the best results.

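
As a quick reference, here is a minimal sketch of fitting `AutoCES` alongside `AutoETS` and averaging them, assuming the current `statsforecast` API; the synthetic data, season length, and explicit aliases are illustrative choices, not the experiment's exact configuration:

```python
from statsforecast import StatsForecast
from statsforecast.models import AutoCES, AutoETS
from statsforecast.utils import generate_series

# Synthetic daily series stand in for the M4 data used in the experiment
df = generate_series(n_series=10, freq="D")

sf = StatsForecast(
    models=[AutoCES(season_length=7, alias="CES"), AutoETS(season_length=7, alias="ETS")],
    freq="D",
)
forecasts = sf.forecast(df=df, h=14)

# Simple average of the two models, mirroring the ETS+CES ensemble above
forecasts["CES-ETS"] = forecasts[["CES", "ETS"]].mean(axis=1)
```
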
@@ -26,19 +27,17 @@ Additionally, with StatsForecast you can easily build ensembles of all statistical

## References

* Check the StatsForecast [documentation](https://nixtlaverse.nixtla.io/statsforecast/models#class-autoces) on the CES model.
* [Link](https://forecasting.svetunkov.ru/wp-content/uploads/2022/07/Svetunkov-et-al.-2022-Complex-Exponential-Smoothing.pdf) to the original paper.
* [Link](https://forecasting.svetunkov.ru/en/2022/08/02/the-long-and-winding-road-the-story-of-complex-exponential-smoothing/) to the story behind the paper.

## Reproducibility

To reproduce the main results:

1. Execute `make init`.
2. Activate the environment using `conda activate ces`.
3. Run the experiments using `python -m src.ces --dataset M4 --group [group]`, where `[group]` can be `Hourly`, `Daily`, `Weekly`, `Monthly`, `Quarterly`, or `Yearly`.
4. Compute the ensemble model using `python -m src.ensemble --dataset M4 --group [group]`, where `[group]` takes the same values.
5. To run the R experiments, first prepare the data using `python -m src.data --dataset M4 --group [group]` for each `[group]`. Once that is done, run `make run_module module="Rscript src/ces_r.R [group]"`.
6. Finally, you can evaluate the forecasts using `make run_module module="python -m src.evaluation"`.