Commit 0cd5cc3

author
andre_ramos
committed
model name change and add more examples
1 parent 836ed46 commit 0cd5cc3

15 files changed

+352
-128
lines changed

README.md

Lines changed: 60 additions & 19 deletions
@@ -17,17 +17,15 @@ y = randn(100)
1717
output = StateSpaceLearning.fit_model(y)
1818

1919
#Main output options
20-
model_type = output.model_input # State Space Equivalent model utilized in the estimation (default = Basic Structural).
20+
model_input = output.model_input # Model inputs that were utilized to build the regression matrix.
21+
Create_X = output.Create_X # The function utilized to build the regression matrix.
2122
X = output.X # High Dimension Regression utilized in the estimation.
2223
coefs = output.coefs # High Dimension Regression coefficients estimated in the estimation.
2324
ϵ = output.ϵ # Residuals of the model.
2425
fitted = output.fitted # Fit in Sample of the model.
2526
components = output.components # Dictionary containing information about each component of the model, each component has the keys: "Values" (The value of the component in each timestamp) , "Coefs" (The coefficients estimated for each element of the component) and "Indexes" (The indexes of the elements of the component in the high dimension regression "X").
2627
residuals_variances = output.residuals_variances # Dictionary containing the estimated variances for the innovations components (that is the information that can be utilized to initialize the state space model).
27-
T = output.T # The length of the original time series.
28-
outlier = output.outlier # Boolean indicating the presence of outlier component (default = false).
2928
valid_indexes = output.valid_indexes # Vector containing valid indexes of the time series (non valid indexes represent NaN values in the time series).
30-
ζ_ω_threshold = output.ζ_ω_threshold # ζ_ω_threshold parameter (default = 0). A non 0 value for this parameter might be important in terms of forecast for some time series to lead to more stable predictions (we recommend ζ_ω_threshold = 11 for monthly series).
3129

3230
#Forecast
3331
prediction = StateSpaceLearning.forecast(output, 12) #Gets a 12 steps ahead prediction
@@ -37,11 +35,9 @@ prediction = StateSpaceLearning.forecast(output, 12) #Gets a 12 steps ahead pred
3735
## Fit Arguments
3836

3937
* `y::Vector{Fl}`: Vector of data.
40-
* `model_input::Dict`: Dictionary containing the model input parameters (default: Dict("level" => true, "stochastic_level" => true, "trend" => true, "stochastic_trend" => true, "seasonal" => true, "stochastic_seasonal" => true, "freq_seasonal" => 12)).
38+
* `model_input::Dict`: Dictionary containing the model input parameters (default: Dict("level" => true, "stochastic_level" => true, "trend" => true, "stochastic_trend" => true, "seasonal" => true, "stochastic_seasonal" => true, "freq_seasonal" => 12, "outlier" => true, "ζ_ω_threshold" => 12)).
4139
* `estimation_input::Dict`: Dictionary containing the estimation input parameters (default: Dict("α" => 0.1, "information_criteria" => "aic", ψ => 0.05, "penalize_exogenous" => true, "penalize_initial_states" => true)).
4240
* `Exogenous_X::Union{Matrix{Fl}, Missing}`: Exogenous variables matrix (default: missing).
43-
* `outlier::Bool`: Flag for considering outlier component (default: true).
44-
* `ζ_ω_threshold::Int64`: ζ_ω_threshold parameter (default: 12).
4541

4642
## Features
4743

@@ -51,9 +47,8 @@ Current features include:
5147
* Forecasting
5248
* Completion of missing values
5349
* Predefined models, including:
54-
* Basic Structural"
55-
* Local Linear Trend
56-
* Local Level
50+
* Outlier detection
51+
* Outlier robust models
5752

5853
## Quick Examples
5954

@@ -70,23 +65,49 @@ log_air_passengers = log.(airp.passengers)
7065
steps_ahead = 30
7166

7267
output = StateSpaceLearning.fit_model(log_air_passengers)
73-
prediction_raw = StateSpaceLearning.forecast(output, steps_ahead)
74-
prediction = exp.(prediction_raw)
68+
prediction_log = StateSpaceLearning.forecast(output, steps_ahead)
69+
prediction = exp.(prediction_log)
7570

7671
plot(airp.passengers, w=2 , color = "Black", lab = "Historical", legend = :outerbottom)
7772
plot!(vcat(ones(output.T).*NaN, prediction), lab = "Forecast", w=2, color = "blue")
7873

7974
```
8075
![quick_example_airp](./docs/assets/quick_example_airp.PNG)
8176

77+
### Component Extraction
78+
Quick example on how to perform component extraction in time series utilizing StateSpaceLearning.
79+
80+
```julia
81+
using CSV
82+
using DataFrames
83+
using Plots
84+
85+
airp = CSV.File(StateSpaceLearning.AIR_PASSENGERS) |> DataFrame
86+
log_air_passengers = log.(airp.passengers)
87+
88+
output = StateSpaceLearning.fit_model(log_air_passengers)
89+
90+
level = output.components["μ1"]["Values"] + output.components["ξ"]["Values"]
91+
slope = output.components["ν1"]["Values"] + output.components["ζ"]["Values"]
92+
seasonal = output.components["γ1"]["Values"] + output.components["ω"]["Values"]
93+
trend = level + slope
94+
95+
plot(trend, w=2 , color = "Black", lab = "Trend Component", legend = :outerbottom)
96+
plot(seasonal, w=2 , color = "Black", lab = "Seasonal Component", legend = :outerbottom)
97+
98+
```
99+
100+
| ![quick_example_trend](./docs/assets/trend.svg) | ![quick_example_seas](./docs/assets/seasonal.svg)|
101+
|:------------------------------:|:-----------------------------:|
102+
103+
82104
### Best Subset Selection
83105
Quick example on how to perform best subset selection in time series utilizing StateSpaceLearning.
84106

85107
```julia
86108
using StateSpaceLearning
87109
using CSV
88110
using DataFrames
89-
using Plots
90111
using Random
91112

92113
Random.seed!(2024)
@@ -98,18 +119,15 @@ X = rand(length(log_air_passengers), 10) # Create 10 exogenous features
98119

99120
y = log_air_passengers + X[:, 1:3]*β # add to the log_air_passengers series a contribution from only 3 exogenous features.
100121

101-
plot(y)
102-
103-
output = StateSpaceLearning.fit_model(y; Exogenous_X = X, estimation_input = Dict("α" => 1.0, "information_criteria" => "bic", "ϵ" => 0.05,
104-
"penalize_exogenous" => true, "penalize_initial_states" => true))
122+
output = StateSpaceLearning.fit_model(y; Exogenous_X = X, estimation_input = Dict("α" => 1.0, "information_criteria" => "bic", "ϵ" => 0.05, "penalize_exogenous" => true, "penalize_initial_states" => true))
105123

106124
Selected_exogenous = output.components["Exogenous_X"]["Selected"]
107125

108126
```
109127

110128
In this example, the selected exogenous features were 1, 2, 3, as expected.
111129

112-
### Completion of missing values
130+
### Missing values imputation
113131
Quick example of completion of missing values for the air passengers time-series (artificial NaN values are added to the original time-series).
114132

115133
```julia
@@ -135,6 +153,30 @@ plot!(fitted_completed_missing_values, lab = "Fit in Sample completed values", w
135153
```
136154
![quick_example_completion_airp](./docs/assets/quick_example_completion_airp.PNG)
137155

156+
### Outlier Detection
157+
Quick example of outlier detection for an altered air passengers time-series (artificial outliers are added to the original time-series).
158+
159+
```julia
160+
using CSV
161+
using DataFrames
162+
using Plots
163+
164+
airp = CSV.File(StateSpaceLearning.AIR_PASSENGERS) |> DataFrame
165+
log_air_passengers = log.(airp.passengers)
166+
167+
log_air_passengers[60] = 10
168+
log_air_passengers[30] = 1
169+
log_air_passengers[100] = 2
170+
171+
output = StateSpaceLearning.fit_model(log_air_passengers)
172+
detected_outliers = findall(i -> i != 0, output.components["o"]["Coefs"])
173+
174+
plot(log_air_passengers, w=2 , color = "Black", lab = "Historical", legend = :outerbottom)
175+
scatter!(detected_outliers, log_air_passengers[detected_outliers], lab = "Detected Outliers")
176+
177+
```
178+
![quick_example_outlier](./docs/assets/outlier.svg)
179+
138180
### StateSpaceModels initialization
139181
Quick example on how to use StateSpaceLearning to initialize StateSpaceModels
140182

@@ -168,7 +210,6 @@ To reproduce M4 paper results you can clone the repository and run the following
168210
```shell
169211
julia paper_tests/m4_test/m4_test.jl
170212
python paper_tests/m4_test/m4_test.py
171-
1
172213
```
173214

174215
The results for SSL model in terms of MASE and sMAPE for all 48000 series will be stored in folder "paper_tests/m4_test/results_SSL". The average results of MASE, sMAPE and OWA will be saved in file "paper_tests/m4_test/metric_results/SSL_METRICS_RESULTS.csv".
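
With this commit, `outlier` and `ζ_ω_threshold` are no longer separate fit arguments; they are keys of `model_input`. A minimal sketch of overriding one of them while keeping the other documented defaults is shown below. The `merge` pattern and the variable names are illustrative, and passing `model_input` as a keyword to `fit_model` is an assumption based on how `Exogenous_X` and `estimation_input` are passed in the examples in this diff.

```julia
# Defaults as documented under "Fit Arguments" after this commit.
default_model_input = Dict(
    "level" => true, "stochastic_level" => true,
    "trend" => true, "stochastic_trend" => true,
    "seasonal" => true, "stochastic_seasonal" => true,
    "freq_seasonal" => 12, "outlier" => true, "ζ_ω_threshold" => 12,
)

# Override only the keys that differ; `merge` keeps every other default
# and leaves `default_model_input` itself untouched.
model_input = merge(default_model_input, Dict("outlier" => false))

# Assumed call shape (requires StateSpaceLearning, so it is left commented out):
# output = StateSpaceLearning.fit_model(y; model_input = model_input)
```

Building the dictionary from the full set of defaults avoids relying on how the package treats a partially specified `model_input`, which this diff does not state.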

docs/assets/outlier.svg

Lines changed: 49 additions & 0 deletions

docs/assets/seasonal.svg

Lines changed: 45 additions & 0 deletions

docs/assets/trend.svg

Lines changed: 45 additions & 0 deletions

docs/src/adapting_package.md

Lines changed: 6 additions & 4 deletions
@@ -14,11 +14,12 @@ model_input = Dict()
1414
```
1515

1616
### create_X
17-
The create_X function constructs the matrices in the State Space Learning format. It must accept the following inputs: (model_input::Dict, Exogenous_X::Matrix{Fl}, outlier::Bool, ζ_ω_threshold::Int64, T::Int64, steps_ahead::Int64=0, Exogenous_Forecast::Matrix{Fl}=zeros(steps_ahead, size(Exogenous_X, 2))). This function may not use parameters such as outlier, ζ_ω_threshold, or Exogenous_X. It must return a matrix.
17+
The create_X function constructs the matrices in the State Space Learning format. It must accept the following inputs: (model_input::Dict, Exogenous_X::Matrix{Fl}, steps_ahead::Int64=0, Exogenous_Forecast::Matrix{Fl}). It must return a matrix.
1818

1919
```julia
20-
function create_X_LocalLevel(model_input::Dict, Exogenous_X::Matrix{Fl}, outlier::Bool, ζ_ω_threshold::Int64, T::Int64,
20+
function create_X_LocalLevel(model_input::Dict, Exogenous_X::Matrix{Fl},
2121
steps_ahead::Int64=0, Exogenous_Forecast::Matrix{Fl}=zeros(steps_ahead, size(Exogenous_X, 2))) where Fl
22+
T = size(Exogenous_X, 1)
2223
initial_states_matrix = ones(T+steps_ahead, 1)
2324
ξ_matrix = Matrix{Float64}(undef, T+steps_ahead, T - 1)
2425
for t in 1:T+steps_ahead
@@ -30,10 +31,11 @@ end
3031
```
3132

3233
### get_components_indexes
33-
The get_components_indexes function outputs a dictionary indicating the indexes of each model component, including a set of indexes for all initial states. For the Local Level Model, the only components are the initial state μ1 and its innovations ξ. The function must accept the following inputs: (T::Int64, Exogenous_X::Matrix{Fl}, model_input::Dict, outlier::Bool, ζ_ω_threshold::Int64). This function may not use parameters such as outlier, ζ_ω_threshold, or Exogenous_X. It must return a dictionary.
34+
The get_components_indexes function outputs a dictionary indicating the indexes of each model component, including a set of indexes for all initial states. For the Local Level Model, the only components are the initial state μ1 and its innovations ξ. The function must accept the following inputs: (Exogenous_X::Matrix{Fl}, model_input::Dict). It must return a dictionary.
3435

3536
```julia
36-
function get_components_indexes_LocalLevel(T::Int64, Exogenous_X::Matrix{Fl}, model_input::Dict, outlier::Bool, ζ_ω_threshold::Int64)::Dict where Fl
37+
function get_components_indexes_LocalLevel(Exogenous_X::Matrix{Fl}, model_input::Dict)::Dict where Fl
38+
T = size(Exogenous_X, 1)
3739
μ1_indexes = [1]
3840
initial_states_indexes = [1]
3941
ξ_indexes = collect(2:T)
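
The two hunks above are cut off by the diff context, so the following is a stand-alone sketch of what a Local Level regression matrix and its index dictionary can look like, not the package's implementation. The function names `local_level_matrix` and `local_level_indexes`, the dictionary keys, and the choice to pass `T` directly (instead of deriving it from `size(Exogenous_X, 1)` as the new signatures do) are assumptions made so the example runs without StateSpaceLearning.

```julia
# Hypothetical stand-alone helpers (not the package's own functions): they take T
# directly so the sketch runs without StateSpaceLearning or an exogenous matrix.

# Regression matrix for a Local Level model: column 1 is the initial state μ1,
# columns 2:T are the innovations ξ_2, ..., ξ_T.
function local_level_matrix(T::Int, steps_ahead::Int=0)
    X = zeros(T + steps_ahead, T)
    X[:, 1] .= 1.0                      # μ1 contributes at every timestamp
    for t in 1:(T + steps_ahead), j in 2:T
        X[t, j] = t >= j ? 1.0 : 0.0    # ξ_j contributes from time j onwards
    end
    return X
end

# Column indexes of each component (key names are illustrative).
function local_level_indexes(T::Int)
    return Dict("μ1" => [1], "ξ" => collect(2:T), "initial_states" => [1])
end

X = local_level_matrix(5, 2)   # 5 in-sample rows plus 2 forecast rows
idx = local_level_indexes(5)
```

Every forecast row is a row of ones, since all in-sample innovations have already accumulated into the level by time `T`. The index dictionary covers exactly the columns of `X`.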

docs/src/manual.md

Lines changed: 60 additions & 20 deletions
@@ -4,7 +4,6 @@
44
|:-----------------:|:-----------------:|:-----------------:|
55
| [![ci](https://github.com/LAMPSPUC/StateSpaceLearning.jl/actions/workflows/ci.yml/badge.svg)](https://github.com/LAMPSPUC/StateSpaceLearning.jl/actions/workflows/ci.yml) | [![codecov](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning.jl/graph/badge.svg?token=VDpuXvPSI2)](https://codecov.io/gh/LAMPSPUC/StateSpaceLearning.jl) | [![](https://img.shields.io/badge/docs-latest-blue.svg)](https://lampspuc.github.io/StateSpaceLearning.jl/latest/)
66

7-
87
StateSpaceLearning.jl is a package for modeling and forecasting time series in a high-dimension regression framework.
98

109
## Quickstart
@@ -18,17 +17,15 @@ y = randn(100)
1817
output = StateSpaceLearning.fit_model(y)
1918

2019
#Main output options
21-
model_type = output.model_input # State Space Equivalent model utilized in the estimation (default = Basic Structural).
20+
model_input = output.model_input # Model inputs that were utilized to build the regression matrix.
21+
Create_X = output.Create_X # The function utilized to build the regression matrix.
2222
X = output.X # High Dimension Regression utilized in the estimation.
2323
coefs = output.coefs # High Dimension Regression coefficients estimated in the estimation.
2424
ϵ = output.ϵ # Residuals of the model.
2525
fitted = output.fitted # Fit in Sample of the model.
2626
components = output.components # Dictionary containing information about each component of the model, each component has the keys: "Values" (The value of the component in each timestamp) , "Coefs" (The coefficients estimated for each element of the component) and "Indexes" (The indexes of the elements of the component in the high dimension regression "X").
2727
residuals_variances = output.residuals_variances # Dictionary containing the estimated variances for the innovations components (that is the information that can be utilized to initialize the state space model).
28-
T = output.T # The length of the original time series.
29-
outlier = output.outlier # Boolean indicating the presence of outlier component (default = false).
3028
valid_indexes = output.valid_indexes # Vector containing valid indexes of the time series (non valid indexes represent NaN values in the time series).
31-
ζ_ω_threshold = output.ζ_ω_threshold # ζ_ω_threshold parameter (default = 0). A non 0 value for this parameter might be important in terms of forecast for some time series to lead to more stable predictions (we recommend ζ_ω_threshold = 11 for monthly series).
3229

3330
#Forecast
3431
prediction = StateSpaceLearning.forecast(output, 12) #Gets a 12 steps ahead prediction
@@ -38,11 +35,9 @@ prediction = StateSpaceLearning.forecast(output, 12) #Gets a 12 steps ahead pred
3835
## Fit Arguments
3936

4037
* `y::Vector{Fl}`: Vector of data.
41-
* `model_input::Dict`: Dictionary containing the model input parameters (default: Dict("level" => true, "stochastic_level" => true, "trend" => true, "stochastic_trend" => true, "seasonal" => true, "stochastic_seasonal" => true, "freq_seasonal" => 12)).
38+
* `model_input::Dict`: Dictionary containing the model input parameters (default: Dict("level" => true, "stochastic_level" => true, "trend" => true, "stochastic_trend" => true, "seasonal" => true, "stochastic_seasonal" => true, "freq_seasonal" => 12, "outlier" => true, "ζ_ω_threshold" => 12)).
4239
* `estimation_input::Dict`: Dictionary containing the estimation input parameters (default: Dict("α" => 0.1, "information_criteria" => "aic", ψ => 0.05, "penalize_exogenous" => true, "penalize_initial_states" => true)).
4340
* `Exogenous_X::Union{Matrix{Fl}, Missing}`: Exogenous variables matrix (default: missing).
44-
* `outlier::Bool`: Flag for considering outlier component (default: true).
45-
* `ζ_ω_threshold::Int64`: ζ_ω_threshold parameter (default: 12).
4641

4742
## Features
4843

@@ -52,9 +47,8 @@ Current features include:
5247
* Forecasting
5348
* Completion of missing values
5449
* Predefined models, including:
55-
* Basic Structural"
56-
* Local Linear Trend
57-
* Local Level
50+
* Outlier detection
51+
* Outlier robust models
5852

5953
## Quick Examples
6054

@@ -71,23 +65,49 @@ log_air_passengers = log.(airp.passengers)
7165
steps_ahead = 30
7266

7367
output = StateSpaceLearning.fit_model(log_air_passengers)
74-
prediction_raw = StateSpaceLearning.forecast(output, steps_ahead)
75-
prediction = exp.(prediction_raw)
68+
prediction_log = StateSpaceLearning.forecast(output, steps_ahead)
69+
prediction = exp.(prediction_log)
7670

7771
plot(airp.passengers, w=2 , color = "Black", lab = "Historical", legend = :outerbottom)
7872
plot!(vcat(ones(output.T).*NaN, prediction), lab = "Forecast", w=2, color = "blue")
7973

8074
```
8175
![quick_example_airp](./docs/assets/quick_example_airp.PNG)
8276

77+
### Component Extraction
78+
Quick example on how to perform component extraction in time series utilizing StateSpaceLearning.
79+
80+
```julia
81+
using CSV
82+
using DataFrames
83+
using Plots
84+
85+
airp = CSV.File(StateSpaceLearning.AIR_PASSENGERS) |> DataFrame
86+
log_air_passengers = log.(airp.passengers)
87+
88+
output = StateSpaceLearning.fit_model(log_air_passengers)
89+
90+
level = output.components["μ1"]["Values"] + output.components["ξ"]["Values"]
91+
slope = output.components["ν1"]["Values"] + output.components["ζ"]["Values"]
92+
seasonal = output.components["γ1"]["Values"] + output.components["ω"]["Values"]
93+
trend = level + slope
94+
95+
plot(trend, w=2 , color = "Black", lab = "Trend Component", legend = :outerbottom)
96+
plot(seasonal, w=2 , color = "Black", lab = "Seasonal Component", legend = :outerbottom)
97+
98+
```
99+
100+
| ![quick_example_trend](./docs/assets/trend.svg) | ![quick_example_seas](./docs/assets/seasonal.svg)|
101+
|:------------------------------:|:-----------------------------:|
102+
103+
83104
### Best Subset Selection
84105
Quick example on how to perform best subset selection in time series utilizing StateSpaceLearning.
85106

86107
```julia
87108
using StateSpaceLearning
88109
using CSV
89110
using DataFrames
90-
using Plots
91111
using Random
92112

93113
Random.seed!(2024)
@@ -99,18 +119,15 @@ X = rand(length(log_air_passengers), 10) # Create 10 exogenous features
99119

100120
y = log_air_passengers + X[:, 1:3]*β # add to the log_air_passengers series a contribution from only 3 exogenous features.
101121

102-
plot(y)
103-
104-
output = StateSpaceLearning.fit_model(y; Exogenous_X = X, estimation_input = Dict("α" => 1.0, "information_criteria" => "bic", "ϵ" => 0.05,
105-
"penalize_exogenous" => true, "penalize_initial_states" => true))
122+
output = StateSpaceLearning.fit_model(y; Exogenous_X = X, estimation_input = Dict("α" => 1.0, "information_criteria" => "bic", "ϵ" => 0.05, "penalize_exogenous" => true, "penalize_initial_states" => true))
106123

107124
Selected_exogenous = output.components["Exogenous_X"]["Selected"]
108125

109126
```
110127

111128
In this example, the selected exogenous features were 1, 2, 3, as expected.
112129

113-
### Completion of missing values
130+
### Missing values imputation
114131
Quick example of completion of missing values for the air passengers time-series (artificial NaN values are added to the original time-series).
115132

116133
```julia
@@ -136,6 +153,30 @@ plot!(fitted_completed_missing_values, lab = "Fit in Sample completed values", w
136153
```
137154
![quick_example_completion_airp](./docs/assets/quick_example_completion_airp.PNG)
138155

156+
### Outlier Detection
157+
Quick example of outlier detection for an altered air passengers time-series (artificial outliers are added to the original time-series).
158+
159+
```julia
160+
using CSV
161+
using DataFrames
162+
using Plots
163+
164+
airp = CSV.File(StateSpaceLearning.AIR_PASSENGERS) |> DataFrame
165+
log_air_passengers = log.(airp.passengers)
166+
167+
log_air_passengers[60] = 10
168+
log_air_passengers[30] = 1
169+
log_air_passengers[100] = 2
170+
171+
output = StateSpaceLearning.fit_model(log_air_passengers)
172+
detected_outliers = findall(i -> i != 0, output.components["o"]["Coefs"])
173+
174+
plot(log_air_passengers, w=2 , color = "Black", lab = "Historical", legend = :outerbottom)
175+
scatter!(detected_outliers, log_air_passengers[detected_outliers], lab = "Detected Outliers")
176+
177+
```
178+
![quick_example_outlier](./docs/assets/outlier.svg)
179+
139180
### StateSpaceModels initialization
140181
Quick example on how to use StateSpaceLearning to initialize StateSpaceModels
141182

@@ -169,7 +210,6 @@ To reproduce M4 paper results you can clone the repository and run the following
169210
```shell
170211
julia paper_tests/m4_test/m4_test.jl
171212
python paper_tests/m4_test/m4_test.py
172-
1
173213
```
174214

175215
The results for SSL model in terms of MASE and sMAPE for all 48000 series will be stored in folder "paper_tests/m4_test/results_SSL". The average results of MASE, sMAPE and OWA will be saved in file "paper_tests/m4_test/metric_results/SSL_METRICS_RESULTS.csv".
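
The Outlier Detection example added in this file reads the detected positions off the nonzero coefficients of the outlier component `"o"`. The selection step can be tried in isolation; the coefficient values below are made up for illustration and stand in for `output.components["o"]["Coefs"]`, they are not package output.

```julia
# Illustrative stand-in for output.components["o"]["Coefs"]: one coefficient per
# timestamp, zero wherever no outlier was selected.
outlier_coefs = [0.0, 0.0, 2.3, 0.0, -1.1, 0.0]

# `!iszero` is equivalent to the anonymous function `i -> i != 0` used in the example.
detected_outliers = findall(!iszero, outlier_coefs)

println(detected_outliers)   # prints [3, 5]
```

The result is a plain vector of timestamps, which can be used both to index the series and as the x-coordinates of a scatter plot.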
