Never mind -- I had a couple of issues that I've since fixed. First, I set SageMaker to no parallelism whatsoever; loading the model multiple times was eating up too much memory. Second, SageMaker was complaining that my input data was >100 MB. Since I don't actually need my "training data" to get a forecast, I "tricked" SageMaker: I pointed it at a small, completely irrelevant file and never loaded it in my transform_fn().
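For anyone wanting to replicate the trick: here's a minimal sketch of what the inference script could look like, assuming the SageMaker inference toolkit's `model_fn`/`transform_fn` interface. The horizon env var `FORECAST_HORIZON` is a hypothetical name I chose for illustration; the placeholder input file is simply never read.

```python
# Sketch of a SageMaker batch-transform inference script (hypothetical
# details): the transform job's input is a tiny, irrelevant placeholder
# file, and transform_fn() ignores it -- the pickled model alone
# produces the forecasts.
import os
import pickle

# Hypothetical env var for the forecast horizon; not part of SageMaker.
H = int(os.environ.get("FORECAST_HORIZON", "28"))


def model_fn(model_dir):
    """Load the pickled StatsForecast model once per worker."""
    with open(f"{model_dir}/sf.pkl", "rb") as f:
        model = pickle.load(f)
    model.n_jobs = 1  # no parallelism: avoid duplicate copies in memory
    return model


def transform_fn(model, request_body, content_type, accept):
    """Ignore request_body (the placeholder file) and just predict."""
    forecast_df = model.predict(h=H)
    return forecast_df.to_csv(index=False), "text/csv"
```

The payload passed to `transform_fn` never touches the model, which is what keeps the job under the input-size limit.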

Once I fixed these, this worked, producing all the forecasts very quickly:

from statsforecast import StatsForecast

# Load the model pickled at training time
model = StatsForecast.load(path=f'{model_dir}/sf.pkl')
model.n_jobs = 1  # disable parallelism so the model isn't loaded multiple times
forecast_df = model.predict(h=h)

A cleaner way would probably be to use a SageMaker Processing job, and maybe I'll switch over…

Answer selected by benhorvath