[SPARK-53803][ML][Feature] Added ArimaRegression for time series forecasting in MLlib #52519
What changes were proposed in this pull request?
This pull request adds a new ArimaRegression estimator to Spark MLlib under org.apache.spark.ml.regression.
It introduces the ARIMA (AutoRegressive Integrated Moving Average) model for univariate time series forecasting, together with the corresponding model class, ArimaRegressionModel.
The change includes (a short usage sketch follows the list):
Scala code for ArimaRegression and ArimaRegressionModel
Support for the ARIMA orders p (autoregressive), d (differencing), and q (moving average)
PySpark API bindings for both classes
Unit tests in Scala and Python
Model save/load support using MLWritable and MLReadable
Example usage in examples/ml/ArimaRegressionExample.scala
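To tie the items above together, here is a minimal, hedged sketch of the intended usage, including the save/load support; the setters, class names, and the "value" column match this PR's description, while the save path and the ArimaRegressionModel.load reader follow the usual Spark ML MLWritable/MLReadable conventions and are assumptions about this implementation.
import org.apache.spark.ml.regression.{ArimaRegression, ArimaRegressionModel}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ArimaApiSketch").getOrCreate()
import spark.implicits._

// Univariate series as a single-column DataFrame, matching the examples below.
val series = Seq(100.0, 102.5, 101.0, 104.0, 107.5, 110.0).toDF("value")

// ARIMA(p, d, q): p = autoregressive order, d = degree of differencing,
// q = moving-average order.
val arima = new ArimaRegression().setP(1).setD(1).setQ(1)
val model = arima.fit(series)

// Save/load round trip via MLWritable/MLReadable (path is illustrative,
// and the companion-object load reader is assumed).
model.write.overwrite().save("/tmp/arima-model")
val restored = ArimaRegressionModel.load("/tmp/arima-model")
restored.transform(series).show(false)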
Why are the changes needed?
Spark MLlib currently has no built-in support for time series forecasting.
ARIMA is one of the most widely used models for forecasting trends in time series data.
Adding it lets Spark users run forecasts directly within MLlib, without relying on external libraries, and makes Spark's machine learning toolkit more complete.
Does this PR introduce any user-facing change?
Yes.
New APIs are available in both Scala and Python:
org.apache.spark.ml.regression.ArimaRegression
org.apache.spark.ml.regression.ArimaRegressionModel
pyspark.ml.regression.ArimaRegression
pyspark.ml.regression.ArimaRegressionModel
Both classes follow the standard Spark ML Estimator/Model conventions and work with Pipelines, ParamMaps, save/load, and transform(), as sketched below.
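As a hedged illustration of that integration: the Pipeline and ParamMap calls below are standard Spark ML APIs, while the param accessors arima.p and arima.q are assumed to follow the usual Spark ML naming and are not stated explicitly in this PR.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.regression.ArimaRegression
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("ArimaPipelineSketch").getOrCreate()
import spark.implicits._

val series = Seq(100.0, 102.5, 101.0, 104.0, 107.5, 110.0).toDF("value")
val arima = new ArimaRegression().setP(1).setD(1).setQ(1)

// Used as an ordinary PipelineStage.
val pipeline = new Pipeline().setStages(Array(arima))
val pipelineModel = pipeline.fit(series)
pipelineModel.transform(series).show(false)

// Overriding params at fit time via a ParamMap (param accessor names assumed).
val overrides = ParamMap(arima.p -> 2, arima.q -> 0)
val altModel = arima.fit(series, overrides)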
How was this patch tested?
Tests were added in:
Scala (ArimaRegressionSuite.scala), covering:
Model fitting and transforming
Parameter defaults and setters
Save/load round trips
Python (test_regression.py), covering the PySpark bindings
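For reference, here is a rough sketch of the kind of checks the Scala suite covers; the getter names (getP/getD/getQ) and the bare AnyFunSuite setup are assumptions, and the actual suite presumably builds on Spark's shared ML test helpers instead.
import org.apache.spark.ml.regression.ArimaRegression
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class ArimaRegressionSketchSuite extends AnyFunSuite {

  private lazy val spark = SparkSession.builder()
    .master("local[2]").appName("ArimaRegressionSketchSuite").getOrCreate()

  test("parameter setters and getters") {
    // Getter names assumed to follow standard Spark ML param conventions.
    val arima = new ArimaRegression().setP(2).setD(1).setQ(1)
    assert(arima.getP === 2)
    assert(arima.getD === 1)
    assert(arima.getQ === 1)
  }

  test("fit and transform produce one prediction per input row") {
    import spark.implicits._
    val df = Seq(100.0, 102.5, 101.0, 104.0, 107.5, 110.0).toDF("value")
    val model = new ArimaRegression().setP(1).setD(1).setQ(1).fit(df)
    assert(model.transform(df).count() === df.count())
  }
}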
Manual testing was also done in both spark-shell (Scala) and pyspark (Python), using the snippets below.
Scala:
import org.apache.spark.ml.regression.ArimaRegression
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder().appName("ArimaRegressionExample").getOrCreate()
import spark.implicits._
// Univariate series as a single-column DataFrame.
val data = Seq(100.0, 102.5, 101.0, 104.0, 107.5, 110.0).toDF("value")
// ARIMA(1, 1, 1): first-order AR, one level of differencing, first-order MA.
val arima = new ArimaRegression()
  .setP(1)
  .setD(1)
  .setQ(1)
val model = arima.fit(data)
val forecast = model.transform(data)
forecast.show(false)
Python:
from pyspark.ml.regression import ArimaRegression
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("ArimaRegressionExample").getOrCreate()
# Univariate series as a single-column DataFrame.
data = [(100.0,), (102.5,), (101.0,), (104.0,), (107.5,), (110.0,)]
df = spark.createDataFrame(data, ["value"])
# ARIMA(1, 1, 1): first-order AR, one level of differencing, first-order MA.
arima = ArimaRegression(p=1, d=1, q=1)
model = arima.fit(df)
forecast = model.transform(df)
forecast.show(truncate=False)
The predictions and the output schema were checked manually for correctness.
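A hedged sketch of what that check can look like, continuing the Scala snippet above; the "prediction" column name assumes the standard Spark ML predictionCol default, which this PR does not state explicitly.
// Continuing the Scala example above: the output should keep the input column
// and add a prediction column (name assumed from the predictionCol default).
forecast.printSchema()
assert(forecast.columns.contains("value"))
assert(forecast.columns.contains("prediction"))
assert(forecast.count() == data.count())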
Was this patch authored or co-authored using generative AI tooling?
No.