Input contains NaN for a non NaN data #573

@tifa64

Describe the question you have

Hello maintainers, I would like to understand why the following happens. I have this time series:

import pandas as pd
data = {
    'date': pd.date_range(start='2023-01-01', periods=10, freq='MS'),
    'value': [1, 3, 3, 4, 3, 2, 1, 1, 3, 2]
}
df = pd.DataFrame(data)
df.set_index('date', inplace=True)

which yields this time series:

            value
date             
2023-01-01      1
2023-02-01      3
2023-03-01      3
2023-04-01      4
2023-05-01      3
2023-06-01      2
2023-07-01      1
2023-08-01      1
2023-09-01      3
2023-10-01      2

When I fit the model:

from pmdarima import auto_arima

fitted_model = auto_arima(
    y=df['value'],
    max_iter=15,
    max_d=1,
    method='nm',
    seasonal=False)
fitted_model

it returns this model:

ARIMA(2,0,2)(0,0,0)[0]          

Then I try to predict

fitted_model.predict(
    n_periods=2,
    return_conf_int=False)

and it raises the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In [1047], line 1
----> 1 fitted_model.predict(
      2                     n_periods=2,
      3                     return_conf_int=False)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/pmdarima/arima/arima.py:791, in ARIMA.predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
    788 arima = self.arima_res_
    789 end = arima.nobs + n_periods - 1
--> 791 f, conf_int = _seasonal_prediction_with_confidence(
    792     arima_res=arima,
    793     start=arima.nobs,
    794     end=end,
    795     X=X,
    796     alpha=alpha)
    798 if return_conf_int:
    799     # The confidence intervals may be a Pandas frame if it comes from
    800     # SARIMAX & we want Numpy. We will to duck type it so we don't add
    801     # new explicit requirements for the package
    802     return f, check_array(conf_int, force_all_finite=False)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/pmdarima/arima/arima.py:203, in _seasonal_prediction_with_confidence(arima_res, start, end, X, alpha, **kwargs)
    199     conf_int[:, 0] = f - q * np.sqrt(var)
    200     conf_int[:, 1] = f + q * np.sqrt(var)
    202 return check_endog(f, dtype=None, copy=False), \
--> 203     check_array(conf_int, copy=False, dtype=None)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/sklearn/utils/validation.py:899, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    893         raise ValueError(
    894             "Found array with dim %d. %s expected <= 2."
    895             % (array.ndim, estimator_name)
    896         )
    898     if force_all_finite:
--> 899         _assert_all_finite(
    900             array,
    901             input_name=input_name,
    902             estimator_name=estimator_name,
    903             allow_nan=force_all_finite == "allow-nan",
    904         )
    906 if ensure_min_samples > 0:
    907     n_samples = _num_samples(array)

File ~/cluster-env/clonedenv/lib/python3.10/site-packages/sklearn/utils/validation.py:146, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    124         if (
    125             not allow_nan
    126             and estimator_name
   (...)
    130             # Improve the error message on how to handle missing values in
    131             # scikit-learn.
    132             msg_err += (
    133                 f"\n{estimator_name} does not accept missing values"
    134                 " encoded as NaN natively. For supervised learning, you might want"
   (...)
    144                 "#estimators-that-handle-nan-values"
    145             )
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input contains NaN.
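
The traceback fails inside `check_array(conf_int, ...)`, which suggests the non-finite values are in the confidence-interval array rather than in the input series. A minimal sketch for checking where NaN shows up, using only the standard library; `non_finite_rows` is a hypothetical helper (not part of pmdarima or scikit-learn), and the interval values below are made up purely for illustration:

```python
import math


def non_finite_rows(rows):
    """Return the indices of rows that contain NaN or +/-inf.

    Hypothetical helper, not part of pmdarima or scikit-learn. `rows` is any
    iterable of numeric sequences, e.g. a list of [lower, upper] pairs.
    """
    bad = []
    for i, row in enumerate(rows):
        if any(not math.isfinite(float(v)) for v in row):
            bad.append(i)
    return bad


# The series from the report, one value per row: every entry is finite.
values = [1, 3, 3, 4, 3, 2, 1, 1, 3, 2]
input_bad = non_finite_rows([[v] for v in values])

# Made-up interval bounds, only to show what the helper reports.
example_intervals = [[0.5, 3.5], [float("nan"), float("nan")]]
interval_bad = non_finite_rows(example_intervals)

print(input_bad)     # []
print(interval_bad)  # [1]
```

Since the input passes this check, the NaN would have to arise during fitting or forecasting. The traceback shows `predict` reading `self.arima_res_`, so inspecting that object on the fitted model may be a reasonable next step; this is a suggestion, not a confirmed diagnosis.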

However, when I add one more data point,

data = {
    'date': pd.date_range(start='2023-01-01', periods=11, freq='MS'),
    'value': [1, 3, 3, 4, 3, 2, 1, 1, 3, 2, 2]
}

or when I change the values to these,

data = {
    'date': pd.date_range(start='2023-01-01', periods=10, freq='MS'),
    'value': [5, 8, 11, 4, 6, 6, 6, 5, 6, 9]
}

or when I set the seasonal parameter to True for exactly the same data, the model returned is ARIMA(0,0,0)(0,0,0)[0] intercept and the predictions work without errors.


Another workaround is to put a guardrail on the maximum p, q, and d, capping each at 1; this also works.
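
A minimal sketch of that guardrail. `max_p`, `max_d`, `max_q`, `start_p`, and `start_q` are `auto_arima` parameters; `guarded_search_kwargs` and `fit_guarded` are hypothetical names used only for illustration. Lowering `start_p` and `start_q` along with the caps is an assumption on my part, made because the starting orders should not exceed the maximums. Whether capping the orders addresses the root cause of the NaN is not established here.

```python
def guarded_search_kwargs(max_order=1):
    """Build keyword arguments that cap the non-seasonal ARIMA orders.

    Hypothetical helper; the keys are auto_arima parameter names.
    """
    return {
        "start_p": min(2, max_order),  # assumption: keep start <= max
        "start_q": min(2, max_order),
        "max_p": max_order,
        "max_d": max_order,
        "max_q": max_order,
        "method": "nm",       # same optimizer as in the report
        "seasonal": False,    # same setting as in the report
    }


def fit_guarded(y, max_order=1):
    """Fit auto_arima with capped orders (requires pmdarima to be installed)."""
    # Imported lazily so guarded_search_kwargs can be used without pmdarima.
    from pmdarima import auto_arima
    return auto_arima(y=y, **guarded_search_kwargs(max_order))


kwargs = guarded_search_kwargs()
print(kwargs["max_p"], kwargs["max_d"], kwargs["max_q"])  # 1 1 1
```

With the data frame from above, usage would be `fit_guarded(df['value']).predict(n_periods=2)`.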

Can you help me understand why this happens? Is placing a guardrail the correct way to fix this?

Thank you in advance :)

Here is a video of a cute Otter as a digital bribe: https://www.youtube.com/watch?v=8O8iEz2p7rQ
Can you help me understand this behaviour?

Versions (if necessary)

System:
    python: 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
executable: /home/trusted-service-user/cluster-env/clonedenv/bin/python
   machine: Linux-4.15.0-1174-azure-x86_64-with-glibc2.27

Python dependencies:
        pip: 23.3
 setuptools: 65.5.1
    sklearn: 1.1.3
statsmodels: 0.14.0
      numpy: 1.23.4
      scipy: 1.10.1
     Cython: 0.29.32
     pandas: 1.5.3
     joblib: 1.3.2
   pmdarima: 1.8.5
Linux-4.15.0-1174-azure-x86_64-with-glibc2.27
Python 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:35:26) [GCC 10.4.0]
pmdarima 1.8.5
NumPy 1.23.4
SciPy 1.10.1
Scikit-Learn 1.1.3
Statsmodels 0.14.0
/home/trusted-service-user/cluster-env/clonedenv/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
