Commit dbfd38b

[BUG] Align TimeXer v2 endogenous/exogenous usage with tslib metadata (#2009)
## Summary

This PR makes the TimeXer v2 implementation consistent with the v2 / `tslib` design by removing the duplicated configuration of endogenous and exogenous variables inside `TimeXer._forecast`. Instead of re-selecting series using `self.endogenous_vars` / `self.exogenous_vars` on top of the `tslib` metadata, the model now relies solely on the tensors provided by the data pipeline (`history_target` and `history_cont`). This implements **option 1** discussed in #2003 ("not overriding or passing twice").

Fixes #2003

## Motivation / Context

In v2, feature configuration is intended to be described by the `metadata` produced by `TslibDataModule` and consumed by `TslibBaseModel`. TimeXer v2 currently has:

- feature names and indices described in `metadata`
- *and* additional `endogenous_vars` / `exogenous_vars` kwargs that are used in `_forecast` to re-select columns from `history_cont`

This leads to two different places where the endogenous / exogenous split can be defined, which is exactly the concern raised in #2003. The maintainers confirmed that option 1 is preferred: relying on the metadata / data pipeline only, i.e. not overriding or passing the configuration twice.

## What this PR changes

In `pytorch_forecasting/models/timexer/_timexer_v2.py`:

- `TimeXer._forecast` no longer uses `self.endogenous_vars` or `self.exogenous_vars` to re-select columns from `history_cont`.
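To make the duplication concrete: the metadata already names the continuous features (the test fixtures access them via `basic_metadata["feature_names"]["continuous"]`), and the kwargs then re-derived column indices from that same list. A minimal sketch of the old kwarg-based path, using hypothetical feature names:

```python
# Sketch of the duplicated configuration path (feature names are hypothetical,
# chosen only for illustration; the real names come from TslibDataModule).
feature_names = {"continuous": ["load", "temp", "humidity"]}

# Old kwarg-based path: re-derive column indices from the same metadata that
# already describes the feature layout, so the endogenous/exogenous split
# could be defined in two places at once.
exogenous_vars = ["temp", "humidity"]
exogenous_indices = [feature_names["continuous"].index(v) for v in exogenous_vars]
print(exogenous_indices)  # [1, 2]
```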
- Instead, the method now follows the v2 convention:
  - endogenous information is taken from `history_target`
  - exogenous information is taken from all continuous covariates in `history_cont`

Concretely, the previous block:

```python
# explicitly set endogenous and exogenous variables
endogenous_cont = history_target
if self.endogenous_vars:
    endogenous_indices = [
        self.feature_names["continuous"].index(var)
        for var in self.endogenous_vars
    ]
    endogenous_cont = history_cont[..., endogenous_indices]

exogenous_cont = history_cont
if self.exogenous_vars:
    exogenous_indices = [
        self.feature_names["continuous"].index(var)
        for var in self.exogenous_vars
    ]
    exogenous_cont = history_cont[..., exogenous_indices]
```

is replaced by:

```python
# v2 convention:
# - endogenous information comes from the target history
# - exogenous information comes from all continuous covariates
endogenous_cont = history_target
exogenous_cont = history_cont
```

The rest of `_forecast` (embedding, encoder, head) remains unchanged.

### API / behaviour notes

- The `endogenous_vars` and `exogenous_vars` arguments are removed from `TimeXer.__init__`, together with their docstring entries and the corresponding `self` assignments, so the duplicated configuration path that conflicted with the v2 metadata design no longer exists.
- In practice, TimeXer v2 now always behaves as if the endogenous information is given by the target history and the exogenous information by the continuous covariates provided by `TslibDataModule`.

## Tests

On Windows with Python 3.13, I ran:

```bash
python -m pytest -k "TimeXer" -q --basetemp="C:\Projects\pytorch-forecasting\.pytest_tmp"
```

- 75 tests passed
- 1 test skipped
- 0 failures

The failures reported earlier were due to a local Windows permission issue with pytest's default temp directory; pointing `--basetemp` at a project-local directory resolved that, and the TimeXer tests now run cleanly with the change in place.

## Notes

- This PR is intentionally scoped to the endogenous/exogenous configuration change: `_forecast`, the `__init__` signature and docstring, and the affected test.
- I am happy to follow up with a separate PR (or extend this one if preferred) to update the wider documentation accordingly.
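The simplified `_forecast` selection can be sketched with dummy tensors (the shapes are hypothetical, chosen only for illustration; this is not the library code itself):

```python
import torch

# Hypothetical shapes for illustration.
batch_size, time_steps = 4, 16
n_targets, n_cont = 1, 3

# Tensors as provided by the TslibDataModule pipeline.
history_target = torch.randn(batch_size, time_steps, n_targets)
history_cont = torch.randn(batch_size, time_steps, n_cont)

# v2 convention: no re-selection via endogenous_vars / exogenous_vars;
# the data pipeline already defines the split.
endogenous_cont = history_target  # endogenous = target history
exogenous_cont = history_cont     # exogenous = all continuous covariates

print(endogenous_cont.shape)  # torch.Size([4, 16, 1])
print(exogenous_cont.shape)   # torch.Size([4, 16, 3])
```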
1 parent 725e0c3 commit dbfd38b

File tree

2 files changed: +5 / -30 lines


pytorch_forecasting/models/timexer/_timexer_v2.py

Lines changed: 5 additions & 25 deletions
```diff
@@ -64,12 +64,6 @@ class TimeXer(TslibBaseModel):
         optimal backend (FlashAttention-2, Memory-Efficient Attention, or their
         own C++ implementation) based on user's input properties, hardware
         capabilities, and build configuration.
-    endogenous_vars: Optional[list[str]], default=None
-        List of endogenous variable names to be used in the model. If None, all historical values
-        for the target variable are used.
-    exogenous_vars: Optional[list[str]], default=None
-        List of exogenous variable names to be used in the model. If None, all historical values
-        for continuous variables are used.
     logging_metrics: Optional[list[nn.Module]], default=None
         List of metrics to log during training, validation, and testing.
     optimizer: Optional[Union[Optimizer, str]], default='adam'
@@ -83,7 +77,8 @@ class TimeXer(TslibBaseModel):
     metadata: Optional[dict], default=None
         Metadata for the model from TslibDataModule. This can include information about the dataset,
         such as the number of time steps, number of features, etc. It is used to initialize the model
-        and ensure it is compatible with the data being used.
+        and ensure it is compatible with the data being used, including the split between endogenous
+        (target) and exogenous covariates.
 
     References
     ----------
@@ -118,8 +113,6 @@ def __init__(
         factor: int = 5,
         activation: str = "relu",
         use_efficient_attention: bool = False,
-        endogenous_vars: Optional[list[str]] = None,
-        exogenous_vars: Optional[list[str]] = None,
         logging_metrics: Optional[list[nn.Module]] = None,
         optimizer: Optional[Union[Optimizer, str]] = "adam",
         optimizer_params: Optional[dict] = None,
@@ -156,8 +149,6 @@ def __init__(
         self.activation = activation
         self.use_efficient_attention = use_efficient_attention
         self.factor = factor
-        self.endogenous_vars = endogenous_vars
-        self.exogenous_vars = exogenous_vars
         self.save_hyperparameters(ignore=["loss", "logging_metrics", "metadata"])
 
         self._init_network()
@@ -292,22 +283,11 @@ def _forecast(self, x: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
         # change [batch_size, time_steps] to [batch_size, time_steps, features]
         history_time_idx = history_time_idx.unsqueeze(-1)
 
-        # explicitly set endogenous and exogenous variables
+        # v2 convention:
+        # - endogenous information comes from the target history
+        # - exogenous information comes from all continuous covariates
         endogenous_cont = history_target
-        if self.endogenous_vars:
-            endogenous_indices = [
-                self.feature_names["continuous"].index(var)
-                for var in self.endogenous_vars  # noqa: E501
-            ]
-            endogenous_cont = history_cont[..., endogenous_indices]
-
         exogenous_cont = history_cont
-        if self.exogenous_vars:
-            exogenous_indices = [
-                self.feature_names["continuous"].index(var)
-                for var in self.exogenous_vars  # noqa: E501
-            ]
-            exogenous_cont = history_cont[..., exogenous_indices]
 
         en_embed, n_vars = self.en_embedding(endogenous_cont)
         ex_embed = self.ex_embedding(exogenous_cont, history_time_idx)
```

tests/test_models/test_timexer_v2.py

Lines changed: 0 additions & 5 deletions
```diff
@@ -335,15 +335,10 @@ def test_missing_history_target_handling(basic_metadata):
 def test_endogenous_exogenous_variable_selection(basic_metadata):
     """Test explicit endogenous and exogenous variable selection in TimeXer model."""
 
-    endo_names = basic_metadata["feature_names"]["continuous"][0]
-    exog_names = basic_metadata["feature_names"]["continuous"][1]
-
     model = TimeXer(
         loss=MAE(),
         hidden_size=64,
         n_heads=8,
-        endogenous_vars=[endo_names],
-        exogenous_vars=[exog_names],
         e_layers=2,
         metadata=basic_metadata,
     )
```
