Backtest hour min lookup error by davidlatte · Pull Request #975 · Lumiwealth/lumibot

davidlatte · 2026-03-13T19:35:44Z

This pull request introduces a configurable mechanism for controlling whether minute-level cached data can satisfy day-bar lookup requests in backtesting data sources, addressing inconsistencies across providers like Polygon and ThetaData. The changes add an allow_day_resampling flag to the PandasData class and its derivatives, update the timestep-matching logic, and provide comprehensive tests and documentation for this behavior.

Key changes:

Data source configuration and logic

Added an allow_day_resampling parameter (defaulting to True) to PandasData and its subclasses, allowing each data source to specify whether minute data may be resampled to fulfill day-bar requests. This is set to True for Polygon and base PandasData, and False for ThetaData to enforce provider-specific normalization rules.
Updated the _accepts_timestep method in PandasData to use the new allow_day_resampling flag, with detailed comments explaining the rationale and differences between data sources. This ensures that day requests are only satisfied by minute data when appropriate.
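
The effect of the flag on timestep matching can be sketched as a small standalone predicate. This is a simplified illustration, not lumibot's actual `_accepts_timestep`: the function name `accepts_timestep`, its signature, and the exact-match rule for non-day requests are assumptions made for the example.

```python
def accepts_timestep(requested_unit: str, data_ts: str,
                     allow_day_resampling: bool = True) -> bool:
    """Return True if cached data at `data_ts` may serve a `requested_unit` request.

    Hypothetical stand-in for PandasData._accepts_timestep; only the
    day-request branch described in this PR is modelled.
    """
    if requested_unit == "day":
        if not allow_day_resampling:
            # ThetaData-style: day requests need native day data.
            return data_ts == "day"
        # Polygon / base PandasData-style: minute data may be resampled to day.
        return data_ts in {"day", "minute"}
    # Assumption for this sketch: every other request needs an exact match.
    return data_ts == requested_unit


print(accepts_timestep("day", "minute"))                              # True
print(accepts_timestep("day", "minute", allow_day_resampling=False))  # False
```

With the flag left at its default, a minute-only cache satisfies a day request; with the flag off, the same lookup is rejected and the data source has to fetch native day bars.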

Testing and regression coverage

Added a new regression test class (TestGetHistoricalPricesMinuteToDayRegression) that exercises day-bar requests against minute-only data, in particular for stocks versus crypto assets. The tests document the previously buggy behavior alongside the correct behavior and provide a baseline for future fixes.
Refactored the test utilities in test_pandas_data_find_asset_timestep_match.py to support the new configuration, so that the tests reflect the new timestep-matching logic.
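
The regression scenario (an intraday request warms the cache with minute bars, then a day request follows for the same asset) might look roughly like the following self-contained test. The store layout and the `find_cached` helper are hypothetical and only mimic the lookup; they are not lumibot's cache or its test fixtures.

```python
from typing import Dict, Optional, Tuple


def find_cached(store: Dict[Tuple[str, str], object], symbol: str,
                requested_unit: str,
                allow_day_resampling: bool) -> Optional[Tuple[str, str]]:
    """Return the first cached (symbol, timestep) key that may serve the request."""
    accepted = {requested_unit}
    if requested_unit == "day" and allow_day_resampling:
        # Minute data is allowed to stand in for day data.
        accepted.add("minute")
    for cached_symbol, cached_ts in store:
        if cached_symbol == symbol and cached_ts in accepted:
            return (cached_symbol, cached_ts)
    return None


def test_day_request_after_minute_request():
    # An earlier intraday request left only minute bars in the cache.
    store = {("SPY", "minute"): "minute bars"}
    assert find_cached(store, "SPY", "minute", True) == ("SPY", "minute")
    # Flag on: the day request reuses the minute entry.
    assert find_cached(store, "SPY", "day", True) == ("SPY", "minute")
    # Flag off: the day request misses and native day data is required.
    assert find_cached(store, "SPY", "day", False) is None


test_day_request_after_minute_request()
```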

These changes make the data source behavior more explicit and configurable, prevent silent bypassing of provider-specific normalization, and improve test coverage and documentation for this critical aspect of the backtesting engine.

Summary by CodeRabbit

New Features
- Added a configurable option to control whether minute-level data may be resampled to satisfy day-bar requests (defaults to enabled).
- Polygon backtesting now permits on-demand resampling from minute to day data.
- ThetaData backtesting enforces exact-timestep day matching (resampling disabled).
Tests
- Expanded tests covering minute-to-day resampling behavior across asset types and access patterns.

coderabbitai · 2026-03-13T19:38:16Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: aad69a9d-aee7-415b-bf21-f99e9a6b124d

📥 Commits

Reviewing files that changed from the base of the PR and between 3300ea4 and 7ff70d8.

📒 Files selected for processing (2)

tests/test_pandas_data.py
tests/test_pandas_data_find_asset_timestep_match.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/test_pandas_data_find_asset_timestep_match.py
tests/test_pandas_data.py

📝 Walkthrough

Walkthrough

Adds an allow_day_resampling flag controlling whether minute-level data may satisfy day-bar requests: PolygonDataBacktesting sets it True, ThetaDataBacktestingPandas sets it False, and PandasData gains a True-by-default parameter plus conditional timestep-acceptance logic.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Backtesting Data Source Configuration `lumibot/backtesting/polygon_backtesting.py`, `lumibot/backtesting/thetadata_backtesting_pandas.py` | Introduce `allow_day_resampling` instance attribute: set to `True` in Polygon backend and `False` in ThetaData backend to control day-resampling behavior. |
| Core Data Source Logic `lumibot/data_sources/pandas_data.py` | Add `allow_day_resampling: bool = True` parameter to `PandasData.__init__`, store `self.allow_day_resampling`, and modify `_accepts_timestep` to conditionally allow minute data to satisfy day requests when the flag is True. |
| Test Coverage — Minute/Day Resampling `tests/test_pandas_data.py` | Add `TestGetHistoricalPricesMinuteToDayRegression` with helpers and tests covering minute→day lookup behaviors, sequence interactions, and crypto vs stock cases. |
| Test Coverage — Timestep Matching Utilities `tests/test_pandas_data_find_asset_timestep_match.py` | Refactor tests to use real `PandasData` constructor, add helpers and expanded cases validating `allow_day_resampling` behavior across minute/day native data and different flag settings. |
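
How the per-provider configuration fits together can be sketched with stand-in classes. The class names below are hypothetical, and passing the flag through `super().__init__` is an assumption; the PR only states that the base constructor takes `allow_day_resampling: bool = True` and that the Polygon and ThetaData backends end up with `True` and `False` respectively.

```python
class PandasDataSketch:
    """Hypothetical stand-in for PandasData; only the new flag is modelled."""

    def __init__(self, allow_day_resampling: bool = True):
        self.allow_day_resampling = allow_day_resampling


class PolygonSketch(PandasDataSketch):
    """Stand-in for the Polygon backend: minute data may serve day requests."""

    def __init__(self):
        super().__init__(allow_day_resampling=True)


class ThetaDataSketch(PandasDataSketch):
    """Stand-in for the ThetaData backend: day requests need native day data."""

    def __init__(self):
        super().__init__(allow_day_resampling=False)


for source in (PandasDataSketch(), PolygonSketch(), ThetaDataSketch()):
    print(type(source).__name__, source.allow_day_resampling)
```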

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 A little flag hopped into the stack,
Minute bars may stretch or stay back.
Polygon nibble, Theta stands firm,
PandasData learns a flexible term.
Hooray for choices — a rabbit's small perk! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.88% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Title check | ❓ Inconclusive | The title is vague and uses generic phrasing that doesn't clearly convey the main change: the addition of the configurable allow_day_resampling parameter. | Use a more descriptive title that captures the core change, such as 'Add allow_day_resampling flag to control minute-to-day data resampling' or similar. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.5)

tests/test_pandas_data_find_asset_timestep_match.py

************* Module pylintrc
pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: 'pylintrc', line: 1
'known-third-party=lumibot' (config-parse-error)
[
{
"type": "convention",
"module": "tests.test_pandas_data_find_asset_timestep_match",
"obj": "",
"line": 76,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "tests/test_pandas_data_find_asset_timestep_match.py",
"symbol": "line-too-long",
"message": "Line too long (102/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "tests.test_pandas_data_find_asset_timestep_match",
"obj": "",
"line": 169,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "tests/test_pandas_data_find_asset_timestep_match.py",
"symbol": "line-too-long",
"message": "Line too long (102/100)",
"message-id":

... [truncated 7859 characters] ...

ue",
"line": 421,
"column": 4,
"endLine": 421,
"endColumn": 33,
"path": "tests/test_pandas_data_find_asset_timestep_match.py",
"symbol": "import-outside-toplevel",
"message": "Import outside toplevel (datetime.timezone)",
"message-id": "C0415"
},
{
"type": "convention",
"module": "tests.test_pandas_data_find_asset_timestep_match",
"obj": "",
"line": 5,
"column": 0,
"endLine": 5,
"endColumn": 29,
"path": "tests/test_pandas_data_find_asset_timestep_match.py",
"symbol": "wrong-import-order",
"message": "standard import \"datetime.datetime\" should be placed before third party imports \"pytz\", \"pandas\"",
"message-id": "C0411"
}
]

tests/test_pandas_data.py

************* Module pylintrc
pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: 'pylintrc', line: 1
'known-third-party=lumibot' (config-parse-error)
[
{
"type": "convention",
"module": "tests.test_pandas_data",
"obj": "",
"line": 1,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "tests/test_pandas_data.py",
"symbol": "missing-module-docstring",
"message": "Missing module docstring",
"message-id": "C0114"
},
{
"type": "error",
"module": "tests.test_pandas_data",
"obj": "",
"line": 3,
"column": 0,
"endLine": 3,
"endColumn": 19,
"path": "tests/test_pandas_data.py",
"symbol": "import-error",
"message": "Unable to import 'pandas'",
"message-id": "E0401"
},
{
"type": "error",
"module": "tests.test_pandas_data",
"obj"

... [truncated 6261 characters] ...

 "obj": "TestGetHistoricalPricesMinuteToDayRegression.test_1day_request_after_15m_request_same_asset",
    "line": 243,
    "column": 31,
    "endLine": 243,
    "endColumn": 65,
    "path": "tests/test_pandas_data.py",
    "symbol": "protected-access",
    "message": "Access to a protected member _find_asset_in_data_store_cache of a client class",
    "message-id": "W0212"
},
{
    "type": "warning",
    "module": "tests.test_pandas_data",
    "obj": "",
    "line": 10,
    "column": 0,
    "endLine": 10,
    "endColumn": 46,
    "path": "tests/test_pandas_data.py",
    "symbol": "unused-import",
    "message": "Unused pandas_data_fixture imported from tests.fixtures",
    "message-id": "W0611"
}

]
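
The `F0011` failure reported in both Pylint runs says that `pylintrc` begins with an option line (`known-third-party=lumibot`) instead of a section header. Python's standard `configparser` rejects headerless INI text with the same message, so the problem can be reproduced in a few lines. Placing the option under an `[IMPORTS]` section is an assumption about the intended layout, not something the PR states.

```python
import configparser

# The content quoted in the Pylint error: an option with no section above it.
BROKEN = "known-third-party=lumibot\n"
# Assumed fix: put the option under a section header.
FIXED = "[IMPORTS]\nknown-third-party=lumibot\n"


def parses(text: str) -> bool:
    """Return True if `text` is INI that configparser accepts."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text)
    except configparser.MissingSectionHeaderError:
        return False
    return True


print(parses(BROKEN))  # False
print(parses(FIXED))   # True
```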


coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lumibot/data_sources/pandas_data.py`:
- Around line 427-447: The day-resolution lookup logic currently treats allowed
resampling as only minute→day by checking data_ts in {"day", "minute"}, which
ignores warmed hourly caches; update the condition in the pandas_data resolution
branch (the block using self.allow_day_resampling, requested_unit and data_ts)
to also accept "hour" when requested_unit == "day" so hourly cached data can be
resampled to daily, and add a regression test alongside the new minute→day tests
that primes an "hour" cache then requests "day" to assert the hour data is
accepted/resampled.

In `@tests/test_pandas_data.py`:
- Around line 123-139: The test class is still using the old behavior and fails
because the __new__ fixtures don't set the new allow_day_resampling flag; update
the fixtures (the __new__ methods) that construct PandasData to set
allow_day_resampling=True (or the intended default) so the new day-lookup branch
can run without AttributeError, and then update the assertions that check
result_day (and any checks mentioning PandasData behavior) to expect minute→day
resampling to be allowed (i.e., change assertions that expect result_day is None
to expect a valid result or remove the obsolete class and fold its cases into
the new flag-driven tests in
tests/test_pandas_data_find_asset_timestep_match.py), applying the same changes
to the other affected blocks referenced (lines ~165-174, 193-261, 276-279).
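
The `AttributeError` mentioned above comes from building objects with `Class.__new__(Class)`, which allocates an instance without running `__init__`, so attributes assigned in the constructor never exist. A minimal sketch with a hypothetical `Source` class (not lumibot's `PandasData`):

```python
class Source:
    """Hypothetical data source whose constructor sets the flag."""

    def __init__(self, allow_day_resampling: bool = True):
        self.allow_day_resampling = allow_day_resampling

    def accepts_minute_for_day(self) -> bool:
        return self.allow_day_resampling


# Bypassing __init__ leaves the instance without the attribute.
bare = Source.__new__(Source)
try:
    bare.accepts_minute_for_day()
except AttributeError as exc:
    print(f"AttributeError: {exc}")

# The real constructor applies the default.
real = Source()
print(real.accepts_minute_for_day())  # True
```

Fixtures built with the real constructor pick up any attribute added to `__init__` later, which is the approach the follow-up commit "tests: replace PandasData.__new__() with real constructor" takes.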

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 43cf81a3-b1fc-4383-acb0-5ba9513c231d

📥 Commits

Reviewing files that changed from the base of the PR and between c962eb9 and 3300ea4.

📒 Files selected for processing (5)

lumibot/backtesting/polygon_backtesting.py
lumibot/backtesting/thetadata_backtesting_pandas.py
lumibot/data_sources/pandas_data.py
tests/test_pandas_data.py
tests/test_pandas_data_find_asset_timestep_match.py

coderabbitai · 2026-03-13T19:52:52Z

lumibot/data_sources/pandas_data.py

            if requested_unit == "day":
-                # IMPORTANT:
-                # Keep explicit day requests pinned to native day datasets.
+                # IMPORTANT — two conflicting philosophies exist across data sources:
+                #
+                # allow_day_resampling=False (ThetaData):
+                #   Keep explicit day requests pinned to native day datasets.
+                #   ThetaData stores minute and day data under separate canonical keys
+                #   (asset, quote, "minute") vs (asset, quote, "day").  Allowing minute
+                #   data to satisfy day requests would silently bypass ThetaData's
+                #   split-spike repair / split-adjustment normalisation and could trigger
+                #   expensive re-fetch churn in daily-cadence backtests.
                #
-                # Allowing minute datasets to satisfy day requests can silently bypass provider-
-                # specific day-bar normalization (for example split-spike repair/timestamp
-                # alignment in IBKR helpers), and can trigger expensive minute fetch churn in
-                # daily-cadence backtests.
-                if requested_asset_type in {"stock", "index"}:
+                # allow_day_resampling=True (Polygon, base PandasData — the default):
+                #   Polygon's _update_pandas_data always tries to obtain the finest
+                #   granularity available and relies on Data.get_bars() to resample
+                #   minute → day on demand.  If only minute data is cached for a stock,
+                #   the day request must be allowed to reach Data.get_bars() so the
+                #   resampling path fires.  The same applies to user-provided minute
+                #   CSV data in the plain PandasData source.
+                if not self.allow_day_resampling:
                    return data_ts == "day"
                return data_ts in {"day", "minute"}


⚠️ Potential issue | 🟠 Major

Day lookups still reject warmed hourly caches.

With allow_day_resampling=True, this branch now re-enables minute→day reuse, but it still excludes hour. A prior 1 hour fetch followed by 1 day will still miss here, even though lumibot/backtesting/polygon_backtesting.py already preserves hourly caches for that path. Please include hour here and add a matching regression alongside the new minute→day cases.

♻️ Suggested fix

-                return data_ts in {"day", "minute"}
+                return data_ts in {"day", "hour", "minute"}
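
Hourly bars are a plausible source for day requests because aggregating intraday OHLCV bars into daily bars works the same way at any intraday granularity. The sketch below uses plain Python; the bar tuple layout and the `resample_to_day` name are assumptions for illustration, and lumibot's own resampling in `Data.get_bars()` may differ in details such as session boundaries and timezones.

```python
from datetime import date, datetime


def resample_to_day(bars):
    """Aggregate (timestamp, open, high, low, close, volume) bars by calendar day.

    `bars` must already be sorted by timestamp.
    """
    days = {}
    for ts, open_, high, low, close, volume in bars:
        day = ts.date()
        if day not in days:
            # First bar of the day fixes the open.
            days[day] = [open_, high, low, close, volume]
        else:
            agg = days[day]
            agg[1] = max(agg[1], high)  # highest high
            agg[2] = min(agg[2], low)   # lowest low
            agg[3] = close              # last close wins
            agg[4] += volume            # volumes add up
    return [(day, *values) for day, values in days.items()]


hourly = [
    (datetime(2026, 3, 12, 10), 100.0, 101.0, 99.5, 100.5, 1000),
    (datetime(2026, 3, 12, 11), 100.5, 102.0, 100.0, 101.5, 1500),
    (datetime(2026, 3, 13, 10), 101.5, 101.8, 100.9, 101.0, 900),
]
for row in resample_to_day(hourly):
    print(row)
```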

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

            if requested_unit == "day":
                # IMPORTANT — two conflicting philosophies exist across data sources:
                #
                # allow_day_resampling=False (ThetaData):
                #   Keep explicit day requests pinned to native day datasets.
                #   ThetaData stores minute and day data under separate canonical keys
                #   (asset, quote, "minute") vs (asset, quote, "day").  Allowing minute
                #   data to satisfy day requests would silently bypass ThetaData's
                #   split-spike repair / split-adjustment normalisation and could trigger
                #   expensive re-fetch churn in daily-cadence backtests.
                #
                # allow_day_resampling=True (Polygon, base PandasData — the default):
                #   Polygon's _update_pandas_data always tries to obtain the finest
                #   granularity available and relies on Data.get_bars() to resample
                #   minute → day on demand.  If only minute data is cached for a stock,
                #   the day request must be allowed to reach Data.get_bars() so the
                #   resampling path fires.  The same applies to user-provided minute
                #   CSV data in the plain PandasData source.
                if not self.allow_day_resampling:
                    return data_ts == "day"
                return data_ts in {"day", "hour", "minute"}


tests/test_pandas_data.py

davidlatte added 2 commits March 13, 2026 10:45

test: add regression tests for stock and index guard in historical pr…

2964ea6

…ice requests demonstrating a bug with querying for 15m then 1d prices.

feat: introduce allow_day_resampling flag to control resampling behav…

3300ea4

…ior for day requests

davidlatte requested a review from grzesir as a code owner March 13, 2026 19:35

davidlatte temporarily deployed to unit-tests March 13, 2026 19:35 — with GitHub Actions Inactive

davidlatte temporarily deployed to unit-tests March 13, 2026 19:36 — with GitHub Actions Inactive

davidlatte had a problem deploying to unit-tests March 13, 2026 19:36 — with GitHub Actions Failure

davidlatte temporarily deployed to unit-tests March 13, 2026 19:36 — with GitHub Actions Inactive

coderabbitai bot reviewed Mar 13, 2026


tests: replace PandasData.__new__() with real constructor

7ff70d8

davidlatte temporarily deployed to unit-tests March 13, 2026 19:59 — with GitHub Actions Inactive

davidlatte temporarily deployed to unit-tests March 13, 2026 20:00 — with GitHub Actions Inactive

Backtest hour min lookup error #975
davidlatte wants to merge 3 commits into dev from backtest_hour_min_lookup_error

1 participant