Conversation
…ice requests demonstrating a bug with quirying for 15m then 1d prices.
…ior for day requests
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds an allow_day_resampling flag controlling whether minute-level data may satisfy day-bar requests: PolygonDataBacktesting sets it True, ThetaDataBacktestingPandas sets it False, and PandasData gains a True-by-default parameter plus conditional timestep-acceptance logic. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
📝 Coding Plan
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (4.0.5)tests/test_pandas_data_find_asset_timestep_match.py************* Module pylintrc ... [truncated 7859 characters] ... ue", tests/test_pandas_data.py************* Module pylintrc ... [truncated 6261 characters] ... ] Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment Tip CodeRabbit can approve the review once all CodeRabbit's comments are resolved.Enable the |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@lumibot/data_sources/pandas_data.py`:
- Around line 427-447: The day-resolution lookup logic currently treats allowed
resampling as only minute→day by checking data_ts in {"day", "minute"}, which
ignores warmed hourly caches; update the condition in the pandas_data resolution
branch (the block using self.allow_day_resampling, requested_unit and data_ts)
to also accept "hour" when requested_unit == "day" so hourly cached data can be
resampled to daily, and add a regression test alongside the new minute→day tests
that primes an "hour" cache then requests "day" to assert the hour data is
accepted/resampled.
In `@tests/test_pandas_data.py`:
- Around line 123-139: The test class is still using the old behavior and fails
because the __new__ fixtures don't set the new allow_day_resampling flag; update
the fixtures (the __new__ methods) that construct PandasData to set
allow_day_resampling=True (or the intended default) so the new day-lookup branch
can run without AttributeError, and then update the assertions that check
result_day (and any checks mentioning PandasData behavior) to expect minute→day
resampling to be allowed (i.e., change assertions that expect result_day is None
to expect a valid result or remove the obsolete class and fold its cases into
the new flag-driven tests in
tests/test_pandas_data_find_asset_timestep_match.py), applying the same changes
to the other affected blocks referenced (lines ~165-174, 193-261, 276-279).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 43cf81a3-b1fc-4383-acb0-5ba9513c231d
📒 Files selected for processing (5)
lumibot/backtesting/polygon_backtesting.pylumibot/backtesting/thetadata_backtesting_pandas.pylumibot/data_sources/pandas_data.pytests/test_pandas_data.pytests/test_pandas_data_find_asset_timestep_match.py
| if requested_unit == "day": | ||
| # IMPORTANT: | ||
| # Keep explicit day requests pinned to native day datasets. | ||
| # IMPORTANT — two conflicting philosophies exist across data sources: | ||
| # | ||
| # allow_day_resampling=False (ThetaData): | ||
| # Keep explicit day requests pinned to native day datasets. | ||
| # ThetaData stores minute and day data under separate canonical keys | ||
| # (asset, quote, "minute") vs (asset, quote, "day"). Allowing minute | ||
| # data to satisfy day requests would silently bypass ThetaData's | ||
| # split-spike repair / split-adjustment normalisation and could trigger | ||
| # expensive re-fetch churn in daily-cadence backtests. | ||
| # | ||
| # Allowing minute datasets to satisfy day requests can silently bypass provider- | ||
| # specific day-bar normalization (for example split-spike repair/timestamp | ||
| # alignment in IBKR helpers), and can trigger expensive minute fetch churn in | ||
| # daily-cadence backtests. | ||
| if requested_asset_type in {"stock", "index"}: | ||
| # allow_day_resampling=True (Polygon, base PandasData — the default): | ||
| # Polygon's _update_pandas_data always tries to obtain the finest | ||
| # granularity available and relies on Data.get_bars() to resample | ||
| # minute → day on demand. If only minute data is cached for a stock, | ||
| # the day request must be allowed to reach Data.get_bars() so the | ||
| # resampling path fires. The same applies to user-provided minute | ||
| # CSV data in the plain PandasData source. | ||
| if not self.allow_day_resampling: | ||
| return data_ts == "day" | ||
| return data_ts in {"day", "minute"} |
There was a problem hiding this comment.
Day lookups still reject warmed hourly caches.
With allow_day_resampling=True, this branch now re-enables minute→day reuse, but it still excludes hour. A prior 1 hour fetch followed by 1 day will still miss here, even though lumibot/backtesting/polygon_backtesting.py already preserves hourly caches for that path. Please include hour here and add a matching regression alongside the new minute→day cases.
♻️ Suggested fix
- return data_ts in {"day", "minute"}
+ return data_ts in {"day", "hour", "minute"}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if requested_unit == "day": | |
| # IMPORTANT: | |
| # Keep explicit day requests pinned to native day datasets. | |
| # IMPORTANT — two conflicting philosophies exist across data sources: | |
| # | |
| # allow_day_resampling=False (ThetaData): | |
| # Keep explicit day requests pinned to native day datasets. | |
| # ThetaData stores minute and day data under separate canonical keys | |
| # (asset, quote, "minute") vs (asset, quote, "day"). Allowing minute | |
| # data to satisfy day requests would silently bypass ThetaData's | |
| # split-spike repair / split-adjustment normalisation and could trigger | |
| # expensive re-fetch churn in daily-cadence backtests. | |
| # | |
| # Allowing minute datasets to satisfy day requests can silently bypass provider- | |
| # specific day-bar normalization (for example split-spike repair/timestamp | |
| # alignment in IBKR helpers), and can trigger expensive minute fetch churn in | |
| # daily-cadence backtests. | |
| if requested_asset_type in {"stock", "index"}: | |
| # allow_day_resampling=True (Polygon, base PandasData — the default): | |
| # Polygon's _update_pandas_data always tries to obtain the finest | |
| # granularity available and relies on Data.get_bars() to resample | |
| # minute → day on demand. If only minute data is cached for a stock, | |
| # the day request must be allowed to reach Data.get_bars() so the | |
| # resampling path fires. The same applies to user-provided minute | |
| # CSV data in the plain PandasData source. | |
| if not self.allow_day_resampling: | |
| return data_ts == "day" | |
| return data_ts in {"day", "minute"} | |
| if requested_unit == "day": | |
| # IMPORTANT — two conflicting philosophies exist across data sources: | |
| # | |
| # allow_day_resampling=False (ThetaData): | |
| # Keep explicit day requests pinned to native day datasets. | |
| # ThetaData stores minute and day data under separate canonical keys | |
| # (asset, quote, "minute") vs (asset, quote, "day"). Allowing minute | |
| # data to satisfy day requests would silently bypass ThetaData's | |
| # split-spike repair / split-adjustment normalisation and could trigger | |
| # expensive re-fetch churn in daily-cadence backtests. | |
| # | |
| # allow_day_resampling=True (Polygon, base PandasData — the default): | |
| # Polygon's _update_pandas_data always tries to obtain the finest | |
| # granularity available and relies on Data.get_bars() to resample | |
| # minute → day on demand. If only minute data is cached for a stock, | |
| # the day request must be allowed to reach Data.get_bars() so the | |
| # resampling path fires. The same applies to user-provided minute | |
| # CSV data in the plain PandasData source. | |
| if not self.allow_day_resampling: | |
| return data_ts == "day" | |
| return data_ts in {"day", "hour", "minute"} |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@lumibot/data_sources/pandas_data.py` around lines 427 - 447, The
day-resolution lookup logic currently treats allowed resampling as only
minute→day by checking data_ts in {"day", "minute"}, which ignores warmed hourly
caches; update the condition in the pandas_data resolution branch (the block
using self.allow_day_resampling, requested_unit and data_ts) to also accept
"hour" when requested_unit == "day" so hourly cached data can be resampled to
daily, and add a regression test alongside the new minute→day tests that primes
an "hour" cache then requests "day" to assert the hour data is
accepted/resampled.
This pull request introduces a configurable mechanism for controlling whether minute-level cached data can satisfy day-bar lookup requests in backtesting data sources, addressing inconsistencies across providers like Polygon and ThetaData. The changes add an
allow_day_resamplingflag to thePandasDataclass and its derivatives, update the timestep-matching logic, and provide comprehensive tests and documentation for this behavior.Key changes:
Data source configuration and logic
allow_day_resamplingparameter (defaulting toTrue) toPandasDataand its subclasses, allowing each data source to specify whether minute data may be resampled to fulfill day-bar requests. This is set toTruefor Polygon and base PandasData, andFalsefor ThetaData to enforce provider-specific normalization rules. [1] [2] [3]_accepts_timestepmethod inPandasDatato use the newallow_day_resamplingflag, with detailed comments explaining the rationale and differences between data sources. This ensures that day requests are only satisfied by minute data when appropriate. [1] [2]Testing and regression coverage
TestGetHistoricalPricesMinuteToDayRegression) to verify correct (and buggy) behavior when requesting day bars from minute-only data, especially for stocks versus crypto assets. The tests document and demonstrate the previously buggy behavior and provide a baseline for future fixes.test_pandas_data_find_asset_timestep_match.pyto support the new configuration, ensuring that tests accurately reflect the new timestep-matching logic. [1] [2]These changes make the data source behavior more explicit and configurable, prevent silent bypassing of provider-specific normalization, and improve test coverage and documentation for this critical aspect of the backtesting engine.
Summary by CodeRabbit
New Features
Tests