Optimize safe backtesting hot paths #979

Open
mpelteshki wants to merge 2 commits into dev from codex/safe-backtest-speedups

Conversation

@mpelteshki (Collaborator) commented Mar 16, 2026

Summary

  • optimize shared backtesting broker/session hot paths with array-backed time-to-close lookups and cached early-assignment config resolution
  • tighten strategy-side cash and non-cash position lookups with safe caches, plus cheaper asset/position hot-path behavior
  • add focused regression coverage for the new caches and Yahoo daily last-price behavior

Validation

  • uv run --with-requirements requirements_dev.txt pytest -q tests/test_cash.py tests/test_strategy_get_positions_cache.py tests/test_backtesting_broker_assignment_config_cache.py tests/test_position_quantity_cache.py tests/test_strategy_dividend_cash_batch.py tests/test_strategy_executor_progress_payload.py tests/test_yahoo_daily_last_price_fastpath.py tests/test_backtesting_crypto_cash_unit.py tests/test_backtesting_broker.py tests/test_asset.py tests/test_helpers.py
    • 112 passed, 3 warnings, 22 subtests passed
  • external parity benchmark: diversified_leverage_with_threshold stock backtest still returns total_return = 23.960373107784644
  • observed 5-run wall-time batches for the kept safe tree are now roughly in the 1.50s-1.53s band on the local machine, with a best observed batch mean of about 1.50s

Summary by CodeRabbit

  • New Features

    • get_active_tracked_orders() now supports optional asset filtering for targeted order retrieval.
    • Progress bar pacing controls added for backtesting with configurable update intervals.
    • Revision tracking added to SafeList and SafeOrderDict for mutation detection.
  • Performance

    • Extensive caching improvements for position and order retrieval, trading calendars, and price lookups.
    • Disk-backed caching for trading day schedules reduces repeated computation.

@coderabbitai (bot) commented Mar 16, 2026

Walkthrough

The pull request introduces extensive caching mechanisms throughout the backtesting framework and data sources. Key additions include: trading calendar and market data caching in the broker and helpers; tracked orders/positions caching with optional asset filtering; portfolio value computation caching; progress bar throttling; daily last price caching in Yahoo data source; asset type and minimal dictionary precomputation; mutation tracking via revision counters on SafeList/SafeOrderDict; and dividend cash batching to reduce position updates.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Broker Order & Position Caching: lumibot/backtesting/backtesting_broker.py, lumibot/brokers/broker.py | Added comprehensive caching for tracked orders, positions, and asset potential totals with asset filtering support. get_active_tracked_orders signature expanded to include optional asset parameter. Internal caches track order bucket revisions and filter-specific results. Early-assignment config now cached per-strategy. |
| Data Source Optimization: lumibot/data_sources/data_source_backtesting.py, lumibot/data_sources/yahoo_data.py | Progress bar printing now throttled by interval (default 0.25s). Yahoo data source introduces per-asset caches for index/open values and daily last price, with new helper methods _get_source_symbol_data, _get_filtered_end_index, and _get_last_daily_open_price for efficient data retrieval and caching. |
| Entity Performance: lumibot/entities/asset.py, lumibot/entities/position.py, lumibot/trading_builtins/safe_list.py | Asset now caches type key and minimal dictionary for fast lookups. Position quantity synchronized between Decimal and float representations with precision clamping. SafeList and SafeOrderDict introduce revision counter attribute to track mutations for cache invalidation. |
| Strategy Portfolio & Cash Tracking: lumibot/strategies/_strategy.py, lumibot/strategies/strategy.py | Introduced _get_cash_position() for cached cash position retrieval and _set_cash_position() to maintain internal reference. Portfolio value caching by broker datetime and position revision. Dividend cash batching applies accumulated delta in single update. get_positions and get_last_price now include per-run backtesting-aware caches. Sanitized asset cache (256-entry limit) avoids recreating identical Asset objects. |
| Strategy Execution & Progress: lumibot/strategies/strategy_executor.py | New _build_backtest_progress_payload method centralizes payload construction for backtesting with throttling. Payload includes cash, portfolio value, and conditionally positions/orders for logging. Integration with broker _update_datetime standardizes progress tracking across backtest paths. |
| Calendar & Trading Days Caching: lumibot/tools/helpers.py | Disk caching layer added for trading day calendars using SHA1-hashed composite keys (version, pandas version, market, date range, timezone). Schedule timezone normalization ensures consistent tz-aware indices. Long windows (≥365 days) now cached to memory and disk for subsequent lookups. |
| Test Coverage: tests/test_backtesting_broker.py, tests/test_cash.py, tests/test_helpers.py, tests/test_strategy_dividend_cash_batch.py, tests/test_strategy_executor_progress_payload.py, tests/test_yahoo_daily_last_price_fastpath.py, tests/test_minimal_serialization.py, tests/test_safe_list.py | Added 8 new test methods covering asset filtering in order retrieval, cash position mutation tracking, long-window trading day caching, dividend batching with single cash update, progress payload construction and error handling, daily last price cache reuse, Decimal quantity serialization, and SafeList immutability. |
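
As an illustration of the composite cache key mentioned in the Calendar & Trading Days Caching row, here is a minimal sketch. The function name, the `CACHE_VERSION` constant, the field order, and the `|` separator are assumptions made for this example; the actual key layout in lumibot/tools/helpers.py is not shown in this conversation.

```python
import hashlib

# Hypothetical schema version; bumping it would invalidate older cache files.
CACHE_VERSION = "v1"


def trading_days_cache_key(market, start, end, tz, pandas_version, version=CACHE_VERSION):
    """Build a stable SHA1 hex digest from the fields that affect the cached schedule."""
    parts = (version, pandas_version, market, str(start), str(end), str(tz))
    payload = "|".join(parts).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


key_a = trading_days_cache_key("NYSE", "2020-01-01", "2021-12-31", "America/New_York", "2.2.0")
key_b = trading_days_cache_key("NYSE", "2020-01-01", "2022-12-31", "America/New_York", "2.2.0")
```

Including the pandas version in the key means a pickle written by one pandas release is never read back by another, which is presumably why it is part of the composite.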

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

  • PR #969 — Modifies BacktestingBroker option early-assignment and expiration caching logic, overlapping core broker caching infrastructure changes.
  • PR #972 — Changes Strategy backtesting daily-last-price logic and backtest progress/cadence handling, directly related to new daily price caching and progress payload work.
  • PR #959 — Modifies trading-calendar and backtesting caching logic in backtesting_broker.py and tools/helpers.py, affecting overlapping calendar and trading-day caching layers.

Poem

🐰 Whiskers twitching with delight,
Caches nested, shining bright,
Calendars cached, orders too,
Revisions tracked, cash positions true—
Performance hops faster still! 🏃‍♂️💨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Optimize safe backtesting hot paths' directly and clearly describes the main change: optimizing critical backtesting code paths with caching and performance improvements across multiple modules. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (4.0.5)
lumibot/trading_builtins/safe_list.py

************* Module pylintrc
pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: 'pylintrc', line: 1
'known-third-party=lumibot' (config-parse-error)
[
{
"type": "convention",
"module": "lumibot.trading_builtins.safe_list",
"obj": "",
"line": 1,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "lumibot/trading_builtins/safe_list.py",
"symbol": "missing-module-docstring",
"message": "Missing module docstring",
"message-id": "C0114"
},
{
"type": "convention",
"module": "lumibot.trading_builtins.safe_list",
"obj": "SafeList",
"line": 7,
"column": 0,
"endLine": 7,
"endColumn": 14,
"path": "lumibot/trading_builtins/safe_list.py",
"symbol": "missing-class-docstring",
"message": "Missing class docstring",
"message-id": "C0115"
},
{
"typ

... [truncated 5882 characters] ...

ule": "lumibot.trading_builtins.safe_list",
"obj": "SafeOrderDict.extend",
"line": 334,
"column": 4,
"endLine": 334,
"endColumn": 14,
"path": "lumibot/trading_builtins/safe_list.py",
"symbol": "missing-function-docstring",
"message": "Missing function or method docstring",
"message-id": "C0116"
},
{
"type": "convention",
"module": "lumibot.trading_builtins.safe_list",
"obj": "SafeOrderDict.get_list",
"line": 344,
"column": 4,
"endLine": 344,
"endColumn": 16,
"path": "lumibot/trading_builtins/safe_list.py",
"symbol": "missing-function-docstring",
"message": "Missing function or method docstring",
"message-id": "C0116"
}
]

lumibot/data_sources/yahoo_data.py

************* Module pylintrc
pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: 'pylintrc', line: 1
'known-third-party=lumibot' (config-parse-error)
[
{
"type": "convention",
"module": "lumibot.data_sources.yahoo_data",
"obj": "",
"line": 27,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "lumibot/data_sources/yahoo_data.py",
"symbol": "line-too-long",
"message": "Line too long (113/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "lumibot.data_sources.yahoo_data",
"obj": "",
"line": 38,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "lumibot/data_sources/yahoo_data.py",
"symbol": "line-too-long",
"message": "Line too long (134/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module":

... [truncated 21269 characters] ...

",
"obj": "YahooData.get_historical_prices",
"line": 474,
"column": 4,
"endLine": 474,
"endColumn": 29,
"path": "lumibot/data_sources/yahoo_data.py",
"symbol": "too-many-positional-arguments",
"message": "Too many positional arguments (8/5)",
"message-id": "R0917"
},
{
"type": "refactor",
"module": "lumibot.data_sources.yahoo_data",
"obj": "YahooData.get_historical_prices",
"line": 497,
"column": 8,
"endLine": 500,
"endColumn": 23,
"path": "lumibot/data_sources/yahoo_data.py",
"symbol": "no-else-return",
"message": "Unnecessary "elif" after "return", remove the leading "el" from "elif"",
"message-id": "R1705"
}
]

lumibot/strategies/strategy_executor.py

************* Module pylintrc
pylintrc:1:0: F0011: error while parsing the configuration: File contains no section headers.
file: 'pylintrc', line: 1
'known-third-party=lumibot' (config-parse-error)
[
{
"type": "convention",
"module": "lumibot.strategies.strategy_executor",
"obj": "",
"line": 60,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "lumibot/strategies/strategy_executor.py",
"symbol": "line-too-long",
"message": "Line too long (112/100)",
"message-id": "C0301"
},
{
"type": "convention",
"module": "lumibot.strategies.strategy_executor",
"obj": "",
"line": 61,
"column": 0,
"endLine": null,
"endColumn": null,
"path": "lumibot/strategies/strategy_executor.py",
"symbol": "line-too-long",
"message": "Line too long (114/100)",
"message-id": "C0301"
},
{
"type": "convention

... [truncated 137526 characters] ...

g_session",
"line": 1858,
"column": 12,
"endLine": 1858,
"endColumn": 35,
"path": "lumibot/strategies/strategy_executor.py",
"symbol": "attribute-defined-outside-init",
"message": "Attribute 'check_queue_thread' defined outside init",
"message-id": "W0201"
},
{
"type": "warning",
"module": "lumibot.strategies.strategy_executor",
"obj": "StrategyExecutor._run_trading_session",
"line": 1882,
"column": 20,
"endLine": 1882,
"endColumn": 44,
"path": "lumibot/strategies/strategy_executor.py",
"symbol": "attribute-defined-outside-init",
"message": "Attribute '_last_updated_cloud' defined outside init",
"message-id": "W0201"
}
]

  • 6 others

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
lumibot/entities/position.py (1)

113-120: ⚠️ Potential issue | 🟠 Major

Keep dust filtering out of Position.quantity.

Line 117's hard-coded 1e-6 epsilon is larger than the supported precision of many crypto assets. Returning 0.0 from the accessor can make a real position disappear from sell/valuation paths; if this is only to hide float noise, do it at the presentation boundary instead.

Suggested fix
     @property
     def quantity(self):
-        result = self._quantity_float
-
-        # If result is less than 0.000001, return 0.0 to avoid rounding errors.
-        if abs(result) < 0.000001:
-            return 0.0
-
-        return result
+        return self._quantity_float
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/entities/position.py` around lines 113 - 120, The Position.quantity
accessor is removing small balances using a hard-coded epsilon (abs(result) <
1e-6) which can hide real crypto positions; remove the dust-filtering check from
Position.quantity so it simply returns self._quantity_float, and if you need to
suppress float noise implement a separate presentation helper (e.g.,
format_quantity or a UI-layer clamp) or make a configurable method like
Position.quantity_clamped(epsilon) for display/valuation only.
lumibot/trading_builtins/safe_list.py (1)

13-17: ⚠️ Potential issue | 🟠 Major

Own the backing list before exposing revision.

self.__items = initial keeps the caller's list alias. Any out-of-band mutation of that list bypasses revision += 1, so caches that trust revision can go stale immediately. At minimum, copy initial here; the same invariant should hold anywhere the raw list is exposed.

Suggested fix
         if initial is None:
             initial = []
         self.__lock = lock
-        self.__items = initial
+        self.__items = list(initial)
         self.revision = 0
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/trading_builtins/safe_list.py` around lines 13 - 17, The backing list
assigned to self.__items keeps the caller's alias (self.__items = initial),
allowing out-of-band mutations to bypass revision updates; change the assignment
to take ownership by copying the input (e.g., self.__items = list(initial) or
initial.copy()) and audit any places that expose the raw list
(methods/properties referencing __items) to return a copy instead so mutations
always go through the SafeList APIs that increment revision.
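
To make the aliasing concern concrete, here is a minimal sketch of a revision-tracked list that copies its initial contents. `RevisionedList` is a hypothetical stand-in written for this example, not the real SafeList, whose full API is not shown here.

```python
import threading


class RevisionedList:
    """Hypothetical stand-in for a lock-protected list with a mutation counter."""

    def __init__(self, lock, initial=None):
        self._lock = lock
        # Copy the input so the caller's list is never the backing store.
        self._items = list(initial) if initial is not None else []
        self.revision = 0

    def append(self, value):
        with self._lock:
            self._items.append(value)
            self.revision += 1  # every mutation path bumps the revision

    def get_list(self):
        with self._lock:
            return list(self._items)  # hand out a copy, not the raw list


caller_list = [1, 2]
safe = RevisionedList(threading.RLock(), caller_list)
caller_list.append(99)  # out-of-band mutation no longer reaches the backing list
```

With the copy in place, a cache keyed on `revision` stays valid because the contents can only change through methods that increment it.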
lumibot/backtesting/backtesting_broker.py (1)

390-400: ⚠️ Potential issue | 🟠 Major

Fallback path drops the asset filter.

On Line 393, the fallback returns all active orders for the strategy, even when asset was passed. That can over-select orders and affect downstream cancellation/fill logic.

💡 Proposed fix
         except Exception:
             # Fallback to the slower path if internal buckets are unavailable.
             orders = self.get_tracked_orders(strategy=strategy)
-            return [o for o in orders if o.is_active()] if orders else []
+            if not orders:
+                return []
+            return [
+                o for o in orders
+                if o.is_active() and (asset is None or getattr(o, "asset", None) == asset)
+            ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/backtesting/backtesting_broker.py` around lines 390 - 400, The
fallback return in backtesting_broker.py currently ignores the asset filter and
returns all active orders for the strategy; update the fallback so it also
filters by asset when asset is provided. After retrieving orders via
get_tracked_orders(strategy=strategy), return only orders where o.is_active() is
True and (asset is None or getattr(o, "asset", None) == asset) — use the
existing get_tracked_orders, is_active, strategy and asset attributes to
implement this filter.
🧹 Nitpick comments (5)
lumibot/entities/asset.py (1)

274-307: These one-time caches need an immutability invariant.

_cached_hash, _cached_asset_type_key, and _cached_minimal_dict are computed once and then reused by __eq__, __hash__, and to_minimal_dict(), but the corresponding fields are still publicly mutable. Please either freeze those fields after construction or add cache refreshes in setters, and verify that no caller mutates them today.

As per coding guidelines, add comments explaining 'why/invariants' for non-obvious logic.

Also applies to: 372-384, 489-489
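
One way to keep such caches consistent is to refresh them in setters. The sketch below uses a hypothetical `CachedAsset` class with only two fields; it is an assumption-laden illustration of the "cache refresh in setters" option, not the actual Asset implementation.

```python
class CachedAsset:
    """Hypothetical asset-like object whose hash is computed once and reused."""

    def __init__(self, symbol, asset_type):
        self._symbol = symbol
        self._asset_type = asset_type
        self._cached_hash = None

    @property
    def symbol(self):
        return self._symbol

    @symbol.setter
    def symbol(self, value):
        # Invariant: the cached hash must always reflect the current fields,
        # so any field change drops the cached value.
        self._symbol = value
        self._cached_hash = None

    def __eq__(self, other):
        if not isinstance(other, CachedAsset):
            return NotImplemented
        return (self._symbol, self._asset_type) == (other._symbol, other._asset_type)

    def __hash__(self):
        if self._cached_hash is None:
            self._cached_hash = hash((self._symbol, self._asset_type))
        return self._cached_hash


asset = CachedAsset("SPY", "stock")
first_hash = hash(asset)
asset.symbol = "QQQ"
```

Invalidation keeps `__eq__` and `__hash__` in agreement after a mutation, but an object that is already stored as a dict key or set member would still be misplaced, which is why freezing the fields after construction is the stricter alternative.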

lumibot/strategies/strategy.py (1)

1315-1335: Consider adding a brief comment explaining the single-entry cache design.

The cache.clear() before storing means only one configuration is cached at a time. While this is a safe and correct approach to prevent stale data, adding a brief inline comment would clarify the intent for future maintainers.

📝 Suggested comment for clarity
 cache = getattr(self, "_positions_cache", None)
 if cache is None:
     cache = {}
     self._positions_cache = cache
+# Single-entry cache: clear to avoid stale results from previous revisions/parameters
 cache.clear()
 cache[cache_key] = tuple(result)
 return list(result)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/strategies/strategy.py` around lines 1315 - 1335, Add a brief inline
comment explaining that the cache is intentionally single-entry: the code clears
the entire self._positions_cache (cache.clear()) before storing a new entry for
cache_key (constructed from filled_positions_revision, include_cash,
quote_asset) so only the latest configuration returned by
broker.get_tracked_positions(self.name) is kept; place this comment near the
cache = getattr(self, "_positions_cache", None) / cache.clear() block inside the
method that builds `result` to make the single-entry design explicit for future
maintainers.
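
The single-entry design can be sketched in isolation as follows. `PositionsView`, its `loader` callback, and the `loads` counter are hypothetical names introduced for this example; only the clear-then-store pattern and the tuple/list copying mirror the snippet above.

```python
class PositionsView:
    """Hypothetical wrapper that caches one result keyed by (revision, include_cash)."""

    def __init__(self, loader):
        self._loader = loader  # callable returning the current positions
        self._cache = {}
        self.loads = 0  # counts how often the slow path ran

    def get(self, revision, include_cash=True):
        key = (revision, include_cash)
        hit = self._cache.get(key)
        if hit is not None:
            return list(hit)  # fresh list so callers cannot mutate the cache
        result = self._loader()
        self.loads += 1
        # Single-entry cache: drop whatever was stored for older revisions or parameters.
        self._cache.clear()
        self._cache[key] = tuple(result)
        return list(result)


view = PositionsView(lambda: ["SPY", "USD"])
first = view.get(revision=1)
second = view.get(revision=1)
third = view.get(revision=2)
```

Keeping a single entry bounds memory and makes staleness impossible as long as every mutation bumps the revision, at the cost of a miss whenever callers alternate between parameter combinations.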
lumibot/strategies/_strategy.py (1)

835-842: Persist the computed asset-type key to realize the intended cache hit path.

Both hot paths read _cached_asset_type_key, but neither writes it after computation, so normalization work is repeated every call.

♻️ Proposed refactor
             def _asset_type_key(asset_obj):
                 cached_asset_type_key = getattr(asset_obj, "_cached_asset_type_key", None)
                 if cached_asset_type_key is not None:
                     return cached_asset_type_key
                 raw_asset_type = getattr(asset_obj, "asset_type", "")
                 raw_asset_type = getattr(raw_asset_type, "value", raw_asset_type)
-                return str(raw_asset_type).lower()
+                asset_type_key = str(raw_asset_type).lower()
+                try:
+                    setattr(asset_obj, "_cached_asset_type_key", asset_type_key)
+                except Exception:
+                    pass
+                return asset_type_key
@@
         base_asset_type = getattr(base_asset, "_cached_asset_type_key", None)
         if base_asset_type is None:
             base_asset_type = getattr(base_asset, "asset_type", None)
             base_asset_type = getattr(base_asset_type, "value", base_asset_type)
             base_asset_type = str(base_asset_type).lower()
+            try:
+                setattr(base_asset, "_cached_asset_type_key", base_asset_type)
+            except Exception:
+                pass

Also applies to: 1005-1010

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/strategies/_strategy.py` around lines 835 - 842, The helper
_asset_type_key computes a normalized string but never stores it, so the
_cached_asset_type_key read path never hits; after computing raw_asset_type and
normalizing it to str(...).lower(), assign that value back to
asset_obj._cached_asset_type_key before returning to enable caching. Apply the
identical change to the other duplicate helper (the second occurrence around the
similar logic) so both functions persist the computed key onto the asset_obj
attribute.
lumibot/tools/helpers.py (1)

315-320: Use atomic file replacement for disk cache writes.

Current to_pickle() directly to the target path can leave partial files if interrupted. A temp-write + os.replace() avoids corrupted cache artifacts.

💡 Proposed fix
         if disk_cache_path:
             try:
                 os.makedirs(os.path.dirname(disk_cache_path), exist_ok=True)
-                days.to_pickle(disk_cache_path)
+                tmp_path = f"{disk_cache_path}.tmp.{os.getpid()}"
+                days.to_pickle(tmp_path)
+                os.replace(tmp_path, disk_cache_path)
             except Exception:
                 pass
     if disk_cache_path:
         try:
             os.makedirs(os.path.dirname(disk_cache_path), exist_ok=True)
-            days.to_pickle(disk_cache_path)
+            tmp_path = f"{disk_cache_path}.tmp.{os.getpid()}"
+            days.to_pickle(tmp_path)
+            os.replace(tmp_path, disk_cache_path)
         except Exception:
             pass

Also applies to: 332-337

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/tools/helpers.py` around lines 315 - 320, The current disk cache
write uses days.to_pickle(disk_cache_path) which can leave partial files if
interrupted; change both occurrences (the block around days.to_pickle and the
similar block at lines 332-337) to write to a temporary file in the same
directory (e.g., using tempfile.NamedTemporaryFile or mkstemp), flush/close (and
optionally fsync) the temp file, then atomically rename/replace it to
disk_cache_path with os.replace(); still create the parent dir with os.makedirs
as shown and catch/log exceptions, but avoid writing directly to disk_cache_path
to prevent corrupted cache artifacts.
lumibot/strategies/strategy_executor.py (1)

227-233: Document the 1.9 second throttle invariant.

1.9 is now a behavior boundary, but it's not obvious why this should be 1.9 instead of 2, a config value, or a named constant. A short invariant comment or constant name would make this much easier to maintain.

As per coding guidelines, "add comments explaining 'why/invariants' for non-obvious logic."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/strategies/strategy_executor.py` around lines 227 - 233, The
1.9-second throttle is a non-obvious magic number; replace the literal 1.9 with
a named constant (e.g., SNAPSHOT_THROTTLE_SECONDS or DEFAULT_SNAPSHOT_INTERVAL)
or make it configurable, update the check in the block that computes
should_capture_snapshot using data_source, last_logging_time, and
_last_logging_time, and add a short comment explaining the invariant/why it is
1.9 (e.g., to avoid double-logging within ~2s for downstream consumers) so
future readers understand the behavior.
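
A minimal sketch of the named-constant approach. The constant name comes from the suggestion above; the helper's signature and the stated rationale in the comment are assumptions, since the conversation does not say why 1.9 was chosen.

```python
import time

# Assumed rationale (not confirmed in this PR): stay just under a 2-second
# cadence so a snapshot is not skipped because of small timer jitter.
SNAPSHOT_THROTTLE_SECONDS = 1.9


def should_capture_snapshot(last_logging_time, now=None, interval=SNAPSHOT_THROTTLE_SECONDS):
    """Return True when enough time has passed since the last captured snapshot."""
    if last_logging_time is None:
        return True  # nothing captured yet
    if now is None:
        now = time.monotonic()
    return (now - last_logging_time) >= interval
```

Passing `now` explicitly keeps the check deterministic in tests, for example `should_capture_snapshot(10.0, now=11.0)` versus `should_capture_snapshot(10.0, now=12.0)`.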
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lumibot/backtesting/backtesting_broker.py`:
- Around line 618-627: The branch in get_time_to_close that handles idx >=
len(self._trading_days) returns None but does not mark end-of-data; call
self._mark_end_of_trading_days(now) before setting
self._time_to_close_cache_datetime and self._time_to_close_cache_value = None
and returning None so the end-of-trading-days state is consistent with
get_time_to_open/_get_next_trading_day and the cache reflects the terminal
condition for subsequent calls.

In `@lumibot/data_sources/yahoo_data.py`:
- Around line 287-312: The daily fast-path uses raw self._datetime instead of
the already-normalized current_dt from to_default_timezone(), which can shift
the end_filter across day boundaries; inside _get_filtered_end_index replace dt
= self._datetime.replace(...) with dt = current_dt.replace(...) (i.e., use
current_dt when computing dt and end_filter) so the timezone-normalized datetime
is used for searchsorted calculations.

In `@lumibot/entities/order.py`:
- Around line 1403-1409: The minimal order payload currently returns raw
self.quantity which can be a Decimal or NumPy scalar and will break json.dumps
in DataSourceBacktesting._update_datetime; update the "qty" value in the result
dict (where result is built in lumibot.entities.order, referencing
self.quantity) to be JSON-serializable by coercing it (e.g., use
float(self.quantity) or int(...) where appropriate) and fall back to 0 when
falsy so the "qty" field is always a JSON-serializable number.

In `@lumibot/strategies/strategy_executor.py`:
- Around line 239-254: Wrap the entire snapshot-building block (calls to
strategy.get_positions, getattr(self.strategy, "_initial_budget"),
broker.get_active_tracked_orders, broker.get_tracked_orders and any
to_minimal_dict conversions) in a broad try/except that prevents exceptions from
escaping safe_sleep()/_process_pandas_daily_data(); on any exception, set
payload["positions"] and payload["orders"] to None (leave
payload["initial_budget"] via getattr if available) so the backtest continues
with a minimal payload; also defensively catch exceptions around each
to_minimal_dict conversion when building lists so a single bad item won't fail
the whole snapshot.
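
The quantity coercion described in the order.py item can be sketched as below. `json_safe_qty` is a hypothetical helper name; the real fix may live inline in the minimal-dict builder instead.

```python
import json
from decimal import Decimal


def json_safe_qty(quantity):
    """Coerce a Decimal or other numeric quantity into a JSON-serializable number."""
    if not quantity:
        return 0  # None, 0, and Decimal("0") all fall back to 0
    try:
        return float(quantity)
    except (TypeError, ValueError):
        return 0  # unexpected types degrade to 0 instead of breaking the payload


payload = {"qty": json_safe_qty(Decimal("1.5"))}
encoded = json.dumps(payload)
```

Without the coercion, `json.dumps({"qty": Decimal("1.5")})` raises a TypeError because the standard encoder does not handle Decimal.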

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8dbe4f79-a68c-4bbc-b801-92106b021789

📥 Commits

Reviewing files that changed from the base of the PR and between 4ed1e62 and 5a13897.

📒 Files selected for processing (17)
  • lumibot/backtesting/backtesting_broker.py
  • lumibot/brokers/broker.py
  • lumibot/data_sources/data_source_backtesting.py
  • lumibot/data_sources/yahoo_data.py
  • lumibot/entities/asset.py
  • lumibot/entities/order.py
  • lumibot/entities/position.py
  • lumibot/strategies/_strategy.py
  • lumibot/strategies/strategy.py
  • lumibot/strategies/strategy_executor.py
  • lumibot/tools/helpers.py
  • lumibot/trading_builtins/safe_list.py
  • tests/test_cash.py
  • tests/test_helpers.py
  • tests/test_strategy_dividend_cash_batch.py
  • tests/test_strategy_executor_progress_payload.py
  • tests/test_yahoo_daily_last_price_fastpath.py

@mpelteshki (Collaborator, Author) commented Mar 16, 2026

Addressed the safe CodeRabbit items in 3afcb7a:

  • end-of-data marking in get_time_to_close()
  • timezone-normalized daily Yahoo filter
  • JSON-safe minimal order qty serialization
  • best-effort backtest progress snapshot serialization
  • SafeList ownership of the initial backing list
  • asset-filter preservation in the active-order fallback
  • added focused regression coverage for these paths

I did not apply the suggested Position.quantity dust-filter change in this PR because that one is behavior-sensitive and can affect parity for tiny crypto balances.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
lumibot/backtesting/backtesting_broker.py (1)

375-398: ⚠️ Potential issue | 🟠 Major

Override narrows API contract: strategy parameter must accept None and Strategy objects like the base class.

The override signature def get_active_tracked_orders(self, strategy: str, asset=None) requires strategy as a string, while the base class Broker.get_active_tracked_orders(self, strategy=None, asset=None) accepts None and Strategy objects. This violates the Liskov Substitution Principle: polymorphic code that calls the method without arguments will raise TypeError, and a caller that passes a Strategy instance will have it compared directly against each order's strategy value instead of being normalized to a name first.

The base class uses _strategy_name_from_input() (broker.py:834) to handle Strategy objects and None; the override must do the same. Apply the proposed fix to match the base contract:

  • Change strategy: str to strategy=None
  • Call strategy_name = self._strategy_name_from_input(strategy) to normalize input
  • Update the comparison at line 402 to check if strategy_name is not None and getattr(order, "strategy", None) != strategy_name
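
The three changes above can be sketched as follows. This is a self-contained illustration under stated assumptions: `Strategy`, `Order`, and both broker classes are simplified stand-ins, and the body of `_strategy_name_from_input` is a guess at its behaviour (None passes through, a Strategy yields its name), not the implementation in broker.py.

```python
class Strategy:
    """Stand-in for lumibot's Strategy; only `name` matters here."""

    def __init__(self, name):
        self.name = name


class Order:
    """Stand-in order carrying a strategy name and an asset."""

    def __init__(self, strategy, asset):
        self.strategy = strategy
        self.asset = asset


class Broker:
    def __init__(self, orders):
        self._orders = list(orders)

    def _strategy_name_from_input(self, strategy):
        # Assumed behaviour: None stays None, a Strategy yields its name,
        # and a plain string is already the name.
        if strategy is None:
            return None
        return getattr(strategy, "name", strategy)

    def get_active_tracked_orders(self, strategy=None, asset=None):
        raise NotImplementedError


class BacktestingBroker(Broker):
    # Same signature as the base class, so callers can substitute freely.
    def get_active_tracked_orders(self, strategy=None, asset=None):
        strategy_name = self._strategy_name_from_input(strategy)
        result = []
        for order in self._orders:
            if strategy_name is not None and getattr(order, "strategy", None) != strategy_name:
                continue
            if asset is not None and order.asset != asset:
                continue
            result.append(order)
        return result
```

With this shape, `broker.get_active_tracked_orders()`, `broker.get_active_tracked_orders("s1")`, and `broker.get_active_tracked_orders(Strategy("s1"))` are all valid calls.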
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/backtesting/backtesting_broker.py` around lines 375 - 398, The
override get_active_tracked_orders currently restricts strategy to str; change
its signature to accept strategy=None to match Broker.get_active_tracked_orders,
then normalize the input by calling strategy_name =
self._strategy_name_from_input(strategy) near the top of the method, and use
strategy_name (not the original strategy) when filtering orders (replace the
asset/strategy comparison with a check like: if strategy_name is not None and
getattr(order, "strategy", None) != strategy_name) while keeping the existing
asset check and the fallback to get_tracked_orders.
lumibot/strategies/strategy_executor.py (1)

21-22: ⚠️ Potential issue | 🟡 Minor

Duplicate import of Asset.

Line 21 already imports Asset from lumibot.entities. Line 22 is redundant.

 from lumibot.entities import Asset, Order
-from lumibot.entities import Asset
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/strategies/strategy_executor.py` around lines 21 - 22, Remove the
duplicate import of Asset in strategy_executor.py: keep a single import from
lumibot.entities that includes both Asset and Order (referencing the existing
import lines that use Asset and Order) and delete the redundant "from
lumibot.entities import Asset" line so there is only one import statement for
these symbols.
🧹 Nitpick comments (5)
lumibot/data_sources/yahoo_data.py (3)

200-205: Hot-path info logging will erode the speedup.

This block runs on every bar pull; info logging here adds avoidable overhead. Prefer debug (or guard by log level) for this path.

Suggested fix
-        logger.info(
+        logger.debug(
             "Inside _pull_source_symbol_bars for %s: self._datetime = %s, requesting length %s",
             asset.symbol,
             self._datetime,
             length,
         )
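
For illustration, a small standalone sketch of the two options mentioned here: demoting the call to debug, and guarding a message whose arguments are costly to build. The logger name, `pull_bars`, and `expensive_summary` are hypothetical and exist only for this example.

```python
import logging

logger = logging.getLogger("yahoo_data_sketch")

# Records each call so the effect of the guard is observable.
summary_calls = []


def expensive_summary(symbol, length):
    """Stand-in for any argument that is costly to compute."""
    summary_calls.append((symbol, length))
    return f"{symbol}:{length}"


def pull_bars(symbol, current_dt, length):
    # Lazy %-style arguments are only formatted when the record is actually
    # emitted, so a disabled debug call costs little more than a level check.
    logger.debug(
        "Inside _pull_source_symbol_bars for %s: self._datetime = %s, requesting length %s",
        symbol,
        current_dt,
        length,
    )
    # When building the arguments is itself expensive, guard explicitly.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("bar window detail: %s", expensive_summary(symbol, length))
    return length
```

The explicit guard matters only when computing the arguments is expensive; for plain attribute lookups the lazy formatting of `logger.debug` is usually enough.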
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/data_sources/yahoo_data.py` around lines 200 - 205, The hot-path
logging in _pull_source_symbol_bars uses logger.info and should be changed to a
lower-overhead level or guarded: replace the logger.info call in
_pull_source_symbol_bars with logger.debug (or wrap the logger.info call in a
log-level check such as logger.isEnabledFor(logging.INFO)) so the per-bar
overhead is eliminated; update the logging invocation that currently references
asset.symbol, self._datetime, and length accordingly.

376-386: Please document the daily-path invariant explicitly in-code.

The daily fast path caches and returns an open-price-derived value for the previous fully-closed session. Add a short invariant comment/docstring in _get_last_daily_open_price to prevent future regressions/misuse.

As per coding guidelines: **/*.{py,ts,tsx,js}: add comments explaining 'why/invariants' for non-obvious logic.

Also applies to: 424-449

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/data_sources/yahoo_data.py` around lines 376 - 386, Add an explicit
invariant comment/docstring to the daily fast-path logic and the helper
_get_last_daily_open_price describing that this path returns an
open-price-derived value for the previous fully-closed trading session (i.e., it
uses the prior session's open as the canonical "last price" and is cached
per-day). Update the docstring for function _get_last_daily_open_price and add a
short inline comment in the fast-path block (the code that checks
isinstance(timestep, str) and "day" in timestep.lower()) clarifying the
prior-session/open-derived semantics, cache key behavior, and why the cache is
invalidated when current_date changes so future maintainers won't misuse the
API; apply the same explanatory comment to the equivalent block referenced
around the other occurrence (lines 424-449).

267-271: Use logger.exception instead of traceback.print_exc() in the fetch loop.

Direct traceback printing bypasses structured logging and is noisier in production. logger.exception(...) keeps stack traces in the logger pipeline and is the idiomatic approach for exception logging in Python.
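
A minimal sketch of the suggested pattern, assuming a simplified fetch loop; `fetch_first_available` and its `fetch` callback are hypothetical stand-ins for the loop around `YahooHelper.get_symbol_data`.

```python
import logging

logger = logging.getLogger("fetch_sketch")


def fetch_first_available(symbols_to_try, fetch):
    """Return (symbol, data) for the first symbol that yields data.

    Returns (None, None) if every candidate fails or comes back empty.
    """
    for sym in symbols_to_try:
        try:
            data = fetch(sym)
        except Exception:
            # logger.exception logs at ERROR level and attaches the active
            # traceback, so no separate traceback.print_exc() is needed.
            logger.exception(
                "_pull_source_symbol_bars: Error fetching data for symbol %s", sym
            )
            continue
        if data:
            return sym, data
    return None, None
```

Note that `logger.exception` raises the severity from WARNING to ERROR. If the original WARNING level should be kept, `logger.warning(..., exc_info=True)` attaches the same traceback.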

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/data_sources/yahoo_data.py` around lines 267 - 271, In
_pull_source_symbol_bars replace the explicit traceback.print_exc() and the
logger.warning call with a single logger.exception call so the stack trace is
captured in the logging pipeline; specifically, inside the except Exception as e
block for symbol 'sym' use logger.exception("_pull_source_symbol_bars: Error
fetching data for symbol %s", sym) (or include str(e) in the message) instead of
calling traceback.print_exc(), ensuring the logger variable is used to record
the full exception and stack trace.
tests/test_safe_list.py (1)

6-12: Add regression cases for revision mutation semantics.

This test is good, but it doesn’t cover the newly introduced revision behavior (increment on mutation, no increment on no-op paths) or SafeOrderDict revision updates.

As per coding guidelines tests/**/*.py: Add unit tests for any new functionality.
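
The shape of such tests might look like the sketch below. Because the real SafeList API is not shown in this review, the example uses a hypothetical stand-in class, `RevisionedList`, with an assumed `revision` counter and an assumed no-op path (`remove_all` on a missing value); the real tests would target SafeList and SafeOrderDict directly.

```python
from threading import RLock


class RevisionedList:
    """Hypothetical stand-in for SafeList: bumps `revision` only on real changes."""

    def __init__(self, lock, initial=None):
        self._lock = lock
        self._list = list(initial or [])
        self.revision = 0

    def append(self, value):
        with self._lock:
            self._list.append(value)
            self.revision += 1

    def remove_all(self, value):
        with self._lock:
            kept = [v for v in self._list if v != value]
            if len(kept) == len(self._list):
                return  # no-op: nothing matched, revision untouched
            self._list = kept
            self.revision += 1

    def snapshot(self):
        with self._lock:
            return list(self._list)


def test_revision_increments_on_mutation():
    items = RevisionedList(RLock())
    before = items.revision
    items.append("order-1")
    assert items.revision == before + 1
    assert items.snapshot() == ["order-1"]


def test_revision_unchanged_on_noop():
    items = RevisionedList(RLock(), ["order-1"])
    before = items.revision
    items.remove_all("missing")
    assert items.revision == before
    assert items.snapshot() == ["order-1"]
```

Each test captures the revision first, performs one operation, and then asserts on both the counter and the contents, so a cache keyed on the revision cannot go stale unnoticed.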

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_safe_list.py` around lines 6 - 12, Add regression unit tests that
assert the new revision mutation semantics: for SafeList, verify that mutating
operations (e.g., append) increment the object's revision (check
SafeList.revision before and after a real change) and that no-op paths (calling
a method that results in no change) do not increment revision; likewise add
tests for SafeOrderDict to ensure its revision increments on actual mutations
(insertion/update/delete) and remains unchanged for no-op operations. Use the
existing test pattern (create instance with RLock(), capture revision, perform
operation, then assert revision changed or not and that contents match expected)
and add descriptive test functions like test_safe_list_revision_on_mutation,
test_safe_list_revision_noop, test_safe_order_dict_revision_on_mutation, and
test_safe_order_dict_revision_noop to cover both behaviors.
lumibot/strategies/strategy.py (1)

2487-2515: Optional: cache failed sleeptime parses too.

Lines 2497 and 2501 return None without updating the cache, so invalid values re-run the regex and parsing on every call. Caching None for the current input would make this fully memoized.

♻️ Suggested micro-optimization
-        if isinstance(value, str):
+        if isinstance(value, str):
             normalized = value.strip().upper().replace(" ", "")
             if not normalized:
-                return None
+                self._sleeptime_seconds_cache_input = value
+                self._sleeptime_seconds_cache_value = None
+                return None
             match = re.match(r"^(\d+(?:\.\d+)?)([A-Z]*)$", normalized)
             if not match:
-                return None
+                self._sleeptime_seconds_cache_input = value
+                self._sleeptime_seconds_cache_value = None
+                return None
             qty = float(match.group(1))
             suffix = match.group(2) or "M"
@@
             result = qty * multiplier
             self._sleeptime_seconds_cache_input = value
             self._sleeptime_seconds_cache_value = result
             return result
-        return None
+        self._sleeptime_seconds_cache_input = value
+        self._sleeptime_seconds_cache_value = None
+        return None
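
An alternative shape for the same idea is sketched below: parse in one helper and write the cache in a single place, so every return path (including failures) is memoized without repeating the assignments. The class name, the `parse_count` counter, and the unit table are assumptions for illustration; the real multipliers are elided in the diff above, and this sketch treats every non-string input as unparseable.

```python
import re

_MISSING = object()  # sentinel: distinguishes "nothing cached" from a cached None


class SleeptimeParser:
    # Assumed unit table, for illustration only.
    _MULTIPLIERS = {"S": 1.0, "M": 60.0, "H": 3600.0, "D": 86400.0}

    def __init__(self):
        self._cache_input = _MISSING
        self._cache_value = None
        self.parse_count = 0  # counts real parses, to make memoization visible

    def seconds(self, value):
        if self._cache_input is not _MISSING and value == self._cache_input:
            return self._cache_value
        result = self._parse(value)
        # Single cache write covers successes and failures alike.
        self._cache_input = value
        self._cache_value = result
        return result

    def _parse(self, value):
        self.parse_count += 1
        if not isinstance(value, str):
            return None
        normalized = value.strip().upper().replace(" ", "")
        match = re.match(r"^(\d+(?:\.\d+)?)([A-Z]*)$", normalized)
        if not match:
            return None
        multiplier = self._MULTIPLIERS.get(match.group(2) or "M")
        if multiplier is None:
            return None
        return float(match.group(1)) * multiplier
```

The sentinel is what allows a cached None to be told apart from an empty cache, which the early-return version has to handle by assigning the cache fields before each `return None`.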
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/strategies/strategy.py` around lines 2487 - 2515, The sleeptime
parsing early-returns that yield None don't update the memoization, causing
repeated work; before each return None in the method (specifically the branches
checking if not normalized and if not match), set
self._sleeptime_seconds_cache_input = value and
self._sleeptime_seconds_cache_value = None so invalid inputs are cached; also
consider adding the same cache assignment for any other code paths that
implicitly return None to ensure full memoization of failures.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lumibot/data_sources/yahoo_data.py`:
- Around line 249-251: The cache key for stored symbol data uses only asset,
causing mixed-timestep collisions; update _get_source_symbol_data and any cache
accesses that read/write self._data_store to include interval (and any other
relevant request params like start/end or adjusted params) in the cache key
(e.g., use a tuple or formatted string key like (asset, interval)) so lookups
and stores are consistent for different timesteps; ensure both the early return
that checks self._data_store and the later assignment that writes into
self._data_store use the same composite key so 1d/15m/1m requests do not share
cached data.

---

Outside diff comments:
In `@lumibot/backtesting/backtesting_broker.py`:
- Around line 375-398: The override get_active_tracked_orders currently
restricts strategy to str; change its signature to accept strategy=None to match
Broker.get_active_tracked_orders, then normalize the input by calling
strategy_name = self._strategy_name_from_input(strategy) near the top of the
method, and use strategy_name (not the original strategy) when filtering orders
(replace the asset/strategy comparison with a check like: if strategy_name is
not None and getattr(order, "strategy", None) != strategy_name) while keeping
the existing asset check and the fallback to get_tracked_orders.

In `@lumibot/strategies/strategy_executor.py`:
- Around line 21-22: Remove the duplicate import of Asset in
strategy_executor.py: keep a single import from lumibot.entities that includes
both Asset and Order (referencing the existing import lines that use Asset and
Order) and delete the redundant "from lumibot.entities import Asset" line so
there is only one import statement for these symbols.

---

Nitpick comments:
In `@lumibot/data_sources/yahoo_data.py`:
- Around line 200-205: The hot-path logging in _pull_source_symbol_bars uses
logger.info and should be changed to a lower-overhead level or guarded: replace
the logger.info call in _pull_source_symbol_bars with logger.debug (or wrap the
logger.info call in a log-level check such as logger.isEnabledFor(logging.INFO))
so the per-bar overhead is eliminated; update the logging invocation that
currently references asset.symbol, self._datetime, and length accordingly.
- Around line 376-386: Add an explicit invariant comment/docstring to the daily
fast-path logic and the helper _get_last_daily_open_price describing that this
path returns an open-price-derived value for the previous fully-closed trading
session (i.e., it uses the prior session's open as the canonical "last price"
and is cached per-day). Update the docstring for function
_get_last_daily_open_price and add a short inline comment in the fast-path block
(the code that checks isinstance(timestep, str) and "day" in timestep.lower())
clarifying the prior-session/open-derived semantics, cache key behavior, and why
the cache is invalidated when current_date changes so future maintainers won't
misuse the API; apply the same explanatory comment to the equivalent block
referenced around the other occurrence (lines 424-449).
- Around line 267-271: In _pull_source_symbol_bars replace the explicit
traceback.print_exc() and the logger.warning call with a single logger.exception
call so the stack trace is captured in the logging pipeline; specifically,
inside the except Exception as e block for symbol 'sym' use
logger.exception("_pull_source_symbol_bars: Error fetching data for symbol %s",
sym) (or include str(e) in the message) instead of calling
traceback.print_exc(), ensuring the logger variable is used to record the full
exception and stack trace.

In `@lumibot/strategies/strategy.py`:
- Around line 2487-2515: The sleeptime parsing early-returns that yield None
don't update the memoization, causing repeated work; before each return None in
the method (specifically the branches checking if not normalized and if not
match), set self._sleeptime_seconds_cache_input = value and
self._sleeptime_seconds_cache_value = None so invalid inputs are cached; also
consider adding the same cache assignment for any other code paths that
implicitly return None to ensure full memoization of failures.

In `@tests/test_safe_list.py`:
- Around line 6-12: Add regression unit tests that assert the new revision
mutation semantics: for SafeList, verify that mutating operations (e.g., append)
increment the object's revision (check SafeList.revision before and after a real
change) and that no-op paths (calling a method that results in no change) do not
increment revision; likewise add tests for SafeOrderDict to ensure its revision
increments on actual mutations (insertion/update/delete) and remains unchanged
for no-op operations. Use the existing test pattern (create instance with
RLock(), capture revision, perform operation, then assert revision changed or
not and that contents match expected) and add descriptive test functions like
test_safe_list_revision_on_mutation, test_safe_list_revision_noop,
test_safe_order_dict_revision_on_mutation, and
test_safe_order_dict_revision_noop to cover both behaviors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cca65e1d-27a3-48f3-8aff-2f6342b75ba0

📥 Commits

Reviewing files that changed from the base of the PR and between 5a13897 and 3afcb7a.

📒 Files selected for processing (9)
  • lumibot/backtesting/backtesting_broker.py
  • lumibot/data_sources/yahoo_data.py
  • lumibot/strategies/strategy.py
  • lumibot/strategies/strategy_executor.py
  • lumibot/trading_builtins/safe_list.py
  • tests/test_backtesting_broker.py
  • tests/test_minimal_serialization.py
  • tests/test_safe_list.py
  • tests/test_strategy_executor_progress_payload.py

Comment on lines 249 to +251
if asset in self._data_store:
data = self._data_store[asset]
else:
# Try each symbol format until we get data
data = None
successful_symbol = None

for sym in symbols_to_try:
logger.info(f"Attempting to fetch data for symbol: {sym}")
try:
# Fetch data using the helper without restricting dates here
data = YahooHelper.get_symbol_data(
sym,
interval=interval,
auto_adjust=self.auto_adjust,
last_needed_datetime=self.datetime_end, # Keep this if needed for caching logic
)
if data is not None and data.shape[0] > 0:
logger.info(f"Successfully fetched data for symbol: {sym}")
successful_symbol = sym
break
except Exception as e:
logger.warning(f"_pull_source_symbol_bars: Error fetching data for symbol {sym}: {str(e)}")
# Print the traceback for debugging
import traceback
traceback.print_exc()



if data is None or data.shape[0] == 0:
# Use self.datetime_start and self.datetime_end in the error message for clarity
message = f"{self.SOURCE} did not return data for symbol {asset.symbol}. Tried: {symbols_to_try}. Make sure this symbol is valid and data exists for the period {self.datetime_start} to {self.datetime_end}."
logger.error(message)
return None

data = self._append_data(asset, data)
return self._data_store[asset]


⚠️ Potential issue | 🟠 Major

Cache key misses interval, so mixed-timestep requests can return wrong data.

_get_source_symbol_data() computes interval but cache lookup/storage uses only asset. If the same asset is requested with different timesteps (1d, 15m, 1m), cached frames/arrays can be reused incorrectly.

Suggested fix
-    def _append_data(self, asset, data):
+    def _append_data(self, cache_key, data):
@@
-        self._data_store[asset] = data
-        self._data_index_values[asset] = data.index.values
-        self._data_open_values[asset] = data["open"].to_numpy(copy=False)
+        self._data_store[cache_key] = data
+        self._data_index_values[cache_key] = data.index.values
+        self._data_open_values[cache_key] = data["open"].to_numpy(copy=False)
         return data
@@
-        if asset in self._data_store:
-            return self._data_store[asset]
+        cache_key = (asset, interval)
+        if cache_key in self._data_store:
+            return self._data_store[cache_key]
@@
-        data = self._append_data(asset, data)
+        data = self._append_data(cache_key, data)

Also applies to: 282-285
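
To illustrate why the composite key matters, here is a minimal sketch with a hypothetical `BarCache` class standing in for the data source's `_data_store`; the `fetch` callback is an assumption and replaces the real Yahoo download.

```python
class BarCache:
    """Hypothetical stand-in for the data source's `_data_store`."""

    def __init__(self, fetch):
        self._fetch = fetch
        self._data_store = {}

    def get(self, asset, interval):
        # Keying on (asset, interval) keeps 1d, 15m, and 1m frames apart.
        cache_key = (asset, interval)
        if cache_key in self._data_store:
            return self._data_store[cache_key]
        data = self._fetch(asset, interval)
        if data is None:
            return None  # failed fetches are not cached
        self._data_store[cache_key] = data
        return data
```

Any parallel arrays derived from the frame (such as the index and open-price arrays in the suggested fix) would need the same composite key so that all lookups stay consistent.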

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lumibot/data_sources/yahoo_data.py` around lines 249 - 251, The cache key for
stored symbol data uses only asset, causing mixed-timestep collisions; update
_get_source_symbol_data and any cache accesses that read/write self._data_store
to include interval (and any other relevant request params like start/end or
adjusted params) in the cache key (e.g., use a tuple or formatted string key
like (asset, interval)) so lookups and stores are consistent for different
timesteps; ensure both the early return that checks self._data_store and the
later assignment that writes into self._data_store use the same composite key so
1d/15m/1m requests do not share cached data.
