Skip to content

Conversation

@Satvik-Singh192
Copy link
Contributor

… pytest failing a lot

✨ Add tqdm-Based Progress Bars for Long-Running Operations

Closes #32

📝 Description

This PR introduces tqdm-based progress bars to improve user experience during long-running operations such as:

  • Synthetic data generation
  • Factor computation
  • Backtest execution

Previously, these steps ran silently, making it difficult for users to estimate how long a process would take. This PR resolves that by adding lightweight progress indicators without altering existing workflows.


✅ Changes Made

1. Synthetic Data Generation

  • Added a tqdm progress bar that displays generation progress for each symbol.

2. Factor Computation

  • Added tqdm progress bars for factor-wise averaging operations.

3. Backtest Execution

  • Wrapped the internal backtest loop with tqdm to visualize iteration progress.

4. Minimal & Non-Intrusive Changes

  • No refactoring or redesign of core functionality.
  • Only small patches added where needed to integrate progress bars cleanly.

[x] I have read the contributing guidelines

@ayushkrtiwari ayushkrtiwari requested a review from Copilot November 9, 2025 09:29
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR attempts to add progress bars using tqdm to various operations in the CLI, but introduces multiple critical bugs that break core functionality. The changes include adding tqdm import, modifying data generation to use a loop, attempting to add progress bars to factor computation, and monkey-patching the VectorizedBacktest.run method.

Key changes attempted:

  • Addition of tqdm progress bars to data generation, factor computation, and backtesting
  • Modified data generation to generate symbols one at a time instead of in batch
  • Monkey-patched VectorizedBacktest.run method to show progress during backtesting

Reviewed Changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 6 comments.

File Description
src/quant_research_starter/cli.py Adds tqdm import and attempts to integrate progress bars throughout CLI commands, but introduces multiple bugs including broken monkey-patching, incorrect tqdm usage, and performance regressions
.gitignore Adds myenv/ to ignore virtual environment directory

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +193 to +201
results = pd.DataFrame({"returns": returns}, index=idx[1:])
results["portfolio_value"] = (
self.initial_capital * (1 + results["returns"]).cumprod()
)
results["final_value"] = results["portfolio_value"].iloc[-1]
results["total_return"] = results["final_value"] / self.initial_capital - 1

return results

Copy link

Copilot AI Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return value structure is inconsistent with the original VectorizedBacktest.run() method. The original returns a dictionary (from _generate_results()) containing multiple keys like 'positions', 'cash', 'trades', etc., but this implementation returns a DataFrame. This breaking change will cause the code at line 213 (metrics_calc = RiskMetrics(results["returns"])) to fail because results is now a DataFrame, not a dictionary.

Suggested change
results = pd.DataFrame({"returns": returns}, index=idx[1:])
results["portfolio_value"] = (
self.initial_capital * (1 + results["returns"]).cumprod()
)
results["final_value"] = results["portfolio_value"].iloc[-1]
results["total_return"] = results["final_value"] / self.initial_capital - 1
return results
results_df = pd.DataFrame({"returns": returns}, index=idx[1:])
results_df["portfolio_value"] = (
self.initial_capital * (1 + results_df["returns"]).cumprod()
)
final_value = results_df["portfolio_value"].iloc[-1]
total_return = final_value / self.initial_capital - 1
# Return a dictionary matching the original structure
return {
"returns": results_df["returns"],
"portfolio_value": results_df["portfolio_value"],
"final_value": final_value,
"total_return": total_return,
# Add other keys as needed to match the original structure
}

Copilot uses AI. Check for mistakes.
Comment on lines +34 to 42
all_prices = []
for _ in tqdm(range(symbols), desc="Generating price series"):
prices = generator.generate_price_data(
n_symbols=1, days=days, start_date="2020-01-01"
)
all_prices.append(prices)

prices = pd.concat(all_prices, axis=1)

Copy link

Copilot AI Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generating price data one symbol at a time in a loop defeats the purpose of the vectorized implementation in SyntheticDataGenerator.generate_price_data(). The original method was designed to efficiently generate multiple symbols at once, including correlation structure between them. This change removes the ability to generate correlated price series and introduces unnecessary overhead. Additionally, all generated symbols will have the same name "SYMBOL_00" since n_symbols=1 is used repeatedly, which will cause issues when concatenating.

Suggested change
all_prices = []
for _ in tqdm(range(symbols), desc="Generating price series"):
prices = generator.generate_price_data(
n_symbols=1, days=days, start_date="2020-01-01"
)
all_prices.append(prices)
prices = pd.concat(all_prices, axis=1)
prices = generator.generate_price_data(
n_symbols=symbols, days=days, start_date="2020-01-01"
)

Copilot uses AI. Check for mistakes.
Comment on lines +105 to +106
k: tqdm(v.mean(axis=1), desc=f"Averaging {k} factor")
for k, v in factor_data.items()
Copy link

Copilot AI Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tqdm usage here is incorrect. tqdm is being called on the result of v.mean(axis=1) (a pandas Series), which will not display a progress bar. Instead, it will just return the Series unchanged. To show progress during dictionary comprehension, you should wrap the iterator (e.g., factor_data.items()) with tqdm, not the values being computed.

Suggested change
k: tqdm(v.mean(axis=1), desc=f"Averaging {k} factor")
for k, v in factor_data.items()
k: v.mean(axis=1)
for k, v in tqdm(factor_data.items(), desc="Averaging factors")

Copilot uses AI. Check for mistakes.
Comment on lines +182 to +193
returns = []
idx = self.prices.index

for i in tqdm(range(1, len(idx)), desc="Running backtest"):
ret = self._compute_daily_return(
self.prices.iloc[i - 1],
self.prices.iloc[i],
weight_scheme,
)
returns.append(ret)

results = pd.DataFrame({"returns": returns}, index=idx[1:])
Copy link

Copilot AI Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is attempting to monkey-patch the VectorizedBacktest.run method but references self._compute_daily_return() which does not exist in the VectorizedBacktest class. The original run method uses a vectorized approach with pct_change() instead of iterating through each day. This will cause an AttributeError when the backtest runs.

Suggested change
returns = []
idx = self.prices.index
for i in tqdm(range(1, len(idx)), desc="Running backtest"):
ret = self._compute_daily_return(
self.prices.iloc[i - 1],
self.prices.iloc[i],
weight_scheme,
)
returns.append(ret)
results = pd.DataFrame({"returns": returns}, index=idx[1:])
# Compute daily returns for each symbol
daily_returns = self.prices.pct_change().fillna(0)
# Compute portfolio weights (assume equal weight or rank-based)
if weight_scheme == "rank":
weights = self.signals.rank(axis=1, pct=True)
else:
weights = self.signals
# Align weights and returns
weights = weights.loc[daily_returns.index]
# Compute portfolio returns (weighted average across symbols)
portfolio_returns = (daily_returns * weights).mean(axis=1)
results = pd.DataFrame({"returns": portfolio_returns}, index=self.prices.index)

Copilot uses AI. Check for mistakes.
Comment on lines +181 to 204
def run_with_progress(self, weight_scheme="rank"):
returns = []
idx = self.prices.index

for i in tqdm(range(1, len(idx)), desc="Running backtest"):
ret = self._compute_daily_return(
self.prices.iloc[i - 1],
self.prices.iloc[i],
weight_scheme,
)
returns.append(ret)

results = pd.DataFrame({"returns": returns}, index=idx[1:])
results["portfolio_value"] = (
self.initial_capital * (1 + results["returns"]).cumprod()
)
results["final_value"] = results["portfolio_value"].iloc[-1]
results["total_return"] = results["final_value"] / self.initial_capital - 1

return results

VectorizedBacktest.run = run_with_progress

backtest = VectorizedBacktest(
Copy link

Copilot AI Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Monkey-patching a class method inside a CLI command function is a poor practice that affects global state and can lead to unexpected behavior if the command is called multiple times or in tests. This breaks the original vectorized implementation of VectorizedBacktest.run(). Consider either: (1) creating a subclass of VectorizedBacktest with the desired progress bar implementation, or (2) adding progress bar support directly to the VectorizedBacktest class in its own module.

Suggested change
def run_with_progress(self, weight_scheme="rank"):
returns = []
idx = self.prices.index
for i in tqdm(range(1, len(idx)), desc="Running backtest"):
ret = self._compute_daily_return(
self.prices.iloc[i - 1],
self.prices.iloc[i],
weight_scheme,
)
returns.append(ret)
results = pd.DataFrame({"returns": returns}, index=idx[1:])
results["portfolio_value"] = (
self.initial_capital * (1 + results["returns"]).cumprod()
)
results["final_value"] = results["portfolio_value"].iloc[-1]
results["total_return"] = results["final_value"] / self.initial_capital - 1
return results
VectorizedBacktest.run = run_with_progress
backtest = VectorizedBacktest(
class VectorizedBacktestWithProgress(VectorizedBacktest):
def run(self, weight_scheme="rank"):
returns = []
idx = self.prices.index
for i in tqdm(range(1, len(idx)), desc="Running backtest"):
ret = self._compute_daily_return(
self.prices.iloc[i - 1],
self.prices.iloc[i],
weight_scheme,
)
returns.append(ret)
results = pd.DataFrame({"returns": returns}, index=idx[1:])
results["portfolio_value"] = (
self.initial_capital * (1 + results["returns"]).cumprod()
)
results["final_value"] = results["portfolio_value"].iloc[-1]
results["total_return"] = results["final_value"] / self.initial_capital - 1
return results
backtest = VectorizedBacktestWithProgress(

Copilot uses AI. Check for mistakes.
Comment on lines +182 to +193
returns = []
idx = self.prices.index

for i in tqdm(range(1, len(idx)), desc="Running backtest"):
ret = self._compute_daily_return(
self.prices.iloc[i - 1],
self.prices.iloc[i],
weight_scheme,
)
returns.append(ret)

results = pd.DataFrame({"returns": returns}, index=idx[1:])
Copy link

Copilot AI Nov 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation defeats the purpose of a "vectorized" backtest. The original implementation uses pandas vectorized operations for performance, but this change introduces a Python loop that iterates through each day individually, which will be significantly slower for large datasets. This is a performance regression, not an improvement.

Suggested change
returns = []
idx = self.prices.index
for i in tqdm(range(1, len(idx)), desc="Running backtest"):
ret = self._compute_daily_return(
self.prices.iloc[i - 1],
self.prices.iloc[i],
weight_scheme,
)
returns.append(ret)
results = pd.DataFrame({"returns": returns}, index=idx[1:])
# Vectorized implementation for daily returns
# Calculate daily returns for each symbol
price_returns = self.prices.pct_change().iloc[1:]
# Get weights for each day and symbol
weights = self._get_weights(weight_scheme)
weights = weights.loc[price_returns.index, price_returns.columns]
# Calculate portfolio returns as weighted sum
portfolio_returns = (weights * price_returns).sum(axis=1)
results = pd.DataFrame({"returns": portfolio_returns}, index=price_returns.index)

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature]: Progress bars for data downloads and backtests

1 participant