feat: added loading bars centrally cuz file wise approach resulted in… #34

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

Satvik-Singh192 wants to merge 1 commit into OPCODE-Open-Spring-Fest:main from Satvik-Singh192:feat/new-loading

Contributor

Satvik-Singh192 commented Nov 7, 2025

… pytest failing a lot

✨ Add tqdm-Based Progress Bars for Long-Running Operations

Closes #32

📝 Description

This PR introduces tqdm-based progress bars to improve user experience during long-running operations such as:

Synthetic data generation
Factor computation
Backtest execution

Previously, these steps ran silently, making it difficult for users to estimate how long a process would take. This PR resolves that by adding lightweight progress indicators without altering existing workflows.

✅ Changes Made

1. Synthetic Data Generation

Added a tqdm progress bar that displays generation progress for each symbol.

2. Factor Computation

Added tqdm progress bars for factor-wise averaging operations.

3. Backtest Execution

Wrapped the internal backtest loop with tqdm to visualize iteration progress.

4. Minimal & Non-Intrusive Changes

No refactoring or redesign of core functionality.
Only small patches added where needed to integrate progress bars cleanly.

[x] I have read the contributing guidelines


          feat: added loading bars centrally cuz file wise approach resulted in…

ccf7d8f

… pytest failing a lot

ayushkrtiwari requested a review from Copilot

November 9, 2025 09:29

Copilot AI reviewed

View reviewed changes

Copilot AI left a comment

Pull Request Overview

This PR attempts to add progress bars using tqdm to various operations in the CLI, but introduces multiple critical bugs that break core functionality. The changes include adding tqdm import, modifying data generation to use a loop, attempting to add progress bars to factor computation, and monkey-patching the VectorizedBacktest.run method.

Key changes attempted:

Addition of tqdm progress bars to data generation, factor computation, and backtesting
Modified data generation to generate symbols one at a time instead of in batch
Monkey-patched VectorizedBacktest.run method to show progress during backtesting

Reviewed Changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 6 comments.

File	Description
src/quant_research_starter/cli.py	Adds tqdm import and attempts to integrate progress bars throughout CLI commands, but introduces multiple bugs including broken monkey-patching, incorrect tqdm usage, and performance regressions
.gitignore	Adds myenv/ to ignore virtual environment directory

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/quant_research_starter/cli.py

Comment on lines +193 to +201

    
                      results = pd.DataFrame({"returns": returns}, index=idx[1:])

                      results["portfolio_value"] = (

                          self.initial_capital * (1 + results["returns"]).cumprod()

                      )

                      results["final_value"] = results["portfolio_value"].iloc[-1]

                      results["total_return"] = results["final_value"] / self.initial_capital - 1

                      return results

Copilot AI Nov 9, 2025

The return value structure is inconsistent with the original VectorizedBacktest.run() method. The original returns a dictionary (from _generate_results()) containing multiple keys like 'positions', 'cash', 'trades', etc., but this implementation returns a DataFrame. This breaking change will cause the code at line 213 (metrics_calc = RiskMetrics(results["returns"])) to fail because results is now a DataFrame, not a dictionary.

Suggested change

      
                    results = pd.DataFrame({"returns": returns}, index=idx[1:])
          
                    results["portfolio_value"] = (
          
                        self.initial_capital * (1 + results["returns"]).cumprod()
          
                    )
          
                    results["final_value"] = results["portfolio_value"].iloc[-1]
          
                    results["total_return"] = results["final_value"] / self.initial_capital - 1
          
                    return results
          
                    results_df = pd.DataFrame({"returns": returns}, index=idx[1:])
          
                    results_df["portfolio_value"] = (
          
                        self.initial_capital * (1 + results_df["returns"]).cumprod()
          
                    )
          
                    final_value = results_df["portfolio_value"].iloc[-1]
          
                    total_return = final_value / self.initial_capital - 1
          
                    # Return a dictionary matching the original structure
          
                    return {
          
                        "returns": results_df["returns"],
          
                        "portfolio_value": results_df["portfolio_value"],
          
                        "final_value": final_value,
          
                        "total_return": total_return,
          
                        # Add other keys as needed to match the original structure
          
                    }

Copilot uses AI. Check for mistakes.

src/quant_research_starter/cli.py

Comment on lines +34 to 42

    
                  all_prices = []

                  for _ in tqdm(range(symbols), desc="Generating price series"):

                      prices = generator.generate_price_data(

                          n_symbols=1, days=days, start_date="2020-01-01"

                      )

                      all_prices.append(prices)

                  prices = pd.concat(all_prices, axis=1)

Copilot AI Nov 9, 2025

Generating price data one symbol at a time in a loop defeats the purpose of the vectorized implementation in SyntheticDataGenerator.generate_price_data(). The original method was designed to efficiently generate multiple symbols at once, including correlation structure between them. This change removes the ability to generate correlated price series and introduces unnecessary overhead. Additionally, all generated symbols will have the same name "SYMBOL_00" since n_symbols=1 is used repeatedly, which will cause issues when concatenating.

Suggested change

      
                all_prices = []
          
                for _ in tqdm(range(symbols), desc="Generating price series"):
          
                    prices = generator.generate_price_data(
          
                        n_symbols=1, days=days, start_date="2020-01-01"
          
                    )
          
                    all_prices.append(prices)
          
                prices = pd.concat(all_prices, axis=1)
          
                prices = generator.generate_price_data(
          
                    n_symbols=symbols, days=days, start_date="2020-01-01"
          
                )

Copilot uses AI. Check for mistakes.

src/quant_research_starter/cli.py

Comment on lines +105 to +106

    
                          k: tqdm(v.mean(axis=1), desc=f"Averaging {k} factor")

                          for k, v in factor_data.items()

Copilot AI Nov 9, 2025

The tqdm usage here is incorrect. tqdm is being called on the result of v.mean(axis=1) (a pandas Series), which will not display a progress bar. Instead, it will just return the Series unchanged. To show progress during dictionary comprehension, you should wrap the iterator (e.g., factor_data.items()) with tqdm, not the values being computed.

Suggested change

      
                        k: tqdm(v.mean(axis=1), desc=f"Averaging {k} factor")
          
                        for k, v in factor_data.items()
          
                        k: v.mean(axis=1)
          
                        for k, v in tqdm(factor_data.items(), desc="Averaging factors")

Copilot uses AI. Check for mistakes.

src/quant_research_starter/cli.py

Comment on lines +182 to +193

    
                      returns = []

                      idx = self.prices.index

                      for i in tqdm(range(1, len(idx)), desc="Running backtest"):

                          ret = self._compute_daily_return(

                              self.prices.iloc[i - 1],

                              self.prices.iloc[i],

                              weight_scheme,

                          )

                          returns.append(ret)

                      results = pd.DataFrame({"returns": returns}, index=idx[1:])

Copilot AI Nov 9, 2025

This function is attempting to monkey-patch the VectorizedBacktest.run method but references self._compute_daily_return() which does not exist in the VectorizedBacktest class. The original run method uses a vectorized approach with pct_change() instead of iterating through each day. This will cause an AttributeError when the backtest runs.

Suggested change

      
                    returns = []
          
                    idx = self.prices.index
          
                    for i in tqdm(range(1, len(idx)), desc="Running backtest"):
          
                        ret = self._compute_daily_return(
          
                            self.prices.iloc[i - 1],
          
                            self.prices.iloc[i],
          
                            weight_scheme,
          
                        )
          
                        returns.append(ret)
          
                    results = pd.DataFrame({"returns": returns}, index=idx[1:])
          
                    # Compute daily returns for each symbol
          
                    daily_returns = self.prices.pct_change().fillna(0)
          
                    # Compute portfolio weights (assume equal weight or rank-based)
          
                    if weight_scheme == "rank":
          
                        weights = self.signals.rank(axis=1, pct=True)
          
                    else:
          
                        weights = self.signals
          
                    # Align weights and returns
          
                    weights = weights.loc[daily_returns.index]
          
                    # Compute portfolio returns (weighted average across symbols)
          
                    portfolio_returns = (daily_returns * weights).mean(axis=1)
          
                    results = pd.DataFrame({"returns": portfolio_returns}, index=self.prices.index)

Copilot uses AI. Check for mistakes.

src/quant_research_starter/cli.py

Comment on lines +181 to 204

    
                  def run_with_progress(self, weight_scheme="rank"):

                      returns = []

                      idx = self.prices.index

                      for i in tqdm(range(1, len(idx)), desc="Running backtest"):

                          ret = self._compute_daily_return(

                              self.prices.iloc[i - 1],

                              self.prices.iloc[i],

                              weight_scheme,

                          )

                          returns.append(ret)

                      results = pd.DataFrame({"returns": returns}, index=idx[1:])

                      results["portfolio_value"] = (

                          self.initial_capital * (1 + results["returns"]).cumprod()

                      )

                      results["final_value"] = results["portfolio_value"].iloc[-1]

                      results["total_return"] = results["final_value"] / self.initial_capital - 1

                      return results

                  VectorizedBacktest.run = run_with_progress

                  backtest = VectorizedBacktest(

Copilot AI Nov 9, 2025

Monkey-patching a class method inside a CLI command function is a poor practice that affects global state and can lead to unexpected behavior if the command is called multiple times or in tests. This breaks the original vectorized implementation of VectorizedBacktest.run(). Consider either: (1) creating a subclass of VectorizedBacktest with the desired progress bar implementation, or (2) adding progress bar support directly to the VectorizedBacktest class in its own module.

Suggested change

      
                def run_with_progress(self, weight_scheme="rank"):
          
                    returns = []
          
                    idx = self.prices.index
          
                    for i in tqdm(range(1, len(idx)), desc="Running backtest"):
          
                        ret = self._compute_daily_return(
          
                            self.prices.iloc[i - 1],
          
                            self.prices.iloc[i],
          
                            weight_scheme,
          
                        )
          
                        returns.append(ret)
          
                    results = pd.DataFrame({"returns": returns}, index=idx[1:])
          
                    results["portfolio_value"] = (
          
                        self.initial_capital * (1 + results["returns"]).cumprod()
          
                    )
          
                    results["final_value"] = results["portfolio_value"].iloc[-1]
          
                    results["total_return"] = results["final_value"] / self.initial_capital - 1
          
                    return results
          
                VectorizedBacktest.run = run_with_progress
          
                backtest = VectorizedBacktest(
          
                class VectorizedBacktestWithProgress(VectorizedBacktest):
          
                    def run(self, weight_scheme="rank"):
          
                        returns = []
          
                        idx = self.prices.index
          
                        for i in tqdm(range(1, len(idx)), desc="Running backtest"):
          
                            ret = self._compute_daily_return(
          
                                self.prices.iloc[i - 1],
          
                                self.prices.iloc[i],
          
                                weight_scheme,
          
                            )
          
                            returns.append(ret)
          
                        results = pd.DataFrame({"returns": returns}, index=idx[1:])
          
                        results["portfolio_value"] = (
          
                            self.initial_capital * (1 + results["returns"]).cumprod()
          
                        )
          
                        results["final_value"] = results["portfolio_value"].iloc[-1]
          
                        results["total_return"] = results["final_value"] / self.initial_capital - 1
          
                        return results
          
                backtest = VectorizedBacktestWithProgress(

Copilot uses AI. Check for mistakes.

src/quant_research_starter/cli.py

Comment on lines +182 to +193

    
                      returns = []

                      idx = self.prices.index

                      for i in tqdm(range(1, len(idx)), desc="Running backtest"):

                          ret = self._compute_daily_return(

                              self.prices.iloc[i - 1],

                              self.prices.iloc[i],

                              weight_scheme,

                          )

                          returns.append(ret)

                      results = pd.DataFrame({"returns": returns}, index=idx[1:])

Copilot AI Nov 9, 2025

This implementation defeats the purpose of a "vectorized" backtest. The original implementation uses pandas vectorized operations for performance, but this change introduces a Python loop that iterates through each day individually, which will be significantly slower for large datasets. This is a performance regression, not an improvement.

Suggested change

      
                    returns = []
          
                    idx = self.prices.index
          
                    for i in tqdm(range(1, len(idx)), desc="Running backtest"):
          
                        ret = self._compute_daily_return(
          
                            self.prices.iloc[i - 1],
          
                            self.prices.iloc[i],
          
                            weight_scheme,
          
                        )
          
                        returns.append(ret)
          
                    results = pd.DataFrame({"returns": returns}, index=idx[1:])
          
                    # Vectorized implementation for daily returns
          
                    # Calculate daily returns for each symbol
          
                    price_returns = self.prices.pct_change().iloc[1:]
          
                    # Get weights for each day and symbol
          
                    weights = self._get_weights(weight_scheme)
          
                    weights = weights.loc[price_returns.index, price_returns.columns]
          
                    # Calculate portfolio returns as weighted sum
          
                    portfolio_returns = (weights * price_returns).sum(axis=1)
          
                    results = pd.DataFrame({"returns": portfolio_returns}, index=price_returns.index)

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet