Merged
33 changes: 19 additions & 14 deletions subjects/ai/backtesting-sp500/README.md
@@ -36,7 +36,6 @@ The input files are:
data.

The adjusted close price may be unavailable for three main reasons:

- The company doesn't exist at date `d`
- The company is not publicly traded
- Its close price hasn't been reported
@@ -68,7 +67,6 @@ There are four parts:
#### 2. Data wrangling and preprocessing

- Create a Jupyter Notebook to analyze the data sets and perform EDA (Exploratory Data Analysis). This notebook should contain at least:

- Missing values analysis
- Outliers analysis (there are a lot of outliers)
- Visualize and analyze the average price for companies over time or compare the price consistency across different companies within the dataset. Save the plot as an image.
@@ -77,11 +75,9 @@ There are four parts:
_Note: create functions that generate the plots and save them in the `images` directory. Add a parameter `plot` with a default value of `False`; with the default, the function does not return the plot. This is useful during correction, since it lets people run your code without overwriting your plots._

- Here is how the `prices` data should be preprocessed:

- Resample data on month and keep the last value
- Filter price outliers: remove prices outside the range [$0.1, $10,000]
- Compute monthly returns:

- Historical returns. **returns(current month) = (price(current month) - price(previous month)) / price(previous month)**
- Future returns. **returns(current month) = (price(next month) - price(current month)) / price(current month)**
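
The three preprocessing steps above can be sketched as follows. This is a minimal illustration, not the reference solution: it assumes the prices are already loaded into a wide DataFrame (one row per day, one column per ticker), and the function name `preprocess_prices` and the tickers `AAA` and `BBB` are hypothetical.

```python
import pandas as pd


def preprocess_prices(daily, low=0.1, high=10_000.0):
    """Return (monthly prices, historical returns, future returns).

    Assumption: `daily` has a DatetimeIndex and one column per ticker.
    """
    # Step 1: keep the last available price of each calendar month.
    monthly = daily.groupby(daily.index.to_period("M")).last()
    # Step 2: prices outside [low, high] are treated as outliers and set to NaN.
    monthly = monthly.where((monthly >= low) & (monthly <= high))
    # Step 3: returns, written to mirror the two formulas above.
    past = monthly / monthly.shift(1) - 1      # (current - previous) / previous
    future = monthly.shift(-1) / monthly - 1   # (next - current) / current
    return monthly, past, future


# Tiny made-up data set: two tickers, three months.
dates = pd.to_datetime(["2020-01-15", "2020-01-31", "2020-02-28", "2020-03-31"])
daily = pd.DataFrame(
    {"AAA": [9.0, 10.0, 11.0, 12.1], "BBB": [50.0, 20000.0, 100.0, 110.0]},
    index=dates,
)
monthly, past, future = preprocess_prices(daily)
print(past)
print(future)
```

Grouping by calendar month is used here instead of `resample` only to keep the sketch independent of the pandas version; both keep the last value of each month. Note that the January price of `BBB` is filtered out, so the returns that depend on it are missing as well.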

@@ -102,7 +98,6 @@ At this stage the DataFrame should look like this:
- Print `prices.isna().sum()`

- Here is how the `sp500.csv` data should be preprocessed:

- Resample data on month and keep the last value
- Compute historical monthly returns on the adjusted close
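
A possible sketch of the same two steps for the index data. The column name `Adjusted Close`, the function name `preprocess_sp500`, and the sample values are assumptions made for illustration; adapt them to the actual layout of `sp500.csv`.

```python
import pandas as pd


def preprocess_sp500(sp500, price_col="Adjusted Close"):
    """Monthly last value plus historical monthly return on `price_col`.

    Assumption: `sp500` has a DatetimeIndex and a column named `price_col`.
    """
    # Keep the last row of each calendar month.
    monthly = sp500.groupby(sp500.index.to_period("M")).last()
    # Historical return: (current - previous) / previous.
    monthly["monthly_past_return"] = (
        monthly[price_col] / monthly[price_col].shift(1) - 1
    )
    return monthly


# Made-up index levels for two months.
dates = pd.to_datetime(["2020-01-02", "2020-01-31", "2020-02-14", "2020-02-28"])
sp500 = pd.DataFrame({"Adjusted Close": [3200.0, 3000.0, 3100.0, 3300.0]}, index=dates)
sp500_monthly = preprocess_sp500(sp500)
print(sp500_monthly)
```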

@@ -183,47 +178,38 @@ project
### Tips:

1. Data Quality Management:

- Be prepared to encounter messy data. Financial datasets often contain errors, outliers, and missing values.
- Develop a systematic approach to identify and handle data quality issues.

2. Memory Optimization:

- When working with large datasets, optimize memory usage by selecting appropriate data types for each column.
- Consider using smaller data types like `np.float32` for floating-point numbers when precision allows.
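
As a rough illustration of the memory tip, the sketch below downcasts only the `float64` columns and compares memory usage before and after. The helper name `downcast_floats` and the sample columns are hypothetical.

```python
import numpy as np
import pandas as pd


def downcast_floats(df):
    """Return a copy of `df` with float64 columns stored as float32."""
    float_cols = df.select_dtypes(include="float64").columns
    return df.astype({col: np.float32 for col in float_cols})


# Made-up frame: one float column and one integer column.
prices = pd.DataFrame(
    {"price": np.linspace(1.0, 100.0, 1000), "volume": np.arange(1000, dtype="int64")}
)
before = prices.memory_usage(deep=True).sum()
small = downcast_floats(prices)
after = small.memory_usage(deep=True).sum()
print(f"before: {before} bytes, after: {after} bytes")
```

`float32` keeps roughly seven significant digits, which is usually enough for prices, but check that it is also enough for the returns you compute from them.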

3. Exploratory Data Analysis:

- Spend time understanding the data through visualization and statistical analysis before diving into strategy development.
- Pay special attention to outliers and their potential impact on your strategy.

4. Preprocessing Financial Data:

- When resampling time series data, be mindful of which value to keep (e.g., last value for month-end prices).
- Calculate both historical and future returns to avoid look-ahead bias in your strategy.

5. Handling Outliers:

- Develop a method to identify and handle outliers that is specific to each company's historical data.
- Be cautious about removing outliers during periods of high market volatility (e.g., 2008-2009 financial crisis).
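
One way to make the detection company-specific is a robust z-score computed from each company's own median and median absolute deviation (MAD). This is only one possible technique, chosen here for illustration; the threshold of 5, the `protect` mask, and the function name `flag_outliers` are assumptions, not part of the subject.

```python
import numpy as np
import pandas as pd


def flag_outliers(returns, threshold=5.0, protect=None):
    """Boolean frame, True where a return is an outlier for its own ticker.

    Assumption: `returns` is wide (dates x tickers). `protect` is an optional
    boolean mask over the rows that should never be flagged, for example a
    period of known market stress.
    """
    median = returns.median()
    # A MAD of 0 would divide by zero; treat it as missing instead.
    mad = (returns - median).abs().median().replace(0, np.nan)
    robust_z = (returns - median) / (1.4826 * mad)
    flags = robust_z.abs() > threshold
    if protect is not None:
        flags.loc[protect] = False
    return flags


# Made-up monthly returns with one data error and one crisis-period drop.
idx = pd.date_range("2007-01-01", periods=36, freq="MS")
values = np.tile([0.01, -0.02, 0.03, 0.0], 9).astype(float)
values[3] = 5.0     # 2007-04: implausible jump, likely a data error
values[21] = -0.4   # 2008-10: large but plausible crisis move
returns = pd.DataFrame({"AAA": values}, index=idx)

crisis = (idx >= "2008-01-01") & (idx <= "2009-12-31")
flags_all = flag_outliers(returns)
flags = flag_outliers(returns, protect=crisis)
print(returns[flags["AAA"]])
```

Without the mask both points are flagged; with it, only the 2007 jump is, and the crisis-period move is kept.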

6. Signal Creation:

- Start with a simple signal (like past 12-month average returns) before exploring more complex strategies.
- Ensure your signal doesn't use future information that wouldn't have been available at the time of decision.
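
A minimal sketch of the simple signal mentioned above, assuming a wide DataFrame of historical monthly returns (months x tickers). The function name `average_return_signal` and the sample data are hypothetical. The rolling window ends at the current month, so each value depends only on returns known at that date.

```python
import numpy as np
import pandas as pd


def average_return_signal(past_returns, window=12):
    """Average of the last `window` historical monthly returns, per ticker."""
    return past_returns.rolling(window).mean()


# Made-up returns: 0%, 1%, 2%, ... over 14 months.
months = pd.period_range("2020-01", periods=14, freq="M")
past_returns = pd.DataFrame({"AAA": 0.01 * np.arange(14)}, index=months)
signal = average_return_signal(past_returns)

# Look-ahead check: changing the last month must not change earlier signals.
changed = past_returns.copy()
changed.iloc[13, 0] = 9.9
signal_changed = average_return_signal(changed)
print(signal.tail(3))
```

The first 11 months have no signal because a full 12-month window is not yet available.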

7. Backtesting:

- Implement your backtesting logic without using loops for better performance.
- Compare your strategy's performance against a relevant benchmark (in this case, the S&P 500).
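
The loop-free idea can be sketched like this. The selection rule (equal-weighted top `n` tickers by signal each month), the function name `backtest_top_n`, and the numbers are assumptions for illustration; substitute the rule your strategy actually uses.

```python
import numpy as np
import pandas as pd


def backtest_top_n(signal, future_returns, n):
    """Monthly strategy return: mean future return of the n best-ranked tickers.

    Assumption: both frames are wide (months x tickers) and aligned.
    """
    # Rank tickers within each month; rank 1 is the highest signal.
    ranks = signal.rank(axis=1, ascending=False, method="first")
    selected = ranks <= n  # tickers with a missing signal are never selected
    return future_returns.where(selected).mean(axis=1)


months = pd.period_range("2020-01", periods=3, freq="M")
tickers = ["A", "B", "C", "D"]
signal = pd.DataFrame(
    [[0.1, 0.3, 0.2, 0.0], [0.5, 0.1, 0.2, 0.4], [np.nan, 0.2, 0.1, 0.3]],
    index=months, columns=tickers,
)
future_returns = pd.DataFrame(
    [[0.01, 0.02, 0.04, -0.01], [0.10, 0.00, 0.00, -0.02], [0.50, 0.02, 0.03, 0.06]],
    index=months, columns=tickers,
)
benchmark = pd.Series([0.01, 0.02, 0.03], index=months)  # made-up S&P 500 returns

strategy = backtest_top_n(signal, future_returns, n=2)
excess = strategy - benchmark
print(pd.DataFrame({"strategy": strategy, "benchmark": benchmark, "excess": excess}))
```

The signal of month `t` is paired with the future return of month `t`, which is the return earned after the decision is made.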

8. Visualization:

- Create clear, informative visualizations to communicate your strategy's performance.
- Include cumulative return plots to show how your strategy performs over time compared to the benchmark.
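
A cumulative return series can be derived from monthly returns as in the sketch below. Compounding the returns is an assumption made here; if your PnL is defined as a sum of dollar gains, use a cumulative sum instead. The function names and the numbers are hypothetical.

```python
import pandas as pd


def cumulative_returns(monthly_returns):
    """Compound monthly returns into a cumulative return per column."""
    return (1 + monthly_returns).cumprod() - 1


def save_cumulative_plot(cumulative, path):
    """Save a line plot of the cumulative returns to `path`."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, ax = plt.subplots(figsize=(8, 4))
    cumulative.plot(ax=ax)
    ax.set_xlabel("Month")
    ax.set_ylabel("Cumulative return")
    ax.set_title("Strategy vs. benchmark")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)


months = pd.period_range("2020-01", periods=3, freq="M")
monthly_returns = pd.DataFrame(
    {"strategy": [0.10, -0.05, 0.02], "sp500": [0.01, 0.01, 0.01]}, index=months
)
cumulative = cumulative_returns(monthly_returns)
print(cumulative)
```

Calling `save_cumulative_plot(cumulative, "images/cumulative_returns.png")` would write the figure, provided the `images` directory exists; the file name is only an example.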

9. Code Structure:

- Organize your code into modular functions for better readability and reusability.
- Use a main script to orchestrate the entire process from data loading to results visualization.

@@ -232,3 +218,22 @@ project
- Be prepared to explain any anomalies or unexpected results in your strategy's performance.

Remember, the goal is not just to create a strategy that looks good on paper, but to develop a robust process for analyzing financial data and testing investment ideas.

### Resources

- **Python & Data Analysis**
- [pandas Documentation](https://pandas.pydata.org/docs/) – handling time series, resampling, returns.
- [NumPy Documentation](https://numpy.org/doc/) – vectorized operations and memory optimization.
- [Matplotlib Documentation](https://matplotlib.org/stable/index.html) – plotting cumulative returns and EDA visuals.

- **Finance & Backtesting**
- [Investopedia – Backtesting](https://www.investopedia.com/terms/b/backtesting.asp) – introduction to strategy testing.
- [Corporate Finance Institute – Backtesting](https://corporatefinanceinstitute.com/resources/data-science/backtesting/) – practical overview of backtesting logic.
- [S&P 500 Index (Wikipedia)](https://en.wikipedia.org/wiki/S%26P_500) – background on the index and its historical changes.

- **Data Cleaning & Outliers**
- [Handling Missing Data in Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html).

- **Quantitative Strategies**
- [Momentum Investing (Investopedia)](https://www.investopedia.com/terms/m/momentum_investing.asp) – theory behind using past returns as a signal.
- [Risk-Adjusted Return (Investopedia)](https://www.investopedia.com/terms/r/riskadjustedreturn.asp) – risk-adjusted performance understanding.