01-edu · Oumaimafisaoui · Oct 5, 2025 · Oct 4, 2025 · Oct 5, 2025
diff --git a/subjects/ai/backtesting-sp500/README.md b/subjects/ai/backtesting-sp500/README.md
@@ -36,7 +36,6 @@ The input files are:
   data.
 
   The adjusted close price may be unavailable for three main reasons:
-
   - The company doesn't exist at date `d`
   - The company is not publicly traded
   - Its close price hasn't been reported
@@ -68,7 +67,6 @@ There are four parts:
 #### 2. Data wrangling and preprocessing
 
 - Create a Jupyter Notebook to analyze the data sets and perform EDA (Exploratory Data Analysis). This notebook should contain at least:
-
   - Missing values analysis
   - Outliers analysis (there are a lot of outliers)
   - Visualize and analyze the average price for companies over time or compare the price consistency across different companies within the dataset. Save the plot as an image.
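
The missing-values and outliers analyses listed above could start from something like the following sketch. It assumes a wide DataFrame with one column per company, and it uses a median/MAD robust z-score computed separately for each column. The function names, the wide layout, and the threshold of 5 are illustrative assumptions, not part of the subject.

```python
import numpy as np
import pandas as pd


def missing_share(values: pd.DataFrame) -> pd.Series:
    """Fraction of missing entries per column, largest first."""
    return values.isna().mean().sort_values(ascending=False)


def flag_outliers(values: pd.DataFrame, threshold: float = 5.0) -> pd.DataFrame:
    """Flag entries whose robust z-score exceeds `threshold` (hypothetical default).

    The median and MAD are computed per column, so each company is
    compared only against its own history.
    """
    median = values.median()
    mad = (values - median).abs().median()
    # A zero MAD (constant series) would divide by zero; treat it as "no scale".
    mad = mad.replace(0, np.nan)
    robust_z = (values - median) / (1.4826 * mad)
    # Comparisons with NaN evaluate to False, so missing values are never flagged.
    return robust_z.abs() > threshold
```

On trending price levels this flags too much, so it is probably more useful on returns than on raw prices.
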
@@ -77,11 +75,9 @@ There are four parts:
 _Note: create functions that generate the plots and save them in the `images` directory. Add a parameter `plot` with a default value of `False`; with the default, the function does not return the plot. This lets people run your code during the correction without overwriting your plots._
 
 - Here is how the `prices` data should be preprocessed:
-
   - Resample data on month and keep the last value
   - Filter price outliers: remove prices outside the range $0.1 to $10,000
   - Compute monthly returns:
-
     - Historical returns. **returns(current month) = (price(current month) - price(previous month)) / price(previous month)**
     - Future returns. **returns(current month) = (price(next month) - price(current month)) / price(current month)**
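
A minimal sketch of the three preprocessing steps above, assuming a wide DataFrame of daily prices with a `DatetimeIndex` and one column per company. The function name `preprocess_prices` and the choice to mask out-of-range prices as `NaN` (rather than dropping rows) are assumptions made for illustration.

```python
import pandas as pd


def preprocess_prices(daily: pd.DataFrame, low: float = 0.1, high: float = 10_000.0):
    """Return (monthly prices, historical returns, future returns)."""
    # Group by calendar month and keep the last non-missing price of each month.
    monthly = daily.groupby(daily.index.to_period("M")).last()
    # Assumption: prices outside [low, high] are masked as NaN.
    monthly = monthly.where((monthly >= low) & (monthly <= high))
    # Historical: (price(t) - price(t-1)) / price(t-1)
    historical = monthly / monthly.shift(1) - 1
    # Future: (price(t+1) - price(t)) / price(t)
    future = monthly.shift(-1) / monthly - 1
    return monthly, historical, future
```

With these definitions, the future return at month `t` equals the historical return at month `t+1`, which is a convenient sanity check on real data.
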
 
@@ -102,7 +98,6 @@ At this stage the DataFrame should look like this:
 - Print `prices.isna().sum()`
 
 - Here is how the `sp500.csv` data should be preprocessed:
-
   - Resample data on month and keep the last value
   - Compute historical monthly returns on the adjusted close
 
@@ -183,47 +178,38 @@ project
 ### Tips:
 
 1. Data Quality Management:
-
    - Be prepared to encounter messy data. Financial datasets often contain errors, outliers, and missing values.
    - Develop a systematic approach to identify and handle data quality issues.
 
 2. Memory Optimization:
-
    - When working with large datasets, optimize memory usage by selecting appropriate data types for each column.
    - Consider using smaller data types like np.float32 for floating-point numbers when precision allows.
 
 3. Exploratory Data Analysis:
-
    - Spend time understanding the data through visualization and statistical analysis before diving into strategy development.
    - Pay special attention to outliers and their potential impact on your strategy.
 
 4. Preprocessing Financial Data:
-
    - When resampling time series data, be mindful of which value to keep (e.g., last value for month-end prices).
    - Calculate both historical and future returns to avoid look-ahead bias in your strategy.
 
 5. Handling Outliers:
-
    - Develop a method to identify and handle outliers that is specific to each company's historical data.
    - Be cautious about removing outliers during periods of high market volatility (e.g., 2008-2009 financial crisis).
 
 6. Signal Creation:
-
    - Start with a simple signal (like past 12-month average returns) before exploring more complex strategies.
    - Ensure your signal doesn't use future information that wouldn't have been available at the time of decision.
 
 7. Backtesting:
-
    - Implement your backtesting logic without using loops for better performance.
    - Compare your strategy's performance against a relevant benchmark (in this case, the S&P 500).
 
 8. Visualization:
-
    - Create clear, informative visualizations to communicate your strategy's performance.
    - Include cumulative return plots to show how your strategy performs over time compared to the benchmark.
 
 9. Code Structure:
-
    - Organize your code into modular functions for better readability and reusability.
    - Use a main script to orchestrate the entire process from data loading to results visualization.
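
Tips 6 and 7 can be combined in a loop-free sketch like the one below. It assumes two wide DataFrames of monthly historical and future returns (months as rows, companies as columns). The function name, the equal weighting, and the `top_n` default are illustrative assumptions, not the subject's required strategy.

```python
import numpy as np
import pandas as pd


def momentum_backtest(historical: pd.DataFrame, future: pd.DataFrame,
                      window: int = 12, top_n: int = 20) -> pd.Series:
    """Monthly returns of an equal-weighted top-`top_n` momentum portfolio."""
    # The signal at month t uses only returns realised up to and including t.
    signal = historical.rolling(window).mean()
    # Rank companies within each month; rank 1 is the strongest signal.
    ranks = signal.rank(axis=1, ascending=False, method="first")
    selected = ranks <= top_n
    # Equal weights across selected companies; months with no selection get NaN.
    counts = selected.sum(axis=1).replace(0, np.nan)
    weights = selected.astype(float).div(counts, axis=0)
    # The return earned over the next month is the future return at month t.
    # Note: a selected company with a missing future return is skipped by the sum.
    return (weights * future).sum(axis=1, min_count=1)
```

The benchmark comparison can then be done by aligning this series with the S&P 500 monthly returns on the same index.
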
 
@@ -232,3 +218,22 @@ project
     - Be prepared to explain any anomalies or unexpected results in your strategy's performance.
 
 Remember, the goal is not just to create a strategy that looks good on paper, but to develop a robust process for analyzing financial data and testing investment ideas.
+
+### Resources
+
+- **Python & Data Analysis**
+  - [pandas Documentation](https://pandas.pydata.org/docs/) – handling time series, resampling, returns.
+  - [NumPy Documentation](https://numpy.org/doc/) – vectorized operations and memory optimization.
+  - [Matplotlib Documentation](https://matplotlib.org/stable/index.html) – plotting cumulative returns and EDA visuals.
+
+- **Finance & Backtesting**
+  - [Investopedia – Backtesting](https://www.investopedia.com/terms/b/backtesting.asp) – introduction to strategy testing.
+  - [Corporate Finance Institute – Backtesting](https://corporatefinanceinstitute.com/resources/data-science/backtesting/#:~:text=Backtesting%20involves%20applying%20a%20strategy,employ%20and%20tweak%20successful%20strategies.) – practical overview of backtesting logic.
+  - [S&P 500 Index (Wikipedia)](https://en.wikipedia.org/wiki/S%26P_500) – background on the index and its historical changes.
+
+- **Data Cleaning & Outliers**
+  - [Handling Missing Data in Pandas](https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html).
+
+- **Quantitative Strategies**
+  - [Momentum Investing (Investopedia)](https://www.investopedia.com/terms/m/momentum_investing.asp) – theory behind using past returns as a signal.
+  - [Risk-Adjusted Return (Investopedia)](https://www.investopedia.com/terms/r/riskadjustedreturn.asp) – understanding risk-adjusted performance.