Predicting the market capitalization growth of listed companies is a complex time-series regression task. This project aims to predict future company valuation targets using historical fundamental financial indicators, addressing challenges like missing data, outliers, and temporal dependencies.
- Modular Pipeline: Clean separation of concerns (Data -> Features -> Models) for scalability.
- Config-Driven: All hyperparameters and paths controlled via `config/config.yaml`.
- Deep Learning Architectures:
- MLP: Baseline feed-forward network with embeddings.
- LSTM: Captures temporal trends in financial history.
- Encoder-Decoder: Advanced sequence-to-vector modeling.
- Robust Preprocessing:
- KNN Imputation for missing values.
- RobustScaler for handling financial outliers.
- PCA for dimensionality reduction.
- Expanding Window Validation: Realistic backtesting that retrains models on all available history for each test year.
- Automated Logging: Timestamped, model-specific logs saved to `logs/` for full auditability.
- Results Tracking: Automated storage of validation metrics (RMSE) to `metrics.json`.
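As a minimal sketch of the robust preprocessing steps listed above, using scikit-learn (the toy data and parameter values here are illustrative, not the project's configured settings):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import KNNImputer
from sklearn.preprocessing import RobustScaler

# Toy fundamentals matrix: 4 company-year rows, 2 indicators, one missing value
X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [3.0, 180.0],
              [100.0, 210.0]])  # 100.0 plays the role of a financial outlier

X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)   # fill NaNs from nearest rows
X_scaled = RobustScaler().fit_transform(X_imputed)       # median/IQR scaling resists outliers
X_reduced = PCA(n_components=1).fit_transform(X_scaled)  # project onto the top component
```

RobustScaler centers on the median and scales by the interquartile range, so a single extreme row distorts the scaling far less than with standard z-scoring.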
For a deeper dive into the project background and technical implementation, please refer to:
- Problem Statement PDF: Original challenge description and requirements.
- Presentation Deck (PPT): Project overview slideshow.
- Implementation Details: Comprehensive technical guide explaining the "How" and "Why" of every pipeline step (Data Cleaning, Engineering, PCA, Model Architectures).
FidelFolio_Project/
├── config/
│ └── config.yaml # Hyperparameters & settings
├── data/
│ └── FidelFolio_Dataset.csv # [REQUIRED] Place your dataset here
├── experiments/ # Original Jupyter Notebooks
├── src/ # Source code
│ ├── data/ # Loading & Preprocessing
│ ├── features/ # Feature Engineering & Sequences
│ ├── models/ # MLP, LSTM, Encoder-Decoder Architectures
│ └── utils/ # Utilities
├── main.py # CLI Entry Point
├── pyproject.toml # Build configuration
└── setup.py # Setup script
- Install Dependencies:
  pip install -e .
- Data Setup: Place your `FidelFolio_Dataset.csv` file into the `data/` directory.
Run the training pipeline using main.py:
# Run the pipeline (uses config/config.yaml by default)
python main.py
# Run with a specific configuration file (e.g. for testing)
python main.py --config config/test_config.yaml

To switch models (MLP / LSTM / Encoder-Decoder), edit `model_type` in `config/config.yaml`.
Modify config/config.yaml to adjust:
- preprocessing: Imputation neighbors, outlier capping thresholds, PCA parameters.
- models: Layer sizes, dropout rates, embedding dimensions.
- training: Epochs, batch size, learning rate.
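A hypothetical sketch of what `config/config.yaml` might look like — apart from `model_type`, the key names and values below are illustrative assumptions, so check the shipped file for the actual schema:

```yaml
# Illustrative config sketch — key names are assumptions, not the project's actual schema
model_type: lstm            # mlp | lstm | encoder_decoder
preprocessing:
  knn_neighbors: 5          # neighbors used for KNN imputation
  outlier_cap_quantile: 0.99
  pca:
    enabled: true
    n_components: 10
models:
  dropout: 0.2
  embedding_dim: 8
training:
  epochs: 50
  batch_size: 64
  learning_rate: 0.001
```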
A traditional feed-forward network that flattens time-series data into a single vector, combined with learned company embeddings.
graph TD
subgraph Inputs
A["Sequence Input (Time x Feats)"] --> B[Flatten]
C["Company ID"] --> D[Embedding]
D --> E[Flatten Embedding]
end
B --> F[Concatenate]
E --> F
subgraph MLP_Layers
F --> G["Dense Layer 1 (ReLU)"]
G --> H[Dropout]
H --> I["Dense Layer 2 (ReLU)"]
I --> J[Dropout]
end
J --> K["Output (Regression)"]
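Assuming a TensorFlow/Keras implementation (the layer sizes, dropout rate, and embedding dimension below are illustrative defaults, not the project's configured values), the MLP diagram above could be sketched as:

```python
from tensorflow.keras import Model, layers

def build_mlp(n_timesteps=5, n_feats=10, n_companies=100, emb_dim=8):
    """Flatten the (Time x Feats) window, concatenate a learned company
    embedding, and regress through two dense+dropout blocks."""
    seq_in = layers.Input(shape=(n_timesteps, n_feats), name="sequence")
    id_in = layers.Input(shape=(1,), name="company_id")
    x = layers.Flatten()(seq_in)                                   # Time x Feats -> vector
    e = layers.Flatten()(layers.Embedding(n_companies, emb_dim)(id_in))
    h = layers.Concatenate()([x, e])
    h = layers.Dropout(0.2)(layers.Dense(64, activation="relu")(h))
    h = layers.Dropout(0.2)(layers.Dense(32, activation="relu")(h))
    return Model([seq_in, id_in], layers.Dense(1)(h))              # regression head
```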
A Recurrent Neural Network (RNN) designed to capture temporal dependencies in financial data.
graph TD
subgraph Inputs
A["Sequence Input"] --> B[Masking]
C["Company ID"] --> D[Embedding]
D --> E[Flatten Embedding]
end
subgraph LSTM_Stack
B --> F["LSTM Layer 1 (return_seq=True)"]
F --> G[Dropout]
G --> H["LSTM Layer 2 (return_seq=False)"]
H --> I[Dropout]
end
I --> J[Concatenate]
E --> J
subgraph Prediction
J --> K["Dense Layer (ReLU)"]
K --> L[Dropout]
L --> M["Output"]
end
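A matching Keras sketch of the stacked-LSTM design above (hyperparameters are again illustrative, not the project's configured values):

```python
from tensorflow.keras import Model, layers

def build_lstm(n_timesteps=5, n_feats=10, n_companies=100, emb_dim=8):
    """Two stacked LSTMs over the masked sequence; the final hidden state is
    fused with the company embedding before the dense prediction head."""
    seq_in = layers.Input(shape=(n_timesteps, n_feats), name="sequence")
    id_in = layers.Input(shape=(1,), name="company_id")
    x = layers.Masking(mask_value=0.0)(seq_in)                     # skip padded timesteps
    x = layers.Dropout(0.2)(layers.LSTM(64, return_sequences=True)(x))
    x = layers.Dropout(0.2)(layers.LSTM(32)(x))                    # last hidden state only
    e = layers.Flatten()(layers.Embedding(n_companies, emb_dim)(id_in))
    h = layers.Concatenate()([x, e])
    h = layers.Dropout(0.2)(layers.Dense(32, activation="relu")(h))
    return Model([seq_in, id_in], layers.Dense(1)(h))
```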
Uses an LSTM as an encoder to compress the time-series context into a hidden state, which is then passed to a Dense decoder for prediction.
graph TD
subgraph Encoder
A["Sequence Input"] --> B[Masking]
B --> C["LSTM Encoder"]
C -- "Extract Context (State H)" --> D[Context Vector]
end
subgraph Context_Fusion
E["Company ID"] --> F[Embedding]
F --> G[Flatten]
D --> H[Concatenate]
G --> H
end
subgraph Decoder
H --> I["Dense Decoder (ReLU)"]
I --> J[Dropout]
J --> K["Output"]
end
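The encoder-decoder variant could be sketched in Keras as follows — the key difference from the stacked LSTM is that only the encoder's final hidden state (state H) is extracted as the context vector (sizes are illustrative assumptions):

```python
from tensorflow.keras import Model, layers

def build_encoder_decoder(n_timesteps=5, n_feats=10, n_companies=100, emb_dim=8):
    """LSTM encoder compresses the sequence into its hidden state; a dense
    decoder predicts from that context fused with the company embedding."""
    seq_in = layers.Input(shape=(n_timesteps, n_feats), name="sequence")
    id_in = layers.Input(shape=(1,), name="company_id")
    x = layers.Masking(mask_value=0.0)(seq_in)
    _, state_h, _ = layers.LSTM(64, return_state=True)(x)          # context vector = state H
    e = layers.Flatten()(layers.Embedding(n_companies, emb_dim)(id_in))
    h = layers.Concatenate()([state_h, e])
    h = layers.Dropout(0.2)(layers.Dense(32, activation="relu")(h))  # dense decoder
    return Model([seq_in, id_in], layers.Dense(1)(h))
```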
graph TD
A["Start: main.py"] --> B{"Load Config"}
B --> C["Load Config.yaml"]
C --> D["Load Data"]
D --> E["Data Cleaning"]
E --> F["Feature Engineering: YoY Diffs"]
F --> G["Preprocessing (Impute, Scale, Cap)"]
G --> H{"PCA Enabled?"}
H -- Yes --> I["Apply PCA"]
H -- No --> J["Skip PCA"]
I --> K["Encode Company IDs"]
J --> K
K --> L["Start Loop (Expanding Window)"]
L --> M["Generate Sequences"]
M --> N["Train Model (MLP/LSTM/Enc-Dec)"]
N --> O["Predict & Evaluate"]
O --> P{"Next Year?"}
P -- Yes --> L
P -- No --> Q["Finish"]
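The expanding-window loop at the heart of the flow above can be sketched in plain Python (the year values and the minimum-history threshold are illustrative):

```python
def expanding_window_splits(years, min_train_years=3):
    """Yield (train_years, test_year) pairs: each test year is predicted by a
    model retrained on all years that precede it, so the training window
    expands by one year per iteration."""
    for i in range(min_train_years, len(years)):
        yield years[:i], years[i]

for train_years, test_year in expanding_window_splits([2015, 2016, 2017, 2018, 2019]):
    print(f"train on {train_years}, test on {test_year}")
```

This mirrors realistic backtesting: no test year's data ever leaks into its own training set.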