Commit f2a9f11 (parent 222fee9): Updated Code Explanation

1 file changed: CodeExplanation.md (29 additions, 47 deletions)
#### Access Complete Dataset used in the Study: [Here](https://www.kaggle.com/datasets/isiddharth/spatio-temporal-data-of-moon-rise-in-raw-and-tif)
#### Access Latest Updates [Here](https://github.com/iSiddharth20/Generative-AI-Based-Spatio-Temporal-Fusion)

## Introduction
This project develops a novel approach to enhancing video resolution both spatially and temporally using generative AI techniques. By leveraging Auto-Encoders and LSTM networks, it interpolates high-temporal-resolution grayscale frames and colorizes them by learning from a corresponding set of RGB images, ultimately achieving high-fidelity video super-resolution.

## Research Objective
The main goals of the project are:
- To learn temporal dependencies among spatially-sparse, temporally-dense greyscale frames in order to predict and interpolate new frames, thereby increasing temporal resolution.
- To learn spatial dependencies from spatially-dense, temporally-sparse sequences that pair greyscale frames with corresponding RGB frames in order to generate colorized versions of the greyscale frames, thereby enhancing spatial resolution.

## Code Structure Overview
The codebase is organized into several Python modules, each serving a distinct purpose in the project pipeline. Here's a broad overview of the file structure and functionality:

```
├── Code/
│   ├── data.py               # Dataset preparation and data loader definitions
│   ├── main.py               # Orchestrator for initializing and training models
│   ├── training.py           # Defines the Trainer class for model training
│   ├── autoencoder_model.py  # Contains the AutoEncoder architecture
│   ├── lstm_model.py         # Defines the LSTM architecture for frame interpolation
│   └── losses.py             # Custom loss functions utilized in training
```

- **[data.py](https://github.com/iSiddharth20/Generative-AI-Based-Spatio-Temporal-Fusion/blob/main/Code/data.py)**: The starting point of the data pipeline. The `CustomDataset` class inherits from `torch.utils.data.Dataset` and implements methods for data preparation, including `__getitem__` for lazy loading. It uses PIL for image manipulation and `torchvision.transforms` to resize images and convert them to PyTorch tensors. Be sure to specify the correct data directory paths and acceptable image extensions to avoid loading issues.

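As a rough illustration of the filename pairing and extension filtering such a dataset relies on (the directory layout, extension set, and helper below are assumptions for the sketch, not the project's actual API):

```python
from pathlib import Path

VALID_EXTS = {".tif", ".tiff"}  # assumed set of acceptable extensions

def list_image_pairs(grayscale_dir, rgb_dir):
    """Pair grayscale and RGB frames that share a filename (illustrative helper)."""
    gray = {p.name: p for p in Path(grayscale_dir).iterdir()
            if p.suffix.lower() in VALID_EXTS}
    rgb = {p.name: p for p in Path(rgb_dir).iterdir()
           if p.suffix.lower() in VALID_EXTS}
    common = sorted(gray.keys() & rgb.keys())  # sorted names keep temporal order
    unmatched = sorted((gray.keys() | rgb.keys()) - set(common))
    return [(gray[name], rgb[name]) for name in common], unmatched
```

Returning the unmatched names alongside the pairs makes missing-file problems visible early, before they surface as loading errors mid-training.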
- **[main.py](https://github.com/iSiddharth20/Generative-AI-Based-Spatio-Temporal-Fusion/blob/main/Code/main.py)**: The entry point for model execution, this script uses PyTorch's distributed computing features when training on multiple GPUs. The `main_worker` function distributes the GPU workload among parallel processes and invokes the `main` function, which instantiates the models, sets up the training data, initializes the loss functions, and loops through the training configurations, one for each combination of loss functions.

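The loop over loss-function combinations can be pictured roughly like this (the class names come from `losses.py`, but the specific pairings below are an assumed illustration, not the script's literal configuration):

```python
from itertools import product

# Assumed pairings; main.py's actual configurations may differ.
autoencoder_losses = ["LossMSE", "LossMEP"]
lstm_losses = ["SSIMLoss", "LossMEP"]

training_configs = [
    {"autoencoder_loss": ae, "lstm_loss": seq}
    for ae, seq in product(autoencoder_losses, lstm_losses)
]
# One Trainer pair would then be built and run per configuration.
```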
- **[training.py](https://github.com/iSiddharth20/Generative-AI-Based-Spatio-Temporal-Fusion/blob/main/Code/training.py)**: The `Trainer` class defined here manages the training loops. Pay particular attention to `train_autoencoder` and `train_lstm`, each tailored to its respective model. Both use PyTorch's automatic differentiation for gradient computation (`backward()`) and apply optimizer steps (`step()`) to update the model weights. The code supports distributed training through `DistributedDataParallel`, so device assignments must be managed correctly to avoid device-misalignment issues.

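Stripped of the PyTorch specifics, the epoch structure the two training functions share can be sketched like this (the callables and return values are simplified stand-ins, not the `Trainer`'s real interface):

```python
def train(epochs, step_fn, validate_fn):
    """Skeleton of an epoch loop: train, validate, remember the best state.

    step_fn(epoch) runs one training epoch and returns its loss;
    validate_fn(epoch) returns the validation loss. In the real Trainer,
    the body of step_fn is where forward passes, backward(), and
    optimizer.step() happen, and the best-model branch calls save_model.
    """
    best_val, best_epoch = float("inf"), -1
    history = []
    for epoch in range(epochs):
        train_loss = step_fn(epoch)
        val_loss = validate_fn(epoch)
        history.append((train_loss, val_loss))
        if val_loss < best_val:  # keep the checkpoint with the best validation loss
            best_val, best_epoch = val_loss, epoch
    return best_val, best_epoch, history
```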
- **[autoencoder_model.py](https://github.com/iSiddharth20/Generative-AI-Based-Spatio-Temporal-Fusion/blob/main/Code/autoencoder_model.py)**: Contains the model definition of the AutoEncoder, `Grey2RGBAutoEncoder`, which uses a typical encoder-decoder structure: a series of `nn.Conv2d` and `nn.ConvTranspose2d` layers paired with batch normalization and activation functions. The final sigmoid activation in the decoder guarantees that the output image's pixel values lie between 0 and 1.

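To make the encoder-decoder symmetry concrete, here is a toy helper that pairs consecutive channel counts into layer specs (the channel widths are invented for the example; the model's real layer builder also interleaves normalization and activation modules between convolutions):

```python
def layer_spec(channels, transpose=False):
    """Pair consecutive channel counts into conv-layer specs (illustrative only)."""
    op = "ConvTranspose2d" if transpose else "Conv2d"
    return [(op, c_in, c_out) for c_in, c_out in zip(channels, channels[1:])]

# A grayscale-to-RGB autoencoder maps 1 input channel down to 3 output channels;
# the intermediate widths here are hypothetical.
encoder = layer_spec([1, 64, 128])
decoder = layer_spec([128, 64, 3], transpose=True)
```

Mirroring the channel list between encoder and decoder is what lets the decoder reconstruct a 3-channel image at the input's spatial resolution.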
- **[lstm_model.py](https://github.com/iSiddharth20/Generative-AI-Based-Spatio-Temporal-Fusion/blob/main/Code/lstm_model.py)**: The `ConvLSTM` and `ConvLSTMCell` classes implement a convolutional LSTM network capable of handling spatio-temporal data. `ConvLSTMCell` performs gated operations using convolutional layers, while `ConvLSTM` manages temporal sequences and predicts intermediate frames. Anyone extending this functionality should have a firm grasp of sequence processing and recurrent neural network principles.

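For reference, the gated operations inside a convolutional LSTM cell follow the standard ConvLSTM formulation, where $*$ denotes convolution and $\circ$ the Hadamard product; the exact gate layout in `lstm_model.py` may differ in detail:

$$
\begin{aligned}
i_t &= \sigma(W_{xi} * x_t + W_{hi} * h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} * x_t + W_{hf} * h_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} * x_t + W_{ho} * h_{t-1} + b_o) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh(W_{xc} * x_t + W_{hc} * h_{t-1} + b_c) \\
h_t &= o_t \circ \tanh(c_t)
\end{aligned}
$$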
- **[losses.py](https://github.com/iSiddharth20/Generative-AI-Based-Spatio-Temporal-Fusion/blob/main/Code/losses.py)**: Loss functions are defined as classes inheriting from `nn.Module`. `LossMSE` and `SSIMLoss` are standard, while `LossMEP` introduces a custom composite loss with a maximum-entropy regularization term. The novelty lies in the balancing act performed by an `alpha` parameter, which controls the trade-off between fidelity (MSE) and diversity (entropy). This is a key area for experimenting with loss-function formulation and its effect on training dynamics.

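A plausible shape for that composite loss, written over plain Python lists for clarity (the sign convention and `alpha` weighting here are an assumption for illustration; the authoritative definition is in `losses.py`):

```python
import math

def loss_mep(pred, target, alpha=0.5):
    """Composite loss sketch: alpha-weighted MSE minus an entropy bonus.

    pred and target are flat lists of values in (0, 1). Subtracting the
    entropy term rewards more diverse (higher-entropy) predictions, in the
    spirit of the Maximum Entropy Principle; the real LossMEP may combine
    the terms differently.
    """
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / n
    entropy = -sum(p * math.log(p + 1e-12) for p in pred) / n
    return alpha * mse - (1 - alpha) * entropy
```

With `alpha = 1` this reduces to plain MSE; lowering `alpha` increasingly rewards higher-entropy outputs.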
### Navigating the Code for Development
To effectively navigate and contribute to the codebase, it's recommended to:
1. Begin with `main.py` to understand the orchestration logic and how the different modules fit into the broader workflow.
2. Delve into `data.py` to understand the dataset structure expected by the training routines and how data augmentation is achieved through transformations.
3. Explore the model definitions (`autoencoder_model.py` and `lstm_model.py`) to comprehend the network architectures or to modify them for experimental purposes.
4. Study `training.py` to grasp the training loops and mechanisms used for the two types of models. Enhancements to training procedures, optimization, or logging belong here.
5. Assess and potentially refine the loss functions (`losses.py`) to improve model performance or to implement novel training strategies.

## Contributions Welcome!
Your interest in contributing to the project is greatly appreciated. Insights, code improvements, and innovative ideas are all welcome. Make sure to check the [Contributing Guidelines](https://github.com/iSiddharth20/Generative-AI-Based-Spatio-Temporal-Fusion/blob/main/CONTRIBUTING.md) for more information on how you can become an integral part of this project.

## Acknowledgements
A heartfelt thank you to all contributors and supporters who are on this journey to break new ground in video super-resolution technology.