Team TBG Lab
Task 3

- Load each WSI and divide the useful regions (given by the tissue mask) into 560x560 patches.
- Extract patch-level features and coordinates using Virchow2, a foundation model for histopathology images.
- Cluster the patches with agglomerative clustering and create a single node per cluster from the mean of all patch-level features in that cluster.
- Build a graph for each patient's WSI, adding an unweighted edge whenever the distance between the central coordinates of two nodes is <= 3000 pixels (see the first sketch after this list).
- Extract graph-level embeddings using a custom GCN designed by us (a generic GCN sketch follows below).
- Select the top 2000 genes ranked by deviation between BRS 1/2 and BRS 3.
- Concatenate the GCN embeddings, RNA-seq data, and one-hot encoded data to form the final embedding.
- Finally, train an attention MLP, Random Survival Forest, CoxPH, Survival SVM, and Survival Gradient Boosted Trees, and take a weighted ensemble of the risk scores (see the ensemble sketch below).
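The cluster-pooling and graph-construction steps can be sketched as follows. This is a minimal illustration, assuming the Virchow2 patch features and coordinates are already extracted; the function name, the choice to cluster on patch coordinates, and the number of clusters are assumptions, not the exact pipeline code.

```python
# Sketch: pool patch features into cluster nodes and connect nodes whose
# centroids lie within 3000 pixels. Names and parameters are illustrative.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from scipy.spatial.distance import pdist, squareform


def build_wsi_graph(features: np.ndarray, coords: np.ndarray,
                    n_clusters: int = 50, edge_threshold: float = 3000.0):
    """features: (num_patches, feat_dim) Virchow2 embeddings,
    coords: (num_patches, 2) patch centre coordinates in pixels."""
    # Clustering is shown on coordinates here; the actual pipeline may cluster differently.
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(coords)

    # One node per cluster: mean feature vector and mean (central) coordinate.
    node_feats = np.stack([features[labels == c].mean(axis=0)
                           for c in range(n_clusters)])
    node_coords = np.stack([coords[labels == c].mean(axis=0)
                            for c in range(n_clusters)])

    # Unweighted edge between any two nodes whose centres are <= threshold apart.
    dist = squareform(pdist(node_coords))
    src, dst = np.where((dist <= edge_threshold) & (dist > 0))
    edge_index = np.stack([src, dst])  # shape (2, num_edges), both directions included
    return node_feats, edge_index
```

Clustering on coordinates groups spatially adjacent patches; the 3000-pixel threshold then connects neighbouring cluster centroids into a sparse, unweighted graph.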
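The graph-level embedding step could then look roughly like the generic GCN below (using torch_geometric). This is not the team's custom architecture; the layer sizes and mean-pooling readout are assumptions for illustration only.

```python
# Sketch: a generic two-layer GCN that turns a cluster-node graph into a single
# graph-level embedding. Illustrative only, not the custom architecture used here.
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class WSIGCN(torch.nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int = 256, out_dim: int = 128):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hidden_dim)
        self.conv2 = GCNConv(hidden_dim, out_dim)

    def forward(self, x, edge_index, batch):
        # x: (num_nodes, in_dim) cluster-node features,
        # edge_index: (2, num_edges) as a torch.long tensor,
        # batch: graph id per node (all zeros for a single WSI graph).
        x = F.relu(self.conv1(x, edge_index))
        x = self.conv2(x, edge_index)
        return global_mean_pool(x, batch)  # one embedding per graph
```

For a single WSI graph, `batch` is simply a zero tensor of length `num_nodes`, and `edge_index` can be built from the array returned by `build_wsi_graph` above, converted to `torch.long`.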
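The weighted ensemble of risk scores might be assembled as in the sketch below, using scikit-survival estimators for the Random Survival Forest, CoxPH, Survival SVM, and gradient-boosted models (the attention MLP is omitted for brevity). The weights, hyperparameters, and score normalisation are placeholders, not the tuned values.

```python
# Sketch: weighted ensemble of risk scores from several survival models.
# Model choices mirror the pipeline list above; the weights are placeholders.
import numpy as np
from sksurv.ensemble import RandomSurvivalForest, GradientBoostingSurvivalAnalysis
from sksurv.linear_model import CoxPHSurvivalAnalysis
from sksurv.svm import FastSurvivalSVM
from sksurv.metrics import concordance_index_censored


def ensemble_risk(X_train, y_train, X_test, weights=(0.3, 0.3, 0.2, 0.2)):
    """y_train is a structured array with event/time fields, as used by sksurv."""
    models = [
        RandomSurvivalForest(n_estimators=200, random_state=42),
        GradientBoostingSurvivalAnalysis(random_state=42),
        CoxPHSurvivalAnalysis(alpha=0.1),
        FastSurvivalSVM(random_state=42),
    ]
    risks = []
    for model in models:
        model.fit(X_train, y_train)
        # With these estimators (ranking objective for the SVM), predict() returns
        # a risk score where higher values indicate higher risk.
        risks.append(model.predict(X_test))
    risks = np.asarray(risks)

    # Normalise each model's scores before mixing so the weights are comparable.
    risks = (risks - risks.mean(axis=1, keepdims=True)) / (risks.std(axis=1, keepdims=True) + 1e-8)
    return np.average(risks, axis=0, weights=weights)


# Evaluation with the concordance index (field names depend on how y was built):
# c_index = concordance_index_censored(y_test["event"], y_test["time"], risk_scores)[0]
```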
Follow the steps below to download the dataset.
Ensure that you have the AWS CLI installed on your system. You can find installation instructions for various platforms (Windows, macOS, Linux) in the link above.
Once AWS CLI is installed, use the following command to sync the dataset from the S3 bucket:
aws s3 sync --no-sign-request s3://chimera-challenge/v2/task3 .
This command will download the dataset to the current directory. Make sure you are in the desired directory where you want the data to be stored.
Note: The --no-sign-request flag ensures you can access the dataset without AWS credentials.
To set up the environment, make sure you have the following dependencies:
- Python: Version 3.10
- CUDA: Version 12.6 (ensure that your GPU supports this version)
Once the prerequisites are met, follow these steps to set up your environment:
- Create a virtual environment:
python -m venv .venv
- Activate the virtual environment:
  - On macOS/Linux:
source .venv/bin/activate
  - On Windows:
.venv\Scripts\activate
- Install the required dependencies:
pip install -r requirements.txt
To visualize the dataset, you can use the provided Jupyter notebook. Open a terminal or command prompt, and run the following command to start the Jupyter notebook:
jupyter notebook notebooks/data_visualization.ipynb
Survival analysis optimization report, using 5-fold cross-validation and Optuna-based hyperparameter tuning (a sketch of the tuning loop follows the recommendations below).
Overall results:
(1) Mean C-index: 0.8211 ± 0.0344
(2) Best C-index: 0.8668
(3) Improvement over baseline: +0.1123
(4) Optimization trials per model: 100
C-index by seed:
(1) Seed 42: 0.8441
(2) Seed 121: 0.8243
(3) Seed 144: 0.7661
(4) Seed 245: 0.8668
(5) Seed 1212: 0.8044
Key findings:
(1) Best performing models: Hyperparameter optimization improved the mean C-index by +0.1123 over the baseline.
(2) Ensemble benefits: The optimized weighted ensemble showed consistent improvements over the individual models.
(3) Parameter insights: Systematic hyperparameter tuning revealed the optimal configuration for each model.
Recommendations:
(1) Use the optimized hyperparameters for production models.
(2) Consider the ensemble approach for the best performance.
(3) Monitor model stability across different seeds.
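For reference, here is a hedged sketch of what an Optuna tuning loop with 5-fold cross-validation and a C-index objective could look like for one of the models (a Random Survival Forest); the search space, fold setup, and survival-label field names are assumptions, not the exact configuration used.

```python
# Sketch: Optuna tuning of a Random Survival Forest, scored by 5-fold CV C-index.
# Search space and fold setup are illustrative only.
import numpy as np
import optuna
from sklearn.model_selection import KFold
from sksurv.ensemble import RandomSurvivalForest
from sksurv.metrics import concordance_index_censored


def objective(trial, X, y, seed=42):
    # y is a sksurv structured array; field names ("event", "time") are assumed here.
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 15),
        "min_samples_leaf": trial.suggest_int("min_samples_leaf", 3, 20),
    }
    scores = []
    for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=seed).split(X):
        model = RandomSurvivalForest(random_state=seed, **params)
        model.fit(X[train_idx], y[train_idx])
        risk = model.predict(X[val_idx])
        scores.append(concordance_index_censored(
            y[val_idx]["event"], y[val_idx]["time"], risk)[0])
    return float(np.mean(scores))


# Usage (100 trials per model, as reported above):
# study = optuna.create_study(direction="maximize")
# study.optimize(lambda t: objective(t, X, y), n_trials=100)
```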
We were unable to submit our model for the competition due to an error in our Docker implementation. However, we will evaluate the model on the hidden test set once it becomes publicly available.
Developers: Madhav Arora, Sumit Kumar, Dhairya Gupta