|
8462 | 8462 | "\n", |
8463 | 8463 | "for file,content in test_K562_POLR2A_enhancer_centric_yaml_contents.items():\n", |
8464 | 8464 | " with open(config_paths[18] / file, 'w') as f:\n", |
8465 | | - " f.write(content)\n", |
8466 | | - "\n" |
| 8465 | + " f.write(content)" |
8467 | 8466 | ] |
8468 | 8467 | }, |
8469 | 8468 | { |
|
12054 | 12053 | "cell_type": "markdown", |
12055 | 12054 | "metadata": {}, |
12056 | 12055 | "source": [ |
12057 | | - "## <center> VI) Extending CLASTER to new cell types: K562 (human) <center>\n", |
| 12056 | + "## <center> VIII) Extending CLASTER to new cell types: K562 (human) <center>\n", |
12058 | 12057 | "\n", |
12059 | 12058 | "Reviewer 1 asked us to benchmark our _in silico_ perturbations with experimental data. Most experimental data on genome-wide enhancer KOs is obtained in K562 cells. We did not find nascent transcription data matching our protocol, and hence decided to predict two widespread transcriptional readouts: RNA-seq and POLR2A ChIP-seq.\n", |
12060 | 12059 | "\n", |
@@ -12696,12 +12695,13 @@ |
12696 | 12695 | "> - Samples where either input or output crossed chromosome boundaries (enhancers at the ends of the chromosomes).\n", |
12697 | 12696 | ">- Predict using models trained on K562 (human) data.\n", |
12698 | 12697 | ">- Quantify POLR2A /RNA-seq changes:\n", |
12699 | | - "> - For POLR2A: Integrate between 1 kbp upstream and 2 kbp downstream of all genes in predicted window.\n", |
| 12698 | + "> - For POLR2A: Integrate between 2 kbp upstream and 3 kbp downstream of all genes in predicted window. (-1,2) kbp yielded similar results.\n", |
12700 | 12699 | "> - For RNA-seq: Integrate inside gene boundaries of all genes in predicted window.\n", |
12701 | 12700 | ">- Downstream analyses:\n", |
12702 | 12701 | "> - Precision-Recall and ROC curves for the following models:\n", |
12703 | 12702 | "> - Gene-enhancer distance: $Score = - Distance$\n", |
12704 | 12703 | "> - RNA and POLR2A models: $Score = abs($ Area difference $)$\n", |
| 12704 | + "> - Ratio to max models: area difference divided by max area difference found in predicted window.\n", |
12705 | 12705 | "> - Confusion matrices:\n", |
12706 | 12706 | "> - Primary target (most affected gene in a single prediction run): True / False\n", |
12707 | 12707 | "> - Closest gene: True / False\n", |
|
13451 | 13451 | " integration_type: str,\n", |
13452 | 13452 | " window_size: int = 200500,\n", |
13453 | 13453 | " resolution: int = 1000,\n", |
13454 | | - " upstream_bins: int = 1,\n", |
13455 | | - " downstream_bins: int = 2,\n", |
| 13454 | + " upstream_bins: int = 2,\n", |
| 13455 | + " downstream_bins: int = 3,\n", |
13456 | 13456 | " save_path: Path = None,\n", |
13457 | 13457 | " show_plot: bool = True\n", |
13458 | 13458 | ") -> plt.Figure:\n", |
|
14726 | 14726 | " ax1.set_xlim(0,200)\n", |
14727 | 14727 | " fig.show()\n", |
14728 | 14728 | "\n", |
14729 | | - " threshold_polr2a = 10\n", |
14730 | | - " merged_crispr_df = merged_crispr_df[(merged_crispr_df['baseline_area_polr2a'] > threshold_polr2a)]\n", |
| 14729 | + " #threshold_polr2a = 10\n", |
| 14730 | + " #merged_crispr_df = merged_crispr_df[(merged_crispr_df['baseline_area_polr2a'] > threshold_polr2a)]\n", |
14731 | 14731 | " \n", |
14732 | 14732 | " # 1. Plot correlation between methods\n", |
14733 | 14733 | " print(\"- Generating correlation scatter plot...\")\n", |
|
14813 | 14813 | "source": [ |
14814 | 14814 | "**Plotting ground truth Enhancer-Gene pairs**\n", |
14815 | 14815 | "\n", |
14816 | | - "Here we will plot ground truth Enhancer-Gene pairs in K562 cells, obtained by CRISPR KO of enhancers and measuring the induced gene expression changes. This data was downloaded from the [Engreitz lab's github](https://github.com/EngreitzLab/CRISPR_comparison/tree/main/resources/crispr_data), referenced as a benchmarking dataset in [A. Gschwind et al.](https://doi.org/10.1101/2023.11.09.563812)." |
| 14816 | + "Here we will plot ground truth Enhancer-Gene pairs in K562 cells, obtained by CRISPR KO of enhancers and measuring the induced gene expression changes. This data was downloaded from the [Engreitz lab's github](https://github.com/EngreitzLab/CRISPR_comparison/tree/main/resources/crispr_data), referenced as a benchmarking dataset in [Gschwind et al.](https://doi.org/10.1101/2023.11.09.563812)." |
14817 | 14817 | ] |
14818 | 14818 | }, |
14819 | 14819 | { |
|
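Note on the quantification and scoring changes in this diff: the window integration and the three scores listed in the benchmark cell can be sketched as below. This is a minimal sketch under stated assumptions, not the notebook's actual implementation. The function names (`integrate_window`, `distance_score`, `area_score`, `ratio_to_max_score`) are hypothetical. The window is assumed to be anchored on a single reference bin per gene (for example the TSS), which the diff does not specify. Strand orientation is ignored, and the ratio-to-max score is assumed to use absolute area differences.

```python
import numpy as np


def integrate_window(track, anchor_bin, upstream_bins=2, downstream_bins=3):
    """Sum a binned signal over [anchor - upstream_bins, anchor + downstream_bins).

    Assumption: one bin is 1 kbp (resolution=1000 in the diff), so the defaults
    cover 2 kbp upstream to 3 kbp downstream of the anchor bin.
    """
    track = np.asarray(track, dtype=float)
    lo = max(anchor_bin - upstream_bins, 0)             # clip at window start
    hi = min(anchor_bin + downstream_bins, len(track))  # clip at window end
    return float(track[lo:hi].sum())


def distance_score(distances):
    """Gene-enhancer distance baseline: Score = -Distance."""
    return -np.asarray(distances, dtype=float)


def area_score(baseline_areas, perturbed_areas):
    """RNA / POLR2A models: Score = abs(area difference)."""
    diff = np.asarray(perturbed_areas, dtype=float) - np.asarray(baseline_areas, dtype=float)
    return np.abs(diff)


def ratio_to_max_score(baseline_areas, perturbed_areas):
    """Area difference divided by the largest area difference in the same window.

    Assumption: absolute differences are used; returns zeros if no gene changes.
    """
    abs_diff = area_score(baseline_areas, perturbed_areas)
    max_diff = abs_diff.max() if abs_diff.size else 0.0
    if max_diff == 0.0:
        return np.zeros_like(abs_diff)
    return abs_diff / max_diff


# Toy example: one predicted window with three genes (values are made up).
baseline = np.array([10.0, 4.0, 6.0])
perturbed = np.array([8.0, 5.0, 10.0])
ratios = ratio_to_max_score(baseline, perturbed)
```

With the previous defaults (`upstream_bins=1`, `downstream_bins=2`) the same helper would cover the (-1, 2) kbp window that the updated notebook text reports as giving similar results.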