Commit e8afe61

🎨 🔥 ✨ Revisions
- Add IV_Revisions.ipynb: new features, improved perturbation scheme and performance evaluation.
- Add prom-CHiC preprocessing scripts
- Update Readme.md
1 parent: fc042ec


11 files changed: 16347 additions & 64 deletions

.gitignore

Lines changed: 15 additions & 2 deletions
@@ -5,16 +5,29 @@ configurations/conf_microc_rotated_pure_conv_tutorial
 figures
 GEO_files
 inputs/DNA_sequences
+inputs/landscape_arrays/K562/
 inputs/landscape_arrays/training/
+inputs/landscape_arrays/training_20kbp_context/
+inputs/landscape_arrays/training_no_H3K27ac/
+inputs/landscape_arrays/test_20kbp_context/
+inputs/landscape_arrays/test_chr19/
+inputs/landscape_arrays/test_no_H3K27ac/
 inputs/microC
+inputs/Promoter-CHiC/
 inputs/microC_rotated/training/
 inputs/perturbed_landscape_arrays/
+results_CRISPR/
 runs
 runs_tutorial
 scripts/*.py
 scripts/*.sh
-targets/perturbed_targets.csv
-targets/test_targets_Enformer.csv
+tables/
+targets/K562/
+targets/Enhancer_centered_targets.csv
+targets/*perturbed_targets.csv
+targets/test_targets_*.csv
+targets/*1kbp*
 targets/training_targets_Enformer.csv
 training_targets.csv
+HiC-Pro-3.1.0
 *.log

Readme.md

Lines changed: 5 additions & 4 deletions
@@ -26,7 +26,7 @@ The folder contains the test set inputs for both data modalities, i.e. samples e
 ### `scripts`
 
 - [`0_Tutorial.ipynb`](https://github.com/RasmussenLab/CLASTER/blob/master/scripts/0_Tutorial.ipynb): The notebook provides a rapid overview of the most important steps in CLASTER's pipeline, including training and validating the network using the EIR framework.
-- `1_Data_obtention.ipynb`: This notebook guides the user through the data obtention process, including:
+- `I_Data_obtention.ipynb`: This notebook guides the user through the data obtention process, including:
 - Data download from publicly available repositories:
 - Inputs: Chromatin landscape (ATAC-seq, H3K4me3, H3K27ac and H3K27me3 in mESCs) and structure (Micro-C maps in mESCs)
 - Outputs: Nascent transcription profiles (EU-seq).
@@ -35,15 +35,16 @@ The folder contains the test set inputs for both data modalities, i.e. samples e
 - Data filtering and preprocessing:
 - Obtain numpy arrays for the inputs.
 - Obtain csv files for the targets.
-- `2_Run_CLASTER.ipynb`: This notebook creates the configuration files required to train and test CLASTER using the EIR framework.
-- `2b_Run_HyenaDNA_and_Enformer.ipynb`: The notebook contains our adaptations of the code building
+- `II_Run_CLASTER.ipynb`: This notebook creates the configuration files required to train and test CLASTER using the EIR framework.
+- `IIb_Run_HyenaDNA_and_Enformer.ipynb`: The notebook contains our adaptations of the code building
 - Hyena-DNA (https://github.com/HazyResearch/hyena-dna) in its public colab version.
 - Enformer (https://github.com/lucidrains/enformer-pytorch) in its python implementation.
 These were used to benchmark CLASTER. It includes:
 - The obtention of sequence embeddings from both model's backbones when loading the pretrained weights.
 - The addition of a model head on top of the embeddings to match our regression outputs.
 - Code to fine-tune Hyena-DNA's backbone and the added head together.
-- `3_Data_analysis.ipynb`: The notebook contains the functions used to perform the data analysis and create the figures included in the manuscript.
+- `III_Data_analysis.ipynb`: The notebook contains the functions used to perform the data analysis and create the figures included in the manuscript.
+- `IV_Revisions.ipynb`: Code and analyses during the revisions.
 
 ### `targets`
 

index.rst

Lines changed: 5 additions & 4 deletions
@@ -17,7 +17,8 @@ Modeling nascent RNA transcription from chromatin landscape and structure
 :maxdepth: 2
 :caption: Scripts:
 
-scripts/1_Data_obtention.ipynb
-scripts/2_Run_CLASTER.ipynb
-scripts/2b_Run_HyenaDNA_and_Enformer.ipynb
-scripts/3_Data_analysis.ipynb
+scripts/I_Data_obtention.ipynb
+scripts/II_Run_CLASTER.ipynb
+scripts/IIb_Run_HyenaDNA_and_Enformer.ipynb
+scripts/III_Data_analysis.ipynb
+scripts/IV_Revisions.ipynb

scripts/0_Tutorial.ipynb

Lines changed: 2 additions & 7 deletions
@@ -1658,7 +1658,7 @@
 "formats": "ipynb,py:percent"
 },
 "kernelspec": {
-"display_name": "Python 3.11.9 ('claster_env_tutorial')",
+"display_name": "Python 3",
 "language": "python",
 "name": "python3"
 },
@@ -1672,12 +1672,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.undefined.undefined"
-},
-"vscode": {
-"interpreter": {
-"hash": "12187645b0fe86aeeeb7c7a27e94503590520c2ee4730b6cfd81a67885c82988"
-}
+"version": "3.11.5"
 }
 },
 "nbformat": 4,

Lines changed: 6 additions & 27 deletions
@@ -21,7 +21,7 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"### Packages"
+"### Packages and Functions"
 ]
 },
 {
@@ -34,6 +34,7 @@
 },
 "outputs": [],
 "source": [
+"# %%writefile claster_utils.py\n",
 "import os\n",
 "import numpy as np\n",
 "import matplotlib as mpl\n",
@@ -54,26 +55,9 @@
 "import matplotlib.cm as cm\n",
 "from scipy.ndimage import rotate\n",
 "\n",
-"plt.rcParams['font.family'] = 'Nimbus Roman'"
-]
-},
-{
-"cell_type": "markdown",
-"metadata": {},
-"source": [
-"### Functions"
-]
-},
-{
-"cell_type": "code",
-"execution_count": 69,
-"metadata": {
-"tags": [
-"hide-input"
-]
-},
-"outputs": [],
-"source": [
+"plt.rcParams['font.family'] = 'Nimbus Roman'\n",
+"\n",
+"################################# Functions #####################################\n",
 "def read_gene_positions(csv_path, resolution=1):\n",
 " \"\"\" \n",
 " This function reads a csv containing genes and enhancers (entities), and creates a dictionary with the relative coordinates of\n",
@@ -2557,7 +2541,7 @@
 "formats": "ipynb,py:percent"
 },
 "kernelspec": {
-"display_name": "Python 3.11.8",
+"display_name": "Python 3",
 "language": "python",
 "name": "python3"
 },
@@ -2572,11 +2556,6 @@
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.11.5"
-},
-"vscode": {
-"interpreter": {
-"hash": "c305a8ed7bd3fb5b79b1e9049e998ba6b84af3f1b497d7bcc87b4717f669b9d3"
-}
 }
 },
 "nbformat": 4,

Lines changed: 1 addition & 6 deletions
@@ -5843,7 +5843,7 @@
 "formats": "ipynb,py:percent"
 },
 "kernelspec": {
-"display_name": "Python 3.11.5",
+"display_name": "Python 3",
 "language": "python",
 "name": "python3"
 },
@@ -5858,11 +5858,6 @@
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.11.5"
-},
-"vscode": {
-"interpreter": {
-"hash": "c305a8ed7bd3fb5b79b1e9049e998ba6b84af3f1b497d7bcc87b4717f669b9d3"
-}
 }
 },
 "nbformat": 4,
File renamed without changes.
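The three notebook diffs above apply the same metadata cleanup by hand. Each one drops the editor-specific `vscode` interpreter hash and sets `kernelspec.display_name` to the generic `"Python 3"`. In `0_Tutorial.ipynb`, the diff also replaces the malformed `language_info.version` (`3.undefined.undefined`) with `3.11.5`. The following is a minimal sketch of one way to apply that cleanup programmatically. It is an assumption, not part of the CLASTER repository: the helper names `normalize_notebook_metadata` and `normalize_notebook_file` are hypothetical, and the sketch uses only the standard-library `json` module rather than `nbformat`.

```python
import copy
import json
from pathlib import Path
from typing import Optional


def normalize_notebook_metadata(nb: dict, display_name: str = "Python 3",
                                python_version: Optional[str] = None) -> dict:
    """Return a cleaned copy of a parsed .ipynb document (hypothetical helper).

    - removes the editor-specific metadata["vscode"] block,
    - sets metadata["kernelspec"]["display_name"] to a generic name,
    - optionally overwrites metadata["language_info"]["version"].
    The input dict is left unmodified.
    """
    cleaned = copy.deepcopy(nb)
    meta = cleaned.setdefault("metadata", {})
    # Drop the VS Code interpreter hash, which is machine-specific.
    meta.pop("vscode", None)
    # Use a generic kernel display name instead of an environment-specific one.
    meta.setdefault("kernelspec", {})["display_name"] = display_name
    # Only touch the recorded Python version when explicitly asked to.
    if python_version is not None and "language_info" in meta:
        meta["language_info"]["version"] = python_version
    return cleaned


def normalize_notebook_file(path: Path, **kwargs) -> None:
    """Rewrite a notebook file in place with normalized metadata (hypothetical helper)."""
    nb = json.loads(path.read_text(encoding="utf-8"))
    cleaned = normalize_notebook_metadata(nb, **kwargs)
    # indent=1 plus a trailing newline approximates the usual .ipynb layout.
    path.write_text(json.dumps(cleaned, indent=1, ensure_ascii=False) + "\n",
                    encoding="utf-8")


# Example usage (assumed layout: notebooks under scripts/):
# for nb_path in sorted(Path("scripts").glob("*.ipynb")):
#     normalize_notebook_file(nb_path, python_version="3.11.5")
```

`json.loads` keeps keys in file order, so a rewritten notebook should differ from the original mainly in the intended metadata fields. Whitespace details may still differ from what Jupyter itself writes on the next save.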
