
Commit 93749bd

authored
Merge pull request #157 from NYU-RTS/ood
OOD Tutorials
2 parents 6744a7c + d0cb294 commit 93749bd

40 files changed with 347 additions and 107 deletions.

docs/hpc/09_ood/01_ood_intro.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ Under the **clusters** menu you can select the **Greene Shell Access** option to
1818

1919
![img](./static/open_ondemand_gif.gif)
2020

21-
Please see our documentation on [Submitting Jobs](http://localhost:3000/rts-docs-dev/docs/hpc/submitting_jobs/slurm_submitting_jobs/) if you'd like detailed instructions.
21+
Please see our documentation on [Submitting Jobs](../05_submitting_jobs/01_slurm_submitting_jobs.md) if you'd like detailed instructions.
2222

2323
**Interactive Applications**
2424

@@ -63,4 +63,4 @@ Just click on the `Session ID` link and a tab will open with the contents of the
6363

6464
#### From terminal
6565

66-
If your session is no longer visible from within OOD you may still be able to find your logs via the terminal. Simply [log into Greene](https://sites.google.com/nyu.edu/nyu-hpc/accessing-hpc) and `cd` to `/home/$USER/ondemand/data/sys/dashboard/batch_connect/sys/` and then `cd` into the directory for the app that you're interested in. You should find the file `output.log` there.
66+
If your session is no longer visible from within OOD you may still be able to find your logs via the terminal. Simply [log into Greene](../02_connecting_to_hpc/01_connecting_to_hpc.mdx) and `cd` to `/home/$USER/ondemand/data/sys/dashboard/batch_connect/sys/` and then `cd` into the directory for the app that you're interested in. You should find the file `output.log` there.
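If you have run many sessions, a short Python sketch can list every `output.log` under that directory, newest first. It assumes the layout described above (`/home/$USER/ondemand/data/sys/dashboard/batch_connect/sys/<app>/...`). Because the exact nesting below each app directory may vary, it searches recursively. The function name `find_session_logs` is hypothetical.

```python
import os
from pathlib import Path


def find_session_logs(base=None):
    """Return output.log paths under the OOD batch_connect directory, newest first."""
    if base is None:
        # Assumed default location, matching the path given in the text above
        base = (Path("/home") / os.environ.get("USER", "")
                / "ondemand/data/sys/dashboard/batch_connect/sys")
    base = Path(base)
    if not base.is_dir():
        return []
    # Search every app and session subdirectory for output.log files
    logs = base.rglob("output.log")
    # Sort by modification time so the most recent session comes first
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)


if __name__ == "__main__":
    for log in find_session_logs():
        print(log)
```

Run it from a terminal on Greene (for example `python find_logs.py`, a hypothetical file name) and open the first path it prints to see the log of your latest session.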

docs/hpc/09_ood/02_CellACDC.mdx

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
# Cell-ACDC in OOD
2+
3+
[Cell-ACDC](https://cell-acdc.readthedocs.io) is a GUI-based Python framework for segmentation, tracking, cell cycle annotations and quantification of microscopy data.
4+
5+
## Getting Started
6+
You can run Cell-ACDC in OOD by going to the URL [ood.hpc.nyu.edu](http://ood.hpc.nyu.edu) in your browser and selecting `Cell-ACDC` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps, they'll show up on your home screen under the `Recently Used Apps` header.
7+
8+
:::note
9+
Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your Cell-ACDC session differently, you'll need to select it from the menu.
10+
:::
11+
12+
## Configuration
13+
14+
You can select the number of cores, amount of memory, and number of hours.
15+
16+
![OOD Cell-ACDC Configuration](./static/ood_cellacdc_config.png)
17+
18+
## Cell-ACDC running in OOD
19+
20+
After you hit the `Launch` button you'll have to wait for the scheduler to find node(s) for you to run on:
21+
![OOD Cell-ACDC in queue](./static/ood_cellacdc_in_queue.png)
22+
23+
Then you'll have a short wait for Cell-ACDC itself to start up.<br />
24+
Once that happens you'll get one last page that will give you links to:
25+
- open a terminal window on the compute node your Cell-ACDC session is running on
26+
- go to the directory associated with your Session ID that stores output, config and other related files for your session
27+
- make changes to compression and image quality
28+
- get a link that you can share that will allow others to view your Cell-ACDC session
29+
30+
![Pre-launch Cell-ACDC OOD](./static/ood_cellacdc_prelaunch.png)
31+
32+
Please click the `Launch Cell-ACDC` button and a Cell-ACDC window will open.

docs/hpc/09_ood/03_Dask.mdx

Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
1+
# Dask in Jupyter Notebook in OOD
2+
3+
[Dask](https://docs.dask.org/en/stable/) is a Python library for parallel and distributed computing.
4+
5+
## Getting Started
6+
You can run Dask in a Jupyter Notebook in OOD by going to the URL [ood.hpc.nyu.edu](http://ood.hpc.nyu.edu) in your browser and selecting `DS-GA.1004 - Jupyter Dask` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps, they'll show up on your home screen under the `Recently Used Apps` header.
7+
8+
:::note
9+
Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your Dask session differently, you'll need to select it from the menu.
10+
:::
11+
12+
## Configuration
13+
14+
You can select the Dask version, number of cores, amount of memory, root directory, number of hours, and optional Slurm options.
15+
16+
![OOD Dask Configuration](./static/ood_dask_config.png)
17+
18+
:::warning
19+
If you choose `/home` as your root directory, be careful not to exceed your quota. You can check your current usage with the `myquota` command. Please see our [Storage documentation](../03_storage/01_intro_and_data_management.mdx) for details about your storage options.
20+
:::
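Dask's `memory_limit` applies to each worker, not to the cluster as a whole, so the settings passed to `Client` should divide up the cores and memory you requested in the form. The following is a minimal sketch, assuming the session exposes Slurm's `SLURM_CPUS_PER_TASK` and `SLURM_MEM_PER_NODE` (in megabytes) environment variables. Whether the OOD Dask app sets them depends on how the job is submitted, so the hypothetical helper `dask_worker_settings` falls back to explicit defaults.

```python
import os


def dask_worker_settings(n_workers, env=None, default_cpus=4, default_mem_mb=16_000):
    """Split a Slurm allocation evenly across n_workers Dask workers.

    Returns keyword arguments that can be passed to dask.distributed.Client.
    """
    env = os.environ if env is None else env
    # Assumed Slurm variables; fall back to the defaults if they are not set
    cpus = int(env.get("SLURM_CPUS_PER_TASK", default_cpus))
    mem_mb = int(env.get("SLURM_MEM_PER_NODE", default_mem_mb))
    if n_workers < 1 or n_workers > cpus:
        raise ValueError("n_workers must be between 1 and the number of allocated cores")
    return {
        "n_workers": n_workers,
        # Spread the allocated cores across workers so threads do not oversubscribe them
        "threads_per_worker": max(1, cpus // n_workers),
        # memory_limit is per worker, so divide the total allocation.
        # Dask parses "MB" as 10**6 bytes, which is no larger than Slurm's megabyte.
        "memory_limit": f"{mem_mb // n_workers}MB",
    }


if __name__ == "__main__":
    settings = dask_worker_settings(4, env={"SLURM_CPUS_PER_TASK": "4",
                                            "SLURM_MEM_PER_NODE": "16000"})
    print(settings)  # {'n_workers': 4, 'threads_per_worker': 1, 'memory_limit': '4000MB'}
```

Inside a session you could then start the client with `Client(**dask_worker_settings(4))` instead of hard-coding the worker count, threads, and memory.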
21+
22+
## Dask with Jupyter Notebook running in OOD
23+
24+
After you hit the `Launch` button you'll have to wait for the scheduler to find node(s) for you to run on:
25+
![OOD Dask in queue](./static/ood_dask_in_queue.png)
26+
27+
Then you'll have a short wait for Dask itself to start up.<br />
28+
Once that happens you'll get one last page that will give you links to:
29+
- open a terminal window on the compute node your Dask session is running on
30+
- go to the directory associated with your Session ID that stores output, config and other related files for your session
31+
32+
![Pre-launch Dask OOD](./static/ood_dask_prelaunch.png)
33+
34+
Please click the `Connect to Jupyter` button and a Jupyter window will open.
35+
36+
## Dask Example
37+
38+
Start a new `DS-GA.1004 - Jupyter Dask` session with 4 cores, 16GB of memory, and `/scratch` as your root directory. Then open a new notebook, enter the following code in the first cell, and run it by pressing the `Shift` and `Enter` keys together.
39+
```python
40+
import os
41+
import pandas as pd
42+
import numpy as np
43+
import time
44+
45+
# Create a directory for the large files
46+
output_dir = "tmp/large_data_files"
47+
os.makedirs(output_dir, exist_ok=True)
48+
49+
num_files = 5 # Number of files to create
50+
rows_per_file = 10_000_000 # 10 million rows per file
51+
for i in range(num_files):
52+
    data = {
53+
        'col1': np.random.randint(0, 100, size=rows_per_file),
54+
        'value': np.random.rand(rows_per_file) * 100
55+
    }
56+
    df = pd.DataFrame(data)
57+
    df.to_csv(os.path.join(output_dir, f'data_{i}.csv'), index=False)
58+
print(f"{num_files} large CSV files created in '{output_dir}'.")
59+
60+
import dask.dataframe as dd
61+
from dask.distributed import Client
62+
import time
63+
import os
64+
65+
# Start a Dask client for distributed processing (optional but recommended)
66+
# This allows you to monitor the computation with the Dask dashboard
67+
client = Client(n_workers=4, threads_per_worker=2, memory_limit='16GB')  # Adjust to your session: n_workers * threads_per_worker should match your cores, and memory_limit applies to EACH worker
68+
print(client)
69+
70+
# Load multiple CSV files into a Dask DataFrame
71+
# Dask will automatically partition and parallelize the reading of these files
72+
output_dir = "tmp/large_data_files"  # the same directory the CSV files were written to above
73+
dask_df = dd.read_csv(os.path.join(output_dir, 'data_*.csv'))
74+
75+
# Perform a calculation (e.g., calculate the mean of the 'value' column)
76+
# This operation will be parallelized across the available workers
77+
result_dask = dask_df['value'].mean()
78+
79+
# Trigger the computation and measure the time
80+
start_time = time.time()
81+
computed_result_dask = result_dask.compute()
82+
end_time = time.time()
83+
84+
print(f"Dask took {end_time - start_time} seconds to compute the mean across {num_files} files.")
85+
print(f"Result (Dask): {computed_result_dask}")
86+
87+
import pandas as pd
88+
import time
89+
import os
90+
91+
# Perform the same calculation sequentially with Pandas
92+
start_time_pandas = time.time()
93+
total_sum = 0
94+
total_count = 0
95+
for i in range(num_files):
96+
    df = pd.read_csv(os.path.join(output_dir, f'data_{i}.csv'))
97+
    total_sum += df['value'].sum()
98+
    total_count += len(df)
99+
computed_result_pandas = total_sum / total_count
100+
end_time_pandas = time.time()
101+
102+
print(f"Pandas took {end_time_pandas - start_time_pandas} seconds to compute the mean across {num_files} files.")
103+
print(f"Result (Pandas): {computed_result_pandas}")
104+
```
105+
You should get output like:
106+
```
107+
5 large CSV files created in 'tmp/large_data_files'.
108+
<Client: 'tcp://127.0.0.1:45511' processes=4 threads=8, memory=59.60 GiB>
109+
Dask took 3.448112726211548 seconds to compute the mean across 5 files.
110+
Result (Dask): 50.010815178612596
111+
Pandas took 9.641847610473633 seconds to compute the mean across 5 files.
112+
Result (Pandas): 50.01081517861258
113+
```
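The Dask and Pandas means in the sample output differ only in their last couple of digits. This is most likely because the two approaches add the same floating-point values in a different order, and floating-point addition is not associative. To check that two such results match, compare them with a tolerance rather than `==`. The values below are copied from the sample output; the tolerance of `1e-12` is an illustrative choice.

```python
import math

dask_result = 50.010815178612596
pandas_result = 50.01081517861258

# Exact equality fails because the sums were accumulated in a different order
print(dask_result == pandas_result)  # False
# A relative tolerance treats the two results as the same value
print(math.isclose(dask_result, pandas_result, rel_tol=1e-12))  # True
```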

docs/hpc/09_ood/04_Desktop.mdx

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
1+
# Desktop in OOD
2+
3+
You can get a basic desktop interface to HPC resources.
4+
5+
## Getting Started
6+
You can get a desktop in OOD by going to the URL [ood.hpc.nyu.edu](http://ood.hpc.nyu.edu) in your browser and selecting `Desktop` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps, they'll show up on your home screen under the `Recently Used Apps` header.
7+
8+
:::note
9+
Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your Desktop session differently, you'll need to select it from the menu.
10+
:::
11+
12+
## Configuration
13+
14+
You can select the number of cores, amount of memory, GPU type (if any), number of hours, and optional Slurm options.
15+
16+
![OOD Desktop Configuration](./static/ood_desktop_config.png)
17+
18+
## Desktop running in OOD
19+
20+
After you hit the `Launch` button you'll have to wait for the scheduler to find node(s) for you to run on:
21+
![OOD Desktop in queue](./static/ood_desktop_in_queue.png)
22+
23+
Then you'll have a short wait for the Desktop itself to start up.<br />
24+
Once that happens you'll get one last page that will give you links to:
25+
- open a terminal window on the compute node your Desktop session is running on
26+
- go to the directory associated with your Session ID that stores output, config and other related files for your session
27+
- make changes to compression and image quality
28+
- get a link that you can share that will allow others to view your Desktop session
29+
30+
![Pre-launch Desktop OOD](./static/ood_desktop_prelaunch.png)
31+
32+
Please click the `Launch Desktop` button and a Desktop window will open.

docs/hpc/09_ood/05_FileZilla.mdx

Lines changed: 0 additions & 24 deletions
This file was deleted.
Lines changed: 2 additions & 4 deletions
@@ -1,4 +1,4 @@
1-
# Integrative Genomics Viewer (IGV)
1+
# Integrative Genomics Viewer (IGV) in OOD
22

33
IGV is a high-performance, easy-to-use, interactive tool for the visual exploration of genomic data.
44

@@ -7,14 +7,12 @@ Please see the following links for details:
77
- [Tutorial Videos](https://www.youtube.com/channel/UCb5W5WqauDOwubZHb-IA_rA)
88

99
## Getting Started
10-
You can run IGV in OOD by going to the URL [ood.hpc.nyu.edu](http://ood.hpc.nyu.edu) in your browser and selecting `IGV` from the `Interactive Apps` pull-down menu at the top of the page. As you can see below, once you've used it and other interactive apps they'll show up on your home screen under the `Recently Used Apps` header.
10+
You can run IGV in OOD by going to the URL [ood.hpc.nyu.edu](http://ood.hpc.nyu.edu) in your browser and selecting `IGV` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps, they'll show up on your home screen under the `Recently Used Apps` header.
1111

1212
:::note
1313
Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your IGV session differently, you'll need to select it from the menu.
1414
:::
1515

16-
![OOD Interactive Apps menu IGV](./static/ood_interactive_apps_igv.png)
17-
1816
## Configuration
1917

2018
You can select the number of cores, amount of memory, amount of time, and optional Slurm options.

docs/hpc/09_ood/06_Alphafold2.mdx

Lines changed: 0 additions & 61 deletions
This file was deleted.

docs/hpc/09_ood/09_jbrowse_genome_browser.mdx renamed to docs/hpc/09_ood/06_jbrowse_genome_browser.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# JBrowse Genome Browser
1+
# JBrowse Genome Browser in OOD
22

33
[JBrowse](https://jbrowse.org/jbrowse1.html) is a web-based genome browser for visualizing genomic features in common file formats, such as variants (VCF), genes (GFF3, BigBed) and gene expression (BigWig), and sequence alignments (BAM, CRAM, and GFF3).
44

docs/hpc/09_ood/02_jupyter_with_conda_singularity.mdx renamed to docs/hpc/09_ood/07_jupyter_with_conda_singularity.mdx

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
1-
# Jupyter with Conda/Singularity in OOD
1+
# Jupyter Notebook with Conda/Singularity in OOD
22

33
## OOD + Singularity + conda
44
This page describes how to use a Singularity container with a conda environment in the OOD GUI on Greene.
Lines changed: 6 additions & 11 deletions
@@ -1,17 +1,15 @@
1-
# Matlab in OOD
1+
# Matlab-Proxy in OOD
22

33
## Getting Started
4-
You can run Matlab in OOD by going to the URL [ood.hpc.nyu.edu](http://ood.hpc.nyu.edu) in your browser and selecting `MATLAB` from the `Interactive Apps` pull-down menu at the top of the page. As you can see below, once you've used it and other interactive apps they'll show up on your home screen under the `Recently Used Apps` header.
4+
You can run Matlab-Proxy in OOD by going to the URL [ood.hpc.nyu.edu](http://ood.hpc.nyu.edu) in your browser and selecting `MATLAB` from the `Interactive Apps` pull-down menu at the top of the page. Once you've used it and other interactive apps, they'll show up on your home screen under the `Recently Used Apps` header.
55

66
:::note
77
Be aware that when you start from `Recently Used Apps` it will start with the same configuration that you used previously. If you'd like to configure your Matlab session differently, you'll need to select it from the menu.
88
:::
99

10-
![OOD Interactive Apps menu](./static/ood_interactive_apps_matlab.png)
11-
1210
## Configuration
1311

14-
You can select the version of Matlab to use, the number or cores, amount of memory, GPU type (if any), amount of time, and optional Slurm options.
12+
You can select the version of Matlab to use, the number of cores, amount of memory, GPU type (if any), amount of time, account, and optional Slurm options.
1513

1614
![OOD Matlab Configuration](./static/ood_matlab_config.png)
1715

@@ -47,13 +45,10 @@ After you hit the `Launch` button you'll have to wait for the scheduler to find
4745
![OOD Matlab in queue](./static/ood_matlab_in_queue.png)
4846

4947
Then you'll have a short wait for Matlab itself to start up.<br />
50-
Once that happens you'll get one last form that will allow you to:
51-
- make changes to compression and image qualtiy
48+
Once that happens you'll get a form that will allow you to:
5249
- open a terminal window on the compute node your Matlab session is running on
53-
- get a link that you can share that will allow others to view your Matlab session
50+
- go to the directory associated with your Session ID that stores output, config and other related files for your session
5451

5552
![Pre-launch matlab OOD](./static/ood_matlab_prelaunch.png)
5653

57-
Then after you hit the `Launch Matlab` button you'll have the familiar Matlab Desktop to use.
58-
59-
![OOD Matlab Running](./static/ood_matlab_running.png)
54+
Then, after you click the `Connect to Matlab` button, you'll have the familiar Matlab Desktop to use.
