Commit 867a238

Merge pull request #134 from NYU-RTS/cloud_begin_port

Cloud section first pass

2 parents 006ae1e + d9f2a4c commit 867a238

18 files changed: +571 −14 lines changed
_typos.toml

Lines changed: 5 additions & 0 deletions

[default.extend-words]
# Scientific Software
namd = "namd"

[default.extend-identifiers]
# Technical terms
SerDe = "SerDe"
JsonSerDe = "JsonSerDe"
Lines changed: 4 additions & 0 deletions

# Start here!

We facilitate access to cloud computing resources (GCP) for research, HPC bursting, visualization, and data analysis (Hadoop), and we host an OpenShift Kubernetes cluster on premises. Please proceed to the relevant section to learn more about these offerings and how you can make use of them.
Lines changed: 3 additions & 0 deletions

{
  "label": "Getting Started"
}
Lines changed: 51 additions & 0 deletions

# Google Cloud Platform for Research

[internet2-gcp]: https://internet2.edu/services/google-cloud-platform/
[waive-egress]: https://cloud.google.com/billing/docs/how-to/data-transfer-waiver
[nih-strides]: https://cloud.nih.gov/
[internet2-cc]: https://internet2.edu/services/cloud-connect/
[google-for-edu]: https://cloud.google.com/edu?hl=en
[gcp-get-started]: https://cloud.google.com/docs/get-started/
[gcp-credits]: https://cloud.google.com/edu/researchers?hl=en

We facilitate access to Google Cloud Platform (GCP) for your research projects. NYU is a member of the [Internet2 Net+ GCP program][internet2-gcp], which provides community-negotiated GCP terms and gives NYU researchers benefits that include, among other things, price discounts, [waivers for data egress fees][waive-egress], and [NIH STRIDES][nih-strides] initiative benefits.

In addition to the I2 Net+ GCP benefits, NYU scholars enjoy significant discounts on GCP resources for their research projects through a three-year commitment to GCP that NYU made in 2019. The NYU network connects to GCP via Partner Interconnect using the [Internet2 Cloud Connect (I2CC)][internet2-cc] service.

## Why work with the NYU Research Cloud team to deploy your research project on GCP?

NYU researchers who work with the Research Cloud team to deploy projects on GCP may benefit from the following:
- Discounted rates and lower GCP project cost: Through NYU's participation in the Internet2 Net+ agreement, as well as the three-year commitment NYU made to using GCP, projects enjoy discounted rates, lowering their cost. The exact discounts depend on the GCP services used in the research project and can vary between 5% and 25%. Free data egress is usually included.
- GCP expertise and project setup: The NYU Research Cloud team can work closely with researchers to set up Identity and Access Management (IAM), group access, billing, etc.
- GCP support cases: Given proper permissions, the NYU Research Cloud team can open support cases on behalf of researchers or discuss GCP project issues directly with GCP experts.
- Cost monitoring, billing alerts, and spending reports: The Research Cloud team has access to additional tools that can provide cost monitoring, switch between billing options, and produce spending reports.
- Research project funding options: The NYU Research Cloud team can help with providing invoices, internal fund transfers, establishing GCP billing IDs, etc., enabling NYU researchers to pay for GCP services.

## Getting Started with GCP

There are a number of ways you can get started with using GCP in research projects:
- To start using GCP resources in a research project, request a consultation with the Research Cloud team (via email to research-cloud-support@nyu.edu). The Research Cloud team can advise on ways to set up your research project, available discounts, etc.
- If a project involves data science and machine learning, consult with the DS3 team before starting your project on GCP.

:::tip
Creating a GCP project using your NYU account (NetID@nyu.edu) will place the project under the nyu.edu organization, an environment managed by the NYU Research Cloud team. However, your project does not automatically qualify for the price discounts and benefits that NYU has negotiated with GCP.
:::

:::warning
For non-NYU work on GCP, please use your personal, non-NYU Google email when creating GCP projects.
:::

The NYU Research Cloud team does not currently offer training on how to deploy and utilize GCP resources in research projects or teaching. If you are new to GCP and want to learn the fundamentals, or if you want to learn how to perform specific tasks on GCP (and obtain skill badges), please consider the following resources:
- Through the [Google for Education][google-for-edu] program, GCP offers training credits and discounts to students, faculty, and IT admins. To apply for training credits and discounts, please [click here](https://services.google.com/fb/forms/googlecloudskillsbooststudenttrainingcreditsapplication/).
- [Getting started with Google Cloud Platform][gcp-get-started] offers quickstarts and sample projects on GCP.

## How can I fund my research project on GCP?

### GCP Free Tier
Apply for credits using your NYU account (https://cloud.google.com/free/). After credits expire, if you would like to switch to another type of funding and are approved to do so, we will modify your project so it can use other funds:
- https://edu.google.com/programs/credits/research/?modal_active=none
- https://edu.google.com/programs/?modal_active=none

### Sources of funding for GCP projects
Please consider the options below and explore other options that may exist for your specific field.
- [Google Cloud research credits][gcp-credits]
- NIH STRIDES
- NSF CloudBank
- GCP seed grant through NYU HPC
- Departmental funds
- Apply using the [GCP project request form](https://docs.google.com/forms/d/e/1FAIpQLSfyIxlsqBXtYlVgE8j_MZ1030HKdkTPWphXwwBUM5g6Lhi44w/viewform)
Lines changed: 23 additions & 0 deletions

# NIH STRIDES

[nih-strides]: https://cloud.nih.gov/

The NIH Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability ([STRIDES][nih-strides]) Initiative allows NIH to explore the use of cloud environments to streamline NIH data use by partnering with commercial providers (GCP/AWS). The NIH STRIDES Initiative:
- Provides biomedical researchers with access to advanced, cost-effective cloud computing infrastructure, tools, and resources
- Enables researchers to work collaboratively in the cloud by establishing an interconnected ecosystem of biomedical research data
- Equips researchers with emerging cloud solutions for data management and computation to enable experimentation and innovation

The benefits of participating in the NIH STRIDES program include:
- Pre-negotiated favorable pricing for cloud services
- Access to training to help researchers harness the power of the cloud
- Opportunities for professional service engagements to help drive success
- Guidance on best practices in areas such as data storage, governance, and controlled access

## Enrolling in the NIH STRIDES Initiative

NYU enrolled in the NIH STRIDES Initiative in December 2020 by signing an agreement with Carahsoft, GCP's billing and administrative partner. Thus, NIH-funded NYU researchers with an active NIH award may take advantage of the STRIDES Initiative for their NIH-funded research projects. The NYU RTS team works closely with Burwood Group, a GCP reseller, to provide access to GCP resources for NYU researchers who are approved to participate in the NIH STRIDES Initiative. NYU researchers who wish to participate must follow the steps outlined below.

## Contacts
- For general questions about Research Cloud/GCP, please email the NYU HPC Research Cloud team: hpc-cloud@nyu.edu
- To learn more about the NIH STRIDES Initiative, email the team at strides@nih.gov
Lines changed: 3 additions & 0 deletions

{
  "label": "GCP self-managed projects"
}
Lines changed: 61 additions & 0 deletions

# HPC Bursting

[gcp-cost-calculator]: https://cloudpricingcalculator.appspot.com/
[bursting-form]: https://sites.google.com/nyu.edu/nyu-hpc/hpc-systems/cloud-computing/hpc-bursting-to-cloud/hpc-bursting-request-form?authuser=0

HPC may, in some cases, provide bursting capabilities to researchers or classes in order to augment the available resources. Bursting is ideal when you need a large amount of resources for a very short period of time. Bursting is made possible by running a scalable Slurm cluster in Google Cloud Platform (GCP), separate from the on-premises HPC clusters.

Bursting is not available to all users and requires advance approval. To get access to these capabilities, please contact hpc@nyu.edu to check your eligibility. Please let us know the amount of storage, total CPUs, memory, and GPUs you need, the number of days you require access, and the estimated total CPU/GPU hours you will use. For reference, please review the [GCP cost calculator][gcp-cost-calculator] and send a copy of your cost calculation to hpc@nyu.edu as well.

:::tip
To request access to the HPC bursting capabilities, [please complete this form][bursting-form].
:::

## Running a Bursting Job
:::note
This is not public; it is available only on request to eligible classes or researchers.
:::
Log in to the Greene login node:
```sh
ssh <NetID>@greene.hpc.nyu.edu
```
Then ssh to the burst login node on GCP. Anyone can log in, but you can only submit jobs if you have approval:
```sh
ssh burst
```
Start an interactive job:
```sh
srun --account=hpc --partition=interactive --pty /bin/bash
```
If you get the error "Invalid account or account/partition combination specified", your account is not approved to use cloud bursting.

Once your files are copied to the bursting instance, you can run a batch job from the interactive session.
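As a sketch, a minimal batch script submitted from the interactive session could look like the following. The account and partition follow the interactive example above, but the job name, output file, and workload line are illustrative placeholders; use the partition and resources you were approved for.

```sh
#!/bin/bash

#SBATCH --account=hpc
#SBATCH --partition=interactive
#SBATCH --job-name=burst-job
#SBATCH --output=slurm_%j.out

# Placeholder: replace with your actual workload command
./my_analysis --input /home/<NetID>/data
```

Submit the script with `sbatch <script-name>` and monitor its progress with `squeue`.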
## Access to Slurm Partitions
In the example above, the partition `interactive` is used. You can list the current partitions by running:
```sh
sinfo
```

However, approval is required to submit jobs to the partitions. Partitions are defined by the resources available to a job, such as the number of CPUs, amount of memory, and number of GPUs. Please email hpc@nyu.edu to request access to a specific partition, or to request a new partition (e.g., 10 CPUs and 64 GB of memory) for a more optimal cost/performance balance for your job.

### Storage

Torch's `/home` and `/scratch` are mounted (available) on the login node of the bursting setup.

The compute nodes, however, have independent `/home` and `/scratch` file systems. These mounts are persistent and available from any compute node, but they are separate from `/home` and `/scratch` on Torch: compute nodes in a bursting job do not see the Torch mounts. This means you may need to copy data from Torch's `/home` or `/scratch` to the GCP-mounted `/home` or `/scratch`.

To copy data, first start an interactive job. Once it has started, you can copy your data using scp from the HPC data transfer nodes (greene-dtn). Below is the basic command to copy files from Torch to your home directory while you are in an interactive bursting job:
```sh
scp <NetID>@greene-dtn.hpc.nyu.edu:/path/to/files /home/<NetID>/
```

### Current Limits

20,000 CPUs are available at any given time for all active bursting users.
Lines changed: 65 additions & 0 deletions

# Visualization Workstations

[vnc-clients]: https://help.ubuntu.com/community/VNC/Clients

The burst cluster includes a partition (`nvgrid`) that can be used to run graphical applications on NVIDIA GPUs for visualization purposes. You can use this partition by following the instructions below.

Add the following to your SSH config file (`~/.ssh/config`) on your local workstation so that you can log in to the burst login node directly:
```sh
Host burst
  HostName burst.hpc.nyu.edu
  User <NetID>
  ProxyJump <NetID>@greene.hpc.nyu.edu
  StrictHostKeyChecking no
  UserKnownHostsFile /dev/null
  LogLevel ERROR
```

Log in to the burst login node by running `ssh <NetID>@burst` while on campus or connected to the VPN. Run the following command on the login node to request an interactive command-line session:
```sh
srun --account=hpc --partition=nvgrid --gres=gpu:p100:1 --pty /bin/bash
```

When your interactive session is active, run the following command to start the VNC (remote desktop) server. If this is the first time you have used a visualization node, you will be prompted to set a password to use when you access your remote session:
```sh
/opt/TurboVNC/bin/vncserver
```

Note the hostname of the node that you are running on. This hostname is displayed in the NODELIST column of the output of the `squeue` command:
```
[NetID@b-23-1 ~]$ squeue
  JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  92727    nvgrid  bash jp6546  R  2:55     1 b-23-1
```

In another terminal on your local machine, run the following command to forward the VNC port so that you can connect to the remote desktop service from your local computer:
```
ssh -N -L 5901:<Hostname>:5901 <NetID>@burst
```

If you do not already have a VNC remote desktop client installed on your computer, you will need to install one. A list of VNC clients for various platforms can be found [here][vnc-clients]. Note that macOS comes with a built-in VNC client, accessible from the Finder by navigating to Go → Connect to Server and typing `vnc://` at the beginning of the server field.

Within your VNC client, connect to `localhost:5901` (`vnc://localhost:5901` on macOS).

You should now be presented with a desktop environment. If you run any OpenGL-based applications from a terminal, be sure to prefix the command with `vglrun` to ensure that the application uses the GPU.

After your first time using the `nvgrid` partition, you can start the remote desktop server non-interactively using the following batch script (you will need to remember the password that you set the first time you ran `vncserver`). Note that the `sleep` command should be given the length of time (in seconds) that you want the server to run (3600 seconds, i.e., 1 hour, in the example below):
```sh
#!/bin/bash

#SBATCH --gres=gpu:p100:1
#SBATCH --partition=nvgrid
#SBATCH --account=hpc
#SBATCH --job-name=vnc
#SBATCH --time=1:00:00
#SBATCH --output=slurm_%j.out

/opt/TurboVNC/bin/vncserver

# Keep the job (and with it the VNC server) alive for 1 hour
sleep 3600
```
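When you are finished with a session, you can stop the VNC server from the compute node using TurboVNC's `-kill` option. The display number (`:1` in this sketch) is printed when the server starts and may differ on your node:

```sh
# Stop the VNC server running on display :1
/opt/TurboVNC/bin/vncserver -kill :1
```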
Lines changed: 3 additions & 0 deletions

{
  "label": "HPC bursting to cloud"
}

docs/cloud/04_dataproc/01_intro.md

Lines changed: 40 additions & 0 deletions

# Dataproc

[gcs]: https://cloud.google.com/storage?hl=en
[bigquery]: https://cloud.google.com/bigquery?hl=en
[bigtable]: https://cloud.google.com/bigtable?hl=en
[data-locality]: https://en.wikipedia.org/wiki/Locality_of_reference
[yarn-ui]: https://dataproc.hpc.nyu.edu/yarn/

Dataproc is a cloud-based Hadoop distribution managed by Google. Google administers updates to Dataproc so that it is kept current, and also packages and maintains additional software that can be run on top of Hadoop. Additionally, Dataproc includes cloud-specific features, such as the ability to automatically add and remove nodes depending upon how busy the cluster is (autoscaling). It can also use object storage ([GCS][gcs]) or [BigQuery][bigquery] as an alternative to HDFS, and it provides integration with [BigTable][bigtable] through HBase interfaces.

### What is Hadoop?

Hadoop is an open-source software framework for storing and processing big data in a distributed/parallel fashion on large clusters of commodity hardware. At its core, Hadoop strives to increase processing speed by increasing [data locality][data-locality] (i.e., it moves computation to the servers where the data is located). There are three components to Hadoop: HDFS (the Hadoop Distributed File System), the Hadoop implementation of MapReduce, and YARN (Yet Another Resource Negotiator, a scheduler).
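Once you have terminal access to the cluster (described below), you can interact with these components directly from the command line. A quick sketch using the standard Hadoop CLIs, with an illustrative path:

```sh
# List the contents of your HDFS home directory (path illustrative)
hdfs dfs -ls /user/<NetID>_nyu_edu

# Show YARN applications currently running on the cluster
yarn application -list
```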
### Autoscaling

NYU Dataproc is configured to be as cloud-agnostic as possible, and it still uses HDFS and non-proprietary HBase components. It does, however, leverage autoscaling. This means that if the cluster has not been used for a while, it might take a few minutes for resources to become available (typically 3-5 minutes) as nodes are turned on and NodeManagers register with YARN. During this time, the following warning message will appear, indicating that the cluster is at capacity and resources are not yet available:
```sh
WARN org.apache.spark.scheduler.cluster.YarnScheduler: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
```
This warning will go away once autoscaling has added new nodes and more resources are available for YARN applications. If it does not go away after more than 10 minutes, please contact the HPC team: autoscaling is actively monitored, and a delay of more than 10 minutes may indicate a failure of Dataproc's monitoring infrastructure.

Currently, NYU Dataproc's autoscaling is configured so that the cluster has between 3 and 43 nodes, depending upon demand. The number of currently active nodes can be seen in the [YARN web UI][yarn-ui], and the percentage of cluster capacity in use can be seen on the Scheduler page in the left-hand menu of the YARN web UI.
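You can also check how many NodeManagers are currently registered from a terminal on the cluster, using the standard YARN CLI:

```sh
# Lists registered NodeManagers; the count grows as autoscaling adds nodes
yarn node -list
```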
## Accessing the NYU Dataproc Hadoop Cluster

Access to the NYU Dataproc cluster is granted via your NYU Google account. If you are in a class that uses Dataproc, your instructor or TA will request Dataproc access for your NetID.

Once access is granted, you can log in by navigating to https://dataproc.hpc.nyu.edu/ssh in your web browser. After you've reached this page, you will have access to a browser-based terminal interface where you can run Hadoop commands.

If you are having difficulty connecting to a terminal, please make sure that you are not logged in to a non-NYU Google account by clicking the icon in the upper right corner of the Gmail web interface (see [here](https://support.google.com/accounts/answer/1721977)). If you are using Google Chrome, you may also need to switch to your NYU account profile using the instructions [here](https://support.google.com/chrome/answer/2364824).

If you continue to have difficulties connecting via https://dataproc.hpc.nyu.edu/ssh, you can also log in by navigating to https://shell.cloud.google.com/ and running the following command in the terminal that appears:
```sh
gcloud compute ssh nyu-dataproc-m --project=hpc-dataproc-19b8 --zone=us-central1-f
```
You may need to authorize Google to log in to Dataproc after running the above command.

Once logged in, your username will be of the form `NetID_nyu_edu` rather than just your NetID.
