docs/user/_sources/cloud.txt
28 additions & 20 deletions
@@ -4,23 +4,25 @@ Running C-PAC in the Cloud
Introduction
^^^^^^^^^^^^
- An Amazon Marketplace AMI for C-PAC has been released, making it easier for researchers to use C-PAC in the cloud. You can use the AMI to either launch a single machine for basic runs or create a high performance computing (HPC) cluster using Starcluster. Clusters can be dynamically scaled up as your computational needs increase. Detailed explanations of cloud computing and HPC are beyond the scope of this documentation, but we will define a few key terms before we start. If these terms are familiar, you may skip them and proceed to later sections.
+ The C-PAC team has released an Amazon Marketplace AMI, making it easier for researchers to use C-PAC in the cloud. You can use the AMI to either launch a single machine for basic runs or create a high performance computing (HPC) cluster using Starcluster. Clusters can be dynamically scaled up as your computational needs increase. Detailed explanations of cloud computing and HPC are beyond the scope of this documentation, but we will define a few key terms before we start. If these terms are familiar, you may skip them and proceed to later sections.
* Amazon Machine Image (AMI) - A disk image of an operating system and any additional installed software that can be used to create a virtual machine.
* Instance - A single running virtual machine whose initial state is based on the AMI that it is launched from. Instances can be classified as spot instances or on-demand instances. On-demand instances are reliably created the moment they are requested for a fixed rate. Spot instances are created based on whether or not a bid that you set is accepted by Amazon. They can be significantly cheaper than on-demand instances, but are only created when Amazon accepts your bid.
* Instance Type - The hardware specification for a given instance. A list of the instance types made available by Amazon may be found `here <http://aws.amazon.com/ec2/instance-types>`_.
- * Terminated Instance - An instance is considered terminated when its resources have been completely freed up for use by others in the Amazon cloud. Any data on a terminated instance that is not relocated to persistent storage such as EBS (see below) will be completely discarded. Instance termination is the virtual equivalent of throwing out a physical server. When you have terminated an instance, you are no longer paying for it. If your data and results are in persistent storage, you should terminate any instances you are using when you are done. Note that by default, instances do not have persistent storage attached to them- you will need to configure persistent storage when you set up the instance.
+ * Terminated Instance - An instance is considered terminated when its resources have been completely freed up for use by others in the Amazon cloud. Any data on a terminated instance that is not relocated to persistent storage such as EBS (see below) will be completely discarded. Instance termination is the virtual equivalent of throwing out a physical server. When you have terminated an instance, you are no longer paying for it. Note that by default, instances do not have persistent storage attached to them; you will need to configure persistent storage when you set up the instance.
* Stopped Instance - An instance is considered stopped when it is not active, but its resources are still available for future use whenever you choose to reactivate it. Stopping an instance is the virtual equivalent of turning a computer off or putting it in hibernate mode. When you stop an instance, you continue to pay for the storage associated with it (i.e., the main and other volumes attached to it), but not for the instance itself. You should stop an instance when the analyses you are working on are not fully done and you would like to preserve the current state of a running instance.
- * Simple Storage Service (S3) - A form of storage offered by Amazon. S3 is not intended for use with instances since it lacks a filesystem, but it can be used to archive large datasets. It is less costly than EBS.
+ * Simple Storage Service (S3) - A form of storage offered by Amazon. S3 is not intended to be directly attached to instances since it lacks a filesystem, but it can be used to archive large datasets. Amazon provides tools for uploading data to S3 'buckets' where it can be stored. It is less costly than EBS.
* Elastic Block Store (EBS) - A form of persistent storage offered by Amazon for use with instances. After you have terminated an instance, items stored in an EBS volume can be accessed by any future instances that you start up.
- * Head Node - The primary node of an HPC cluster, which all other nodes are connected to. The head node will run a job scheduler (such as Sun Grid Engine) to allocate jobs to the other nodes. Jobs may also be run on the head node.
+ * EC2 Instance Store - A form of temporary storage that comes included with some instance types. Instance store volumes must be added manually before launching an instance, and all files stored on them will be lost when the instance is terminated. The instance store is typically mounted at ``/mnt``.
+
+ * Head Node - The primary node of an HPC cluster, to which all other nodes are connected. The head node will run a job scheduler (such as Sun Grid Engine) to allocate jobs to the other nodes.
* Worker Node - A node in an HPC cluster to which tasks are delegated by the head node via a job scheduler.
@@ -37,19 +39,19 @@ Before you can create a single C-PAC machine or a C-PAC HPC cluster, you must fi
#. Click the `Sign in to the AWS Console` button
- #. Enter your e-mail address and password. If you do not already have an account, enter your e-mail address, select `I am a new user.` and click the `Sign in` button.
+ #. Enter your e-mail address and password. If you do not already have an account, enter your e-mail address, select `I am a new user.` and click the `Sign in` button. Provide Amazon with the information (e-mail address, payment method) needed to create your account.
- #. Amazon has different regions that it hosts its web services from (e.g. Oregon, Northern Virginia, Tokyo). In the upper right-hand corner there will be a region that you are logged into next to your user name. Change this to your preferred region.
+ #. Amazon hosts its web services from several regions (e.g., Oregon, Northern Virginia, Tokyo). The region you are currently logged into appears in the upper right-hand corner, next to your user name. Change this to your preferred region. The Marketplace AMI is available in all regions, although public AMIs (non-Marketplace AMIs shared from personal accounts) may not be.
#. Click on your name in the upper right corner and navigate to `Security Credentials`. Accept the disclaimer that appears on the page.
#. Click on `Access Keys` and click on the blue `Create New Access Key` button. Click `Download Key File` and move the resulting csv file to a safe and memorable location on your hard drive.
#. Click on the box in the upper left corner of AWS. Click on `EC2`. Click on `Key Pairs` in the left-hand column.
- #. Click on the blue `Create Key Pair` button. Give it an appropriate name and click on the blue `Create` button. A .pem file will now save to disk. Move this file to a safe and memorable location on your hard drive.
+ #. Click on the blue `Create Key Pair` button. Give your key an appropriate name and click on the blue `Create` button. A .pem file will be saved to disk. Move this file to a safe and memorable location on your hard drive.
- #. On your local drive, open a terminal and run the following command: ``chmod 600 /path/to/pem/file``
+ #. On your local drive, open a terminal and run the following command: ``chmod 400 /path/to/pem/file``. This restricts the key file's permissions so that ``ssh`` will not reject it as being too open (see the sketch below).
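For illustration, here is a minimal sketch of how the key file is typically used. The file path and ``<public_dns>`` value are placeholders, and the ``ubuntu`` login matches the one used later in this guide:

.. code-block:: bash

    # Restrict the key's permissions so ssh will accept it (adjust the path
    # to wherever you saved the .pem file)
    chmod 400 /path/to/pem/file

    # Once an instance is running, log in with the key; <public_dns> is a
    # placeholder for the instance's Public DNS value
    ssh -i /path/to/pem/file ubuntu@<public_dns>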
Starting a Single C-PAC Instance via the AWS Console
@@ -62,11 +64,11 @@ Now that you have generated the access keys and a pem file, you may launch a sin
#. Click the blue `Select` button next to the C-PAC AMI. Click the blue `Continue` button on the next screen.
- #. Now choose the instance type that you would like to use. Note that C-PAC requires at least 8 GB of RAM- the m3.xlarge instance type has 15 GB of RAM and 4 CPUs and functions well with C-PAC for small runs and experimentation. This instance type is equivalent to a standard desktop machine in terms of processing power. To select this type, click on the `General purpose` tab and select the m3.xlarge size instance and click the `Next: Configure Instance Details` button. Note that for most larger runs you will want to choose a more powerful instance type, such as c3.4xlarge or c3.8xlarge.
+ #. Now choose the instance type that you would like to use. Note that C-PAC requires at least 8 GB of RAM; the m3.xlarge instance type has 15 GB of RAM and 4 CPUs and functions well with C-PAC for small runs and experimentation. This instance type is equivalent to a standard desktop machine in terms of processing power. To select this type, click on the `General purpose` tab and click the box next to `m3.xlarge`. Then, click the `Next: Configure Instance Details` button. Note that for most larger runs you will want to choose a more powerful instance type, such as c3.4xlarge or c3.8xlarge.
#. The details page can be used to request spot instances, as well as to configure other functionality (including VPN and VPC options). For a basic run you do not need to change anything, although you can tailor it according to your future needs. Hovering over the 'i' icons on this page will give you more insight into the options available. When done, click `Next: Add Storage.`
- #. On the storage page, you can allocate space for the workstation, such as user and system directories. If you want the files stored in these directories to be kept after the instance is terminated, uncheck the box below the `Delete on Termination` column. Note that persistent storage for the datasets will be allocated and attached in later steps below. Click `Next: Tag Instance`.
+ #. On the storage page, you can allocate space for the workstation, such as user and system directories. This is where you can attach instance store volumes if your instance type comes with them. To do this, click the `Add New Volume` button and select the instance store via the dropdown menu in the `Type` column. You may need to do this multiple times if your instance comes with multiple instance stores. If you want the files stored on the root volume to be kept after the instance is terminated, uncheck the box below the `Delete on Termination` column. Note that persistent storage for the datasets can be allocated and attached as described in a later section. Click `Next: Tag Instance`.
#. On this page you can tag the instance with metadata (e.g., details related to the specific purpose for the instance). Tags are key-value pairs, so any contextual data that can be encapsulated in this format can be saved. Click `Next: Configure Security Group`.
@@ -80,26 +82,32 @@ Now that you have generated the access keys and a pem file, you may launch a sin
#. When the `Instance State` column reads `running` and the `Status Checks` column reads `2/2`, the instance should be active. Click on the row for the new instance. In the bottom pane, take note of the values for the `Instance ID`, `Public DNS`, and `Availability zone` fields under the `Description` tab.
- #. Now, create a persistent storage volume for your data and results. In the left-hand column under the `ELASTIC BLOCK STORE` header in the AWS console, click `Volumes`. This is a dashboard of all volumes that you currently have stored in EBS. Click the blue `Create Volume` button.
+ Attaching Persistent EBS Storage to an Instance
+ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+ #. Once your instance is up and running, you can create a persistent storage volume for your data and results. In the left-hand column under the `ELASTIC BLOCK STORE` header in the AWS console, click `Volumes`. This is a dashboard of all volumes that you currently have stored in EBS. Click the blue `Create Volume` button.
- #. Change the size field in the proceeding dialogue to have enough space to encompass your raw data, preprocessed data, and derivatives. A single volume can be as small as 1 GB or as large as 16 TB. Change the availability zone to match the zone from your instance's `Description` tab.
+ #. Change the size field in the dialogue that appears so that the volume is large enough to hold the amount of data you expect to store. A single volume can be as small as 1 GB or as large as 16 TB. Change the availability zone to match the zone from your instance's `Description` tab.
#. Click the checkbox next to the newly-created volume. Click `Actions` followed by `Attach Volumes`. Enter the `Instance ID` from the instance's `Description` tab in the `Instance` field. The `Device` field should fill itself automatically and should be of the form `/dev/sdb` or similar. Note the letter used after the `sd`. Click the blue `Attach` button.
#. Execute the following command from the terminal so that your instance can see the volume (replace the letter `b` at the end of `/dev/xvdb` with the letter from the previous step)::

    ssh -i /path/to/pem/file ubuntu@<public_dns> 'sudo mount /dev/xvdb /media/ebs'
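If the volume was only just created, it will not contain a filesystem yet and the mount may fail. Below is a minimal sketch of formatting the volume first and then checking the mount; it assumes an ext4 filesystem and the same ``ubuntu`` login, and ``mkfs`` erases any existing data, so run it only on an empty volume:

.. code-block:: bash

    # Format the empty volume and create the mount point (only needed the
    # first time the volume is used; mkfs destroys any existing data)
    ssh -i /path/to/pem/file ubuntu@<public_dns> 'sudo mkfs.ext4 /dev/xvdb'
    ssh -i /path/to/pem/file ubuntu@<public_dns> 'sudo mkdir -p /media/ebs'

    # After running the mount command above, confirm the volume is visible
    ssh -i /path/to/pem/file ubuntu@<public_dns> 'df -h /media/ebs'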
Note that the creation of a persistent volume is heavily automated in Starcluster, so if you will be creating many different persistent volumes you should use Starcluster instead.
+ Accessing Your Instance
+ ^^^^^^^^^^^^^^^^^^^^^^^
+
There are now two different means of accessing the instance: either through X2Go (a desktop GUI-based session) or through ssh (a command-line session).
ssh
@@ -139,9 +147,9 @@ To upload data to your newly-created AWS instance, you can run the following com
- If you have configured persistent storage, you will want to ensure that `/path/to/server/directory` is pointing to the mount point for the persistent storage. If you followed the instructions above or the instructions in the Starcluster section below, the mount point should be `/mnt`.
+ If you have configured persistent storage, you will want to ensure that `/path/to/server/directory` is pointing to the mount point for the persistent storage. If you followed the instructions above or the instructions in the Starcluster section below, the mount point should be `/media/ebs`.
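As an illustration, copying a local dataset to the persistent storage mount point might look like the following sketch; the key path, ``<public_dns>``, and directory names are placeholders rather than the guide's own command:

.. code-block:: bash

    # Recursively copy a local dataset to the EBS mount point on the instance;
    # substitute your own key file, host name, and paths
    scp -r -i /path/to/pem/file /path/to/local/dataset ubuntu@<public_dns>:/media/ebs/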
Starting a C-PAC HPC Cluster via Starcluster
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -198,7 +206,7 @@ Add the following cluster definition to your configuration file::
PLUGINS = cpac_sge, mnt_config
CLUSTER_SIZE = 1
CLUSTER_SHELL = bash
- NODE_IMAGE_ID = ami-ed993586
+ NODE_IMAGE_ID = ami-ffb3cf95
MASTER_INSTANCE_TYPE = t2.medium
NODE_INSTANCE_TYPE = c3.8xlarge
@@ -218,7 +226,7 @@ Also add the following two plug-in definitions for the C-PAC Starcluster plug-in
Attaching Persistent Storage to Your Cluster
''''''''''''''''''''''''''''''''''''''''''''
- By default, the cluster will not have any persistent storage (i.e., all storage devices will be destroyed when the cluster terminates). A shared directory mounted at `/home` on the head node can be used across nodes. If you need more storage than what is available on the head node or if you want to keep your data after the cluster is terminated, you will need to create a new volume that can be attached to all nodes in the cluster. To do so, begin by creating an EBS-backed volume:
+ By default, the cluster will have an EBS-backed root volume and, if available, an instance store volume mounted at ``/mnt``. Neither of these volumes is persistent; both will be destroyed when the cluster terminates. A shared directory mounted at `/home` on the head node can be used across nodes. If you need more storage than what is available on the head node, or if you want to keep your data after the cluster is terminated, you will need to create a new volume that can be attached to all nodes in the cluster. To do so, begin by creating an EBS-backed volume:
.. code-block:: bash
@@ -228,7 +236,7 @@ Type ``starcluster listvolumes`` and get the `volume-id` for the volume that you
[volume cpac_volume]
VOLUME_ID = <volume_id>
- MOUNT_PATH = /mnt
+ MOUNT_PATH = /media/ebs
Append the following line to your `cpac_cluster` definition::
@@ -319,7 +327,7 @@ If an error has occurred on any of the nodes while your pipeline executes, you s
Terminating a Starcluster Instance
''''''''''''''''''''''''''''''''''
- When you are done and have exited from your cluster, the following command will stop the cluster:
+ When you are done and have exited from your cluster, the following command will terminate the cluster:
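As a sketch of what this typically looks like with Starcluster (assuming the running cluster was started under the name ``cpac_cluster``; substitute the name you actually used):

.. code-block:: bash

    # Terminate the running cluster and release its instances; data on
    # non-persistent volumes is lost, so copy results off first
    starcluster terminate cpac_cluster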