docs/user/_sources/cloud.txt: 9 additions & 16 deletions
@@ -6,29 +6,23 @@ Introduction
 
 The C-PAC team has released an Amazon Marketplace AMI, making it easier for researchers to use C-PAC in the cloud. You can use the AMI to either launch a single machine for basic runs or create a high performance computing (HPC) cluster using Starcluster. Clusters can be dynamically scaled up as your computational needs increase. Detailed explanations of cloud computing and HPC are beyond the scope of this documentation, but we will define a few key terms before we start. If these terms are familiar, you may skip them and proceed to later sections.
 
-* Amazon Machine Instance (AMI) - A disk image of an operating system and any additional installed software that can be used to create a virtual machine.
+* **Amazon Machine Image (AMI)** - A disk image of an operating system and any additional installed software that can be used to create a virtual machine.
 
-* Instance - A single running virtual machine whose initial state is based on the AMI that it is launched from. Instances can be classified as spot instances or on-demand instances. On-demand instances are reliably created the moment they are requested for a fixed rate. Spot instances are created based on whether or not a bid that you set is accepted by Amazon. They can be significantly cheaper than on-demand instances, but are only created when Amazon accepts your bid.
+* **Instance** - A single running virtual machine whose initial state is based on the AMI that it is launched from. Instances can be classified as spot instances or on-demand instances. On-demand instances are reliably created the moment they are requested for a fixed rate. Spot instances are created based on whether or not a bid that you set is accepted by Amazon. They can be significantly cheaper than on-demand instances, but are only created when Amazon accepts your bid.
 
-* Instance Type - The hardware specification for a given instance. A list of the instance types made available by Amazon may be found `here <http://aws.amazon.com/ec2/instance-types>`_.
+* **Instance Type** - The hardware specification for a given instance. A list of the instance types made available by Amazon may be found `here <http://aws.amazon.com/ec2/instance-types>`_.
 
-* Terminated Instance - An instance is considered terminated when its resources have been completely freed up for use by others in the Amazon cloud. Any data on a terminated instance that is not relocated to persistent storage such as EBS (see below) will be completely discarded. Instance termination is the virtual equivalent of throwing out a physical server. When you have terminated an instance, you are no longer paying for it. Note that by default, instances do not have persistent storage attached to them- you will need to configure persistent storage when you set up the instance.
+* **Terminated Instance** - An instance is considered terminated when its resources have been completely freed up for use by others in the Amazon cloud. Any data on a terminated instance that is not relocated to persistent storage such as EBS (see below) will be completely discarded. Instance termination is the virtual equivalent of throwing out a physical server. When you have terminated an instance, you are no longer paying for it. Note that by default, instances do not have persistent storage attached to them; you will need to configure persistent storage when you set up the instance.
 
-* Stopped Instance - An instance is considered stopped when it is not active, but its resources are still available for future use whenever you choose to reactivate it. Stopping an instance is the virtual equivalent of turning a computer off or putting it in hibernate mode. When you stop an instance, you continue to pay for the storage associated with it (i.e., the main and other volumes attached to it), but not for the instance itself. You should stop an instance when the analyses you are working on are not fully done and you would like to preserve the current state of a running instance.
+* **Stopped Instance** - An instance is considered stopped when it is not active, but its resources are still available for future use whenever you choose to reactivate it. Stopping an instance is the virtual equivalent of turning a computer off or putting it in hibernate mode. When you stop an instance, you continue to pay for the storage associated with it (i.e., the main and other volumes attached to it), but not for the instance itself. You should stop an instance when the analyses you are working on are not fully done and you would like to preserve the current state of a running instance.
 
-* Simple Storage Service (S3) - A form of storage offered by Amazon. S3 is not intended to be directly attached to instances since it lacks a filesystem, but it can be used to archive large datasets. Amazon provides tools for uploading data to S3 'buckets' where it can be stored. It is less costly than EBS.
+* **Simple Storage Service (S3)** - A form of storage offered by Amazon. S3 is not intended to be directly attached to instances since it lacks a filesystem, but it can be used to archive large datasets. Amazon provides tools for uploading data to S3 'buckets' where it can be stored. It is less costly than EBS.
 
-* Elastic Block Storage (EBS) - A form of persistent storage offered by Amazon for use with instances. When you have terminated an instance, items stored in an EBS volume can be accessed by any future instances that you start up.
+* **Elastic Block Storage (EBS)** - A form of persistent storage offered by Amazon for use with instances. When you have terminated an instance, items stored in an EBS volume can be accessed by any future instances that you start up.
 
-* EC2 Instance Store - A form of temporary storage that comes included with some instance types. Instance store volumes must be added manually before launching an instance, and all files stored on them will be lost when the instance is terminated. The instance store is typically mounted at ``/mnt``.
+* **EC2 Instance Store** - A form of temporary storage that comes included with some instance types. Instance store volumes must be added manually before launching an instance, and all files stored on them will be lost when the instance is terminated. The instance store is typically mounted at ``/mnt``.
 
-* Head Node - The primary node of an HPC cluster, to which all other nodes are connected. The head node will run a job scheduler (such as Sun Grid Engine) to allocate jobs to the other nodes.
-
-* Worker Node - A node in an HPC cluster to which tasks are delegated by the head node via a job scheduler.
-
-* Job Scheduler - A program that can allocate computational resources in an HPC cluster to jobs based on availability and distribute jobs across nodes. The C-PAC AMI uses Sun Grid Engine (SGE) as its job scheduler.
-
-* Job Submission Script - A shell script with a series of commands to be executed as part of the job. Submission scripts may also include flags that activate functionality specific to the scheduler.
+Lastly, it is important to review the terms related to :doc:`the Sun Grid Engine job scheduler <compute_config>`.
32
26
33
27
Creating AWS Access and Network Keys
34
28
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -345,5 +339,4 @@ Additional Links
 ^^^^^^^^^^^^^^^^
 
 * `The StarCluster User Manual <http://star.mit.edu/cluster/docs/latest/manual/index.html>`_
-* `The Sun Grid Engine User Guide <http://www.csb.yale.edu/userguides/sysresource/batch/doc/UserGuide_6.1.pdf>`_
 * `Getting Started with AWS <http://docs.aws.amazon.com/gettingstarted/latest/awsgsg-intro/gsg-aws-intro.html>`_
docs/user/_sources/compute_config.txt: 67 additions & 5 deletions
@@ -6,11 +6,11 @@ Computer Settings
 
 #. **FSL Path - [path]:** Full path to the FSL version to be used by CPAC. If you have specified an FSL path in your .bashrc file, this path will be set automatically.
 
-#. **Job Scheduler / Resource Manager - [SGE, PBS]:** Sun Grid Engine (SGE) or Portable Batch System (PBS). Only applies if you are running on a grid or compute cluster.
+#. **Job Scheduler / Resource Manager - [SGE, PBS]:** Sun Grid Engine (SGE) or Portable Batch System (PBS). Only applies if you are running on a grid or compute cluster. See the section below entitled `Setting up SGE` for more information on how to configure SGE.
 
-#. **SGE Parallel Environment - [text]:** SGE Parallel Environment to use when running CPAC. Only applies when you are running on a grid or compute cluster using SGE.
+#. **SGE Parallel Environment - [text]:** SGE Parallel Environment to use when running CPAC. Only applies when you are running on a grid or compute cluster using SGE. See the section below entitled `Setting up SGE` for more information on how to configure SGE.
 
-#. **SGE Queue - [text]:** SGE Queue to use when running CPAC. Only applies when you are running on a grid or compute cluster using SGE.
+#. **SGE Queue - [text]:** SGE Queue to use when running CPAC. Only applies when you are running on a grid or compute cluster using SGE. See the section below entitled `Setting up SGE` for more information on how to configure SGE.
 
 #. **Number of Cores Per Subject - [integer]:** Number of cores (on a single machine) or slots on a node (cluster/grid) per subject. Slots are cores on a cluster/grid node. 'Number of Cores Per Subject' multiplied by 'Number of Subjects to Run Simultaneously' multiplied by 'Number of Cores for Anatomical Registration (ANTS)' must not be greater than the total number of cores.
 
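The core-count constraint stated in the 'Number of Cores Per Subject' entry above can be checked before a run is started. The following is a minimal shell sketch, not part of C-PAC: the helper name ``check_core_budget`` and its argument order are assumptions made for illustration. It multiplies the three settings and compares the product against the total number of cores.

```shell
# Hypothetical helper (not part of C-PAC): verifies that
# cores per subject * subjects at once * ANTS threads <= total cores.
check_core_budget() {
    cores_per_subject=$1
    subjects_at_once=$2
    ants_threads=$3
    total_cores=$4
    needed=$(( cores_per_subject * subjects_at_once * ants_threads ))
    if [ "$needed" -le "$total_cores" ]; then
        echo "ok: $needed of $total_cores cores requested"
    else
        echo "over budget: $needed cores requested, $total_cores available"
        return 1
    fi
}

# Example: 2 cores per subject, 3 subjects at once, 1 ANTS thread, 8 cores in total.
check_core_budget 2 3 1 8
```

A non-zero return status signals that the requested settings exceed the available cores, so the check can gate a launch script.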
@@ -25,8 +25,7 @@ The following key/value pairs must be defined in your :doc:`pipeline configurati
| runOnGrid | Run using a Grid Resource Manager (such as Sun Grid Engine or HTCondor)? | True,False |
++===============================+==============================================================================================================================+====================================================================================+ | runOnGrid | Run using a Grid Resource Manager (such as Sun Grid Engine or HTCondor)? | True,False |
@@ -53,3 +52,66 @@ The box below contains an example of what these parameters might look like when
 numCoresPerSubject : 1
 numSubjectsAtOnce : 1
 num_ants_threads : 1
+
+Setting up SGE
+"""""""""""""""
+
+Preliminaries
+^^^^^^^^^^^^^
+
+Before you configure Sun Grid Engine so that it works with C-PAC, you should understand the following concepts:
+
+* **Job Scheduler** - A program that can allocate computational resources in an HPC cluster to jobs based on availability and distribute jobs across nodes. C-PAC can use Sun Grid Engine (SGE) as its job scheduler (SGE comes preconfigured with C-PAC's :doc:`cloud image <cloud>`).
+
+* **Parallel Environment** - A specification for how SGE parallelizes work. Parallel environments can have limits on the number of CPUs used, whitelists and blacklists that dictate who can use resources, and specific methods for balancing server load during distributed tasks.
+
+* **The Job Queue** - A grouping of jobs that run at the same time. The queue can be frozen, in which case all jobs that it contains will cease.
+
+* **Head Node** - The primary node of an HPC cluster, to which all other nodes are connected. The head node will run a job scheduler (such as Sun Grid Engine) to allocate jobs to the other nodes.
+
+* **Worker Node** - A node in an HPC cluster to which tasks are delegated by the head node via a job scheduler.
+
+* **Job Submission Script** - A shell script with a series of commands to be executed as part of the job. Submission scripts may also include flags that activate functionality specific to the scheduler.
+
+Configuring A Parallel Environment
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The specifics of configuring a parallel environment in SGE are beyond the scope of this guide (see `Oracle's blog <https://blogs.oracle.com/templedf/entry/configuring_a_new_parallel_environment>`_ for a good primer on how to prepare a parallel environment). Nevertheless, we will discuss how to configure a simple example (the ``mpi_smp`` environment used by the C-PAC cloud image). To do this, we will first create a file named ``mpi_smp.conf`` with the following contents:
+
+.. code-block:: bash
+    pe_name            mpi_smp
+    slots              999
+    user_lists         NONE
+    xuser_lists        NONE
+    start_proc_args    NONE
+    stop_proc_args     NONE
+    allocation_rule    $pe_slots
+    control_slaves     TRUE
+    job_is_first_task  FALSE
+    urgency_slots      min
+    accounting_summary TRUE
+
+This configuration ensures that:
+* Up to 999 slots can be used.
+* No users are whitelisted or blacklisted and no special hooks or cleanup tasks occur before or after a job.
+* All job slots that a C-PAC job submission requests are on the same machine.
+* SGE has full control over the jobs submitted (in terms of resource scheduling).
+* The C-PAC run is not part of a parallel job that would require an awareness of which task was performed first.
+* An accounting record is written concerning how the job used resources.
+
+To activate this parallel environment and tie it to a job queue named 'all.q', use the following commands:
+
+.. code-block:: bash
+    qconf -Ap /path/to/mpi_smp.conf
+    qconf -mattr queue pe_list "mpi_smp" all.q
+
+You would then set the SGE Parallel Environment to "mpi_smp" and the SGE queue to "all.q" in your pipeline configuration file before starting your C-PAC run.
+
+Additional Links
+""""""""""""""""
+
+* `The Sun Grid Engine User Guide <http://www.csb.yale.edu/userguides/sysresource/batch/doc/UserGuide_6.1.pdf>`_
+* `StarCluster's Sun Grid Engine Tutorial <http://star.mit.edu/cluster/docs/0.93.3/guides/sge.html>`_
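
The parallel environment file described under 'Configuring A Parallel Environment' can be written and sanity-checked from the shell before it is passed to ``qconf -Ap``. The following is a minimal sketch under stated assumptions: the ``PE_FILE`` location is illustrative rather than a path C-PAC requires, and the final ``qconf -sp`` call (which prints a registered parallel environment) only runs when ``qconf`` is found on the PATH.

```shell
# Write the parallel environment definition from the text to a file.
# PE_FILE is an illustrative location, not a path that C-PAC requires.
PE_FILE="${TMPDIR:-/tmp}/mpi_smp.conf"
cat > "$PE_FILE" <<'EOF'
pe_name            mpi_smp
slots              999
user_lists         NONE
xuser_lists        NONE
start_proc_args    NONE
stop_proc_args     NONE
allocation_rule    $pe_slots
control_slaves     TRUE
job_is_first_task  FALSE
urgency_slots      min
accounting_summary TRUE
EOF

# Sanity check: the file should name the PE and keep all slots on one host.
if grep -q '^pe_name *mpi_smp$' "$PE_FILE" &&
   grep -q '^allocation_rule *\$pe_slots$' "$PE_FILE"; then
    echo "PE file looks complete: $PE_FILE"
fi

# On a machine with SGE installed, show the PE once it has been registered
# with the qconf commands from the text; skipped when qconf is unavailable.
if command -v qconf >/dev/null 2>&1; then
    qconf -sp mpi_smp || echo "mpi_smp is not registered yet"
fi
```

The heredoc delimiter is quoted so that ``$pe_slots`` is written literally instead of being expanded by the shell.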