Commit fbb952f

CfnCluster process documentation.
1 parent 7bc6c74 commit fbb952f

7 files changed

+55
-1
lines changed

docs/source/functional.rst

Lines changed: 2 additions & 1 deletion
@@ -6,6 +6,7 @@ How cfncluster Works
 cfncluster was built not only as a way to manage clusters, but as a reference on how to use AWS services to build your HPC environment

 .. toctree::
-
+
+    processes
     aws_services
     autoscaling
133 KB

docs/source/images/nodewatcher.png

110 KB

52.3 KB

docs/source/images/sqswatcher.png

114 KB

docs/source/images/workflow.png

156 KB

docs/source/processes.rst

Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
.. _processes:

CfnCluster Processes
====================
A number of processes run within CfnCluster and manage its behavior.

.. toctree::

General Overview
----------------
A cluster's lifecycle begins when a user creates it, typically from the Command Line Interface (CLI). Once created, a cluster exists until it is deleted.

.. image:: images/workflow.png
   :align: center
   :width: 35%

publish_pending_jobs
--------------------
Once a cluster is running, a cron job owned by the root user monitors the configured scheduler (SGE, Torque, Openlava, etc.) and publishes the number of pending jobs to CloudWatch. :ref:`Auto Scaling <auto_scaling>` uses this metric to add more nodes to the cluster.

.. image:: images/publish_pending_jobs.png
   :align: center
   :width: 15%

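As a rough illustration of the publishing step, the sketch below builds a CloudWatch metric datum from a pending-job count and sends it through a client object. The function names, the metric name ``pending``, and the namespace ``cfncluster`` are hypothetical placeholders, not values taken from CfnCluster; how the pending-job count is read from the scheduler is left out.

```python
def build_pending_jobs_datum(pending_jobs):
    """Return one CloudWatch metric datum for the pending-job count."""
    if pending_jobs < 0:
        raise ValueError("pending_jobs must be non-negative")
    return {
        "MetricName": "pending",  # hypothetical metric name
        "Value": float(pending_jobs),
        "Unit": "Count",
    }


def publish_pending_jobs(cloudwatch, pending_jobs, namespace="cfncluster"):
    """Send the pending-job count through a CloudWatch-like client.

    `cloudwatch` is any object exposing put_metric_data(Namespace=...,
    MetricData=[...]); the default namespace is a hypothetical placeholder.
    """
    cloudwatch.put_metric_data(
        Namespace=namespace,
        MetricData=[build_pending_jobs_datum(pending_jobs)],
    )
```

With boto3 installed, the object returned by ``boto3.client("cloudwatch")`` provides a ``put_metric_data`` method that accepts these keyword arguments, so it could be passed in as ``cloudwatch``. Passing the client in also makes the function easy to exercise with a stub.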

.. _auto_scaling:

Auto Scaling
------------
Auto Scaling, together with CloudWatch alarms, manages the number of running nodes in the cluster.

.. image:: images/auto_scaling.png
   :align: center
   :width: 40%

The number of instances added, and the thresholds at which they are added, are configurable via the :doc:`Scaling <configuration>` configuration section.

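The scale-up decision can be pictured as a simple threshold check. The sketch below is only an illustration of that idea: in a real cluster the check is carried out by CloudWatch alarms and Auto Scaling policies, not by user code. The function name, its parameters, and the choice of a greater-than-or-equal comparison are all assumptions made for the example.

```python
def desired_capacity(current, pending_jobs, threshold, adjustment, max_size):
    """Return the node count after one scale-up evaluation.

    Assumption: the alarm fires when pending_jobs >= threshold, and the
    policy then adds `adjustment` nodes, capped at `max_size`.
    """
    if pending_jobs < threshold:
        return current  # alarm not breached, no change
    return min(current + adjustment, max_size)
```

For instance, with a threshold of 1 and an adjustment of 2, a cluster of 2 nodes with 5 pending jobs would grow to 4 nodes, while a cluster already at 9 of 10 nodes would stop at 10.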

sqswatcher
----------
The sqswatcher process monitors SQS for messages emitted by Auto Scaling that report state changes within the cluster. When an instance comes online, it submits an "instance ready" message to SQS, which is picked up by sqswatcher running on the master server. These messages notify the queue manager when instances come online or are terminated, so that they can be added to or removed from the queue accordingly.

.. image:: images/sqswatcher.png
   :align: center
   :width: 35%

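The message handling described above could be sketched roughly as follows. The message layout (a dict with ``event`` and ``host`` keys) and the event names are hypothetical, chosen only for illustration; the set of hosts stands in for the scheduler's queue, and receiving messages from SQS is not shown.

```python
def apply_event(queue_hosts, message):
    """Update the set of hosts known to the queue manager.

    `message` is assumed to look like
    {"event": "instance_ready", "host": "ip-10-0-0-12"} (hypothetical shape).
    """
    event = message["event"]
    host = message["host"]
    if event == "instance_ready":
        queue_hosts.add(host)  # new compute node joins the queue
    elif event == "instance_terminated":
        queue_hosts.discard(host)  # no error if the host is already gone
    else:
        raise ValueError("unknown event: %r" % (event,))
    return queue_hosts
```

Using ``discard`` rather than ``remove`` keeps the handler tolerant of a termination message being delivered more than once.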

nodewatcher
-----------
The nodewatcher process runs on each node in the compute fleet and determines when that instance is terminated. Because EC2 bills by the instance hour, nodewatcher waits until an instance has been running for 95% of an instance hour before the instance is terminated.

.. image:: images/nodewatcher.png
   :align: center
   :width: 20%

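The 95% rule amounts to a small piece of arithmetic on the instance's uptime. The sketch below is a minimal illustration, assuming uptime is available in seconds and that the rule applies to each successive instance hour; the function and parameter names are hypothetical, and any other conditions nodewatcher may check are not modelled.

```python
SECONDS_PER_HOUR = 3600


def in_termination_window(uptime_seconds, threshold_percent=95):
    """Return True once the current instance hour is at least 95% used."""
    elapsed_in_hour = uptime_seconds % SECONDS_PER_HOUR
    # Integer comparison avoids floating-point rounding at the boundary.
    return elapsed_in_hour * 100 >= threshold_percent * SECONDS_PER_HOUR
```

With the default threshold, the window opens 3420 seconds (57 minutes) into each instance hour and closes when the next hour begins.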