CloudWatch Logs
Starting in ParallelCluster 2.6.0, CloudWatch logging is enabled by default. Refer to the official ParallelCluster documentation to learn how to retrieve a cluster's logs.
The following instructions are kept for reference for ParallelCluster versions earlier than 2.6.0.
Keeping track of log files from a running cluster can be a pain. Some logs, such as /var/log/nodewatcher, are stored on the compute instances and disappear when compute nodes are removed. Other logs are stored on the master node but become inaccessible once the cluster has been deleted.
The setup below sends the following logs to CloudWatch, where they remain accessible even after the cluster has been deleted:
/var/log/sqswatcher
/var/log/jobwatcher
/var/log/nodewatcher # for each compute node
/opt/sge/default/spool/qmaster/messages
Note: CloudWatch incurs a small additional cost, generally less than $1. See https://aws.amazon.com/cloudwatch/pricing/ for more information.
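Before wiring these files into CloudWatch, it can help to check which of them actually exist on a given node. The following is a minimal sketch, not part of ParallelCluster: the function name `list_present_logs` is hypothetical, and the paths are simply the ones listed above.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not a ParallelCluster tool):
# print each given path that exists as a regular file.
list_present_logs() {
  local path
  for path in "$@"; do
    if [[ -f "${path}" ]]; then
      echo "${path}"
    fi
  done
}

# The log files covered by this page. /var/log/nodewatcher lives on each
# compute node; the other three live on the master node.
list_present_logs \
  /var/log/sqswatcher \
  /var/log/jobwatcher \
  /var/log/nodewatcher \
  /opt/sge/default/spool/qmaster/messages
```

Run on the master node, this would be expected to list the three master-side files; on a compute node, only /var/log/nodewatcher.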
- Add the following additional permissions to the CloudFormation Template #L1674:
{
"Sid": "CloudWatchLogs",
"Action": [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogStreams"
],
"Effect": "Allow",
"Resource": [
"arn:aws:logs:*:*:*"
]
}
- Upload this new template to your S3 bucket:
$ aws s3 cp aws-parallelcluster.cfn.json s3://[your_bucket]/template/aws-parallelcluster.cfn.json
- Create a file post_install.sh with the following contents:
#!/bin/bash
########
# NOTE #
########
#
# THIS FILE IS PROVIDED AS AN EXAMPLE AND NOT INTENDED TO BE USED BESIDES TESTING
# USE IT AS AN EXAMPLE BUT NOT AS IS FOR PRODUCTION
#
# Look up the instance ID and region from the EC2 instance metadata service
ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
REGION=$(echo "${AZ}" | sed 's/[a-z]$//')
export AWS_DEFAULT_REGION=${REGION}
# Install and configure awslogs to push logs to CloudWatch in the local region
sudo yum install awslogs -y
cat > /etc/awslogs/awscli.conf << EOF
[plugins]
cwlogs = cwlogs
[default]
region = ${REGION}
EOF
# Check whether this is the master instance (its Name tag is "Master")
MASTER=false
if [[ $(aws ec2 describe-instances \
    --instance-ids "${ID}" \
    --query 'Reservations[].Instances[].Tags[?Key==`Name`].Value[]' \
    --output text) = "Master" ]]; then
  MASTER=true
fi
if ${MASTER}; then
# Setup cloudwatch logs for master
cat >>/etc/awslogs/awslogs.conf << EOF
[/opt/sge/default/spool/qmaster/messages]
datetime_format = %b %d %H:%M:%S
file = /opt/sge/default/spool/qmaster/messages
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-sge-qmaster-messages
[/var/log/jobwatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/jobwatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-job-watcher
[/var/log/sqswatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/sqswatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-master-sqs-watcher
EOF
else
# Setup cloudwatch logs for compute
cat >>/etc/awslogs/awslogs.conf << EOF
[/var/log/nodewatcher]
datetime_format = %b %d %H:%M:%S
file = /var/log/nodewatcher
buffer_duration = 5000
log_stream_name = {instance_id}
initial_position = start_of_file
log_group_name = pcluster-compute-node-watcher
EOF
fi
# Start awslogs now and enable it at boot
sudo service awslogs start
sudo chkconfig awslogs on
- Upload this file to S3:
$ aws s3 cp --acl public-read post_install.sh s3://[your_bucket]
- Configure a cluster with your custom template and your post_install file:
[cluster default]
...
post_install = s3://[your_bucket]/post_install.sh
template_url = https://s3.amazonaws.com/[your_bucket]/template/aws-parallelcluster.cfn.json
- Create the cluster:
$ pcluster create mycluster
Status: CREATE_COMPLETE
MasterServer: RUNNING
MasterPublicIP: 18.214.13.107
ClusterUser: ec2-user
MasterPrivateIP: 172.31.18.7
- Now go to the CloudWatch Console > Logs section
You'll see your log files there:
pcluster-compute-node-watcher # is /var/log/nodewatcher
pcluster-master-job-watcher # is /var/log/jobwatcher
pcluster-master-sge-qmaster-messages # is /opt/sge/default/spool/qmaster/messages
pcluster-master-sqs-watcher # is /var/log/sqswatcher
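The same log groups can also be read from the command line instead of the console. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials that allow `logs:DescribeLogStreams` and `logs:GetLogEvents` (the permissions added to the template above cover the cluster instances, not necessarily your own user). The function name `dump_log_group` is hypothetical; the group names are the ones set in post_install.sh.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not a ParallelCluster tool):
# print the event messages from every stream in one log group.
# Each stream is named after the instance that wrote it, because
# post_install.sh sets log_stream_name = {instance_id}.
dump_log_group() {
  local group="$1"
  local streams stream
  # List the stream names in the group as whitespace-separated text.
  streams=$(aws logs describe-log-streams \
    --log-group-name "${group}" \
    --query 'logStreams[].logStreamName' \
    --output text) || return 1
  for stream in ${streams}; do
    echo "== ${group} / ${stream} =="
    # --start-from-head reads from the oldest events. Only a single page
    # of results is returned; long logs need further calls with a token.
    aws logs get-log-events \
      --log-group-name "${group}" \
      --log-stream-name "${stream}" \
      --start-from-head \
      --query 'events[].[message]' \
      --output text
  done
}
```

For example, `dump_log_group pcluster-compute-node-watcher` would print the nodewatcher log of each compute node under a header naming its instance ID, including nodes that have already been terminated.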