Account Cost Monitoring #216

@hahnd

Description

Describe the New Feature

This feature would periodically check for long-running or idle clusters, which may indicate that a job is stuck or has failed and that its cluster was not shut down properly. If such a condition is detected, an email would be sent to the account admin and to the user who ran the job. The email would include instructions for terminating the cluster through the web application and, if that fails, from the AWS console.
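As a starting point for discussion, the notification email described above might be assembled roughly like this. This is only a sketch using Python's standard `email.message.EmailMessage`. The function name, its parameters, and the wording of the message are all hypothetical. The actual shutdown instructions are not specified in this issue, so they are passed in as plain strings rather than invented here.

```python
from email.message import EmailMessage


def build_stuck_cluster_email(job_id, admin_addr, user_addr, sender_addr,
                              web_app_steps, console_steps):
    """Build (but do not send) a stuck-cluster notification.

    All names and wording are hypothetical. web_app_steps and console_steps
    are caller-supplied instruction text; this issue does not define them.
    """
    msg = EmailMessage()
    msg["Subject"] = f"Cluster for job {job_id} may be stuck"
    msg["From"] = sender_addr
    # Notify both the account admin and the user who ran the job.
    msg["To"] = ", ".join([admin_addr, user_addr])
    msg.set_content(
        f"The cluster for job {job_id} has been running longer than expected "
        "and may not have been shut down properly.\n\n"
        "To terminate it from the web application:\n"
        f"{web_app_steps}\n\n"
        "If that fails, terminate it from the AWS console:\n"
        f"{console_steps}\n"
    )
    return msg


if __name__ == "__main__":
    example = build_stuck_cluster_email(
        job_id="job-1a2b3c4d",                   # hypothetical job ID
        admin_addr="admin@example.com",
        user_addr="user@example.com",
        sender_addr="noreply@example.com",
        web_app_steps="(web application steps go here)",
        console_steps="(AWS console steps go here)",
    )
    print(example)
```

Keeping message construction separate from delivery would let the template be unit tested without any AWS access. How the message is actually sent (SES, SNS, or something else) is left open.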

  • Define criteria to identify a stuck cluster condition.
  • Write code to find stuck CloudFormation stacks that match the Job ID format.
  • Create an email template with shutdown instructions.
  • Write code to send an email to the admin and the user who started the job.
  • Add a Lambda function to the CF template, with appropriate permissions in its IAM role.
  • Create a CloudWatch (EventBridge) Event in the CF template to call the Lambda function (hourly?).
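The detection step in the list above could look something like the sketch below. It is not a definitive implementation. It assumes stack descriptions are dictionaries with `StackName`, `StackStatus`, and `CreationTime` keys, as in boto3 CloudFormation `describe_stacks` responses. The job ID pattern, the 12-hour threshold, and the choice of statuses are placeholders, since the real Job ID format and stuck-cluster criteria are still to be defined. Only the long-running criterion is covered; idle detection would need additional metrics.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical job ID format; the real format is not given in this issue.
JOB_STACK_PATTERN = re.compile(r"^job-[0-9a-f]{8}$")

# Hypothetical threshold; the actual criteria are still to be defined.
MAX_AGE = timedelta(hours=12)

# Assumed set of statuses under which a stack may still hold resources.
ACTIVE_STATUSES = {
    "CREATE_IN_PROGRESS",
    "CREATE_COMPLETE",
    "UPDATE_COMPLETE",
    "DELETE_FAILED",
}


def find_stuck_stacks(stacks, now, max_age=MAX_AGE):
    """Return names of job stacks that have existed longer than max_age.

    stacks: iterable of dicts with StackName, StackStatus, and a
            timezone-aware CreationTime.
    now:    timezone-aware datetime, passed in so tests are deterministic.
    """
    stuck = []
    for stack in stacks:
        if not JOB_STACK_PATTERN.match(stack["StackName"]):
            continue  # not a job stack
        if stack["StackStatus"] not in ACTIVE_STATUSES:
            continue  # already deleted or being deleted
        if now - stack["CreationTime"] > max_age:
            stuck.append(stack["StackName"])
    return stuck


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = [
        {"StackName": "job-1a2b3c4d", "StackStatus": "CREATE_COMPLETE",
         "CreationTime": now - timedelta(hours=20)},
        {"StackName": "job-deadbeef", "StackStatus": "CREATE_COMPLETE",
         "CreationTime": now - timedelta(hours=1)},
    ]
    print(find_stuck_stacks(sample, now))
```

In a Lambda handler, the list of stacks would presumably come from the CloudFormation API, and `now` from the current UTC time. Passing both in as arguments keeps the criteria testable without AWS credentials.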

Acceptance Testing

Pass unit tests with good code coverage.
Check that the system detects stuck clusters and sends email notifications.

Time Estimate

16h

Sub-Issues

Consider breaking the new feature down into sub-issues.

  • Add a checkbox for each sub-issue here.

Relevant Deadlines

List relevant project deadlines here or state NONE.

Define the Metadata

Assignee

  • Select engineer(s) or no engineer required
  • Select scientist(s) or no scientist required

Labels

  • Select component(s)
  • Select priority

Projects and Milestone

  • Select Project
  • Select Milestone as the next official version or Backlog of Development Ideas

New Feature Checklist

  • Complete the issue definition above, including the Time Estimate and Funding source.
  • Fork this repository or create a branch of develop.
    Branch name: feature_<Issue Number>/<Description>
  • Complete the development and test your changes.
  • Add/update log messages for easier debugging.
  • Add/update tests.
  • Add/update documentation.
  • Push local changes to GitHub.
  • Submit a pull request to merge into develop.
    Pull request: feature <Issue Number> <Description>
  • Define the pull request metadata, as permissions allow.
    Select: Reviewer(s), Project, and Development issue
    Select: Milestone as the next official version
  • Iterate until the reviewer(s) accept and merge your changes.
  • Delete your fork or branch.
  • Close this issue.
