forked from kimiyoung/transformer-xl
idea for babysitter job, cc @8enmann
job = ncluster.make_job(name=args.name,
                        run_name=f"{args.name}",
                        num_tasks=config.machines,
                        image_name=config.image_name,
                        instance_type=config.instance_type,
                        spot=not args.nospot,
                        skip_setup=args.skip_setup)
killer_task = ncluster.make_task()
# Forward AWS credentials and region to the killer task's shell.
killer_task.run(f'export AWS_ACCESS_KEY_ID={os.environ["AWS_ACCESS_KEY_ID"]}')
killer_task.run(f'export AWS_SECRET_ACCESS_KEY={os.environ["AWS_SECRET_ACCESS_KEY"]}')
killer_task.run(f'export AWS_DEFAULT_REGION={os.environ["AWS_DEFAULT_REGION"]}')
# Build the comma-separated instance list in Python, then interpolate it into the command.
instance_names = ",".join(t.name for t in job.tasks)
killer_task.run(f'python hung_job_killer.py --watchdir={job.logdir} --instances={instance_names}')
The hung_job_killer would check the --watchdir directory on a regular basis and kill all instances listed in --instances if the directory had no modifications for an hour.
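A minimal sketch of the staleness check described above, using only the standard library. The names latest_mtime, is_hung, watch, timeout_sec and poll_sec are hypothetical, and the sketch assumes "no modifications" means that no file or directory under the watched path has a recent mtime. The on_hung callback stands in for the kill step.

```python
import os
import time


def latest_mtime(watchdir):
    """Return the newest modification time under watchdir, the directory itself included."""
    newest = os.path.getmtime(watchdir)
    for root, dirs, files in os.walk(watchdir):
        for name in dirs + files:
            try:
                newest = max(newest, os.path.getmtime(os.path.join(root, name)))
            except OSError:
                # Entry vanished between listing and stat; ignore it.
                pass
    return newest


def is_hung(watchdir, timeout_sec=3600, now=None):
    """True if nothing under watchdir was modified within the last timeout_sec seconds."""
    now = time.time() if now is None else now
    return now - latest_mtime(watchdir) > timeout_sec


def watch(watchdir, on_hung, timeout_sec=3600, poll_sec=60):
    """Poll until watchdir goes stale, then call on_hung() once (hypothetical kill hook)."""
    while not is_hung(watchdir, timeout_sec):
        time.sleep(poll_sec)
    on_hung()
```

Taking the current time as an optional argument keeps is_hung easy to check without waiting an hour, and keeping the kill step behind a callback lets the watcher be tried out with a harmless print before it is wired to real termination.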
Killing can be done with a subset of the logic from the ncluster command-line tool. Currently lookup_instances has special logic for exact_match, which kicks in when the fragment is wrapped in ''; this should probably be a keyword argument instead.
def kill(fragment=''):
    instances = u.lookup_instances(fragment, valid_states=['running', 'stopped'])
    instances_to_kill = []
    for i in instances:
        state = i.state['Name']
        # Skip instances launched with a different key pair than the current user's.
        if LIMIT_TO_CURRENT_USER and i.key_name != u.get_keypair_name():
            print(f"Skipping instance launched with key {i.key_name}, use reallykill to kill")
            continue
        print(u.get_name(i), i.instance_type, i.key_name,
              state if state == 'stopped' else '')
        instances_to_kill.append(i)

    action = 'terminating'
    if not _check_instance_found(instances, fragment):
        return

    ec2_client = u.get_ec2_client()
    # Assumption: the excerpt does not show where `answer` is set; a confirmation
    # prompt like this one is implied by the "Didn't get y" branch below.
    answer = input(f"{len(instances_to_kill)} instances found, {action}? (y/N) ")
    if answer.lower() == "y":
        instance_ids = [i.id for i in instances_to_kill]
        response = ec2_client.terminate_instances(InstanceIds=instance_ids)
        assert u.is_good_response(response), response
        print(f"{action}: success")
    else:
        print("Didn't get y, doing nothing")
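The exact_match suggestion could be sketched as follows. This is an illustration on plain name strings, not the ncluster implementation: matches and filter_names are hypothetical helpers, and the sketch assumes the default behaviour is substring matching on the instance name.

```python
def matches(name, fragment, exact_match=False):
    """Decide whether an instance name is selected by fragment.

    Assumption: the default mode is a substring match, and exact_match=True
    requires the whole name to equal the fragment.
    """
    if exact_match:
        return name == fragment
    return fragment in name


def filter_names(names, fragment='', exact_match=False):
    """Return the names selected by fragment, preserving input order."""
    return [n for n in names if matches(n, fragment, exact_match)]
```

With an explicit keyword, the killer can pass each task name with exact_match=True, so that a job named "job" does not also select "job2", and callers no longer need to wrap the fragment in quotes to get that behaviour.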