yascheduler: Node deallocation failure leads to zombie servers #151

@akvatol

Description

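yascheduler fails to terminate Hetzner cloud nodes after task completion, leaving zombie servers running. In the output below, 135.181.90.247 is still running on Hetzner but is no longer listed by yanodes.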

hcloud server list output:

ID          NAME            STATUS    IPV4             IPV6                      PRIVATE NET   LOCATION   AGE
125115508   node-mhequwrf   running   135.181.90.247   2a01:4f9:c011:b3ef::/64   -             hel1       1d
125148160   node-njtzvnal   running   65.109.224.173   2a01:4f9:c013:9b4f::/64   -             hel1       1d
125148184   node-weaciubj   running   204.168.204.51   2a01:4f9:c010:a192::/64   -             hel1       1d

yanodes output:

ip=65.109.224.173 ncpus=MAX enabled=True occupied_by=aiida-284095 (task_id=6428) hetzner
ip=204.168.204.51 ncpus=MAX enabled=True occupied_by=aiida-284079 (task_id=6427) hetzner
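
For anyone trying to spot the same mismatch, here is a minimal sketch that compares the two outputs above as captured text. The helper names (`hcloud_ips`, `yanodes_ips`, `untracked_servers`) are hypothetical and not part of yascheduler or the hcloud CLI; the parsing assumes the exact output layouts pasted in this issue.

```python
import re

# Outputs copied from this issue; in practice they would be captured from the CLI tools.
HCLOUD_OUTPUT = """\
ID          NAME            STATUS    IPV4             IPV6                      PRIVATE NET   LOCATION   AGE
125115508   node-mhequwrf   running   135.181.90.247   2a01:4f9:c011:b3ef::/64   -             hel1       1d
125148160   node-njtzvnal   running   65.109.224.173   2a01:4f9:c013:9b4f::/64   -             hel1       1d
125148184   node-weaciubj   running   204.168.204.51   2a01:4f9:c010:a192::/64   -             hel1       1d
"""

YANODES_OUTPUT = """\
ip=65.109.224.173 ncpus=MAX enabled=True occupied_by=aiida-284095 (task_id=6428) hetzner
ip=204.168.204.51 ncpus=MAX enabled=True occupied_by=aiida-284079 (task_id=6427) hetzner
"""

# Dotted-quad pattern; the numeric server ID has no dots, so it is never matched.
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def hcloud_ips(text):
    """Collect the IPv4 address from each row of the server table (hypothetical helper)."""
    ips = set()
    for line in text.splitlines()[1:]:  # skip the header row
        match = IPV4_RE.search(line)
        if match:
            ips.add(match.group(0))
    return ips


def yanodes_ips(text):
    """Collect the ip=... value from each yanodes line (hypothetical helper)."""
    return set(re.findall(r"^ip=(\S+)", text, flags=re.MULTILINE))


def untracked_servers(hcloud_text, yanodes_text):
    """Servers that exist in the cloud but are unknown to the scheduler."""
    return sorted(hcloud_ips(hcloud_text) - yanodes_ips(yanodes_text))


print(untracked_servers(HCLOUD_OUTPUT, YANODES_OUTPUT))  # ['135.181.90.247']
```

This only flags candidates: a server that is still being set up might also be absent from yanodes, so the result should be checked against the log before deleting anything.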

yascheduler.log:

INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup fleur engine...
INFO:yascheduler.Scheduler:Disconnecting from machines: 135.181.90.247
DEBUG:yascheduler.Scheduler.RemoteMachine:root@135.181.90.247:Close connection
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner:DELETED 135.181.90.247
WARNING:yascheduler.Scheduler.CloudAPIManager.hetzner:Setup node 135.181.90.247 failed - deallocate
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner:NODE 135.181.90.247 NOT DELETED AS UNKNOWN
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner:CREATED 135.181.90.247
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Open connection
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Open connection
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Open connection
INFO:backoff:Backing off create(...) for 1.9s (ConnectionRefusedError: [Errno 111] Connect call failed ('135.181.90.247', 22))
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Open connection
INFO:backoff:Backing off create(...) for 1.4s (ConnectionRefusedError: [Errno 111] Connect call failed ('135.181.90.247', 22))
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Open connection
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Detected platform: linux
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:CPUs count: 32
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup fleur engine...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup of fleur engine is done...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup pcrystal engine...
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Uploading files (/data/engines/pcrystal/Pcrystal) to data/engines/pcrystal
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup of pcrystal engine is done...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup pproperties engine...
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Uploading files (/data/engines/pproperties/Pproperties) to data/engines/pproperties
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup fleur engine...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup of fleur engine is done...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup pcrystal engine...
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Uploading files (/data/engines/pcrystal/Pcrystal) to data/engines/pcrystal
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup of pcrystal engine is done...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup pproperties engine...
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Uploading files (/data/engines/pproperties/Pproperties) to data/engines/pproperties
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup fleur engine...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup of fleur engine is done...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup pcrystal engine...
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Uploading files (/data/engines/pcrystal/Pcrystal) to data/engines/pcrystal
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup of pcrystal engine is done...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup pproperties engine...
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Uploading files (/data/engines/pproperties/Pproperties) to data/engines/pproperties
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup fleur engine...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup of fleur engine is done...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup pcrystal engine...
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Uploading files (/data/engines/pcrystal/Pcrystal) to data/engines/pcrystal
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup of pcrystal engine is done...
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup pproperties engine...
DEBUG:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Uploading files (/data/engines/pproperties/Pproperties) to data/engines/pproperties
INFO:yascheduler.Scheduler.CloudAPIManager.hetzner.RemoteMachine:root@135.181.90.247:Setup fleur engine...
