ECS Service Deployment Downtime #38

@jgard

The ECS Service definition committed by @krithivasan contains:

  // Assuming we cannot have more than one instance at a time. Ever. 
  deployment_maximum_percent         = 100
  deployment_minimum_healthy_percent = 0

This results in downtime on every deployment, because the old task is stopped before the new task comes up. I've tested with deployment_minimum_healthy_percent = 100 and deployment_maximum_percent = 200, and it worked just fine: no downtime for the UI, and a script constantly invoking jobs via the REST API encountered no issues launching the jobs, all of which ran fine.
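
A minimal sketch of the tested settings, for reference. The resource name `controller` and `desired_count = 1` are assumptions for illustration and are not taken from the committed definition. All other arguments are elided and would stay as they are.

```hcl
// Sketch only: resource name and desired_count are assumed, not from the repo.
resource "aws_ecs_service" "controller" {
  // ... existing arguments (name, cluster, task_definition, etc.) unchanged ...

  // Assumed: a single controller task in steady state.
  desired_count = 1

  // With a desired count of 1, this allows up to 2 tasks during a deployment
  // and keeps at least 1 healthy task running, so the new task starts before
  // the old one is stopped.
  deployment_maximum_percent         = 200
  deployment_minimum_healthy_percent = 100
}
```

The trade-off is that two controller tasks overlap briefly during each deployment, which is exactly the risk asked about below.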

Potentially relevant details:

  • The controller task definition uses EFS volumes to persist buildsDir, workspacesDir and JENKINS_HOME.
  • The controller has no executors; we run all jobs on agents (EC2Fleet, autoscaling instances).

I'm wondering whether anyone can speak authoritatively to the risk of letting two controller container tasks run concurrently for a few minutes. I'd appreciate a technical explanation (for example, "both processes will touch this file and could cause a deadlock"), but I'd also take any real-world anecdotes or documented best practices.

I'm also going to ask a similar question at https://community.jenkins.io/t/error-more-than-one-instance-sharing-var-lib-jenkins/8599

Thanks!
