ECS Service Deployment Downtime #38

The [ECS Service definition](https://github.com/aws-samples/serverless-jenkins-on-aws-fargate/blob/main/modules/jenkins_platform/ecs.tf#L93-L95) committed by @krithivasan contains:

```terraform
  // Assuming we cannot have more than one instance at a time. Ever. 
  deployment_maximum_percent         = 100
  deployment_minimum_healthy_percent = 0
```
This results in downtime, because the old task is stopped before the new task comes up. I've tested with `deployment_minimum_healthy_percent = 100` and `deployment_maximum_percent = 200` and it worked just fine. There was no downtime for the UI, and a script constantly invoking jobs via the REST API encountered no issues launching the jobs, and they all ran fine.
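
For reference, the tested values map onto the same two attributes as follows. This is only a sketch of the change, assuming everything else in the linked service definition stays as it is:

```terraform
  // Tested alternative: allow a second task during a deployment.
  // Assumes the rest of the aws_ecs_service block is unchanged.
  deployment_maximum_percent         = 200 // ECS may start the new task alongside the old one
  deployment_minimum_healthy_percent = 100 // the old task is kept until the new one is healthy
```

With a desired count of 1, these values let ECS start the replacement task before it stops the old one, which is exactly the window in which two controller tasks share the same EFS volumes.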

Potentially relevant details:
* The controller task definition uses EFS volumes to persist `buildsDir`, `workspacesDir` and `JENKINS_HOME`.
* The controller has no executors; we run all jobs on agents (EC2Fleet, autoscaling instances).

I'm wondering if anyone can speak authoritatively to the risk of letting controller container tasks run concurrently for a few minutes. I'd appreciate a technical explanation (like, "both processes will touch _this_ file and could cause a deadlock"), but I'd also take any real-world anecdotes or documented best practices.

I'm also going to chime in similarly at https://community.jenkins.io/t/error-more-than-one-instance-sharing-var-lib-jenkins/8599

Thanks!
