Fix race condition on deployment creation by agarciamontoro · Pull Request #979 · mattermost/mattermost-load-test-ng

agarciamontoro · 2026-03-17T19:44:17Z

Summary

In my recent deployments for running load tests, I've been seeing this bug where the deployment fails the n-th time (with n > 1, never n = 1) with an error on trying to remove a nested subdirectory of browser/node_modules.

This is a race condition caused by the ltbrowserapi pr ocess actively writing to that directory, so rm -rf complains about it when trying to delete what it expects to be an empty directory.

This is fixed by stopping the processes (I also stop ltapi, just in case) before removing all the directories, since the goal is to restart everything.

One of my first hypothesis on this error was that the node_modules directory somehow had the permissions messed up, so I added a change to the user_data template to run the provisioner scripts by the base AMI user, not by root. This ended up not being the root cause of the problem, but I still left the change because the scripts were originally designed to run under a plain user, and I think this is safer.

Ticket Link

--

Summary by CodeRabbit

Release Notes

Chores
- Improved deployment cleanup process to prevent race conditions during service shutdown.
- Enhanced provisioning reliability with better error handling and status tracking during system initialization.

agarciamontoro · 2026-03-17T19:44:40Z

@CodeRabbit review

coderabbitai · 2026-03-17T19:44:46Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai · 2026-03-17T19:48:29Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 38baedda-cb82-4b0f-a3bb-39c8c0d87796

📥 Commits

Reviewing files that changed from the base of the PR and between 841f485 and 2d28eab.

📒 Files selected for processing (2)

deployment/terraform/agent.go
deployment/terraform/assets/user_data.sh.tpl

📝 Walkthrough

Walkthrough

The pull request modifies two Terraform deployment scripts. The agent cleanup routine now stops services before removing directories to prevent race conditions. The user data provisioner transitions from direct script execution to using runuser with proper login shell setup and adds exit code handling for improved error tracking.

Changes

Cohort / File(s)	Summary
Agent Cleanup Logic `deployment/terraform/agent.go`	Added service stop commands (`ltapi` and `ltbrowserapi`) before directory removal to prevent race conditions; suppresses errors during service shutdown.
User Data Provisioning `deployment/terraform/assets/user_data.sh.tpl`	Removed explicit HOME and USER environment exports; replaced direct provisioner script execution with `runuser -l` for full login shell; added exit code capture and logging to files for success/failure cases.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

mattermost-load-test-ng#973: Directly related to the user_data.sh.tpl provisioner invocation changes and exit code handling improvements for cloud-init script execution.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Fix race condition on deployment creation' accurately describes the main change: stopping services before removing directories to prevent race conditions during cleanup.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings (stacked PR)
📝 Generate docstrings (commit on current branch)

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix.file.owner

📝 Coding Plan

Generate coding plan for human review comments

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.11.3)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/docs/product/migration-guide for migration instructions

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

You can disable sequence diagrams in the walkthrough.

Disable the reviews.sequence_diagrams setting to disable sequence diagrams in the walkthrough.

carlisgg

LGTM

* Run the provisioner scripts as the base AMI user * Stop processes before removing directory

agarciamontoro added 2 commits March 17, 2026 18:18

Run the provisioner scripts as the base AMI user

28ac8f0

Stop processes before removing directory

2d28eab

agarciamontoro requested a review from carlisgg March 17, 2026 19:44

agarciamontoro added the 2: Dev Review Requires review by a core committer label Mar 17, 2026

coderabbitai bot approved these changes Mar 17, 2026

View reviewed changes

carlisgg approved these changes Mar 18, 2026

View reviewed changes

agarciamontoro merged commit 43de7d6 into master Mar 19, 2026
2 checks passed

agarciamontoro deleted the fix.file.owner branch March 19, 2026 11:10

agarciamontoro added a commit that referenced this pull request Mar 20, 2026

Fix race condition on deployment creation (#979)

e469787

* Run the provisioner scripts as the base AMI user * Stop processes before removing directory

agarciamontoro added a commit that referenced this pull request Mar 20, 2026

Fix race condition on deployment creation (#979) (#982)

c9f120a

* Run the provisioner scripts as the base AMI user * Stop processes before removing directory

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix race condition on deployment creation#979

Fix race condition on deployment creation#979
agarciamontoro merged 2 commits intomasterfrom
fix.file.owner

agarciamontoro commented Mar 17, 2026 •

edited by coderabbitai bot

Loading

Uh oh!

agarciamontoro commented Mar 17, 2026

Uh oh!

coderabbitai bot commented Mar 17, 2026

Uh oh!

coderabbitai bot commented Mar 17, 2026

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

carlisgg left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

agarciamontoro commented Mar 17, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Ticket Link

Summary by CodeRabbit

Release Notes

Uh oh!

agarciamontoro commented Mar 17, 2026

Uh oh!

coderabbitai bot commented Mar 17, 2026

Uh oh!

coderabbitai bot commented Mar 17, 2026

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

carlisgg left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

agarciamontoro commented Mar 17, 2026 •

edited by coderabbitai bot

Loading