Skip to content

Fix state.yml race condition with PID-based file locking#1783

Merged
firecow merged 4 commits intomasterfrom
pid-based-state-locking
Feb 28, 2026
Merged

Fix state.yml race condition with PID-based file locking#1783
firecow merged 4 commits intomasterfrom
pid-based-state-locking

Conversation

@firecow
Copy link
Copy Markdown
Owner

@firecow firecow commented Feb 28, 2026

  • Add PID-based file lock to incrementPipelineIid to prevent concurrent processes from clobbering state.yml
  • Stale locks from crashed/killed processes are auto-cleaned by checking PID liveness
  • incrementPipelineIid returns the new IID directly, eliminating redundant getPipelineIid calls in handler
  • Atomic state file writes via temp file + rename

Use fs.openSync with O_CREAT|O_EXCL for atomic lock creation, write PID
to lock file, and auto-detect stale locks from dead processes. This
prevents stale lock directories from accumulating after crashes/SIGKILL.

incrementPipelineIid now returns the new IID directly, eliminating the
separate getPipelineIid call after increment in handler.ts.
@firecow firecow self-assigned this Feb 28, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 4 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/state.ts">

<violation number="1" location="src/state.ts:53">
P1: Lock file with invalid/empty PID content (e.g., from a crash between `openSync` and `writeFileSync`) is never cleaned up, causing a 30-second timeout deadlock. When `parseInt` returns `NaN`, the `!isNaN(stalePid!)` check is `false`, so the cleanup branch is skipped entirely. An unrecognizable PID should be treated as stale.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Break the monolithic lock function into focused helpers:
- acquireLock: atomic create-or-fail via writeFileSync with wx flag
- tryRemoveStaleLock: read PID, check liveness, clean if dead
- withFileLock: simple retry loop orchestrating the above
@firecow firecow changed the title Replace mkdir-based locking with PID-based locking for state.yml Fix state.yml race condition with PID-based file locking Feb 28, 2026
Copy link
Copy Markdown
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/pid-file-lock.ts">

<violation number="1" location="src/pid-file-lock.ts:25">
P1: Bug: The bare `catch` after `process.kill(pid, 0)` treats **all** errors as "process is dead", but `EPERM` means the process **is alive** (just owned by another user). Only `ESRCH` indicates a dead process. This can cause a live process's lock to be stolen.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

- Only treat ESRCH as dead process; EPERM means the process is alive
  but owned by another user, so the lock must not be stolen.
- Treat unparseable/empty PID in lock file as stale and remove it,
  preventing a permanent 30s deadlock after a crash mid-write.
@sonarqubecloud
Copy link
Copy Markdown

@firecow firecow merged commit aa97f40 into master Feb 28, 2026
12 checks passed
@firecow firecow deleted the pid-based-state-locking branch February 28, 2026 22:29
tmeijn pushed a commit to tmeijn/dotfiles that referenced this pull request Mar 5, 2026
This MR contains the following updates:

| Package | Change | [Age](https://docs.renovatebot.com/merge-confidence/) | [Adoption](https://docs.renovatebot.com/merge-confidence/) | [Passing](https://docs.renovatebot.com/merge-confidence/) | [Confidence](https://docs.renovatebot.com/merge-confidence/) |
|---|---|---|---|---|---|
| [npm:gitlab-ci-local](https://github.com/firecow/gitlab-ci-local) | `4.67.1` → `4.67.2` | ![age](https://developer.mend.io/api/mc/badges/age/npm/gitlab-ci-local/4.67.2?slim=true) | ![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/gitlab-ci-local/4.67.2?slim=true) | ![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/gitlab-ci-local/4.67.1/4.67.2?slim=true) | ![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/gitlab-ci-local/4.67.1/4.67.2?slim=true) |

MR created with the help of [el-capitano/tools/renovate-bot](https://gitlab.com/el-capitano/tools/renovate-bot).

**Proposed changes to behavior should be submitted there as MRs.**

---

### Release Notes

<details>
<summary>firecow/gitlab-ci-local (npm:gitlab-ci-local)</summary>

### [`v4.67.2`](https://github.com/firecow/gitlab-ci-local/releases/tag/4.67.2)

[Compare Source](firecow/gitlab-ci-local@4.67.1...4.67.2)

#### What's Changed

- chore(deps): lock file maintenance by [@&#8203;renovate](https://github.com/renovate)\[bot] in [#&#8203;1775](firecow/gitlab-ci-local#1775)
- chore(deps): update dependency bun to v1.3.10 by [@&#8203;renovate](https://github.com/renovate)\[bot] in [#&#8203;1776](firecow/gitlab-ci-local#1776)
- Fix state.yml race condition with PID-based file locking by [@&#8203;firecow](https://github.com/firecow) in [#&#8203;1783](firecow/gitlab-ci-local#1783)

**Full Changelog**: <firecow/gitlab-ci-local@4.67.1...4.67.2>

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever MR is behind base branch, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this MR and you won't be reminded about this update again.

---

 - [ ] <!-- rebase-check -->If you want to rebase/retry this MR, check this box

---

This MR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).
<!--renovate-debug:eyJjcmVhdGVkSW5WZXIiOiI0My41Mi4wIiwidXBkYXRlZEluVmVyIjoiNDMuNTIuMCIsInRhcmdldEJyYW5jaCI6Im1haW4iLCJsYWJlbHMiOlsiUmVub3ZhdGUgQm90IiwiYXV0b21hdGlvbjpib3QtYXV0aG9yZWQiLCJkZXBlbmRlbmN5LXR5cGU6OnBhdGNoIl19-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants