Prevent race condition in pipeline number calculation#5477
Prevent race condition in pipeline number calculation#5477henkka wants to merge 5 commits intowoodpecker-ci:mainfrom
Conversation
|
@henkka can you check the failing linter? |
Head branch was pushed to by a user without write access
Done! However, 8fe8d48 might be controversial since it changes all storage methods from value to pointer receivers. I couldn't find another way to fix the Happy to discuss alternative approaches if there's some concerns about this change. |
|
Why do we even calculate the pipeline number manually? Cant we simply use an auto-increment table field? |
|
@henkka Linting and db tests still failing. |
Pipeline numbers are numbered per-repository. The only other option would be a unique constraint and insert with select max+1 and retry again if unique constraint fails |
|
Understood, thanks. I would prefer a DB-level locking mechanism but Im also still fine with the current approach. |
|
I would also prefer a database solution as that unique index would always guarantee correct numbering |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #5477 +/- ##
==========================================
+ Coverage 26.41% 26.44% +0.02%
==========================================
Files 404 404
Lines 28842 28849 +7
==========================================
+ Hits 7620 7629 +9
+ Misses 20527 20524 -3
- Partials 695 696 +1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@woodpecker-ci/maintainers somebody has objections here? Esp. the pointer change (from |
|
Yes @lafriks and I had objections. Id still prefere a DB level fix. Ill try to look into it, otherwise Im fine with the current approach for now. |
|
Ok then I'd rather wait and do not include this in 3.10. The bug is present for at least a year so that shouldn't be a problem… |
|
|
||
| func (s storage) CreatePipeline(pipeline *model.Pipeline, stepList ...*model.Step) error { | ||
| func (s *storage) CreatePipeline(pipeline *model.Pipeline, stepList ...*model.Step) error { | ||
| mutexInterface, _ := s.pipelineMutexes.LoadOrStore(pipeline.RepoID, &sync.Mutex{}) |
There was a problem hiding this comment.
where do you unload it ... else we have a memory leak
|
a database solution would be to set a write lock to that table at the beginning. the only downside ... we have to write DBMS specific sql: |
|
|
ok looking into documentation... sqlite just needs all others just need to append |
|
Yeah I know. Will you create a PR then? |
|
tested I go with table lock ... |
|
Why not use unique index and retry on unique index exception? |
|
sql is kind of an mess but the rest works just fine :) -> #5534 |
Fixes #3884
Problem
We've observed intermittent pipeline creation failures when our forge (Bitbucket Datacenter) sends concurrent webhook calls to
POST /api/hook(e.g., "refs changed" and "pull request changed" events arriving simultaneously). This results in the following error:Root Cause
The issue is caused by a race condition where both concurrent API calls calculate the same pipeline number from the logic at
woodpecker/server/store/datastore/pipeline.go
Lines 146 to 154 in ac9101c
Both requests execute this sequence simultaneously:
MAX(number)for the repository → both get the same valuepipeline.Number = maxNumber + 1→ both set the same numberSolution
After investigating database-level locking approaches (see original discussion), we determined that XORM's
ForUpdate()support varies across databases and wouldn't provide a universal solution.This PR implements repository-specific application-level mutexes to serialize pipeline creation per repository while maintaining concurrency between different repositories.