
Conversation

@jum commented Jan 27, 2026

This reduces polling by using the Redis BLPOP command instead of LPOP with backoff-based retries, reducing the load on Redis. It also adds a test for the workerqueue abstraction via Redis.

@GiteaBot added the lgtm/need 2 label (This PR needs two approvals by maintainers to be considered for merging) Jan 27, 2026
@github-actions bot added the modifies/go label (Pull requests that update Go code) Jan 27, 2026
q.mu.Unlock()

return data, nil
}
@wxiaoguang (Contributor) commented Jan 28, 2026

Can the implementation be simplified like this?

res, err := q.client.BLPop(ctx, 0, q.cfg.QueueFullName).Result()
if err ... {
    return
}
if q.isUnique {
    _ = q.client.SRem(ctx, q.cfg.SetFullName, data).Err()
}

Use 0 to "block indefinitely". I would assume the Redis client can cancel the request correctly if the context is canceled.

@jum (Author)

I just tested this: with a timeout of 0 I could not reliably cancel BLPOP via the context; it hung. I searched and found an issue discussing this: redis/go-redis#2556

@wxiaoguang (Contributor)

OK, maybe that's the reason we didn't use BLPop: it can't be canceled immediately, which matters when we need to shut down the queue in a short time.

@wxiaoguang (Contributor) commented Jan 28, 2026

So, overall, this change doesn't really change the situation?

Before: "LPop" every 2 seconds (backoffUpper = 2 * time.Second)
After: "BLPop" every second (q.client.BLPop(ctx, time.Second))

@jum (Author)

If the timeout cannot be increased (i.e. the cancel really needs to be immediate), then this is probably worthless and we can close the pull request.

@jum (Author)

I did some more experiments with BLPOP, and the only sure way to cancel one of these blocking calls without polling is to close the underlying Redis connection. In one of my projects I will do exactly that: a separate connection for each queue, closed when the worker stops. But I am not sure this would be the proper way for Gitea.
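
For illustration, a minimal sketch of that per-queue-connection pattern with go-redis v9 (the blockingQueue type and its methods are hypothetical names for this sketch, not Gitea's queue code):

package main

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// blockingQueue owns a dedicated client, so closing the client is a
// reliable way to unblock a pending BLPOP; context cancellation alone
// is not (see redis/go-redis#2556).
type blockingQueue struct {
	client *redis.Client
	key    string
}

func newBlockingQueue(addr, key string) *blockingQueue {
	return &blockingQueue{
		client: redis.NewClient(&redis.Options{Addr: addr}),
		key:    key,
	}
}

// pop blocks indefinitely (timeout 0) until an item arrives or the
// dedicated connection is closed.
func (q *blockingQueue) pop(ctx context.Context) ([]byte, error) {
	res, err := q.client.BLPop(ctx, 0, q.key).Result()
	if err != nil {
		return nil, err // redis.ErrClosed after shutdown
	}
	return []byte(res[1]), nil // res[0] is the key name, res[1] the value
}

// shutdown closes the dedicated client, which unblocks any BLPop in flight.
func (q *blockingQueue) shutdown() error {
	return q.client.Close()
}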

@wxiaoguang (Contributor) commented Jan 29, 2026

I haven't really looked into the problem.

I saw you had some discussions with @techknowlogick on Discord, so they can help dig in and answer.

Copilot AI (Contributor) left a comment

Pull request overview

This PR updates the Redis-backed queue pop implementation to reduce active polling by using Redis BLPOP for blocking retrieval, while retaining retry/backoff behavior and adjusting uniqueness-set cleanup.

Changes:

  • Replace LPOP + backoff polling with a BLPOP-based loop in PopItem.
  • Add connection-error backoff handling around BLPOP.
  • Ensure uniqueness-set cleanup (SREM) runs with a dedicated timeout context.


Comment on lines +90 to +92
res, err := q.client.BLPop(ctx, time.Second, q.cfg.QueueFullName).Result()
if err != nil {
	if err == redis.Nil {
		return true, nil, nil
Copilot AI commented Jan 30, 2026

BLPop is executed outside q.mu, but q.mu is documented/used to keep the list+set operations in sync for unique queues. With the current flow, an item can be popped from the list (BLPop returns) while it’s still present in the uniqueness set until the mutex is acquired and SRem runs; a concurrent PushItem can then see SAdd==0 and return ErrAlreadyInQueue even though the item has already been removed from the queue. Consider adjusting the uniqueness strategy to avoid this window (e.g., when SAdd returns 0, verify the item is actually still in the list before returning ErrAlreadyInQueue, and/or clean stale set entries and retry).
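
For example, a hypothetical sketch of that push-side verification, reusing the PR's field names (the baseRedis receiver and PushItem signature are assumptions for this sketch):

func (q *baseRedis) PushItem(ctx context.Context, data []byte) error {
	added, err := q.client.SAdd(ctx, q.cfg.SetFullName, data).Result()
	if err != nil {
		return err
	}
	if added == 0 {
		// The set reports a duplicate, but the item may already have been
		// BLPop'ed from the list while its SRem is still pending.
		_, posErr := q.client.LPos(ctx, q.cfg.QueueFullName, string(data), redis.LPosArgs{}).Result()
		if posErr == nil {
			return ErrAlreadyInQueue // genuinely still in the list
		}
		if posErr != redis.Nil {
			return posErr
		}
		// Stale set entry: the item is gone from the list, so push anyway.
	}
	return q.client.RPush(ctx, q.cfg.QueueFullName, data).Err()
}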

Comment on lines +119 to +122
// use a separate context for cleanup to ensure it runs even if request context is canceled
cleanupCtx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
_ = q.client.SRem(cleanupCtx, q.cfg.SetFullName, data).Err()
cancel()
Copilot AI commented Jan 30, 2026

The cleanup timeout 5*time.Second is a new magic duration. Consider extracting it to a named constant (e.g. a package-level redisCleanupTimeout) to make the intent configurable/consistent and easier to tune. Also prefer defer cancel() immediately after WithTimeout(...) to avoid leaking the timer if this block is refactored to have early returns.
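
Concretely, the suggestion applied to the quoted block (redisCleanupTimeout is the constant name proposed above; the value is the PR's existing 5 seconds):

// redisCleanupTimeout bounds the uniqueness-set cleanup after a pop.
const redisCleanupTimeout = 5 * time.Second

// use a separate context for cleanup to ensure it runs even if request context is canceled
cleanupCtx, cancel := context.WithTimeout(context.Background(), redisCleanupTimeout)
defer cancel() // releases the timer even if a later refactor adds early returns
_ = q.client.SRem(cleanupCtx, q.cfg.SetFullName, data).Err()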

Comment on lines +90 to 91
res, err := q.client.BLPop(ctx, time.Second, q.cfg.QueueFullName).Result()
if err != nil {
Copilot AI commented Jan 30, 2026

BLPop is a blocking call that ties up a Redis connection for up to the timeout duration. Because nosql.Manager shares a single UniversalClient per connection string, each running queue will effectively occupy a pooled connection most of the time when idle; with multiple queues on the same Redis conn string this can lead to connection-pool starvation for PushItem/other Redis users. Consider documenting/tuning this (e.g. ensure the queue Redis URI sets an adequate pool_size, or use a dedicated connection string/client for queues).
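
For instance (a hedged sketch: go-redis v9's ParseURL accepts pool_size as a query parameter; the address and size below are placeholders):

// Give the queue-dedicated Redis URI a pool large enough that one blocked
// BLPOP per queue still leaves connections free for PushItem and others.
opts, err := redis.ParseURL("redis://127.0.0.1:6379/0?pool_size=50")
if err != nil {
	return err
}
client := redis.NewClient(opts)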

@wxiaoguang marked this pull request as draft January 30, 2026 04:14