@zzzming zzzming commented Dec 12, 2021

Motivation

A group of batch counters in the producer's default_router is not properly protected by a critical section. Reads and updates of these counters need to be protected as a whole, rather than individually synchronized through atomic access. Multiple goroutines can read and update the counters at the same time, so the current code is not thread safe.
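
To illustrate the kind of race described above, here is a minimal sketch. It is not the actual default_router code: `atomicCounters` and `nextPartition` are hypothetical names, and only a message counter and a partition cursor are modeled. Each access is atomic, yet the increment, compare, and reset sequence is not atomic as a whole.

```go
package main

import "sync/atomic"

// atomicCounters is a hypothetical stand-in for the router's batch state.
type atomicCounters struct {
	msgCounter      uint32
	partitionCursor uint32
}

// nextPartition makes every access atomic on its own, but the
// increment-compare-reset sequence is not atomic as a whole.
func (c *atomicCounters) nextPartition(maxMessages, numPartitions uint32) uint32 {
	count := atomic.AddUint32(&c.msgCounter, 1) // step 1: atomic increment
	if count >= maxMessages {                   // step 2: decision on a snapshot
		// A second goroutine can run steps 1 and 2 right here, before the
		// reset below, and also observe count >= maxMessages. Both
		// goroutines then advance the cursor and a partition is skipped.
		atomic.StoreUint32(&c.msgCounter, 0)    // step 3: atomic reset
		atomic.AddUint32(&c.partitionCursor, 1) // step 4: atomic advance
	}
	return atomic.LoadUint32(&c.partitionCursor) % numPartitions
}
```

Called from a single goroutine this behaves as intended. The problem only appears when several goroutines interleave between steps 1 and 3.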

Modifications

This change uses a mutex to protect all of the counters during the batch partition decision.
A minor improvement is that the message max size, byte size, and batch window are evaluated only when needed.

Verifying this change

  • [x] Make sure that the change passes the CI checks.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API: (no)
  • The schema: (no)
  • The default values of configurations: (no)
  • The wire protocol: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@wolfstudy wolfstudy added this to the v0.8.0 milestone Dec 22, 2021
lastChangeTimestamp int64
msgCounter uint32
cumulativeBatchSize uint32
sync.RWMutex

@dferstay dferstay Dec 26, 2021

I see that we only ever acquire write access; can this be a regular sync.Mutex?

@dferstay dferstay Dec 26, 2021

Actually, I'm concerned that using a mutex will add overhead and hurt performance, especially under concurrent publishing. I've added a bench test in #693 to get quantitative numbers.
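
For readers who have not written one, a parallel benchmark of this kind can be sketched as follows. This is not the bench test from #693: `lockedCounter` and `BenchmarkLockedCounterParallel` are hypothetical, and a plain mutex-guarded counter stands in for the router state. In a real repository the function would live in a `_test.go` file.

```go
package main

import (
	"sync"
	"testing"
)

// lockedCounter is a hypothetical stand-in for router state guarded by a mutex.
type lockedCounter struct {
	mu sync.Mutex
	n  uint64
}

func (c *lockedCounter) inc() uint64 {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
	return c.n
}

// BenchmarkLockedCounterParallel calls inc from several goroutines at once
// (GOMAXPROCS of them by default), which is where lock contention shows up.
func BenchmarkLockedCounterParallel(b *testing.B) {
	c := &lockedCounter{}
	b.RunParallel(func(pb *testing.PB) {
		// Each goroutine loops until the shared iteration budget is used up.
		for pb.Next() {
			c.inc()
		}
	})
}
```

Running it with `go test -bench=Parallel -cpu=1,2,4,8,16` repeats the benchmark for each GOMAXPROCS value and appends a `-2`, `-4`, and so on suffix to the benchmark name for values other than 1.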

@dferstay dferstay Dec 27, 2021

@zzzming, I've attempted to address the races when updating the default router state in a lock-free manner in #694. Please take a look and see if you are satisfied.

Comparing the results from the parallel default router bench test we can see that the use of a mutex has an impact on performance (old==this PR, new==#694):

name                      old time/op  new time/op  delta
DefaultRouterParallel     27.4ns ± 3%  14.8ns ± 2%  -45.79%  (p=0.000 n=10+8)
DefaultRouterParallel-2   35.2ns ± 1%  41.9ns ± 0%  +18.96%  (p=0.000 n=9+7)
DefaultRouterParallel-4   49.4ns ± 6%  44.1ns ± 8%  -10.84%  (p=0.000 n=10+9)
DefaultRouterParallel-8   58.8ns ± 6%  53.2ns ± 3%   -9.53%  (p=0.000 n=10+8)
DefaultRouterParallel-16  69.7ns ± 3%  51.3ns ± 0%  -26.43%  (p=0.000 n=10+8)

EDIT: I can't explain why DefaultRouterParallel-2 is slower on the branch that doesn't use the mutex.
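
As background on what "lock-free" can mean for this kind of state, here is one possible shape, sketched with compare-and-swap. It is not taken from #694, whose details are not shown in this thread: `casRouter` and `nextPartition` are hypothetical names, and only the message-count limit is modeled.

```go
package main

import "sync/atomic"

// casRouter is a hypothetical lock-free variant of the batch state.
type casRouter struct {
	partitionCursor uint32
	msgCounter      uint32
}

func (r *casRouter) nextPartition(maxMessages, numPartitions uint32) uint32 {
	cursor := atomic.LoadUint32(&r.partitionCursor)
	count := atomic.AddUint32(&r.msgCounter, 1)
	if count > maxMessages {
		next := cursor + 1
		// Goroutines that loaded the same cursor value race on this CAS and
		// only one of them wins, so together they move the cursor forward
		// by at most one and no partition is skipped.
		if atomic.CompareAndSwapUint32(&r.partitionCursor, cursor, next) {
			// The winner restarts the count at 1 to include this message.
			// Increments that land between the Add above and this Store
			// are lost, so the bookkeeping is approximate under contention.
			atomic.StoreUint32(&r.msgCounter, 1)
		}
		return next % numPartitions
	}
	return cursor % numPartitions
}
```

The trade-off in this sketch is that batch boundaries are only approximate when many goroutines publish at once, in exchange for never blocking on a lock.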

Update: the implementation in #694 was modified slightly and now DefaultRouterParallel-2 is no longer slower in the lock-free implementation. New bench comparison is below:

name                      old time/op  new time/op  delta
DefaultRouterParallel     27.4ns ± 3%  14.9ns ± 6%  -45.51%  (p=0.000 n=10+9)
DefaultRouterParallel-2   35.2ns ± 1%  33.7ns ± 8%     ~     (p=0.161 n=9+9)
DefaultRouterParallel-4   49.4ns ± 6%  28.8ns ± 4%  -41.66%  (p=0.000 n=10+9)
DefaultRouterParallel-8   58.8ns ± 6%  36.3ns ± 1%  -38.32%  (p=0.000 n=10+8)
DefaultRouterParallel-16  69.7ns ± 3%  39.5ns ±21%  -43.38%  (p=0.000 n=10+10)

@wolfstudy wolfstudy modified the milestones: v0.8.0, 0.9.0 Feb 16, 2022
@freeznet freeznet modified the milestones: v0.9.0, v0.10.0 Jul 4, 2022
@RobertIndie RobertIndie modified the milestones: v0.10.0, v0.11.0 Mar 27, 2023
@RobertIndie RobertIndie modified the milestones: v0.11.0, v0.12.0 Jul 4, 2023
@RobertIndie RobertIndie modified the milestones: v0.12.0, v0.13.0 Jan 10, 2024
@RobertIndie RobertIndie modified the milestones: v0.13.0, v0.14.0 Jul 15, 2024
@RobertIndie RobertIndie modified the milestones: v0.14.0, v0.15.0 Oct 8, 2024
@RobertIndie RobertIndie modified the milestones: v0.15.0, v0.16.0 May 15, 2025
@RobertIndie RobertIndie modified the milestones: v0.16.0, v0.17.0 Jul 29, 2025
@RobertIndie RobertIndie modified the milestones: v0.17.0, v0.18.0 Oct 23, 2025
@RobertIndie RobertIndie removed this from the v0.18.0 milestone Dec 1, 2025
@RobertIndie RobertIndie added this to the v0.19.0 milestone Dec 1, 2025