
Conversation

@josecelano (Member) commented Dec 17, 2024

Use atomics for tracker stats instead of RwLock. Atomics are lock-free. This eliminates the overhead of locks and reduces contention.

Subtasks

  • Replace the global RwLock around Metrics with an individual AtomicU64 per metric (see the sketch after this list).
  • Benchmarking
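
A minimal sketch of the intended change, assuming illustrative field and method names (not the crate's exact identifiers):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

// Before (simplified): a single RwLock guarding the whole metrics struct.
//
//   pub struct Repository {
//       pub stats: Arc<RwLock<Metrics>>,
//   }
//
// After (simplified): one lock-free counter per metric.
#[derive(Debug, Default)]
pub struct Metrics {
    pub udp4_connections_handled: AtomicU64,
    pub udp4_announces_handled: AtomicU64,
    pub udp4_scrapes_handled: AtomicU64,
}

impl Metrics {
    pub fn increase_udp4_connections(&self) {
        // fetch_add is lock-free: concurrent writers never block each other.
        self.udp4_connections_handled.fetch_add(1, Ordering::SeqCst);
    }
}
```

Each metric can then be updated independently, without taking a lock that covers the whole Metrics struct.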

@codecov (bot) commented Dec 17, 2024

Codecov Report

Attention: Patch coverage is 87.12871% with 13 lines in your changes missing coverage. Please review.

Project coverage is 76.24%. Comparing base (f113d2c) to head (c6c6331).

Files with missing lines                Patch %   Lines
src/core/statistics/repository.rs       86.88%    8 Missing ⚠️
src/core/statistics/event/handler.rs    86.11%    4 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1138      +/-   ##
===========================================
+ Coverage    76.21%   76.24%   +0.03%     
===========================================
  Files          173      173              
  Lines        11543    11560      +17     
  Branches     11543    11560      +17     
===========================================
+ Hits          8797     8814      +17     
  Misses        2584     2584              
  Partials       162      162              


@josecelano (Member, Author) commented Dec 17, 2024

It seems the performance has not changed much.

Best case: from 357088 to 347733 requests/second (−2.6%).
Worst case: from 314495 to 321783 requests/second (+2.3%).

However, the aquatic_udp_load_test is not a realistic environment. For example, it does not include requests to the API endpoint that reads the stats, although those should be negligible, given that in the live demo we only hit that endpoint every 15 seconds.

I guess I could also use Relaxed ordering. That should perform better and is viable in this case, since each counter is independent and we only need the atomicity of the counter itself, not ordering with respect to other memory operations.

I will try Relaxed ordering just to see if it makes a difference.
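
For reference, a minimal sketch of the two memory orderings on an independent counter (the static name here is illustrative):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

static CONNECT_REQUESTS: AtomicU64 = AtomicU64::new(0);

// SeqCst: the increment participates in a single global order of all
// SeqCst operations; the strongest guarantee, potentially more expensive.
fn count_connect_seqcst() {
    CONNECT_REQUESTS.fetch_add(1, Ordering::SeqCst);
}

// Relaxed: only the atomicity of the counter itself is guaranteed, with no
// ordering relative to other memory operations. That is enough for a
// statistics counter that is only read later for reporting.
fn count_connect_relaxed() {
    CONNECT_REQUESTS.fetch_add(1, Ordering::Relaxed);
}
```

Note that on x86-64 both versions typically compile down to the same locked read-modify-write instruction, which may explain why the benchmark barely changes.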

This PR

Best case

Requests out: 347733.98/second
Responses in: 312961.26/second
  - Connect responses:  154821.03
  - Announce responses: 155036.09
  - Scrape responses:   3104.13
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 103
  - p99.9: 257
  - p100: 325

Worst case

Requests out: 321783.77/second
Responses in: 289604.67/second
  - Connect responses:  143434.91
  - Announce responses: 143310.11
  - Scrape responses:   2859.66
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 102
  - p99.9: 239
  - p100: 293

develop branch

Best case

Requests out: 357088.53/second
Responses in: 321379.80/second
  - Connect responses:  159080.38
  - Announce responses: 159118.44
  - Scrape responses:   3180.98
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 104
  - p99.9: 261
  - p100: 333

Worst case

Requests out: 314495.33/second
Responses in: 283045.99/second
  - Connect responses:  140198.39
  - Announce responses: 140099.37
  - Scrape responses:   2748.24
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 102
  - p99.9: 233
  - p100: 285

@josecelano (Member, Author) commented

It seems the performance has not changed much with Relaxed ordering either.

Best case: from 357088 to 340008 requests/second (−4.8%).
Worst case: from 314495 to 310436 requests/second (−1.3%).

Best case

Requests out: 340008.54/second
Responses in: 306007.52/second
  - Connect responses:  151332.61
  - Announce responses: 151658.51
  - Scrape responses:   3016.40
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 102
  - p99.9: 251
  - p100: 313

Worst case

Requests out: 310436.73/second
Responses in: 279392.89/second
  - Connect responses:  138305.11
  - Announce responses: 138306.10
  - Scrape responses:   2781.68
  - Error responses:    0.00
Peers per announce response: 0.00
Announce responses per info hash:
  - p10: 1
  - p25: 1
  - p50: 1
  - p75: 1
  - p90: 2
  - p95: 3
  - p99: 101
  - p99.9: 231
  - p100: 283

josecelano marked this pull request as ready for review on December 17, 2024 at 19:09
@josecelano (Member, Author) commented

ACK c6c6331

josecelano linked an issue on Dec 17, 2024 that may be closed by this pull request
@josecelano (Member, Author) commented

Hi @da2ce7, I think the difference is that we use a channel. In torrust-actix, the stats are a direct attribute of the TorrentTracker:

#[derive(Debug)]
pub struct TorrentTracker {
    // ..
    pub stats: Arc<StatsAtomics>,
}

We are using a channel that sends stats events to the event-handler task:

pub struct Keeper {
    pub repository: Repository,
}

impl Keeper {
    #[must_use]
    pub fn new() -> Self {
        Self {
            repository: Repository::new(),
        }
    }

    #[must_use]
    pub fn new_active_instance() -> (Box<dyn Sender>, Repository) {
        let mut stats_tracker = Self::new();

        let stats_event_sender = stats_tracker.run_event_listener();

        (stats_event_sender, stats_tracker.repository)
    }

    pub fn run_event_listener(&mut self) -> Box<dyn Sender> {
        let (sender, receiver) = mpsc::channel::<Event>(CHANNEL_BUFFER_SIZE);

        // The cloned repository handle is moved into a background task that
        // consumes events from the channel and applies them to the stats.
        let stats_repository = self.repository.clone();

        tokio::spawn(async move { dispatch_events(receiver, stats_repository).await });

        Box::new(ChannelSender { sender })
    }
}

Therefore, stats updates are sequential anyway, since a single task drains the channel. I will keep the current design because decoupling the stats from the tracker with events keeps the tracker code simpler and easier to test. It also makes it easier to refactor the metrics while keeping the same events: the events should stay stable even if we reorganise them, whereas the metrics can change a lot.
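
For context, a minimal sketch of the event-driven design being kept; the Event variants and Repository fields below are illustrative, not the crate's exact types:

```rust
use std::sync::Arc;
use std::sync::atomic::{AtomicU64, Ordering};
use tokio::sync::mpsc;

// Illustrative event type; the real crate defines its own Event enum.
enum Event {
    Udp4Connect,
    Udp4Announce,
}

// Illustrative repository: shared, lock-free counters.
#[derive(Default)]
struct Repository {
    udp4_connections_handled: Arc<AtomicU64>,
    udp4_announces_handled: Arc<AtomicU64>,
}

// A single task drains the channel, so updates are applied one at a time
// even though the counters themselves are lock-free.
async fn dispatch_events(mut receiver: mpsc::Receiver<Event>, repo: Repository) {
    while let Some(event) = receiver.recv().await {
        match event {
            Event::Udp4Connect => {
                repo.udp4_connections_handled.fetch_add(1, Ordering::SeqCst);
            }
            Event::Udp4Announce => {
                repo.udp4_announces_handled.fetch_add(1, Ordering::SeqCst);
            }
        }
    }
}
```

Because a single task consumes the channel, updates are serialised regardless of whether the repository uses a lock or atomics, which matches the benchmark results above.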


Development

Successfully merging this pull request may close these issues.

Consider using Atomics instead of Locks for stats
