-
Notifications
You must be signed in to change notification settings - Fork 49
Feat: ban IPs not sending a valid connection ID #1124
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: ban IPs not sending a valid connection ID #1124
Conversation
60a9f29 to
3bb718d
Compare
3bb718d to
30cae9b
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1124 +/- ##
===========================================
+ Coverage 75.96% 76.20% +0.24%
===========================================
Files 168 169 +1
Lines 11437 11528 +91
Branches 11437 11528 +91
===========================================
+ Hits 8688 8785 +97
+ Misses 2585 2580 -5
+ Partials 164 163 -1 ☔ View full report in Codecov by Sentry. |
|
Benchmarking results (from current implementation to new one):
Counting Bloom filter values: PR base (current implementation)Best case Requests out: 469963.10/second
Worst case Requests out: 432509.73/second
PR (new implementation)Best case: Requests out: 445998.57/second
Worst case: Requests out: 422968.03/second
|
b94359f to
446a906
Compare
2690bcd to
4801edf
Compare
|
Benchmarking results after extracting
The problem is the way I clean bans. I'm going to move the check to the service to avoid executing the check to clean bans on each loop iteration. Counting Bloom filter values: PR (new implementation after extracting
|
|
Benchmarking results after extracting
Counting Bloom filter values: PR (new implementation running ban cleaner in a new thread)Best case: Requests out: 440593.07/second
Worst case: Requests out: 416496.57/second
|
010a2e5 to
77cf089
Compare
d539959 to
7b4ec75
Compare
7b4ec75 to
88b447f
Compare
|
We are planning to make some big changes to this implementation to avoid False Positives: |
6ca82e9 feat: [#1128] add new metric UDP total requests aborted (Jose Celano) 9499fd8 feat: [#1128] add new metric UDP total responses (Jose Celano) 286fe02 feat: [#1128] add new metric UDP total requests (Jose Celano) Pull request description: Add more metrics to the UDP tracker stats. The new values are: - `udp4_requests`: total number of requests received from IPv4 clients. - `udp6_requests`: total number of requests received from IPv6 clients. - `udp4_responses`: total number of responses sent to IPv4 clients. - `udp6_responses`: total number of responses sent to IPv6 clients. - `udp_requests_aborted`: total number of requests aborted to make room in the active requests buffer. ### Notes - Responses sent might differ from requests received because of aborted requests. - When we [merge the IP ban service](#1124), we can add a new metric for the total number of IPs banned. - I want to add these new metrics to the [live demo Grafana dashboard](torrust/torrust-demo#20). ### Subtasks - [x] `udp4_requests` - [x] `udp6_requests` - [x] `udp4_responses` - [x] `udp6_responses` - [x] `udp_requests_aborted` - [x] Benchmarking to check how it affects performance before merging it. ACKs for top commit: josecelano: ACK 6ca82e9 Tree-SHA512: 7fbf75b264b191f5c58fcecde8d5e783bbe54ee1c1799acdddc04a9ef64b7196d8b95d1bcad420b1df269bc7929e44417a1d164c6953b00804b0d1e5f0b36e7d
88b447f to
26c05e5
Compare
6af01ff to
e0f54d3
Compare
…limit The life demo tracker is receiving many UDP requests with a wrong conenctions IDs. Errors are logged (write disk) and that decreases the tracker performance. This counts errors and bans Ips after 10 errors for 2 minutes. We use two levels of counters. 1. First level: A Counting Bloom Filter: fast and low memory consumption but innacurate (False Positives). 2. HashMap: Exact Counter for Ips. CBFs are fast and use litle memory but they are also innaccurate. They have False Positives meaning some IPs would be banned only becuase there are bucket colissions (IPs sharing the same counter). To avoid banning IPs incorrectly we decided to introduce a second counter, which is a HashMap that counts error precisely. IPs are only banned when this counter reaches the limit (over 10 errors). We keep the CBF as a first level filter. It's a fast-check IP filter without affecting tracker's performance. When the IP is banned according to the first filter we double-check in the HashMap. CBF is faster than checking always for banned IPs against the HashMap. This solution should be good if the number of IPs is low. We have to find another solution anyway for IPv6 where is cheaper to own a range of IPs.
Becuase we are using aquatic_udp_load_test with this ocndifugration
```
Starting client with config: Config {
server_address: 127.0.0.1:3000,
log_level: Error,
workers: 1,
duration: 0,
summarize_last: 0,
extra_statistics: true,
network: NetworkConfig {
multiple_client_ipv4s: true,
sockets_per_worker: 4,
recv_buffer: 8000000,
},
requests: RequestConfig {
number_of_torrents: 1000000,
number_of_peers: 2000000,
scrape_max_torrents: 10,
announce_peers_wanted: 30,
weight_connect: 50,
weight_announce: 50,
weight_scrape: 1,
peer_seeder_probability: 0.75,
},
}
```
e0f54d3 to
29e506d
Compare
|
Benchmarking results after adding the HashMap:
Counting Bloom filter values: PR (new implementation running ban cleaner in a new thread)Best case: Requests out: 413429.53/second
Worst case: Requests out: 396124.13/second
Develop BranchBest case: Requests out: 417682.64/second
Worst case: Requests out: 404915.33/second
|
|
ACK 29e506d |
|
Hi @da2ce7 Tomorrow (09:30 UTC) I will merge this, deploy it to the live demo, run it for some hours and compare the Grafana dashboard before and after the deployment. I will also compare CPU and memory consumption. I expect to have again the problem we had that the tracker started comsuming more and more memory until the docker container is restarted. And I expect the number of errors decrease drastically becuase of the IP banning. Current data (2024-12-17 09:29 UTC) |
|
Hi @da2ce7, after running the new IP ban filter for 24h. Errors have decreased comparing rate between announce requests and error responses. Without ban service: Announces 450-650 -> Errors 70-110 I don't have the exact value because I should have created a graph for the previous version with that rate. They have not decreased too much, maybe for two reasons:
As you can see in the "UDP4 requests and responses (per second)" graph we are not sending the response (becuase the IP was banned) for 100 requests per second on average. NOTES:
Current data (2024-12-18 09:29 UTC) |





This PR uses a Counting Bloom Filter to count IP sending UDP requests with wrong connection IDs.
The IP is banned when the tracker receives more than 10 requests from a given IP with a bad connection ID. Bad connection IDs are cookie values that have expired or are from the future.
With the current
CountingBloomFilterconfiguration (0.01 rate), we would have a False Positive for every 10000 IPs, meaning when two IPs have a collision, and one of them is misbehaving, the other one would also be banned.To avoid false positives, we introduced a second counter with a HashMap. This consumes more memory, but it's reset every 120 seconds. The HashMap is only used when the CBF detects a potential bad client.
TODO
CountingBloomFilter::with_rate(4, 0.01, 100)BanServiceQuestions
Future PR