[Deepin-Kernel-SIG] [linux 6.6-y] [Upstream] udp: Add 4-tuple hash (uhash4) for connected sockets #962
commit accdd51 upstream.

Preparing for udp 4-tuple hash (uhash4 for short).

To implement uhash4 without cache line missing when lookup, hslot2 is used to record the number of hashed sockets in hslot4. Thus adding a new struct udp_hslot_main with field hash4_cnt, which is used by hash2. The new struct is used to avoid doubling the size of udp_hslot.

Before uhash4 lookup, firstly checking hash4_cnt to see if there are hashed sks in hslot4. Because hslot2 is always used in lookup, there is no cache line miss.

Related helpers are updated, and use the helpers as possible.

uhash4 is implemented in following patches.

Signed-off-by: Philo Lu <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[ Backport from v6.13 ]
Signed-off-by: Philo Lu <[email protected]>
Signed-off-by: WangYuli <[email protected]>
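The wrapper idea in this commit can be sketched in userspace C. This is only an illustration under stated assumptions: `hslot_sketch`, `hslot_main_sketch`, and the three helpers are hypothetical stand-ins, not the kernel's `udp_hslot`, `udp_hslot_main`, or its real helpers, and the lock and list heads are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for struct udp_hslot (no lock, no list). */
struct hslot_sketch {
	void *head;
	int count;
};

/* Wrapper used only by the hash2 table: the base slot plus the hash4 counter.
 * Keeping the base slot as the first member lets a slot pointer be cast back
 * to the wrapper, so other tables keep the smaller slot type.
 */
struct hslot_main_sketch {
	struct hslot_sketch hslot;
	uint32_t hash4_cnt;   /* sockets of this hash2 slot that are also in hash4 */
};

/* Only valid for slots that really live inside a hslot_main_sketch. */
static inline struct hslot_main_sketch *hslot_main(struct hslot_sketch *slot2)
{
	assert(slot2 != NULL);
	return (struct hslot_main_sketch *)slot2;
}

static inline bool hslot2_has_hash4(struct hslot_sketch *slot2)
{
	return hslot_main(slot2)->hash4_cnt != 0;
}

static inline void hslot2_hash4_inc(struct hslot_sketch *slot2)
{
	hslot_main(slot2)->hash4_cnt++;
}

static inline void hslot2_hash4_dec(struct hslot_sketch *slot2)
{
	hslot_main(slot2)->hash4_cnt--;
}
```

Because the lookup path already reads the hash2 slot, checking a counter stored next to it costs no additional cache line, which is the point the commit message makes.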
commit dab78a1 upstream.

Add a new hash list, hash4, in udp table. It will be used to implement 4-tuple hash for connected udp sockets. This patch adds the hlist to table, and implements helpers and the initialization. 4-tuple hash is implemented in the following patch.

hash4 uses hlist_nulls to avoid moving wrongly onto another hlist due to concurrent rehash, because rehash() can happen with lookup().

Co-developed-by: Cambda Zhu <[email protected]>
Signed-off-by: Cambda Zhu <[email protected]>
Co-developed-by: Fred Chen <[email protected]>
Signed-off-by: Fred Chen <[email protected]>
Co-developed-by: Yubing Qiu <[email protected]>
Signed-off-by: Yubing Qiu <[email protected]>
Signed-off-by: Philo Lu <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[ Backport from v6.13 ]
Signed-off-by: Philo Lu <[email protected]>
Signed-off-by: WangYuli <[email protected]>
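The reason for a "nulls" list can be illustrated with a small single-threaded sketch. The assumption here is the usual encoding where the end-of-chain marker has its low bit set and carries the slot number in the remaining bits; `nnode`, `make_nulls`, and `chain_walk` are hypothetical names for this example, not the kernel's `hlist_nulls` API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node type for illustration only. */
struct nnode {
	uintptr_t next;   /* a real node pointer, or an encoded end marker */
	int key;
};

/* End marker: low bit set, slot number stored in the bits above it. */
static inline uintptr_t make_nulls(unsigned long slot)
{
	return ((uintptr_t)slot << 1) | 1;
}

static inline bool is_nulls(uintptr_t p)
{
	return (p & 1) != 0;
}

static inline unsigned long nulls_value(uintptr_t p)
{
	return (unsigned long)(p >> 1);
}

enum walk_result { WALK_FOUND, WALK_MISS, WALK_RESTART };

/* Walk the chain that starts at 'first' and belongs to 'slot'. */
static enum walk_result chain_walk(uintptr_t first, unsigned long slot,
				   int key, struct nnode **out)
{
	uintptr_t p = first;

	assert(out != NULL);
	while (!is_nulls(p)) {
		struct nnode *n = (struct nnode *)p;

		if (n->key == key) {
			*out = n;
			return WALK_FOUND;
		}
		p = n->next;
	}
	/* Ending on another slot's marker means a node we followed was
	 * moved to a different chain, so the result cannot be trusted.
	 */
	return nulls_value(p) == slot ? WALK_MISS : WALK_RESTART;
}
```

A caller would loop while `chain_walk()` returns `WALK_RESTART`. The real lookup runs under RCU with concurrent writers; this sketch only shows the end-marker check.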
commit 78c91ae upstream.

Currently, the udp_table has two hash table, the port hash and portaddr hash. Usually for UDP servers, all sockets have the same local port and addr, so they are all on the same hash slot within a reuseport group.

In some applications, UDP servers use connect() to manage clients. In particular, when firstly receiving from an unseen 4 tuple, a new socket is created and connect()ed to the remote addr:port, and then the fd is used exclusively by the client.

Once there are connected sks in a reuseport group, udp has to score all sks in the same hash2 slot to find the best match. This could be inefficient with a large number of connections, resulting in high softirq overhead.

To solve the problem, this patch implement 4-tuple hash for connected udp sockets. During connect(), hash4 slot is updated, as well as a corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(), hslot4 will be searched firstly if the counter is non-zero. Otherwise, hslot2 is used like before. Note that only connected sockets enter this hash4 path, while un-connected ones are not affected.

hlist_nulls is used for hash4, because we probably move to another hslot wrongly when lookup with concurrent rehash. Then we check nulls at the list end to see if we should restart lookup. Because udp does not use SLAB_TYPESAFE_BY_RCU, we don't need to touch sk_refcnt when lookup.

Stress test results (with 1 cpu fully used) are shown below, in pps:
(1) _un-connected_ socket as server
    [a] w/o hash4: 1,825176
    [b] w/ hash4: 1,831750 (+0.36%)
(2) 500 _connected_ sockets as server
    [c] w/o hash4: 290860 (only 16% of [a])
    [d] w/ hash4: 1,889658 (+3.1% compared with [b])

With hash4, compute_score is skipped when lookup, so [d] is slightly better than [b].

Co-developed-by: Cambda Zhu <[email protected]>
Signed-off-by: Cambda Zhu <[email protected]>
Co-developed-by: Fred Chen <[email protected]>
Signed-off-by: Fred Chen <[email protected]>
Co-developed-by: Yubing Qiu <[email protected]>
Signed-off-by: Yubing Qiu <[email protected]>
Signed-off-by: Philo Lu <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[ Backport from v6.13 ]
Signed-off-by: Philo Lu <[email protected]>
Signed-off-by: WangYuli <[email protected]>
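The lookup order described in this commit (exact 4-tuple match first when hash4_cnt is non-zero, otherwise score every socket in the hash2 slot) can be modeled roughly as below. All types and names are hypothetical, the slots are plain arrays, and slot selection by hash value is left out. It is a toy model of the control flow, not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct tuple4 {
	uint32_t laddr, faddr;
	uint16_t lport, fport;
};

struct sk_sketch {
	struct tuple4 t;      /* faddr/fport stay 0 for an un-connected socket */
	bool connected;
};

struct slot2_sketch {
	struct sk_sketch **socks;   /* every socket on this local addr:port */
	size_t n;
	uint32_t hash4_cnt;         /* how many of them are also in hash4 */
};

struct slot4_sketch {
	struct sk_sketch **socks;   /* connected sockets only */
	size_t n;
};

static unsigned long score_calls;   /* counts scoring work for the demo */

/* Toy scoring: -1 = no match, 1 = wildcard match, 2 = connected match. */
static int score_sketch(const struct sk_sketch *sk, const struct tuple4 *pkt)
{
	score_calls++;
	if (sk->t.laddr != pkt->laddr || sk->t.lport != pkt->lport)
		return -1;
	if (!sk->connected)
		return 1;
	if (sk->t.faddr != pkt->faddr || sk->t.fport != pkt->fport)
		return -1;
	return 2;
}

static bool tuple_eq(const struct tuple4 *a, const struct tuple4 *b)
{
	return a->laddr == b->laddr && a->faddr == b->faddr &&
	       a->lport == b->lport && a->fport == b->fport;
}

static struct sk_sketch *lookup_sketch(const struct slot2_sketch *s2,
				       const struct slot4_sketch *s4,
				       const struct tuple4 *pkt)
{
	struct sk_sketch *best = NULL;
	int best_score = 0;
	size_t i;

	assert(s2 != NULL && s4 != NULL && pkt != NULL);

	/* Fast path: only taken when the hash2 slot says hash4 is populated. */
	if (s2->hash4_cnt) {
		for (i = 0; i < s4->n; i++)
			if (tuple_eq(&s4->socks[i]->t, pkt))
				return s4->socks[i];
	}

	/* Fallback: score every socket in the hash2 slot, as before. */
	for (i = 0; i < s2->n; i++) {
		int sc = score_sketch(s2->socks[i], pkt);

		if (sc > best_score) {
			best = s2->socks[i];
			best_score = sc;
		}
	}
	return best;
}
```

In this model a packet from a known peer returns without any call to `score_sketch()`, while a packet from an unseen peer still falls back to the full scoring pass and lands on the un-connected socket.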
commit 1b29a73 upstream.

Implement ipv6 udp hash4 like that in ipv4. The major difference is that the hash value should be calculated with udp6_ehashfn(). Besides, ipv4-mapped ipv6 address is handled before hash() and rehash().

Export udp_ehashfn because now we use it in udpv6 rehash.

Core procedures of hash/unhash/rehash are same as ipv4, and udpv4 and udpv6 share the same udptable, so some functions in ipv4 hash4 can also be shared.

Co-developed-by: Cambda Zhu <[email protected]>
Signed-off-by: Cambda Zhu <[email protected]>
Co-developed-by: Fred Chen <[email protected]>
Signed-off-by: Fred Chen <[email protected]>
Co-developed-by: Yubing Qiu <[email protected]>
Signed-off-by: Yubing Qiu <[email protected]>
Signed-off-by: Philo Lu <[email protected]>
Acked-by: Willem de Bruijn <[email protected]>
Acked-by: Paolo Abeni <[email protected]>
Signed-off-by: David S. Miller <[email protected]>
[ Backport from v6.13 ]
Signed-off-by: Philo Lu <[email protected]>
Signed-off-by: WangYuli <[email protected]>
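To illustrate what "ipv4-mapped ipv6 address is handled before hash()" implies, here is a userspace sketch that detects a v4-mapped address and extracts the embedded IPv4 address, which is the point where an IPv4-style hash would be chosen instead of the IPv6 one. `v4mapped_to_v4` is a hypothetical helper for this example; the kernel code uses its own internal checks.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>

/* Returns true and fills *a4 when *a6 is an IPv4-mapped address
 * (::ffff:a.b.c.d); returns false for any other IPv6 address.
 */
static bool v4mapped_to_v4(const struct in6_addr *a6, struct in_addr *a4)
{
	assert(a6 != NULL && a4 != NULL);
	if (!IN6_IS_ADDR_V4MAPPED(a6))
		return false;
	/* The IPv4 address occupies the last four bytes, in network order. */
	memcpy(&a4->s_addr, &a6->s6_addr[12], sizeof(a4->s_addr));
	return true;
}
```

For example, parsing `::ffff:192.0.2.1` with `inet_pton()` and passing it to this helper yields the same bytes as parsing `192.0.2.1` as an IPv4 address, so both socket families would agree on the 4-tuple being hashed.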
Reviewer's Guide
Introduce optional four-tuple hash (hash4) to accelerate lookup for connected UDP sockets in IPv4 and IPv6 by extending existing hash tables, integrating lookup/insertion/deletion/rehash logic, and providing compile-time no-ops for base-small builds.
Sequence diagram for UDP socket connect with hash4 insertion
sequenceDiagram
participant App as actor Application
participant Kernel as Kernel
participant UDP as udp_connect
participant Table as udp_table
App->>Kernel: connect() syscall
Kernel->>UDP: udp_connect()
UDP->>Kernel: __ip4_datagram_connect()
Kernel-->>UDP: (returns success)
UDP->>UDP: udp4_hash4(sk)
UDP->>Table: udp_lib_hash4(sk, hash)
Table->>Table: Insert sk into hash4
UDP-->>Kernel: return
Kernel-->>App: connect() returns
Sequence diagram for UDP packet lookup with hash4
sequenceDiagram
participant Net as Network Stack
participant Table as udp_table
participant Hash4 as hash4
Net->>Table: __udp4_lib_lookup(...)
Table->>Hash4: udp4_lib_lookup4(...)
Hash4->>Hash4: Search for matching socket
Hash4-->>Table: Return sk or NULL
Table-->>Net: Return sk (if found)
Class diagram for updated UDP hash table structures
classDiagram
class udp_hslot {
+hlist_head head
+hlist_nulls_head nulls_head
+int count
+spinlock_t lock
}
class udp_hslot_main {
+udp_hslot hslot
+u32 hash4_cnt
}
class udp_table {
+udp_hslot* hash
+udp_hslot_main* hash2
+udp_hslot* hash4
+unsigned int mask
+unsigned int log
}
udp_table "1" -- "*" udp_hslot : hash
udp_table "1" -- "*" udp_hslot_main : hash2
udp_table "1" -- "*" udp_hslot : hash4
udp_hslot_main "1" -- "1" udp_hslot : hslot
Class diagram for udp_sock with 4-tuple hash fields
classDiagram
class udp_sock {
+int pending
+__u8 encap_type
+__u16 udp_lrpa_hash
+hlist_nulls_node udp_lrpa_node
...
}
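deepin pr auto review
Code review comments:
The above are some of the main code review comments; the specific changes should be decided based on the actual implementation and context of the code.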
checkdepend: |
Hey @Avenger-285714 - I've reviewed your changes and they look great!
[ Upstream commit 51a00be ]

After the blamed commit below, udp_rehash() is supposed to be called with both local and remote addresses set.

Currently that is already the case for IPv6 sockets, but for IPv4 the destination address is updated after rehashing.

Address the issue moving the destination address and port initialization before rehashing.

Fixes: 1b29a73 ("ipv6/udp: Add 4-tuple hash for connected socket")
Reviewed-by: Eric Dumazet <[email protected]>
Link: https://patch.msgid.link/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715.git.pabeni@redhat.com
Signed-off-by: Paolo Abeni <[email protected]>
[ Backport from v6.13-rc3 ]
Suggested-by: Wentao Guan <[email protected]>
Signed-off-by: WangYuli <[email protected]>
|
checkdepend2: |
[ Upstream commit 644f910 ]

As discussed in [0], rehash4 could be missed in udp_lib_rehash() when udp hash4 changes while hash2 doesn't change. This patch fixes this by moving rehash4 codes out of rehash2 checking, and then rehash2 and rehash4 are done separately.

By doing this, we no longer need to call rehash4 explicitly in udp_lib_hash4(), as the rehash callback in __ip4_datagram_connect takes it. Thus, now udp_lib_hash4() returns directly if the sk is already hashed.

Note that uhash4 may fail to work under consecutive connect(<dst address>) calls because rehash() is not called with every connect(). To overcome this, connect(<AF_UNSPEC>) needs to be called after the next connect to a new destination.

[0] https://lore.kernel.org/all/4761e466ab9f7542c68cdc95f248987d127044d2.1733499715.git.pabeni@redhat.com/

Fixes: 78c91ae ("ipv4/udp: Add 4-tuple hash for connected socket")
Suggested-by: Paolo Abeni <[email protected]>
Signed-off-by: Philo Lu <[email protected]>
Link: https://patch.msgid.link/[email protected]
Signed-off-by: Paolo Abeni <[email protected]>
[ Backport from v6.13 ]
Suggested-by: Wentao Guan <[email protected]>
Signed-off-by: WangYuli <[email protected]>
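From the application side, one reading of the note about consecutive connect() calls is that a UDP socket should be disconnected with an AF_UNSPEC connect() between two destinations. The sketch below shows that pattern with standard socket calls; `udp_reconnect` is a hypothetical helper name, and whether this is strictly required depends on the kernel version in use.

```c
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Dissolve the current association, then connect to a new destination.
 * Returns 0 on success and -1 on failure (errno is set by connect()).
 */
static int udp_reconnect(int fd, const struct sockaddr *dst, socklen_t dst_len)
{
	struct sockaddr unspec;

	assert(dst != NULL);
	memset(&unspec, 0, sizeof(unspec));
	unspec.sa_family = AF_UNSPEC;

	/* connect() with AF_UNSPEC disconnects a datagram socket. */
	if (connect(fd, &unspec, sizeof(unspec)) < 0)
		return -1;
	return connect(fd, dst, dst_len);
}
```

A caller that previously connected `fd` to one peer would build a `struct sockaddr_in` for the new peer and pass it to `udp_reconnect()` in place of a second direct connect().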
Pull Request Overview
This pull request implements a 4-tuple hash (uhash4) mechanism for connected UDP sockets to significantly improve lookup performance. The enhancement uses local/remote IP addresses and ports as hash keys, enabling connected sockets to bypass expensive compute_score operations during lookups.
- Adds conditional 4-tuple hash tables and lookup functions for both IPv4 and IPv6 UDP
- Integrates hash4 into socket connect, disconnect, and rehash operations
- Provides CONFIG_BASE_SMALL compatibility with no-op implementations
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| include/net/udp.h | Extends UDP data structures with hash4 tables and helper functions |
| include/linux/udp.h | Adds 4-tuple hash fields to udp_sock structure |
| net/ipv4/udp.c | Implements IPv4 4-tuple hash lookup, connect, and table management |
| net/ipv6/udp.c | Implements IPv6 4-tuple hash lookup and connect functionality |
| net/ipv4/datagram.c | Reorders address assignment to support proper rehashing |
Comments suppressed due to low confidence (1)
net/ipv6/udp.c:287
- Calling udp4_hash4() from within a udp6_hash4() function is confusing. Consider renaming to make the relationship clearer, such as udp_ipv4_hash4() or adding a comment explaining why IPv4 function is called for v4-mapped addresses.
udp4_hash4(sk);
@@ -420,6 +419,7 @@ u32 udp_ehashfn(const struct net *net, const __be32 laddr, const __u16 lport,
	return __inet_ehashfn(laddr, lport, faddr, fport,
			      udp_ehash_secret + net_hash_mix(net));
}
Copilot
AI
Jul 22, 2025
[nitpick] The EXPORT_SYMBOL is placed after the function definition instead of immediately following it. This placement is inconsistent with kernel coding style where EXPORT_SYMBOL should typically be placed right after the function it exports.
		}
	}
	return NULL;
}

/* udp_rehash4() only checks hslot4, and hash4_cnt is not processed. */
Copilot
AI
Jul 22, 2025
This comment is unclear about what 'hash4_cnt is not processed' means in the context of udp_rehash4(). The comment should clarify that hash4_cnt updates are handled separately in the calling function udp_lib_rehash().
- /* udp_rehash4() only checks hslot4, and hash4_cnt is not processed. */
+ /* udp_rehash4() only checks hslot4. Updates to hash4_cnt are handled separately */
+ /* in the calling function udp_lib_rehash(). */
	INET_ADDR_COOKIE(acookie, saddr, daddr);

begin:
	/* SLAB_TYPESAFE_BY_RCU not used, so we don't need to touch sk_refcnt */
Copilot
AI
Jul 22, 2025
This comment is potentially misleading. While UDP doesn't use SLAB_TYPESAFE_BY_RCU, the comment should clarify that this refers specifically to the hash4 lookup path and why reference counting is not needed here, unlike other socket lookup paths.
- /* SLAB_TYPESAFE_BY_RCU not used, so we don't need to touch sk_refcnt */
+ /* In the hash4 lookup path, SLAB_TYPESAFE_BY_RCU is not used.
+  * This is safe because RCU ensures that the socket structure
+  * remains valid during traversal, so we don't need to touch sk_refcnt.
+  */
	table->hash4 = (void *)(table->hash2 + (table->mask + 1));
	for (int i = 0; i <= table->mask; i++) {
Copilot
AI
Jul 22, 2025
[nitpick] Variable declaration inside for-loop is a C99 feature. While supported in recent kernel versions, it's more conventional in kernel code to declare loop variables at the beginning of the function for better compatibility and consistency.
- table->hash4 = (void *)(table->hash2 + (table->mask + 1));
- for (int i = 0; i <= table->mask; i++) {
+ int i;
+ table->hash4 = (void *)(table->hash2 + (table->mask + 1));
+ for (i = 0; i <= table->mask; i++) {
|
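The `table->hash4 = (void *)(table->hash2 + (table->mask + 1));` line quoted in the review places the hash4 array directly after the hash2 array inside one allocation. A minimal userspace sketch of that layout follows. The struct names, `table_init`, and `table_free` are hypothetical, and the assumption that hash2 in turn follows the port hash array in the same block is mine, not something shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical simplified slot types (no locks, no list heads). */
struct slot_sketch {
	void *head;
	int count;
};

struct slot_main_sketch {
	struct slot_sketch hslot;
	uint32_t hash4_cnt;
};

struct table_sketch {
	struct slot_sketch *hash;        /* port hash */
	struct slot_main_sketch *hash2;  /* port+addr hash, with hash4_cnt */
	struct slot_sketch *hash4;       /* 4-tuple hash */
	unsigned int mask;
};

/* One zeroed allocation holds three arrays of 'nslots' entries each,
 * laid out back to back: hash, then hash2, then hash4.
 */
static int table_init(struct table_sketch *t, unsigned int nslots)
{
	size_t per_slot = 2 * sizeof(struct slot_sketch) +
			  sizeof(struct slot_main_sketch);
	void *mem;

	/* The mask arithmetic only works for a power-of-two slot count. */
	assert(nslots != 0 && (nslots & (nslots - 1)) == 0);

	mem = calloc(nslots, per_slot);
	if (!mem)
		return -1;

	t->mask = nslots - 1;
	t->hash = mem;
	t->hash2 = (void *)(t->hash + (t->mask + 1));
	t->hash4 = (void *)(t->hash2 + (t->mask + 1));
	return 0;
}

static void table_free(struct table_sketch *t)
{
	free(t->hash);   /* the first array pointer is the allocation base */
}
```

The pointer arithmetic is done on typed pointers, so each step advances by `mask + 1` elements of the preceding array's element type, which is why hash2 and hash4 can use different slot sizes within the same block.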
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: opsiff
Link: https://bugzilla.openanolis.cn//show_bug.cgi?id=11971
This patchset introduces 4-tuple hash for connected udp sockets, to make
connected udp lookup faster.
Stress test results (with 1 cpu fully used) are shown below, in pps:
(1) un-connected socket as server
[a] w/o hash4: 1,825176
[b] w/ hash4: 1,831750 (+0.36%)
(2) 500 connected sockets as server
[c] w/o hash4: 290860 (only 16% of [a])
[d] w/ hash4: 1,889658 (+3.1% compared with [b])
With hash4, compute_score is skipped when lookup, so [d] is slightly
better than [b].
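The percentages quoted above can be rechecked with two small helpers. This assumes the figures written as "1,825176", "1,831750", and "1,889658" mean 1825176, 1831750, and 1889658 pps; `ratio_pct` and `gain_pct` are hypothetical names used only for this check.

```c
#include <assert.h>

/* value as a percentage of base */
static double ratio_pct(double value, double base)
{
	assert(base != 0.0);
	return 100.0 * value / base;
}

/* relative change of value over base, in percent */
static double gain_pct(double value, double base)
{
	assert(base != 0.0);
	return 100.0 * (value - base) / base;
}
```

Under that assumption, `gain_pct(1831750, 1825176)` is about 0.36, `ratio_pct(290860, 1825176)` is about 15.9, and `gain_pct(1889658, 1831750)` is about 3.16, which agrees with the quoted +0.36%, 16%, and +3.1%.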
Patch1: Add a new counter for hslot2 named hash4_cnt, to avoid cache line
miss when lookup.
Patch2: Add hslot/hlist_nulls for 4-tuple hash.
Patch3 and 4: Implement 4-tuple hash for ipv4 and ipv6.
The detailed motivation is described in Patch 3.
The 4-tuple hash increases the size of udp_sock and udp_hslot. Thus add it
with CONFIG_BASE_SMALL, i.e., it's a no op with CONFIG_BASE_SMALL.
Intentionally, the feature is not available for udplite. Though udplite
shares some structs and functions with udp, its connect() keeps unchanged.
So all udplite sockets perform the same as un-connected udp sockets.
Besides, udplite also shares the additional memory consumption in udp_sock
and udptable.
Link: https://gitee.com/anolis/cloud-kernel/pulls/4153
Summary by Sourcery
Introduce a 4-tuple hashing mechanism for connected UDP sockets to accelerate lookup operations, integrate it into IPv4 and IPv6 code paths with conditional no-op support for minimal builds, and expose related symbols for external modules.