Skip to content

Commit abd0c0f

Browse files
author
Martin KaFai Lau
committed
Merge branch 'make-tc-bpf-helpers-preserve-skb-metadata'
Jakub Sitnicki says: ==================== Make TC BPF helpers preserve skb metadata Changes in v4: - Fix copy-paste bug in check_metadata() test helper (AI review) - Add "out of scope" section (at the bottom) - Link to v3: https://lore.kernel.org/r/[email protected] Changes in v3: - Use the already existing BPF_STREAM_STDERR const in tests (Martin) - Unclone skb head on bpf_dynptr_write to skb metadata (patch 3) (Martin) - Swap order of patches 1 & 2 to refer to skb_postpush_data_move() in docs - Mention in skb_data_move() docs how to move just the metadata - Note in pskb_expand_head() docs to move metadata after skb_push() (Jakub) - Link to v2: https://lore.kernel.org/r/[email protected] Changes in v2: - Tweak WARN_ON_ONCE check in skb_data_move() (patch 2) - Convert all tests to verify skb metadata in BPF (patches 9-10) - Add test coverage for modified BPF helpers (patches 12-15) - Link to RFCv1: https://lore.kernel.org/r/[email protected] This patch set continues our work [1] to allow BPF programs and user-space applications to attach multiple bytes of metadata to packets via the XDP/skb metadata area. The focus of this patch set it to ensure that skb metadata remains intact when packets pass through a chain of TC BPF programs that call helpers which operate on skb head. Currently, several helpers that either adjust the skb->data pointer or reallocate skb->head do not preserve metadata at its expected location, that is immediately in front of the MAC header. These are: - bpf_skb_adjust_room - bpf_skb_change_head - bpf_skb_change_proto - bpf_skb_change_tail - bpf_skb_vlan_pop - bpf_skb_vlan_push In TC BPF context, metadata must be moved whenever skb->data changes to keep the skb->data_meta pointer valid. I don't see any way around it. Creative ideas how to avoid that would be very welcome. With that in mind, we can patch the helpers in at least two different ways: 1. Integrate metadata move into header move Replace the existing memmove, which follows skb_push/pull, with a helper that moves both headers and metadata in a single call. This avoids an extra memmove but reduces transparency. skb_pull(skb, len); - memmove(skb->data, skb->data - len, n); + skb_postpull_data_move(skb, len, n); skb->mac_header += len; skb_push(skb, len) - memmove(skb->data, skb->data + len, n); + skb_postpush_data_move(skb, len, n); skb->mac_header -= len; 2. Move metadata separately Add a dedicated metadata move after the header move. This is more explicit but costs an additional memmove. skb_pull(skb, len); memmove(skb->data, skb->data - len, n); + skb_metadata_postpull_move(skb, len); skb->mac_header += len; skb_push(skb, len) + skb_metadata_postpush_move(skb, len); memmove(skb->data, skb->data + len, n); skb->mac_header -= len; This patch set implements option (1), expecting that "you can have just one memmove" will be the most obvious feedback, while readability is a, somewhat subjective, matter of taste, which I don't claim to have ;-) The structure of the patch set is as follows: - patches 1-4 prepare ground for safe-proofing the BPF helpers - patches 5-9 modify the BPF helpers to preserve skb metadata - patches 10-11 prepare ground for metadata tests with BPF helper calls - patches 12-16 adapt and expand tests to cover the made changes Out of scope for this series: - safe-proofing tunnel & tagging devices - VLAN, GRE, ... (next in line, in development preview at [2]) - metadata access after packet foward (to do after Rx path - once metadata reliably reaches sk_filter) Thanks, -jkbs [1] https://lore.kernel.org/all/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com/ [2] https://github.com/jsitnicki/linux/commits/skb-meta/safeproof-netdevs/ ==================== Link: https://patch.msgid.link/[email protected] Signed-off-by: Martin KaFai Lau <[email protected]>
2 parents a0c3aef + d2c5cca commit abd0c0f

File tree

8 files changed

+475
-183
lines changed

8 files changed

+475
-183
lines changed

include/linux/filter.h

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1781,6 +1781,8 @@ int __bpf_xdp_store_bytes(struct xdp_buff *xdp, u32 offset, void *buf, u32 len);
17811781
void *bpf_xdp_pointer(struct xdp_buff *xdp, u32 offset, u32 len);
17821782
void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off,
17831783
void *buf, unsigned long len, bool flush);
1784+
int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
1785+
const void *from, u32 len, u64 flags);
17841786
void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset);
17851787
#else /* CONFIG_NET */
17861788
static inline int __bpf_skb_load_bytes(const struct sk_buff *skb, u32 offset,
@@ -1817,6 +1819,13 @@ static inline void bpf_xdp_copy_buf(struct xdp_buff *xdp, unsigned long off, voi
18171819
{
18181820
}
18191821

1822+
static inline int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
1823+
const void *from, u32 len,
1824+
u64 flags)
1825+
{
1826+
return -EOPNOTSUPP;
1827+
}
1828+
18201829
static inline void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
18211830
{
18221831
return ERR_PTR(-EOPNOTSUPP);

include/linux/if_vlan.h

Lines changed: 6 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -355,16 +355,17 @@ static inline int __vlan_insert_inner_tag(struct sk_buff *skb,
355355
__be16 vlan_proto, u16 vlan_tci,
356356
unsigned int mac_len)
357357
{
358+
const u8 meta_len = mac_len > ETH_TLEN ? skb_metadata_len(skb) : 0;
358359
struct vlan_ethhdr *veth;
359360

360-
if (skb_cow_head(skb, VLAN_HLEN) < 0)
361+
if (skb_cow_head(skb, meta_len + VLAN_HLEN) < 0)
361362
return -ENOMEM;
362363

363364
skb_push(skb, VLAN_HLEN);
364365

365366
/* Move the mac header sans proto to the beginning of the new header. */
366367
if (likely(mac_len > ETH_TLEN))
367-
memmove(skb->data, skb->data + VLAN_HLEN, mac_len - ETH_TLEN);
368+
skb_postpush_data_move(skb, VLAN_HLEN, mac_len - ETH_TLEN);
368369
if (skb_mac_header_was_set(skb))
369370
skb->mac_header -= VLAN_HLEN;
370371

@@ -731,18 +732,16 @@ static inline void vlan_set_encap_proto(struct sk_buff *skb,
731732
*
732733
* Expects the skb to contain a VLAN tag in the payload, and to have skb->data
733734
* pointing at the MAC header.
734-
*
735-
* Returns: a new pointer to skb->data, or NULL on failure to pull.
736735
*/
737-
static inline void *vlan_remove_tag(struct sk_buff *skb, u16 *vlan_tci)
736+
static inline void vlan_remove_tag(struct sk_buff *skb, u16 *vlan_tci)
738737
{
739738
struct vlan_hdr *vhdr = (struct vlan_hdr *)(skb->data + ETH_HLEN);
740739

741740
*vlan_tci = ntohs(vhdr->h_vlan_TCI);
742741

743-
memmove(skb->data + VLAN_HLEN, skb->data, 2 * ETH_ALEN);
744742
vlan_set_encap_proto(skb, vhdr);
745-
return __skb_pull(skb, VLAN_HLEN);
743+
__skb_pull(skb, VLAN_HLEN);
744+
skb_postpull_data_move(skb, VLAN_HLEN, 2 * ETH_ALEN);
746745
}
747746

748747
/**

include/linux/skbuff.h

Lines changed: 75 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4564,6 +4564,81 @@ static inline void skb_metadata_clear(struct sk_buff *skb)
45644564
skb_metadata_set(skb, 0);
45654565
}
45664566

4567+
/**
4568+
* skb_data_move - Move packet data and metadata after skb_push() or skb_pull().
4569+
* @skb: packet to operate on
4570+
* @len: number of bytes pushed or pulled from &sk_buff->data
4571+
* @n: number of bytes to memmove() from pre-push/pull &sk_buff->data
4572+
*
4573+
* Moves @n bytes of packet data, can be zero, and all bytes of skb metadata.
4574+
*
4575+
* Assumes metadata is located immediately before &sk_buff->data prior to the
4576+
* push/pull, and that sufficient headroom exists to hold it after an
4577+
* skb_push(). Otherwise, metadata is cleared and a one-time warning is issued.
4578+
*
4579+
* Prefer skb_postpull_data_move() or skb_postpush_data_move() to calling this
4580+
* helper directly.
4581+
*/
4582+
static inline void skb_data_move(struct sk_buff *skb, const int len,
4583+
const unsigned int n)
4584+
{
4585+
const u8 meta_len = skb_metadata_len(skb);
4586+
u8 *meta, *meta_end;
4587+
4588+
if (!len || (!n && !meta_len))
4589+
return;
4590+
4591+
if (!meta_len)
4592+
goto no_metadata;
4593+
4594+
meta_end = skb_metadata_end(skb);
4595+
meta = meta_end - meta_len;
4596+
4597+
if (WARN_ON_ONCE(meta_end + len != skb->data ||
4598+
meta_len > skb_headroom(skb))) {
4599+
skb_metadata_clear(skb);
4600+
goto no_metadata;
4601+
}
4602+
4603+
memmove(meta + len, meta, meta_len + n);
4604+
return;
4605+
4606+
no_metadata:
4607+
memmove(skb->data, skb->data - len, n);
4608+
}
4609+
4610+
/**
4611+
* skb_postpull_data_move - Move packet data and metadata after skb_pull().
4612+
* @skb: packet to operate on
4613+
* @len: number of bytes pulled from &sk_buff->data
4614+
* @n: number of bytes to memmove() from pre-pull &sk_buff->data
4615+
*
4616+
* See skb_data_move() for details.
4617+
*/
4618+
static inline void skb_postpull_data_move(struct sk_buff *skb,
4619+
const unsigned int len,
4620+
const unsigned int n)
4621+
{
4622+
DEBUG_NET_WARN_ON_ONCE(len > INT_MAX);
4623+
skb_data_move(skb, len, n);
4624+
}
4625+
4626+
/**
4627+
* skb_postpush_data_move - Move packet data and metadata after skb_push().
4628+
* @skb: packet to operate on
4629+
* @len: number of bytes pushed onto &sk_buff->data
4630+
* @n: number of bytes to memmove() from pre-push &sk_buff->data
4631+
*
4632+
* See skb_data_move() for details.
4633+
*/
4634+
static inline void skb_postpush_data_move(struct sk_buff *skb,
4635+
const unsigned int len,
4636+
const unsigned int n)
4637+
{
4638+
DEBUG_NET_WARN_ON_ONCE(len > INT_MAX);
4639+
skb_data_move(skb, -len, n);
4640+
}
4641+
45674642
struct sk_buff *skb_clone_sk(struct sk_buff *skb);
45684643

45694644
#ifdef CONFIG_NETWORK_PHY_TIMESTAMPING

kernel/bpf/helpers.c

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -1842,10 +1842,8 @@ int __bpf_dynptr_write(const struct bpf_dynptr_kern *dst, u32 offset, void *src,
18421842
return -EINVAL;
18431843
return __bpf_xdp_store_bytes(dst->data, dst->offset + offset, src, len);
18441844
case BPF_DYNPTR_TYPE_SKB_META:
1845-
if (flags)
1846-
return -EINVAL;
1847-
memmove(bpf_skb_meta_pointer(dst->data, dst->offset + offset), src, len);
1848-
return 0;
1845+
return __bpf_skb_meta_store_bytes(dst->data, dst->offset + offset, src,
1846+
len, flags);
18491847
default:
18501848
WARN_ONCE(true, "bpf_dynptr_write: unknown dynptr type %d\n", type);
18511849
return -EFAULT;

net/core/filter.c

Lines changed: 22 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -3253,11 +3253,11 @@ static void bpf_skb_change_protocol(struct sk_buff *skb, u16 proto)
32533253

32543254
static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
32553255
{
3256-
/* Caller already did skb_cow() with len as headroom,
3256+
/* Caller already did skb_cow() with meta_len+len as headroom,
32573257
* so no need to do it here.
32583258
*/
32593259
skb_push(skb, len);
3260-
memmove(skb->data, skb->data + len, off);
3260+
skb_postpush_data_move(skb, len, off);
32613261
memset(skb->data + off, 0, len);
32623262

32633263
/* No skb_postpush_rcsum(skb, skb->data + off, len)
@@ -3281,7 +3281,7 @@ static int bpf_skb_generic_pop(struct sk_buff *skb, u32 off, u32 len)
32813281
old_data = skb->data;
32823282
__skb_pull(skb, len);
32833283
skb_postpull_rcsum(skb, old_data + off, len);
3284-
memmove(skb->data, old_data, off);
3284+
skb_postpull_data_move(skb, len, off);
32853285

32863286
return 0;
32873287
}
@@ -3326,10 +3326,11 @@ static int bpf_skb_net_hdr_pop(struct sk_buff *skb, u32 off, u32 len)
33263326
static int bpf_skb_proto_4_to_6(struct sk_buff *skb)
33273327
{
33283328
const u32 len_diff = sizeof(struct ipv6hdr) - sizeof(struct iphdr);
3329+
const u8 meta_len = skb_metadata_len(skb);
33293330
u32 off = skb_mac_header_len(skb);
33303331
int ret;
33313332

3332-
ret = skb_cow(skb, len_diff);
3333+
ret = skb_cow(skb, meta_len + len_diff);
33333334
if (unlikely(ret < 0))
33343335
return ret;
33353336

@@ -3489,6 +3490,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
34893490
u8 inner_mac_len = flags >> BPF_ADJ_ROOM_ENCAP_L2_SHIFT;
34903491
bool encap = flags & BPF_F_ADJ_ROOM_ENCAP_L3_MASK;
34913492
u16 mac_len = 0, inner_net = 0, inner_trans = 0;
3493+
const u8 meta_len = skb_metadata_len(skb);
34923494
unsigned int gso_type = SKB_GSO_DODGY;
34933495
int ret;
34943496

@@ -3499,7 +3501,7 @@ static int bpf_skb_net_grow(struct sk_buff *skb, u32 off, u32 len_diff,
34993501
return -ENOTSUPP;
35003502
}
35013503

3502-
ret = skb_cow_head(skb, len_diff);
3504+
ret = skb_cow_head(skb, meta_len + len_diff);
35033505
if (unlikely(ret < 0))
35043506
return ret;
35053507

@@ -3873,6 +3875,7 @@ static const struct bpf_func_proto sk_skb_change_tail_proto = {
38733875
static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
38743876
u64 flags)
38753877
{
3878+
const u8 meta_len = skb_metadata_len(skb);
38763879
u32 max_len = BPF_SKB_MAX_LEN;
38773880
u32 new_len = skb->len + head_room;
38783881
int ret;
@@ -3882,7 +3885,7 @@ static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
38823885
new_len < skb->len))
38833886
return -EINVAL;
38843887

3885-
ret = skb_cow(skb, head_room);
3888+
ret = skb_cow(skb, meta_len + head_room);
38863889
if (likely(!ret)) {
38873890
/* Idea for this helper is that we currently only
38883891
* allow to expand on mac header. This means that
@@ -3894,6 +3897,7 @@ static inline int __bpf_skb_change_head(struct sk_buff *skb, u32 head_room,
38943897
* for redirection into L2 device.
38953898
*/
38963899
__skb_push(skb, head_room);
3900+
skb_postpush_data_move(skb, head_room, 0);
38973901
memset(skb->data, 0, head_room);
38983902
skb_reset_mac_header(skb);
38993903
skb_reset_mac_len(skb);
@@ -12102,6 +12106,18 @@ void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
1210212106
return skb_metadata_end(skb) - skb_metadata_len(skb) + offset;
1210312107
}
1210412108

12109+
int __bpf_skb_meta_store_bytes(struct sk_buff *skb, u32 offset,
12110+
const void *from, u32 len, u64 flags)
12111+
{
12112+
if (unlikely(flags))
12113+
return -EINVAL;
12114+
if (unlikely(bpf_try_make_writable(skb, 0)))
12115+
return -EFAULT;
12116+
12117+
memmove(bpf_skb_meta_pointer(skb, offset), from, len);
12118+
return 0;
12119+
}
12120+
1210512121
__bpf_kfunc_start_defs();
1210612122
__bpf_kfunc int bpf_dynptr_from_skb(struct __sk_buff *s, u64 flags,
1210712123
struct bpf_dynptr *ptr__uninit)
@@ -12129,9 +12145,6 @@ __bpf_kfunc int bpf_dynptr_from_skb(struct __sk_buff *s, u64 flags,
1212912145
* XDP context with bpf_xdp_adjust_meta(). Serves as an alternative to
1213012146
* &__sk_buff->data_meta.
1213112147
*
12132-
* If passed @skb_ is a clone which shares the data with the original, the
12133-
* dynptr will be read-only. This limitation may be lifted in the future.
12134-
*
1213512148
* Return:
1213612149
* * %0 - dynptr ready to use
1213712150
* * %-EINVAL - invalid flags, dynptr set to null
@@ -12149,9 +12162,6 @@ __bpf_kfunc int bpf_dynptr_from_skb_meta(struct __sk_buff *skb_, u64 flags,
1214912162

1215012163
bpf_dynptr_init(ptr, skb, BPF_DYNPTR_TYPE_SKB_META, 0, skb_metadata_len(skb));
1215112164

12152-
if (skb_cloned(skb))
12153-
bpf_dynptr_set_rdonly(ptr);
12154-
1215512165
return 0;
1215612166
}
1215712167

net/core/skbuff.c

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2234,6 +2234,10 @@ EXPORT_SYMBOL(__pskb_copy_fclone);
22342234
*
22352235
* All the pointers pointing into skb header may change and must be
22362236
* reloaded after call to this function.
2237+
*
2238+
* Note: If you skb_push() the start of the buffer after reallocating the
2239+
* header, call skb_postpush_data_move() first to move the metadata out of
2240+
* the way before writing to &sk_buff->data.
22372241
*/
22382242

22392243
int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
@@ -2305,8 +2309,6 @@ int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
23052309
skb->nohdr = 0;
23062310
atomic_set(&skb_shinfo(skb)->dataref, 1);
23072311

2308-
skb_metadata_clear(skb);
2309-
23102312
/* It is not generally safe to change skb->truesize.
23112313
* For the moment, we really care of rx path, or
23122314
* when skb is orphaned (not attached to a socket).

0 commit comments

Comments
 (0)